Process per file chunk

Problem

You need to split one or more input files into chunks and execute a task for each of them.

Solution

Use the splitText operator to split a file into chunks of a given size. Then use the resulting channel as input for the process implementing your task.

Warning

Chunks are kept in memory by default. When splitting big files, set the option file: true so that each chunk is saved to a file instead. See the documentation for details.

Splitters for specific file formats are also available, e.g. splitFasta and splitFastq.

Code

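// Input file to split and number of lines per chunk
params.infile = "$baseDir/data/poem.txt"
params.size = 5

process foo {
  debug true   // print the task's standard output
  input:
  file x       // each chunk (a string) is staged as a file

  script:
  """
  rev $x | rev
  """
}

workflow {
  // Split the file into chunks of params.size lines; foo runs once per chunk
  Channel.fromPath(params.infile) \
    | splitText(by: params.size) \
    | foo
}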
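
The warning above recommends file: true for big inputs. The following is a minimal sketch of the same example with that option set, so each chunk is written to a file rather than held in memory. It assumes the same params.infile and params.size as above; the use of path x as the input qualifier is a choice made for this sketch, since the chunks arrive as file paths.

```nextflow
// Sketch only: same pipeline as above, but chunks are saved to files.
params.infile = "$baseDir/data/poem.txt"
params.size = 5

process foo {
  debug true
  input:
  path x   // assumption: with file: true each chunk is emitted as a file path

  script:
  """
  rev $x | rev
  """
}

workflow {
  Channel.fromPath(params.infile) \
    | splitText(by: params.size, file: true) \
    | foo
}
```

The process body is unchanged; only the splitText call and the input declaration differ.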

Run it

Use the following command to execute the example:

nextflow run nextflow-io/patterns/process-per-file-chunk.nf