Skip to content

Process per file pairs

Problem

You need to process the files in a directory, grouping them by pairs.

Solution

Use the Channel.fromFilePairs method to create a channel that emits file pairs matching a glob pattern. The pattern must match a common prefix in the paired file names.

The matching files are emitted as tuples in which the first element is the grouping key of the matching files and the second element is the file pair itself.

Code

process foo {
  debug true

  input:
  tuple val(sampleId), file(reads)

  script:
  """
  echo your_command --sample $sampleId --reads $reads
  """
}

workflow {
  Channel.fromFilePairs("$baseDir/data/reads/*_{1,2}.fq.gz", checkIfExists:true) \
    | foo
}

Run it

nextflow run nextflow-io/patterns/process-per-file-pairs.nf

Custom grouping strategy

When necessary, it is possible to define a custom grouping strategy. A common use case is for alignment BAM files (sample1.bam) that come along with their index file. The difficulty is that the index is sometimes called sample1.bai and sometimes sample1.bam.bai depending on the software used. The following example can accommodate both cases.

process foo {
  debug true
  tag "$sampleId"

  input:
  tuple val(sampleId), file(bam)

  script:
  """
  echo your_command --sample ${sampleId} --bam ${sampleId}.bam
  """
}

workflow {
  Channel.fromFilePairs("$baseDir/data/alignment/*.{bam,bai}", checkIfExists:true) { file -> file.name.replaceAll(/.bam|.bai$/,'') } \
    | foo
}

Run it

nextflow run nextflow-io/patterns/process-per-file-pairs-custom.nf