Store outputs matching a glob pattern

Problem

A task in your workflow creates many output files that are required by a downstream task. You want to store some of those files in separate directories, depending on the file name.

Solution

Use two or more publishDir directives to publish the output files to separate paths. For each directive, specify a different glob pattern with the pattern option, so that each directory receives only the files that match that pattern.
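
As a minimal sketch of the idea, separate from the full recipe, the process below publishes two kinds of files to two directories. The process name `bar`, the file names, and the `results/...` directories are hypothetical and chosen only for illustration.

```nextflow
// Hypothetical example: `bar`, the file names, and the `results/...`
// directories are illustrative assumptions, not part of the recipe.
process bar {
  // Only outputs ending in .log are published to results/logs
  publishDir 'results/logs', pattern: '*.log'
  // Only outputs ending in .csv are published to results/tables
  publishDir 'results/tables', pattern: '*.csv'

  output:
    path '*'

  script:
    """
    echo "started" > run.log
    echo "id,value" > summary.csv
    # scratch.tmp matches neither pattern, so it is not published
    touch scratch.tmp
    """
}

workflow {
  bar()
}
```

All three files are still declared as outputs of the process and remain available to downstream tasks. The pattern option only controls which of them are published to each directory.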

Code

params.reads = "$projectDir/data/reads/*_{1,2}.fq.gz"
params.outdir = 'my-results'

process foo {
  publishDir "$params.outdir/$sampleId/counts", pattern: "*_counts.txt"
  publishDir "$params.outdir/$sampleId/outlooks", pattern: '*_outlook.txt'
  publishDir "$params.outdir/$sampleId/", pattern: '*.fq'

  input:
    tuple val(sampleId), path('sample1.fq.gz'), path('sample2.fq.gz')
  output:
    path "*"
  script:
    """
    < sample1.fq.gz zcat > sample1.fq
    < sample2.fq.gz zcat > sample2.fq

    awk '{s++}END{print s/4}' sample1.fq > sample1_counts.txt
    awk '{s++}END{print s/4}' sample2.fq > sample2_counts.txt

    head -n 50 sample1.fq > sample1_outlook.txt
    head -n 50 sample2.fq > sample2_outlook.txt
    """
}

workflow {
  Channel.fromFilePairs(params.reads, checkIfExists: true, flat: true) \
    | foo
}

Run it

Run the script with the following command:

nextflow run nextflow-io/patterns/publish-matching-glob.nf