Create a channel from a sample sheet

`fromSamplesheet`

This function validates a sample sheet and converts it to a ready-to-use Nextflow channel, using the information encoded in a sample sheet schema (see the docs).

The function has one mandatory argument: the name of the parameter that specifies the input sample sheet. In the pipeline parameters schema, that parameter must have the format `file-path` and include an additional `schema` field:

{
  "type": "string",
  "format": "file-path",
  "schema": "assets/foo_schema.json"
}

The path given in the `schema` key points to the JSON schema used to validate the sample sheet.
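The lookup described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the plugin's code: the helper name `find_samplesheet_schema` is hypothetical, and it assumes the parameters are grouped under `definitions`, as in the `nextflow_schema.json` files shown further down this page.

```python
import json

# Trimmed-down stand-in for nextflow_schema.json (only the keys used here).
PARAMETERS_SCHEMA = json.loads("""
{
  "definitions": {
    "input_output_options": {
      "properties": {
        "input": {
          "type": "string",
          "format": "file-path",
          "schema": "assets/foo_schema.json"
        },
        "outdir": {"type": "string", "format": "directory-path"}
      }
    }
  }
}
""")


def find_samplesheet_schema(parameters_schema, param_name):
    """Return the 'schema' path declared for param_name.

    Hypothetical helper: searches every group under 'definitions' and
    raises if the parameter is missing, is not a file-path, or has no
    'schema' key.
    """
    for group in parameters_schema.get("definitions", {}).values():
        prop = group.get("properties", {}).get(param_name)
        if prop is None:
            continue
        if prop.get("format") != "file-path":
            raise ValueError(f"'{param_name}' must have format 'file-path'")
        if "schema" not in prop:
            raise ValueError(f"'{param_name}' has no 'schema' key")
        return prop["schema"]
    raise KeyError(f"parameter '{param_name}' not found in the schema")


schema_path = find_samplesheet_schema(PARAMETERS_SCHEMA, "input")
print(schema_path)
```

Calling the helper with `"outdir"` raises a `ValueError` in this model, because that parameter is a `directory-path` and carries no `schema` key.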

The `.fromSamplesheet` channel factory also accepts the following optional arguments:

parameters_schema: File name of the pipeline parameters schema. (Default: nextflow_schema.json)
skip_duplicate_check: Skip the check for duplicate entries. The check can also be skipped with the --validationSkipDuplicateCheck parameter. (Default: false)

Channel.fromSamplesheet('input')

Channel.fromSamplesheet(
  'input',
  parameters_schema: 'custom_nextflow_schema.json',
  skip_duplicate_check: false
)
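To make the effect of `skip_duplicate_check` concrete, here is a small Python model of a duplicate check. It is a sketch under a stated assumption: a "duplicate" is taken to mean a row whose fields are all identical to an earlier row. The function names are hypothetical and the plugin's own check and error message may differ.

```python
import csv
import io

# Sample sheet with the first data row repeated at the end.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,strandedness
mysample1,input1_R1.fq.gz,input1_R2.fq.gz,forward
mysample2,input2_R1.fq.gz,input2_R2.fq.gz,forward
mysample1,input1_R1.fq.gz,input1_R2.fq.gz,forward
"""


def find_duplicate_rows(csv_text):
    """Return rows that repeat an earlier row exactly (assumed definition)."""
    seen = set()
    duplicates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row.items())
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


def check_samplesheet(csv_text, skip_duplicate_check=False):
    """Raise ValueError on duplicate rows unless the check is skipped."""
    if skip_duplicate_check:
        return
    duplicates = find_duplicate_rows(csv_text)
    if duplicates:
        raise ValueError(f"{len(duplicates)} duplicate row(s) found")


duplicates = find_duplicate_rows(SAMPLESHEET)
print([row["sample"] for row in duplicates])
```

With `skip_duplicate_check=True`, `check_samplesheet` returns without inspecting the rows; with the default it raises for this input.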

Basic example

In this example, we create a simple channel from a CSV samplesheet.

N E X T F L O W  ~  version 23.04.0
Launching `pipeline/main.nf` [distraught_marconi] DSL2 - revision: 74f697a0d9
[mysample1, input1_R1.fq.gz, input1_R2.fq.gz, forward]
[mysample2, input2_R1.fq.gz, input2_R2.fq.gz, forward]

Files used in this example, in the order shown below: main.nf, samplesheet.csv, nextflow.config, nextflow_schema.json, assets/schema_input.json

include { fromSamplesheet } from 'plugin/nf-validation'

ch_input = Channel.fromSamplesheet("input")

ch_input.view()

sample,fastq_1,fastq_2,strandedness
mysample1,input1_R1.fq.gz,input1_R2.fq.gz,forward
mysample2,input2_R1.fq.gz,input2_R2.fq.gz,forward

plugins {
  id 'nf-validation@0.2.1'
}

params {
  input = "samplesheet.csv"
  outdir = "results"
}

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://raw.githubusercontent.com/nf-core/testpipeline/master/nextflow_schema.json",
    "title": "nf-core/testpipeline pipeline parameters",
    "description": "this is a test",
    "type": "object",
    "definitions": {
        "input_output_options": {
            "title": "Input/output options",
            "type": "object",
            "fa_icon": "fas fa-terminal",
            "description": "Define where the pipeline should find input data and save output data.",
            "required": ["input", "outdir"],
            "properties": {
                "input": {
                    "type": "string",
                    "format": "file-path",
                    "mimetype": "text/csv",
                    "schema": "assets/schema_input.json",
                    "pattern": "^\\S+\\.(csv|tsv|yaml)$",
                    "description": "Path to comma-separated file containing information about the samples in the experiment.",
                    "help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See [usage docs](https://nf-co.re/testpipeline/usage#samplesheet-input).",
                    "fa_icon": "fas fa-file-csv"
                },
                "outdir": {
                    "type": "string",
                    "format": "directory-path",
                    "description": "The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.",
                    "fa_icon": "fas fa-folder-open"
                }
            }
        }
    },
    "allOf": [
        {
            "$ref": "#/definitions/input_output_options"
        }
    ]
}

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "https://raw.githubusercontent.com/nf-validation/example/master/assets/schema_input.json",
  "title": "nf-validation example - params.input schema",
  "description": "Schema for the file provided with params.input",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "sample": {
        "type": "string",
        "pattern": "^\\S+$",
        "errorMessage": "Sample name must be provided and cannot contain spaces"
      },
      "fastq_1": {
        "type": "string",
        "pattern": "^\\S+\\.f(ast)?q\\.gz$",
        "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
      },
      "fastq_2": {
        "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
        "anyOf": [
          {
            "type": "string",
            "pattern": "^\\S+\\.f(ast)?q\\.gz$"
          },
          {
            "type": "string",
            "maxLength": 0
          }
        ]
      },
      "strandedness": {
        "type": "string",
        "errorMessage": "Strandedness must be provided and be one of 'forward', 'reverse' or 'unstranded'",
        "enum": ["forward", "reverse", "unstranded"]
      }
    },
    "required": ["sample", "fastq_1", "strandedness"]
  }
}
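The rules in the sample sheet schema above can be read as per-row checks. The Python sketch below hand-codes those rules to show what they accept and reject. It is not a JSON Schema validator and not the plugin's implementation: `validate_row` is a hypothetical helper, and an empty CSV cell is modelled as an empty string.

```python
import re

# Same patterns and enum as in assets/schema_input.json above.
SAMPLE = re.compile(r"^\S+$")
FASTQ = re.compile(r"^\S+\.f(ast)?q\.gz$")
STRANDEDNESS = {"forward", "reverse", "unstranded"}


def validate_row(row):
    """Return a list of error messages for one samplesheet row (a dict)."""
    errors = []
    # 'sample' is required and must not contain whitespace.
    if not SAMPLE.search(row.get("sample", "")):
        errors.append("Sample name must be provided and cannot contain spaces")
    # 'fastq_1' is required and must end in .fq.gz or .fastq.gz.
    if not FASTQ.search(row.get("fastq_1", "")):
        errors.append("FastQ file for reads 1 is missing or has a bad name")
    # 'fastq_2' is optional: an empty value or a valid FastQ name passes.
    fastq_2 = row.get("fastq_2", "")
    if fastq_2 != "" and not FASTQ.search(fastq_2):
        errors.append("FastQ file for reads 2 has a bad name")
    # 'strandedness' is required and restricted to the enum values.
    if row.get("strandedness") not in STRANDEDNESS:
        errors.append("Strandedness must be forward, reverse or unstranded")
    return errors


good = {"sample": "mysample1", "fastq_1": "input1_R1.fq.gz",
        "fastq_2": "", "strandedness": "forward"}
bad = {"sample": "my sample", "fastq_1": "reads.txt",
       "fastq_2": "input1_R2.fq.gz", "strandedness": "sideways"}

print(validate_row(good))
print(len(validate_row(bad)))
```

The `good` row passes even though `fastq_2` is empty, which mirrors the `anyOf` with `maxLength: 0` in the schema; the `bad` row fails on the sample name, `fastq_1`, and `strandedness`.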

Order of fields

This example demonstrates that the order of columns in the sample sheet file has no effect on the order of items in the channel.

Danger

It is the order of fields in the sample sheet JSON schema which defines the order of items in the channel returned by fromSamplesheet(), not the order of fields in the CSV file.

N E X T F L O W  ~  version 23.04.0
Launching `pipeline/main.nf` [elated_kowalevski] DSL2 - revision: 74f697a0d9
[forward, mysample1, input1_R2.fq.gz, input1_R1.fq.gz]
[forward, mysample2, input2_R2.fq.gz, input2_R1.fq.gz]
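The ordering rule can be illustrated with a short Python model. This is a sketch of the behaviour described above, not the plugin's code: the function name `rows_in_schema_order` is hypothetical, and the schema is reduced to the property names, since only their order matters here.

```python
import csv
import io
import json

# Columns appear as sample, fastq_1, fastq_2, strandedness in the CSV file.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,strandedness
mysample1,input1_R1.fq.gz,input1_R2.fq.gz,forward
mysample2,input2_R1.fq.gz,input2_R2.fq.gz,forward
"""

# Properties appear in a different order in the schema.
SCHEMA = json.loads("""
{
  "items": {
    "properties": {
      "strandedness": {},
      "sample": {},
      "fastq_2": {},
      "fastq_1": {}
    }
  }
}
""")


def rows_in_schema_order(csv_text, schema):
    """Emit one list per row, with values ordered by the schema properties."""
    # json.loads keeps the key order of the JSON document.
    field_order = list(schema["items"]["properties"])
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[row[field] for field in field_order] for row in reader]


items = rows_in_schema_order(SAMPLESHEET, SCHEMA)
for item in items:
    print(item)
```

Each printed list starts with the strandedness and ends with `fastq_1`, matching the order of the entries in the channel output shown above.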

Files used in this example, in the order shown below: samplesheet.csv, assets/schema_input.json, main.nf, nextflow.config, nextflow_schema.json

sample,fastq_1,fastq_2,strandedness
mysample1,input1_R1.fq.gz,input1_R2.fq.gz,forward
mysample2,input2_R1.fq.gz,input2_R2.fq.gz,forward

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "https://raw.githubusercontent.com/nf-validation/example/master/assets/schema_input.json",
  "title": "nf-validation example - params.input schema",
  "description": "Schema for the file provided with params.input",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "strandedness": {
        "type": "string",
        "errorMessage": "Strandedness must be provided and be one of 'forward', 'reverse' or 'unstranded'",
        "enum": ["forward", "reverse", "unstranded"]
      },
      "sample": {
        "type": "string",
        "pattern": "^\\S+$",
        "errorMessage": "Sample name must be provided and cannot contain spaces"
      },
      "fastq_2": {
        "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
        "anyOf": [
          {
            "type": "string",
            "pattern": "^\\S+\\.f(ast)?q\\.gz$"
          },
          {
            "type": "string",
            "maxLength": 0
          }
        ]
      },
      "fastq_1": {
        "type": "string",
        "pattern": "^\\S+\\.f(ast)?q\\.gz$",
        "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
      }
    },
    "required": ["sample", "fastq_1", "strandedness"]
  }
}

include { fromSamplesheet } from 'plugin/nf-validation'

ch_input = Channel.fromSamplesheet("input")

ch_input.view()

plugins {
  id 'nf-validation@0.2.1'
}

params {
  input = "samplesheet.csv"
  outdir = "results"
}

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://raw.githubusercontent.com/nf-core/testpipeline/master/nextflow_schema.json",
    "title": "nf-core/testpipeline pipeline parameters",
    "description": "this is a test",
    "type": "object",
    "definitions": {
        "input_output_options": {
            "title": "Input/output options",
            "type": "object",
            "fa_icon": "fas fa-terminal",
            "description": "Define where the pipeline should find input data and save output data.",
            "required": ["input", "outdir"],
            "properties": {
                "input": {
                    "type": "string",
                    "format": "file-path",
                    "mimetype": "text/csv",
                    "schema": "assets/schema_input.json",
                    "pattern": "^\\S+\\.(csv|tsv|yaml)$",
                    "description": "Path to comma-separated file containing information about the samples in the experiment.",
                    "help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See [usage docs](https://nf-co.re/testpipeline/usage#samplesheet-input).",
                    "fa_icon": "fas fa-file-csv"
                },
                "outdir": {
                    "type": "string",
                    "format": "directory-path",
                    "description": "The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.",
                    "fa_icon": "fas fa-folder-open"
                }
            }
        }
    },
    "allOf": [
        {
            "$ref": "#/definitions/input_output_options"
        }
    ]
}

Channel with meta map

In this example, we use the `meta` key in the schema to mark two columns as meta fields. The returned channel then carries a meta map as the first item of each entry, followed by the remaining fields.

N E X T F L O W  ~  version 23.04.0
Launching `pipeline/main.nf` [romantic_kare] DSL2 - revision: 74f697a0d9
[[my_sample_id:mysample1, my_strandedness:forward], input1_R1.fq.gz, input1_R2.fq.gz]
[[my_sample_id:mysample2, my_strandedness:forward], input2_R1.fq.gz, input2_R2.fq.gz]
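How a row turns into an entry with a meta map can be modelled in Python as follows. This is a sketch inferred from the output above, not the plugin's implementation: `row_to_channel_item` is a hypothetical name, and the schema is reduced to the `meta` keys that drive the behaviour.

```python
import json

# Reduced schema: 'sample' and 'strandedness' are marked as meta fields.
SCHEMA = json.loads("""
{
  "items": {
    "properties": {
      "sample": {"meta": ["my_sample_id"]},
      "fastq_1": {},
      "fastq_2": {},
      "strandedness": {"meta": ["my_strandedness"]}
    }
  }
}
""")


def row_to_channel_item(row, schema):
    """Collect meta fields into one map, then append the other fields."""
    meta = {}
    values = []
    for field, spec in schema["items"]["properties"].items():
        if "meta" in spec:
            # Each name listed under 'meta' becomes a key in the meta map.
            for key in spec["meta"]:
                meta[key] = row[field]
        else:
            values.append(row[field])
    return [meta] + values


row = {
    "sample": "mysample1",
    "fastq_1": "input1_R1.fq.gz",
    "fastq_2": "input1_R2.fq.gz",
    "strandedness": "forward",
}
item = row_to_channel_item(row, SCHEMA)
print(item)
```

The model puts the meta map first and keeps the non-meta fields in schema order, which is the shape of the entries shown in the output above.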

Files used in this example, in the order shown below: assets/schema_input.json, main.nf, samplesheet.csv, nextflow.config, nextflow_schema.json

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "https://raw.githubusercontent.com/nf-validation/example/master/assets/schema_input.json",
  "title": "nf-validation example - params.input schema",
  "description": "Schema for the file provided with params.input",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "sample": {
        "type": "string",
        "pattern": "^\\S+$",
        "errorMessage": "Sample name must be provided and cannot contain spaces",
        "meta": ["my_sample_id"]
      },
      "fastq_1": {
        "type": "string",
        "pattern": "^\\S+\\.f(ast)?q\\.gz$",
        "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
      },
      "fastq_2": {
        "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
        "anyOf": [
          {
            "type": "string",
            "pattern": "^\\S+\\.f(ast)?q\\.gz$"
          },
          {
            "type": "string",
            "maxLength": 0
          }
        ]
      },
      "strandedness": {
        "type": "string",
        "errorMessage": "Strandedness must be provided and be one of 'forward', 'reverse' or 'unstranded'",
        "enum": ["forward", "reverse", "unstranded"],
        "meta": ["my_strandedness"]
      }
    },
    "required": ["sample", "fastq_1", "strandedness"]
  }
}

include { fromSamplesheet } from 'plugin/nf-validation'

ch_input = Channel.fromSamplesheet("input")

ch_input.view()

sample,fastq_1,fastq_2,strandedness
mysample1,input1_R1.fq.gz,input1_R2.fq.gz,forward
mysample2,input2_R1.fq.gz,input2_R2.fq.gz,forward

plugins {
  id 'nf-validation@0.2.1'
}

params {
  input = "samplesheet.csv"
  outdir = "results"
}

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://raw.githubusercontent.com/nf-core/testpipeline/master/nextflow_schema.json",
    "title": "nf-core/testpipeline pipeline parameters",
    "description": "this is a test",
    "type": "object",
    "definitions": {
        "input_output_options": {
            "title": "Input/output options",
            "type": "object",
            "fa_icon": "fas fa-terminal",
            "description": "Define where the pipeline should find input data and save output data.",
            "required": ["input", "outdir"],
            "properties": {
                "input": {
                    "type": "string",
                    "format": "file-path",
                    "mimetype": "text/csv",
                    "schema": "assets/schema_input.json",
                    "pattern": "^\\S+\\.(csv|tsv|yaml)$",
                    "description": "Path to comma-separated file containing information about the samples in the experiment.",
                    "help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See [usage docs](https://nf-co.re/testpipeline/usage#samplesheet-input).",
                    "fa_icon": "fas fa-file-csv"
                },
                "outdir": {
                    "type": "string",
                    "format": "directory-path",
                    "description": "The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.",
                    "fa_icon": "fas fa-folder-open"
                }
            }
        }
    },
    "allOf": [
        {
            "$ref": "#/definitions/input_output_options"
        }
    ]
}