Create a list from a sample sheet

`samplesheetToList()`

This function validates and converts a sample sheet to a Groovy list. This is done using information encoded within a sample sheet schema (see the docs).

The function has two required arguments:

1. The path to the samplesheet.
2. The path to the JSON schema file corresponding to the samplesheet.

Each argument can be given either as a string containing the relative path (from the root of the pipeline) or as a file object.

samplesheetToList("path/to/samplesheet", "path/to/json/schema")

Note

All values in CSV and TSV samplesheets are converted to their derived type: for example, "true" becomes the Boolean true and "2" becomes the Integer 2. If this is not the behaviour you want, you can convert a value back to a String with .map { val -> val.toString() }
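Because each samplesheet entry is usually a list of values, the conversion back to String typically has to be applied to every value in the entry. The following is a minimal sketch in plain Groovy, assuming hypothetical rows that stand in for the output of samplesheetToList(); in a pipeline, the inner closure would be passed to the channel's .map operator instead.

```groovy
// Hypothetical rows, shaped like samplesheet entries after type conversion
// (these values are made up for illustration and do not come from the plugin)
def rows = [
    ["sample1", 2, true],
    ["sample2", 10, false]
]

// Convert every value of every row back to a String
def stringRows = rows.collect { row -> row.collect { val -> val.toString() } }

assert stringRows == [["sample1", "2", "true"], ["sample2", "10", "false"]]
```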

This function can be used together with existing channel factories/operators to create one channel entry per samplesheet entry.

Use as a channel factory

The function can be used with the .fromList channel factory to generate a queue channel:

Channel.fromList(samplesheetToList("path/to/samplesheet", "path/to/json/schema"))

Note

This mimics the fromSamplesheet channel factory of the earlier nf-validation plugin.

Use as a channel operator

Alternatively, the function can be used with the .flatMap channel operator to create a channel from samplesheet paths that are already in a channel:

Channel.of("path/to/samplesheet").flatMap { samplesheetToList(it, "path/to/json/schema") }
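The reason for .flatMap rather than .map is that samplesheetToList() returns a list of entries: .map would emit one item holding the whole list, while .flatMap emits each entry separately. The difference can be illustrated with plain Groovy collections, using collect and collectMany as stand-ins for the two operators. Note that parseSheet is a hypothetical stub written for this sketch, not the plugin function.

```groovy
// Hypothetical stub standing in for samplesheetToList(): two entries per sheet
def parseSheet = { String path -> [[path, "row1"], [path, "row2"]] }

def sheets = ["a.csv", "b.csv"]

// Analogue of .map: one item per sheet, each holding that sheet's whole list
def mapped = sheets.collect { parseSheet(it) }
assert mapped.size() == 2

// Analogue of .flatMap: one item per samplesheet entry
def flattened = sheets.collectMany { parseSheet(it) }
assert flattened.size() == 4
assert flattened[0] == ["a.csv", "row1"]
```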

Basic example

In this example, we create a simple channel from a CSV sample sheet.

N E X T F L O W  ~  version 23.04.0
Launching `pipeline/main.nf` [distraught_marconi] DSL2 - revision: 74f697a0d9
[mysample1, input1_R1.fq.gz, input1_R2.fq.gz, forward]
[mysample2, input2_R1.fq.gz, input2_R2.fq.gz, forward]

Files used in this example, in the order shown below: main.nf, samplesheet.csv, nextflow.config, assets/schema_input.json

include { samplesheetToList } from 'plugin/nf-schema'

ch_input = Channel.fromList(samplesheetToList(params.input, "assets/schema_input.json"))

ch_input.view()

sample,fastq_1,fastq_2,strandedness
mysample1,input1_R1.fq.gz,input1_R2.fq.gz,forward
mysample2,input2_R1.fq.gz,input2_R2.fq.gz,forward

plugins {
  id 'nf-schema@2.0.0'
}

params {
  input = "samplesheet.csv"
  output = "results"
}

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://raw.githubusercontent.com/nf-schema/example/master/assets/schema_input.json",
  "title": "nf-schema example - params.input schema",
  "description": "Schema for the file provided with params.input",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "sample": {
        "type": "string",
        "pattern": "^\\S+$",
        "errorMessage": "Sample name must be provided and cannot contain spaces"
      },
      "fastq_1": {
        "type": "string",
        "pattern": "^\\S+\\.f(ast)?q\\.gz$",
        "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
      },
      "fastq_2": {
        "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
        "type": "string",
        "pattern": "^\\S+\\.f(ast)?q\\.gz$"
      },
      "strandedness": {
        "type": "string",
        "errorMessage": "Strandedness must be provided and be one of 'forward', 'reverse' or 'unstranded'",
        "enum": ["forward", "reverse", "unstranded"]
      }
    },
    "required": ["sample", "fastq_1", "strandedness"]
  }
}
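Downstream, each entry of the basic example can be unpacked by naming its fields as closure parameters. The sketch below uses plain Groovy lists that copy the two entries printed in the basic example; the variable names entries and grouped are chosen for illustration only. In a pipeline, the same closure could be passed to ch_input.map.

```groovy
// The two entries printed in the basic example, written out as plain lists
def entries = [
    ["mysample1", "input1_R1.fq.gz", "input1_R2.fq.gz", "forward"],
    ["mysample2", "input2_R1.fq.gz", "input2_R2.fq.gz", "forward"]
]

// Name the fields as closure parameters and group the two read files together
def grouped = entries.collect { sample, fastq_1, fastq_2, strandedness ->
    [sample, [fastq_1, fastq_2]]
}

assert grouped[0] == ["mysample1", ["input1_R1.fq.gz", "input1_R2.fq.gz"]]
```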

Order of fields

This example demonstrates that the order of columns in the sample sheet file has no effect on the order of items in the resulting channel.

Danger

It is the order of fields in the sample sheet JSON schema which defines the order of items in the channel returned by samplesheetToList(), not the order of fields in the sample sheet file.
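The ordering rule can be sketched in plain Groovy. This is an illustration of the behaviour described above, not the plugin's implementation: schemaFields and row are hypothetical stand-ins for the field order declared in the schema of this example and for one row of its sample sheet.

```groovy
// Field order as declared under "properties" in this example's schema
def schemaFields = ["strandedness", "sample", "fastq_2", "fastq_1"]

// One sample sheet row keyed by column header, in the file's column order
def row = [
    sample      : "mysample1",
    fastq_1     : "input1_R1.fq.gz",
    fastq_2     : "input1_R2.fq.gz",
    strandedness: "forward"
]

// The entry follows the schema order, whatever the column order in the file
def ordered = schemaFields.collect { field -> row[field] }

assert ordered == ["forward", "mysample1", "input1_R2.fq.gz", "input1_R1.fq.gz"]
```

The resulting list matches the first entry printed in this example's output.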

N E X T F L O W  ~  version 23.04.0
Launching `pipeline/main.nf` [elated_kowalevski] DSL2 - revision: 74f697a0d9
[forward, mysample1, input1_R2.fq.gz, input1_R1.fq.gz]
[forward, mysample2, input2_R2.fq.gz, input2_R1.fq.gz]

Files used in this example, in the order shown below: samplesheet.csv, assets/schema_input.json, main.nf, nextflow.config

sample,fastq_1,fastq_2,strandedness
mysample1,input1_R1.fq.gz,input1_R2.fq.gz,forward
mysample2,input2_R1.fq.gz,input2_R2.fq.gz,forward

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://raw.githubusercontent.com/nf-schema/example/master/assets/schema_input.json",
  "title": "nf-schema example - params.input schema",
  "description": "Schema for the file provided with params.input",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "strandedness": {
        "type": "string",
        "errorMessage": "Strandedness must be provided and be one of 'forward', 'reverse' or 'unstranded'",
        "enum": ["forward", "reverse", "unstranded"]
      },
      "sample": {
        "type": "string",
        "pattern": "^\\S+$",
        "errorMessage": "Sample name must be provided and cannot contain spaces"
      },
      "fastq_2": {
        "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
        "type": "string",
        "pattern": "^\\S+\\.f(ast)?q\\.gz$"
      },
      "fastq_1": {
        "type": "string",
        "pattern": "^\\S+\\.f(ast)?q\\.gz$",
        "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
      }
    },
    "required": ["sample", "fastq_1", "strandedness"]
  }
}

include { samplesheetToList } from 'plugin/nf-schema'

ch_input = Channel.fromList(samplesheetToList(params.input, "assets/schema_input.json"))

ch_input.view()

plugins {
  id 'nf-schema@2.0.0'
}

params {
  input = "samplesheet.csv"
  output = "results"
}

Channel with meta map

In this example, we use the schema to mark two columns as meta fields. This returns a channel with a meta map.
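How the "meta" keys shape each entry can be sketched in plain Groovy. This is a simplified illustration under stated assumptions, not the plugin's implementation: schemaMeta is a hypothetical reduction of the schema to "field name to meta keys", and row is one row of the sample sheet keyed by column header.

```groovy
// Hypothetical reduction of the schema: field name -> meta keys
// (an empty list means the field is not a meta field)
def schemaMeta = [
    sample      : ["my_sample_id"],
    fastq_1     : [],
    fastq_2     : [],
    strandedness: ["my_strandedness"]
]

def row = [
    sample      : "mysample1",
    fastq_1     : "input1_R1.fq.gz",
    fastq_2     : "input1_R2.fq.gz",
    strandedness: "forward"
]

def meta = [:]    // collects the values of fields marked as meta
def values = []   // collects the values of all other fields, in schema order

schemaMeta.each { field, metaKeys ->
    if (metaKeys) {
        metaKeys.each { key -> meta[key] = row[field] }
    } else {
        values << row[field]
    }
}

// The meta map comes first, followed by the remaining values
def entry = [meta] + values

assert entry == [[my_sample_id: "mysample1", my_strandedness: "forward"],
                 "input1_R1.fq.gz", "input1_R2.fq.gz"]
```

In a pipeline, the meta map can then be read by key, for example meta.my_sample_id inside a .map closure that names meta as its first parameter.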

N E X T F L O W  ~  version 23.04.0
Launching `pipeline/main.nf` [romantic_kare] DSL2 - revision: 74f697a0d9
[[my_sample_id:mysample1, my_strandedness:forward], input1_R1.fq.gz, input1_R2.fq.gz]
[[my_sample_id:mysample2, my_strandedness:forward], input2_R1.fq.gz, input2_R2.fq.gz]

Files used in this example, in the order shown below: assets/schema_input.json, main.nf, samplesheet.csv, nextflow.config

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://raw.githubusercontent.com/nf-schema/example/master/assets/schema_input.json",
  "title": "nf-schema example - params.input schema",
  "description": "Schema for the file provided with params.input",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "sample": {
        "type": "string",
        "pattern": "^\\S+$",
        "errorMessage": "Sample name must be provided and cannot contain spaces",
        "meta": ["my_sample_id"]
      },
      "fastq_1": {
        "type": "string",
        "pattern": "^\\S+\\.f(ast)?q\\.gz$",
        "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
      },
      "fastq_2": {
        "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
        "type": "string",
        "pattern": "^\\S+\\.f(ast)?q\\.gz$"
      },
      "strandedness": {
        "type": "string",
        "errorMessage": "Strandedness must be provided and be one of 'forward', 'reverse' or 'unstranded'",
        "enum": ["forward", "reverse", "unstranded"],
        "meta": ["my_strandedness"]
      }
    },
    "required": ["sample", "fastq_1", "strandedness"]
  }
}

include { samplesheetToList } from 'plugin/nf-schema'

ch_input = Channel.fromList(samplesheetToList(params.input, "assets/schema_input.json"))

ch_input.view()

sample,fastq_1,fastq_2,strandedness
mysample1,input1_R1.fq.gz,input1_R2.fq.gz,forward
mysample2,input2_R1.fq.gz,input2_R2.fq.gz,forward

plugins {
  id 'nf-schema@2.0.0'
}

params {
  input = "samplesheet.csv"
  output = "results"
}