Create a list from a sample sheet
samplesheetToList()
This function validates and converts a sample sheet to a Groovy list. This is done using information encoded within a sample sheet schema (see the docs).
The function has two required arguments:
- The path to the samplesheet
- The path to the JSON schema file corresponding to the samplesheet.
These can be either a string with the relative path (from the root of the pipeline) or a file object of the schema.
Note
All data points in the CSV and TSV samplesheets will be converted to their derived type (e.g. "true" will be converted to the Boolean true and "2" will be converted to the Integer 2). If this is not the expected behaviour, you can still convert these values back to a String with .map { val -> val.toString() }
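As a rough sketch of that conversion, the snippet below turns every value of every row back into a String. The paths samplesheet.csv and assets/schema_input.json are placeholders, and stringifying all fields with collect is only one possible approach; it assumes flat rows without a meta map.

```groovy
include { samplesheetToList } from 'plugin/nf-schema'

workflow {
    // Placeholder paths, not taken from a real pipeline
    Channel
        .fromList(samplesheetToList("samplesheet.csv", "assets/schema_input.json"))
        // Each row is a list of typed values; convert every value back to a String.
        // Assumes flat rows: a meta map would be stringified as a whole.
        .map { row -> row.collect { val -> val.toString() } }
        .view()
}
```

To convert only selected fields, name them in the closure instead and call toString() on just those values.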
This function can be used together with existing channel factories/operators to create one channel entry per samplesheet entry.
Use as a channel factory
The function can be used with the .fromList channel factory to generate a queue channel.
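A minimal sketch of this usage, assuming the nf-schema plugin is enabled for the pipeline. Both paths are placeholders and ch_samples is an illustrative name.

```groovy
include { samplesheetToList } from 'plugin/nf-schema'

workflow {
    // samplesheetToList() returns a list; fromList emits one item per samplesheet row
    ch_samples = Channel.fromList(
        samplesheetToList("path/to/samplesheet", "path/to/json/schema")
    )
    ch_samples.view()
}
```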
Note
This mimics the fromSamplesheet channel factory from the earlier nf-validation plugin.
Use as a channel operator
Alternatively, the function can be used with the .flatMap channel operator to create a channel from samplesheet paths that are already in a channel.
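A minimal sketch of the operator form, again with placeholder paths. Channel.of stands in here for whatever upstream channel already carries the samplesheet paths.

```groovy
include { samplesheetToList } from 'plugin/nf-schema'

workflow {
    Channel
        // Stand-in for an existing channel of samplesheet paths
        .of("path/to/samplesheet")
        // flatMap emits each row of the returned list as a separate channel item
        .flatMap { samplesheet -> samplesheetToList(samplesheet, "path/to/json/schema") }
        .view()
}
```

Using flatMap rather than map matters here: map would emit the whole list as a single item, whereas flatMap emits one item per samplesheet entry.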
Basic example
In this example, we create a simple channel from a CSV sample sheet.
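A pipeline script along the following lines should print rows like those shown in this example. It is a sketch: the params.input parameter name, the schema location assets/schema_input.json (taken from the schema's $id), and the CSV contents in the comment (reconstructed from the printed rows) are all assumptions.

```groovy
include { samplesheetToList } from 'plugin/nf-schema'

// Assumed contents of the CSV passed as --input:
//   sample,fastq_1,fastq_2,strandedness
//   mysample1,input1_R1.fq.gz,input1_R2.fq.gz,forward
//   mysample2,input2_R1.fq.gz,input2_R2.fq.gz,forward

workflow {
    Channel
        // Validate the sample sheet against the schema and emit one item per row
        .fromList(samplesheetToList(params.input, "assets/schema_input.json"))
        .view()
}
```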
N E X T F L O W ~ version 23.04.0
Launching `pipeline/main.nf` [distraught_marconi] DSL2 - revision: 74f697a0d9
[mysample1, input1_R1.fq.gz, input1_R2.fq.gz, forward]
[mysample2, input2_R1.fq.gz, input2_R2.fq.gz, forward]
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://raw.githubusercontent.com/nf-schema/example/master/assets/schema_input.json",
    "title": "nf-schema example - params.input schema",
    "description": "Schema for the file provided with params.input",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "sample": {
                "type": "string",
                "pattern": "^\\S+$",
                "errorMessage": "Sample name must be provided and cannot contain spaces"
            },
            "fastq_1": {
                "type": "string",
                "pattern": "^\\S+\\.f(ast)?q\\.gz$",
                "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
            },
            "fastq_2": {
                "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
                "type": "string",
                "pattern": "^\\S+\\.f(ast)?q\\.gz$"
            },
            "strandedness": {
                "type": "string",
                "errorMessage": "Strandedness must be provided and be one of 'forward', 'reverse' or 'unstranded'",
                "enum": ["forward", "reverse", "unstranded"]
            }
        },
        "required": ["sample", "fastq_1", "strandedness"]
    }
}
Order of fields
This example demonstrates that the order of columns in the sample sheet file has no effect.
Danger
It is the order of fields in the sample sheet JSON schema which defines the order of items in the channel returned by samplesheetToList(), not the order of fields in the sample sheet file.
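One practical consequence is that closures destructuring a row must list their parameters in schema order. The sketch below assumes the schema used in this example (strandedness, sample, fastq_2, fastq_1) and a hypothetical params.input parameter, and rearranges each row into a more conventional order.

```groovy
include { samplesheetToList } from 'plugin/nf-schema'

workflow {
    Channel
        .fromList(samplesheetToList(params.input, "assets/schema_input.json"))
        // Parameters follow the schema's property order, not the CSV column order
        .map { strandedness, sample, fastq_2, fastq_1 ->
            [sample, fastq_1, fastq_2, strandedness]
        }
        .view()
}
```

Reordering the properties in the schema would silently change which value lands in which closure parameter, so it is worth keeping the two in sync.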
N E X T F L O W ~ version 23.04.0
Launching `pipeline/main.nf` [elated_kowalevski] DSL2 - revision: 74f697a0d9
[forward, mysample1, input1_R2.fq.gz, input1_R1.fq.gz]
[forward, mysample2, input2_R2.fq.gz, input2_R1.fq.gz]
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://raw.githubusercontent.com/nf-schema/example/master/assets/schema_input.json",
    "title": "nf-schema example - params.input schema",
    "description": "Schema for the file provided with params.input",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "strandedness": {
                "type": "string",
                "errorMessage": "Strandedness must be provided and be one of 'forward', 'reverse' or 'unstranded'",
                "enum": ["forward", "reverse", "unstranded"]
            },
            "sample": {
                "type": "string",
                "pattern": "^\\S+$",
                "errorMessage": "Sample name must be provided and cannot contain spaces"
            },
            "fastq_2": {
                "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
                "type": "string",
                "pattern": "^\\S+\\.f(ast)?q\\.gz$"
            },
            "fastq_1": {
                "type": "string",
                "pattern": "^\\S+\\.f(ast)?q\\.gz$",
                "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
            }
        },
        "required": ["sample", "fastq_1", "strandedness"]
    }
}
Channel with meta map
In this example, we use the schema to mark two columns as meta fields. This returns a channel with a meta map.
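Downstream code can then read the meta fields by key. The sketch below assumes the row layout printed in this example (the meta map first, then fastq_1 and fastq_2) and a hypothetical params.input parameter; it formats each row as a short description.

```groovy
include { samplesheetToList } from 'plugin/nf-schema'

workflow {
    Channel
        .fromList(samplesheetToList(params.input, "assets/schema_input.json"))
        // meta holds the fields marked with "meta" in the schema, keyed by the names given there
        .map { meta, fastq_1, fastq_2 ->
            "${meta.my_sample_id} (${meta.my_strandedness}): ${fastq_1}, ${fastq_2}"
        }
        .view()
}
```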
N E X T F L O W ~ version 23.04.0
Launching `pipeline/main.nf` [romantic_kare] DSL2 - revision: 74f697a0d9
[[my_sample_id:mysample1, my_strandedness:forward], input1_R1.fq.gz, input1_R2.fq.gz]
[[my_sample_id:mysample2, my_strandedness:forward], input2_R1.fq.gz, input2_R2.fq.gz]
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://raw.githubusercontent.com/nf-schema/example/master/assets/schema_input.json",
    "title": "nf-schema example - params.input schema",
    "description": "Schema for the file provided with params.input",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "sample": {
                "type": "string",
                "pattern": "^\\S+$",
                "errorMessage": "Sample name must be provided and cannot contain spaces",
                "meta": ["my_sample_id"]
            },
            "fastq_1": {
                "type": "string",
                "pattern": "^\\S+\\.f(ast)?q\\.gz$",
                "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
            },
            "fastq_2": {
                "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
                "type": "string",
                "pattern": "^\\S+\\.f(ast)?q\\.gz$"
            },
            "strandedness": {
                "type": "string",
                "errorMessage": "Strandedness must be provided and be one of 'forward', 'reverse' or 'unstranded'",
                "enum": ["forward", "reverse", "unstranded"],
                "meta": ["my_strandedness"]
            }
        },
        "required": ["sample", "fastq_1", "strandedness"]
    }
}