Sample sheet schema specification
Sample sheet schema files are used by the nf-validation plugin for validation of sample sheet contents and type conversion / channel generation.
The Nextflow schema syntax is based on the JSON Schema standard. You can find more information about JSON Schema here:
- Official docs: https://json-schema.org
- Excellent "Understanding JSON Schema" docs: https://json-schema.org/understanding-json-schema
Schema structure
The plugin validates a file by parsing its contents into a Groovy object and passing that object to the JSON Schema validation library. As such, the structure of the schema must match the structure of the parsed file.
Typically, sample sheets are CSV files, with fields represented as columns and samples as rows. TSV and simple unnested YAML files are also supported by the plugin.
Warning
Nested YAML files can be validated with the validateParameters() function, but cannot be converted to a channel with .fromSamplesheet().
For CSV files, the parsed object will be an array (see JSON schema docs).
The array type is associated with an items key, which in our case contains a single object.
The object has properties, where the keys must match the headers of the CSV file.
So, for CSV sample sheets, the top-level schema should look something like this:
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "field_1": { "type": "string" },
      "field_2": { "type": "string" }
    }
  }
}
If your sample sheet has a different format (for example, a simple YAML file), you will need to build your schema to match the parsed structure.
Properties
Every object in the array will contain a key for each field.
Each field should be described as an element in the object's properties section.
The key of each property must match the header text used in the sample sheet.
Fields that are present in the sample sheet but not in the schema will be ignored and produce a warning.
Tip
The order of columns in the sample sheet is not relevant, as long as the header text matches.
Warning
The order of properties in the schema is important.
This order defines the order of output channel properties when using the fromSamplesheet channel factory.
Common keys
The majority of schema keys for sample sheet schema validation are identical to the Nextflow schema.
For example: type, pattern, format, errorMessage, exists and so on.
Please refer to the Nextflow schema specification docs for details.
Tip
Sample sheets are commonly used to define input file paths.
Be sure to set "type": "string" and "format": "file-path" for these properties, so that nf-validation correctly returns this sample sheet field as a Nextflow.file object.
When using the file-path-pattern format for a globbing pattern, a list will be created with all files found by the globbing pattern. See here for more information.
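As a minimal sketch of the tip above, the schema below declares one file-path property and one file-path-pattern property. The field names fastq_1 and bam_files are hypothetical and chosen only for illustration; the type and format values are the ones named in the tip.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$comment": "Illustrative sketch: the field names fastq_1 and bam_files are hypothetical",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "fastq_1": {
        "type": "string",
        "format": "file-path"
      },
      "bam_files": {
        "type": "string",
        "format": "file-path-pattern"
      }
    }
  }
}
```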
Sample sheet keys
Below are the keys that are specific to sample sheet schemas. These exist in addition to those described in the Nextflow schema specification.
meta
Type: List
The current field will be considered a meta value when this parameter is present. This parameter should contain a list of the meta fields to assign this value to. The default is no meta for each field.
For example, setting meta to the list ["id", "sample"] on a property called field will convert the field value to a meta value, resulting in the channel [[id:value, sample:value]...]
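A schema for the meta example could look like the sketch below. It is a reconstruction based on the channel output shown above, so it assumes the property is called field and that the meta fields are id and sample.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$comment": "Illustrative sketch: meta field names inferred from the channel output [[id:value, sample:value]...]",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "field": {
        "type": "string",
        "meta": ["id", "sample"]
      }
    }
  }
}
```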
See here for an example in the sample sheet.
unique
Type: Boolean or List
Whether or not the field should contain a unique value over the entire sample sheet.
Default: false
- Can be true, in which case the value for this field should be unique for all samples in the sample sheet.
- Can be a list of other field names that should be unique in combination with the current field.
Example
Consider a schema in which field1 sets unique to true and field2 sets unique to a list containing field1.
In this example, field1 needs to be unique, and field2 needs to be unique in combination with field1.
For a sample sheet in which the row value1,value2 appears more than once, both checks will fail:
- field1 isn't unique, since value1 has been found more than once.
- field2 isn't unique in combination with field1, because the value1,value2 combination has been found more than once.
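The unique example can be sketched as the schema below. It is a reconstruction from the description, using the field names field1 and field2 from the text; the string types are an assumption.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$comment": "Illustrative sketch: the string types are assumed, the unique values follow the text",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "field1": {
        "type": "string",
        "unique": true
      },
      "field2": {
        "type": "string",
        "unique": ["field1"]
      }
    }
  }
}
```

With this schema, a sample sheet that contains the row value1,value2 twice would be expected to fail both uniqueness checks.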
See schema_input.json#L48-L55 for an example in one of the plugin test-fixture sample sheets.
deprecated
Type: Boolean
A boolean stating that the field is deprecated and will be removed in the near future. When set, a warning tells the user that the current field is deprecated. The default value is false.
Example: setting deprecated to true on a property called field will show a warning stating that the use of field is deprecated:
The 'field' field is deprecated and
will no longer be used in the future.
Please check the official documentation
of the pipeline for more information.
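A schema that would be expected to trigger this warning could look like the sketch below. It is a reconstruction: the property name field comes from the warning text, and the string type is an assumption.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$comment": "Illustrative sketch: the string type is assumed",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "field": {
        "type": "string",
        "deprecated": true
      }
    }
  }
}
```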
dependentRequired
Type: List (see JSON Schema docs)
A list containing names of other fields. The validator will check if these fields are filled in and throw an error if they aren't, but only when the field that dependentRequired belongs to is itself filled in.
Example
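Setting dependentRequired to a list containing field2 on field1 will check if field2 is given when field1 has a value. So, for a sample sheet with three rows, where the first sets both fields, the second sets only field1, and the third leaves field1 empty:
- The first row will pass the check because both fields are set.
- The second row will fail because field1 is set but field2 isn't, and field1 is dependent on field2.
- The third row will pass the check because field1 isn't set.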
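The dependentRequired example can be sketched as the schema below. It is a reconstruction from the description, using the field names field1 and field2 from the text; the string types are an assumption.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$comment": "Illustrative sketch: the string types are assumed",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "field1": {
        "type": "string",
        "dependentRequired": ["field2"]
      },
      "field2": {
        "type": "string"
      }
    }
  }
}
```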
See here for an example in the sample sheet.