Configuration
Most of the basic configuration lives in a `nomad` closure in the `nextflow.config` file, for example:
```groovy
nomad {
    client {
        address = "http://localhost:4646"
        token = "YOUR_NOMAD_TOKEN"
        connectionTimeout = 6000
        readTimeout = 6000
        writeTimeout = 6000
        pollInterval = '1s'
        submitThrottle = '100ms'
        retryConfig = {
            delay = 500
            maxDelay = 90
            maxAttempts = 10
            jitter = 0.25
        }
    }
    jobs {
        namespace = 'nf-nomad'
        deleteOnCompletion = false
        cleanup = 'onSuccess' // always | never | onSuccess
        privileged = true
        cpuMode = 'cores' // or 'cpu'
        acceleratorAutoDevice = true // map Nextflow accelerator directive to Nomad resources.device
        acceleratorDeviceName = 'nvidia/gpu'
        volumes = [
            { type "host" name "scratchdir" },
            { type "csi" name "nextflow-fs-volume" },
            { type "csi" name "nextflow-fs-volume" path "/var/data" readOnly true }
        ]
        constraints = {
            node {
                unique = [name: 'nomad01']
            }
        }
        spreads = {
            spread = [name: 'node.datacenter', weight: 50]
        }
        secrets = [enabled: true]
        // Fail jobs that cannot be placed due to insufficient resources
        failOnPlacementFailure = true
        placementFailureTimeout = '2m' // wait 2 minutes before failing
    }
}
```
Client configuration

- `address`: The URL of the Nomad server node.
- `token`: If the cluster is protected, you must provide a token.
- `connectionTimeout`: The maximum time to wait before giving up on establishing a connection with the cluster (default `6000ms`).
- `readTimeout`: The maximum time to wait before indicating an inability to read from the connection (default `6000ms`).
- `writeTimeout`: The maximum time to wait before indicating an inability to write to the connection (default `6000ms`).
- `pollInterval`: Frequency for polling Nomad task state updates (default `1s`). Can also be set via `NF_NOMAD_POLL_INTERVAL`.
- `submitThrottle`: Minimum delay between Nomad job submissions (default `0s`). Can be set via `NF_NOMAD_SUBMIT_THROTTLE` to reduce API bursts.
- `retryConfig.delay`: Delay when retrying failed API requests (default `500ms`).
- `retryConfig.jitter`: Jitter applied when retrying failed API requests (default `0.25`).
- `retryConfig.maxAttempts`: Maximum number of attempts when retrying failed API requests (default `10`).
- `retryConfig.maxDelay`: Maximum delay when retrying failed API requests (default `90s`).
- Retries apply to transient failures (`408`, `429`, `5xx`, I/O errors, and timeouts).
- `retryConfig` and `submitThrottle` are complementary: `retryConfig` applies after request failures, while `submitThrottle` proactively spaces out new submissions.
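As a sketch of how the two mechanisms combine (the specific values here are illustrative, not recommendations):

```groovy
nomad {
    client {
        // Proactively space out job submissions to avoid API bursts (and 429 responses)
        submitThrottle = '250ms'
        // If a request still fails transiently, retry with backoff and jitter
        retryConfig = {
            delay = 500       // initial retry delay
            maxDelay = 90     // cap on the retry delay
            maxAttempts = 10  // give up after this many attempts
            jitter = 0.25     // randomize delays so retries don't synchronize
        }
    }
}
```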
Jobs configuration

- `deleteOnCompletion`: A boolean indicating whether the job is removed once completed.
- `cleanup`: Cleanup policy for completed jobs. Allowed values are `always`, `never`, and `onSuccess`. If omitted, it is derived from `deleteOnCompletion` for backward compatibility.
- `datacenters`: A list of datacenters for job submission.
- `region`: The region for job submission.
- `namespace`: The namespace used for all Nextflow jobs.
- `privileged`: Run Docker tasks in privileged mode (default `true`).
- `cpuMode`: How the task `cpus` directive maps to Nomad resources when process-level overrides are not set. Use `cores` (default) or `cpu`.
- `acceleratorAutoDevice`: When `true` (default), Nextflow `accelerator` requests are mapped to Nomad `resources.device` automatically.
- `acceleratorDeviceName`: Device name used for automatic accelerator mapping (default `nvidia/gpu`).
- `volumeSpec`: The volumes that should be accessible to the jobs.
- `affinitiesSpec`: The affinities to attach to the job spec.
- `constraintsSpec`: The constraints to attach to the job spec.
- `spreadsSpec`: The spreads spec to use for all generated jobs.
- `rescheduleAttempts`: Number of rescheduling attempts (to a different node) for the generated jobs.
- `restartAttempts`: Number of restart attempts (on the same node) for the generated jobs.
- `failOnPlacementFailure`: A boolean flag to automatically fail jobs that cannot be placed on any node due to insufficient resources (default `false`). When enabled, jobs that remain unscheduled (have no node assignment) beyond the `placementFailureTimeout` threshold are marked as failed instead of waiting indefinitely.
- `placementFailureTimeout`: The time to wait before considering a job failed due to a placement failure (default `60s`). Supports the Nextflow duration format: `20s`, `2m`, `5m`, `1h`, `2d`, etc. Can also be set via the `NF_NOMAD_PLACEMENT_FAILURE_TIMEOUT` environment variable.
- Failed tasks include enriched Nomad state messages; memory/OOM signals from Nomad task events are surfaced explicitly when available.
- When debug JSON dumping is enabled (`nomad.debug.json` or `nomad.debug.path`), dumped job JSON files include Nomad metadata fields: `nomad_job_id`, `nomad_alloc_id`, `nomad_node_id`, `nomad_node_name`, and `nomad_datacenter`.
- Failure messages include Nomad inspection hints (job/allocation/node identifiers and the allocation API URL when available).
- `secretOpts`: The configuration for the Nomad secret store.
- `dockerVolume`: DEPRECATED
- `affinitySpec`: DEPRECATED
- `constraintSpec`: DEPRECATED

=== Process directives
The plugin supports process-level Nomad directives in two forms:

Legacy directives:

- `datacenters`
- `constraints`
- `secret`
- `spread`

Preferred map-based directive:

- `nomadOptions`

`nomadOptions` accepts a map and currently supports:

- `datacenters`: list of strings
- `namespace`: string namespace override for the process
- `constraints`: closure using the existing constraints DSL
- `secrets`: list of secret names
- `secretsPath`: per-process Nomad secret path override (string)
- `spread`: spread map (`name`, `weight`, optional `targets`)
- `affinity`: affinity map (`attribute`, optional `operator`, `value`, optional `weight`)
- `volumes`: list of safe volume maps (`type`, `name`, optional `path`, optional `workDir`, optional `readOnly`)
- `priority`: priority alias or number (`critical`, `high`, `normal`, `low`, `min`, or `0..100`)
- `meta`: map of metadata merged with the global `nomad.jobs.meta` (process keys override)
- `shutdownDelay`: duration string (e.g. `15s`, `2m`)
- `failures`: map with optional `restart` and `reschedule` maps
- `resources`: map of resource options
    - `memoryMax`: memory limit for Nomad `memory_max` (defaults to the task `memory` when not set)
    - `cpu`: Nomad CPU shares (MHz) override
    - `cores`: Nomad CPU cores override
    - `device`: list of requested Nomad devices (e.g. GPUs)
    - when `resources.device` is not set and `acceleratorAutoDevice` is enabled, the Nextflow `accelerator` directive is mapped automatically
```groovy
process {
    withName: sayHello {
        nomadOptions = [
            datacenters: ['dc1', 'dc2'],
            namespace: 'bio',
            constraints: {
                node {
                    unique = [name: params.RUN_IN_NODE]
                }
            },
            affinity: [attribute: '${meta.workload}', operator: '=', value: 'batch', weight: 25],
            meta: [owner: 'team-x', step: 'align'],
            shutdownDelay: '15s',
            failures: [
                restart: [attempts: 1, delay: '5s', mode: 'fail'],
                reschedule: [attempts: 2, delay: '10s']
            ],
            secretsPath: 'secret/projects/team-x',
            secrets: ['MY_ACCESS_KEY', 'MY_SECRET_KEY'],
            spread: [name: 'node.datacenter', weight: 50, targets: ['us-east1': 70, 'us-east2': 30]],
            priority: 'high',
            volumes: [[type: 'host', name: 'ref-data', path: '/ref', readOnly: true]],
            resources: [memoryMax: '64 GB', cores: 4, device: [[name: 'nvidia/gpu', count: 1]]]
        ]
    }
}
```
When both `nomadOptions.<key>` and a legacy directive are set for the same process, `nomadOptions.<key>` takes precedence for that key only.

For list-valued options such as `datacenters`, global and process values are concatenated in order (global first, then process) and deduplicated.
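For instance, assuming a global `datacenters` list and a process-level override (the `align` process name is purely illustrative), the merge described above would submit the process with `['dc1', 'dc2']`:

```groovy
nomad {
    jobs {
        datacenters = ['dc1']        // global default for every job
    }
}
process {
    withName: align {
        nomadOptions = [
            datacenters: ['dc2']     // appended after the global value, then deduplicated
        ]
    }
}
```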
`nomadOptions` values are validated strictly before submission; invalid shapes or conflicting options (for example, setting both `resources.cpu` and `resources.cores`) fail fast.

When `nomadOptions.secretsPath` is set, it overrides `nomad.jobs.secrets.path` for that process only.
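A sketch of the per-process secret path override (the process name and secret paths are hypothetical):

```groovy
nomad {
    jobs {
        secrets = [enabled: true, path: 'secret/nf']    // default path for all processes
    }
}
process {
    withName: fetchData {
        nomadOptions = [
            secretsPath: 'secret/projects/team-x',      // used instead of 'secret/nf' for this process only
            secrets: ['MY_ACCESS_KEY']
        ]
    }
}
```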
When global and process volume specs are merged, only one `workDir` volume is allowed, and `readOnly` flags are preserved on the generated task mounts.

Nomad task failures are surfaced as recoverable process errors, so the Nextflow `errorStrategy` and `maxRetries` policies remain in control.
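Because failures remain recoverable, the usual Nextflow error-handling directives apply unchanged; a minimal sketch:

```groovy
process {
    // Retry a failed Nomad task up to three times before the run is aborted
    errorStrategy = 'retry'
    maxRetries = 3
}
```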