Solution
process '4_rnaseq_gatk_recalibrate' {
tag "$replicateId" (1)
input:
file genome from genome_file (2)
file index from genome_index_ch (3)
file dict from genome_dict_ch (4)
set replicateId, file(bam), file(bai) from splitted_bam_ch (5)
set file(variants_file), file(variants_file_index) from prepared_vcf_ch (6)
output:
set sampleId, file("${replicateId}.final.uniq.bam"), file("${replicateId}.final.uniq.bam.bai") into (final_output_ch, bam_for_ASE_ch) (7)
script:
sampleId = replicateId.replaceAll(/[12]$/,'')
"""
# Indel Realignment and Base Recalibration
java -jar $GATK -T BaseRecalibrator \
--default_platform illumina \
-cov ReadGroupCovariate \
-cov QualityScoreCovariate \
-cov CycleCovariate \
-knownSites ${variants_file} \
-cov ContextCovariate \
-R ${genome} -I ${bam} \
--downsampling_type NONE \
-nct ${task.cpus} \
-o final.rnaseq.grp
java -jar $GATK -T PrintReads \
-R ${genome} -I ${bam} \
-BQSR final.rnaseq.grp \
-nct ${task.cpus} \
-o final.bam
# Select only unique alignments, no multimaps
(samtools view -H final.bam; samtools view final.bam| grep -w 'NH:i:1') \
|samtools view -Sb - > ${replicateId}.final.uniq.bam (8)
# Index BAM files
samtools index ${replicateId}.final.uniq.bam (9)
"""
}
| 1 | tag line with the using the replicate id as the tag. |
| 2 | the genome fasta file. |
| 3 | the genome index from the genome_index_ch channel created in the process 1A_prepare_genome_samtools. |
| 4 | the genome dictionary from the genome_dict_ch channel created in the process 1B_prepare_genome_picard. |
| 5 | the set containing the split reads from the splitted_bam_ch channel created in the process 3_rnaseq_gatk_splitNcigar. |
| 6 | the set containing the filtered/recoded VCF file and the tab index (TBI) file from the prepared_vcf_ch channel created in the process 1D_prepare_vcf_file. |
| 7 | the set containing the replicate id, the unique bam file and the unique bam index file which goes into two channels. |
| 8 | line specifying the filename of the output bam file <9> |