Problem 8

Solution

process '4_rnaseq_gatk_recalibrate' {
  tag "$replicateId" (1)

  input:
      file genome from genome_file (2)
      file index from genome_index_ch (3)
      file dict from genome_dict_ch (4)
      set replicateId, file(bam), file(bai) from splitted_bam_ch (5)
      set file(variants_file), file(variants_file_index) from prepared_vcf_ch (6)

  output:
      set sampleId, file("${replicateId}.final.uniq.bam"), file("${replicateId}.final.uniq.bam.bai") into (final_output_ch, bam_for_ASE_ch) (7)

  script:
  sampleId = replicateId.replaceAll(/[12]$/,'')
  """
  # Indel Realignment and Base Recalibration
  java -jar $GATK -T BaseRecalibrator \
                  --default_platform illumina \
                  -cov ReadGroupCovariate \
                  -cov QualityScoreCovariate \
                  -cov CycleCovariate \
                  -knownSites ${variants_file} \
                  -cov ContextCovariate \
                  -R ${genome} -I ${bam} \
                  --downsampling_type NONE \
                  -nct ${task.cpus} \
                  -o final.rnaseq.grp

  java -jar $GATK -T PrintReads \
                  -R ${genome} -I ${bam} \
                  -BQSR final.rnaseq.grp \
                  -nct ${task.cpus} \
                  -o final.bam

  # Select only unique alignments, no multimaps
  (samtools view -H final.bam; samtools view final.bam| grep -w 'NH:i:1') \
  |samtools view -Sb -  > ${replicateId}.final.uniq.bam (8)

  # Index BAM files
  samtools index ${replicateId}.final.uniq.bam (9)
  """
}

1	`tag` line with the using the replicate id as the tag.
2	the genome fasta file.
3	the genome index from the `genome_index_ch` channel created in the process `1A_prepare_genome_samtools`.
4	the genome dictionary from the `genome_dict_ch` channel created in the process `1B_prepare_genome_picard`.
5	the set containing the split reads from the `splitted_bam_ch` channel created in the process `3_rnaseq_gatk_splitNcigar`.
6	the set containing the filtered/recoded VCF file and the tab index (TBI) file from the `prepared_vcf_ch` channel created in the process `1D_prepare_vcf_file`.
7	the set containing the replicate id, the unique bam file and the unique bam index file which goes into two channels.
8	line specifying the filename of the output bam file <9>