Problem 7

Solution

process '3_rnaseq_gatk_splitNcigar' {
  tag "$replicateId" (1)

  input:
      file genome from genome_file  (2)
      file index from genome_index_ch  (3)
      file genome_dict from genome_dict_ch  (4)
      set replicateId, file(bam), file(bai) from aligned_bam_ch  (5)

  output:
      set replicateId, file('split.bam'), file('split.bai') into splitted_bam_ch  (6)

  script:
  """
  # SplitNCigarReads and reassign mapping qualities
  java -jar $GATK -T SplitNCigarReads \
                  -R $genome -I $bam \ (7)
                  -o split.bam \ (8)
                  -rf ReassignOneMappingQuality \
                  -RMQF 255 -RMQT 60 \
                  -U ALLOW_N_CIGAR_READS \
                  --fix_misencoded_quality_scores

  """
}

1	`tag` line using the replicate id as the tag
2	the genome fasta file
3	the genome index from the `genome_index_ch` channel created in the process `1A_prepare_genome_samtools`
4	the genome dictionary from the `genome_dict_ch` channel created in the process `1B_prepare_genome_picard`
5	the set containing the replicate id, the aligned BAM file and its index from the `aligned_bam_ch` channel created in the process `2_rnaseq_mapping_star`
6	a set containing the replicate id, the split BAM file and the split BAM index, sent to the `splitted_bam_ch` channel
7	passes the reference genome (`$genome`) and the input BAM file (`$bam`) to GATK
8	specifies the output file name (`split.bam`) to GATK
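
To make callouts 7 and 8 concrete, the sketch below prints roughly what the command could look like once Nextflow has substituted the variables into the script block. The jar path and file names are hypothetical placeholders, not values from the tutorial, and the remaining options are left out for brevity.

```shell
# Hypothetical stand-ins for the values Nextflow substitutes into the script block.
GATK=/opt/gatk/GenomeAnalysisTK.jar   # assumed jar location, not from the tutorial
genome=genome.fa                      # assumed reference file name
bam=rep1.bam                          # assumed aligned BAM for one replicate

# Print the command instead of running it, so the substitution is visible
# without needing Java or GATK installed.
render_cmd() {
  echo "java -jar $GATK -T SplitNCigarReads -R $genome -I $bam -o split.bam"
}

render_cmd
```

Because the output name is fixed to `split.bam`, the `output:` declaration can refer to it literally, and the replicate id carried in the set is what keeps the results of different replicates apart.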