Single-strand consensus sequences (SSCS)
SSCS_maker.py
Function: To generate single-strand consensus sequences for strand-based error suppression.
- Consensus sequence from most common base with quality score >= Q30 and greater than <cutoff> representation
- Consensus quality score from addition of quality scores (i.e. product of error probabilities)
(Written for Python 3.5.1)
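The two consensus rules above can be sketched in plain Python. This is an illustration, not the code in SSCS_maker.py: the function name `consensus_maker`, the default `cutoff` of 0.7, the `max_qual` cap, the use of the full family size as the denominator, and the choice to emit `N` with quality 0 when no base passes are all assumptions made for this example.

```python
from collections import Counter


def consensus_maker(reads, quals, cutoff=0.7, min_qual=30, max_qual=62):
    """Collapse equal-length reads from one family into a consensus.

    reads: list of sequence strings of identical length.
    quals: list of per-base Phred score lists, parallel to reads.
    cutoff, max_qual: hypothetical defaults chosen for this sketch.
    """
    consensus_seq = []
    consensus_qual = []
    for i in range(len(reads[0])):
        counts = Counter()     # occurrences of each base passing the Q30 filter
        qual_sums = Counter()  # summed Phred scores of those occurrences
        for seq, q in zip(reads, quals):
            if q[i] >= min_qual:
                counts[seq[i]] += 1
                qual_sums[seq[i]] += q[i]
        if counts:
            base, n = counts.most_common(1)[0]
        else:
            base, n = "N", 0
        # Assumption: representation is measured against the whole family.
        if n / len(reads) > cutoff:
            consensus_seq.append(base)
            # Adding Phred scores multiplies the error probabilities;
            # the cap (an assumption) keeps the value in a storable range.
            consensus_qual.append(min(qual_sums[base], max_qual))
        else:
            consensus_seq.append("N")
            consensus_qual.append(0)
    return "".join(consensus_seq), consensus_qual


# Example family of three reads; the third disagrees at position 3
# and has a low-quality base at position 4.
example = consensus_maker(
    ["ACGT", "ACGT", "ACTT"],
    [[35, 35, 35, 35], [30, 30, 30, 30], [40, 40, 40, 20]],
)
```

With the default cutoff of 0.7, the last two positions have only 2 of 3 supporting bases, so they fall below the threshold and are reported as `N`.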
- Usage:
- python3 SSCS_maker.py [--cutoff CUTOFF] [--infile INFILE] [--outfile OUTFILE] [--bedfile BEDFILE]
Arguments:
--cutoff CUTOFF | Minimum representation of the most common base required to call a consensus |
--infile INFILE | Input BAM file |
--outfile OUTFILE | Output BAM file |
--bedfile BEDFILE | Bedfile containing coordinates to subdivide the BAM file (Recommendation: cytoband.txt) |
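For readers who want to see how these four options map onto a command-line parser, here is a minimal `argparse` sketch. It mirrors only the option names listed above; the `float` type for the cutoff, the help strings, the example file names, and the helper name `build_parser` are assumptions, and the real script may define its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser with the four options from the usage line (sketch only)."""
    parser = argparse.ArgumentParser(prog="SSCS_maker.py")
    # Assumption: the cutoff is a fraction, so it is parsed as a float.
    parser.add_argument("--cutoff", type=float,
                        help="Representation threshold for calling a consensus base")
    parser.add_argument("--infile", help="Input BAM file")
    parser.add_argument("--outfile", help="Output BAM file")
    parser.add_argument("--bedfile",
                        help="Bedfile with coordinates to subdivide the BAM file")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
# The file names here are placeholders.
args = build_parser().parse_args(
    ["--cutoff", "0.7", "--infile", "input.bam",
     "--outfile", "output.sscs.bam", "--bedfile", "cytoband.txt"]
)
```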
- Inputs:
- A position-sorted BAM file containing paired-end reads with duplex barcode in the header
- A BED file containing coordinates subdividing the entire reference genome for more manageable data processing
- Outputs:
- An SSCS BAM file containing paired single-strand consensus sequences - “sscs.bam”
- A singleton BAM file containing single reads - “singleton.bam”
- A bad read BAM file containing unpaired, unmapped, and multiple mapping reads - “badReads.bam”
- A text file containing summary statistics (Total reads, Unmapped reads, Secondary/Supplementary reads, SSCS reads, and singletons) - “stats.txt”
- A tag family size distribution plot (x-axis: family size, y-axis: number of reads) - “tag_fam_size.png”
- A text file tracking the time to complete each genomic region (based on bed file) - “time_tracker.txt”
- Concepts:
- Read family: reads that share the same molecular barcode, genome coordinates for Read1 and Read2, CIGAR string, strand, flag, and read number
- Singleton: a read family containing only one member (a single read)
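The two concepts above amount to grouping reads by a composite key and then separating the groups of size one. The sketch below shows that idea with plain tuples rather than BAM records; the `Read` namedtuple, its field names, and the helper functions are hypothetical and are not taken from SSCS_maker.py.

```python
from collections import defaultdict, namedtuple

# Hypothetical, simplified stand-in for an aligned read.
Read = namedtuple(
    "Read",
    "name barcode chrom r1_start r2_start cigar strand flag read_number",
)


def family_tag(read):
    """Key shared by every member of a read family (all fields except name)."""
    return (read.barcode, read.chrom, read.r1_start, read.r2_start,
            read.cigar, read.strand, read.flag, read.read_number)


def group_families(reads):
    """Map each family tag to the list of reads carrying it."""
    families = defaultdict(list)
    for read in reads:
        families[family_tag(read)].append(read)
    return families


def split_singletons(families):
    """Separate families with several members from single-read families."""
    multi = {tag: members for tag, members in families.items()
             if len(members) > 1}
    singletons = {tag: members[0] for tag, members in families.items()
                  if len(members) == 1}
    return multi, singletons


reads = [
    Read("r1", "AAT.GCC", "chr1", 100, 250, "98M", "pos", 99, "R1"),
    Read("r2", "AAT.GCC", "chr1", 100, 250, "98M", "pos", 99, "R1"),
    Read("r3", "TTG.CAA", "chr1", 100, 250, "98M", "pos", 99, "R1"),
]
multi, singletons = split_singletons(group_families(reads))
```

Here `r1` and `r2` share every field of the tag and form one family of size two, while `r3` differs only in its barcode and is therefore a singleton.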