Singleton Correction
singleton_correction.py
Function: Corrects single reads (singletons) using their complementary strand (an SSCS or another singleton) to enable error suppression
- Traditionally, consensus sequences can only be made from 2 or more reads, so a singleton cannot form a consensus on its own
(Written for Python 3.5.1)
Usage: python3 singleton_correction.py [--singleton SingletonBAM] [--bedfile BEDFILE]
Arguments:
--singleton SingletonBAM | Input singleton BAM file |
--bedfile BEDFILE | BED file containing coordinates to subdivide the BAM file (Recommendation: cytoband.txt) |
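A minimal sketch of how the two arguments above could be parsed with argparse. This is illustrative only: the flag names come from the usage line, but the function name `build_parser`, the help strings, the example file names, and the choice to leave both arguments optional (as the square brackets suggest) are assumptions, not the script's actual parser.

```python
import argparse


def build_parser():
    """Build a parser for the two documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="singleton_correction.py",
        description="Correct singletons with their complementary strand.",
    )
    # Flag names taken from the usage line; defaults are an assumption.
    parser.add_argument("--singleton", metavar="SingletonBAM",
                        help="input singleton BAM file")
    parser.add_argument("--bedfile", metavar="BEDFILE",
                        help="BED file of coordinates used to subdivide the BAM file")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
# The file names here are placeholders.
args = build_parser().parse_args(
    ["--singleton", "sample.singleton.sorted.bam", "--bedfile", "cytoband.txt"]
)
print(args.singleton, args.bedfile)
```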
- Inputs:
- A position-sorted BAM file containing paired-end single reads with barcode identifiers in the header/query name
- A BED file containing coordinates subdividing the entire reference genome for more manageable data processing
- Outputs:
- A BAM file containing paired singletons error corrected by their complementary SSCS - “sscs.correction.bam”
- A BAM file containing paired singletons error corrected by their complementary singleton - “singleton.correction.bam”
- A BAM file containing the remaining singletons that cannot be corrected because they are missing a complementary strand - “uncorrected.bam”
- A text file containing summary statistics (Total singletons, Singleton Correction by SSCS, % Singleton Correction by SSCS, Singleton Correction by Singletons, % Singleton Correction by Singletons, Uncorrected Singletons) - “stats.txt” (stats are appended to the same stats file as SSCS)
- Concepts:
- Read family: reads that share the same molecular barcode, chromosome, and start coordinates for Read1 and Read2
- Singleton: single read with no PCR duplicates (family size = 1)
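
The two concepts above can be illustrated with a short sketch that groups reads into families and picks out the singletons. It is a simplified model, not the script's implementation: the `Read` record, its field names, and the helper functions are hypothetical, and a real pipeline would take these values from BAM alignments rather than hand-written tuples.

```python
from collections import defaultdict, namedtuple

# Hypothetical, simplified read record (not the BAM data model).
Read = namedtuple("Read", ["name", "barcode", "chrom", "r1_start", "r2_start"])


def family_key(read):
    """Reads sharing barcode, chromosome, and Read1/Read2 starts form a family."""
    return (read.barcode, read.chrom, read.r1_start, read.r2_start)


def group_families(reads):
    """Map each family key to the list of reads that share it."""
    families = defaultdict(list)
    for read in reads:
        families[family_key(read)].append(read)
    return dict(families)


def find_singletons(families):
    """Return the reads whose family size is exactly 1."""
    return [members[0] for members in families.values() if len(members) == 1]


reads = [
    Read("q1", "ACGT", "chr1", 100, 250),
    Read("q2", "ACGT", "chr1", 100, 250),  # same family as q1 (size 2)
    Read("q3", "TTAG", "chr1", 100, 250),  # different barcode
    Read("q4", "ACGT", "chr2", 100, 250),  # different chromosome
]

families = group_families(reads)
singletons = find_singletons(families)
print(sorted(read.name for read in singletons))  # ['q3', 'q4']
```

Here q1 and q2 form a family of size 2, which is enough for a conventional consensus, while q3 and q4 are singletons and would be candidates for correction by a complementary strand.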