“Applying Machine Learning and the ‘Unreasonable Effectiveness of Data’ to Structural Variant Genotyping in Whole-Genome Sequencing”
Structural variants (SVs), such as large (>50 bp) insertions or deletions in the DNA sequence, can cause numerous genetic diseases. However, because of their size, SVs are difficult to detect directly and to genotype accurately (that is, to determine whether a person has 0, 1, or 2 copies of an SV) with widely used genome sequencers. Existing SV genotypers can only partially account for the sample-, SV-, and analysis-pipeline-specific biases that reduce genotyping accuracy. Rather than trying to explicitly model those complex and interconnected effects, we use simulation to generate the data we should expect to observe. This talk will present a series of simulation-based machine learning tools for improving SV genotyping, developed with Middlebury Computer Science students, that apply what Halevy et al. called the “unreasonable effectiveness of data” to genome analysis.
Vaccinations and masks required.
Sponsored by: Academic Affairs