Tech Talk: Achieve sensitive and precise CNV calling with SOPHiA DDM™ Platform

Published on 13/05/22
Learn about the SOPHiA DDM™ Platform’s CNV detection algorithm from our Senior Algorithm Researcher, Bita Khalili.

Bita Khalili is a Senior Algorithm Researcher in our SOPHiA GENETICS Data Science team. She joined the team after completing her PhD in Physics and a post-doctoral research position in Bioinformatics. For the last two years, Bita has been analyzing NGS data at SOPHiA GENETICS and developing copy number variation (CNV) detection modules.

We invite you to spend a few moments with Bita to learn about the challenges associated with CNV detection and how SOPHiA GENETICS' CNV detection algorithm was developed to overcome these challenges.

Why is CNV detection important when analyzing next-generation sequencing data?

Next-generation sequencing (NGS) is a high-throughput technique that generates high-resolution genomic data, allowing simultaneous detection of many types of genomic variants, such as SNVs, indels, and CNVs. CNVs are a type of structural variation in which DNA segments of one kilobase or larger are present at a variable copy number (duplications or deletions) compared to a reference genome. They have clinical and diagnostic relevance, as they have been associated with cancers and rare genetic disorders. Although microarray (or SNP-array) comparative genomic hybridization (aCGH) and multiplex ligation-dependent probe amplification (MLPA) are the gold standards for CNV detection, neither can detect small variants such as SNVs and indels. The decreasing cost of NGS and its ability to detect multiple types of genomic alteration in a single run have encouraged the widespread use of NGS for CNV detection.

Why are CNVs generally difficult to detect using NGS?

CNVs are challenging to detect via targeted capture because the relationship between sequencing depth and copy number is affected by many sources of bias, including GC content, target region length, capture efficiency, amplification efficiency, DNA concentration, hybridization temperature, the nature of the capture, and batch effects. These biases result in coverage heterogeneity, even for diploid regions (copy number of 2), and must be accounted for to accurately infer copy number from coverage data.
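To make the effect of these biases concrete, here is a minimal sketch under a simple multiplicative model in which read depth scales with a sample-level factor, a region-level capture efficiency, and the copy number. The model, the function name `expected_depth`, and all numbers are illustrative assumptions, not a description of how the platform models coverage.

```python
# Illustrative multiplicative bias model (an assumption for this sketch):
#   depth = sample_factor * region_efficiency * copy_number / 2

def expected_depth(sample_factor, region_efficiency, copy_number):
    """Expected read depth of one target region in one sample."""
    return sample_factor * region_efficiency * copy_number / 2


# Hypothetical capture efficiencies, e.g. driven by GC content or probe design.
region_efficiency = {"exon_A": 1.0, "exon_B": 0.5, "exon_C": 1.2}
sample_factor = 200  # hypothetical depth of a diploid region with efficiency 1.0

# Depth of each region when every region is diploid (copy number 2).
diploid_depths = {
    region: expected_depth(sample_factor, eff, 2)
    for region, eff in region_efficiency.items()
}

# Depth of exon_A carrying a heterozygous deletion (copy number 1).
het_deletion_exon_a = expected_depth(sample_factor, region_efficiency["exon_A"], 1)

# A poorly captured diploid exon and a well captured deleted exon
# have the same raw depth, so raw depth alone cannot separate them.
print(diploid_depths["exon_B"], het_deletion_exon_a)
```

In this toy setup the diploid `exon_B` and the deleted `exon_A` are indistinguishable by raw depth, which is why the biases have to be removed before copy number is inferred.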

What challenges are associated with CNV detection in exome data?

On top of overcoming the biases mentioned above, analyzing the human exome brings the additional challenge that only the protein-coding regions (exons) are sequenced. This results in sparse coverage, as the targeted regions cover only about 1% of the whole genome. Because most of the genome is not covered, most CNV breakpoints are missed, leaving read depth as the only available source of information for CNV detection. Other challenges with detecting CNVs in exome data include the presence of many polymorphic regions, for which the normal copy number is already higher or lower than two, and the presence of homologous regions, which are problematic for short-read alignment.

How are CNVs detected using the SOPHiA DDM™ Platform?

CNV analysis by the SOPHiA DDM™ Platform is based on coverage analysis of targeted regions. Our CNV algorithm automatically selects reference samples among the samples within the same run to perform normalization. We apply a double normalization to account for both sample-specific and region-specific biases. CNVs are then called with a hidden Markov model (HMM), which finds CNVs spanning adjacent regions. Additionally, the algorithm provides quality measures for each sample based on the residual noise.
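The double normalization and the per-sample quality measure can be illustrated with a minimal sketch. It is not the platform's implementation: it assumes median-based scaling, strictly positive depths, and uses every sample in the run as a reference (the platform selects reference samples automatically, which is not reproduced here). The names `double_normalize` and `residual_noise` and the depth table are hypothetical.

```python
import math
from statistics import median


def double_normalize(depth):
    """depth[sample][region] -> log2 ratios after two normalization steps."""
    samples = list(depth)
    regions = list(depth[samples[0]])
    # Step 1 (sample-specific bias): scale each sample by its median depth.
    by_sample = {
        s: {r: depth[s][r] / median(depth[s].values()) for r in regions}
        for s in samples
    }
    # Step 2 (region-specific bias): scale each region by its median
    # across samples, which acts as the diploid reference level.
    reference = {r: median(by_sample[s][r] for s in samples) for r in regions}
    return {
        s: {r: math.log2(by_sample[s][r] / reference[r]) for r in regions}
        for s in samples
    }


def residual_noise(sample_log_ratios):
    """Median absolute deviation of one sample's log2 ratios (assumed metric)."""
    values = list(sample_log_ratios.values())
    center = median(values)
    return median(abs(v - center) for v in values)


# Hypothetical run of four samples; s3 carries a heterozygous deletion of r2.
depth = {
    "s1": {"r1": 100, "r2": 50, "r3": 120, "r4": 80, "r5": 150},
    "s2": {"r1": 200, "r2": 100, "r3": 240, "r4": 160, "r5": 300},
    "s3": {"r1": 160, "r2": 40, "r3": 192, "r4": 128, "r5": 240},
    "s4": {"r1": 300, "r2": 150, "r3": 360, "r4": 240, "r5": 450},
}
log_ratios = double_normalize(depth)
print(log_ratios["s3"]["r2"])  # close to -1, i.e. half the reference level
```

After both steps, the differences in overall depth between samples and in capture efficiency between regions cancel out, and only the deleted region in `s3` deviates from a log2 ratio of zero. With real data the remaining spread around zero is what a noise-based quality measure would summarize.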

Figure: CNV detection by the SOPHiA DDM™ Platform

What is the reasoning behind SOPHiA GENETICS’ approach?

Our normalization approach corrects for read-depth variations among regions by leveraging information from different samples in the same run. Assuming that all samples are processed in parallel, the double-normalization step corrects for all sources of targeted sequencing bias mentioned earlier. We also use our knowledge of the genome to curate the target regions for each specific exome panel, excluding regions that would be problematic for our CNV detection algorithm, e.g., regions with systematically low coverage or high noise, and polymorphic or homologous regions.
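The region curation described above could look roughly like the following sketch. The thresholds `min_mean` and `max_cv`, the function name `curate_regions`, and the example values are hypothetical; the actual curation criteria are not given in this article.

```python
from statistics import mean, pstdev


def curate_regions(norm_cov, min_mean=0.2, max_cv=0.3, excluded=frozenset()):
    """Return regions suitable for CNV calling.

    norm_cov[region] is a list of normalized coverage values observed
    across reference samples. `excluded` holds regions known in advance
    to be polymorphic or homologous.
    """
    kept = []
    for region, values in norm_cov.items():
        if region in excluded:
            continue  # known polymorphic or homologous region
        avg = mean(values)
        if avg < min_mean:
            continue  # systematically low coverage
        if pstdev(values) / avg > max_cv:
            continue  # high noise (coefficient of variation)
        kept.append(region)
    return kept


# Hypothetical normalized coverage across four reference samples.
norm_cov = {
    "exon_1": [1.0, 1.1, 0.9, 1.0],      # well behaved
    "exon_2": [0.05, 0.1, 0.08, 0.07],   # low coverage
    "exon_3": [0.4, 1.6, 1.0, 2.0],      # noisy
    "exon_4": [1.0, 1.0, 1.0, 1.0],      # listed as homologous below
}
curated = curate_regions(norm_cov, excluded={"exon_4"})
print(curated)
```

Only `exon_1` survives in this example: the other three are dropped for low coverage, high noise, and membership in the exclusion list, respectively.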

What parameters does the exome sequencing panel need to achieve for good quality results?

Datasets with high coverage and low capture bias achieve high-quality results.

What resolution of CNVs can be achieved?

It depends on the exome panel, but with high-quality panels (good probe design) and high sequencing depth (~600x), we can achieve even single-exon resolution.

What sets SOPHiA GENETICS’ CNV-calling algorithm apart from others?

Four key features set the SOPHiA GENETICS CNV-calling algorithm apart from others. The algorithm…

  1. efficiently normalizes coverage without relying on predefined parameters
  2. uses a hidden Markov model with optimized parameters to call CNVs while considering CNV frequency and length
  3. provides quality measures for each sample
  4. curates target regions for each exome panel by excluding regions not appropriate for our CNV detection algorithm

These four features ensure that we achieve good sensitivity and precision in CNV calling with the SOPHiA DDM™ Platform for Rare and Inherited Diseases, including inherited forms of cancer.
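As an illustration of the second feature, the sketch below shows one way a hidden Markov model can take CNV frequency and length into account when calling CNVs from normalized log2 ratios. It is a generic three-state Viterbi decoder, not the platform's algorithm. The parameters are assumptions: `cnv_rate` (probability of entering a CNV, reflecting CNV frequency), `cnv_stay` (probability of remaining in a CNV, reflecting CNV length), and `sigma` (noise level), as are the Gaussian emissions. The input is assumed to be a non-empty list ordered by genomic position.

```python
import math

STATES = ("deletion", "normal", "duplication")
# Expected log2 ratios for copy numbers 1, 2 and 3.
MEANS = {"deletion": -1.0, "normal": 0.0, "duplication": math.log2(1.5)}


def log_gauss(x, mu, sigma):
    """Log density of a Gaussian emission."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def transition_prob(a, b, cnv_rate, cnv_stay):
    """Probability of moving from state a to state b between adjacent regions."""
    if a == "normal":
        return 1 - 2 * cnv_rate if b == "normal" else cnv_rate
    if a == b:
        return cnv_stay
    # A CNV state can only end by returning to normal in this sketch.
    return 1 - cnv_stay if b == "normal" else 0.0


def viterbi_cnv(log_ratios, cnv_rate=0.01, cnv_stay=0.8, sigma=0.2):
    """Most likely state per region for a non-empty list of log2 ratios."""
    def safe_log(p):
        return math.log(p) if p > 0 else -math.inf

    log_trans = {
        (a, b): safe_log(transition_prob(a, b, cnv_rate, cnv_stay))
        for a in STATES for b in STATES
    }
    # Start as if the region before the first one were normal.
    score = {
        s: log_trans[("normal", s)] + log_gauss(log_ratios[0], MEANS[s], sigma)
        for s in STATES
    }
    back = []
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for b in STATES:
            best = max(STATES, key=lambda a: score[a] + log_trans[(a, b)])
            pointers[b] = best
            new_score[b] = score[best] + log_trans[(best, b)] + log_gauss(x, MEANS[b], sigma)
        back.append(pointers)
        score = new_score
    # Trace back the best path from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical log2 ratios: three adjacent deleted regions and one noisy outlier.
ratios = [0.05, -0.1, -0.95, -1.05, -0.9, 0.1, 0.0, -0.6, 0.02]
calls = viterbi_cnv(ratios)
print(calls)
```

With these assumed parameters, the three adjacent regions near -1 are called as a deletion, while the isolated value of -0.6 is treated as noise, because entering a CNV state is penalized by the low `cnv_rate`. Raising `cnv_rate` or lowering `sigma` would make the decoder more willing to call single-region events.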
