Bita Khalili is a Senior Algorithm Researcher in our SOPHiA GENETICS Data Science team. She joined the team after completing her PhD in Physics and a post-doctoral research position in Bioinformatics. For the last two years, Bita has been analyzing NGS data at SOPHiA GENETICS and developing copy number variation (CNV) detection modules.
We invite you to spend a few moments with Bita to learn about the challenges associated with CNV detection and how SOPHiA GENETICS' CNV detection algorithm was developed to overcome these challenges.
Why is CNV detection important when analyzing next-generation sequencing data?
Next-generation sequencing (NGS) is a high-throughput technique that generates high-resolution genomic data which allows for simultaneous detection of many genomic variants, such as SNVs, Indels, and CNVs. CNVs are a structural variation in which DNA segments of one kilobase or larger are present at a variable copy number (duplications or deletions) compared to a reference genome. They have clinical and diagnostic relevance as they have been associated with cancers and rare genetic disorders. Although microarray (or SNP-array) comparative genomic hybridization (aCGH) and multiplex ligation-dependent probe amplification (MLPA) are the gold standards for CNV detection, neither can detect small variations such as SNVs and Indels. The decreasing cost of NGS and the ability to simultaneously detect multiple genomic alterations in a single run have encouraged the widespread use of NGS for CNV detection.
Why are CNVs generally difficult to detect using NGS?
CNVs are challenging to detect via targeted capture because the relationship between sequencing depth and copy number is affected by many sources of bias, e.g., GC content and target region length, capture efficiency, amplification efficiency, DNA concentration, hybridization temperature, nature of capture, batch effects, and so on. These biases result in coverage heterogeneity, even for diploid regions (copy number of 2) and must be accounted for to accurately infer copy number from coverage data.
What challenges are associated with CNV detection in exome data?
On top of overcoming the biases mentioned above, when analyzing the human exome we have the additional challenge of sequencing only the protein-coding regions (exons). This results in sparse coverage, as the targeted regions only cover about 1% of the whole genome. Because most of the genome is not covered, most breakpoints are missed, leaving read depth as the only available information source for CNV detection. Other challenges with detecting CNVs in exome data include the presence of many polymorphic regions for which the normal copy number is already higher or lower than two, and the presence of homologous regions, which is problematic for short-read alignment.
How are CNVs detected using the SOPHiA DDM™ Platform?
CNV analysis on the SOPHiA DDM™ Platform is based on coverage analysis of targeted regions. Our CNV algorithm automatically selects reference samples among the samples within the same run to perform normalization. We apply a double normalization to account for both sample-specific and region-specific biases. CNV detection then uses a hidden-Markov-model algorithm to find CNVs spanning adjacent regions. Additionally, the algorithm provides quality measures for each sample based on the residual noise.
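As a rough illustration of the double-normalization idea, the sketch below first divides each sample by its own median depth (sample-specific bias) and then divides each region by its median across samples (region-specific bias). It is a minimal sketch under stated assumptions, not the SOPHiA GENETICS implementation: the function name `double_normalize`, the use of medians, and the use of every sample in the run as a reference are all assumptions made for illustration.

```python
from statistics import median


def double_normalize(coverage):
    """Return per-region coverage ratios (about 1.0 for two copies).

    coverage: dict mapping sample name -> list of read depths,
              one value per target region, same region order for all samples.
    Hypothetical helper for illustration only.
    """
    samples = list(coverage)
    n_regions = len(coverage[samples[0]])

    # Step 1 (sample-specific): scale each sample by its own median depth,
    # which removes differences in total sequencing depth between samples.
    by_sample = {}
    for s in samples:
        m = median(coverage[s])
        by_sample[s] = [depth / m for depth in coverage[s]]

    # Step 2 (region-specific): scale each region by its median across samples,
    # which removes biases shared by all samples in the run (e.g. GC content).
    region_median = [
        median(by_sample[s][i] for s in samples) for i in range(n_regions)
    ]
    return {
        s: [by_sample[s][i] / region_median[i] for i in range(n_regions)]
        for s in samples
    }


# Toy run: four samples, five regions; sample "D" has half the depth in region 3.
run = {
    "A": [100, 50, 200, 100, 150],
    "B": [200, 100, 400, 200, 300],
    "C": [150, 75, 300, 150, 225],
    "D": [120, 60, 240, 60, 180],
}
ratios = double_normalize(run)
print(ratios["D"])
```

In this toy run the region-to-region differences shared by all samples cancel out, so only the region where sample "D" deviates from the others stands out.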
What is the reasoning behind SOPHiA GENETICS’ approach?
Our normalization approach corrects for read-depth variations among regions by leveraging information from different samples in the same run. Assuming that all samples are processed in parallel, the double-normalization step corrects for all sources of targeted sequencing bias mentioned earlier. We also use our knowledge of the genome to curate target regions for each specific exome panel so that regions that would be problematic for our CNV detection algorithm are excluded, e.g., regions with systematically low coverage, high noise, or polymorphic or homologous regions.
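Once coverage has been normalized, a hidden Markov model can be used to call CNVs that span adjacent regions. The sketch below is a generic Viterbi decoder over three copy-number states and is not the algorithm used in production: the state set, the Gaussian emission model, and the parameters `sigma` and `p_stay` are assumptions chosen only to make the example run.

```python
import math

STATES = (1, 2, 3)  # assumed copy-number states: deletion, normal, duplication


def viterbi_copy_number(ratios, sigma=0.15, p_stay=0.95):
    """Most likely copy-number path for a list of normalized coverage ratios.

    A ratio of about cn / 2 is expected for copy number cn (1.0 for two copies).
    sigma and p_stay are hypothetical values, not taken from the source.
    """
    if not ratios:
        return []
    n = len(STATES)
    p_switch = (1.0 - p_stay) / (n - 1)
    log_trans = [
        [math.log(p_stay if i == j else p_switch) for j in range(n)]
        for i in range(n)
    ]

    def log_emit(ratio, cn):
        # Gaussian log-density around the expected ratio, up to a constant.
        return -0.5 * ((ratio - cn / 2.0) / sigma) ** 2

    # Initialise with a uniform prior over states.
    score = [math.log(1.0 / n) + log_emit(ratios[0], cn) for cn in STATES]
    back = []
    for r in ratios[1:]:
        new_score, pointers = [], []
        for j, cn in enumerate(STATES):
            best_i = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + log_emit(r, cn))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[i] for i in path]


calls = viterbi_copy_number([1.02, 0.97, 1.05, 0.52, 0.48, 0.55, 1.00, 0.98])
print(calls)
```

Because switching states carries a penalty, the decoder favours calls supported by several adjacent regions over isolated noisy values, which is the main reason for using a hidden Markov model rather than thresholding each region independently.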
What parameters does the exome sequencing panel need to achieve for good quality results?
Datasets with high coverage and low capture bias achieve high-quality results.
What resolution of CNVs can be achieved?
It depends on the exome panel, but with high-quality panels (good probe design) and high sequencing depth (~600x), we can achieve even single-exon resolution.
What sets SOPHiA GENETICS’ CNV-calling algorithm apart from others?
Four key features set the SOPHiA GENETICS CNV-calling algorithm apart from others. The algorithm…
These four features ensure that we achieve good sensitivity and precision in CNV calling with the SOPHiA DDM™ Platform for Rare and Inherited Diseases, including inherited forms of cancer.
Several RNA alterations have been described in oncology, including gene fusions, which are recognized driver mutations in neoplasia1. More than 10,000 gene fusions have already been identified in human cancers1, and it is estimated that up to 80% of solid tumors could benefit from gene fusion testing2. The number of new drugs specifically targeting gene fusion-positive cancers keeps growing, so accurate gene fusion detection could bring clear advantages to clinical cancer management.
Transcriptome sequencing has emerged as an effective method to identify gene fusions and has become a routine task in cancer research and precision medicine3. However, although a variety of computational tools have been developed over the years, an optimal solution with high analytical performance for fusion detection and the ability to maximize the insights from small precious RNA samples has been lacking.
SOPHiA DDM™ RNAtarget Technology addresses these requirements by combining novel (partner-agnostic) fusion detection with SNV/Indel detection in selected genes and assessment of expression changes. Powered by a deep learning algorithm, the Technology works with a very low sample input, a fully customizable gene panel, and a streamlined automated workflow that supports all stages of the analysis with high sensitivity. Finally, the associated SOPHiA DDM™ Platform provides convenient, fast visualization and interpretation of the results.
To better understand how SOPHiA GENETICS developed the SOPHiA DDM™ RNAtarget Technology and its features, we sat down with Mikhail Pertziger, the Clinical Application Product Manager for SOPHiA DDM™ RNAtarget Technology at SOPHiA GENETICS.
After studying Biomedical Science for my undergraduate degree, I continued with a PhD in the Molecular Biology of Breast and Colon Cancers, so Cancer and Molecular Diagnostics is very much the area where I have spent a lot of my academic and industry years. They say that the 21st century is the era of biology, and I will have to add that precision medicine is the future of cancer management. Biomarker-guided therapies are introducing dramatic differences in how oncology conditions are managed. Fusions are the latest frontier to receive broad applicability in the clinic with more and more drugs being introduced – this is bound to grow and accelerate. I'm excited to be working on a product that allows our customers to have access to a technology that takes on the learnings of the previous years and enables them to be more confident about detecting fusions.
I have worked at SOPHiA GENETICS for just over three years and currently lead the development and launch of SOPHiA DDM™ RNAtarget Technology. The development team includes experts in a broad range of areas, including bioinformatics, NGS, and programming, as well as logistics, Regulatory, Legal, and Marketing.
As I mentioned previously, gene fusions are the latest type of biomarker to receive broad applicability in cancer management. The results of targeting fusions reported in clinical trials and now being seen in routine care are fascinating. The number of new drug approvals in fusion-positive cancers has been continuously increasing over the last decade, and up to 80% of solid tumor NGS tests could benefit from the inclusion of fusion testing. With histology-agnostic approvals, this number approaches 100%2. In parallel, more clinical trials are being rolled out to target fusion-positive cancers, hopefully leading to further improvements in treatment options in the near future.
SOPHiA DDM™ RNAtarget Technology came about from our users' feedback on the need to have an application that allows them to detect novel fusions without sacrificing sensitivity in smaller biopsy samples. The work on this Technology started more than a year ago and has involved many feasibility and optimization studies to ensure that we're not just providing a regular solution, but a product that really helps users achieve more.
With this product, we wanted to give users high-performance fusion detection that can be run on a very small amount of material, a streamlined (but robust) workflow, and the ability not only to detect fusions but also to extract as much information as possible from that small sample through detection of SNVs and expression changes. For convenience, the gene content can be customized to fit the lab's individual needs, the workflow can be automated to reduce resource constraints, and, finally, the product runs on the industry-leading SOPHiA DDM™ Platform, providing convenient visualization, annotation, and reporting of the results.
SNV detection in RNA is an intriguing area of development. There are several applications where SNV detection is beneficial, including the ability to run an RNA-only workflow in cases where genes of interest are sufficiently highly expressed. SNV detection in RNA also opens up the possibility of running RNA and DNA workflow sequentially, where the initial RNA workflow will likely detect the majority of relevant variants, leaving only a subset of samples that need to be also processed through the DNA workflow.
Another benefit of detecting SNVs in RNA is the ability to use them as an additional data point in calling SNVs in DNA or using RNA-based SNVs as a backup in case of issues with DNA.
Finally, having information on SNV VF% in RNA is like adding a dimension to the molecular profile created by DNA SNVs, as it provides more dynamic, rich, and potentially more insightful information on the state of the tumor.
Overall, there are many novel and unique ways this feature of SOPHiA DDM™ RNAtarget Technology could be used, and I'm excited to see how our users will utilize it.
One of the primary applications for the product is, of course, lung cancer, because of the limited amount of biopsy material that is generally available and the high number of clinically relevant fusions in this pathology. At the same time, the solution can be deployed to test any solid tumor, and we're looking at the possibility of running it in blood tumors as well. Moreover, because the gene content is entirely customizable, users can tailor it to the needs of their labs, clinical research, or the clinical trials that they want to be part of. This makes the application very versatile while removing the obstacle of manually optimizing the pipeline's performance, because the SOPHiA GENETICS BioIT team will take care of that. Given the proliferation of tumor-agnostic biomarkers and, in particular, tumor-agnostic fusions, the applicability of this Technology is only going to grow in the future, and fusion detection will become an integral part of the genomic profiling of any cancer, together with SNVs and CNVs.
It's really the combination of three main features that makes for a cohesive and well-rounded product that offers a lot of value from several perspectives: novel fusion detection, high sensitivity at low input amounts, and customizability of the panel. These features provide an excellent foundation for a future-proof, high-performance solution. If you look at the market, there are very functional solutions that offer one or two of these features, but not all three.
In addition to those three features, we included other functionality that I refer to as "two data points for every variant type," where 5'-3' imbalance serves as an additional data point for calling fusions, SNV detection in RNA can be used together with SNV calls in DNA, and expression changes provide further details and reassurance for CNVs.
Finally, the streamlined protocol that can also be automated further refines the convenience factor of this solution.
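To make the 5'-3' imbalance idea more concrete: when a gene is the 3' partner in a fusion, the exons downstream of the breakpoint are typically expressed under the partner gene's promoter, so their coverage can differ sharply from that of the upstream exons. The sketch below compares mean coverage on either side of a candidate split point. It is an illustration only, not the product's method: the function name, the `split_index` and `pseudocount` parameters, and the coverage values are all hypothetical.

```python
from statistics import mean


def three_to_five_prime_ratio(exon_coverage, split_index, pseudocount=1.0):
    """Ratio of mean 3' exon coverage to mean 5' exon coverage.

    exon_coverage: per-exon coverage ordered from 5' to 3'.
    split_index:   first exon counted as 3' (hypothetical candidate breakpoint).
    pseudocount:   avoids division by zero when the 5' exons are not expressed.
    """
    five_prime = exon_coverage[:split_index]
    three_prime = exon_coverage[split_index:]
    if not five_prime or not three_prime:
        raise ValueError("split_index must leave exons on both sides")
    return (mean(three_prime) + pseudocount) / (mean(five_prime) + pseudocount)


# Made-up coverage values for a six-exon gene.
balanced = three_to_five_prime_ratio([40, 42, 38, 41, 39, 40], split_index=3)
imbalanced = three_to_five_prime_ratio([5, 4, 6, 200, 210, 190], split_index=3)
print(balanced, imbalanced)
```

A ratio close to 1 suggests balanced expression along the gene, while a much larger ratio is the kind of signal that could support a fusion call made from split or spanning reads. Any decision threshold would need to be calibrated on real data.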
"How" is straightforward: SOPHiA DDM™ RNAtarget Technology utilizes a hybrid-capture approach, which targets the key clinically relevant kinases. This protocol is augmented by a careful probe design process to make sure the panel performs at the highest level in the hands of our users.
On the other hand, it is worth highlighting that the detection of novel (or partner-agnostic) fusions is increasingly requested by labs striving to provide the highest level of care. This is underpinned by the higher inherent sensitivity, which is becoming more important as more fusion-targeting therapies are approved in a partner-agnostic manner.
I would say it comes down to the challenges of needing to have high sensitivity in low sample input, the ability to detect novel fusions, and having a tailored solution that perfectly fits the needs of the lab, plus addressing the need for a convenient yet powerful visualization and interpretation platform – the SOPHiA DDM™ Platform.
Inhibition of poly(ADP-ribose) polymerase (PARP) activity induces synthetic lethality in BRCA-mutated tumors by selectively targeting tumor cells that fail to repair DNA double-strand breaks.1 The term ‘BRCAness’ has been coined to describe tumors that are BRCA wild-type but are still sensitive to treatment with PARP inhibitors (PARPi). Like BRCA-mutated tumors, BRCAness tumors have homologous recombination deficiency (HRD), but it arises from mutations in other genes involved in the homologous recombination repair (HRR) pathway (for example, RAD51, CHEK1, CHEK2, BRIP1, ATM). Detecting aberrations in HRR-relevant genes, not only BRCA1 and BRCA2, is important to optimize the use of PARPi.1
In a single assay, the SOPHiA DDM™ HRD Solution combines identification of mutations in 28 HRR genes with a measure of genomic integrity called the Genomic Integrity Index. Powered by deep learning algorithms, the Genomic Integrity Index reveals the extent of genomic scarring as a result of HRR gene mutations across the entire genome.
To gain a better understanding of how SOPHiA GENETICS™ developed the HRD technology, we sat down with Dr. Christian Pozzorini, the Technical Product Manager for the SOPHiA DDM™ HRD Solution and the Director of Biostatistics Research at SOPHiA GENETICS™.
After studying life sciences and technology with a focus on genetics and molecular biology during my undergraduate years, I went on to do a PhD in computational neuroscience, a field closely related to machine learning. I have worked at SOPHiA GENETICS™ for nearly 6 years and currently lead the Biostatistics Research team. The team has extensive expertise in statistical techniques and machine learning. We investigate different methods by which to extract the maximum value out of next generation sequencing (NGS) data, to best support clinical researchers. Two years ago, we recognized HRD and genomic scarring as a key topic in which machine learning technologies could be beneficial to exploit information hidden in NGS data.
HRD is a hot topic in cancer research and is particularly relevant in clinical practice since novel therapies, such as PARPi, have been developed to target HRD tumors. These drugs were initially prescribed to patients with BRCA-deficient tumors, but it is now clear that some BRCA wild-type patients can benefit from the same treatments. The drugs were indeed conceived to target tumors in which the HRR pathway is deficient, and BRCA mutations are only one subset of mutations leading to HRD. HRD testing is particularly relevant in ovarian cancer patients where HRD is frequent, and where HRD testing has been associated with clinical response to PARPi. Additionally, research has shown that other cancer types may have HRD and so HRD testing may also become standard practice when determining treatment options for these cancers.
Detecting HRD is complicated by the fact that many different genomic mutations may lead to HRD. Even when a mutation in a known HRR gene is identified, it is difficult to predict its impact on HRD status. For example, does a particular mutation in PTEN impair the HRR pathway?
To overcome this challenge, the scientific community proposed a different, yet complementary, approach. Instead of detecting HRD by looking for genomic aberrations known to cause HRD, one looks for genomic aberrations (or genomic scars) that occur as a result of HRD. The challenge of genomic scarring is that it does not happen in specific locations of the human genome. Scars can potentially occur anywhere in the genome, and so interpreting genomic scarring requires NGS data that covers the entire genome.
The most promising genomic scarring approaches rely on high coverage whole genome sequencing (WGS) data and often require sequencing both the tumor sample and a normal tissue-matched control. While these approaches are effective, they are not suitable for routine testing in the clinical setting. The cost of high coverage WGS is still prohibitive and often sequencing paired tumor and control tissues is not feasible.
We hypothesized that WGS performed at low coverage (<1x) could contain sufficient information to detect HRD in samples. To test this hypothesis, we worked on publicly available datasets. These datasets contained high coverage matched tumor-normal WGS, for which the HRD status was established via genomic scars. We conceived a machine learning approach aimed at predicting HRD status by only looking at 1% of the original data in the tumor-only samples, still sequenced genome-wide but at a coverage of 1x or less. Surprisingly, we found that 1% of the data was sufficient to achieve excellent performance in predicting the HRD status that was previously established using 100% of the data.
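The idea of keeping only 1% of the data can be illustrated with a small simulation. The source does not describe how the subsampling was actually performed, so the sketch below simply assumes that each read is kept independently with a fixed probability and works on made-up per-bin read counts. The function name `thin_counts` and all numbers are hypothetical.

```python
import random


def thin_counts(bin_counts, keep_fraction=0.01, seed=0):
    """Simulate low-coverage data by keeping each read with probability keep_fraction.

    bin_counts: reads per genomic bin in the high-coverage data (made-up values here).
    Returns the thinned read count for each bin.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [
        sum(1 for _ in range(n) if rng.random() < keep_fraction)
        for n in bin_counts
    ]


# 20 bins at normal depth followed by 10 bins at half depth (a large deletion).
high_coverage = [10_000] * 20 + [5_000] * 10
low_coverage = thin_counts(high_coverage)

normal_mean = sum(low_coverage[:20]) / 20
deleted_mean = sum(low_coverage[20:]) / 10
print(sum(low_coverage), normal_mean, deleted_mean)
```

Individual bins become noisy after thinning, but a change that spans many bins still shows up in the averaged counts. This is one intuition for why large-scale genomic scars may remain detectable at very low coverage.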
In collaboration with the SOPHiA GENETICS™ genomic research laboratory, we explored the possibility of extending our library preparation and sequencing pipeline to generate a single workflow that includes: 1) the generation of low coverage WGS data for the detection of genomic scars, and 2) high coverage data for the targeted detection of mutations in HRR genes.
SOPHiA GENETICS’™ standard solutions rely on a targeted sequencing approach whereby an NGS-based WGS library is prepared, enriched, and sequenced to obtain high coverage NGS data in specific genomic regions of interest where variant calling will be performed. One might say that the technology already existed. The only modification was to load the sequencer with both the enriched library and a small amount of the WGS library (before enrichment). Thus, the SOPHiA DDM™ HRD Solution can simultaneously, and economically, generate WGS and targeted sequencing data to detect HRD via a genomic scarring approach (machine learning on WGS data), as well as via a traditional approach (variant calling on targeted data).
Variant calling in HRR genes cannot entirely solve the HRD challenge. The Genomic Integrity Index, computed using our deep learning (a type of machine learning) approach, can identify tumors that are HRD positive despite being BRCA wild-type. Our solution thus allows for the identification of additional HRD-positive cancers that can benefit from PARPi.
Deep learning is particularly efficient in solving image classification tasks. A classical textbook example is: given an image, tell me whether it shows a cat or a dog. Convolutional neural networks solve this problem by learning from data which, in this example, is a dataset of images showing cats or dogs. During training, the network learns the image features that are best suited to distinguish dogs from cats. After the parameters of the neural network have been optimized, the network can take an image that it has not seen before and determine whether it shows a cat or a dog.
Our technology works in the same way. The only difference is that the image used as input shows the coverage profile obtained from low coverage WGS data. The task of the convolutional neural network is to establish if the image comes from an HRD-positive or HRD-negative tumor sample.
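To give a feel for what a convolutional network computes on a coverage profile, the sketch below runs a single convolutional layer, a ReLU, global average pooling, and a sigmoid over a one-dimensional profile. It is a toy forward pass and not the model behind the Genomic Integrity Index: the kernels, weights, and bias are hand-picked assumptions (a real network would learn them from labeled samples), and the profile is treated as a 1D signal rather than an image for brevity.

```python
import math


def conv1d(signal, kernel):
    """Valid 1D cross-correlation, as used in convolutional layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def predict_score(profile, kernels, weights, bias):
    """Conv -> ReLU -> global average pooling -> linear -> sigmoid."""
    features = []
    for kernel in kernels:
        activated = relu(conv1d(profile, kernel))
        features.append(sum(activated) / len(activated))
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hand-picked (not learned) kernels that respond to upward and downward steps.
KERNELS = [[-0.5, -0.5, 0.5, 0.5], [0.5, 0.5, -0.5, -0.5]]
WEIGHTS = [10.0, 10.0]  # hypothetical
BIAS = -1.5             # hypothetical

flat_profile = [1.0] * 16
scarred_profile = [1.0] * 4 + [1.5] * 4 + [0.5] * 4 + [1.0] * 4

flat_score = predict_score(flat_profile, KERNELS, WEIGHTS, BIAS)
scarred_score = predict_score(scarred_profile, KERNELS, WEIGHTS, BIAS)
print(flat_score, scarred_score)
```

With these hand-set parameters, a profile with several copy-number steps scores higher than a flat one. In a trained network, the kernels and weights would instead be optimized so that the score separates HRD-positive from HRD-negative samples.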
After training and testing our deep learning algorithm on public data, we started working on clinical FFPE samples from ovarian cancer patients. This work was done in collaboration with some prestigious labs, including Diagnosticos da America SA (DASA) in Brazil. These samples were simultaneously tested with our HRD solution and gold standard methods used for HRD testing. The excellent concordance observed in these studies confirmed the validity of our solution.
Using a single workflow, our solution allows for the simultaneous generation of targeted sequencing and WGS data. Our deep learning algorithm allows us to make accurate predictions on the HRD status of a tumor sample without requiring a matched normal sample, making the approach more feasible. Given the power of our deep learning algorithm, only a limited amount of WGS data is required. This allows for multiplexing up to 24 samples in a single Illumina NextSeq® 550 run, making the solution economically viable for routine testing.
The ESMO guidelines recommend the incorporation of HRD testing in addition to BRCA testing for women with newly diagnosed advanced ovarian cancer.2 Approved solutions for HRD tests fail to reliably identify patients who will not respond to PARPi. This is compounded by the fact that current solutions for HRD detection are expensive and sometimes infeasible (e.g. require paired tumor-normal samples and use of send-out services). Clinical researchers are thus looking for a cost-effective and reliable test to determine HRD status in ovarian cancers and identify those that could benefit from PARPi in the first-line or maintenance therapies. That is exactly what the SOPHiA DDM™ HRD Solution provides – it allows clinical researchers to independently conduct an accurate and comprehensive 2-in-1 HRD test in-house, in a cost-effective and time-saving manner.
SOPHiA GENETICS products are for Research Use Only and not for use in diagnostic procedures unless specified otherwise.
SOPHiA DDM™ Dx Hereditary Cancer Solution, SOPHiA DDM™ Dx RNAtarget Oncology Solution and SOPHiA DDM™ Dx Homologous Recombination Deficiency Solution are available as CE-IVD products for In Vitro Diagnostic Use in the European Economic Area (EEA), the United Kingdom and Switzerland. SOPHiA DDM™ Dx Myeloid Solution and SOPHiA DDM™ Dx Solid Tumor Solution are available as CE-IVD products for In Vitro Diagnostic Use in the EEA, the United Kingdom, Switzerland, and Israel. Information about products that may or may not be available in different countries and if applicable, may or may not have received approval or market clearance by a governmental regulatory body for different indications for use. Please contact us at [email protected] to obtain the appropriate product information for your country of residence.
All third-party trademarks listed by SOPHiA GENETICS remain the property of their respective owners. Unless specifically identified as such, SOPHiA GENETICS’ use of third-party trademarks does not indicate any relationship, sponsorship, or endorsement between SOPHiA GENETICS and the owners of these trademarks. Any references by SOPHiA GENETICS to third-party trademarks is to identify the corresponding third-party goods and/or services and shall be considered nominative fair use under the trademark law.