Tell us a bit about your role.
I am a Senior Clinical Genomic Variant Analyst at the Broad Clinical Labs – a CLIA/CAP certified lab at Broad Institute of MIT and Harvard. Broad Clinical Labs offers fee-for-service clinical genomic testing, and I am part of the clinical interpretation team for whole genome sequencing. Clinical testing can be ordered from our laboratory by clinicians; our lab performs whole genome sequencing and provides technical genome data and an interpretative report. Our team’s goal is to review the genome data and find reportable variant(s) to share with clinicians.
What type of cases do you work with?
Some of the testing we perform is panel based, but mostly we work with whole genome sequencing data. Because we do not bill insurance, we create relationships with individual institutions or organizations and serve as their sequencing and interpretation partner. Currently, some of the projects we are partnering on include cases focused on rare disease, a CICU cohort, and neurodevelopmental disorders. We use Alamut™ Visual Plus as an add-on tool to our existing variant interpretation workflow when we need additional information about an identified variant.
What are the biggest benefits of Alamut™ Visual Plus that help you overcome your variant interpretation challenges?
Seeing and understanding the genomic context that a variant exists within can be critical for interpretation. Alamut™ Visual Plus makes it easy to see the context at a glance, providing link-outs to other databases for even more detailed information.
Alamut™ Visual Plus is especially helpful when I’m trying to assess the potential impact of splicing. Alamut™ Visual Plus provides calculated splice predictor scores, but I also want to see exactly where in the gene the variant falls in relation to the exon and to visualize the potential impact of exon skipping and whether it’s likely to result in an in-frame or out-of-frame change. Alamut™ Visual Plus makes this easy to visualize.
Can you share an example where this made a difference?
I was recently assessing a splice variant in a gene known to cause a disease that fit with the given phenotype. The variant was at the +3 position, 3 bp into the intron from the exon boundary, but this alone wasn’t sufficient to make it reportable. After visualizing the splicing impact of the variant within Alamut™ Visual Plus to confirm the potential for out-of-frame exon skipping, I felt more confident to report the variant as a VUS of interest.
Do you have another example where Alamut™ Visual Plus helped you assess a variant?
In another recent case, I had found what looked to be a potentially relevant variant. This variant had been reported using different historical nomenclature in several publications. I came across one paper with a series of cases and variants that included a variant at both the historic nomenclature position and the current nomenclature position. Alamut™ Visual Plus let me easily view and scan the context of the amino acid sequence across different transcripts to appropriately anchor myself to the right reference. This allowed me to confirm I was using correct variant information from this publication in my interpretation.
We’d like to thank Katherine Lafferty for her time and for sharing her experience. We look forward to continuing our conversations with Broad Clinical Labs!
The HGVS nomenclature guidelines are used worldwide for genetic variant interpretation but can seem complicated and difficult to understand and apply. That is why we have created this beginner’s guide to mutation nomenclature using the HGVS recommendations, with clear visual examples that break down the process into bitesize pieces.
1. What is HGVS nomenclature?
2. How to read mutation nomenclature: Breaking down the variant description
2.1 Reference sequence e.g., NM
2.2 Description of variant e.g., c.4375C>T
2.3 Predicted consequence e.g., p.(Arg1459*)
3. The 3 prime rule for mutation
4. Final thoughts and helpful tool
The Human Genome Variation Society (HGVS) nomenclature standard was developed to prevent the misinterpretation of variants in DNA, RNA, and protein sequences. The HGVS nomenclature standard is used worldwide, especially in clinical diagnostics, and is authorized by the Human Genome Organisation (HUGO).1,2
HGVS General Terminology Recommendations1
X Do not use | ✔️ Recommended terminology |
Mutation or polymorphism | Variant, change, allelic variant (for cancer tissue, "mutation load" and "tumor mutation burden" can be used) |
Pathogenic | Affects function, disease-associated, phenotype-associated |
The HGVS recommendations follow recognized standards for the nomenclature of DNA and RNA nucleotides, the genetic code, amino acid descriptions, and cytogenetic band position in chromosomes.3
The HGVS recommendations for mutation nomenclature state that the format of a complete variant description should first include the reference sequence, followed by the variant description, and then the predicted consequence in parentheses. For example, NM_004006.2:c.4375C>T p.(Arg1459*) (Figure 1).
The HGVS nomenclature recommendations for sequence variants state that a complete variant description should begin with the reference sequence.1 The reference sequence accession number begins with a two- or three-letter abbreviation (explained in Table 1), followed by an underscore and a multi-digit number, and finally a version number after a period.
Table 1. Meaning of the two-letter abbreviation at the beginning of a reference sequence accession number.
Abbreviation | Reference sequence based on a: |
NC | Chromosome |
NG | Gene or genomic region |
LRG | Locus Reference Genomic sequence: Gene or genomic region, used in a diagnostic setting |
NM | Protein-coding RNA (mRNA) |
NR | Non-protein-coding RNA |
NP | Protein (amino acid) sequence |
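As a rough illustration of the accession structure described above, the following Python sketch splits an accession into its prefix, number, and version and looks up the prefix meaning from Table 1. The function name `parse_accession` and the regular expression are assumptions made for this example, not part of the HGVS recommendations, and the pattern is deliberately simplified (for instance, it does not handle LRG transcript suffixes).

```python
import re

# Prefix meanings copied from Table 1.
PREFIX_MEANING = {
    "NC": "chromosome",
    "NG": "gene or genomic region",
    "LRG": "Locus Reference Genomic sequence",
    "NM": "protein-coding RNA (mRNA)",
    "NR": "non-protein-coding RNA",
    "NP": "protein (amino acid) sequence",
}

# Simplified pattern (an assumption for this sketch):
# prefix, underscore, digits, then an optional ".version".
ACCESSION_RE = re.compile(r"^(NC|NG|LRG|NM|NR|NP)_(\d+)(?:\.(\d+))?$")


def parse_accession(accession):
    """Split an accession such as 'NM_004006.2' into its parts."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"unrecognised accession: {accession!r}")
    prefix, number, version = match.groups()
    return {
        "prefix": prefix,
        "meaning": PREFIX_MEANING[prefix],
        "number": number,
        "version": int(version) if version is not None else None,
    }


if __name__ == "__main__":
    print(parse_accession("NM_004006.2"))
```

Keeping the version number as a separate field makes it easy to notice when two descriptions of the same variant refer to different versions of a reference sequence.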
The variant description begins by depicting the type of reference sequence used (c = coding DNA sequence, g = genomic reference sequence). When a protein-coding reference sequence is used (c), the nucleotide numbering begins with a 1, which represents the first position in the protein-coding region (the A of the translation-initiating ATG), and ends at the last position of the stop codon. Thus, if you divide the position number by 3 and round up to the next whole number, you can identify the affected amino acid in the protein sequence e.g., using the same example as above, 4375/3 ≈ 1458.3, which rounds up to 1459, indicating that the predicted consequence affects amino acid 1459, which is an arginine. Different variants are indicated using different notations (explained in Table 2).
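The position-to-codon arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes a plain exonic c. position (no intronic offsets such as +3, and no 5' or 3' UTR positions).

```python
def codon_number(c_position):
    """Return the amino acid (codon) number for a coding DNA position.

    Positions 1-3 belong to codon 1, positions 4-6 to codon 2, and so on,
    which is the same as dividing by 3 and rounding up.
    """
    if c_position < 1:
        raise ValueError("c. positions in the coding region start at 1")
    return (c_position + 2) // 3


def position_in_codon(c_position):
    """Return 1, 2 or 3: which base of its codon the position falls on."""
    if c_position < 1:
        raise ValueError("c. positions in the coding region start at 1")
    return (c_position - 1) % 3 + 1


if __name__ == "__main__":
    print(codon_number(4375), position_in_codon(4375))
```

For c.4375 this gives codon 1459 and the first base of that codon, which fits the example used in this guide: a C>T change at the first base of an arginine codon such as CGA produces the stop codon TGA, written p.(Arg1459*).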
Table 2. HGVS notation and examples for the most common types of mutations2
Notation | Example | Explanation |
> | c.4375C>T | Substitution of the C nucleotide at position c.4375 with a T |
del | c.4375_4379del or c.4375_4379delCGATT | Nucleotides from position c.4375 to c.4379 deleted |
dup | c.4375_4385dup or c.4375_4385dupCGATTATTCCA | Nucleotides from position c.4375 to c.4385 duplicated |
ins | c.4375_4376insACCT | ACCT inserted between positions c.4375 and c.4376 |
delins | c.4375_4376delinsACTT or c.4375_4376delCGinsACTT | Nucleotides from position c.4375 to c.4376 (CG) are deleted and replaced by ACTT |
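The notations in Table 2 can be recognized mechanically. The Python sketch below is a simplified illustration rather than a full HGVS parser: the function name `classify_variant` and the regular expressions are assumptions for this example, and they only cover plain exonic c. positions like those in the table.

```python
import re

# Order matters: "delins" must be tested before "del" and "ins".
PATTERNS = [
    ("delins", re.compile(r"^c\.\d+(?:_\d+)?del[ACGT]*ins[ACGT]+$")),
    ("substitution", re.compile(r"^c\.\d+[ACGT]>[ACGT]$")),
    ("deletion", re.compile(r"^c\.\d+(?:_\d+)?del[ACGT]*$")),
    ("duplication", re.compile(r"^c\.\d+(?:_\d+)?dup[ACGT]*$")),
    ("insertion", re.compile(r"^c\.\d+_\d+ins[ACGT]+$")),
]


def classify_variant(description):
    """Return the variant type for a simple coding DNA description."""
    for name, pattern in PATTERNS:
        if pattern.match(description):
            return name
    return "unknown"


if __name__ == "__main__":
    for example in ("c.4375C>T", "c.4375_4379del", "c.4375_4376insACCT"):
        print(example, "->", classify_variant(example))
```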
When only DNA has been analyzed, the RNA- and protein-level consequences of the variant can only be predicted, and should thus be reported in parentheses e.g., p.(Arg1459*) is the predicted effect at protein level (p) for the example described above.
For all variant descriptions using HGVS nomenclature, the nucleotide at the most 3’ position of the variation in the reference sequence is arbitrarily assigned to have changed (see how to apply this rule in Figure 2).4 The exception is for deletions/duplications around exon junctions for which shifting the variant 3’ would place it in the next exon.5
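To make the 3' rule concrete, here is a minimal Python sketch that shifts a deletion to its most 3' equivalent position in a reference sequence. The function name and the toy sequences are hypothetical, and the sketch does not implement the exon-junction exception mentioned above.

```python
def shift_deletion_3prime(sequence, start, end):
    """Shift a deletion to its most 3' equivalent position.

    `start` and `end` are 1-based, inclusive positions of the deleted
    bases in `sequence`. Returns the shifted (start, end) pair.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("deletion coordinates fall outside the sequence")
    s, e = start - 1, end - 1  # 0-based indices
    # Moving the deleted block one base to the right leaves the resulting
    # sequence unchanged exactly when the first deleted base equals the
    # base immediately after the block.
    while e + 1 < len(sequence) and sequence[s] == sequence[e + 1]:
        s += 1
        e += 1
    return s + 1, e + 1


if __name__ == "__main__":
    # Deleting any single T in the run of Ts gives the same result, so the
    # deletion is described at the last T (position 6).
    print(shift_deletion_3prime("GATTTTC", 3, 3))
```

The same loop works for multi-base deletions in repeats: in the toy sequence ACGCGCGT, a CG deletion given at positions 2 to 3 is shifted to positions 6 to 7.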
Although the HGVS recommendations can be difficult to understand and might take a bit of getting used to, if you break them down and refer to the examples in this guide, you are on the road to success!
If you want to accelerate your variant annotation and interpretation, Alamut™ Visual Plus is a comprehensive, full genome browser for efficient and user-friendly variant interpretation. The software accelerates the complex and time-consuming assessment of variants thanks to its user-friendly interface and integrated features for variant annotation and analysis.
Find out how Alamut™ Visual Plus applies the HGVS nomenclature recommendations to ensure that variant annotation follows the universally applied standards for variant analysis, interpretation, and reporting in our dedicated Technical Note.
Alamut™️ Visual Plus is for Research Use Only. Not for use in diagnostic procedures.
Dr. Mohamed Z Alimohamed kindly summarized his upcoming peer-reviewed publication in Gene:
“Current splice prediction algorithms have limited sensitivity and specificity, therefore many potential splice variants are classified as variants of uncertain significance (VUSs).
However, functional assessment of VUSs to test splicing is labor-intensive and time-consuming. We have developed a decision tree, SEPT-GD, by setting thresholds for the splice prediction programs implemented in Alamut™️ to prioritize potential splice variants associated with cardiomyopathies for functional studies, and functionally verified the outcome of the decision tree.
SEPT-GD outperforms the tools commonly used for RNA splicing prediction and improves prioritization of variants in cardiomyopathy genes for functional splicing analysis.”
Alimohamed MZ, Boven LG, van Dijk KK, Vos YJ, Hoedemaekers YM, van der Zwaag PA, Sijmons RH, Jongbloed JDH, Sikkema-Raddatz B, Westers H. SEPT-GD: A decision tree to prioritise potential RNA splice variants in cardiomyopathy genes for functional splicing assays in diagnostics. Gene. 2023 Jan 30;851:146984.
Alamut™️ is for Research Use Only. Not for use in diagnostic procedures.
Our Technical Note outlines the guidelines and standards behind the nomenclature convention deployed in Alamut™ Visual Plus. We also explain how Alamut™ Visual Plus applies them to ensure that variant annotation follows the universally applied standards for variant analysis.
The advent of whole-genome sequencing (WGS) ushered in a whole new world for biological research. Since the Human Genome Project was (partially) completed 20 years ago, the cost of WGS has plummeted, now sitting at $1,000 or less (vs more than $1,000,000 at its height). However, sequencing a whole human genome is data-heavy: a single sequencing run can take from 100 GB to more than 1 TB of hard drive space, generating a vast amount of data to analyze. Enter the exome.
Whole exome sequencing (WES) sequences all the protein-coding regions (exons) of the genome but avoids the non-coding introns. Coming into vogue in the 2010s, the technique has increased exponentially in popularity in the past decade.1 There are several advantages to WES over other technologies. The exome is only about 1.5-3% of the genome, so focusing on the exome rather than the genome substantially reduces the size and cost of a sequencing run. WES runs come in at a scant few gigabytes of hard drive space, and while the cost is not 1.5-3% of WGS, WES generally costs around a quarter of a full genome.2 Exome sequencing is highly efficient, with diagnostic yields ranging from 20-40% in the clinic,3 compared with yields in the single digits for comparative genomic hybridization arrays (aCGH). Though specific figures vary from study to study, exome sequencing tends to beat microarray-based sequencing methods and targeted next-generation sequencing (NGS) gene panels by a factor of 3-4x in terms of diagnostic yield.4 When using NGS techniques to find pathogenic variants, exome sequencing shines, since 89% of variants reported as pathogenic in NCBI’s ClinVar come from protein-coding regions (at least in Mendelian disorders). That number skyrockets to 99% when neighboring regions are included.5
Starting with exome sequencing can shorten diagnostic time and increase diagnostic yield.
In a diagnostic situation, exome sequencing offers many advantages over targeted panels. A targeted panel is often selected from a phenotype-first point of view, where a patient’s clinical presentation guides the selection of a specific set of genes to examine. This approach can, however, be unsuccessful, resulting in either needing a whole new panel or reflexing to exome sequencing. Exome sequencing is the preferred method of diagnosis in several common scenarios: when patients have generalizable symptoms that could have arisen from a variety of underlying conditions; when a disease is genetically heterogeneous or has substantial phenotypic variability (like hydrops fetalis); or when a rare disease has a genetic cause that simply lies outside of the known genes included in targeted panels.6,7 What’s more, starting with exome sequencing can shorten the time to diagnosis, increasing diagnostic yield by more than 40% in one clinic’s hands. Shortening the time from presentation to diagnosis can radically reduce clinical costs and improve patient outcomes by quickly pinpointing treatment and care options.
Peculiarities in sequence features can be a problem for any kind of exome sequencing analysis. Disproportionately high GC content and disproportionately high AT content can both decrease the accuracy of exome sequencing.8 Why this should be the case is not always clear, though blame commonly falls on PCR or polymerase-based issues (due to the higher melting temperature of GC-rich regions, for instance). Careful probe design can help avoid this problem. It can also help attenuate the challenge of capturing sequences located within highly ordered, difficult structures, particularly with approaches that use overlapping probes rather than end-to-end or even gapped probes.9
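As a small illustration of what "disproportionately high GC or AT content" means in practice, the Python sketch below computes the GC fraction of a sequence and flags windows with extreme composition. The window size and the 25%/75% cut-offs are arbitrary values chosen for the example, not thresholds taken from the cited studies.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in sequence) / len(sequence)


def extreme_gc_windows(sequence, window=100, low=0.25, high=0.75):
    """Flag non-overlapping windows whose GC fraction is very low or high.

    Returns a list of (0-based window start, GC fraction) tuples.
    The default window size and cut-offs are illustrative assumptions.
    """
    flagged = []
    for start in range(0, len(sequence) - window + 1, window):
        fraction = gc_fraction(sequence[start:start + window])
        if fraction < low or fraction > high:
            flagged.append((start, fraction))
    return flagged


if __name__ == "__main__":
    print(extreme_gc_windows("GGGGAAAAGCAT", window=4))
```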
Copy number variations (CNVs), mutations that duplicate or delete segments of DNA, arise in a variety of diseases. Until recently, WES was not the preferred method for CNV detection, since chromosomal microarrays performed better (with a yield of around 20% for chromosomal aberrations reported in 2010).10 WES datasets have, in the past, lacked both consistency between methods and the high-quality reference material needed for some clinical applications. However, recent work proposed changing the order of the workflow and incorporating sequencing data from outside WES databases,11 making WES a powerful technique for detecting pathogenic CNVs.
For a wide variety of rare diseases such as metabolic, neurological, and developmental disorders, the precise causative genetic mutation has often been seen only once: 69.6% of cases in Orphanet (a rare disease database) have been documented only once.12 Many rare diseases can arise from a variety of contributing genes along a signaling pathway. In these situations, past efforts like those using traditional Sanger sequencing can often fail to pick up a causative variant, but there is a litany of examples of exome sequencing succeeding in their stead.13,14
Although exome data requires heavy processing before analysis, the researcher does not necessarily require a bioinformatics skill set. Much of the processing, like trimming low-quality bases at the end of reads or detecting and deleting adapters, can be achieved automatically using specialized analytic software.
Determining which variants are pathogenic and which are not can be a difficult task. The drawback, of course, of WES versus a targeted gene panel is that there is far more data to deal with. WES runs commonly find tens of thousands of variants and, in extreme cases, hundreds of thousands.15,16 This can add an order of magnitude more noise to a dataset. Applying a few key rules and variant filtering strategies can reduce the number of variants by 90-95% so that researchers might then be faced with 150-500 candidate mutations, a much more manageable number.16 Further investigation requires consulting scientific databases and repositories to determine whether the detected exonic variants are pathogenic, benign, or variants of uncertain significance (VUS). Unfortunately, there is no single, all-encompassing database that contains all the information needed to interpret variants. That means that, unless researchers have access to software that taps into tens of different databases, it will take substantial time to check each database manually, one by one. In addition, despite the vast amount of information available, a significant proportion of detected variants are still VUS, highlighting the ever-growing need for more research in the field.
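The idea of rule-based variant filtering can be sketched in Python as follows. Everything in this example is an assumption made for illustration: the `Variant` fields, the consequence labels, and the cut-offs (1% population allele frequency, minimum read depth of 20) are hypothetical and are not the specific rules referred to above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str      # e.g. "missense", "synonymous", "splice_site"
    population_af: float  # allele frequency in a reference population
    read_depth: int       # sequencing depth at the variant position


# Illustrative set of consequence labels to keep; not an official list.
KEPT_CONSEQUENCES = {"missense", "nonsense", "frameshift", "splice_site"}


def filter_candidates(variants, max_af=0.01, min_depth=20, gene_panel=None):
    """Keep rare, well-covered variants with a potentially damaging effect.

    If `gene_panel` is given, only variants in those genes are kept.
    """
    kept = []
    for variant in variants:
        if variant.read_depth < min_depth:
            continue  # poorly covered call
        if variant.population_af > max_af:
            continue  # too common in the reference population
        if variant.consequence not in KEPT_CONSEQUENCES:
            continue  # e.g. synonymous change
        if gene_panel is not None and variant.gene not in gene_panel:
            continue  # outside the genes of interest
        kept.append(variant)
    return kept


if __name__ == "__main__":
    calls = [
        Variant("GENE_A", "missense", 0.0001, 45),
        Variant("GENE_B", "synonymous", 0.0001, 60),
        Variant("GENE_C", "nonsense", 0.2, 50),
    ]
    print(filter_candidates(calls))
```

Each rule removes a different class of unlikely candidates, so stacking a handful of them is what makes large reductions in variant count plausible before manual review.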
The SOPHiA DDM™ Platform for rare and inherited disorders accurately detects a range of variant classes, with high sequence coverage uniformity even in complex and GC-rich regions. Where multiple combinations of NGS technologies can introduce artifacts and inconsistencies, the SOPHiA DDM™ Platform filters the noise and bias to deliver advanced analytical performance independent of the input. This high-quality, noise-filtered output is used to accurately detect CNVs with exon-level resolution. The platform even covers ∼200 variants in non-coding regions and the entire mitochondrial genome to ensure comprehensive exome analysis. No matter the variant type — SNV or CNV — SOPHiA GENETICS’ analytics provide optimized variant detection in a single experiment.
Where pinpointing pathogenic mutations can be difficult, SOPHiA DDM™ complemented by Alamut™ Visual Plus can help cut through the noise and enable a deep exploration of variants. Together, the two technologies annotate variants with information from more than 55 world-renowned biological databases and repositories, including missense and splicing predictors. The SOPHiA DDM™ Platform offers additional filtering features such as Virtual Panels to limit interpretation to genes associated with specific disorders, Cascading Filters to reduce analysis to variants with specific characteristics, and Familial Variant Analysis for consideration of parental samples and inheritance mode. For on-the-ground level accuracy, users benefit from becoming members of the SOPHiA GENETICS Community, where experts flag variant pathogenicity to improve interpretation, even of VUS. Finally, for efficient and user-friendly interpretation, Alamut™ Visual Plus enhances the visualization of variants in a comprehensive full genome browser.
Innovations in sequencing and analysis aren’t slowing, and researchers need tools to keep afloat in the flood of data. Analytical technologies like SOPHiA DDM™ and Alamut™ Visual Plus are ideal parts of a researcher’s arsenal to find pathogenic variants and get clear answers from complex WES datasets.
SOPHiA GENETICS products are for Research Use Only and not for use in diagnostic procedures unless specified otherwise.