Alamut™ Visual Plus (36)
gnomAD data are displayed in the Allele Frequency Databases track. Right-clicking on the variant of interest reveals the variant panel, with details from gnomAD shown in a box within Alamut™ Visual Plus. At the top of the gnomAD box you will see Genome and Exome tabs; the selected tab is highlighted in blue and its information is shown in the box. Clicking on the hyperlinked “gnomAD (vx.x.x)” tab will take you to the gnomAD page for the variant. Note that the text differs depending on the type of data and the filters used, but it always starts with gnomAD.
By default, Alamut™ Visual Plus includes all available variants from gnomAD, irrespective of the filters or the filter cut-off.
There are three filters: “PASS”, RF, and AC0. Each filter has cut-offs defined by gnomAD for both Exome and Genome data. To view only variants that have passed all the quality filters, tick the “PASS only” box on the gnomAD track in Alamut™ Visual Plus, and choose whether to display Exome data, Genome data, or both.
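As an illustration of what a “PASS only” selection means at the data level, the sketch below keeps only VCF records whose FILTER column is exactly PASS. It is not the Alamut™ Visual Plus implementation; the three records are hypothetical and only follow the standard tab-separated VCF column order.

```python
def passes_all_filters(vcf_line: str) -> bool:
    """Return True when the FILTER column (7th field) of a VCF data line is PASS."""
    fields = vcf_line.rstrip("\n").split("\t")
    return fields[6] == "PASS"


# Hypothetical records: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
records = [
    "1\t1000\t.\tA\tG\t50\tPASS\tAC=3;AN=1000",
    "1\t2000\t.\tC\tT\t10\tRF\tAC=1;AN=900",
    "1\t3000\t.\tG\tA\t5\tAC0;RF\tAC=0;AN=800",
]

# Records flagged by RF and/or AC0 are dropped; only the PASS record remains.
pass_only = [r for r in records if passes_all_filters(r)]
```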
gnomAD provides the following recommendation on which version to use: “gnomAD v2 is still our recommended dataset for most coding regions analyses.”
Alamut™ Visual Plus currently uses Genome Aggregation Database (gnomAD) version 2.1. The link to this version of gnomAD is: https://gnomad.broadinstitute.org/.
The new version of gnomAD (version 3.1) was released at the end of 2020 and is not currently available in Alamut™ Visual Plus. gnomAD 3.1 is currently only available for the GRCh38/hg38 genome build and only for whole genome data. We are assessing this dataset but do not have a defined deadline for implementation in Alamut™ Visual Plus.
The gnomAD v2 call-set contains fewer whole genomes than v3.1, but also contains a very large number of exomes that substantially increase its power as a reference in coding regions. Therefore, gnomAD v2 is still our recommended dataset for most coding regions analyses. However, gnomAD v3.1 represents a very large increase in the number of genomes and may be more suitable if your primary interest is in non-coding regions, or if your coding region of interest is poorly captured in the gnomAD exomes. This can be assessed using the coverage plots in the gnomAD browser. Most genomes in v2.1.1 are included in v3.1 and should therefore not be considered independent sample sets.
Another consideration when choosing which gnomAD dataset to use is the ancestry of the samples that you are interested in. gnomAD v3.1 contains a substantially larger number of African American samples than v2 (exomes and genomes) and for the first time provides allele frequencies for the Amish population. gnomAD v3.1 also has a fully genotyped call-set available from the Human Genome Diversity Project and the 1000 Genomes Project, representing >60 distinct populations.
Finally, gnomAD v3.1 was mapped to GRCh38. So, if your data is on this build, it probably makes sense to switch to v3.1. There is also a liftover version of gnomAD v2.1.1 onto GRCh38 available. gnomAD plans to produce a larger GRCh38-aligned exome call set in 2022.
gnomAD links are built from the position, ref, and alt of the variant. In Alamut™ Visual Plus, applying the 3′ rule may change those values, which directly affects whether the URL works.
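The dependency on position, ref, and alt can be sketched as follows. The URL pattern and the `gnomad_r2_1` dataset identifier are assumptions based on the public gnomAD browser, not a description of how Alamut™ Visual Plus builds its links, and the second set of coordinates is hypothetical.

```python
def gnomad_variant_url(chrom, pos, ref, alt, dataset="gnomad_r2_1"):
    """Build a gnomAD browser link from chromosome, position, ref and alt (assumed URL pattern)."""
    variant_id = f"{chrom}-{pos}-{ref}-{alt}"
    return f"https://gnomad.broadinstitute.org/variant/{variant_id}?dataset={dataset}"


# The same event described at two different positions yields two different links,
# and only the one matching gnomAD's own representation will resolve.
url_as_stored = gnomad_variant_url("1", 55516888, "G", "GA")
url_after_shift = gnomad_variant_url("1", 55516890, "A", "AA")  # hypothetical shifted description
```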
We compute Genotype Count based on values from a gnomAD VCF. In some cases, we end up with negative counts due to discrepancies in the initial values provided in the VCF. This is the way gnomAD is handling these variants.
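The sketch below shows one way genotype counts can be derived from VCF values, and how inconsistent inputs lead to a negative count. The formula is an assumption for diploid, autosomal sites using the gnomAD INFO fields AC, AN, and nhomalt; the exact computation used by Alamut™ Visual Plus is not stated here.

```python
def genotype_counts(ac: int, an: int, nhomalt: int) -> dict:
    """Derive genotype counts from allele count (AC), allele number (AN) and
    homozygous-alternate count (nhomalt), assuming diploid genotypes."""
    hom_alt = nhomalt
    het = ac - 2 * nhomalt            # alternate alleles not accounted for by homozygotes
    hom_ref = an // 2 - het - hom_alt  # remaining genotyped individuals
    return {"hom_ref": hom_ref, "het": het, "hom_alt": hom_alt}


consistent = genotype_counts(ac=5, an=200, nhomalt=1)
# Hypothetical discrepancy: 2 homozygotes would need AC >= 4, but AC is 3.
inconsistent = genotype_counts(ac=3, an=200, nhomalt=2)
```

With the consistent values the heterozygote count is 3; with the inconsistent values it becomes -1, which is the kind of negative count described above.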
RefSeq displays GRCh38 as a default genome for all rsID links. On dbSNP variant pages, links for the GRCh37 genome build are usually available.
Why are there sometimes mismatches between genes and transcripts (nucleotides highlighted in red on the transcript track)?
This is due to occasional genome/transcript sequence discrepancies, where the genome reference includes polymorphism minor alleles, but the transcript includes corresponding major alleles. This means that some genomic variants are seen as ‘non-variants’ if analyzed at the transcript level.
Basically, at the positions highlighted in red, the nucleotide of the transcript differs from the nucleotide of the genome build (GRCh37 or GRCh38). For these nucleotides, it is more difficult to definitively determine whether a variant is indeed a variant.
These discrepancies mainly occur in RefSeq transcripts (Beginning with “NM_”), as RefSeq does not correct the transcript to the genome build, while ENSEMBL transcripts (beginning with “ENST”) are corrected to match the nucleotides present in the genome build.
Two different conventions are used for exon naming.
Systematic Exon Numbering starts at 1 and counts and numbers each exon numerically. So, if there are 10 exons, for example, they will be numbered from 1 to 10.
Custom Exon Numbering, by contrast, is the historical numbering that was determined when the gene was first sequenced. The Custom Exon Numbering originally included splicing variants. For example, if there was a splicing difference, you could have an exon numbered 10a in one transcript and 10b in another transcript. With Systematic Exon Numbering, this exon would just be numbered 10 if it was the 10th exon. The Custom Exon Numbering usually comes from the original paper and/or the scientist that determined the sequence. This had previously been supported by NCBI but was discontinued several years ago in favor of Systematic Exon Numbering. However, researchers still use Custom Exon Numbering for genes such as BRCA1 and BRCA2.
In the Alamut™ Visual Plus toolbar, if you click on “Exon Naming” you can change between the two naming conventions. Alternatively, you can set the program to “use systematic exon numbering by default”. To do this, open the “Alamut Visual Plus” menu > “Preferences”, and then select this setting in the “View” tab.
Ensembl and RefSeq transcripts differ in that Ensembl transcripts are mapped onto the reference genome, whereas RefSeq transcripts are mapped onto mRNA sequences. Due to differences between reference genomes and individual mRNAs, some RefSeq mRNAs might not map perfectly to the reference genome, resulting in the possibility of small differences between Ensembl and RefSeq transcripts. Alamut™ Visual Plus uses Splign (a tool developed by RefSeq) to align all transcripts to the genome build.
For more information please see: https://www.ensembl.org/Help/Faq?id=294
Alamut™ Visual Plus includes a splicing module accessible from the variant panel that integrates a number of prediction algorithms. It provides the user with automatically-computed prediction scores.
A brief description of splicing signal prediction can be found on page 63 available here: https://extranet.interactive-biosoftware.com/User%20Guide%20Alamut%20Visual%20Plus%20v1.6.1.pdf
Alamut™ Visual Plus computes splicing scores based on implemented algorithms. The user is responsible for interpreting these scores based on the scientific context and peer-reviewed guidelines.
We can suggest the following paper to help you interpret splicing scores: https://pubmed.ncbi.nlm.nih.gov/22505045/
And the following video from ClinGen: https://clinicalgenome.org/tools/educational-resources/materials/splicing-and-in-silico-splicing-predictors/
The user manual section on splicing (from pg 63) provides an overview of this topic, as well as links to publications and more splicing-related information: https://extranet.interactive-biosoftware.com/User%20Guide%20Alamut%20Visual%20Plus%20v1.6.1.pdf
The following paper also provides a good overview of this topic: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC275472/
The first sentence of the abstract states: “Cryptic splice sites are used only when use of a natural splice site is disrupted by mutation.”
Overall, the literature suggests that both natural and cryptic splice sites are important, but their relevance depends on the context of your investigation.
SHANK3 is quite a problematic gene. Alamut™ Visual Plus mapped the available transcripts to the genome builds, but NM_033517.1 could not be mapped to either build because of mismatches between the transcript and the builds.
We do, however, have a RefSeqGene, NG_008607.2, which is based on the transcript NM_033517.1.
SIFT scores are calculated based on multiple sequence alignments of protein orthologues (SIFT Aligned Sequences). Scores differ between builds 37 and 38, because MSA (Multiple Sequence Alignments) can be different depending on the genome used. MSA are accessible by clicking on the missense prediction tool button in the Variant Panel. PolyPhen-2 (in Alamut™ Visual Plus) does not use MSA, just the human protein sequence and the substitution information.
The following page explains possible reasons for inconsistencies between versions: http://genetics.bwh.harvard.edu/pph2/dokuwiki/faq. Our team is continuously working to harmonize and update the protein sequence database used to build the multiple sequence alignment by PolyPhen-2. This could explain the differences in the scores.
The PolyPhen-2 prediction scores automatically displayed in Alamut™ Visual Plus are extracted from the WHESS database (http://genetics.bwh.harvard.edu/pph2/dbsearch.shtml). The WHESS database contains a pre-computed set of PolyPhen-2 predictions for the Whole Human Exome Sequence Space.
In contrast, the scores obtained on the site using the batch query (http://genetics.bwh.harvard.edu/pph2/bgi.shtml) are generated upon each request.
SIFT scores are calculated based on multiple sequence alignments of protein orthologues (SIFT Aligned Sequences). The scores can differ between genome builds 37 and 38 because MSA (Multiple Sequence Alignments) can differ depending on the genome used. Differences seen between the Alamut™ Visual Plus in-house predictors and the predictor website can be because the MSA differs and/or because the algorithm version differs between the website and the in-house versions.
In Alamut Visual and Alamut™ Visual Plus, the SIFT missense predictors are computed using the orthologues alignment. Differences in SIFT scores can be explained by differences in orthologue alignment. Alamut Visual contains ‘in-house’ orthologues for some genes, whereas Alamut™ Visual Plus contains Ensembl orthologues.
Why are there differences in the species used for conservation between genome builds 37 and 38? This makes the significance of the region look very different. Is there not a specific combination of species that you use as standard? Do these alignments feed into the predictions that are automatically computed?
The Orthologue Alignments for each gene are downloaded from Ensembl Compara (https://www.ensembl.org/info/genome/compara/index.html). Differences between GRCh37 and GRCh38 are due to the species used in the alignments not being the same in all cases. The data for GRCh38 are more up to date in Ensembl, but it is extremely difficult to determine which alignment is better. Alamut™ Visual Plus includes a standard set of species, but this is dependent on what is available in Ensembl Compara. These sequence alignments are used for missense predictions in Alamut™ Visual Plus.
Why is there a difference in nucleotide conservation scores between Alamut Visual and Alamut™ Visual Plus?
Alamut Visual uses conservation scores from UCSC (https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons46way), whereas Alamut™ Visual Plus uses a more up-to-date set of conservation scores.
- Alamut Visual
  - GRCh37 – 46 vertebrates (2013)
  - GRCh38 – 100 vertebrates (2013)
- Alamut™ Visual Plus
  - GRCh37 – 100 vertebrates (2018)
  - GRCh38 – 100 vertebrates (2018)
Each catalog is updated at the same frequency as that catalog is released. For example, ClinVar is updated bi-monthly, whereas catalogs such as dbSNP are updated yearly, and other catalogs less frequently. gnomAD is updated when a release is available that covers all Alamut™ Visual Plus requirements.
In Alamut™ Visual Plus, the limit used to trigger NMD (nonsense-mediated decay) is fixed at 53 nucleotides.
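A threshold of this kind is commonly applied as a distance rule. The sketch below assumes the distance is measured from the premature termination codon (PTC) to the last exon-exon junction in cDNA coordinates, and that NMD is predicted when the distance exceeds the limit. Both the reference point and the strict comparison are assumptions, not confirmed details of the Alamut™ Visual Plus implementation.

```python
NMD_LIMIT_NT = 53  # limit stated above


def predicts_nmd(ptc_pos: int, last_junction_pos: int, limit: int = NMD_LIMIT_NT) -> bool:
    """Return True when the PTC lies more than `limit` nucleotides upstream
    of the last exon-exon junction (both positions in cDNA coordinates)."""
    distance = last_junction_pos - ptc_pos
    return distance > limit


far_upstream = predicts_nmd(ptc_pos=100, last_junction_pos=200)   # 100 nt upstream
near_junction = predicts_nmd(ptc_pos=180, last_junction_pos=200)  # 20 nt upstream
in_last_exon = predicts_nmd(ptc_pos=250, last_junction_pos=200)   # downstream of the junction
```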
Users do not have direct access to the Alamut™ Visual Plus database and thus do not need to write SQL queries to access the data. All relevant data are available through the Alamut™ Visual Plus interface by clicking on different tabs.
Why is there a difference in mitochondrial sequences between Alamut Visual and Alamut™ Visual Plus?
This is related to mismatches that can exist between the transcript and reference genome. Alamut Visual only uses transcript data, while Alamut™ Visual Plus displays both transcript and reference genome sequences.
Several RefSeq transcript versions cannot be added to the Alamut™ Visual Plus database, due to significant mismatches with the reference genomes GRCh37 and GRCh38. Alamut™ Visual Plus uses Splign (https://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi), a RefSeq alignment algorithm, to ensure that transcripts align successfully with the reference genomes before adding them into the Alamut™ Visual Plus reference database.
Can Alamut™ Visual Plus visualize reads (BAM files) from the SOPHiA DDM™ Whole Exome Sequencing Solution (WES)?
It is possible to click on the “Alamut” button in the SOPHiA DDM™ Platform, or to download BAM files from WES (or from targeted panels or even from whole-genome sequencing). In Alamut™ Visual Plus, the BAM file is loaded by segment and visualized by gene, using the index file (.bai) associated with the BAM file. The BAM file is never loaded in full for all genes at once, so it can be of any size.
The nucleotide conservation track shows how evolutionarily conserved each nucleotide is, based on phylogenetic studies between species. Nucleotide conservation scores are extracted from PhastCons statistical algorithms and are drawn in grey. A red bar means that the value for that nucleotide is higher than the height of the bar can represent. The threshold is fixed at 4 (as in the UCSC browser). These colors are visible when viewing values in the tooltip.
It is not possible to have more than one transcript related to a single variant in the same database. A new database will need to be created for a different transcript.
For reverse genes, why is there a difference in the deletion in genomic and transcript annotation? Why is the deletion in a repeat region displayed incorrectly?
Alamut™ Visual Plus applies the internationally recognized HGVS nomenclature. In the case of reverse genes, the 3’ rule is applied. For all descriptions, the most 3’ position possible of the reference sequence is arbitrarily assigned to have been changed. The 3’ rule also applies to changes in single residue stretches and tandem repeats (nucleotide or amino acid). The 3’ rule applies to ALL descriptions (genome, gene, transcript, and protein) of a given variant. See: http://varnomen.hgvs.org/recommendations/general/
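The effect of the 3’ rule on a deletion in a repeat can be sketched as below. This is a generic illustration, not the Alamut™ Visual Plus code: the sequence is hypothetical and positions are 0-based. For a reverse-strand gene, "3’" is read in the transcript direction, which corresponds to lower genomic coordinates, so genomic and transcript descriptions of the same deletion can end up on different repeat units.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def shift_deletion_3prime(seq: str, start: int, length: int) -> int:
    """Shift the deletion seq[start:start+length] to its most 3' equivalent
    position on the given strand and return the new 0-based start."""
    end = start + length
    # Sliding by one base is allowed while the base leaving the deletion
    # equals the base entering it: the resulting sequence is unchanged.
    while end < len(seq) and seq[start] == seq[end]:
        start += 1
        end += 1
    return start


def shift_deletion_3prime_minus(seq: str, start: int, length: int) -> int:
    """Same rule applied in the direction of a reverse-strand transcript;
    input and output are forward-strand (genomic) 0-based starts."""
    rc = reverse_complement(seq)
    rc_start = len(seq) - (start + length)
    shifted = shift_deletion_3prime(rc, rc_start, length)
    return len(seq) - (shifted + length)


genome = "GGCACACATT"  # hypothetical reference with a CA repeat
forward_start = shift_deletion_3prime(genome, 2, 2)        # forward-strand description
reverse_start = shift_deletion_3prime_minus(genome, 4, 2)  # reverse-strand transcript description
```

Both results delete one "CA" unit and produce the same final sequence, but the forward-strand description lands on the last repeat unit and the reverse-strand one on the first.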
How is variant creation in intergenic regions managed, and what causes us to see the message “No nearby genes are available for this query”?
When creating a new variant in an intergenic area, Alamut™ Visual Plus looks for the two closest genes (upstream and downstream). If no gene is found in a 10,000,000-nucleotide area around the variant position, the “No nearby genes are available” message is displayed.
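The lookup described above can be sketched as follows. The `Gene` type, the gene names, and the coordinates are hypothetical, and treating the 10,000,000-nucleotide area as a maximum distance on each side of the variant is an assumption.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int


SEARCH_WINDOW_NT = 10_000_000  # area size stated above; applied per side here (assumption)


def nearest_flanking_genes(
    pos: int, genes: List[Gene], window: int = SEARCH_WINDOW_NT
) -> Tuple[Optional[Gene], Optional[Gene]]:
    """Return the closest gene on each side of an intergenic position, or None per side."""
    left = [g for g in genes if g.end < pos and pos - g.end <= window]
    right = [g for g in genes if g.start > pos and g.start - pos <= window]
    return (
        max(left, key=lambda g: g.end, default=None),
        min(right, key=lambda g: g.start, default=None),
    )


def describe(pos: int, genes: List[Gene]) -> str:
    left, right = nearest_flanking_genes(pos, genes)
    if left is None and right is None:
        return "No nearby genes are available for this query"
    return ", ".join(g.name for g in (left, right) if g is not None)


genes = [Gene("GENE_A", 1_000, 5_000), Gene("GENE_B", 900_000, 950_000), Gene("GENE_C", 40_000_000, 40_100_000)]
nearby = describe(500_000, genes)
none_found = describe(25_000_000, genes)
```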
The export of external annotation is allowed per variant from the Variant Panel or directly from the ‘Variant Exporter’ window.
It is not recommended to install Alamut™ Visual Plus on a shared drive because of performance issues and potential application instability. The best way to install Alamut™ Visual Plus is:
- Run the Alamut™ Visual Plus executable (.exe) locally and do not store it on a shared drive.
- When installing the application, select a local settings folder.
- To share variant databases, one of your users can create a new Local Variant Database from the menu and choose to store it on a shared drive. This database can later be imported by any user that has access to this location.
- If a database is stored on a shared drive or disk, it should be flagged as “Shared Database”.
Why can some Sanger ab1 files not be read? Is there an alternative, if the base-called sequence is missing (PBAS tag)?
If the sequencer used to generate the ab1 files is an Applied Biosystems 3130XL / ABI3130XL:
To load a Sanger file, Alamut™ Visual Plus uses the “PBAS” tag of the ABIF format. The “PBAS” tag contains the base-called sequence (i.e., the nucleic sequence identified from the electropherogram).
The Applied Biosystems 3130XL / ABI3130XL sequencer does not perform the base-calling step. In that case, sequence analysis software has to be used to do the base calling and to generate a Sanger file compatible with Alamut™ Visual Plus.
See here for more info about the ABIF format: https://projects.nfstc.org/workshops/resources/articles/ABIF_File_Format.pdf
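To check whether an ab1 file carries a “PBAS” tag before loading it, the ABIF directory can be inspected directly. The sketch below relies on the ABIF layout (4-byte "ABIF" magic, 2-byte version, then 28-byte big-endian directory entries). `make_abif` is a hypothetical helper that builds a minimal synthetic file for demonstration only; it is not a valid trace.

```python
import struct

# Directory entry: name, number, element type, element size,
# element count, data size, data offset, data handle (28 bytes, big-endian).
DIR_ENTRY = struct.Struct(">4sIHHIIII")


def abif_tag_names(data: bytes) -> set:
    """Return the set of tag names listed in an ABIF file's directory."""
    if data[:4] != b"ABIF":
        raise ValueError("not an ABIF file")
    # The root entry starts at byte 6; it gives the entry count and directory offset.
    _, _, _, _, n_entries, _, dir_offset, _ = DIR_ENTRY.unpack_from(data, 6)
    names = set()
    for i in range(n_entries):
        name, *_ = DIR_ENTRY.unpack_from(data, dir_offset + i * DIR_ENTRY.size)
        names.add(name.decode("ascii"))
    return names


def has_basecalls(data: bytes) -> bool:
    """True when the file contains a PBAS (base-called sequence) tag."""
    return "PBAS" in abif_tag_names(data)


def make_abif(tag_names) -> bytes:
    """Build a minimal synthetic ABIF byte string (demonstration only)."""
    dir_offset = 6 + DIR_ENTRY.size  # directory placed right after the root entry
    root = DIR_ENTRY.pack(b"tdir", 1, 1023, 28, len(tag_names), 28 * len(tag_names), dir_offset, 0)
    entries = b"".join(DIR_ENTRY.pack(n.encode("ascii"), 1, 2, 1, 0, 0, 0, 0) for n in tag_names)
    return b"ABIF" + struct.pack(">H", 101) + root + entries


with_calls = has_basecalls(make_abif(["DATA", "PBAS"]))
without_calls = has_basecalls(make_abif(["DATA"]))
```

For a real file, the bytes would come from something like `open("trace.ab1", "rb").read()`, where the file name is a placeholder.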
Use the following reference in your publications: Alamut™️ Visual Plus version 1.6.1, SOPHiA GENETICS™️.
For any further questions about Alamut™ Visual Plus, do not hesitate to contact us:
Page last updated: October, 2022.