Question 1

How can we see the gnomAD dataset in Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

gnomAD data are displayed in the Allele Frequency Databases track. Right-clicking on the variant of interest opens the variant panel, where the gnomAD details are shown in a box within Alamut™ Visual Plus. At the top of the gnomAD box you will see Genome and Exome tabs; the selected tab is highlighted in blue and its information is shown in the box. Clicking on the hyperlinked “gnomAD (vx.x.x)” tab will take you to the gnomAD page for the variant. Note that the link text varies depending on the type of data and the filters used, but it always starts with “gnomAD”.

Question 2

How are gnomAD filters used within Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

By default, Alamut™ Visual Plus includes all available variants from gnomAD, irrespective of the filters or the filter cut-offs.

There are three filters: "PASS", RF, and AC0. Each filter has cut-offs defined by gnomAD for both Exome and Genome data. If you only want to view variants that have passed all the quality filters, tick the "PASS only" box on the gnomAD track in Alamut™ Visual Plus and choose whether to display Exome data, Genome data, or both.
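To illustrate what "PASS only" means at the data level, here is a minimal Python sketch that keeps only records whose FILTER value is PASS. The sample records (apart from the first, which reuses the variant from the gnomAD link in this FAQ) and the helper names are hypothetical, and this is not a description of how Alamut™ Visual Plus implements the option internally.

```python
# Hypothetical, simplified VCF-style records: (CHROM, POS, REF, ALT, FILTER).
# In a VCF, FILTER is "PASS" when a record passed every filter; otherwise it
# lists the failed filters separated by semicolons (for example "AC0;RF").
RECORDS = [
    ("1", 55516888, "G", "GA", "PASS"),
    ("1", 55516900, "C", "T", "RF"),       # made-up record failing RF
    ("1", 55516950, "A", "G", "AC0;RF"),   # made-up record failing AC0 and RF
]


def passes_all_filters(filter_field: str) -> bool:
    """True only when the record failed none of the quality filters."""
    return filter_field == "PASS"


def pass_only(records):
    """Keep the records that a 'PASS only' view would display."""
    return [rec for rec in records if passes_all_filters(rec[4])]


if __name__ == "__main__":
    for rec in pass_only(RECORDS):
        print(rec)
```

Without the "PASS only" restriction, all three records would be listed, which corresponds to the default behaviour described above.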

Question 3

What version of gnomAD is used in Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

gnomAD provides the following recommendation on which version to use: "gnomAD v2 is still our recommended dataset for most coding regions analyses."

Alamut™ Visual Plus currently uses Genome Aggregation Database Version 2.1. The link to this version of gnomAD is: https://gnomad.broadinstitute.org/variant/1-55516888-G-GA?dataset=gnomad_r2_1

The new version of gnomAD (version 3.1) was released at the end of 2020 and is not currently available in Alamut™ Visual Plus. gnomAD 3.1 is currently only available for the GRCh38/hg38 genome build and only for whole genome data. We are assessing this dataset but do not have a defined deadline for implementation in Alamut™ Visual Plus.

Question 4

Should I switch to the latest version of gnomAD?

ext-mcalland@sophiagenetics.com · Accepted Answer

The gnomAD v2 call-set contains fewer whole genomes than v3.1, but also contains a very large number of exomes that substantially increase its power as a reference in coding regions. Therefore, gnomAD v2 is still our recommended dataset for most coding regions analyses. However, gnomAD v3.1 represents a very large increase in the number of genomes and may be more suitable if your primary interest is in non-coding regions, or if your coding region of interest is poorly captured in the gnomAD exomes. This can be assessed using the coverage plots in the gnomAD browser. Most genomes in v2.1.1 are included in v3.1 and should therefore not be considered independent sample sets. We are currently assessing the v3.1 dataset, but do not have a defined deadline for implementation in Alamut™ Visual Plus.

Another consideration when choosing which gnomAD dataset to use is the ancestry of the samples that you are interested in. gnomAD v3.1 contains a substantially larger number of African American samples than v2 (exomes and genomes) and for the first time provides allele frequencies for the Amish population. gnomAD v3.1 also has a fully genotyped call-set available from the Human Genome Diversity Project and the 1000 Genomes Project, representing >60 distinct populations.

Finally, gnomAD v3.1 was mapped to GRCh38. So, if your data is on this build, it probably makes sense to switch to v3.1. There is also a liftover version of gnomAD v2.1.1 onto GRCh38 available. gnomAD plans to produce a larger GRCh38-aligned exome call set in 2022.

Question 5

Why do links to gnomAD not work with some variants?

Accepted Answer

gnomAD links are built from the position, reference allele (ref), and alternate allele (alt) of the variant. When Alamut™ Visual Plus applies the 3' rule, those values may change, so the generated URL may no longer point to the variant as it is described in gnomAD.
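As a rough illustration, the following Python sketch builds a link in the same shape as the gnomAD v2.1 URL quoted elsewhere in this FAQ. The function name and the "shifted" coordinates are hypothetical; the point is only that a change in position, ref, or alt produces a different URL.

```python
# URL pattern inferred from the example link in this FAQ:
# https://gnomad.broadinstitute.org/variant/1-55516888-G-GA?dataset=gnomad_r2_1
GNOMAD_VARIANT_URL = (
    "https://gnomad.broadinstitute.org/variant/"
    "{chrom}-{pos}-{ref}-{alt}?dataset={dataset}"
)


def gnomad_url(chrom, pos, ref, alt, dataset="gnomad_r2_1"):
    """Build a gnomAD variant link from chromosome, position, ref and alt."""
    return GNOMAD_VARIANT_URL.format(
        chrom=chrom, pos=pos, ref=ref, alt=alt, dataset=dataset
    )


if __name__ == "__main__":
    # Description that matches the example link in this FAQ.
    print(gnomad_url("1", 55516888, "G", "GA"))
    # Hypothetical description of an equivalent insertion after a 3' shift:
    # the position and bases differ, so the link differs as well.
    print(gnomad_url("1", 55516890, "A", "AA"))
```

If gnomAD indexes the variant under the first description only, the second URL would not resolve to it even though both descriptions refer to the same change.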

Question 6

How can we have a negative genotype count with gnomAD?

ext-mcalland@sophiagenetics.com · Accepted Answer

We compute Genotype Count based on values from a gnomAD VCF. In some cases, we end up with negative counts due to discrepancies in the initial values provided in the VCF. This is the way gnomAD is handling these variants.
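The exact formula used by Alamut™ Visual Plus is not given here. As a sketch only, the Python snippet below assumes that genotype counts are derived from the gnomAD VCF INFO fields AC (alternate allele count), AN (total allele number), and nhomalt (number of homozygous alternate individuals). Under that assumption, inconsistent input values lead directly to a negative count. All numbers are made up.

```python
def genotype_counts(ac: int, an: int, nhomalt: int) -> dict:
    """Derive genotype counts from allele-level values (assumed formula).

    Each homozygous-alternate individual contributes two alternate alleles,
    so the remaining alternate alleles are attributed to heterozygotes.
    """
    het = ac - 2 * nhomalt
    individuals = an // 2            # diploid assumption
    hom_ref = individuals - het - nhomalt
    return {"hom_ref": hom_ref, "het": het, "hom_alt": nhomalt}


if __name__ == "__main__":
    # Consistent (made-up) values: 10 alternate alleles, 2 homozygotes.
    print(genotype_counts(ac=10, an=1000, nhomalt=2))
    # Inconsistent (made-up) values: 2 homozygotes would need at least
    # 4 alternate alleles, but AC is 3, so the heterozygote count is -1.
    print(genotype_counts(ac=3, an=1000, nhomalt=2))
```

The second call shows how a discrepancy in the source values, rather than an error in the arithmetic, can produce a negative genotype count.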

Question 7

Why is the link to dbSNP on the GRCh37 build missing?

ext-mcalland@sophiagenetics.com · Accepted Answer

RefSeq displays GRCh38 as a default genome for all rsID links. On dbSNP variant pages, links for the GRCh37 genome build are usually available.

Question 8

Why are there sometimes mismatches between genes and transcripts (nucleotides highlighted in red on the transcript track)?

ext-mcalland@sophiagenetics.com · Accepted Answer

This is due to occasional genome/transcript sequence discrepancies, where the genome reference includes polymorphism minor alleles, but the transcript includes corresponding major alleles. This means that some genomic variants are seen as ‘non-variants’ if analyzed at the transcript level.

Basically, at the positions highlighted in red, the nucleotide of the transcript differs from the nucleotide of the genome build (GRCh37 or GRCh38). For these nucleotides, it is more difficult to definitively determine whether a variant is indeed a variant.

These discrepancies mainly occur in RefSeq transcripts (beginning with “NM_”), as RefSeq does not correct the transcript to the genome build, whereas Ensembl transcripts (beginning with “ENST”) are corrected to match the nucleotides present in the genome build.
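The idea of a genome/transcript mismatch can be sketched in a few lines of Python. The sequences below are toy examples, and the code assumes the two sequences are already aligned base for base on the same strand with no gaps; it does not reflect how Alamut™ Visual Plus performs the comparison.

```python
def mismatch_positions(genome_seq: str, transcript_seq: str) -> list:
    """Return 0-based positions where the transcript differs from the genome."""
    if len(genome_seq) != len(transcript_seq):
        raise ValueError("sequences must be aligned and of equal length")
    pairs = zip(genome_seq.upper(), transcript_seq.upper())
    return [i for i, (g, t) in enumerate(pairs) if g != t]


if __name__ == "__main__":
    genome = "ACGTACGT"       # toy genome build sequence (minor allele at 4)
    transcript = "ACGTGCGT"   # toy transcript sequence (major allele at 4)
    positions = mismatch_positions(genome, transcript)
    print(positions)          # positions that would be highlighted in red

    # A sample carrying G at position 4 differs from the genome reference
    # but is identical to the transcript reference at that position.
    sample_base = "G"
    for pos in positions:
        print(pos, sample_base != genome[pos], sample_base != transcript[pos])
```

This is the situation described above: the same observed base is a variant relative to the genome build and a 'non-variant' relative to the transcript.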

Question 9

Why are there differences in exon naming for some transcripts/genes?

ext-mcalland@sophiagenetics.com · Accepted Answer

Two different conventions are used for exon naming.
Systematic Exon Numbering starts at 1 and numbers each exon consecutively. So, if there are 10 exons, for example, they will be numbered from 1 to 10.
Custom Exon Numbering, by contrast, is the historical numbering that was determined when the gene was first sequenced. The Custom Exon Numbering originally included splicing variants. For example, if there was a splicing difference, you could have an exon numbered 10a in one transcript and 10b in another transcript. With Systematic Exon Numbering, this exon would just be numbered 10 if it was the 10th exon. The Custom Exon Numbering usually comes from the original paper and/or the scientist who determined the sequence. This had previously been supported by NCBI but was discontinued several years ago in favor of Systematic Exon Numbering. However, researchers still use Custom Exon Numbering for genes such as BRCA1 and BRCA2.
In the Alamut™ Visual Plus toolbar, clicking on "Exon Naming" lets you switch between the two naming conventions. Alternatively, you can set the program to "use systematic exon numbering by default". To do this, open the “Alamut Visual Plus” menu > "Preferences", and then select this setting in the “View” tab.

Question 11

Why are there differences between RefSeq and Ensembl transcripts and exons?

ext-mcalland@sophiagenetics.com · Accepted Answer

Ensembl and RefSeq transcripts differ in that Ensembl transcripts are mapped onto the reference genome, whereas RefSeq transcripts are mapped onto mRNA sequences. Due to differences between reference genomes and individual mRNAs, some RefSeq mRNAs might not map perfectly to the reference genome, resulting in the possibility of small differences between Ensembl and RefSeq transcripts. Alamut™ Visual Plus uses Splign (a tool developed by RefSeq) to align all transcripts to the genome build.
For more information please see: https://www.ensembl.org/Help/Faq?id=294

Question 12

How are splicing scores interpreted in Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

Alamut™ Visual Plus includes a splicing module, accessible from the variant panel, that integrates a number of prediction algorithms. It provides the user with automatically computed prediction scores.
A brief description of splicing signal prediction can be found on page 63 of the user guide, available here: https://extranet.interactive-biosoftware.com/User%20Guide%20Alamut%20Visual%20Plus%20v1.6.1.pdf

Alamut™ Visual Plus computes splicing scores based on the implemented algorithms. The user is responsible for interpreting these scores based on the scientific context and peer-reviewed guidelines.
We can suggest the following paper to help you interpret splicing scores: https://pubmed.ncbi.nlm.nih.gov/22505045/

And the following video from ClinGen: https://clinicalgenome.org/tools/educational-resources/materials/splicing-and-in-silico-splicing-predictors/

Question 13

Should I consider cryptic or natural splice sites, or both?

ext-mcalland@sophiagenetics.com · Accepted Answer

The user manual section on splicing (from pg 63) provides an overview of this topic, as well as links to publications and more splicing-related information: https://extranet.interactive-biosoftware.com/User%20Guide%20Alamut%20Visual%20Plus%20v1.6.1.pdf

The following paper also provides a good overview of this topic: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC275472/
The first sentence of the abstract states: "Cryptic splice sites are used only when use of a natural splice site is disrupted by mutation."

Overall, the literature suggests that both natural and cryptic splice sites are important, but their relevance depends on the context of your investigation.

Question 14

Why does the gene SHANK3 not show transcript NM_033517.1?

ext-mcalland@sophiagenetics.com · Accepted Answer

SHANK3 is quite a problematic gene. Alamut™ Visual Plus maps the available transcripts to the genome builds, and NM_033517.1 could not be mapped to either build due to mismatches between the transcript and the builds.

We do, however, have a RefSeqGene, NG_008607.2, which is based on the transcript NM_033517.1.

Question 15

How are SIFT and other missense prediction scores calculated?

ext-mcalland@sophiagenetics.com · Accepted Answer

SIFT scores are calculated based on multiple sequence alignments (MSAs) of protein orthologues (SIFT Aligned Sequences). Scores differ between builds 37 and 38 because the MSAs can differ depending on the genome used. The MSAs are accessible by clicking on the missense prediction tool button in the Variant Panel. PolyPhen-2 (in Alamut™ Visual Plus) does not use MSAs, just the human protein sequence and the substitution information.

Question 16

What versions of missense prediction tools are available in Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

SIFT 6.2.0
PolyPhen-2

Question 17

Why are there inconsistencies in scores between versions of PolyPhen-2?

ext-mcalland@sophiagenetics.com · Accepted Answer

The following page explains possible reasons for inconsistencies between versions: http://genetics.bwh.harvard.edu/pph2/dokuwiki/faq. Our team is continuously working to harmonize and update the protein sequence database used to build the multiple sequence alignment by PolyPhen-2. This could explain the differences in the scores.

Question 18

Why do PolyPhen-2 prediction scores differ when generated automatically versus manually?

ext-mcalland@sophiagenetics.com · Accepted Answer

The PolyPhen-2 prediction scores automatically displayed in Alamut™ Visual Plus are extracted from the WHESS database (http://genetics.bwh.harvard.edu/pph2/dbsearch.shtml). The WHESS database contains a pre-computed set of PolyPhen-2 predictions for the Whole Human Exome Sequence Space.
In contrast, the scores obtained on the site using the batch query (http://genetics.bwh.harvard.edu/pph2/bgi.shtml) are generated upon each request.

Question 19

Why do SIFT scores differ when generated automatically versus manually?

ext-mcalland@sophiagenetics.com · Accepted Answer

SIFT scores are calculated based on multiple sequence alignments (MSAs) of protein orthologues (SIFT Aligned Sequences). The scores can differ between genome builds 37 and 38 because the MSAs can differ depending on the genome used. Differences between the Alamut™ Visual Plus in-house predictors and the predictor website can arise because the MSA differs and/or because the algorithm version differs between the website and the in-house versions.

Question 20

Why are there differences in SIFT scores between Alamut Visual and Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

In Alamut Visual and Alamut™ Visual Plus, the SIFT missense predictions are computed using the orthologue alignment. Differences in SIFT scores can be explained by differences in orthologue alignment. Alamut Visual contains 'in-house' orthologues for some genes, whereas Alamut™ Visual Plus contains Ensembl orthologues.

Question 21

Why are there differences in the species used for conservation between genome builds 37 and 38? This makes the significance of the region look very different. Is there not a specific combination of species that you use as standard? Do these alignments feed into the predictions that are automatically computed?

ext-mcalland@sophiagenetics.com · Accepted Answer

The Orthologue Alignments for each gene are downloaded from Ensembl Compara (https://www.ensembl.org/info/genome/compara/index.html). Differences between GRCh37 and GRCh38 are due to the species used in the alignments not being the same in all cases. The data for GRCh38 are more up to date in Ensembl, but it is extremely difficult to determine which alignment is better. Alamut™ Visual Plus includes a standard set of species, but this is dependent on what is available in Ensembl Compara. These sequence alignments are used for missense predictions in Alamut™ Visual Plus.

Question 22

How are orthologs aligned in Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

See answer 21.

Question 24

Why is there a difference in nucleotide conservation scores between Alamut Visual and Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

Alamut Visual uses conservation scores from UCSC (https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons46way), whereas Alamut™ Visual Plus uses a more up-to-date set of conservation scores.

Alamut Visual
- PhastCons
  - GRCh37 - 46 vertebrates (2013)
  - GRCh38 - 100 vertebrates (2013)
Alamut™ Visual Plus
- PhastCons
  - GRCh37 - 100 vertebrates (2018)
  - GRCh38 - 100 vertebrates (2018)

Question 25

How frequently do we update each catalog?

ext-mcalland@sophiagenetics.com · Accepted Answer

Catalogs are updated at the same frequency as the release of those catalogs. For example, ClinVar is updated bi-monthly, whereas catalogs such as dbSNP are updated yearly, and other catalogs less frequently. gnomAD is updated when a release is available that covers all Alamut™ Visual Plus requirements.

Question 26

How is nonsense-mediated mRNA decay (NMD) prediction determined?

ext-mcalland@sophiagenetics.com · Accepted Answer

In Alamut™ Visual Plus, the threshold used to trigger the NMD prediction is fixed at 53 nucleotides.
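The answer above gives only the threshold value. The Python sketch below shows one common way such a threshold is applied, under the assumption (not confirmed here) that the distance is measured from the premature termination codon to the last exon-exon junction of the transcript, and that NMD is predicted when this distance exceeds the threshold. Function and parameter names are hypothetical.

```python
NMD_THRESHOLD_NT = 53  # value stated in the answer above


def predicts_nmd(ptc_pos: int, last_junction_pos: int,
                 threshold: int = NMD_THRESHOLD_NT) -> bool:
    """Predict NMD from transcript (cDNA) coordinates.

    Assumption: NMD is predicted when the premature termination codon (PTC)
    lies more than `threshold` nucleotides upstream of the last exon-exon
    junction. A PTC downstream of that junction never triggers it.
    """
    distance_upstream = last_junction_pos - ptc_pos
    return distance_upstream > threshold


if __name__ == "__main__":
    print(predicts_nmd(ptc_pos=100, last_junction_pos=200))  # 100 nt upstream
    print(predicts_nmd(ptc_pos=180, last_junction_pos=200))  # 20 nt upstream
    print(predicts_nmd(ptc_pos=250, last_junction_pos=200))  # in the last exon
```

Whether the comparison should be strict (>) or inclusive (>=) is also an assumption of this sketch.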

Question 27

Do users need access to the SQL database to use Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

Users do not have direct access to the Alamut™ Visual Plus database and thus do not need to write SQL queries to access the data. All relevant data are available through the Alamut™ Visual Plus interface by clicking on different tabs.

Question 28

Why is there a difference in mitochondrial sequences between Alamut Visual and Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

This is related to mismatches that can exist between the transcript and reference genome. Alamut Visual only uses transcript data, while Alamut™ Visual Plus displays both transcript and reference genome sequences.

Question 29

Why can some transcript versions not be added to the Alamut™ Visual Plus database?

ext-mcalland@sophiagenetics.com · Accepted Answer

Several RefSeq transcript versions cannot be added to the Alamut™ Visual Plus database, due to significant mismatches with the reference genomes GRCh37 and GRCh38. Alamut™ Visual Plus uses Splign (https://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi), a RefSeq alignment algorithm, to ensure that transcripts align successfully with the reference genomes before adding them into the Alamut™ Visual Plus reference database.

Question 30

Can Alamut™ Visual Plus visualize reads (BAM files) from the SOPHiA DDM™ Whole Exome Sequencing Solution (WES)?

ext-mcalland@sophiagenetics.com · Accepted Answer

It is possible to click on the “Alamut” button in the SOPHiA DDM™ Platform, or to download BAM files from WES (or from targeted panels or even from whole-genome sequencing). In Alamut™ Visual Plus, the BAM file is loaded segment by segment and visualized gene by gene, thanks to the index file (.bai) associated with the BAM file. The BAM file is never loaded in full for all genes at once, so it can be of any size.

Question 31

What is the meaning of the colors in the nucleotide conservation track?

ext-mcalland@sophiagenetics.com · Accepted Answer

The nucleotide conservation track shows how evolutionarily conserved each nucleotide is, based on phylogenetic studies between species. Nucleotide conservation scores are extracted from PhastCons statistical algorithms and are drawn in grey. Red means that the actual value for a nucleotide is higher than the height of its bar indicates. The threshold is fixed at 4 (as in UCSC). These colors are visible when viewing values in the tooltip.

Question 32

Is it possible to save variants with different transcripts in the same local variant database?

ext-mcalland@sophiagenetics.com · Accepted Answer

It is not possible to have more than one transcript related to a single variant in the same database. A new database will need to be created for a different transcript.

Question 33

For reverse genes, why is there a difference in the deletion in genomic and transcript annotation? Why is the deletion in a repeat region displayed incorrectly?

ext-mcalland@sophiagenetics.com · Accepted Answer

Alamut™ Visual Plus applies the internationally recognized HGVS nomenclature. In the case of reverse genes, the 3’ rule is applied. For all descriptions, the most 3’ position possible of the reference sequence is arbitrarily assigned to have been changed. The 3’ rule also applies to changes in single residue stretches and tandem repeats (nucleotide or amino acid). The 3’ rule applies to ALL descriptions (genome, gene, transcript, and protein) of a given variant. See: http://varnomen.hgvs.org/recommendations/general/
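A toy Python sketch can show why the same deletion ends up at different places in genomic and transcript descriptions for a reverse-strand gene. It assumes a simple string reference, 0-based positions, and a generic right-shifting routine; it is an illustration of the 3’ rule, not the Alamut™ Visual Plus implementation.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shift_deletion_3prime(seq: str, start: int, length: int) -> int:
    """Return the most 3' equivalent start (0-based) of a deletion.

    `seq` is read 5'->3' on the strand used for the description. Deleting
    seq[start:start+length] and deleting the window one base further 3'
    give the same result whenever seq[start] == seq[start + length].
    """
    while start + length < len(seq) and seq[start] == seq[start + length]:
        start += 1
    return start


if __name__ == "__main__":
    genome = "GCAAAAT"                 # toy forward-strand sequence
    # Genomic description: delete one A; the 3' rule moves it to the last A.
    print(shift_deletion_3prime(genome, 2, 1))

    # Transcript of a reverse-strand gene reads the other strand.
    transcript = reverse_complement(genome)   # "ATTTTGC"
    t_pos = shift_deletion_3prime(transcript, 1, 1)
    # Convert the transcript position back to a forward genome index.
    print(len(genome) - (t_pos + 1))
```

In this toy case the genomic description points at the last A of the run (index 5), while the transcript description, once mapped back to the forward strand, points at the first A (index 2). Both describe the same deletion.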

Question 34

How is variant creation in intergenic regions managed, and what causes us to see the message “No nearby genes are available for this query”?

ext-mcalland@sophiagenetics.com · Accepted Answer

When creating a new variant in an intergenic area, Alamut™ Visual Plus looks for the two closest genes (upstream and downstream). If no gene is found in a 10,000,000-nucleotide area around the variant position, the “No nearby genes are available” message is displayed.
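The lookup described above can be sketched in Python as follows. The gene names and coordinates are hypothetical, and the sketch assumes the 10,000,000-nucleotide limit applies on each side of the variant, which the answer does not specify.

```python
WINDOW_NT = 10_000_000  # search distance stated in the answer above

# Hypothetical genes on one chromosome: (name, start, end).
GENES = [
    ("GENE_A", 1_000_000, 1_050_000),
    ("GENE_B", 30_000_000, 30_020_000),
]


def nearest_genes(pos, genes, window=WINDOW_NT):
    """Return (closest gene before pos, closest gene after pos).

    Either element is None when no gene lies within `window` nucleotides
    on that side of the variant position.
    """
    before = [g for g in genes if g[2] < pos and pos - g[2] <= window]
    after = [g for g in genes if g[1] > pos and g[1] - pos <= window]
    closest_before = max(before, key=lambda g: g[2], default=None)
    closest_after = min(after, key=lambda g: g[1], default=None)
    return closest_before, closest_after


if __name__ == "__main__":
    for position in (2_000_000, 15_000_000, 25_000_000):
        found = nearest_genes(position, GENES)
        if found == (None, None):
            print(position, "No nearby genes are available")
        else:
            print(position, found)
```

With these made-up coordinates, position 15,000,000 is more than 10,000,000 nucleotides from both genes, so it is the only case that produces the message.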

Question 35

Can I export external annotation for several variants from Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

The export of external annotation is allowed per variant from the Variant Panel or directly from the 'Variant Exporter' window.

Question 36

Could we install the application on a shared drive, and would it impact the user experience?

ext-mcalland@sophiagenetics.com · Accepted Answer

It is not recommended to install Alamut™ Visual Plus on a shared drive because of performance issues and potential application instability. The best way to install Alamut™ Visual Plus is:

- Run the Alamut™ Visual Plus executable (.exe) locally and do not store it on a shared drive.
- When installing the application, select a local settings folder.
- To share variant databases, one of your users can create a new Local Variant Database from the menu and choose to store it on a shared drive. This database can later be imported by any user who has access to this location.
  - If a database is stored on a shared drive or disk, it should be flagged as “Shared Database”.

Question 37

Why can some Sanger ab1 files not be read? Is there an alternative, if the base-called sequence is missing (PBAS tag)?

ext-mcalland@sophiagenetics.com · Accepted Answer

If the sequencer used to generate the ab1 files is an Applied Biosystems 3130XL / ABI3130XL:

To load a Sanger file, Alamut™ Visual Plus uses the "PBAS" tag of the ABIF format. The "PBAS" tag contains the base-called sequence (i.e., the nucleotide sequence identified from the electropherogram).

The Applied Biosystems 3130XL / ABI3130XL sequencer does not perform the base-calling step. In that case, sequence analysis software has to be used to do the base calling and to generate a Sanger file compatible with Alamut™ Visual Plus.

See here for more info about the ABIF format: https://projects.nfstc.org/workshops/resources/articles/ABIF_File_Format.pdf
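To check whether an ab1 file contains a "PBAS" tag before loading it, something like the following Python sketch could be used. It relies on my reading of the ABIF layout (a 4-byte "ABIF" signature, a 2-byte version, then a 28-byte big-endian directory entry that locates the tag directory), so please verify it against the specification linked above. The `fake_abif` helper only builds a synthetic byte string for demonstration; it does not produce a valid trace file.

```python
import struct

# One ABIF directory entry: name, number, element type, element size,
# element count, data size, data offset, data handle (big-endian, 28 bytes).
DIR_ENTRY = struct.Struct(">4sI2H4I")


def abif_tag_names(data: bytes) -> set:
    """Return the set of tag names listed in an ABIF file's directory."""
    if data[:4] != b"ABIF":
        raise ValueError("not an ABIF file (missing 'ABIF' signature)")
    # Bytes 4-5 hold the version; the entry at byte 6 describes the
    # directory itself: entry size, number of entries, and its offset.
    _, _, _, entry_size, n_entries, _, dir_offset, _ = DIR_ENTRY.unpack_from(data, 6)
    names = set()
    for i in range(n_entries):
        start = dir_offset + i * entry_size
        names.add(data[start:start + 4].decode("ascii"))
    return names


def has_base_calls(data: bytes) -> bool:
    """True when the file lists a PBAS (base-called sequence) tag."""
    return "PBAS" in abif_tag_names(data)


def fake_abif(tag_names):
    """Build a tiny synthetic ABIF-like byte string (demonstration only)."""
    dir_offset = 128
    header = b"ABIF" + struct.pack(">H", 101) + DIR_ENTRY.pack(
        b"tdir", 1, 1023, DIR_ENTRY.size, len(tag_names),
        DIR_ENTRY.size * len(tag_names), dir_offset, 0,
    )
    entries = b"".join(
        DIR_ENTRY.pack(name.encode("ascii"), 1, 2, 1, 0, 0, 0, 0)
        for name in tag_names
    )
    return header.ljust(dir_offset, b"\x00") + entries


if __name__ == "__main__":
    print(has_base_calls(fake_abif(["DATA", "PBAS"])))  # base calls present
    print(has_base_calls(fake_abif(["DATA"])))          # no base calls
```

For a real file, the bytes would be read with `open(path, "rb").read()` and passed to `has_base_calls`.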

Alamut™ Visual Plus FAQs

Need Support? Get in touch with us.
