Question 1

How can we see the gnomAD dataset in Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

gnomAD data are displayed in the Allele Frequency Databases track. Right-clicking the variant of interest opens the variant panel, where the gnomAD details are shown in a box within Alamut™ Visual Plus. At the top of the gnomAD box are the Genome and Exome tabs; the selected tab is highlighted in blue and its information is shown in the box. Clicking the hyperlinked “gnomAD (vx.x.x)” label takes you to the gnomAD page for the variant. The exact link text varies with the type of data and the filters in use, but it always starts with “gnomAD”.

Question 2

How are gnomAD filters used within Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

By default, Alamut™ Visual Plus includes all available variants from gnomAD, irrespective of the filters or the filter cut-offs.

There are three filters: "PASS", RF, and AC0. Each filter has cut-offs defined by gnomAD for both Exome and Genome data. To view only variants that have passed all the quality filters, tick the "PASS only" box on the gnomAD track in Alamut™ Visual Plus, choosing whether you want to see Exome data, Genome data, or both.
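For readers who work with the gnomAD VCF directly, the effect of a "PASS only" selection can be sketched outside the application as follows. This is a minimal illustration, not the Alamut™ Visual Plus implementation: the function name `pass_only` and the example records are made up, and the only thing relied on is that the FILTER value is the 7th tab-separated column of a VCF data line.

```python
def pass_only(vcf_lines):
    """Yield VCF data lines whose FILTER column (7th field) is exactly PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


# Made-up records for illustration only (not real gnomAD data).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tG\tA\t.\tPASS\tAC=3;AN=100",
    "1\t200\t.\tC\tT\t.\tRF\tAC=1;AN=100",
    "1\t300\t.\tT\tG\t.\tAC0\tAC=0;AN=100",
]
kept = list(pass_only(example))
```

Only the first record is kept here; the records flagged RF or AC0 are dropped, which mirrors what ticking "PASS only" is described as doing.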

Question 3

What version of gnomAD is used in Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

Alamut™ Visual Plus currently uses the Genome Aggregation Database (gnomAD) version 2.1 for the GRCh37/hg19 genome build. An example variant page in this version of gnomAD is: https://gnomad.broadinstitute.org/variant/1-55516888-G-GA?dataset=gnomad_r2_1
For the GRCh38/hg38 genome build, gnomAD 4.0.0 is now available in Alamut™ Visual Plus. An example variant page in this version of gnomAD is: https://gnomad.broadinstitute.org/variant/1-55051215-G-GA?dataset=gnomad_r4

Question 4

Should I switch to the latest version of gnomAD?

ext-mcalland@sophiagenetics.com · Accepted Answer

With 730,947 exomes and 76,215 genomes, the gnomAD v4.1 call set is the largest of all the gnomAD versions. It also contains nearly all data from prior versions (v2 and v3) except a small number of samples excluded due to data quality and updated sample filtering pipelines. Since gnomAD v4.1 is mapped to GRCh38, if you haven’t switched to GRCh38, now is the time!

Question 5

Why do links to gnomAD not work with some variants?

Accepted Answer

gnomAD links are built from the position, ref, and alt of the variant. When Alamut™ Visual Plus applies the 3' rule, those values may change, and the resulting URL may then fail to resolve to the variant's gnomAD page.
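To make the dependency on position, ref, and alt concrete, here is a small sketch that builds a link in the same shape as the gnomAD URLs quoted in this FAQ. The function name `gnomad_variant_url` is hypothetical, and the "shifted" coordinates below are invented purely to show that any change to the inputs yields a different URL.

```python
def gnomad_variant_url(chrom, pos, ref, alt, dataset):
    """Build a gnomAD variant URL of the form <chrom>-<pos>-<ref>-<alt>?dataset=<dataset>."""
    return (
        "https://gnomad.broadinstitute.org/variant/"
        f"{chrom}-{pos}-{ref}-{alt}?dataset={dataset}"
    )


# Same values as the GRCh37 example link given in this FAQ.
url = gnomad_variant_url("1", 55516888, "G", "GA", "gnomad_r2_1")

# Invented example: if normalization moved the variant by one base,
# the generated URL would point at a different variant identifier.
shifted_url = gnomad_variant_url("1", 55516889, "A", "AA", "gnomad_r2_1")
```

Because the variant identifier is embedded in the path, a description that has been shifted along a repeat produces a link that gnomAD may not recognize.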

Question 6

How can we have a negative genotype count with gnomAD?

ext-mcalland@sophiagenetics.com · Accepted Answer

We compute Genotype Count based on values from a gnomAD VCF. In some cases, we end up with negative counts due to discrepancies in the initial values provided in the VCF. This is the way gnomAD is handling these variants.
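The FAQ does not state the formula used, so the following is only an illustrative assumption of how genotype counts could be derived from a gnomAD VCF record and how inconsistent input values lead to a negative result. It assumes a diploid site and uses the gnomAD INFO fields AC (alternate allele count), AN (total allele number), and nhomalt (number of homozygous-alternate individuals); the function name `genotype_counts` is hypothetical.

```python
def genotype_counts(ac, an, nhomalt):
    """Derive genotype counts from allele counts, assuming a diploid site.

    Assumed arithmetic (not confirmed by the FAQ):
      hom_alt = nhomalt
      het     = AC - 2 * nhomalt
      hom_ref = AN / 2 - het - hom_alt
    """
    hom_alt = nhomalt
    het = ac - 2 * nhomalt
    hom_ref = an // 2 - het - hom_alt
    return {"hom_ref": hom_ref, "het": het, "hom_alt": hom_alt}


# Consistent values: 50 individuals, 8 alternate alleles, 1 homozygote.
consistent = genotype_counts(ac=8, an=100, nhomalt=1)

# Inconsistent values: AN implies 5 individuals, but AC and nhomalt imply 7 carriers.
inconsistent = genotype_counts(ac=8, an=10, nhomalt=1)
```

With the inconsistent inputs, the derived homozygous-reference count is negative, which is the kind of discrepancy the answer above refers to.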

Question 7

Why is the link to dbSNP on the GRCh37 build missing?

ext-mcalland@sophiagenetics.com · Accepted Answer

RefSeq displays GRCh38 as a default genome for all rsID links. On dbSNP variant pages, links for the GRCh37 genome build are usually available.

Question 8

Why are there sometimes mismatches between genes and transcripts (nucleotides highlighted in red on the transcript track)?

ext-mcalland@sophiagenetics.com · Accepted Answer

This is due to occasional genome/transcript sequence discrepancies, where the genome reference includes polymorphism minor alleles, but the transcript includes corresponding major alleles. This means that some genomic variants are seen as ‘non-variants’ if analyzed at the transcript level.

Basically, at the positions highlighted in red, the nucleotide of the transcript differs from the nucleotide of the genome build (GRCh37 or GRCh38). For these nucleotides, it is more difficult to definitively determine whether a variant is indeed a variant.

These discrepancies mainly occur in RefSeq transcripts (beginning with “NM_”), as RefSeq does not correct the transcript to the genome build, while Ensembl transcripts (beginning with “ENST”) are corrected to match the nucleotides present in the genome build.

Question 9

Why are there differences in exon naming for some transcripts/genes?

ext-mcalland@sophiagenetics.com · Accepted Answer

Two different conventions are used for exon naming.
Systematic Exon Numbering starts at 1 and numbers each exon consecutively. For example, if there are 10 exons, they are numbered from 1 to 10.
Custom Exon Numbering, by contrast, is the historical numbering that was determined when the gene was first sequenced. It originally included splicing variants: if there was a splicing difference, an exon could be numbered 10a in one transcript and 10b in another. With Systematic Exon Numbering, this exon would simply be numbered 10 if it was the 10th exon. Custom Exon Numbering usually comes from the original paper and/or the scientist who determined the sequence. It was previously supported by NCBI but was discontinued several years ago in favor of Systematic Exon Numbering. However, researchers still use Custom Exon Numbering for genes such as BRCA1 and BRCA2.
In the Alamut™ Visual Plus toolbar, clicking "Exon Naming" switches between the two naming conventions. Alternatively, you can set the program to "use systematic exon numbering by default". To do this, open the “Alamut Visual Plus” menu > "Preferences", and then select this setting in the “View” tab.

Question 11

Why are there differences between RefSeq and Ensembl transcripts and exons?

ext-mcalland@sophiagenetics.com · Accepted Answer

Ensembl and RefSeq transcripts differ in that Ensembl transcripts are mapped onto the reference genome, whereas RefSeq transcripts are mapped onto mRNA sequences. Due to differences between reference genomes and individual mRNAs, some RefSeq mRNAs might not map perfectly to the reference genome, resulting in the possibility of small differences between Ensembl and RefSeq transcripts. Alamut™ Visual Plus uses Splign (a tool developed by RefSeq) to align all transcripts to the genome build.
For more information please see: https://www.ensembl.org/Help/Faq?id=294

Question 12

How are splicing scores interpreted in Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

Alamut™ Visual Plus includes a splicing module, accessible from the variant panel, that integrates a number of prediction algorithms. It provides the user with automatically computed prediction scores.
A brief description of splicing signal prediction can be found on page 63 of the user guide, available here: https://extranet.interactive-biosoftware.com/User%20Guide%20Alamut%20Visual%20Plus%20v1.6.1.pdf

Alamut™ Visual Plus computes splicing scores based on the implemented algorithms. The user is responsible for interpreting these scores based on the scientific context and peer-reviewed guidelines.
We can suggest the following paper to help you interpret splicing scores: https://pubmed.ncbi.nlm.nih.gov/22505045/

And the following video from ClinGen: https://clinicalgenome.org/tools/educational-resources/materials/splicing-and-in-silico-splicing-predictors/

Question 13

Should I consider cryptic or natural splice sites, or both?

ext-mcalland@sophiagenetics.com · Accepted Answer

The user manual section on splicing (from pg 63) provides an overview of this topic, as well as links to publications and more splicing-related information: https://extranet.interactive-biosoftware.com/User%20Guide%20Alamut%20Visual%20Plus%20v1.6.1.pdf

The following paper also provides a good overview of this topic: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC275472/
The first sentence of the abstract states: "Cryptic splice sites are used only when use of a natural splice site is disrupted by mutation."

Overall, the literature suggests that both natural and cryptic splice sites are important, but their relevance depends on the context of your investigation.

Question 14

Why does the gene SHANK3 not show transcript NM_033517.1?

ext-mcalland@sophiagenetics.com · Accepted Answer

SHANK3 is quite a problematic gene. Alamut™ Visual Plus mapped the available transcripts to the genome builds, and NM_033517.1 could not be mapped to either build due to mismatches between the transcript and the builds.

We do, however, have a RefSeqGene, NG_008607.2, which is based on the transcript NM_033517.1.

Question 15

How are SIFT and other missense prediction scores calculated?

ext-mcalland@sophiagenetics.com · Accepted Answer

SIFT scores are calculated based on multiple sequence alignments (MSA) of protein orthologues (SIFT Aligned Sequences). Scores differ between builds 37 and 38 because the MSA can differ depending on the genome used. The MSA are accessible by clicking on the missense prediction tool button in the Variant Panel. PolyPhen-2 (in Alamut™ Visual Plus) does not use MSA, just the human protein sequence and the substitution information.

Question 16

What versions of missense prediction tools are available in Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

SIFT 6.2.0
PolyPhen-2

Question 17

Why are there inconsistencies in scores between versions of PolyPhen-2?

ext-mcalland@sophiagenetics.com · Accepted Answer

The following page explains possible reasons for inconsistencies between versions: http://genetics.bwh.harvard.edu/pph2/dokuwiki/faq. Our team is continuously working to harmonize and update the protein sequence database used to build the multiple sequence alignment by PolyPhen-2. This could explain the differences in the scores.

Question 18

Why do PolyPhen-2 prediction scores differ when generated automatically versus manually?

ext-mcalland@sophiagenetics.com · Accepted Answer

The PolyPhen-2 prediction scores automatically displayed in Alamut™ Visual Plus are extracted from the WHESS database (http://genetics.bwh.harvard.edu/pph2/dbsearch.shtml). The WHESS database contains a pre-computed set of PolyPhen-2 predictions for the Whole Human Exome Sequence Space.
By contrast, the scores obtained on the site using the batch query (http://genetics.bwh.harvard.edu/pph2/bgi.shtml) are generated upon each request.

Question 19

Why do SIFT scores differ when generated automatically versus manually?

ext-mcalland@sophiagenetics.com · Accepted Answer

SIFT scores are calculated based on multiple sequence alignments (MSA) of protein orthologues (SIFT Aligned Sequences). The scores can differ between genome builds 37 and 38 because the MSA can differ depending on the genome used. Differences between the Alamut™ Visual Plus in-house predictors and the predictor website can arise because the MSA differs and/or because the algorithm version differs between the website and the in-house versions.

Question 20

Why are there differences in SIFT scores between Alamut Visual and Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

In Alamut Visual and Alamut™ Visual Plus, the SIFT missense predictions are computed using the orthologue alignments. Differences in SIFT scores can be explained by differences in these alignments. Alamut Visual contains 'in-house' orthologues for some genes, whereas Alamut™ Visual Plus contains Ensembl orthologues.

Question 21

Why are there differences in the species used for conservation between genome builds 37 and 38? This makes the significance of the region look very different. Is there not a specific combination of species that you use as standard? Do these alignments feed into the predictions that are automatically computed?

ext-mcalland@sophiagenetics.com · Accepted Answer

The Orthologue Alignments for each gene are downloaded from Ensembl Compara (https://www.ensembl.org/info/genome/compara/index.html). Differences between GRCh37 and GRCh38 are due to the species used in the alignments not being the same in all cases. The data for GRCh38 are more up to date in Ensembl, but it is extremely difficult to determine which alignment is better. Alamut™ Visual Plus includes a standard set of species, but this is dependent on what is available in Ensembl Compara. These sequence alignments are used for missense predictions in Alamut™ Visual Plus.

Question 22

How are orthologs aligned in Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

See answer 21.

Question 24

Why is there a difference in nucleotide conservation scores between Alamut Visual and Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

Alamut Visual uses conservation scores from UCSC (https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons46way), whereas Alamut™ Visual Plus uses a more up-to-date set of conservation scores.

Alamut Visual
- PhastCons
  - GRCh37 - 46 vertebrates (2013)
  - GRCh38 - 100 vertebrates (2013)
Alamut™ Visual Plus
- PhastCons
  - GRCh37 - 100 vertebrates (2018)
  - GRCh38 - 100 vertebrates (2018)

Question 25

How frequently do we update each catalog?

ext-mcalland@sophiagenetics.com · Accepted Answer

Catalogs are updated at the same frequency as the release of those catalogs. For example, ClinVar is updated bi-monthly, whereas catalogs such as dbSNP are updated yearly, and other catalogs less frequently. gnomAD is updated when a release is available that covers all Alamut™ Visual Plus requirements.

Question 26

How is nonsense-mediated mRNA decay (NMD) prediction determined?

ext-mcalland@sophiagenetics.com · Accepted Answer

In Alamut™ Visual Plus, the limit fixed to trigger NMD is 50 nucleotides.
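The answer gives only the limit itself. A widely used formulation of such a limit is the "50-nucleotide rule": a premature termination codon (PTC) located more than 50 nucleotides upstream of the last exon-exon junction is predicted to trigger NMD. The sketch below assumes that formulation; whether Alamut™ Visual Plus handles boundaries or single-exon transcripts in exactly this way is not stated in this FAQ, and the function name `predicts_nmd` and the exon lengths are hypothetical.

```python
def predicts_nmd(ptc_pos, exon_lengths, limit=50):
    """Apply the 50-nucleotide rule (assumed formulation, see lead-in).

    ptc_pos      -- 1-based transcript position of the premature stop codon
    exon_lengths -- exon lengths in transcript order
    limit        -- distance threshold in nucleotides
    """
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so the rule cannot apply
    # Transcript position of the last nucleotide before the final exon.
    last_junction = sum(exon_lengths[:-1])
    return last_junction - ptc_pos > limit


exons = [100, 200, 150]  # made-up transcript; last junction after position 300
far_upstream = predicts_nmd(200, exons)   # 100 nt upstream of the junction
near_junction = predicts_nmd(280, exons)  # 20 nt upstream of the junction
in_last_exon = predicts_nmd(350, exons)   # downstream of the junction
```

In this example only the first PTC is predicted to trigger NMD; the other two are too close to, or beyond, the last junction.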

Question 27

Do users need access to the SQL database to use Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

Users do not have direct access to the Alamut™ Visual Plus database and thus do not need to write SQL queries to access the data. All relevant data are available through the Alamut™ Visual Plus interface by clicking on different tabs.

Question 28

Why is there a difference in mitochondrial sequences between Alamut Visual and Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

This is related to mismatches that can exist between the transcript and reference genome. Alamut Visual only uses transcript data, while Alamut™ Visual Plus displays both transcript and reference genome sequences.

Question 29

Why can some transcript versions not be added to the Alamut™ Visual Plus database?

ext-mcalland@sophiagenetics.com · Accepted Answer

Several RefSeq transcript versions cannot be added to the Alamut™ Visual Plus database due to significant mismatches with the reference genomes GRCh37 and GRCh38. Alamut™ Visual Plus uses Splign (https://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi), a RefSeq alignment algorithm, to ensure that transcripts align successfully with the reference genomes before adding them to the Alamut™ Visual Plus reference database.

Question 30

Can Alamut™ Visual Plus visualize reads (BAM files) from the SOPHiA DDM™ Whole Exome Sequencing Solution (WES)?

ext-mcalland@sophiagenetics.com · Accepted Answer

It is possible to click on the “Alamut” button in the SOPHiA DDM™ Platform, or to download BAM files from WES (or from targeted panels or even from whole-genome sequencing). In Alamut™ Visual Plus, the BAM file is loaded by segment and visualized by gene, thanks to the index file (.bai) associated with the BAM file. The BAM file is never loaded in full for all genes at once, so it can be of any size.

Question 31

What is the meaning of the colors in the nucleotide conservation track?

ext-mcalland@sophiagenetics.com · Accepted Answer

The nucleotide conservation track shows scores of evolutionarily conserved nucleotides based on phylogenetic studies between species. The scores come from the PhastCons statistical algorithm and are drawn in grey. Red means that the actual value for that nucleotide is higher than the height of the bar can represent. The threshold is fixed at 4 (as in UCSC). These colors are visible when viewing values in the tooltip.

Question 32

Is it possible to save variants with different transcripts in the same local variant database?

ext-mcalland@sophiagenetics.com · Accepted Answer

It is not possible to have more than one transcript related to a single variant in the same database. A new database will need to be created for a different transcript.

Question 33

For reverse genes, why is there a difference in the deletion in genomic and transcript annotation? Why is the deletion in a repeat region displayed incorrectly?

ext-mcalland@sophiagenetics.com · Accepted Answer

Alamut™ Visual Plus applies the internationally recognized HGVS nomenclature. In the case of reverse genes, the 3’ rule is applied. For all descriptions, the most 3’ position possible of the reference sequence is arbitrarily assigned to have been changed. The 3’ rule also applies to changes in single-residue stretches and tandem repeats (nucleotide or amino acid). The 3’ rule applies to ALL descriptions (genome, gene, transcript, and protein) of a given variant. See: http://varnomen.hgvs.org/recommendations/general/
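The effect of the 3’ rule on a deletion inside a repeat can be illustrated with a short sketch. It is not the Alamut™ Visual Plus implementation; the function name `shift_deletion` and the toy sequence are made up. The idea is that a deletion can slide along a repeat without changing the resulting sequence, and the 3’ rule picks the end of that range. For a gene on the reverse strand, the transcript's 3’ direction runs toward lower genomic coordinates, so the genomic and transcript descriptions of the same deletion can end up at opposite ends of the repeat.

```python
def shift_deletion(seq, start, length, toward_end=True):
    """Slide a deletion (0-based start, given length) as far as possible
    in one direction without changing the resulting sequence."""
    if toward_end:
        # Shift right while the base leaving the window equals the base entering it.
        while start + length < len(seq) and seq[start] == seq[start + length]:
            start += 1
    else:
        # Shift left under the mirrored condition.
        while start > 0 and seq[start - 1] == seq[start + length - 1]:
            start -= 1
    return start


genome = "GATTTTC"  # toy forward-strand sequence with a run of four T's

# Genomic description: most 3' position on the forward strand.
genomic_start = shift_deletion(genome, 3, 1, toward_end=True)

# Reverse-strand gene: the transcript's 3' end lies at the lower genomic coordinate.
transcript_start = shift_deletion(genome, 3, 1, toward_end=False)
```

Both placements delete one T and give the same final sequence, yet they are reported at different coordinates, which is why the genomic and transcript annotations of a deletion in a repeat can appear to disagree.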

Question 34

How is variant creation in intergenic regions managed, and what causes us to see the message “No nearby genes are available for this query”?

ext-mcalland@sophiagenetics.com · Accepted Answer

When creating a new variant in an intergenic area, Alamut™ Visual Plus looks for the two closest genes (upstream and downstream). If no gene is found in a 10,000,000-nucleotide area around the variant position, the “No nearby genes are available” message is displayed.
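The lookup described above can be sketched roughly as follows. This is an assumption-laden illustration: the FAQ does not say whether the 10,000,000-nucleotide area is measured on each side of the variant or in total, so the sketch assumes each side; "upstream" and "downstream" are taken in genomic-coordinate order; and the function name `nearest_genes` and the gene intervals are invented.

```python
MAX_DISTANCE = 10_000_000  # assumption: applied separately on each side


def nearest_genes(pos, genes, max_distance=MAX_DISTANCE):
    """Return (upstream, downstream) gene names closest to pos, or None for a
    side where no gene lies within max_distance.

    genes -- iterable of (name, start, end) on the same chromosome
    """
    upstream = downstream = None
    best_up = best_down = max_distance + 1
    for name, start, end in genes:
        if end < pos and pos - end < best_up:
            upstream, best_up = name, pos - end
        elif start > pos and start - pos < best_down:
            downstream, best_down = name, start - pos
    return upstream, downstream


genes = [("A", 1_000, 2_000), ("B", 5_000_000, 5_001_000), ("C", 30_000_000, 30_010_000)]
close = nearest_genes(3_000_000, genes)
isolated = nearest_genes(18_000_000, genes)
message = "No nearby genes are available" if isolated == (None, None) else ""
```

In the second query both neighbouring genes are more than 10,000,000 nucleotides away, which corresponds to the situation in which the message is shown.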

Question 35

Can I export external annotation for several variants from Alamut™ Visual Plus?

ext-mcalland@sophiagenetics.com · Accepted Answer

The export of external annotation is allowed per variant from the Variant Panel or directly from the 'Variant Exporter' window.

Question 36

Could we install the application on a shared drive, and would it impact the user experience?

ext-mcalland@sophiagenetics.com · Accepted Answer

It is not recommended to install Alamut™ Visual Plus on a shared drive because of performance issues and potential application instability. The best way to install Alamut™ Visual Plus is:

- Run the Alamut™ Visual Plus executable (.exe) locally and do not store it on a shared drive.
- When installing the application, select a local settings folder.
- To share variant databases, one of your users can create a new Local Variant Database from the menu and choose to store it on a shared drive. This database can later be imported by any user who has access to this location.
  - If a database is stored on a shared drive or disk, it should be flagged as “Shared Database”.

Question 37

Why can some Sanger ab1 files not be read? Is there an alternative, if the base-called sequence is missing (PBAS tag)?

ext-mcalland@sophiagenetics.com · Accepted Answer

If the sequencer used to generate the ab1 files is an Applied Biosystems 3130XL / ABI3130XL:

To load a Sanger file, Alamut™ Visual Plus uses the "PBAS" tag of the ABIF format. The "PBAS" tag contains the base-called sequence (i.e., the nucleotide sequence identified from the electropherogram).

The Applied Biosystems 3130XL / ABI3130XL sequencer does not perform the base-calling step. In that case, sequence analysis software has to be used to do the base calling and to generate a Sanger file compatible with Alamut™ Visual Plus.

See here for more info about the ABIF format: https://projects.nfstc.org/workshops/resources/articles/ABIF_File_Format.pdf
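To check whether a given ab1 file contains base calls before loading it, the ABIF tag directory can be inspected directly. The sketch below is a minimal reader, assuming the directory layout described in the ABIF specification linked above (a big-endian, 28-byte directory entry, with the header's own entry starting at byte 6 and pointing to the tag directory). The function names are hypothetical, and this is not how Alamut™ Visual Plus itself reads the file.

```python
import struct

# ABIF directory entry: name, number, element type, element size,
# number of elements, data size, data offset, data handle (big-endian, 28 bytes).
DIR_ENTRY = struct.Struct(">4sihhiiii")


def abif_tags(data):
    """Return the set of (tag name, tag number) pairs listed in an ABIF file."""
    if data[:4] != b"ABIF":
        raise ValueError("not an ABIF file")
    # The header's directory entry tells us where the tag directory is and how long.
    _, _, _, entry_size, n_entries, _, dir_offset, _ = DIR_ENTRY.unpack_from(data, 6)
    tags = set()
    for i in range(n_entries):
        name, number, *_ = DIR_ENTRY.unpack_from(data, dir_offset + i * entry_size)
        tags.add((name.decode("ascii"), number))
    return tags


def has_base_calls(data):
    """True if the file lists at least one PBAS (base-called sequence) tag."""
    return any(name == "PBAS" for name, _ in abif_tags(data))
```

A typical call would be `has_base_calls(open("trace.ab1", "rb").read())`, where `trace.ab1` is a placeholder file name. If the result is false, the file would need to go through base-calling software first, as described above.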

Alamut™ Visual Plus FAQs
