There are several reasons why the SOPHiA Platform may classify a variant as “low-confidence”.
1. The variant may occur in a low-complexity region (e.g. homopolymers or heteropolymers). Such regions adversely affect amplification (polymerase slippage) and hinder reliable variant calling. Variants in these regions are filtered with a “problematic_region” tag.
2. The variant may have a variant fraction lower than expected (germline) or lower than what can be confidently called with a statistical test (somatic). Variants with low variant fractions are filtered as “low_variant_fraction”.
3. A variant with low coverage is filtered with a “low_coverage” tag. Many germline solutions give warnings for regions with coverage below 50x, and any variant detected in a region with coverage below 30x is classified as low-confidence. The exact thresholds may vary between solutions.
4. In some somatic solutions, a minimum number of reads supporting the alternate sequence (usually at least 50) is required for confident variant calling. Variants that were detected but are supported by fewer reads than this minimum are filtered as “low_alt_coverage”.
5. Variants outside the target region of the solution are filtered as “off-target”. For capture-based solutions, many off-target variants may be observed in regions flanking the target region, even though these regions still have sufficient coverage for reliable variant calling.
6. INDELs in long homopolymers are filtered as “homopolymer_region”. Long homopolymers impede reliable variant calling because experimental artefacts such as polymerase slippage lead to elevated sequencing error rates, an effect observed on most sequencing platforms. Therefore, any INDEL identified in a homopolymer longer than a certain length (exact length cut-offs are solution-specific) is flagged in this way.
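
The threshold-based reasons above can be illustrated with a short Python sketch. This is not the SOPHiA Platform's implementation: the `Variant` record, the function names and the default values for the variant fraction and homopolymer length are hypothetical and chosen only for illustration. The 30x coverage and 50 supporting reads are the example values quoted above, and real cut-offs are solution-specific. The “problematic_region” tag is left out because it depends on region annotations that are not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    """Hypothetical minimal variant record used only for this sketch."""
    coverage: int      # total reads covering the position
    alt_reads: int     # reads supporting the alternate sequence
    is_indel: bool
    on_target: bool    # inside the target region of the solution
    ref_context: str   # reference bases around the variant


def longest_homopolymer(seq: str) -> int:
    """Return the length of the longest run of one repeated base in seq."""
    longest = run = 0
    prev = ""
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def low_confidence_tags(
    v: Variant,
    min_coverage: int = 30,              # germline example from the text
    min_alt_reads: int = 50,             # somatic example from the text
    min_variant_fraction: float = 0.2,   # assumed value, for illustration
    max_homopolymer_len: int = 8,        # assumed value, for illustration
) -> List[str]:
    """Collect every filter tag that applies; an empty list means none did."""
    tags = []
    if not v.on_target:
        tags.append("off-target")
    if v.coverage < min_coverage:
        tags.append("low_coverage")
    if v.alt_reads < min_alt_reads:
        tags.append("low_alt_coverage")
    if v.coverage > 0 and v.alt_reads / v.coverage < min_variant_fraction:
        tags.append("low_variant_fraction")
    if v.is_indel and longest_homopolymer(v.ref_context) > max_homopolymer_len:
        tags.append("homopolymer_region")
    return tags


example = Variant(coverage=25, alt_reads=10, is_indel=True,
                  on_target=True, ref_context="ACGT" + "A" * 10 + "GT")
print(low_confidence_tags(example))
```

In this sketch a variant can collect several tags at once, and all checks are applied together for simplicity. As noted above, the supporting-read requirement applies only to some somatic solutions, so a real configuration would enable only the checks relevant to the solution in use.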