There are several reasons why the SOPHiA DDM™ Platform may classify a variant as “low-confidence”.
1. The variant may occur in a low-complexity region (e.g. homopolymers or heteropolymers). Such regions adversely affect amplification (polymerase slippage) and hinder reliable variant calling. Variants in other challenging regions of the reference genome (e.g. high or low GC content, long tandem repeats, or regions with high homology to other regions in the genome) can be assigned a “problematic region” tag, as can variants in regions that produce noise specific to the sequencing platform or the NGS chemistry.
2. The variant may have a variant fraction lower than expected (germline) or lower than what can be confidently called with a statistical test (somatic). Variants with low variant fractions are filtered as “low_variant_fraction”.
3. A variant with low coverage is filtered with a “low_coverage” tag. Many germline solutions give warnings for regions covered at less than 50x, and any variant detected in a region covered at less than 30x is classified as low-confidence. The exact thresholds may vary between solutions.
4. Variants outside the target region of the solution are filtered as “off-target”. For capture-based solutions, many off-target variants may be observed in regions flanking the target region that still have sufficient coverage for reliable variant calling.
5. INDELs in long homopolymers are filtered as “homopolymer_region”. Long homopolymers impede reliable variant calling because experimental artefacts such as polymerase slippage lead to elevated sequencing error rates, which are observed on most sequencing platforms. Therefore, any INDEL identified in a homopolymer longer than a certain length (exact length cut-offs are solution-specific) is filtered as “homopolymer_region”.
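
The filters listed above can be pictured as a series of independent checks on each variant. The Python sketch below is purely illustrative and is not the SOPHiA DDM™ implementation: the `Variant` fields, the function name, and the variant-fraction and homopolymer thresholds are all hypothetical. Only the 30x coverage value and the tag names come from the text above, and actual cut-offs are solution-specific.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds for illustration only; real cut-offs are
# solution-specific. 30x is the germline coverage example quoted above.
MIN_COVERAGE = 30
MIN_VARIANT_FRACTION = 0.2   # assumed value, not taken from the platform
MAX_HOMOPOLYMER_LENGTH = 8   # assumed value, not taken from the platform


@dataclass
class Variant:
    """Minimal, hypothetical description of a called variant."""
    coverage: int               # read depth at the variant position
    variant_fraction: float     # fraction of reads supporting the variant
    on_target: bool             # inside the target region of the solution
    is_indel: bool              # insertion or deletion (not an SNV)
    homopolymer_length: int     # length of the overlapping homopolymer, 0 if none
    in_problematic_region: bool = False


def low_confidence_tags(v: Variant) -> List[str]:
    """Return every low-confidence tag that applies to the variant."""
    tags = []
    if v.in_problematic_region:
        tags.append("problematic region")
    if v.variant_fraction < MIN_VARIANT_FRACTION:
        tags.append("low_variant_fraction")
    if v.coverage < MIN_COVERAGE:
        tags.append("low_coverage")
    if not v.on_target:
        tags.append("off-target")
    # The homopolymer filter applies to INDELs only.
    if v.is_indel and v.homopolymer_length > MAX_HOMOPOLYMER_LENGTH:
        tags.append("homopolymer_region")
    return tags


# Example: an INDEL at 25x coverage inside a 10-base homopolymer.
example = Variant(coverage=25, variant_fraction=0.5, on_target=True,
                  is_indel=True, homopolymer_length=10)
print(low_confidence_tags(example))
```

In this sketch a variant can carry several tags at once, and an empty list means that none of the listed reasons applies.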