Commit 044cc97

publication update 5
1 parent c46c269 commit 044cc97

1 file changed

Lines changed: 8 additions & 8 deletions

publication-data.json
@@ -905,10 +905,10 @@
  "lab_member": false
  }
  ],
- "title": "A recurrent sequencing artifact on Illumina sequencers with two-color fluorescent dye chemistry and its impact on somatic variant detection",
- "title_link": "https://www.biorxiv.org/content/10.1101/2025.09.27.678978v1",
- "abstract": "Background: The sequencing-by-synthesis technology by Illumina, Inc. enables efficient and scalable readouts of mutations from genomic data. To enhance sequencing speed and efficiency, Illumina has shifted from the four-color base calling chemistry of the HiSeq series to a two-color fluorescent dye chemistry in the NovaSeq series. Benchmarking sequencing artifacts due to biases in the newer chemistry is important to evaluate the quality of identified mutations. Results: We re-analyzed a series of whole-genome sequencing experiments in which the same samples were sequenced on the NovaSeq 6000 (two-color) and HiSeq X10 (four-color) platforms by independent groups. In several samples, we observed a higher frequency of T-to-G and A-to-C substitutions ('T>G') at the read level for NovaSeq 6000 versus HiSeq X10. As the per-base error rate is still low, the artifactual substitutions have a negligible effect in identifying germline or high variant allele frequency (VAF) somatic mutations. However, such errors can confound the detection of low-VAF somatic variants in high-depth sequencing samples, particularly in studies of mosaic mutations in normal tissues, where variants have low read support and are called without a matched normal. The artifactual T>G variant calls disproportionately occur at NT[TG] trinucleotides, and we leveraged this observation to bioinformatically reduce the T>G excess in somatic mutation callsets. Conclusions: We identified a recurrent artifact specific to the Illumina two-color chemistry platform on the NovaSeq 6000 with the potential to contaminate low-VAF somatic mutation calls. Thus, an unexpected enrichment of T>G mutations in mosaicism studies warrants caution. Keywords: Illumina NovaSeq 6000; Next-generation sequencing; mosaic mutations; sequencing artifacts; somatic mutations.",
- "abstract_link": "https://compbio.hms.harvard.edu/publications/a-recurrent-sequencing-artifact-on-illumina-sequencers-with-two-color-fluorescent-dye-chemistry-and-its-impacts-on-somatic-variant-detection",
+ "title": "Comprehensive benchmarking of somatic structural variant detection at ultra-low allele fractions",
+ "title_link": "https://www.biorxiv.org/content/10.1101/2025.09.18.677206v1",
+ "abstract": "Postzygotic mosaicism gives rise to somatic structural variants (SVs) at ultra-low variant allele fractions (VAFs), which pose challenges for detection due to the high-coverage sequencing required and noise introduced by sequencing artifacts. Although somatic SV detection has been extensively studied in cancer, these studies are not directly applicable to the study of tissue mosaicism, as they rely on matched normals, target higher VAF ranges, and are enriched for different types of SVs. We present comprehensive benchmark data and best practices for non-cancer somatic SV detection. We created a synthetic mosaic sample by combining six HapMap individuals at varying proportions, generating allele fractions as low as 0.25%. This sample was sequenced to ~2,300x total coverage using Illumina, PacBio, and Nanopore technologies across multiple sequencing centers. A high-confidence benchmark SV set containing over 21,000 pseudo-somatic insertions and deletions ≥50bp was derived from haplotype-resolved assemblies. We evaluated 12 SV discovery pipelines and identified caller-specific strengths and sequencing platform-specific shortcomings. We find that short read-based approaches show reduced recall for insertions and repeat-associated SVs, whereas long-read sequencing achieves high accuracy throughout the genome, increasing linearly with coverage. The best algorithm's sensitivity exceeded 80% for VAFs ≥4% and 15% for VAFs of 0.5-1% with 60x coverage. The publicly available benchmarking data and comparative analysis of current methods provide a foundation for robust discovery of SV mosaicism in non-cancer tissues.",
+ "abstract_link": "https://compbio.hms.harvard.edu/publications/comprehensive-benchmarking-of-somatic-structural-variant-detection-at-ultra-low-allele-fractions",
  "year": "2025",
  "type": "2025",
  "journal": "bioRxiv",
@@ -937,10 +937,10 @@
  "lab_member": true
  }
  ],
- "title": "Comprehensive benchmarking of somatic structural variant detection at ultra-low allele fractions",
- "title_link": "https://www.biorxiv.org/content/10.1101/2025.09.18.677206v1",
- "abstract": "Postzygotic mosaicism gives rise to somatic structural variants (SVs) at ultra-low variant allele fractions (VAFs), which pose challenges for detection due to the high-coverage sequencing required and noise introduced by sequencing artifacts. Although somatic SV detection has been extensively studied in cancer, these studies are not directly applicable to the study of tissue mosaicism, as they rely on matched normals, target higher VAF ranges, and are enriched for different types of SVs. We present comprehensive benchmark data and best practices for non-cancer somatic SV detection. We created a synthetic mosaic sample by combining six HapMap individuals at varying proportions, generating allele fractions as low as 0.25%. This sample was sequenced to ~2,300x total coverage using Illumina, PacBio, and Nanopore technologies across multiple sequencing centers. A high-confidence benchmark SV set containing over 21,000 pseudo-somatic insertions and deletions ≥50bp was derived from haplotype-resolved assemblies. We evaluated 12 SV discovery pipelines and identified caller-specific strengths and sequencing platform-specific shortcomings. We find that short read-based approaches show reduced recall for insertions and repeat-associated SVs, whereas long-read sequencing achieves high accuracy throughout the genome, increasing linearly with coverage. The best algorithm's sensitivity exceeded 80% for VAFs ≥4% and 15% for VAFs of 0.5-1% with 60x coverage. The publicly available benchmarking data and comparative analysis of current methods provide a foundation for robust discovery of SV mosaicism in non-cancer tissues.",
- "abstract_link": "https://compbio.hms.harvard.edu/publications/comprehensive-benchmarking-of-somatic-structural-variant-detection-at-ultra-low-allele-fractions",
+ "title": "A recurrent sequencing artifact on Illumina sequencers with two-color fluorescent dye chemistry and its impact on somatic variant detection",
+ "title_link": "https://www.biorxiv.org/content/10.1101/2025.09.27.678978v1",
+ "abstract": "Background: The sequencing-by-synthesis technology by Illumina, Inc. enables efficient and scalable readouts of mutations from genomic data. To enhance sequencing speed and efficiency, Illumina has shifted from the four-color base calling chemistry of the HiSeq series to a two-color fluorescent dye chemistry in the NovaSeq series. Benchmarking sequencing artifacts due to biases in the newer chemistry is important to evaluate the quality of identified mutations. Results: We re-analyzed a series of whole-genome sequencing experiments in which the same samples were sequenced on the NovaSeq 6000 (two-color) and HiSeq X10 (four-color) platforms by independent groups. In several samples, we observed a higher frequency of T-to-G and A-to-C substitutions ('T>G') at the read level for NovaSeq 6000 versus HiSeq X10. As the per-base error rate is still low, the artifactual substitutions have a negligible effect in identifying germline or high variant allele frequency (VAF) somatic mutations. However, such errors can confound the detection of low-VAF somatic variants in high-depth sequencing samples, particularly in studies of mosaic mutations in normal tissues, where variants have low read support and are called without a matched normal. The artifactual T>G variant calls disproportionately occur at NT[TG] trinucleotides, and we leveraged this observation to bioinformatically reduce the T>G excess in somatic mutation callsets. Conclusions: We identified a recurrent artifact specific to the Illumina two-color chemistry platform on the NovaSeq 6000 with the potential to contaminate low-VAF somatic mutation calls. Thus, an unexpected enrichment of T>G mutations in mosaicism studies warrants caution. Keywords: Illumina NovaSeq 6000; Next-generation sequencing; mosaic mutations; sequencing artifacts; somatic mutations.",
+ "abstract_link": "https://compbio.hms.harvard.edu/publications/a-recurrent-sequencing-artifact-on-illumina-sequencers-with-two-color-fluorescent-dye-chemistry-and-its-impacts-on-somatic-variant-detection",
  "year": "2025",
  "type": "2025",
  "journal": "bioRxiv",