TY - JOUR
T1 - Addressing pandemic-wide systematic errors in the SARS-CoV-2 phylogeny
AU - IMSSC Laboratory Network Consortium
AU - Hunt, Martin
AU - Hinrichs, Angie S.
AU - Anderson, Daniel
AU - Karim, Lily
AU - Dearlove, Bethany L.
AU - Knaggs, Jeff
AU - Constantinides, Bede
AU - Fowler, Philip W.
AU - Rodger, Gillian
AU - Street, Teresa
AU - Lumley, Sheila
AU - Webster, Hermione
AU - Sanderson, Theo
AU - Ruis, Christopher
AU - Kotzen, Benjamin
AU - de Maio, Nicola
AU - Amenga-Etego, Lucas N.
AU - Amuzu, Dominic S.Y.
AU - Avaro, Martin
AU - Awandare, Gordon A.
AU - Ayivor-Djanie, Reuben
AU - Barkham, Timothy
AU - Bashton, Matthew
AU - Batty, Elizabeth M.
AU - Bediako, Yaw
AU - De Belder, Denise
AU - Benedetti, Estefania
AU - Bergthaler, Andreas
AU - Boers, Stefan A.
AU - Campos, Josefina
AU - Carr, Rosina Afua Ampomah
AU - Chen, Yuan Yi Constance
AU - Cuba, Facundo
AU - Dattero, Maria Elena
AU - Dejnirattisai, Wanwisa
AU - Dilthey, Alexander
AU - Duedu, Kwabena Obeng
AU - Endler, Lukas
AU - Engelmann, Ilka
AU - Francisco, Ngiambudulu M.
AU - Fuchs, Jonas
AU - Gnimpieba, Etienne Z.
AU - Groc, Soraya
AU - Gyamfi, Jones
AU - Heemskerk, Dennis
AU - Houwaart, Torsten
AU - Hsiao, Nei Yuan
AU - Huska, Matthew
AU - Hölzer, Martin
AU - Quashie, Peter K.
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2026/3
Y1 - 2026/3
N2 - The majority of SARS-CoV-2 genomes obtained during the pandemic were derived by amplifying overlapping windows of the genome (‘tiled amplicons’), reconstructing their sequences and fitting them together. This leads to systematic errors in genomes unless the software is both aware of the amplicon scheme and of the error modes of amplicon sequencing. Additionally, over time, amplicon schemes need to be updated as new mutations in the virus interfere with the primer binding sites at the end of amplicons. Thus, waves of variants swept the world during the pandemic and were followed by waves of systematic errors in the genomes, which had significant impacts on the inferred phylogenetic tree. Here we reconstruct the genomes from all public data as of June 2024 using an assembly tool called Viridian (https://github.com/iqbal-lab-org/viridian), developed to rigorously process amplicon sequence data. With these high-quality consensus sequences we provide a global phylogenetic tree of 4,471,579 samples, viewable at https://viridian.taxonium.org. We provide simulation and empirical validation of the methodology, and quantify the improvement in the phylogeny.
UR - https://www.scopus.com/pages/publications/105029877354
U2 - 10.1038/s41592-025-02947-1
DO - 10.1038/s41592-025-02947-1
M3 - Article
C2 - 41663577
AN - SCOPUS:105029877354
SN - 1548-7091
VL - 23
SP - 653
EP - 662
JO - Nature Methods
JF - Nature Methods
IS - 3
ER -