
Mycobacterium tuberculosis complex lineage 5 exhibits high levels of within-lineage genomic diversity and differing gene content compared to the type strain H37Rv

  • C. N'dira Sanoussi
  • Mireia Coscolla
  • Boatema Ofori-Anyinam
  • Isaac Darko Otchere
  • Martin Antonio
  • Stefan Niemann
  • Julian Parkhill
  • Simon Harris
  • Dorothy Yeboah-Manu
  • Sebastien Gagneux
  • Leen Rigouts
  • Dissou Affolabi
  • Bouke C. de Jong
  • Conor J. Meehan
  • Laboratoire de Référence des Mycobactéries
  • Institute of Tropical Medicine Antwerp
  • University of Antwerp
  • University of Valencia
  • Food and Drugs Authority
  • Rutgers - The State University of New Jersey, New Brunswick
  • University of Ghana
  • London School of Hygiene & Tropical Medicine
  • German Center for Infection Research
  • Research Center Borstel - Leibniz Lung Center
  • Wellcome Sanger Institute
  • Department of Veterinary Medicine
  • Swiss Tropical and Public Health Institute Swiss TPH
  • University of Basel
  • University of Bradford

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract

Pathogens of the Mycobacterium tuberculosis complex (MTBC) are considered to be monomorphic, with little gene content variation between strains. Nevertheless, several genotypic and phenotypic factors separate strains of the different MTBC lineages (L) from each other, especially strains of L5 and L6 (traditionally termed Mycobacterium africanum). This genome variability and gene content, especially of L5 strains, has not been fully explored, and it may be important for pathobiology and for current approaches to genomic analysis of MTBC strains, including transmission studies. By comparing the genomes of 355 L5 clinical strains (3 complete genomes and 352 Illumina whole-genome sequenced isolates) to each other and to H37Rv, we identified multiple genes that were differentially present or absent between H37Rv and L5 strains. Additionally, considerable gene content variability was found across L5 strains, including a split of the L5.3 sub-lineage into L5.3.1 and L5.3.2. These gene content differences had a small knock-on effect on transmission cluster estimation, with clustering rates influenced by the selected reference genome, and with potential overestimation of recent transmission when using H37Rv as the reference genome. We conclude that full capture of the gene diversity, especially for high-resolution outbreak analysis, requires a variation of the single H37Rv-centric reference genome mapping approach currently used in most whole-genome sequencing data analysis pipelines. Moreover, the high within-lineage gene content variability suggests that the pan-genome of M. tuberculosis is at least several kilobases larger than previously thought, implying that a concatenated or reference-free (de novo) genome assembly approach may be needed for particular questions.

Original language: English
Article number: 000437
Journal: Microbial Genomics
Volume: 7
Issue number: 7
DOIs
Publication status: Published - 2021
Externally published: Yes

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Gene presence/absence
  • Genomic diversity
  • H37Rv
  • L5.3.2
  • Lineage 5
  • M. africanum
  • Reference genome
  • Within-lineage variability
