Systematic prediction of functionally linked genes in bacterial and archaeal genomes

Sergey A. Shmakov, Guilhem Faure, Kira S. Makarova, Yuri I. Wolf, Konstantin V. Severinov, Eugene V. Koonin

Research output: Contribution to journal › Article

Abstract

Functionally linked genes in bacterial and archaeal genomes are often organized into operons. However, the composition and architecture of operons are highly variable and frequently differ even among closely related genomes. Therefore, to efficiently extract reliable functional predictions for uncharacterized genes from comparative analyses of the rapidly growing genomic databases, dedicated computational approaches are required. We developed a protocol to systematically and automatically identify genes that are likely to be functionally associated with a ‘bait’ gene or locus by using relevance metrics. Given a set of bait loci and a genomic database defined by the user, this protocol compares the genomic neighborhoods of the baits to identify genes that are likely to be functionally linked to the baits by calculating the abundance of a given gene within and outside the bait neighborhoods and the distance to the bait. We exemplify the performance of the protocol with three test cases, namely, genes linked to CRISPR–Cas systems using the ‘CRISPRicity’ metric, genes associated with archaeal proviruses and genes linked to Argonaute genes in halobacteria. The protocol can be run by users with basic computational skills. The computational cost depends on the sizes of the genomic dataset and the list of reference loci and can vary from one CPU-hour to hundreds of hours on a supercomputer.
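
The neighborhood comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `relevance_scores`, the toy genomes, the fixed window measured in genes, and the plain inside/total ratio are all assumptions made for illustration. For each gene family, the sketch counts occurrences within a window around a bait and across the whole dataset, and records the mean distance (in genes) to the nearest bait.

```python
from collections import defaultdict


def relevance_scores(genomes, baits, window=3):
    """Score gene families by how often they occur near bait genes.

    genomes: dict mapping genome id -> ordered list of gene-family labels
             (hypothetical input format, assumed for this sketch).
    baits:   dict mapping genome id -> list of indices of bait genes.
    window:  maximum distance, in genes, that counts as "in the neighborhood".

    Returns a dict: family -> (inside, total, fraction_inside, mean_distance),
    where mean_distance is None if the family never occurs near a bait.
    """
    total = defaultdict(int)     # occurrences anywhere in the dataset
    inside = defaultdict(int)    # occurrences within a bait neighborhood
    dist_sum = defaultdict(int)  # summed distance to nearest bait (inside only)

    for genome_id, genes in genomes.items():
        bait_idx = baits.get(genome_id, [])
        bait_set = set(bait_idx)
        for i, family in enumerate(genes):
            if i in bait_set:
                continue  # the baits themselves are not scored
            total[family] += 1
            if not bait_idx:
                continue  # genome without baits contributes only to totals
            d = min(abs(i - b) for b in bait_idx)
            if d <= window:
                inside[family] += 1
                dist_sum[family] += d

    scores = {}
    for family, n_total in total.items():
        n_in = inside[family]
        mean_d = dist_sum[family] / n_in if n_in else None
        scores[family] = (n_in, n_total, n_in / n_total, mean_d)
    return scores


# Toy data: three genomes, two of which contain a bait gene.
genomes = {
    "g1": ["A", "X", "BAIT", "Y", "B", "C", "X"],
    "g2": ["X", "BAIT", "D", "E", "F", "Y"],
    "g3": ["Y", "A", "B", "C"],
}
baits = {"g1": [2], "g2": [1]}

scores = relevance_scores(genomes, baits, window=1)
# Rank families by the fraction of their occurrences that lie near a bait.
for family, s in sorted(scores.items(), key=lambda kv: -kv[1][2]):
    print(family, s)
```

In this toy dataset, family X lies next to a bait in two of its three occurrences, whereas family A never does. A family seen only once (such as D) gets a ratio of 1.0 from a single observation, so a real analysis would need to account for how many occurrences support each score.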

Original language: English (US)
Pages (from-to): 3013-3031
Number of pages: 19
Journal: Nature Protocols
Volume: 14
Issue number: 10
DOI: 10.1038/s41596-019-0211-1
State: Published - Oct 1 2019

Fingerprint

Archaeal Genome
Bacterial Genomes
Genes
Operon
Network protocols
Archaeal Genes
Euryarchaeota
Databases
Proviruses
Supercomputers

All Science Journal Classification (ASJC) codes

  • Biochemistry, Genetics and Molecular Biology (all)

Cite this

Shmakov, Sergey A. ; Faure, Guilhem ; Makarova, Kira S. ; Wolf, Yuri I. ; Severinov, Konstantin V. ; Koonin, Eugene V. / Systematic prediction of functionally linked genes in bacterial and archaeal genomes. In: Nature Protocols. 2019 ; Vol. 14, No. 10. pp. 3013-3031.
@article{b4695dd71aef49b090e059f29540e9be,
title = "Systematic prediction of functionally linked genes in bacterial and archaeal genomes",
abstract = "Functionally linked genes in bacterial and archaeal genomes are often organized into operons. However, the composition and architecture of operons are highly variable and frequently differ even among closely related genomes. Therefore, to efficiently extract reliable functional predictions for uncharacterized genes from comparative analyses of the rapidly growing genomic databases, dedicated computational approaches are required. We developed a protocol to systematically and automatically identify genes that are likely to be functionally associated with a ‘bait’ gene or locus by using relevance metrics. Given a set of bait loci and a genomic database defined by the user, this protocol compares the genomic neighborhoods of the baits to identify genes that are likely to be functionally linked to the baits by calculating the abundance of a given gene within and outside the bait neighborhoods and the distance to the bait. We exemplify the performance of the protocol with three test cases, namely, genes linked to CRISPR–Cas systems using the ‘CRISPRicity’ metric, genes associated with archaeal proviruses and genes linked to Argonaute genes in halobacteria. The protocol can be run by users with basic computational skills. The computational cost depends on the sizes of the genomic dataset and the list of reference loci and can vary from one CPU-hour to hundreds of hours on a supercomputer.",
author = "Shmakov, {Sergey A.} and Guilhem Faure and Makarova, {Kira S.} and Wolf, {Yuri I.} and Severinov, {Konstantin V.} and Koonin, {Eugene V.}",
year = "2019",
month = "10",
day = "1",
doi = "10.1038/s41596-019-0211-1",
language = "English (US)",
volume = "14",
pages = "3013--3031",
journal = "Nature Protocols",
issn = "1754-2189",
publisher = "Nature Publishing Group",
number = "10",

}

Systematic prediction of functionally linked genes in bacterial and archaeal genomes. / Shmakov, Sergey A.; Faure, Guilhem; Makarova, Kira S.; Wolf, Yuri I.; Severinov, Konstantin V.; Koonin, Eugene V.

In: Nature Protocols, Vol. 14, No. 10, 01.10.2019, p. 3013-3031.

TY - JOUR

T1 - Systematic prediction of functionally linked genes in bacterial and archaeal genomes

AU - Shmakov, Sergey A.

AU - Faure, Guilhem

AU - Makarova, Kira S.

AU - Wolf, Yuri I.

AU - Severinov, Konstantin V.

AU - Koonin, Eugene V.

PY - 2019/10/1

Y1 - 2019/10/1

AB - Functionally linked genes in bacterial and archaeal genomes are often organized into operons. However, the composition and architecture of operons are highly variable and frequently differ even among closely related genomes. Therefore, to efficiently extract reliable functional predictions for uncharacterized genes from comparative analyses of the rapidly growing genomic databases, dedicated computational approaches are required. We developed a protocol to systematically and automatically identify genes that are likely to be functionally associated with a ‘bait’ gene or locus by using relevance metrics. Given a set of bait loci and a genomic database defined by the user, this protocol compares the genomic neighborhoods of the baits to identify genes that are likely to be functionally linked to the baits by calculating the abundance of a given gene within and outside the bait neighborhoods and the distance to the bait. We exemplify the performance of the protocol with three test cases, namely, genes linked to CRISPR–Cas systems using the ‘CRISPRicity’ metric, genes associated with archaeal proviruses and genes linked to Argonaute genes in halobacteria. The protocol can be run by users with basic computational skills. The computational cost depends on the sizes of the genomic dataset and the list of reference loci and can vary from one CPU-hour to hundreds of hours on a supercomputer.

UR - http://www.scopus.com/inward/record.url?scp=85072718544&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072718544&partnerID=8YFLogxK

U2 - 10.1038/s41596-019-0211-1

DO - 10.1038/s41596-019-0211-1

M3 - Article

C2 - 31520072

AN - SCOPUS:85072718544

VL - 14

SP - 3013

EP - 3031

JO - Nature Protocols

JF - Nature Protocols

SN - 1754-2189

IS - 10

ER -