Large-Scale Structure-Based Prediction and Identification of Novel Protease Substrates Using Computational Protein Design

Manasi A. Pethe, Aliza B. Rubenstein, Sagar Khare

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Characterizing the substrate specificity of protease enzymes is critical for illuminating the molecular basis of their diverse and complex roles in a wide array of biological processes. Rapid and accurate prediction of their extended substrate specificity would also aid in the design of custom proteases capable of selectively and controllably cleaving biotechnologically or therapeutically relevant targets. However, current in silico approaches for protease specificity prediction rely on, and are therefore limited by, machine learning of sequence patterns in known experimental data. Here, we describe a general approach for predicting peptidase substrates de novo using protein structure modeling and biophysical evaluation of enzyme–substrate complexes. We construct atomic-resolution models of thousands of candidate substrate–enzyme complexes for each of five model proteases belonging to the four major protease mechanistic classes (serine, cysteine, aspartyl, and metallo-proteases) and develop a discriminatory scoring function using enzyme design modules from Rosetta and AMBER's MMPBSA. We rank putative substrates based on calculated interaction energy with a modeled near-attack conformation of the enzyme active site. We show that the energetic patterns obtained from these simulations can be used to robustly rank and classify known cleaved and uncleaved peptides and that these structural-energetic patterns have greater discriminatory power than purely sequence-based statistical inference. Combining sequence and energetic patterns using machine-learning algorithms further improves classification performance, and analysis of structural models provides physical insight into the structural basis for the observed specificities. We further tested the predictive capability of the model by designing and experimentally characterizing the cleavage of four novel substrate motifs for the hepatitis C virus NS3/4 protease using an in vivo assay. The presented structure-based approach is generalizable to other protease enzymes with known or modeled structures and complements existing experimental methods for specificity determination.
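
As an illustration only, and not the authors' pipeline, the idea of ranking candidate peptides by interaction energy and then measuring how well that ranking separates cleaved from uncleaved peptides can be sketched in Python. The peptide labels and energy values are made-up placeholders, and the function names `rank_by_energy` and `roc_auc` are hypothetical. The study derived its scores from Rosetta and AMBER MMPBSA calculations, which this sketch does not reproduce.

```python
from typing import List, Tuple


def rank_by_energy(scores: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Sort (peptide, energy) pairs so the most favorable (lowest) energy comes first."""
    return sorted(scores, key=lambda item: item[1])


def roc_auc(cleaved: List[float], uncleaved: List[float]) -> float:
    """Area under the ROC curve for 'lower energy means cleaved'.

    Computed as the fraction of (cleaved, uncleaved) pairs in which the
    cleaved peptide has the lower energy; ties count as half a pair.
    """
    wins = 0.0
    for c in cleaved:
        for u in uncleaved:
            if c < u:
                wins += 1.0
            elif c == u:
                wins += 0.5
    return wins / (len(cleaved) * len(uncleaved))


# Placeholder data (assumption): arbitrary energies in arbitrary units.
toy_scores = [("pep_A", -12.0), ("pep_B", -9.5), ("pep_C", -7.0),
              ("pep_D", -8.0), ("pep_E", -4.0), ("pep_F", -3.5)]
toy_cleaved = {"pep_A", "pep_B", "pep_C"}  # hypothetical "known cleaved" set

ranking = rank_by_energy(toy_scores)
cleaved_e = [e for name, e in toy_scores if name in toy_cleaved]
uncleaved_e = [e for name, e in toy_scores if name not in toy_cleaved]
auc = roc_auc(cleaved_e, uncleaved_e)

print([name for name, _ in ranking])
print(round(auc, 3))
```

An AUC of 1.0 would mean every cleaved peptide scores below every uncleaved one, and 0.5 would mean the energies carry no ranking information. In this toy set one cleaved peptide scores worse than one uncleaved peptide, so the value falls between the two.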

Original language: English (US)
Pages (from-to): 220-236
Number of pages: 17
Journal: Journal of Molecular Biology
Volume: 429
Issue number: 2
DOI: 10.1016/j.jmb.2016.11.031
State: Published - Jan 20 2017

Fingerprint

Peptide Hydrolases
Proteins
Enzymes
Substrate Specificity
Biological Phenomena
Structural Models
Computer Simulation
Cysteine
Catalytic Domain
Peptides
Machine Learning

All Science Journal Classification (ASJC) codes

  • Molecular Biology

Keywords

  • Rosetta software
  • computational modeling
  • proteases
  • specificity prediction
  • substrate specificity

Cite this

@article{1970b331102a477ba43bec2ff7ac244d,
title = "Large-Scale Structure-Based Prediction and Identification of Novel Protease Substrates Using Computational Protein Design",
abstract = "Characterizing the substrate specificity of protease enzymes is critical for illuminating the molecular basis of their diverse and complex roles in a wide array of biological processes. Rapid and accurate prediction of their extended substrate specificity would also aid in the design of custom proteases capable of selectively and controllably cleaving biotechnologically or therapeutically relevant targets. However, current in silico approaches for protease specificity prediction rely on, and are therefore limited by, machine learning of sequence patterns in known experimental data. Here, we describe a general approach for predicting peptidase substrates de novo using protein structure modeling and biophysical evaluation of enzyme–substrate complexes. We construct atomic-resolution models of thousands of candidate substrate–enzyme complexes for each of five model proteases belonging to the four major protease mechanistic classes (serine, cysteine, aspartyl, and metallo-proteases) and develop a discriminatory scoring function using enzyme design modules from Rosetta and AMBER's MMPBSA. We rank putative substrates based on calculated interaction energy with a modeled near-attack conformation of the enzyme active site. We show that the energetic patterns obtained from these simulations can be used to robustly rank and classify known cleaved and uncleaved peptides and that these structural-energetic patterns have greater discriminatory power than purely sequence-based statistical inference. Combining sequence and energetic patterns using machine-learning algorithms further improves classification performance, and analysis of structural models provides physical insight into the structural basis for the observed specificities. We further tested the predictive capability of the model by designing and experimentally characterizing the cleavage of four novel substrate motifs for the hepatitis C virus NS3/4 protease using an in vivo assay. The presented structure-based approach is generalizable to other protease enzymes with known or modeled structures and complements existing experimental methods for specificity determination.",
keywords = "Rosetta software, computational modeling, proteases, specificity prediction, substrate specificity",
author = "Pethe, {Manasi A.} and Rubenstein, {Aliza B.} and Khare, Sagar",
year = "2017",
month = "1",
day = "20",
doi = "10.1016/j.jmb.2016.11.031",
language = "English (US)",
volume = "429",
pages = "220--236",
journal = "Journal of Molecular Biology",
issn = "0022-2836",
publisher = "Academic Press Inc.",
number = "2",

}

Large-Scale Structure-Based Prediction and Identification of Novel Protease Substrates Using Computational Protein Design. / Pethe, Manasi A.; Rubenstein, Aliza B.; Khare, Sagar.

In: Journal of Molecular Biology, Vol. 429, No. 2, 20.01.2017, p. 220-236.

TY - JOUR

T1 - Large-Scale Structure-Based Prediction and Identification of Novel Protease Substrates Using Computational Protein Design

AU - Pethe, Manasi A.

AU - Rubenstein, Aliza B.

AU - Khare, Sagar

PY - 2017/1/20

Y1 - 2017/1/20

AB - Characterizing the substrate specificity of protease enzymes is critical for illuminating the molecular basis of their diverse and complex roles in a wide array of biological processes. Rapid and accurate prediction of their extended substrate specificity would also aid in the design of custom proteases capable of selectively and controllably cleaving biotechnologically or therapeutically relevant targets. However, current in silico approaches for protease specificity prediction rely on, and are therefore limited by, machine learning of sequence patterns in known experimental data. Here, we describe a general approach for predicting peptidase substrates de novo using protein structure modeling and biophysical evaluation of enzyme–substrate complexes. We construct atomic-resolution models of thousands of candidate substrate–enzyme complexes for each of five model proteases belonging to the four major protease mechanistic classes (serine, cysteine, aspartyl, and metallo-proteases) and develop a discriminatory scoring function using enzyme design modules from Rosetta and AMBER's MMPBSA. We rank putative substrates based on calculated interaction energy with a modeled near-attack conformation of the enzyme active site. We show that the energetic patterns obtained from these simulations can be used to robustly rank and classify known cleaved and uncleaved peptides and that these structural-energetic patterns have greater discriminatory power than purely sequence-based statistical inference. Combining sequence and energetic patterns using machine-learning algorithms further improves classification performance, and analysis of structural models provides physical insight into the structural basis for the observed specificities. We further tested the predictive capability of the model by designing and experimentally characterizing the cleavage of four novel substrate motifs for the hepatitis C virus NS3/4 protease using an in vivo assay. The presented structure-based approach is generalizable to other protease enzymes with known or modeled structures and complements existing experimental methods for specificity determination.

KW - Rosetta software

KW - computational modeling

KW - proteases

KW - specificity prediction

KW - substrate specificity

UR - http://www.scopus.com/inward/record.url?scp=85008613508&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85008613508&partnerID=8YFLogxK

U2 - 10.1016/j.jmb.2016.11.031

DO - 10.1016/j.jmb.2016.11.031

M3 - Article

C2 - 27932294

AN - SCOPUS:85008613508

VL - 429

SP - 220

EP - 236

JO - Journal of Molecular Biology

JF - Journal of Molecular Biology

SN - 0022-2836

IS - 2

ER -