Protein structure validation by generalized linear model root-mean-square deviation prediction

Anurag Bagaria, Victor Jaravine, Yuanpeng J. Huang, Gaetano T. Montelione, Peter Güntert

Research output: Contribution to journal › Article › peer-review

40 Scopus citations


Large-scale initiatives for obtaining spatial protein structures by experimental or computational means have accentuated the need for the critical assessment of protein structure determination and prediction methods. These include blind test projects such as the critical assessment of protein structure prediction (CASP) and the critical assessment of protein structure determination by nuclear magnetic resonance (CASD-NMR). An important aim is to establish structure validation criteria that can reliably assess the accuracy of a new protein structure. Various quality measures derived from the coordinates have been proposed. A universal structural quality assessment method should combine multiple individual scores in a meaningful way, which is challenging because of their different measurement units. Here, we present a method based on a generalized linear model (GLM) that combines diverse protein structure quality scores into a single quantity with intuitive meaning, namely the predicted coordinate root-mean-square deviation (RMSD) value between the present structure and the (unavailable) "true" structure (GLM-RMSD). For two sets of structural models from the CASD-NMR and CASP projects, this GLM-RMSD value was compared with the actual accuracy given by the RMSD value to the corresponding, experimentally determined reference structure from the Protein Data Bank (PDB). The correlation coefficients between actual (model vs. reference from PDB) and predicted (model vs. "true") heavy-atom RMSDs were 0.69 and 0.76 for the two datasets from CASD-NMR and CASP, respectively, which are considerably higher than those for the individual scores (-0.24 to 0.68). The GLM-RMSD can thus predict the accuracy of protein structures more reliably than individual coordinate-based quality scores. Published by Wiley-Blackwell.
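The idea of combining several quality scores into one predicted RMSD, and then comparing predicted with actual RMSD by a correlation coefficient, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not state the GLM family, the link function, or which scores enter the model, so the sketch uses the simplest case (Gaussian family with identity link, which reduces to ordinary least squares), three unnamed scores, and randomly generated data that stand in for real structures. The function names (`fit_gaussian_glm`, `predict_rmsd`, `pearson`) are hypothetical and do not come from the paper or any library.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_gaussian_glm(scores, rmsd):
    """Fit rmsd ~ b0 + sum_k b_k * score_k by least squares.

    Assumption: Gaussian family, identity link (the simplest GLM).
    Returns [b0, b1, ..., bK].
    """
    rows = [[1.0] + list(s) for s in scores]  # prepend intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, rmsd)) for i in range(p)]
    return solve_linear_system(xtx, xty)  # normal equations


def predict_rmsd(coeffs, score_vector):
    """Predicted RMSD for one structure from its quality scores."""
    return coeffs[0] + sum(b * s for b, s in zip(coeffs[1:], score_vector))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


# Synthetic stand-in data (NOT from the paper): 200 "structures",
# 3 quality scores each, and an "actual RMSD" with added noise.
rng = random.Random(0)
true_coeffs = [3.0, 0.8, -0.5, 0.3]  # hypothetical intercept and weights
scores = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
actual_rmsd = [predict_rmsd(true_coeffs, s) + rng.gauss(0, 0.5) for s in scores]

coeffs = fit_gaussian_glm(scores, actual_rmsd)
predicted_rmsd = [predict_rmsd(coeffs, s) for s in scores]

# Correlation of the combined prediction vs. each individual score.
r_combined = pearson(actual_rmsd, predicted_rmsd)
r_individual = [pearson(actual_rmsd, [s[k] for s in scores]) for k in range(3)]
print("combined:", round(r_combined, 2))
print("individual:", [round(r, 2) for r in r_individual])
```

On the data used for fitting, a least-squares combination with an intercept correlates with the target at least as strongly as any single input score, which mirrors the qualitative claim in the abstract. The sketch evaluates on its own training data for brevity; it makes no claim about how the published correlations were obtained.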

Original language: English (US)
Pages (from-to): 229-238
Number of pages: 10
Journal: Protein Science
Issue number: 2
State: Published - Feb 2012

All Science Journal Classification (ASJC) codes

  • Biochemistry
  • Molecular Biology


Keywords

  • CASP
  • Gaussian network model
  • NMR
  • Protein structure validation
  • RMSD
  • Structure quality

