Ab initio and homology based prediction of protein domains by recursive neural networks

dc.contributor.author Walsh, Ian
dc.contributor.author Martin, Alberto J. M.
dc.contributor.author Mooney, Catherine
dc.contributor.author Rubagotti, Enrico
dc.contributor.author Vullo, Alessandro
dc.contributor.author Pollastri, Gianluca
dc.date.accessioned 2011-12-12T12:00:30Z
dc.date.available 2011-12-12T12:00:30Z
dc.date.copyright 2009 Walsh et al; licensee BioMed Central Ltd. en
dc.date.issued 2009-06-26
dc.identifier.citation BMC Bioinformatics en
dc.identifier.uri http://hdl.handle.net/10197/3396
dc.description.abstract Background: Proteins, especially larger ones, are often composed of individual evolutionary units, domains, which have their own function and structural fold. Predicting domains is an important intermediate step in protein analyses, including the prediction of protein structures. Results: We describe novel systems for the prediction of protein domain boundaries powered by Recursive Neural Networks. The systems rely on a combination of primary sequence and evolutionary information, predictions of structural features such as secondary structure, solvent accessibility and residue contact maps, and structural templates, both annotated for domains (from the SCOP dataset) and unannotated (from the PDB). We gauge the contribution of contact maps, and PDB and SCOP templates independently and for different ranges of template quality. We find that accurately predicted contact maps are informative for the prediction of domain boundaries, while the same is not true for contact maps predicted ab initio. We also find that gap information from PDB templates is informative, but, not surprisingly, less so than SCOP annotations. We test both systems trained on templates of all qualities, and systems trained only on templates of marginal similarity to the query (less than 25% sequence identity). While the first batch of systems produces near-perfect predictions in the presence of fair to good templates, the second batch outperforms or matches ab initio predictors down to essentially any level of template quality. We test all systems in 5-fold cross-validation on a large non-redundant set of multi-domain and single-domain proteins. The final predictors are state-of-the-art, with a template-less prediction boundary recall of 50.8% (precision 38.7%) within ± 20 residues and a single-domain recall of 80.3% (precision 78.1%). The SCOP-based predictors achieve a boundary recall of 74% (precision 77.1%), again within ± 20 residues, and classify single-domain proteins as such in over 85% of cases, when we allow a mix of bad and good quality templates. If we only allow marginal templates (max 25% sequence identity to the query) the scores remain high, with boundary recall and precision of 59% and 66.3%, and 80% of all single-domain proteins predicted correctly. Conclusion: The systems presented here may prove useful in large-scale annotation of protein domains in proteins of unknown structure. The methods are available as public web servers at the address: http://distill.ucd.ie/shandy/ and we plan to run them on a multi-genomic scale and make the results public in the near future. en
dc.description.sponsorship Science Foundation Ireland en
dc.description.sponsorship Health Research Board en
dc.format.extent 584519 bytes
dc.format.mimetype application/pdf
dc.language.iso en en
dc.publisher BioMed Central en
dc.rights This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. en
dc.rights.uri CC BY 2.0 en
dc.rights.uri http://creativecommons.org/licenses/by/2.0 en
dc.subject RNN en
dc.subject Neural networks en
dc.subject Protein domain prediction en
dc.subject.lcsh Neural networks (Computer science) en
dc.subject.lcsh Proteins--Structure en
dc.title Ab initio and homology based prediction of protein domains by recursive neural networks en
dc.type Journal Article en
dc.internal.availability Full text available en
dc.internal.webversions http://www.biomedcentral.com/1471-2105/10/195 en
dc.status Peer reviewed en
dc.identifier.volume 10 en
dc.identifier.issue 195 en
dc.identifier.doi 10.1186/1471-2105-10-195
dc.neeo.contributor Walsh|Ian|aut| en
dc.neeo.contributor Martin|Alberto J. M.|aut| en
dc.neeo.contributor Mooney|Catherine|aut| en
dc.neeo.contributor Rubagotti|Enrico|aut| en
dc.neeo.contributor Vullo|Alessandro|aut| en
dc.neeo.contributor Pollastri|Gianluca|aut| en
dc.description.othersponsorship UCD President's Award 2004 en
dc.description.admin au, da, sp, ke, ab - kpw2/12/11 en
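
The boundary scores quoted in the abstract (recall and precision within ± 20 residues) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `boundary_scores`, the example boundary positions, and the matching rule (a boundary counts as a hit if any boundary on the other side lies within the tolerance, without one-to-one pairing) are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def boundary_scores(
    true_b: Sequence[int], pred_b: Sequence[int], tol: int = 20
) -> Tuple[float, float]:
    """Return (recall, precision) for domain boundary positions.

    Assumed rule (hypothetical, not taken from the paper): a true boundary
    is recovered if some predicted boundary lies within `tol` residues of
    it, and a predicted boundary is correct if some true boundary lies
    within `tol` residues of it.
    """
    hit_true = sum(1 for t in true_b if any(abs(t - p) <= tol for p in pred_b))
    hit_pred = sum(1 for p in pred_b if any(abs(p - t) <= tol for t in true_b))
    # Empty lists score 0.0 by convention in this sketch.
    recall = hit_true / len(true_b) if true_b else 0.0
    precision = hit_pred / len(pred_b) if pred_b else 0.0
    return recall, precision


# Made-up positions: two true boundaries, three predicted ones.
recall, precision = boundary_scores([100, 250], [110, 180, 400])
print(f"recall={recall:.3f} precision={precision:.3f}")
```

In this made-up case only the prediction at residue 110 falls within 20 residues of a true boundary (100), so one of two true boundaries is recovered and one of three predictions is correct.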

This item is available under the Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.

If you are a publisher or author and have copyright concerns for any item, please email research.repository@ucd.ie and the item will be withdrawn immediately. The author or person responsible for depositing the article will be contacted within one business day.