Ab initio and homology based prediction of protein domains by recursive neural networks



Type: Abstract
Summary:

Abstract
Background: Proteins, especially larger ones, are often composed of individual evolutionary units, domains,
which have their own function and structural fold. Predicting domains is an important intermediate step in
protein analyses, including the prediction of protein structures.
Results: We describe novel systems for the prediction of protein domain boundaries powered by Recursive
Neural Networks. The systems rely on a combination of primary sequence and evolutionary information,
predictions of structural features such as secondary structure, solvent accessibility and residue contact maps,
and structural templates, both annotated for domains (from the SCOP dataset) and unannotated (from the
PDB). We gauge the contribution of contact maps, and PDB and SCOP templates independently and for
different ranges of template quality. We find that accurately predicted contact maps are informative for the
prediction of domain boundaries, while the same is not true for contact maps predicted ab initio. We also find
that gap information from PDB templates is informative, but, not surprisingly, less so than SCOP annotations.
We test both systems trained on templates of all qualities, and systems trained only on templates of marginal
similarity to the query (less than 25% sequence identity). While the first batch of systems produces near perfect
predictions in the presence of fair to good templates, the second batch outperforms or matches ab initio predictors
down to essentially any level of template quality.
We test all systems in 5-fold cross-validation on a large non-redundant set of multi-domain and single domain proteins. The final predictors are state-of-the-art, with a template-less prediction boundary recall of 50.8%
(precision 38.7%) within ±20 residues and a single domain recall of 80.3% (precision 78.1%). The SCOP-based
predictors achieve a boundary recall of 74% (precision 77.1%) again within ±20 residues, and classify single
domain proteins as such in over 85% of cases, when we allow a mix of bad and good quality templates. If we
only allow marginal templates (max 25% sequence identity to the query) the scores remain high, with boundary
recall and precision of 59% and 66.3%, and 80% of all single domain proteins predicted correctly.
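
To make the boundary scores above concrete, the following is a minimal sketch of how recall and precision within a ±20 residue window could be computed for one protein. It rests on assumptions not stated in the abstract: a true boundary counts as recalled if any predicted boundary lies within the tolerance, and a predicted boundary counts as correct if any true boundary lies within the tolerance. The function name `boundary_scores` and the example positions are hypothetical, and the authors' exact matching rule and averaging over the dataset may differ.

```python
def boundary_scores(true_boundaries, predicted_boundaries, tolerance=20):
    """Return (recall, precision) for one protein.

    Assumed rule (not taken from the paper): two boundary positions
    match when they are at most `tolerance` residues apart.
    A score is None when its denominator would be zero.
    """
    def near(position, others):
        # True if any position in `others` is within the tolerance window.
        return any(abs(position - other) <= tolerance for other in others)

    recalled = sum(near(t, predicted_boundaries) for t in true_boundaries)
    correct = sum(near(p, true_boundaries) for p in predicted_boundaries)

    recall = recalled / len(true_boundaries) if true_boundaries else None
    precision = correct / len(predicted_boundaries) if predicted_boundaries else None
    return recall, precision


# Hypothetical example: two annotated boundaries, three predicted ones.
recall, precision = boundary_scores([120, 250], [110, 300, 255])
print(recall, precision)
```

In this example both annotated boundaries have a prediction within 20 residues, while the prediction at residue 300 is 50 residues from the nearest annotation, so it lowers precision without affecting recall.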
Conclusions: The systems presented here may prove useful in large-scale annotation of protein domains in
proteins of unknown structure. The methods are available as public web servers at the address:
http://distill.ucd.ie/shandy/ and we plan to run them on a multi-genomic scale and make the results public
in the near future.

 

Background
Proteins, especially larger ones, are often composed of individual evolutionary units, domains, which have
their own function and structural fold. Predicting domains is an important intermediate step in protein
analyses, including the prediction of protein structures. In this case the prediction can be applied to each
protein domain separately, decreasing prediction times, and increasing prediction accuracy especially in the
absence of homologues/templates and when interactions among residues are long ranging. Although
domain-domain interactions would have to be ignored when predicting domain structures separately, stages
for domain-domain interaction prediction can be designed [1, 2] to tie the domains together resulting in the
final three dimensional (3D) structure. The detection of structural templates from sequence can also be
improved when only considering the sequence that corresponds to each domain, since the domain itself is
more likely to be evolutionarily conserved. Fold recognition methods also perform better when using
individual domains rather than the entire protein [3].
Experimental structural determination methods become hard to apply when considering large proteins of
many domains. In X-Ray crystallography and NMR spectroscopy difficulties often arise when protein
domains are joined by less flexible boundary regions. Also, NMR structural determination errors tend to
arise when the protein is very long. As a result, experimental methods often determine structures by only
examining individual domains or at most a few domains together [4, 5].

 

