GO REF:HHpred
Definition
HHpred[1] is a tool for searching protein sequence and structure databases using an amino acid sequence as a query. Instead of directly aligning the query sequence against database sequences, as BLAST does, HHpred builds a hidden Markov model (HMM) for the query (from an initial PSI-BLAST alignment against a sequence database) and matches this model against databases of HMMs generated from both sequence and structure databases. The database HMMs incorporate structural and sequence information, making HHpred one of the most sensitive servers for remote homology detection. HHpred is available as a web service at the Max Planck Institute for Developmental Biology in Tübingen.
Usage
In the context of Gene Ontology annotation, HHpred is used primarily to identify sequences similar to the sequence of interest (the query) that we wish to annotate. Once we have validated that the identified database sequence meets the criteria provided below to establish homology between the two sequences, experimental code-based annotations on the database sequence can be transferred to the sequence of interest (our query), citing the CACAO GO_REF as the source and using the appropriate evidence codes.
Caveats
Note that HHpred is capable of detecting very distant relationships, which may rely on partial alignments with structurally conserved protein domains. In other words, HHpred can detect very weak similarities based only on fragmented matches with database sequences. Critical assessment and interpretation of HHpred results are therefore required before making GO annotations on the basis of HHpred evidence.
Evidence codes
The following evidence codes may be used:
WITH field
The database sequence used in the preexisting GO annotation must be referenced in the WITH field using an appropriate identifier.
Required criteria
Several criteria are used to inspect and validate that alignments returned by HHpred support sequence homology and can be used to transfer existing GO annotations for a database hit to our sequence of interest. These criteria cover the overall HHpred result statistics (e.g. p-value), the quality and extent of the generated alignments, and the transferability of an existing annotation to the biological context of the sequence of interest.
The following basic criteria must be met in order to transfer an existing experimental code-based GO annotation for a database hit to our sequence of interest:
P-value | | Domain function | | Domain coverage | | Context | Release date | Expiration date |
---|---|---|---|---|---|---|---|---|
> 0.9 | & | Identified in GO annotation | or | > 50% | & | Function/process/component must be applicable to organism of interest | March 05 2017 | (current) |
Criteria specification
- P-value: the p-value returned by HHpred. HHpred does not return reliable e-values due to the complexity of properly assessing the search space. P-values over 0.9 are required, but values over 0.95 are recommended to make reasonably sound inferences of homology.
- Domain attributes (meeting either of the two domain attribute criteria is sufficient)
  - Domain function: HHpred may return statistically significant results based only on a partial domain match. In that case, the domain identified by HHpred must be explicitly cited as being involved in the function/process/component identified in the existing GO annotation for the database sequence (i.e. the authors of the experimental work must have explicitly reported the involvement of the domain in the observed function/process/component).
  - Domain coverage: if the HHpred search matches several domains of a database protein sequence, and those are not explicitly mentioned in the existing GO annotation, at least half of the query protein domains must be present in the matching database protein.
- Context evaluation: existing GO annotations for identified database hits must be critically assessed in the context of the organism to which we intend to transfer the annotation. For instance, a GO annotation specifying nuclear localization may not be adequate when transferring the annotation to a bacterial protein (except for secreted proteins in intracellular bacterial pathogens).
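As an illustration only, the criteria above can be sketched as a small Python check. The names `HHpredHit` and `failed_criteria` are hypothetical and not part of HHpred or CACAO, the curator supplies every field by hand after inspecting the HHpred results, and the coverage test follows the "at least half" wording of the criteria specification (an assumption, since the summary table states "> 50%").

```python
from dataclasses import dataclass


@dataclass
class HHpredHit:
    """Curator-entered summary of one HHpred hit (hypothetical structure)."""
    p_value: float               # p-value as reported by HHpred
    domain_function_cited: bool  # matched domain explicitly tied to the annotated term in the source paper
    matched_query_domains: int   # query domains found in the database protein
    total_query_domains: int     # all domains in the query protein
    context_applicable: bool     # curator judged the term applicable to the target organism


def failed_criteria(hit: HHpredHit) -> list:
    """Return the names of the basic criteria that the hit does not meet."""
    failures = []
    # Required threshold is over 0.9 (over 0.95 is recommended).
    if not hit.p_value > 0.9:
        failures.append("p-value")
    # Assumption: "at least half" is read as a ratio >= 0.5.
    coverage_ok = (
        hit.total_query_domains > 0
        and hit.matched_query_domains / hit.total_query_domains >= 0.5
    )
    # Either domain attribute criterion is sufficient.
    if not (hit.domain_function_cited or coverage_ok):
        failures.append("domain attributes")
    if not hit.context_applicable:
        failures.append("context")
    return failures
```

An empty list means the three basic criteria are met. The check does not replace the critical assessment of the alignment itself.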
Required data and allowed evidence codes
Annotation metadata
Annotations made using HHpred must contain the following information in the CACAO annotation comments section:
- P-value: the p-value returned by HHpred.
- Domain function: a domain name followed by a PMID identifier to the paper describing the active involvement of that domain in the function/process/component designated in the original GO annotation.
- Domain coverage: the names of query protein domains matched in the database sequence, and the ratio XX/YY of matched to total query domains.
- Context: a short, written justification of why the existing GO annotation for the identified database hit can be applied to the organism being annotated.
- ISO-specific metadata: for ISO-based annotations, curators must provide the BLASTP e-value for a reciprocal BLASTP hit (a BLASTP search using the identified database sequence as the query, restricted to the organism being annotated by means of the Organism qualifier).
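The metadata items above could be assembled into a comment with a helper like the following sketch. CACAO does not prescribe a comment layout here, so the function name `build_hhpred_comment`, the line labels and the one-item-per-line format are all assumptions. The sketch also assumes that at least one of the two domain items must be supplied, mirroring the domain attribute criteria.

```python
def build_hhpred_comment(p_value, context, domain_function=None,
                         domain_coverage=None, reciprocal_evalue=None):
    """Assemble the annotation comment text (hypothetical layout).

    domain_function: (domain_name, pmid) tuple, or None
    domain_coverage: (list_of_matched_domain_names, total_query_domains), or None
    reciprocal_evalue: reciprocal BLASTP e-value, only for ISO annotations
    """
    # Assumption: at least one domain item is needed, as in the criteria.
    if domain_function is None and domain_coverage is None:
        raise ValueError("provide domain function or domain coverage")
    lines = [f"P-value: {p_value}"]
    if domain_function is not None:
        name, pmid = domain_function
        lines.append(f"Domain function: {name} PMID:{pmid}")
    if domain_coverage is not None:
        names, total = domain_coverage
        # Ratio XX/YY of matched to total query domains.
        lines.append(f"Domain coverage: {', '.join(names)} ({len(names)}/{total})")
    lines.append(f"Context: {context}")
    if reciprocal_evalue is not None:
        lines.append(f"Reciprocal BLASTP e-value: {reciprocal_evalue}")
    return "\n".join(lines)
```

For example, `build_hhpred_comment(0.97, "justification text", domain_coverage=(["domainA", "domainB"], 3))` yields a comment whose coverage line reads `Domain coverage: domainA, domainB (2/3)`; the domain names are placeholders.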
Evidence code suitability
- ISS: HHpred annotations should preferentially use the ISS evidence code. Use of ISA should be avoided.
- ISA: the ISA evidence code is not recommended for HHpred search, since alignments are not strictly sequence based.
- ISO: curators wanting to annotate using ISO must perform a reciprocal BLASTP search. The reciprocal BLASTP search must meet the same criteria outlined for BLASTP annotations (see GO_REF:BLASTP).
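The suitability rules above amount to a small lookup, sketched below. The function `evidence_code_status` and its return strings are hypothetical, and the `reciprocal_blastp_passed` flag is assumed to be set by the curator after checking the reciprocal search against the BLASTP criteria, which are not reproduced here.

```python
def evidence_code_status(code, reciprocal_blastp_passed=False):
    """Classify an evidence code for an HHpred-based annotation (illustrative only)."""
    code = code.upper()
    if code == "ISS":
        return "preferred"
    if code == "ISA":
        # HHpred alignments are not strictly sequence based.
        return "not recommended"
    if code == "ISO":
        # ISO additionally requires a reciprocal BLASTP search that meets the BLASTP criteria.
        return "allowed" if reciprocal_blastp_passed else "needs reciprocal BLASTP"
    # Codes not discussed in this section.
    return "not covered"
```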
References
- [1] Söding, J et al. (2005) The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 33 W244-8.