GO REF:BLASTP

From GONUTS

Definition

BLASTP is a version of BLAST[1] that searches protein databases using a protein sequence as query. BLASTP is available as a web service on most sequence repositories, including NCBI and EBI.

Usage

In the context of Gene Ontology annotation, BLASTP is used primarily to identify protein sequences similar to the protein sequence we wish to annotate. Once homology is established through sequence search, an existing GO annotation on a database hit that is supported by an experimental evidence code may be transferred to the protein sequence of interest. Several criteria (outlined below) must be met to perform the annotation.

Evidence codes

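The following evidence codes may be used: ISA (recommended), ISO (requires a reciprocal BLASTP search) and ISS (discouraged). See "Evidence code suitability" for details.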

WITH field

The database sequence used in the preexisting GO annotation must be referenced in the WITH field using an appropriate identifier.


Required criteria

All of the following basic criteria must be met in order to transfer an existing GO annotation, supported by an experimental evidence code, from a database hit to the sequence of interest:

  • E-value: < 1e-7
  • Coverage: > 75%
  • Identities: > 25%
  • Context: function/process/component must be applicable to the organism of interest

Release date: March 03 2017. Expiration date: (current).
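The thresholds above can be expressed as a simple check. The following is a minimal sketch, not part of the GONUTS tooling: the `BlastpHit` class, its field names and the `meets_transfer_criteria` function are hypothetical names introduced for illustration, and the context criterion is reduced to a flag that the curator sets after manual review.

```python
from dataclasses import dataclass

# Thresholds taken from the required criteria (all comparisons are strict).
MAX_EVALUE = 1e-7
MIN_COVERAGE_PCT = 75.0
MIN_IDENTITY_PCT = 25.0


@dataclass(frozen=True)
class BlastpHit:
    """Hypothetical container for the values a curator reads off a BLASTP hit."""
    evalue: float
    coverage_pct: float       # percent query coverage
    identity_pct: float       # percent identities
    context_applicable: bool  # curator's judgment after reviewing the annotation


def meets_transfer_criteria(hit: BlastpHit) -> bool:
    """Return True only if every required criterion is satisfied."""
    return (
        hit.evalue < MAX_EVALUE
        and hit.coverage_pct > MIN_COVERAGE_PCT
        and hit.identity_pct > MIN_IDENTITY_PCT
        and hit.context_applicable
    )
```

A hit that sits exactly on a threshold (for example, 75% coverage) fails the check, because the criteria are stated as strict inequalities.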

Criteria specification

  • Organism restriction: the e-value, coverage and identity used to meet the above criteria are those obtained in a follow-up BLASTP search restricted to the organism in which a suitable hit (a protein record with available experimental GO annotations) was identified. This restriction is imposed through the Organism qualifier in the NCBI BLAST service, using the appropriate taxonomy identifiers.
  • Coverage: the percentage of the query sequence that BLAST aligns to the database sequence.
  • Identities: the percentage of aligned residues that are exact matches between the query and database sequences.
  • Context evaluation: existing GO annotations for identified database hits must be critically assessed in the context of the organism to which we intend to transfer the annotation. For instance, a GO annotation specifying nuclear localization may not be adequate when transferring the annotation to a bacterial protein (except for secreted proteins in intracellular bacterial pathogens).
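To make the coverage and identity definitions concrete, the sketch below computes both percentages for a single alignment. It rests on stated assumptions: the function and parameter names are hypothetical, query coordinates are 1-based and inclusive, and only one aligned segment is considered. The BLAST report already gives both values, so this only illustrates the arithmetic.

```python
def query_coverage_pct(query_start: int, query_end: int, query_length: int) -> float:
    """Percent of the query covered by one alignment (1-based, inclusive coordinates)."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    if not 1 <= query_start <= query_end <= query_length:
        raise ValueError("alignment coordinates must lie within the query")
    aligned_residues = query_end - query_start + 1
    return 100.0 * aligned_residues / query_length


def identity_pct(identical_positions: int, alignment_length: int) -> float:
    """Percent of alignment positions where query and database residues match exactly."""
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    if not 0 <= identical_positions <= alignment_length:
        raise ValueError("identical_positions must be between 0 and alignment_length")
    return 100.0 * identical_positions / alignment_length
```

For example, an alignment spanning query positions 11 to 190 of a 200-residue query covers 180 of 200 residues, and 60 identical positions over an alignment length of 180 correspond to one third of the aligned positions.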

Required data and allowed evidence codes

Annotation metadata

Annotations made using BLASTP must contain the following information in the CACAO annotation comments section:

  • E-value: the e-value returned by BLASTP
  • Coverage: the percent query coverage
  • Identities: the percent identities
  • Context: a short, written justification of why the existing GO annotation for the identified database hit can be applied to the organism being annotated
  • ISO-specific metadata: for ISO-based annotations, curators must also provide the e-value of the reciprocal BLASTP hit (the hit obtained using the identified database sequence as query and restricting the search, through the Organism qualifier, to the organism being annotated).
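One way to avoid omitting a required field is to assemble the comment text programmatically. The sketch below is an illustration only: CACAO does not prescribe this layout as far as this page states, so the `format_blastp_comment` function, the field separator and the "Reciprocal E-value" label are all assumptions.

```python
from typing import Optional


def format_blastp_comment(
    evalue: float,
    coverage_pct: float,
    identity_pct: float,
    context: str,
    evidence_code: str = "ISA",
    reciprocal_evalue: Optional[float] = None,
) -> str:
    """Build a comment string holding the metadata required for BLASTP annotations."""
    if not context.strip():
        raise ValueError("a written context justification is required")
    if evidence_code == "ISO" and reciprocal_evalue is None:
        raise ValueError("ISO annotations require the reciprocal BLASTP e-value")

    fields = [
        f"E-value: {evalue:.2e}",
        f"Coverage: {coverage_pct:.1f}%",
        f"Identities: {identity_pct:.1f}%",
        f"Context: {context.strip()}",
    ]
    # The reciprocal e-value is mandatory for ISO and optional otherwise.
    if reciprocal_evalue is not None:
        fields.append(f"Reciprocal E-value: {reciprocal_evalue:.2e}")
    return "; ".join(fields)
```

Raising an error for a missing context or a missing reciprocal e-value mirrors the requirement that these items must be present, rather than silently producing an incomplete comment.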

Evidence code suitability

  • ISS: BLASTP annotations should preferentially use the more granular ISA evidence code. Use of ISS should be avoided.
  • ISA: ISA is the recommended evidence code for BLASTP annotations.
  • ISO: curators wanting to annotate using ISO must perform a reciprocal BLASTP search. The reciprocal BLASTP search must meet the same criteria as the original one.
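The suitability rules above amount to a small decision procedure. The sketch below is one possible reading of them, under assumptions: the `allowed_evidence_codes` function is a hypothetical name, its inputs are the outcomes of checking the forward and reciprocal searches against the required criteria, and ISS is left out because its use should be avoided.

```python
from typing import List, Optional


def allowed_evidence_codes(
    forward_meets_criteria: bool,
    reciprocal_meets_criteria: Optional[bool] = None,
) -> List[str]:
    """List evidence codes a curator may use, with the recommended code first.

    reciprocal_meets_criteria is None when no reciprocal search was performed.
    """
    if not forward_meets_criteria:
        return []  # no annotation transfer without a qualifying BLASTP hit
    codes = ["ISA"]  # recommended code for BLASTP annotations
    if reciprocal_meets_criteria:
        codes.append("ISO")  # only when the reciprocal search also qualifies
    return codes
```

Returning an empty list when the forward search fails keeps the required criteria as a precondition for any evidence code.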

References

  1. Altschul, SF et al. (1990) Basic local alignment search tool. J. Mol. Biol. 215:403-10.