GO REF:Synteny

From GONUTS

Definition

Synteny is classically defined as the "physical co-localization of genetic loci on the same chromosome within an individual or species". In today's genomic era, the term is typically used to denote the conservation of such co-localization across multiple genomes. If two genetic loci, say gene X and gene Y, are found to co-localize in multiple genomes, we refer to them as syntenic. Synteny can arise from several aspects of genome organization. The most obvious examples are found in prokaryotes [1], where genes are often co-transcribed and co-regulated as single transcriptional units (operons), and where the placement of a gene within the genome, and the strand it lies on, can greatly affect its expression. Genes in the same operon are hence often syntenic, in the sense that operon organization is preserved through evolution, and the arrangement of operons within the genome can also be conserved to some extent.
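To make the definition concrete, here is a minimal Python sketch that tests whether two genes co-localize in several genomes. It assumes, hypothetically, that each genome is given as an ordered list of gene labels in which orthologous genes share the same label, and that "co-localized" means lying within a hypothetical `max_distance` positions of each other. Real synteny detection would also need to handle circular chromosomes, strand, paralogs and rearrangements; the gene orders below are toy data.

```python
from typing import Dict, List


def are_syntenic(genomes: Dict[str, List[str]], gene_x: str, gene_y: str,
                 max_distance: int = 1, min_genomes: int = 2) -> bool:
    """Return True if gene_x and gene_y co-localize in at least min_genomes genomes.

    Each genome is an ordered list of gene labels; orthologous genes share a label
    (a simplifying assumption). Two genes co-localize in a genome when their
    positions differ by at most max_distance (a hypothetical window size).
    """
    co_localized = 0
    for gene_order in genomes.values():
        if gene_x in gene_order and gene_y in gene_order:
            distance = abs(gene_order.index(gene_x) - gene_order.index(gene_y))
            if distance <= max_distance:
                co_localized += 1
    return co_localized >= min_genomes


# Toy gene orders, not real genome data.
genomes = {
    "genome_A": ["trpE", "trpD", "trpC"],
    "genome_B": ["trpE", "trpD", "trpC", "lacZ"],
    "genome_C": ["lacZ", "trpD", "orf1", "trpE"],
}
print(are_syntenic(genomes, "trpE", "trpD"))  # True: adjacent in genome_A and genome_B
print(are_syntenic(genomes, "trpE", "lacZ"))  # False: never adjacent
```

The default `min_genomes=2` mirrors the "Species >= 2" requirement in the criteria tables further down this page.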

Usage

In the context of Gene Ontology annotation, synteny can be used to infer molecular function, biological process and even cellular component from the conservation of particular genetic arrangements, provided that sufficient functional information (typically obtained via orthology) is available for the genes in that arrangement in the genomes where it is conserved.

In practice, transferring information from an experimentally GO-annotated gene to a gene product of interest in a target genome follows the same principle as transferring annotations with homology-based methods such as BLASTP or HHpred. Note, however, that synteny-based annotation transfer can follow two alternative scenarios:

  • Direct synteny: based on synteny and partial homology evidence, we transfer an existing GO annotation from another genome to our target gene product. That is, we use synteny to back up a weak homology assessment and transfer the annotation from the identified ortholog. The annotated aspect comes from this weakly identified ortholog.
  • Indirect synteny: based on synteny evidence alone, we transfer an existing GO annotation from the surrounding genes in another genome to the gene product of interest. In other words, we infer function for our gene product based only on its co-localization with other genes for which we can establish homology. The annotated aspect comes from the surrounding identified orthologs (note that this implies that homology-based annotations are likely possible for the surrounding genes).

Depending on the approach used, we may transfer different annotated aspects to the gene product, and the required criteria vary with the aspect being annotated. This is because synteny is more conducive to the assignment of biological process and cellular component than of molecular function: genes in a conserved genomic neighborhood are likely to retain whatever functional requirement clustered them together (i.e. a shared biological process and/or cellular component), but not necessarily a specific molecular function.
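The link between scenario and annotatable aspect can be summarized in a short Python sketch. It encodes only the aspects listed in the criteria tables on this page (molecular function for direct synteny, process/component for indirect synteny); the `Scenario` enum and `can_transfer` helper are hypothetical names, and the aspect strings are the GO namespace names.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Scenario(Enum):
    DIRECT = "direct"      # synteny backs up a weak homology hit for the gene itself
    INDIRECT = "indirect"  # only the surrounding genes have identifiable homologs


# Aspects listed in the criteria tables on this page (release of March 03 2017).
ALLOWED_ASPECTS: Dict[Scenario, FrozenSet[str]] = {
    Scenario.DIRECT: frozenset({"molecular_function"}),
    Scenario.INDIRECT: frozenset({"biological_process", "cellular_component"}),
}


def can_transfer(scenario: Scenario, aspect: str) -> bool:
    """Return True if the given GO aspect may be annotated under this scenario."""
    return aspect in ALLOWED_ASPECTS[scenario]


print(can_transfer(Scenario.DIRECT, "molecular_function"))    # True
print(can_transfer(Scenario.INDIRECT, "molecular_function"))  # False
```

Keeping the mapping in one table makes it easy to update if the criteria are revised in a later release.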

Evidence codes

The following evidence code may be used:

  • IGC (Inferred from Genomic Context)

WITH field

The WITH field must reference, using appropriate identifiers, the conserved genes surrounding the gene of interest that provide functional information on the syntenic region and, if applicable, the ortholog of the gene of interest whose existing GO annotation is being transferred.

Required criteria

The following basic criteria must be met in order to transfer an existing GO annotation supported by an experimental evidence code from a database hit to our sequence of interest.

Direct synteny

  • Aspect: Molecular function
  • E-value: < 1e-2
  • Coverage: > 30%
  • Identities: > 5%
  • Matched neighbors (%): >= 75%
  • Species: >= 2
  • Context: Function must be applicable to organism of interest
  • Release date: March 03 2017
  • Expiration date: none (current)

Criteria specification

  • Aspect: the GO sub-ontology whose terms can be annotated through this method.
  • Organism restriction: the e-value, coverage and identities used to meet the above criteria are those obtained in a follow-up BLASTP search restricted to the organism in which we identified a suitable hit (a protein record with available experimental GO annotations). This restriction is imposed through the Organism field of the NCBI BLAST service, using the appropriate taxonomy identifiers.
  • Coverage: the percentage of the query sequence that BLAST aligns to the database sequence.
  • Identities: the percentage of residues that are exact matches between the query and database sequences in the alignment.
  • Matched neighbors (%): the minimum percentage of neighboring genes that must be matched within a region for it to be considered syntenic.
  • Species: the minimum number of species in which the syntenic organization must be observed.
  • Context evaluation: existing GO annotations for identified database hits must be critically assessed in the context of the organism to which we intend to transfer the annotation. For instance, a GO annotation specifying nuclear localization may not be adequate when transferring the annotation to a bacterial protein (except for secreted proteins in intracellular bacterial pathogens).
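The direct-synteny thresholds can be checked mechanically once the BLASTP numbers are in hand. The Python sketch below assumes a simplified single-HSP hit (the `BlastHit` fields are hypothetical names, not BLAST output columns); NCBI's reported query cover can combine several HSPs, so its value may differ from this single-alignment calculation. The context check stays a manual curator judgment, passed in as a boolean.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Simplified single-HSP BLASTP hit (field names are hypothetical)."""
    evalue: float
    query_length: int      # length of the query protein, in residues
    query_start: int       # 1-based first aligned query residue
    query_end: int         # 1-based last aligned query residue (inclusive)
    identical: int         # number of identical residues in the alignment
    alignment_length: int  # alignment length, including gaps

    @property
    def coverage(self) -> float:
        """Percent of the query covered by the alignment."""
        return 100.0 * (self.query_end - self.query_start + 1) / self.query_length

    @property
    def identity(self) -> float:
        """Percent identical residues over the alignment length."""
        return 100.0 * self.identical / self.alignment_length


def meets_direct_criteria(hit: BlastHit, matched_neighbors: int, total_neighbors: int,
                          n_species: int, context_ok: bool) -> bool:
    """Check the direct-synteny thresholds (molecular function aspect)."""
    matched_pct = 100.0 * matched_neighbors / total_neighbors
    return (hit.evalue < 1e-2
            and hit.coverage > 30
            and hit.identity > 5
            and matched_pct >= 75
            and n_species >= 2
            and context_ok)  # context_ok is the curator's manual judgment


hit = BlastHit(evalue=3e-4, query_length=200, query_start=21, query_end=140,
               identical=30, alignment_length=125)
print(hit.coverage, hit.identity)                 # 60.0 24.0
print(meets_direct_criteria(hit, 3, 4, 2, True))  # True (3 of 4 neighbors = 75%)
```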


Indirect synteny

  • Aspect: Process/Component
  • E-value: < 1e-5
  • Coverage: >= 50%
  • Identities: > 15%
  • Matched neighbors (%): >= 75%
  • Functional neighbors: >= 2
  • Species: >= 2
  • Context: Process/component must be applicable to organism of interest
  • Release date: March 03 2017
  • Expiration date: none (current)

Criteria specification

  • Aspect: the GO sub-ontologies whose terms can be annotated through this method.
  • Organism restriction: the e-value, coverage and identities used to meet the above criteria are those obtained in a BLASTP search for each of the identified neighboring genes, restricted to the organism in which the syntenic region was identified (one of these genes must correspond to a protein record with available experimental GO annotations). This restriction is imposed through the Organism field of the NCBI BLAST service, using the appropriate taxonomy identifiers.
  • Coverage: the percentage of the query sequence that BLAST aligns to the database sequence.
  • Identities: the percentage of residues that are exact matches between the query and database sequences in the alignment.
  • Matched neighbors (%): the minimum percentage of neighboring genes that must be matched within a region for it to be considered syntenic.
  • Functional neighbors: the minimum number of neighbors for which we must have genome-annotated evidence of function.
  • Species: the minimum number of species in which the syntenic organization must be observed.
  • Context evaluation: existing GO annotations for identified database hits must be critically assessed in the context of the organism to which we intend to transfer the annotation. For instance, a GO annotation specifying nuclear localization may not be adequate when transferring the annotation to a bacterial protein (except for secreted proteins in intracellular bacterial pathogens).
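For indirect synteny the thresholds apply to each neighbor rather than to the gene of interest. The Python sketch below is one possible reading of the criteria, stated as an assumption: a neighbor counts as "matched" when its BLASTP hit passes the e-value, coverage and identity cutoffs, and functional neighbors are counted among the matched ones. `NeighborHit` and its fields are hypothetical names, and the hit values are toy data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NeighborHit:
    """BLASTP result for one neighboring gene (field names are hypothetical)."""
    gene_id: str
    evalue: float
    coverage: float     # percent query coverage
    identity: float     # percent identities
    has_function: bool  # True if evidence of function is available for this neighbor


def is_matched(hit: NeighborHit) -> bool:
    """A neighbor counts as matched if its hit passes the indirect-synteny cutoffs."""
    return hit.evalue < 1e-5 and hit.coverage >= 50 and hit.identity > 15


def meets_indirect_criteria(hits: List[NeighborHit], total_neighbors: int,
                            n_species: int, context_ok: bool) -> bool:
    """Check the indirect-synteny thresholds (process/component aspects)."""
    matched = [h for h in hits if is_matched(h)]
    matched_pct = 100.0 * len(matched) / total_neighbors
    functional = sum(1 for h in matched if h.has_function)
    # context_ok is the curator's manual judgment on applicability to the organism
    return matched_pct >= 75 and functional >= 2 and n_species >= 2 and context_ok


hits = [
    NeighborHit("gene1", 1e-30, 90.0, 40.0, True),
    NeighborHit("gene2", 1e-12, 70.0, 25.0, True),
    NeighborHit("gene3", 1e-8, 55.0, 18.0, False),
    NeighborHit("gene4", 1e-3, 80.0, 30.0, True),  # fails the e-value cutoff
]
print(meets_indirect_criteria(hits, total_neighbors=4, n_species=2, context_ok=True))  # True
print(meets_indirect_criteria(hits, total_neighbors=5, n_species=2, context_ok=True))  # False
```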


Required data and allowed evidence codes

Annotation metadata

Annotations made using synteny must contain the following information in the CACAO annotation comments section:

  • E-value: the e-value returned by BLASTP (for the ortholog of the gene of interest and for each neighbor)
  • Coverage: the percent query coverage (for the ortholog of the gene of interest and for each neighbor)
  • Identities: the percent identities (for the ortholog of the gene of interest and for each neighbor)
  • Context: a short written justification of why the existing GO annotation for the identified database hit can be applied to the organism being annotated, explicitly mentioning the characteristics of the syntenic region (and its constituent genes) used to make the inference
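Assembling this metadata by hand is error-prone, so a small helper can format it consistently. The Python sketch below is hypothetical: CACAO comments are free text, and the line layout produced here is our own assumption rather than a required format. It refuses to build a comment without BLASTP statistics or a context justification, since both are required above.

```python
from typing import Dict, Tuple


def build_synteny_comment(stats: Dict[str, Tuple[float, float, float]], context: str) -> str:
    """Format BLASTP statistics and the context justification as comment text.

    stats maps each gene identifier (the ortholog of the gene of interest and each
    neighbor) to its (e-value, percent coverage, percent identities).
    """
    if not stats:
        raise ValueError("BLASTP statistics are required for at least one gene")
    if not context.strip():
        raise ValueError("a written context justification is required")
    lines = [f"{gene}: E-value {evalue:.1e}, Coverage {cov:.0f}%, Identities {ident:.0f}%"
             for gene, (evalue, cov, ident) in stats.items()]
    lines.append(f"Context: {context.strip()}")
    return "\n".join(lines)


print(build_synteny_comment(
    {"geneA": (3e-4, 60.0, 24.0), "geneB": (1e-12, 70.0, 25.0)},
    "Neighbor order conserved in two species; neighbors encode a shared pathway.",
))
```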

Evidence code suitability

  • IGC: IGC (Inferred from Genomic Context) is the only evidence code applicable to synteny-based annotations.

References

  1. Moreno-Hagelsieb G, et al. (2001) Transcription unit conservation in the three domains of life: a perspective from Escherichia coli. Trends Genet. 17:175-7.