GO term info from the GO Documentation

From GONUTS


This page contains excerpts from the GO documentation on the Gene Ontology Home site.

Return to the Compilation page.

From the Introduction

It is important to clearly state the scope of GO, and what it does and does not cover. The ontologies section explains the domains covered by GO; the following areas are outside the scope of GO, and terms in these domains would not appear in the ontologies.

  • Gene products: e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.
  • Processes, functions or components that are unique to mutants or diseases: e.g. oncogenesis is not a valid GO term because causing cancer is not the normal function of any gene.
  • Attributes of sequence such as intron/exon parameters: these are not attributes of gene products and will be described in a separate sequence ontology (see the OBO website for more information).
  • Protein domains or structural features.
  • Protein-protein interactions.
  • Environment, evolution and expression.
  • Anatomical or histological features above the level of cellular components, including cell types.

GO is not a database of gene sequences, nor a catalog of gene products. Rather, GO describes how gene products behave in a cellular context.

GO is not a dictated standard, mandating nomenclature across databases. Groups participate because of self-interest, and cooperate to arrive at a consensus.

GO is not a way to unify biological databases (i.e. GO is not a 'federated solution'). Sharing vocabulary is a step towards unification, but is not, in itself, sufficient. Reasons for this include the following:

  • Knowledge changes and updates lag behind.
  • Individual curators evaluate data differently. While we can agree to use the word 'kinase', we must also agree to support this by stating how and why we use 'kinase', and consistently apply it. Only in this way can we hope to compare gene products and determine whether they are related.
  • GO does not attempt to describe every aspect of biology; its scope is limited to the domains described above.


From Guide to GO Evidence Codes

  • Annotations to protein binding ; GO:0005515 should not be used to describe an antibody binding to another protein. However, an effect of an antibody on an activity or process can support a function or process annotation, using the IMP code.


From Component Ontology

Protein Complexes

A cellular component should include more than one gene product; complexes of one gene product with a cofactor, e.g. heme or chlorophyll, should not be included. Homomultimeric proteins, e.g. the homodimeric alcohol dehydrogenase, may be included as cellular component terms, as may heteromultimeric proteins, e.g. hemoglobin with its alpha and beta chains.

All complexes in the component ontology should be given parentage under the general term protein complex ; GO:0043234.

To distinguish cellular components from functions, use 'complex' in the term name of a component, and append the word 'activity' to enzyme names. For example, the molecular function term pyruvate dehydrogenase activity ; GO:0004738 describes the enzyme activity, whereas the cellular component term pyruvate dehydrogenase complex ; GO:0045254 describes the multi-subunit structure in which the enzyme activity resides.
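The parentage rule for complexes lends itself to an automated check. The sketch below is a minimal illustration, assuming a toy graph stored as a Python dict that maps each term ID to its is_a parents. The edges are simplified rather than copied from a GO release, the ID EX:0000001 is a made-up placeholder, and the functions `ancestors` and `misplaced_complexes` are hypothetical helpers, not part of any GO tool. It flags any term whose name ends in 'complex' but that has no path up to protein complex ; GO:0043234.

```python
from collections import deque

# Toy cellular component graph: term ID -> set of is_a parent IDs.
# Edges are simplified for illustration, not copied from a GO release.
IS_A = {
    "GO:0043234": {"GO:0005575"},  # protein complex (simplified: directly under the root)
    "GO:0045254": {"GO:0043234"},  # pyruvate dehydrogenase complex
    "EX:0000001": {"GO:0005575"},  # made-up complex that lacks the required parent
}

NAMES = {
    "GO:0005575": "cellular_component",
    "GO:0043234": "protein complex",
    "GO:0045254": "pyruvate dehydrogenase complex",
    "EX:0000001": "example misplaced complex",
}

PROTEIN_COMPLEX = "GO:0043234"


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent edges."""
    seen = set()
    queue = deque(parents.get(term, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(parents.get(node, ()))
    return seen


def misplaced_complexes(names, parents):
    """IDs of terms named '... complex' that do not descend from protein complex."""
    return sorted(
        term
        for term, name in names.items()
        if name.endswith(" complex")
        and term != PROTEIN_COMPLEX
        and PROTEIN_COMPLEX not in ancestors(term, parents)
    )


print(misplaced_complexes(NAMES, IS_A))  # ['EX:0000001']
```

Matching on the term name is only a heuristic; a curator would still decide whether a flagged term really is a complex.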


Membranes and Envelopes

  • Terms and structure

GO distinguishes single and double membranes surrounding organelles: an organelle envelope is defined as two lipid bilayers plus the space, or lumen, between them, whereas an organelle membrane is defined as a single bilayer. For double-membrane organelles, the membrane term refers to either of the lipid bilayers, but excludes the intermembrane space. The envelope term for an organelle is part_of that organelle and is_a organelle envelope ; GO:0031967; the membrane term is part_of the envelope, and inner membrane and outer membrane terms can also be included.
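To see how part_of and is_a combine across these terms, here is a minimal sketch that stores a toy nucleus graph as (child, relation, parent) triples. Modelling the inner and outer membrane terms as is_a children of nuclear membrane is an assumption made for illustration, and `is_part_of` is a hypothetical helper. It applies the usual OBO reasoning rule that an upward chain of is_a and part_of edges containing at least one part_of edge implies part_of.

```python
# Toy nucleus graph as (child, relation, parent) triples.
# Assumption: the inner and outer membranes are modelled as is_a children of
# nuclear membrane, since each is one of the two bilayers.
EDGES = [
    ("nuclear envelope", "is_a", "organelle envelope"),
    ("nuclear envelope", "part_of", "nucleus"),
    ("nuclear membrane", "part_of", "nuclear envelope"),
    ("nuclear inner membrane", "is_a", "nuclear membrane"),
    ("nuclear outer membrane", "is_a", "nuclear membrane"),
]


def is_part_of(term, whole, edges=EDGES):
    """True if an upward path from `term` reaches `whole` through is_a and
    part_of edges and uses at least one part_of edge (is_a followed by part_of,
    part_of followed by is_a, and part_of followed by part_of all give part_of)."""
    stack = [(term, False)]  # (current term, has a part_of edge been used yet?)
    seen = set()
    while stack:
        node, via_part_of = stack.pop()
        if node == whole and via_part_of:
            return True
        if (node, via_part_of) in seen:
            continue
        seen.add((node, via_part_of))
        for child, relation, parent in edges:
            if child == node:
                stack.append((parent, via_part_of or relation == "part_of"))
    return False


print(is_part_of("nuclear inner membrane", "nucleus"))       # True
print(is_part_of("nuclear envelope", "organelle envelope"))  # False: is_a only
```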

  • History

Prior to December 2005, GO:0005635 was named nuclear membrane, with nuclear envelope as a synonym; this reflected a usage fairly common in the literature. For consistency with other organelle envelope and membrane terms, GO:0005635 is now named nuclear envelope, consistent with its definition, and a separate term, nuclear membrane ; GO:0031965, has been added.

  • Membrane Proteins

As GO cellular component terms describe locations where a gene product may act, rather than physical features of proteins or RNAs, the terms integral membrane protein and peripheral membrane protein are present only as non-exact synonyms.

GO distinguishes classes of membrane-related location:

extrinsic to membrane ; GO:0019898 refers to gene products that are associated with membranes, but are neither directly embedded in the membrane nor anchored by covalent bonds to any moiety embedded in the membrane.

intrinsic to membrane ; GO:0031224 refers to gene products that have some covalently attached moiety embedded in the membrane, and is further split into integral to membrane ; GO:0016021 and anchored to membrane ; GO:0031225. The former refers to proteins in which some part of the peptide sequence spans all or part of the membrane (in theory, it could also be used for RNAs embedded in a membrane, if any such exist); the latter refers to gene products tethered to a membrane by a covalently attached anchor, such as a lipid moiety, which is embedded in the membrane.

Each of these terms can have child terms referring to specific membranes, for example integral to plasma membrane ; GO:0031226 or extrinsic to vacuolar membrane ; GO:0000306.
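The three location classes can be read as a decision procedure. The function below is a hypothetical sketch, assuming the relevant facts about a gene product are already known as three booleans; the order of the checks (peptide first, then covalent anchor, then mere association) is an assumed precedence for gene products that fit more than one description. It returns the generic term ID; a curator would then choose a child term for the specific membrane where one exists.

```python
from typing import Optional

# Generic membrane-location term IDs named in the text above.
INTEGRAL_TO_MEMBRANE = "GO:0016021"
ANCHORED_TO_MEMBRANE = "GO:0031225"
EXTRINSIC_TO_MEMBRANE = "GO:0019898"


def membrane_location_term(peptide_in_membrane: bool,
                           embedded_covalent_anchor: bool,
                           associated_with_membrane: bool) -> Optional[str]:
    """Hypothetical helper: pick a generic membrane-location term ID.

    The order of the checks is an assumed precedence for gene products
    that match more than one description."""
    if peptide_in_membrane:
        # Part of the peptide spans all or part of the bilayer: intrinsic, integral.
        return INTEGRAL_TO_MEMBRANE
    if embedded_covalent_anchor:
        # Tethered by a covalently attached anchor (e.g. a lipid): intrinsic, anchored.
        return ANCHORED_TO_MEMBRANE
    if associated_with_membrane:
        # Associated, but neither embedded nor covalently anchored.
        return EXTRINSIC_TO_MEMBRANE
    return None  # no membrane association, so none of these terms applies


print(membrane_location_term(False, True, True))  # GO:0031225
```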

Note that even these terms actually describe spatial relationships between a membrane and a gene product, and therefore do not fit the strictest interpretation of a 'location'. They are retained nevertheless because of their considerable utility.

The cellular component ontology does not include terms for type I, II, etc., membrane proteins, because these classifications are not locations, but instead describe a different feature of the proteins, namely topological orientation with respect to the membrane and other cellular components. Furthermore, a name such as 'type I integral membrane protein' describes a class of gene products rather than a location.

Figure: DAG structure illustrating the intrinsic to membrane, extrinsic to membrane and anchored to membrane terms.


Maintaining complete is_a and part_of trees in cellular component

The cellular component ontology is now is_a complete, meaning that every term has a path to the root node which passes solely through is_a relationships. This should be preserved; the following guidelines should help maintain this structure.
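Checking is_a completeness amounts to a graph search that may follow only is_a edges. A minimal sketch, assuming a toy dict that maps each term ID to (relation, parent) pairs, with cellular_component (GO:0005575) as the root; the edges are simplified for illustration, EX:0000002 is a made-up term that has only a part_of parent, and the helper names are hypothetical.

```python
from collections import deque

ROOT = "GO:0005575"  # cellular_component

# Toy graph: term ID -> list of (relation, parent ID). Edges are simplified.
PARENTS = {
    "GO:0043234": [("is_a", ROOT)],
    "GO:0045254": [("is_a", "GO:0043234")],
    "EX:0000002": [("part_of", "GO:0043234")],  # made-up term with no is_a parent
}


def has_is_a_path(term, parents, root=ROOT):
    """Breadth-first search from `term` that follows is_a edges only."""
    queue, seen = deque([term]), {term}
    while queue:
        node = queue.popleft()
        if node == root:
            return True
        for relation, parent in parents.get(node, []):
            if relation == "is_a" and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def is_a_incomplete(parents, root=ROOT):
    """Terms that cannot reach the root through is_a edges alone."""
    return sorted(t for t in parents if not has_is_a_path(t, parents, root))


print(is_a_incomplete(PARENTS))  # ['EX:0000002']
```

A flagged term is not necessarily misplaced; it may simply need an is_a parent alongside its existing part_of parent.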


From Function Ontology

  • The essence of a function term

The functions of a gene product are the jobs that it does or the "abilities" that it has. These may include transporting things around, binding to things, holding things together and changing one thing into another. This is different from the biological processes the gene product is involved in, which involve more than one activity.

Annotating a gene product with different functions

A gene product may have many different functions, but it would be wrong to create a function term that represents multiple functions. Gene product information should be captured at the annotation stage, by annotating the gene product to several function terms, rather than by hardwiring the information into the ontology by adding extra parents. If a term has parentage which isn't immediately obvious from the term name or the definition, and therefore requires you to have background knowledge, then it's probable that the function term has been mistaken for the gene product of the same name and gene product specific information has been incorporated by adding extra parents.

Granularity

GO functions describe interactions at the level of molecules, rather than atoms. Therefore a reaction would not be split into function terms describing each step of the reaction in atomic or subatomic terms (e.g. electrons attracted to positive charge or formation of unstable intermediate); it would consider the starting state and the end state in terms of the molecules involved. As a consequence, separate function terms would not be created to cover the various situations in which different reaction mechanisms provide the route between the same set of reactants and products. In addition, GO functions should not cover reactions that always occur spontaneously and without the need for a gene product catalyst. Since there is no gene product involved in such a reaction, the term would never be used for annotation.

Bidirectional reactions

For bidirectional reactions, we will create a single term that describes both directions of the reaction, unless there is a biological justification for separating the two directions into separate concepts.
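One way to picture "a single term for both directions" is to give each reaction a direction-independent key. The sketch below is illustrative only and does not reflect how GO itself stores reactions: `reaction_key` is a hypothetical helper that treats each side as a multiset of participants and the two sides as an unordered pair, so writing the malate dehydrogenase reaction forwards or backwards yields the same key, and therefore the same entry in a term lookup table.

```python
from collections import Counter


def reaction_key(left, right):
    """Hypothetical direction-independent key for a reaction left -> right.

    Each side becomes a sorted tuple of (participant, count) pairs, so
    stoichiometry is kept, and the two sides are sorted so that the
    forward and reverse directions give the same key."""
    sides = [tuple(sorted(Counter(left).items())),
             tuple(sorted(Counter(right).items()))]
    return tuple(sorted(sides))


forward = reaction_key(["(S)-malate", "NAD+"], ["oxaloacetate", "NADH", "H+"])
reverse = reaction_key(["oxaloacetate", "NADH", "H+"], ["NAD+", "(S)-malate"])

# One lookup table entry serves both directions.
terms = {forward: "malate dehydrogenase activity"}
print(terms[reverse])  # malate dehydrogenase activity
```

Creating a separate term for one direction, when there is biological justification, would mean adding direction back into the key.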

Valid function terms

These are some guidelines for deciding whether a term is a valid molecular function or not.

For a function term that considers binding you must know the molecule that is being bound. For example you wouldn't say 'vesicle binding'; instead you would find out which protein in the vesicle membrane was being bound and use that in the term name.

The function must be a single reaction step. Anything that requires multiple steps is a process.

Functions are not restricted to the activities of single gene products; multi-gene product complexes can also have functions.

Do not confuse the following:

  • Two things that happen at the same time or that are done by the same molecule.
  • Two things that are dependent on each other and cannot occur independently.

For example, the proposed term actin binding with sliding actually includes the two functions binding and motor activity, so it is not appropriate as a function term. However, calcium-transporting ATPase activity represents two activities that are dependent on each other and cannot occur independently; thus calcium-transporting ATPase activity is appropriate as a GO molecular function.

Following on from this, it is also important not to confuse the case of two interdependent activities with the superficially similar situation where a process and an activity are dependent on each other. For example, cell adhesion receptor activity would not be a function ontology term, since it describes receptor activity coupled to the process of cell adhesion.

It helps to consider the term name. Is it immediately obvious what's going on, or does it sound like a gene product with 'activity' stuck on the end? For example, with transporter activity you know immediately what kind of function is being described, whereas with actin activity it is not really clear. It should be obvious what a function is without in-depth biological knowledge of a certain area.

Function terms for subunits

Regulatory and catalytic subunits of kinases, heterotrimeric G proteins, etc., are represented in the function ontology by a regulator activity term, under the enzyme regulator activity node, and an enzyme activity term, under the catalytic activity node.

Note that GO no longer uses the part_of relationship between the enzyme regulator term and the catalytic activity term. A full discussion of this topic can be found under 'Annotation Issues' in the minutes from the September 2003 Bar Harbor GO meeting.

Please see the GO annotation guide for advice on how to annotate subunits of a complex.

Things to avoid

  • Avoid Cellular Component Information

Cellular structures are not functions, and many function terms that referred to cellular components have been made obsolete. For example, a term for mitochondrial primase activity need only be primase activity, because annotators can assign a location to the gene product separately by annotating it with appropriate terms from the cellular component ontology. By contrast, there are many cases where component terms are appropriate in the process ontology. For example, Golgi organization and biogenesis is different from lysosome organization and biogenesis, so the anatomical qualifiers 'Golgi' and 'lysosome' are necessary.

  • Avoid Gene Products

Gene products in themselves are not nodes of the function ontology, although doing something with or to a specific gene product can be one. For example, being hedgehog or being a hedgehog receptor is not a function, but hedgehog receptor binding and hedgehog binding are functions. Most GO molecular function terms include the word 'activity' to help differentiate them from the physical gene product. When defining molecular function terms, be careful not to describe them as gene products. For example, the molecular function term kinase activity is defined as 'Catalysis of the transfer of a phosphate group, usually from ATP, to a substrate molecule', not 'an enzyme that catalyzes the transfer of a phosphate group, usually from ATP, to a substrate molecule'.
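Parts of the two guidelines above (no cellular component information in function terms, and no gene-product wording in function definitions) can be screened for mechanically, although neither screen replaces curator judgement. The lint below is a hypothetical sketch: the word list of location qualifiers and the pattern for definitions that open like a gene-product description are small assumptions chosen for illustration, not rules taken from GO.

```python
import re

# Assumed, deliberately small word list; a real check would need curation.
LOCATION_QUALIFIERS = {"mitochondrial", "nuclear", "cytoplasmic", "golgi", "lysosomal"}

# Assumed pattern for definitions that open by describing a gene product.
GENE_PRODUCT_OPENING = re.compile(r"(an?|the)\s+(enzyme|protein|gene product)\b",
                                  re.IGNORECASE)


def lint_function_term(name, definition):
    """Return a list of warnings for a molecular function term (hypothetical lint)."""
    warnings = []
    hits = set(name.lower().split()) & LOCATION_QUALIFIERS
    if hits:
        warnings.append(f"location qualifier in name: {sorted(hits)}")
    # re.match only looks at the start of the definition.
    if GENE_PRODUCT_OPENING.match(definition):
        warnings.append("definition describes a gene product, not an activity")
    return warnings


good = ("Catalysis of the transfer of a phosphate group, "
        "usually from ATP, to a substrate molecule.")
bad = ("An enzyme that catalyzes the transfer of a phosphate group, "
       "usually from ATP, to a substrate molecule.")
print(lint_function_term("kinase activity", good))  # []
print(lint_function_term("kinase activity", bad))
print(lint_function_term("mitochondrial primase activity", ""))
```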

  • Avoid Binding Relationships

Catalytic activities should not be related to binding terms (see the September 2003 Bar Harbor GO meeting minutes); for example, ATPase activity should not be related to ATP binding. Similarly, there should not be a relationship between transporter terms and binding terms. Binding terms should only be used in cases where a stable binding interaction occurs. There are several reasons for this.

Firstly, transporter, catalysis and binding activities are all in the function ontology, which is used to describe elemental single-step activities that occur at the macromolecular level. That means that if we were to subdivide these functions further (for example, splitting the catalysis of a reaction into steps such as "substrate binding", "formation of unstable intermediate" or "attraction of electrons to positive charge"), we would be saying that a reaction was actually a series of functions, i.e. a process. Additionally, we would be going beyond the scope of the molecular function ontology, as we would be dealing with events on a molecular or atomic level.

Another reason is the sheer practicality of sorting through the 4000+ catalytic reactions we have in GO and deciding which of the substrates and products should be given 'binding' terms. Should we say that only substrates are bound by an enzyme? How about reversible reactions or cases where the reaction mechanism is unknown?

Finally, the GO binding terms are supposed to represent stable binding interactions, as opposed to the transient binding that occurs prior to catalysis. Hence there should not be a connection between stable binding and catalysis.
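The rule that catalytic and transporter terms should not be related to binding terms can be checked edge by edge. A minimal sketch, assuming a toy function graph keyed by term name in which the ATPase activity to ATP binding edge is a deliberately planted example of the relationship to avoid; the edges are illustrative and the helper names are hypothetical.

```python
from collections import deque

# Toy function graph keyed by term name: term -> list of (relation, parent).
# The ATPase activity -> ATP binding edge is a planted example of the link to avoid.
PARENTS = {
    "ATPase activity": [("is_a", "catalytic activity"), ("part_of", "ATP binding")],
    "ATP binding": [("is_a", "binding")],
    "catalytic activity": [("is_a", "molecular_function")],
    "transporter activity": [("is_a", "molecular_function")],
    "binding": [("is_a", "molecular_function")],
}

SOURCES = ("catalytic activity", "transporter activity")


def is_a_ancestors(term, parents):
    """All terms above `term` when following is_a edges only."""
    seen, queue = set(), deque([term])
    while queue:
        node = queue.popleft()
        for relation, parent in parents.get(node, []):
            if relation == "is_a" and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def under(term, root, parents):
    """True if `term` is `root` or has `root` as an is_a ancestor."""
    return term == root or root in is_a_ancestors(term, parents)


def forbidden_binding_links(parents):
    """Edges from a catalytic or transporter term to any binding term."""
    return [
        (child, relation, parent)
        for child, edges in parents.items()
        for relation, parent in edges
        if any(under(child, s, parents) for s in SOURCES)
        and under(parent, "binding", parents)
    ]


print(forbidden_binding_links(PARENTS))
# [('ATPase activity', 'part_of', 'ATP binding')]
```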

From Biological Process Ontology

Cytokinesis

Cytokinesis is placed under cell division but not under cell cycle, something which seems counterintuitive to many. This is because bacteria, which do not have a cell cycle, undergo cytokinesis. Organisms that do have a cell cycle can use more specific terms, such as cytokinesis after meiosis I and cytokinesis after mitosis to represent cytokinesis in their organism.

The Cell Cycle

The Development Node

The set of standard terms below can be applied to each developing structure in each species covered in the ontology. However, it is generally not practical to implement every term for every structure, since this would lead to a massive proliferation of terms. Where one term, e.g. x development, is present, the rest of the terms for the development of x are considered to be implied, without having actually been implemented. Further terms are generally only implemented when they are required for annotation. For an example of a fuller implementation, see the children of mesoderm development, which cover the development of the mesoderm and of the axial, paraxial, and intermediate mesoderm.

This development node structure was agreed upon in 2003 and is gradually being retrofitted. Where terms appear not to conform to it, this may be because they have not yet been retrofitted, or because their development includes an exception to the normal model. Any questions about the development terms should be posted on the GO curator requests tracker.

Multi-Organism Process

Metabolism

Regulation

Detection of and Response to Stimuli

Sensory Perception

Signaling Pathways

Transport and Localization

Transporter activity (molecular function)

Other Misc. Standard Definitions