The intercollegiate annotation competition!
The Community Assessment of Community Annotation with Ontologies (CACAO) is a competition in which teams of undergraduates around the world improve the functional annotation of genes. CACAO was developed at Texas A&M University and is currently run there and at many other institutions. If you are interested in participating, please email Suzi or Jim.


Centenary College participants

Rapid advances in genomics technology mean that we are discovering genes faster than we can figure out what they do. For most new sequences, gene function will be inferred by comparison with well-studied systems that have published experimental data, so the quality of functional annotation for those systems is very important for the future of biology. This course aims to address the shortage of curators by leveraging undergraduates: teaching them to read the scientific literature critically and to make useful, correct annotations. We think that the large number of undergraduates studying biology-related subjects at world-renowned institutions is a vast untapped resource for biocuration.

The Community Assessment of Community Annotation with Ontologies (CACAO) is a multi-institution student competition for large-scale manual community annotation of gene function using the Gene Ontology (GO). In CACAO, teams of students earn points for making annotations, but they can also take points from competitors by correcting their annotations. Annotations judged to be correct will be submitted to the GO Consortium for incorporation into the overall annotation of gene products in major databases. CACAO teams can be run as courses or as club activities, and CACAO annotation can cover any protein in UniProt for which students can find experimental literature supporting appropriate GO annotations.

The Competitions

For Instructors

Course Materials

Instructors may find the following resources helpful.

Biocurator Training Videos - Spring 2013

  • Videos intended for CACAO I students are in the "Help for Students" section below, but they can also be very informative for instructors new to CACAO and for CACAO II students who have not competed before.

Other videos and old training videos from Fall 2012:

Downloadable Biocurator Training Powerpoints

Recruitment Flyer used at TAMU

Help for Students

The six steps to making a GO annotation

  1. Find a suitable paper about a protein on PubMed (no review articles; the paper must have a PMID and must contain data about the protein).
  2. Find the same protein in UniProt (same species and strain).
  3. Use the UniProt accession to make a page for the protein on GONUTS (make sure you are signed in, then click "Create New Gene Page" on the left and paste the accession number into the box).
  4. Find a suitable GO term based on the figure(s) or table(s) in the paper (check the figure legend and the description of the figure in the text for key words, then search GONUTS, QuickGO, or AmiGO for those key words).
  5. Pick a suitable evidence code based on how the protein was characterized.
  6. Enter your GO annotation on the protein's page in GONUTS, complete with notes naming the figure(s) or table(s) that support your annotation.
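For students who like to double-check their work before entering it on GONUTS, the six steps can be turned into a quick self-check. This is a minimal Python sketch under stated assumptions: the DraftAnnotation record and the checklist_problems function are hypothetical (GONUTS does not provide them), the accession pattern is the UniProt accession format from UniProt's documentation, and every check only tests the format of a field. It cannot tell you whether the paper, protein, GO term, or evidence code is actually the right one. The values in the usage lines are made-up placeholders, not a real annotation.

```python
import re
from dataclasses import dataclass


# Hypothetical record of one draft annotation; field names are illustrative only.
@dataclass
class DraftAnnotation:
    pmid: str               # PubMed ID of the research paper (step 1)
    is_review: bool         # review articles are not allowed (step 1)
    uniprot_accession: str  # accession used to create the GONUTS page (steps 2-3)
    go_id: str              # chosen GO term ID (step 4)
    evidence_code: str      # chosen evidence code (step 5)
    notes: str              # figure(s)/table(s) supporting the annotation (step 6)


# UniProt accession format (6- or 10-character accessions).
UNIPROT_ACC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# GO IDs are "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")


def checklist_problems(a: DraftAnnotation) -> list[str]:
    """Return format problems found; an empty list means every check passed."""
    problems = []
    if not a.pmid.isdigit():
        problems.append("step 1: the PMID should be a plain number")
    if a.is_review:
        problems.append("step 1: review articles cannot be used")
    if not UNIPROT_ACC.fullmatch(a.uniprot_accession):
        problems.append("steps 2-3: not a valid UniProt accession format")
    if not GO_ID.fullmatch(a.go_id):
        problems.append("step 4: GO IDs look like GO: plus seven digits")
    if not a.evidence_code.strip():
        problems.append("step 5: pick an evidence code")
    if not a.notes.strip():
        problems.append("step 6: notes should name the supporting figure(s)/table(s)")
    return problems


# Placeholder values only.
draft = DraftAnnotation("12345678", False, "P12345", "GO:0003824", "IDA", "Fig. 2A")
print(checklist_problems(draft))  # → []
```

Returning a list of messages, rather than stopping at the first failure, lets you fix everything in one pass.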


If you're having any issues, first make sure you are logged in on GONUTS!

The helpful handouts for students page has several documents on PubMed searches, evidence codes, making annotations, and more. (IMPORTANT!)

  • The Gene Ontology Help page is an alternative, general explanation of just what you're doing, why it's important, and what's involved. Read over this short guide if you're still completely lost as to what CACAO is all about, and don't worry: email us for more help.
  • Help:CACAO Errors describes errors made by CACAO students and how to avoid them.
  • Finally, the links at the bottom of the page (under Pages in category "CACAO") may be helpful.

Biocurator Training Videos

Evidence codes

The helpful handouts for students page also includes the Evidence Code Decision Tree and a "Sampler" sheet listing experiments that are typically used to support each evidence code. (IMPORTANT!)

This Guide to Evidence Codes is an at-a-glance guide to the codes you can use for CACAO and to how and when they are used. (IMPORTANT!)

  • Here's the list of the evidence codes that CACAO students may use:
    1. IDA: Inferred from Direct Assay
    2. IMP: Inferred from Mutant Phenotype
    3. IGI: Inferred from Genetic Interaction - requires with/from field to be filled in
    4. ISS: Inferred from Sequence or Structural Similarity - almost always requires with/from field to be filled in
    5. ISO: Inferred from Sequence Orthology - requires with/from field to be filled in
    6. ISA: Inferred from Sequence Alignment - requires with/from field to be filled in
    7. ISM: Inferred from Sequence Model - requires with/from field to be filled in
    8. IGC: Inferred from Genomic Context
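Because the with/from requirement differs by code, it can help to treat the list above as a lookup table. The Python sketch below is a hypothetical self-check: the WITH_FROM_RULE table and the check_evidence function are illustrative, not part of GONUTS. It encodes only what the list states, so codes with no note are marked "not stated" (which does not mean with/from is forbidden), and it cannot judge whether the with/from entry itself is correct. The with/from value in the usage line is an illustrative placeholder.

```python
# Evidence codes CACAO students may use, keyed to what the CACAO list says
# about the with/from field. "not stated" means the list is silent for that
# code; it does NOT mean the field is forbidden.
WITH_FROM_RULE = {
    "IDA": "not stated",     # Inferred from Direct Assay
    "IMP": "not stated",     # Inferred from Mutant Phenotype
    "IGI": "required",       # Inferred from Genetic Interaction
    "ISS": "almost always",  # Inferred from Sequence or Structural Similarity
    "ISO": "required",       # Inferred from Sequence Orthology
    "ISA": "required",       # Inferred from Sequence Alignment
    "ISM": "required",       # Inferred from Sequence Model
    "IGC": "not stated",     # Inferred from Genomic Context
}


def check_evidence(code: str, with_from: str = "") -> str:
    """Hypothetical self-check: return "ok", a warning, or an error message."""
    rule = WITH_FROM_RULE.get(code.strip().upper())
    if rule is None:
        return f"{code} is not on the CACAO list"
    if with_from.strip():
        return "ok"  # with/from is filled in, so no listed rule is violated
    if rule == "required":
        return f"{code} requires the with/from field"
    if rule == "almost always":
        return f"warning: {code} almost always needs the with/from field filled in"
    return "ok"


print(check_evidence("ISO"))                      # → ISO requires the with/from field
print(check_evidence("ISO", "UniProtKB:P12345"))  # → ok
```

Keeping the rules in a single dictionary means the table can be updated in one place if the list of allowed codes changes.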


Assessing Annotations

Still Stuck?

We are more than happy to answer questions by email or Skype. Suzi will answer emails FROM ANY SCHOOL as fast as possible, usually on the same day. However, be warned that if you wait until the last day or two of a round (especially the last round), we might not be able to answer before the round deadline.

If you're on the Texas A&M campus, you can also email us to set up a meeting. We're in room 443 of the Bio/Bio building.

Links and Other Resources


