Evaluating CACAO annotations

The Process

Annotations will be made in several week-long rounds by students enrolled in CACAO I. These annotations can be evaluated by Judges, the students in CACAO II, at any time after they are made. However, most of the evaluations should be done during the assigned times so that CACAO I students can fix any of their own mistakes they might catch. Every other week, CACAO I students will pick through their peers' annotations and challenge any errors they find. Judges are also responsible for mediating these disputes and picking the best solution. The best "fix" might not be suggested by either the challengers or the original annotator, in which case the Judge is responsible for finding the most accurate, complete annotation and entering that in their evaluation. Evaluated annotations might also be verified by instructors before being accepted.

Assessment Icons

  • What do the icons in the status column mean?
[Figure: assessment icons.jpg]

The Goal

Once a "perfect" solution has been suggested and verified, the original annotation will be changed by CACAO students. We are trying to obtain high-quality annotations to the correct strain or organism with precise, detailed GO terms, and correct Evidence Codes. While a majority of these annotations will be checked several times, the point is to start with excellent annotations and minimise the amount of review that is needed.

Briefly how annotations will be checked

For more complete documentation see the page on evaluating annotations. If any of these conditions are not met, the GO annotation will be marked as incorrect and will not be submitted to UniProt or other databases.

  1. Is the annotation on the right protein’s page? (Is the paper about the protein?)
  2. Is the annotation complete? Does it have the 4 required parts? Does the annotation require either of the additional 2 fields (e.g. does the annotation use an evidence code that needs the with/from field filled in)?
  3. Has the student used information NOT allowed by the CACAO rules (e.g. a forbidden evidence code or binding terms)?
  4. Do the notes point to a figure/table that supports the annotation? Is the paper a peer-reviewed research article (not a review)? Is the figure or table experimental data (not a model or crystal structure)?
  5. Is there a more suitable GO term (more or less specific)?
  6. Does the evidence code fit with the experiment described?
  7. For IGI, ISO, or ISA, have they entered the correct accession in the with/from field?
  8. For ISO & ISA, does the protein in the with/from field have a GO annotation that has experimental evidence for that GO term? (i.e. does the annotation maintain a direct chain of evidence?)
  9. Is the annotation complete, correct and accurate based on the paper? (i.e. will it be submitted to UniProt?)

How annotations will be checked

Although these points can be used as a checklist, there are other possible issues, and annotations that are acceptable according to the following might be rejected for other reasons. Another criterion not mentioned below is grammar and spelling in the Notes field: we cannot submit to databases any annotation that contains obvious mistakes or typos in the facts or the justifications.

Correct Protein Page

Is the paper about the SPECIFIC protein in question? In cases where UniProt has multiple records, it is important to check the strain/species. For example, the "Zaire Ebola virus" proteins have different accessions from those for "Sudan Ebola virus"; annotations meant for "Escherichia coli O6" do not go on a page for "Escherichia coli O8 (strain IAI1)". You can also usually narrow the strain down to a more accurate accession than the generic "Escherichia coli" protein. You may need to contact the author to ask for specifics, but note that not all strains of all organisms are recognised in UniProt.

Complete Annotation

  1. Does it have the 4 required parts? Even if the row is marked "Complete", the information in it may not satisfy all the CACAO rules.
  2. Does the annotation require either of the additional 2 fields, Qualifier or With/From? Certain evidence codes require the With/From field to be filled in, whereas others, like IDA, cannot be used when there is something to put in the With/From field.
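
The With/From part of this check can be sketched in a few lines of Python. This is only an illustration: the function name and the two sets are hypothetical, and the sets contain only the codes this page mentions (IGI, ISO and ISA as needing With/From, IDA as forbidding it), so the real CACAO rules may cover more codes.

```python
# Evidence codes that this page says need an accession in With/From.
# Assumption: the real CACAO list may be longer than these three.
NEEDS_WITH_FROM = {"IGI", "ISO", "ISA"}
# Codes that must not carry a With/From value; the page gives IDA as an example.
FORBIDS_WITH_FROM = {"IDA"}


def with_from_problem(evidence_code, with_from):
    """Return a short problem description, or None if the field looks consistent."""
    filled = bool(with_from and with_from.strip())
    if evidence_code in NEEDS_WITH_FROM and not filled:
        return f"{evidence_code} requires a With/From accession"
    if evidence_code in FORBIDS_WITH_FROM and filled:
        return f"{evidence_code} cannot be used with a With/From value"
    return None
```

A result of None only means the field is consistent with the evidence code; whether the accession itself is the right one still has to be checked by hand.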

Duplicate/Redundant Annotation

  • Has the paper already been used for this protein...
    • for an annotation to this GO term? Even if several figures can be used for the same GO, make only one annotation.
    • for an annotation to a parent term? The GO rules state that an annotation to a term implies all the parent terms as well. This helps eliminate redundant annotations and "clutter" on the annotation pages. If a student makes several annotations, check carefully that they are not parent/child/grandchild etc. terms. If they are, only allow the MOST SPECIFIC applicable term. If you need to suggest a GO term to get the best annotation, disallow all but one and make your modifications in the normal way.

Annotating a paper fully is highly encouraged, so multiple annotations from the same paper are acceptable when the terms do not have a parent/child relationship. If two sister terms apply, it is better to annotate to each rather than to the parent term.
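
As a rough illustration of the "most specific term" rule, the sketch below filters a list of terms from one paper and one protein so that exact duplicates and any term that is an ancestor of another term in the list are dropped, while sister terms are kept. The term names and the parent map are made up for the example; a real check would use the actual ontology.

```python
def ancestors(term, parents):
    """Collect every ancestor of `term` by walking the parent map upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def most_specific(terms, parents):
    """Drop exact duplicates and any term that is an ancestor of another term."""
    unique = list(dict.fromkeys(terms))  # remove duplicates, keep order
    implied = set()
    for term in unique:
        implied |= ancestors(term, parents)
    return [t for t in unique if t not in implied]


# Hypothetical toy ontology: child -> list of direct parents.
TOY_PARENTS = {
    "TERM:child_a": ["TERM:parent"],
    "TERM:child_b": ["TERM:parent"],
    "TERM:parent": ["TERM:root"],
}

kept = most_specific(
    ["TERM:parent", "TERM:child_a", "TERM:child_b", "TERM:child_a"],
    TOY_PARENTS,
)
```

Here `kept` holds the two sister terms only: the parent is implied by its children, and the repeated child is reported once.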

Any Information NOT Allowed by CACAO rules

  1. No annotations to "binding" terms are allowed. Not only do most of these terms provide very little information, but there are very complicated rules about using these GO terms and they usually require IPI, a forbidden Evidence Code.
  2. There are more restrictions on Evidence Codes this year than previous years, so make sure to pay attention to the allowed Evidence Codes.
  3. No references that are reviews. You can usually tell a review without reading it by looking at the PubMed page: there is sometimes a link below the summary that has a "+" sign and is labelled "Publication Types, MeSH Terms, Substances". When you click the "+" and expand this list, under "Publication Types" it will explicitly say "Review". You will likely be able to find an appropriate reference for the annotation in the review's references. Remember that you must suggest a fix in order to get points for evaluating the annotation.

Specific Table/Figure

  1. Is the figure or table original experimental data produced by the authors? Not only are reviews banned, but cartoons, models, crystal structures, etc. are not allowed as evidence. Just because the authors thought a figure or table was essential for their paper does not mean we can use it for an annotation.
  2. Although IMP usually overrides an IDA, carefully consider any alternatives. Make sure there are no superfluous figures mentioned, and ALL PARTS of the table/figure(s) apply to the annotation. Often a figure will have multiple panels. For western blots and the like, although annotators are encouraged to cite specific lanes that support their interest, an annotation is not wrong if they cite the whole blot. Do all parts of the figure:
    • apply to the protein being annotated?
      • If several different proteins are being analysed in a figure, make sure all parts of the figure cited actually involve the protein being annotated.
    • support the GO term?
      • Although proteins should be purified in order to prove they have a certain activity, a gel showing the purification steps does not contribute to a "function" term. It might show a "component" term, though.
    • support the Evidence code?
      • Sometimes, only panel B is needed for an IDA but A and B are needed to show IMP.

Correct Evidence Code

There are many different rules for evidence codes, and figures can be assigned different evidence codes depending on the parts of the figure cited.

With/From Field

  1. For IGI, ISO, or ISA, have they entered the correct protein's accession?
  2. If the annotation uses ISO or ISA, this protein is called the match protein. It needs to have an existing annotation to the exact GO term that is being used in the annotation. The match protein's annotation MUST NOT use IEA as the evidence code. This is to maintain a "direct chain of evidence".
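
The match-protein rule can be expressed as a small check, shown here as a sketch. The function name and the list-of-pairs layout for the match protein's annotations are hypothetical. Note that the short checklist above asks for experimental evidence, which is stricter; this sketch only applies the non-IEA rule stated in this section.

```python
def has_direct_chain(match_annotations, go_id):
    """True if the match protein has a non-IEA annotation to exactly `go_id`.

    `match_annotations` is a list of (go_id, evidence_code) pairs for the
    match protein; this layout is an assumption made for the example.
    """
    return any(
        term == go_id and code != "IEA"
        for term, code in match_annotations
    )


# Illustrative data only, not taken from a real protein record.
example_match = [("GO:0016020", "IEA"), ("GO:0016020", "IDA")]
chain_ok = has_direct_chain(example_match, "GO:0016020")
```

An ISO or ISA annotation whose match protein fails this check breaks the chain of evidence, even if the accession in With/From is otherwise correct.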

GO Term Specificity

Assuming the GO term is actually supported by the paper:

  1. In cases where a "regulation of" applies, can a "positive" or "negative" child term be used?
  2. There are many terms that seem quite similar, but are slightly different. Is a term like GO:0005624 membrane fraction more appropriate than GO:0016020 membrane? Note how much more specific the first term is, but some assays only show GO:0016020 membrane.
  3. Does the "best" term exist? The Ontology is growing daily, mostly due to users like you suggesting terms. Ask for help to look for an alternate term, because you might need to request a new term! (Bonus points if your terms get accepted!!)
  4. Does the GO term provide detailed information? There is a list of GO terms that will be rejected for being too vague, but annotations to child terms might be acceptable. Think of these types of terms as storage bins to organise the child terms, instead of terms that are descriptive or available for annotation. These include, but are not limited to:
    • GO:0000988 : protein binding transcription factor activity
    • GO:0001071 : nucleic acid binding transcription factor activity
    • GO:0006950 : response to stress
    • GO:0007610 : behavior
    • GO:0009605 : response to external stimulus
    • GO:0009607 : response to biotic stimulus
    • GO:0009628 : response to abiotic stimulus
    • GO:0009719 : response to endogenous stimulus
    • GO:0042221 : response to chemical stimulus
    • GO:0048583 : regulation of response to stimulus
    • GO:0048584 : positive regulation of response to stimulus
    • GO:0048585 : negative regulation of response to stimulus
    • GO:0050896 : response to stimulus
    • GO:0051716 : cellular response to stimulus
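
For a quick first pass, the identifiers listed above can be kept in a set and looked up directly. This is a minimal sketch with hypothetical names; because the list is described as not exhaustive, a term that passes this lookup may still be rejected as too vague.

```python
# GO IDs copied from the list above. Assumption: the list is incomplete,
# so membership is a sufficient but not a necessary reason for rejection.
TOO_VAGUE = {
    "GO:0000988", "GO:0001071", "GO:0006950", "GO:0007610",
    "GO:0009605", "GO:0009607", "GO:0009628", "GO:0009719",
    "GO:0042221", "GO:0048583", "GO:0048584", "GO:0048585",
    "GO:0050896", "GO:0051716",
}


def is_too_vague(go_id):
    """True if `go_id` is one of the listed terms that will be rejected."""
    return go_id.strip().upper() in TOO_VAGUE
```

Child terms of these entries are not in the set, which matches the note that annotations to child terms might be acceptable.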