New Reactome Curator Guide

From ReactomeWiki

A Guide for Reactome Curators

Introduction

The goal of Reactome is to describe the known biochemical details of human biological pathways. A Reactome curator's job is to work with experts in different fields of biology to identify and curate suitable human pathways (see below) by breaking them down into subpathways and reactions and describing them in a format that is compatible with the Reactome data model. The purpose of this guide is to describe each step of that curation process to help the curator fully understand the steps involved. The guide is also a useful reference for the experienced curator, as it is nearly impossible to remember all of the steps and details of the curation process. Curators are actively encouraged to add to sections of the guide, providing useful hints that may be missing for current and future curators. Regular use of the guide, and regular additions to it, will facilitate the process of curation and increase annotation consistency. If in doubt about any of these steps, do not hesitate to e-mail the internal list for clarification (and then contribute any helpful information that you get, to save someone else the trouble of asking later!). This guide is divided into 9 sections.

  1. FAQ ("How do I ....?"): a list of common curation-related questions, organized by keyword
  2. Choosing a topic
  3. Guidelines for naming entities and events
  4. How to use essential data model classes (using a curated example)
  5. Creating a "Reactome friendly" framework of your pathway module
  6. The essential QA process during curation
  7. Using the curator tool
  8. Drawing diagrams
  9. Preparing for database releases


Here we provide basic guidelines as to how data for relevant pathways should be collected, organized and entered into Reactome via the Curator Tool. The process of converting a biological topic should maintain the referential integrity of the reactions, both newly added as well as those already present in Reactome. It is assumed that you (the reader of the guide) have some familiarity with Reactome, and have read the Reactome papers. They are freely available as PDF files. Though this document is oriented toward the Reactome curator, there is much here for outside groups using Reactome. The entire dataset, website, curator, and author tools can be downloaded from the Reactome download page. Installation and configuration instructions are also provided here. If you have any problems with the install or tools please contact us at help@reactome.org.

FAQ or "How do I ...?"

A keyword-organized FAQ sheet for Reactome curation questions can be found here. Please add to it as you come up with solved examples for your own questions.

Choosing a topic/pathway for curation

Focus first on:

  1. Areas of biology that you know well (or one in which you have good contacts).
  2. Pathways that are well characterized at the molecular level AND for which there is considerable HUMAN biochemical data and proteins that are new to Reactome.

Points to consider:

  1. Give lower priority to topics that will require a large number of inferences from other species.
  2. Don't try to tackle huge pathways all at once.
  3. Break large topics into *manageable* pieces. Experts are generally unwilling to commit to a project bigger than a dozen reactions, and it is easy to get lost in the details and lose focus, which is costly in terms of time.
  4. Work on several (preferably related) small projects simultaneously. Experts often fail to come through at critical times, so it is good to have a few projects at different stages to keep work flowing through the pipeline.
  5. Check the Editorial Calendar to ensure that your plans don't overlap with those of another curator. If in doubt, contact the curator.
  6. Check with Peter - he may be aware of planned work that is not recorded in the Editorial Calendar.

Understanding and using the data model

A detailed understanding of the Reactome data model is important for accurate and consistent curation. The best way to start learning about the different data classes and how to use them is to browse the data model glossary. Some classes are used infrequently but you still need to be aware of them. Some of the more important/confusing data classes and attributes are listed below along with some relevant use cases. Please add to this if you come across new and useful cases. All the classes have associated data fields, divided into several categories. These are described. Below them are some usage examples from curated Reactome pathways (mostly from Apoptosis right now). We will continue to add to this list (and provide more actual examples) and we encourage you to add any informative examples of your own.


Key Reactome data classes and relevant use cases

Incomplete.......

Data Classes

Important attributes

Conceptual Approaches To Curating and other hints

Mandatory, Required, and Optional

All the classes of data in Reactome have a number of details that MUST be completed before they can be 'released', i.e. made visible to the public. Fields containing details have a symbol that indicates the type: either a red square with a letter superimposed, or a yellow diamond with n superimposed. The yellow diamond with n indicates fields that are completed automatically. The red squares indicate, by letter:

  • m - Mandatory - you must provide this information.
  • r - Required - you must provide this information if it is possible - a good example of this is species, required for defined sets of proteins, but not required for defined sets of small molecules.
  • o - Optional - not available for every instance, so cannot be 'Required' but in some circumstances may be a procedural requirement, e.g. inferredFrom must be completed for human reactions that are inferred from a model organism.
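
The three levels can be thought of as a simple pre-release check. The following is a minimal sketch only, not the Curator Tool's own logic: the function name check_fields and the level assigned to each attribute in the example are hypothetical, chosen for illustration.

```python
MANDATORY, REQUIRED, OPTIONAL = "m", "r", "o"


def check_fields(levels, values):
    """Split unfilled attributes into errors (mandatory) and warnings (required).

    levels: attribute name -> "m", "r" or "o" (hypothetical assignments)
    values: attribute name -> value(s) entered so far
    """
    errors, warnings = [], []
    for attribute, level in levels.items():
        if values.get(attribute):
            continue  # something has been entered for this attribute
        if level == MANDATORY:
            errors.append(attribute)
        elif level == REQUIRED:
            warnings.append(attribute)
        # optional attributes are never reported by this sketch
    return errors, warnings


# Hypothetical levels for an illustrative class.
example_levels = {"name": "m", "species": "r", "inferredFrom": "o"}
```

Note that an empty optional field is not reported here, so procedural requirements such as inferredFrom for inferred human reactions still need to be checked by the curator.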

Events

The main concepts in the Reactome data model are Event and PhysicalEntity. Events are of two types: Pathways or Reactions. Pathways in Reactome are multi-step events, whereas Reactions are single-step events (at the molecular or atomic level).

Regulation of Apoptosis


A concrete curated example is the Reactome Apoptosis pathway which is broken down into the subpathways "Extrinsic pathway", "Intrinsic pathway", "Activation of Effector Caspases", "Execution phase" and "Regulation of Apoptosis".

Apoptosis In Reactome


Events may be linked to other Events that precede them, regulate or are regulated by them. Reactions contain PhysicalEntities that take part in the Event. At present there are two subclasses of Event: ReactionlikeEvent and Pathway. The ReactionlikeEvent subclass has 4 subclasses: BlackBoxEvent, Depolymerisation, Polymerisation, and Reaction.


Reaction

A Reaction is an event that converts inputs to outputs in a single step.

The Reactome data model robustly represents biology. The central concept in Reactome is the reaction, which is used together with pathways, macromolecules, small molecules, complexes, and catalyst activities to represent biological processes. The reaction itself is a single step biological event in which input entities are converted to output entities.


PhysicalEntities can be the inputs, outputs, catalysts, regulators, or requirements in Reactions. PhysicalEntities can be single entities, such as proteins, small molecules, RNA, DNA, carbohydrates, lipids, or sub-atomic particles. They can also be complexes consisting of a combination of any of the single entities, or polymers synthesized from the single entities. Related entities can be grouped into a set. *The use of sets is described below*.

Here are some common examples of different types of reactions and blackbox event use cases. Clicking on the blue highlighted class name will bring you to some additional usage hints and warnings.
Simple binding/complex formation reaction

Binding reaction.jpg

Simple dissociation reaction

Reaction.jpg

Post-translational modifications

Posttranslational mod.jpg

ReactionLikeEvents

BlackBoxEvent

A BlackBoxEvent converts inputs to outputs in multiple steps that are not annotated because:

  1. We don't know all of the intervening steps.
  2. The intervening steps are known but we do not want to curate them for the purposes of the module. Two examples are:
    1. The transactivation of a gene leading to production of a protein. We don't want to include all the steps of translation of that protein.
    2. The degradation of a protein.

Blackbox.jpg

Guidelines for Naming Entities and Events

Curators should use the shortest possible accepted molecule names and modification abbreviations when naming entities. Here are some guidelines for the different classes of entities:

1. EWAS: HGNC name, or UniProt short name

2. Modified entities (e.g. post-translationally modified proteins)

3. Complexes and Sets

  • Use accepted compact name preferentially
  • If no compact name available, list component short names separated by colons
  • Do NOT include the words "complex","dimer","monomer","associated with","bound to" in the complex name.

4. Sequence variants: see the Human Genome Variation Society description of sequence changes at the protein level.

5. Large deletions: see Annotation of Large Deletions, Insertions and Protein Fusions.
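
As an illustration of the complex naming rules in item 3, here is a minimal sketch. The function names and the example components (EGF, EGFR) are illustrative assumptions, and the whole-word matching of forbidden terms is one possible reading of the rule rather than an official check.

```python
import re

# Words and phrases the guideline says must not appear in a complex name.
FORBIDDEN_TERMS = ("complex", "dimer", "monomer", "associated with", "bound to")


def complex_name(component_short_names, compact_name=None):
    """Prefer an accepted compact name; otherwise join short names with colons."""
    if compact_name:
        return compact_name
    return ":".join(component_short_names)


def forbidden_terms_in(name):
    """Return the forbidden terms that occur in the name as whole words."""
    lowered = name.lower()
    return [
        term
        for term in FORBIDDEN_TERMS
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]
```

For example, complex_name(["EGF", "EGFR"]) gives "EGF:EGFR", while forbidden_terms_in("EGF bound to EGFR") flags the phrase "bound to".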

Annotating a disease process

Structuring the disease pathway

New pathways that describe disease processes should be placed under the "disease" chapter.
When you are creating a disease pathway that has/will have a "normal" counterpart pathway in Reactome, the pathway should be structured as it is in the following example:

  • Signaling by EGFR in cancer
    • Signaling by EGFR
    • Signaling by constitutively active EGFR


The disease pathway should have only two sub-pathways: one for the normal counterpart, and another that groups all disease-related events (pathways or reactions).

Assigning Disease term attributes

To pathways:
Please be sure to label the top-level disease pathway and its sub-level disease pathway with a disease attribute. You can browse the hierarchy here and if you don't find the term that you are looking for in gk_central you can create a new one by opting to create a new instance. Enter the DOID identifier (number only) in the identifier slot and you will be asked if you want to import the entry. Say yes.
To reactions:
A disease attribute should be added to all reactions involving disease-related physical entities, such as proteins of bacterial/viral/fungal pathogens, mutant human proteins and drugs used in disease management. Use the same procedure as the one described for adding disease attributes to pathways. If several related disease tags are applicable to reactions and pathways, using the most general tag is preferable, and even if specific disease attributes are added, always include the most general attribute for a given disease type. For example, for cancer-related reactions and pathways, always include the cancer tag.
To entities:
Disease attributes should be added to mutant proteins associated with disease and may be very specific, referring to the specific disease type(s) in which a particular mutation was found. For example, EGFR L861Q mutant (DB_ID 1177542), in which L-leucine at position 861 is replaced with L-glutamine has been detected in non-small cell lung carcinoma and adult glioblastoma multiforme. Besides these specific disease tags, it may be advisable to also add more general disease tags to an EWAS, in this case lung cancer and cancer, to enable search of mutant EWASs using these general terms. When it comes to entity sets, such as EGFR KD mutants (DB_ID 1182966) which includes all kinase domain mutants of EGFR in cancer, a very general disease tag, such as cancer, is appropriate. This is because each member of the set has its own range of cancer types (while EGFR L861Q is found in lung cancer and glioblastoma, EGFR L858R is found in lung cancer, thymoma, thyroid cancer, breast cancer and ovarian cancer), but their biological behavior is identical/similar. For the same reasons, only the general cancer disease tag should be used for events in which cancer disease entities participate.
When annotating drugs used to treat a particular disease, curators should add an appropriate disease tag to a drug entity. For a cancer drug, the general cancer disease attribute should definitely be added, but when it comes to more specific disease tags, it may be advisable to add only those cancer types for which the treatment by a given drug is approved.

Associating normal and disease pathways with the same diagram

For the disease pathway to share a diagram with the normal pathway, you must add the disease pathway as a value in the representedPathway slot for the normal pathway diagram. For the example above, you would create a diagram for the normal pathway "Signaling by EGFR" (called "Diagram of Signaling by EGFR"). To share this diagram with the disease pathway, you would add "Signaling by constitutively active EGFR" as the second value in the representedPathway slot for the diagram "Diagram of Signaling by EGFR". Disease pathways will show normal events and entities as a shaded background, while disease events and entities should be emphasized by red lines.

Disease Pathways without a corresponding "normal" pathway in Reactome

If the disease pathway will not have a corresponding "normal pathway" in Reactome, the above organization does not apply, but the pathway should still be placed under the disease chapter and a disease term should be applied. If a suitable term cannot be found, please ask about submitting a new term suggestion to the EBI. Red lines should be used to emphasize disease entities and events in other disease pathways, e.g. viral proteins and their reactions with human proteins. Highlight any reaction that has to do with disease progression and any entity that is from another species. Host entities should be left black, but complexes that contain both host and other-species components are colored red. Coloring host entities red would be misleading, even if the host proteins are hijacked into doing something that has to do with disease progression. In these cases the reaction lines are red, but the host entities are not highlighted.

Using the curator tool

Downloading, Installing, And Maintaining The Curator Tool

Instructions for downloading, installing, and maintaining the Curator Tool can be found here.

Curator tool user interface

The curator tool has three panes to view the content. These different views can be switched using the tabs in the upper left hand corner of the curator tool window.

The Menu Bar

The menu bar contains a number of shortcut buttons. Hovering over a button displays a message describing its function.

The Schema View

The Schema view provides a hierarchical list of the data classes. In this view you can search, open, edit and create instances of these classes. This panel is used most often for creating new items, such as an EWAS derived from a ReferenceGeneProduct.

CT schemaView.jpg


The Event View

This view is where most curation is done. The event view displays a hierarchical, alphabetical list of the events in the local project on the left, a graphic in the middle that shows the relationship between objects for selected events, and on the right the details of selected events. Unfurling a pathway by clicking on the + symbol to the left of its name in this view reveals all of its component Reaction events. DON'T CONFUSE these symbols with the check box: clicking on an unchecked box marks the event as ready for release onto the public website; clicking on a checked box has the opposite effect. For pathways the check box selects the ENTIRE pathway. Doing this by mistake can be time-consuming and tedious to reverse, so be warned!

CT eventView.jpg


The Entity Level View (ELV)

The ELV is where pathway diagrams are created and subsequently represented. In this view the pathway is laid out graphically, showing the inputs, outputs, catalysts where appropriate, and regulatory molecules, each in the correct cellular compartment.


CT ELV.jpg


Synchronizing local projects with the database

Synchronize at least once a day. It is good practice to synchronize your local project with the gk_central database to ensure that your work is not lost should the local copy become corrupted or lost due to hard drive failure. It is useful for other curators who may be able to reuse EWASes or other instances that you have created. It is perfectly acceptable to 'check-in' incomplete work unless it is marked as 'doRelease'.

To keep the data in the opened project and the database repository consistent, you need to synchronize the local project with the database periodically. Several synchronization actions are available. The following is the "Database" menu for database-related actions:
CDBMenu.gif

"Match Instance in DB..." finds a matched instance in the database repository for the selected instance in the local repository. A matched instance has the same defining attributes as the local one. If a matched instance is found, you can merge the local one into the database one. "Compare Instance in DB..." compares a selected instance with the corresponding one in the database. Corresponding means having the same DB_ID, so you should not compare an instance checked out from database A to an instance in database B, because the same DB_ID might be assigned to different instances. "Update from DB" updates a checked-out instance from the database. "Check In" is used to check in newly created or modified instances to the database. It is recommended that you use "Compare Instance in DB..." before "Update from DB" or "Check In". After a new or modified instance is checked into the database, the marker ">" is removed to indicate that the local and the database copies are the same.

The "Synchronize with DB..." menu is used to check all instances in the selected schema class in the schema view, or in the whole opened project if no class is selected. There are four possible inconsistency categories between the two repositories: instances that differ between the local and the db repositories as a result of modification either locally or in the database, instances created locally, instances deleted in the database by you or others, and instances deleted locally by you. You can choose an appropriate action for the selected instances in each category. Please be aware that if you make a multiple selection across different categories, only the actions that apply to all selected categories are enabled. For example, actions "Update from DB" and "Commit to DB" can be applied to instance "AIF-mediated response" in the first category, but only "Commit to DB" can be applied to instance "TestPathway" in the second category. If you select both "AIF-mediated response" and "TestPathway", only "Commit to DB" is enabled and "Update from DB" is disabled. Double clicking an instance in the first category will pop up a comparison dialog for the local and database instances, while double clicking an instance in other categories will show the contents of the clicked instance. To deselect a single instance, hold the control key and click the selected instance.

CSynchronization.gif


There are three different cases in the first category (instances that differ between the local and the db repositories): the instance is modified in the local project but not in the database, the instance is modified in the database but not in the local project, or the instance is modified in both the local project and the database. In the first case, the user can commit changes to the database or overwrite them by updating from the database. In the second and third cases, the user cannot commit changes but can update from the database.

The following is the list of icons used in the synchronization dialog:

Icon | Applicable actions | Note
InstanceNewChange.png | Update from DB, Commit to DB, Show Comparison | New changes in local instance only
InstanceNewChangeInDB.png | Update from DB, Show Comparison | New changes in database instance only
InstanceChangeConflict.png | Update from DB, Show Comparison | New changes in both local and database instances
InstanceNew.png | Commit to DB | New local instance
InstanceDelete.png | Update from DB, Commit to DB | Local instance is deleted but still exists in the database
InstanceDeleteInDB.png | Update from DB, Commit to DB, Clear Record | Database instance is deleted but still exists in the local project
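
The icon list and the multiple-selection behaviour described earlier can be summarised as a set intersection. This is a sketch of the rule as described in this guide, not the Curator Tool's source code: the state keys (local_change, new_local and so on) are hypothetical labels, while the action names are taken from the synchronization dialog.

```python
# Applicable actions per instance state, as listed in the icon table.
# The dictionary keys are hypothetical labels for the six icons.
ACTIONS_BY_STATE = {
    "local_change": {"Update from DB", "Commit to DB", "Show Comparison"},
    "db_change": {"Update from DB", "Show Comparison"},
    "conflict": {"Update from DB", "Show Comparison"},
    "new_local": {"Commit to DB"},
    "deleted_locally": {"Update from DB", "Commit to DB"},
    "deleted_in_db": {"Update from DB", "Commit to DB", "Clear Record"},
}


def enabled_actions(selected_states):
    """Return the actions common to every selected instance state."""
    action_sets = [ACTIONS_BY_STATE[state] for state in selected_states]
    if not action_sets:
        return set()
    return set.intersection(*action_sets)
```

Selecting a locally modified instance together with a new local instance, as in the "AIF-mediated response" and "TestPathway" example, leaves only "Commit to DB" enabled in this sketch.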

What do I do if the curator tool flags instances as being duplicated in gk_central?

You will need to evaluate each instance that is flagged as being a duplicate and compare it to the instance that is flagged in gk_central. This is easiest to do before synchronizing your project with the database, as described below:
Search the local project for all instances whose DB_ID contains ‘-’ (new local instances have negative DB_IDs), then highlight each one in turn, right-click to open a pop-up menu, and choose “Match Instance in DB...”. Any results returned by that query are already-existing instances that, by the rules of gk_central, are duplicated by the new local instance. Mostly, that’s right and you should accept the option offered by the form, to replace the local instance in the local project with the already-existing one from gk_central. That cleanly gets rid of the duplication and preserves all references correctly in the local project and in gk_central. Sometimes (rarely) the form is wrong because, despite identical defining attributes, two instances are genuinely different, e.g., if a person instance has already been created for J(ohn) Doe and you have now created one for J(ane) Doe. In this case, you should refuse the offer made by the form and should make a note to force the new instance into gk_central at synchronize-and-commit time.
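
To see why the J. Doe case is flagged, it helps to picture matching as a comparison on defining attributes only. The sketch below is an assumption-laden illustration: the choice of surname and initial as the defining attributes of a person instance, the firstname attribute, and the DB_ID values are all hypothetical.

```python
def find_matches(local_instance, db_instances, defining_attributes):
    """Return database instances whose defining attributes equal the local one's."""
    key = tuple(local_instance.get(a) for a in defining_attributes)
    return [
        candidate
        for candidate in db_instances
        if tuple(candidate.get(a) for a in defining_attributes) == key
    ]


# Hypothetical defining attributes for a person instance.
PERSON_DEFINING = ("surname", "initial")

in_database = [{"DB_ID": 1001, "surname": "Doe", "initial": "J", "firstname": "John"}]
new_local = {"DB_ID": -1, "surname": "Doe", "initial": "J", "firstname": "Jane"}

matches = find_matches(new_local, in_database, PERSON_DEFINING)
```

Here matches contains the existing John Doe instance even though the two people differ, which is the situation where the curator should refuse the merge.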

The annotation process

Precuration

1. Create a basic outline of pathway: Regulation of Apoptosis as an example. Further description of the annotation process will focus on the subpathway highlighted below in yellow.

Precuration1.jpg

2. Flesh out outline with information including: molecules, compartment, species, text summary, references (PMIDs)

Precuration4.jpg Precuration3.jpg

3. Create a table of molecules participating in the pathway

-Enter each modified form of a protein as a separate entry
-Look up/enter the corresponding UniProt identifiers

Spreadsheet2.jpg

Data entry

Identify existing proteins/molecules

In many cases the proteins on your list will already exist in the database. You should make every effort to reuse existing instances wherever possible to avoid unnecessary and confusing duplications. Use the curator tool to search the ReferenceGeneProduct (RGP) class using Uniprot identifiers as follows:

Searching the database

Choose Class: ReferenceGeneProduct
Choose attribute: identifier
Attribute value: Use REGEXP
Enter your identifier in the search box. You can enter several as a pipe-separated list (e.g. A1A4S6|O43293|P43146|P43146....).
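
If your identifiers are in a spreadsheet column, the pipe-separated search string can be assembled programmatically. The sketch below uses Python's re module purely to illustrate how alternation with "|" behaves; the Curator Tool's own REGEXP matching may differ in detail (for example in whether partial matches count), and the function names are illustrative.

```python
import re


def build_identifier_pattern(identifiers):
    """Join identifiers into a pipe-separated pattern, dropping blanks and repeats."""
    cleaned = (identifier.strip() for identifier in identifiers)
    unique = list(dict.fromkeys(i for i in cleaned if i))  # keeps first-seen order
    return "|".join(re.escape(i) for i in unique)


def matches_pattern(pattern, identifier):
    """True if the whole identifier equals one of the alternatives."""
    return re.fullmatch(pattern, identifier) is not None


pattern = build_identifier_pattern(["A1A4S6", "O43293", "P43146", "P43146"])
```

The resulting pattern string can be pasted into the search box with "Use REGEXP" selected.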

CT RPS search.jpg

Select the ReferenceGeneProduct (RGP) returned (if a list is returned, select them one at a time), right click and opt to "View referrers". The resulting Referrers dialog box lists referrers by property name. At the top of the list can be isoforms of the protein, if present in UniProt. Do not use isoforms unless you are certain that only specific isoforms have the functionality you intend to represent. Isoforms may have their own referrers and you should check this: if someone took the trouble to create an isoform-specific EWAS they probably had good reason to do so. Items listed with the property name referenceEntity are EntitiesWithAccessionedSequence (EWASs), a Reactome identifier for specific forms/locations of a protein. Often there will be more than one EWAS for a single RGP, because post-translationally modified forms of proteins and proteins in different cellular compartments each have a separate EWAS. If any of the listed EWASs correspond to your needs, right click and opt to "Check Out" that referrer. If the correct cellular compartment or post-translationally modified form is not present, you can still check out an EWAS and later use the Curator Tool to clone it and modify it to your needs.

CT EWAS checkout.jpg

Create new proteins/molecules

Please see the guidelines on naming entities! If you search for an RGP that does NOT have referrers in the database you will get this message:

CT RPS no referrers.jpg

In this case, you will need to create the EWAS.

To do this, first check out the RGP of interest into your local project. Then, in the curator tool, select RGP in the class list, then scroll or search for the RGP of interest. Right click and opt to create EWAS from RGP.

Make EWAS1.jpg

It will ask if you want to accept the end coordinates described by Uniprot. Only say yes if you can confirm them to be accurate. If the end coordinates are not certain, the convention is to represent the start as 1, and end as -1.

Make EWAS2.jpg

In the newly created EWAS, the ReferenceEntity will have a name and species entered by default. You can add an alternative name if this was specified by the author; do this by right-clicking on the existing name and selecting Add. You must define the compartment. To do this, right click in the compartment slot and select the correct compartment in your local repository. Select the compartment and hit OK. If you don't see the desired compartment in your local project, click the "Browse database" button to search in gk_central.

Make EWAS3.jpg

If the protein you want to represent with an EWAS is post-translationally modified, this is represented by completing the modified residue slot of the EWAS. For example, to indicate that the protein has a phospho-serine at residue 126, right click on the modified residue slot. You will be prompted to choose a modified residue instance from your local project, browse gk_central or create a new modified residue instance. Almost always you will want to create a new instance, as modified residue instances are specific to the RGP and residue position. Enter the UniProt identifier for the ReferenceGeneProduct in the ReferenceSequence slot and right click in the PsiMod slot to select a modification type from within the local repository or by searching the gk_central database, and hit OK. Finally, enter the residue number in the coordinate slot and hit OK.

Modres1.jpg

The modified residue instance can now be applied to the EWAS by clicking ok.

Modres2.jpg

The modified EWAS is shown below.

modres3.jpg

If you are annotating a protein fragment, you can define the start and end coordinates of the fragment as shown below. Don't forget to change the default name to indicate that it is a fragment.

CT EWAS fragment.jpg

Creating a Complex

This is done in one of two ways. Either:

Go to the Schema view, select Complex, right-click and select Create Instance. The Create a New Instance dialog box appears. Enter a name in the Name field. Typically you would also enter the Compartment and Species, and identify the entities (EWASes, sets or complexes) that make up this complex using the hasComponent field. All of these fields are completed by either double-clicking to type, or right-clicking to select Add and identifying the correct item from the local project.

Or, by selecting the appropriate field in the details of an event or entity that contains a complex, right click and select Add. This will produce a 'Select Instance' dialog that preselects the allowed classes that can be added to that field. E.g. if you right click the Output field in a Reaction, the allowed classes include several types of set, complexes, polymers and EWAS. To create a new complex at this point, select Complex in the list of options on the left, and click the New button on the right. The process is then identical to that described above.

Creating a Set

Reactome has several types of set - refer to the Glossary and User Guide for definitions.

The most commonly used sets are Defined Sets and Candidate Sets.

Defined Set members should be proven equivalents, i.e. all of them have been demonstrated to perform the function that is described by the event they participate in.

Candidate Sets have two categories of inclusion: members, which are equivalent to defined set members, and candidates, which are not proven to be functionally equivalent but are believed to be equivalent based on phylogeny, domain structure etc.

Creating sets is a similar process for all subtypes: select the appropriate type in the Schema view, right-click and select Create Instance, fill in the Name and Species, then right click to add the set members.

  • All members of a set must have the same compartment. The only time a set can have multiple compartment attributes is if its members themselves all have the same multiple compartment attributes, e.g., a set of membrane-spanning complexes with components explicitly located [on this side], in the membrane, and [on that side].
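
The compartment rule for sets can be expressed as a simple consistency check. This is a minimal sketch with a hypothetical function name and illustrative member and compartment names; it is not a check built into the Curator Tool as far as this guide describes.

```python
def set_compartments_consistent(member_compartments):
    """True if every member has exactly the same collection of compartments.

    member_compartments: member name -> iterable of compartment names
    """
    distinct = {frozenset(compartments) for compartments in member_compartments.values()}
    return len(distinct) <= 1
```

A set whose members are all cytosolic passes, as does a set of membrane-spanning complexes that all share the same three compartments, whereas mixing a cytosolic member with a nucleoplasmic one fails.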

Creating a Pathway

Please see the *note* below if you are adding a pathway that will be a new top level pathway. Here is a description of how the outlined mini pathway "Regulation of activated PAK-2p34 by proteasome mediated degradation" is built from its component events in the curator tool. The pathway consists of 2 reactions: "Ubiquitination of PAK-2p34" and "Proteasome mediated degradation of PAK-2p34". For simplicity, the reactions have already been created (see the section on creating an inferred reaction for an example).


Path0.jpg

With the pathway class highlighted in the class hierarchy, right click and select Create instance, or use the create instance button in the menu bar at the top of tool.

Path1.jpg


After adding the pathway title, associate reactions as component events of the pathway by right clicking on the hasEvent slot and selecting "Add"

Pathx.jpg

You can then select the events that you need one at a time or, as a shortcut, you can search for your newly created events in your project if they have not yet been submitted to gk_central (they will all have negative DB_ID attributes). To do this, search in your project for events with a DB_ID containing "-". From this list, you can hold the control key and select the events of interest.

Path3.jpg

The order of the events in a pathway is described through the use of the "precedingEvent" attribute on the reactions that are components of the pathway. If no preceding event is specified, the order of the events displayed on the webpage reflects the order in which they are listed as components in the pathway instance that you are creating.
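
One way to picture the ordering rule is as a sort driven by precedingEvent, falling back to the listed order when nothing is specified. The sketch below is an assumption about how such an ordering could be computed, not a description of the website's actual code, and the function name display_order is hypothetical.

```python
def display_order(listed_events, preceding):
    """Order events so that each follows its preceding events.

    listed_events: event names in the order listed in the pathway's hasEvent slot
    preceding: event name -> list of events named in its precedingEvent attribute
    Events with no unmet preceding events are taken in their listed order.
    """
    position = {name: index for index, name in enumerate(listed_events)}
    remaining = {
        name: {p for p in preceding.get(name, []) if p in position}
        for name in listed_events
    }
    ordered = []
    while remaining:
        ready = [name for name, unmet in remaining.items() if not unmet]
        if not ready:
            raise ValueError("precedingEvent values form a cycle")
        next_event = min(ready, key=position.__getitem__)
        ordered.append(next_event)
        del remaining[next_event]
        for unmet in remaining.values():
            unmet.discard(next_event)
    return ordered
```

With the two PAK-2p34 reactions, listing the degradation first but naming the ubiquitination as its precedingEvent yields the ubiquitination first; with no precedingEvent values, the listed order is returned unchanged.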

Path4.jpg

Once the component events have been added, the remaining required attributes are added. If the pathway that you are describing corresponds to a GO biological process, right click on the goBiologicalProcess slot and select Set. Select the appropriate GO term from your local repository or gk_central. If you can't find the term of interest, ask for help.

  • note: If you are creating a top level pathway (check with Peter if this is appropriate), it must be listed as a frontPage item. To do this, check out the frontPage instance from gk_central and add your pathway in the frontPageItem slot. Also, please mark this in the editorial calendar as a front page item.

Creating a Reaction

Like all other classes, you can create a new reaction by selecting the Reaction class in the Schema view, right-click and select Create Instance, but it is perhaps better practice and more intuitive to create new reactions inside a pathway. To do this, select a pathway in the Event Hierarchical View, select the hasEvent property name in the details panel on the right, right-click and select Add. This leads to a dialogue for providing details of the reaction.

When creating a reaction, first enter the name:

Create rxn0.jpg


This may be enough detail for the moment; if you are simply creating a placeholder, click OK.

To set the species of the reaction right click and select add in the species field.


Create rxn1.jpg


If the species that you are working with is not in your local project, you can search for it in the database using the "Browse Database" button in the dialog box. When you set or change the species or the compartment of a reaction, the tool will ask if you want to propagate the species/compartment to all of the component molecules as well.


Create rxn2.jpg


Use caution when selecting Yes. ONLY say yes here if you know that the event and its contained molecules have no referrers in the database that would be affected. (In other words, ALL other reactions, complexes or sets in the database that make use of these now "changed" molecules would be affected.)

Next add the compartment to the reaction. Right click in the compartment box and select add.

Create rxn3.jpg


Again, if you don't have the compartment you need in your project, you can Browse Database to find the one you need.

Now add the input and output molecules. Right click in the respective box and select "Add".

Create rxn4.jpg


Select the molecule from the appropriate class:

Create rxn5.jpg


Important: After adding your input and output molecules, it is important to verify that your reaction is balanced (all molecules represented as input are also present as output). See the QA section below for a description of how to do this.


Add the literature reference(s). The references associated with a reaction MUST provide direct experimental evidence for the occurrence of that reaction in the species you are annotating (i.e. human for human Reactome). If there is no direct experimental evidence in human, then you need to create an inferred human reaction as described in the section below. Enter the PMID for journal articles, and say yes when prompted to have the details filled in automatically. A description of how to add other types of references (books, URLs) will be added soon.

Create rxn7.jpg


Create rxn8.jpg


If you don't see the reference in your local repository then opt to Browse Database.

Create rxn9.jpg


Create rxn9b.jpg


If you can't find the literature reference in the database either, then you need to create a new one:

Enter the PMID (number only). Click out of the PMID box. You will be asked if you want to import the PMID record information. Say yes.

Create rxn10.jpg


You will now see the full record:

Create rxn11.jpg


Look back in your local repository and you will see the new literature reference. You can now add this as a reference for your reaction.

Create rxn12.jpg


Once you have added your references, you can add a text summary for the reaction in the "summation" slot. Right click and select Add.

Create rxn13.jpg


Then input your text and citations.

Create rxn16.jpg


Attach the references cited in the text summary to the summation using its "literatureReference" slot, as described above for reactions.

Create rxn17.jpg


Once the mandatory attributes have been filled, enter the remaining required attributes. Right click on edited to choose an instanceEdit from your project.

Create rxn17b.jpg


If you haven't created one previously, opt to create a new one by clicking on the New Instance button.

Create rxn18.jpg


Then right click on author to select a person. If you don't see the one you want locally, browse the database. The edited slot holds the name of the curator who should be credited with creating and editing the event and the date it was edited. This slot is filled with an instanceEdit instance that contains this information.

Create rxn19.jpg


A date time stamp is created automatically for that instanceEdit and clicking "OK" will add this instanceEdit to the edited slot in your reaction.

Create rxn20.jpg


The authored and reviewed slots hold the instanceEdits describing the author/reviewer and the dates of authoring/reviewing, respectively. These are created as described above for the "edited" slot. When the reaction is ready for release, the do_Release flag should be set to TRUE and the releaseDate slot should be filled with the appropriate release date.

Create rxn21.jpg




Whenever you can define preceding events for an event, you should do so. Note that the preceding event for a reaction should always be a reaction or reaction-like event, not a pathway, and the preceding event for a pathway should be another pathway (when relevant), not a reaction or reaction-like event.
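
The rule about matching event types can be written down as a small check. This is a sketch under stated assumptions, not a Curator Tool feature: events are plain dictionaries, and the `kind` key (either "reaction" or "pathway") is a hypothetical stand-in for the schema class.

```python
def preceding_event_problems(events):
    """Return (event, preceding event) pairs whose kinds do not match.

    events: dict mapping an event name to a dict with a "kind" key
    ("reaction" or "pathway") and an optional "precedingEvent" list.
    """
    problems = []
    for name, event in events.items():
        for prev_name in event.get("precedingEvent", []):
            prev = events.get(prev_name)
            # Skip preceding events that are not in this collection.
            if prev is not None and prev["kind"] != event["kind"]:
                problems.append((name, prev_name))
    return problems


events = {
    "Pathway P": {"kind": "pathway"},
    "Reaction 1": {"kind": "reaction"},
    "Reaction 2": {"kind": "reaction", "precedingEvent": ["Reaction 1"]},
    "Reaction 3": {"kind": "reaction", "precedingEvent": ["Pathway P"]},
}
problems = preceding_event_problems(events)
print(problems)
```

Here only "Reaction 3" is reported, because its preceding event is a pathway rather than a reaction.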

Creating an inferred event

When constructing a human pathway, a curator will come across events that have no direct experimental evidence in humans, but have supporting experimental data from another/other species. If experts in the field believe that the event can in fact occur in humans, the 'other species event' can be used to infer the human event. In the case illustrated below a human reaction is inferred from in vitro experimental results using proteins from human and Oryctolagus cuniculus (rabbit).

Infer1.jpg

Here a reaction "Proteosome mediated degradation of PAK-2p34" is created and the species Homo sapiens and Oryctolagus cuniculus are assigned. To avoid having a human and a non-human reaction with identical names, it can be useful to use capitalized forms of object names for the non-human reaction and all-uppercase names for human, e.g. Jak2 and JAK2 for the non-human and human proteins respectively. A text summary and the literature reference providing evidence for this mixed species reaction are added.

Infer2.jpg

The rabbit protein "PAK-2p34" is selected as an input.

Infer3.jpg

The human set "ubiquitin" is selected as an input...

Infer4.jpg

...and the mixed species output complex "PAK-2p34" is selected as the output.

Note that this complex is not natural; it only occurs in vitro. To flag this, the "isChimera" attribute of the complex is set to True.

Infer6.jpg

Since the reaction is also multispecies and not a natural event, the isChimera flag is also set to true for the reaction.

Infer7.jpg

Now that the reaction to be used for inference has been created, create the same reaction for human, using human participating molecules. Note, however, that the literature reference is not associated with this human event! Instead, in the "inferredFrom" field, right click and select "Add" to enter the non-human reaction used for inference.

Infer8.jpg

Selecting the mixed species reaction created previously...

Infer9.jpg

...now the link to the inferred reaction has been made.

Infer10.jpg
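
The evidence rule for human reactions (either a direct literature reference or an inferredFrom link, with the reference living on the source reaction in the inferred case) can be sketched as a check. This is an illustration only: reactions are plain dictionaries, the slot names are used as dictionary keys, and the reference value is a placeholder rather than a real PMID.

```python
def evidence_problems(reaction):
    """Return a list of problems with how a human reaction is evidenced."""
    has_refs = bool(reaction.get("literatureReference"))
    has_inference = bool(reaction.get("inferredFrom"))
    problems = []
    if not has_refs and not has_inference:
        problems.append("no literature reference and no inferredFrom link")
    if has_refs and has_inference:
        problems.append("reference belongs on the reaction used for inference")
    return problems


# The mixed species reaction carries the evidence (placeholder reference).
source_reaction = {
    "name": "Proteosome mediated degradation of PAK-2p34",
    "isChimera": True,
    "literatureReference": ["placeholder reference"],
}
# The human reaction points at it instead of carrying the reference itself.
human_reaction = {
    "name": "Proteosome mediated degradation of PAK-2p34",
    "inferredFrom": [source_reaction],
}
print(evidence_problems(human_reaction))
```

An empty list means the human reaction is evidenced in one of the two accepted ways.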

Adding a cross reference to a new database (not previously existing in Reactome) to an instance

If you want to add a crossReference attribute, you can do so as long as you create an instance for the linkable database.

Connection between generic and specific reactions

You may come across a situation where it's convenient to create a generic, all-encompassing reaction in which a set of proteins perform the same function. Specific proteins from this set may be used elsewhere in other pathways as specific, single reactions so we want a way to indicate the specific reaction is one reaction derived from the generic reaction. The way to indicate this is to use the "hasMember" property of the generic reaction.

An example is the ABCC family of transporters mediating organic anion transport across the plasma membrane. Three of the proteins from the set of proteins in this reaction are involved in three specific reactions elsewhere. To show that there is a connection between them and this reaction, use the hasMember slot to indicate that these three reactions are specific examples of this generic reaction (as shown below).

HasMemberCT.jpg

Creating a Catalyst

Reactions that involve a catalyst should include this information. Within the Reaction Details, the field is called catalystActivity. To complete this you need two things: the physicalEntity (the object that is acting as catalyst) and the activity of that object, defined as a GO molecular function. The physicalEntity will be a molecule, set or complex, probably in your local project. The GO molecular function can be identified by consulting UniProt and looking at the ontologies section for GO Molecular Function. If none of the listed terms seems appropriate, either Browse Database for terms in gk_central, or use the OLS website at http://www.ebi.ac.uk/ontology-lookup/ to identify the correct term first. Use the most specific term possible.


Creating a new ChEBI entry

Using SMILES strings as input for the ChEBI submission tool
To use the submission tool, you must get a user name and password, and log in. Go here to do that. Once you have logged in, click "create a new submission" from the choices at the bottom of the page. That will cause a new line to appear in the table of "your active submissions" on that page. Click the "edit submission" option on that line to open the actual submission form.
Under the ‘Name And Structure’ section of the Submission tool, select ‘Edit Structure’.
Under the ‘Edit’ Menu, select ‘Import Name’.


ChEBI.1.jpg

An input box named ‘The Source – Name’ appears; this is the place to paste your SMILES string.


ChEBI.2.jpg

In this example, the following string for 1-PP-IP5 is used: OP(O)(=O)O[C@H]1[C@H](OP(O)(O)=O)[C@@H](OP(O)(O)=O)[C@H](OP(O)(=O)OP(O)(O)=O)[C@H](OP(O)(O)=O)[C@@H]1OP(O)(O)=O


ChEBI.3.jpg
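
Before pasting a long SMILES string it can help to confirm that it was not truncated when copied. The sketch below is a rough sanity check only, not a SMILES validator and not part of the ChEBI tool: it merely verifies that round and square brackets are balanced, using the 1-PP-IP5 string from this example.

```python
def brackets_balanced(smiles):
    """Return True if ( ) and [ ] are properly nested in the string."""
    closers = {")": "(", "]": "["}
    stack = []
    for char in smiles:
        if char in "([":
            stack.append(char)
        elif char in closers:
            if not stack or stack.pop() != closers[char]:
                return False
    return not stack


PP_IP5 = (
    "OP(O)(=O)O[C@H]1[C@H](OP(O)(O)=O)[C@@H](OP(O)(O)=O)"
    "[C@H](OP(O)(=O)OP(O)(O)=O)[C@H](OP(O)(O)=O)[C@@H]1OP(O)(O)=O"
)
print(brackets_balanced(PP_IP5))
print(brackets_balanced(PP_IP5[:20]))
```

A string that passes this check can still be chemically wrong; the structure shown after import remains the real test.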

After the SMILES string has been pasted, select the ‘File’ menu in ‘The Source – Name’ input box. Select ‘Import As’.

ChEBI.4.jpg

Make sure ‘Import as Recognized (SMILES)’ Import Mode has been selected and click on ‘Import’.

ChEBI.5.jpg

The chemical structure defined by the SMILES string should now be present in the box, ready for editing. Press ‘Update structure’ to obtain details of the structure.


ChEBI.6.jpg


Structure details are now displayed on the right of the page. The structure on the left hand side can now be edited as you wish.

ChEBI.7.jpg

Annotating the regulation of a process

The following organization of regulation events works well for many kinds of processes, and it fits with our view that all parts of a process should be grouped, while respecting GO's view that regulatory events should be distinguishable from the rest of the process.

All about [process] (pathway)

--The steps of [process] (pathway)

[process] reaction 1
[process] reaction 2
etc.

--Regulation of [process] (pathway)

[process] regulatory reaction 1
[process] regulatory reaction 2
etc.


Regulation here can include reactions that are themselves concrete molecular transformations whose effect is to modulate one of the main process reactions by activating an enzyme, or by providing or sequestering an input molecule, but it can also include airy things like "[this regulatory event], by an unknown molecular mechanism, positively or negatively regulates [process reaction #]".

Modifying and Deleting

It is important to understand that if you locally modify or delete an instance you checked out from gk_central, it will also be modified/deleted in gk_central when you synchronize. You must ALWAYS CHECK FIRST that the instance you intend to modify/delete is not in use elsewhere, outside your local project. The best way to do this is to search for it using the Database Browser Schema View and select referrers. If it has any referrers you didn't know about, do not modify or delete it! There may be circumstances when you think something should be modified or removed, but if it has been used by another curator, check with them first, or contact an experienced curator for advice.
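
The referrer check amounts to asking which instances outside your project point at the one you want to change. The sketch below illustrates the idea with hypothetical data: `references` is an invented dictionary of "instance refers to these instances", not a real gk_central query.

```python
def external_referrers(target, references, local_project):
    """Return referrers of target that are not in the local project.

    references:    dict mapping an instance name to the set of instance
                   names it refers to.
    local_project: set of instance names checked out locally.
    """
    return sorted(
        instance
        for instance, targets in references.items()
        if target in targets and instance not in local_project
    )


references = {
    "Reaction 1": {"Complex X"},
    "Reaction 2": {"Complex X", "Protein Y"},
    "Reaction 3": {"Protein Y"},
}
local_project = {"Reaction 1", "Complex X"}

blockers = external_referrers("Complex X", references, local_project)
print(blockers)
```

If the returned list is not empty, the instance is in use elsewhere and should not be modified or deleted without checking first.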

Diagram checks after deleting entities or reactionlikeEvents

If and when you need to delete an entity from gk_central, you must run the "Deleted objects in diagrams" check over gk_central to find any diagrams that have used those instances. A description of how to run this check is shown here.

More curation examples

Another example of the annotation process can be found here.

Project QA using the Curator Tool QA checks

The "QA Check" menu is found within the Tools menu.

QA Check Menu



This menu has six separate QA script items within it.

  • Imbalance Check (checks that the molecules present as input are also present as output)
  • Mandatory Attributes Check (checks that the mandatory attributes for a class have been entered)
  • Required Attributes Check (checks that the required attributes for a class have been entered)
  • Compartment Check For:
    • EntitySet (compartment of set members matches compartment of set)
    • Complex (compartment of complex matches compartment of individual components)
    • Reaction (compartment of reaction matches compartment of individual components)
  • Species Check (works like the compartment check; Pathways are also checked)
  • Diagram checks

Note: In order for the QA checks to effectively pick up errors, the project that you are working on must be fully extracted from the database. Instructions on how to do a full extraction can be found here.

Imbalance check

You must select Reactions in the hierarchy to perform this check.

QA3.jpg


Reactions are flagged as cleavage reactions if the output differs from the input only in that the output contains "fragments" of the input molecule. A true imbalance is shown below:

QA4a.jpg
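
The idea behind the imbalance check can be sketched as follows. This is a simplified illustration, not the Curator Tool's implementation: entities are modeled as strings (simple molecules) or nested lists (complexes), and cleavage fragments are not handled.

```python
from collections import Counter


def leaf_molecules(entity):
    """Flatten an entity into a Counter of its simple molecules.

    An entity is either a string (a simple molecule) or a list of
    entities (a complex, or the full input/output list of a reaction).
    """
    if isinstance(entity, str):
        return Counter([entity])
    total = Counter()
    for part in entity:
        total += leaf_molecules(part)
    return total


def imbalance(inputs, outputs):
    """Return (missing_from_output, extra_in_output) as Counters."""
    left = leaf_molecules(inputs)
    right = leaf_molecules(outputs)
    return left - right, right - left


# Binding reaction: A + B -> complex of A and B (balanced).
print(imbalance(["A", "B"], [["A", "B"]]))
# Molecule X appears as input but is lost from the output (imbalanced).
print(imbalance(["A", "B", "X"], [["A", "B"]]))
```

Two empty Counters mean every input molecule is accounted for in the output.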


Mandatory attribute check

QA1.jpg


A list of instances missing mandatory attributes (by class) is shown. To make the missing attributes of the instance easier to see, you can use the "order attribute" button (downward arrow with circle and triangle) in the upper right side of the tool. This orders the instances by type (mandatory, required, optional...etc)

QA2.jpg


Required attribute check

This check works in the same way as the mandatory attribute check.
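
Both attribute checks follow the same pattern: look up the list of attributes expected for an instance's class and report the empty ones. In the sketch below the attribute lists are hypothetical examples only; the real mandatory and required attributes are defined by the Reactome schema and shown in the Curator Tool.

```python
# Hypothetical attribute lists, for illustration only.
MANDATORY = {"Reaction": ["name", "input", "output"]}
REQUIRED = {"Reaction": ["species", "compartment", "summation"]}


def missing_attributes(instance, rules):
    """Return the attributes from rules that are empty on the instance."""
    wanted = rules.get(instance["class"], [])
    return [attr for attr in wanted if not instance.get(attr)]


reaction = {
    "class": "Reaction",
    "name": "Placeholder reaction",
    "input": ["A"],
    "output": [],
    "species": "Homo sapiens",
}
print(missing_attributes(reaction, MANDATORY))
print(missing_attributes(reaction, REQUIRED))
```

The same function serves both checks; only the rule table passed in differs.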

Compartment check

Select the class you want to check in the hierarchy panel

QA5.jpg

Compartment conflicts are indicated in the bottom of the dialog box.

Comp conflict set.jpg
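
A compartment conflict of the kind reported here can be pictured with the sketch below. It assumes, for illustration only, that a member conflicts when it has a compartment that is not among the compartments of its container; the dictionaries and their keys are hypothetical, and the tool's actual rule may differ.

```python
def compartment_conflicts(container, members):
    """Return names of members with a compartment the container lacks."""
    allowed = set(container["compartment"])
    return [
        member["name"]
        for member in members
        if not set(member["compartment"]) <= allowed
    ]


complex_x = {"name": "Complex X", "compartment": ["cytosol"]}
components = [
    {"name": "Protein A", "compartment": ["cytosol"]},
    {"name": "Protein B", "compartment": ["nucleoplasm"]},
]
conflicts = compartment_conflicts(complex_x, components)
print(conflicts)
```

The same comparison applies to a set and its members, or to a reaction and its participants.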

Species checks

These checks work in the same way as the compartment checks, with the exception that Pathways are also checked for species conflicts.

Diagram checks

  • Deleted objects in diagrams

When an entity or reaction is deleted in the instance view of the curator tool, it must also be removed manually from any diagram that it has been drawn into. This does not happen automatically. This check is run over gk_central and will look for any diagrams that are affected by the deletion of an entity or reactionlikeEvent. This check MUST be run after any deletions of reactionlikeEvents or entities have been committed to gk_central so that the affected diagrams can be identified and the appropriate changes made.

Deleted objects1.jpg

Any affected diagrams will be flagged. The DB_ID of the deleted instance will be displayed, but to see the affected "objects" in the diagram, you will need to view the diagram in gk_central.

Deleted objects2.jpg

Select that diagram in gk_central, right click and opt to "Show diagram".

Deleted objects56.jpg

Affected objects will be flagged

Deleted objects11.jpg

and the objects in the diagrams highlighted in red.

Deleted objects14.jpg

QA of projects before release

Because of the nature of the release process and the growing number of curators submitting projects, the QA load has become greater. One of the solutions to this problem is for curators to enter this data correctly right from the beginning and to run the QA checks in the curator tool before finishing their projects.

  • Top Six List Of Problems Identified During the Slice
    • Species
    • Complex Balances
    • UniProt IDs
    • Complex Compartment Checks
    • Entity Compartments
    • Balancing Of Reactions

QA Reminders

1. All of the instances must be updated from gk_central in order for these checks to be meaningful.
2. QA checks should be run regularly, for example once you have created a reaction or even a batch of EWASs.

Everyday QA includes:

    • Complete check-outs (No shell instances)
    • Match instance in DB
    • QA Tools

Drawing a pathway diagram

Reactome pathway diagrams are drawn and viewed in the curator tool using the ELV pane of the tool. This example shows how the superpathway "Regulation of Apoptosis" is diagrammed. If you are drawing a new diagram, see below. The pathway Regulation of Apoptosis is part of the supercanonical Apoptosis pathway. Here, Regulation of Apoptosis has been annotated but not yet incorporated in the Apoptosis diagram; you can tell this because the pathway is greyed out in the event hierarchy.

Diagram1.jpg

To incorporate this pathway in the Apoptosis diagram, simply click on, hold, and drag the pathway from the hierarchy to the diagram.

Diagram2.jpg

If you right click on any of the pathway boxes, you are offered the option to open the diagram. If one has been created, it will open.

Diagram3.jpg

If it has not, as in the case of Regulation of Apoptosis, you get the message below.

Diagram4.jpg

Select "No" and an empty diagram will be opened in the pathway editor pane.

Diagram5.jpg

To create the cellular compartments that you need for the diagram, click on the shaded square in the menu bar for the Pathway editor. It is best to create all the compartments that you will need before you start to lay out reactions. Here cytosol is created.

Diagram6.jpg

To reposition the compartment, click on it and drag.

Diagram7.jpg

To enlarge the compartment, click on it to select it and then grab the compartment at one of its nodes in the corners. Then drag outward.

Diagram8.jpg

To begin drawing, select a reaction from the event hierarchy and drag it onto the diagram.

Diagram9.jpg

To see the compartment names of the molecules participating in the reaction, right click anywhere on the diagram and opt to show compartment names. This will make it easier to see that the molecules have been positioned in the correct compartment.

Diagram10.jpg

Diagram11.jpg

To reposition the reaction, you can click and drag different components, or you can select the entire reaction by clicking and dragging a selection box over it. Then the reaction and all its component molecules can be moved as a unit.

Diagram12.jpg

Additional reactions are dragged out and positioned one at a time.

Diagram13.jpg


Diagram14.jpg

Reactions are of one of 5 types: Transition, Association, Dissociation, Omitted process, and Uncertain process. Transitions involve the molecules changing state, Association is a binding reaction, and Dissociation is the dissociation of a complex. Omitted process and Uncertain process are currently not used. To apply a reaction type to a reaction, right click on the reaction and select change type.

Diagram15.jpg

Select type of interest. Here it is an association.

Diagram16.jpg

Continue to drag out, map, and assign a reaction type to the remaining reactions.

Diagram17.jpg

Diagram18.jpg

Diagram19.jpg

Diagram20.jpg

Diagram21.jpg

Diagram22.jpg

Diagram23.jpg

Diagram24.jpg

Diagram25.jpg

Diagram26.jpg

Once all of the reactions have been laid out, the compartment names on molecules can be hidden by right clicking on the diagram anywhere and selecting "Hide compartment in names".

Diagram27.jpg

Right click and select "Tight Node Bounds". This will reduce the space left by removing the compartment names.

Diagram28.jpg

Here is the completed diagram.

Diagram29.jpg

Use of pathway icons as links to related "pathways" in diagrams

If you want to include a link to a pathway that is not actually "part of" the pathway that you are diagramming, you can do this by checking out both the pathway you are diagramming (pathway A) and the pathway you'd like to include as an icon (pathway B) using the Event view. Open the diagram of pathway A in the ELV view. Then, drag the icon of pathway B into the ELV. Save and commit the changes. Then, redeploy the pathway A diagram.


Drawing a new pathway diagram

If you are creating a diagram for a new top level pathway (check with Peter/Lisa if this is appropriate), remember that the pathway itself must be listed as a frontPage item in order to see the deployed diagram. To do this, check out the frontPage instance from gk_central and add your pathway in the frontPageItem slot. Also, please mark this in the editorial calendar as a front page item and inform. In order to see the changes, the Pathway hierarchy will need to be updated on the 8084 site. Please contact Peter or Lisa to do this.

If you are creating a new diagram for a pathway that is not a top level pathway, you will have to make sure that the pathway is represented (as an icon) in a diagram that represents (or is part of) a top level pathway.

Preparing for database releases

A full description of the release procedure can be found in the release SOP:

Curation Tools

Remote Attribute Search Tool

http://reactomedev.oicr.on.ca/cgi-bin/remoteattsearch2

or on the live site:

http://www.reactome.org/cgi-bin/remoteattsearch2?DB=gk_current

Examples of how to use the remoteattsearch tool can be found here.


Identifying list members that are unique to one of two lists using Microsoft Excel

Here is a procedure that describes how to take two lists and compare entries to identify those that are present in only one of the two lists. This procedure can be useful, for example, to compare the list of proteins in the gk_central vs. live site to find those that are unreleased.
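
For curators who prefer a script to a spreadsheet, the same comparison can be done with set differences. This is an alternative to the Excel procedure, not a description of it, and the identifiers below are placeholders rather than real accessions.

```python
def unique_to_each(list_a, list_b):
    """Return (entries only in list_a, entries only in list_b), sorted."""
    set_a, set_b = set(list_a), set(list_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Placeholder identifiers standing in for exported protein lists.
gk_central_ids = ["ID_1", "ID_2", "ID_3"]
live_site_ids = ["ID_2", "ID_3"]

only_central, only_live = unique_to_each(gk_central_ids, live_site_ids)
print(only_central)  # present in gk_central but not on the live site
print(only_live)     # present on the live site but not in gk_central
```

Entries that appear only in the gk_central list correspond to the unreleased proteins in the example above.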


Advanced Curation

CuratorTool Tools

Helpful Tips

Create the framework for the pathway(s) you intend to curate before filling in the details. Start by creating a pathway, add new or existing reactions to it in the correct order, complete the summations and literature citations, then identify the EWASs, Complexes, Sets etc. required and complete the details of the reactions consecutively. Cascading signaling processes can involve very complicated Complexes; in these circumstances the Graphic Display in Entity Hierarchical View is very useful as an overview of the order of events.

FAQ