New Reactome Curator Guide
- 1 A Guide for Reactome Curators
- 2 Introduction
- 3 FAQ or "How do I ...?"
- 4 Choosing a topic/pathway for curation
- 5 Understanding and using the data model
- 6 Guidelines for Naming Entities and Events
- 7 Using the curator tool
- 7.1 Downloading, Installing, And Maintaining The Curator Tool
- 7.2 Curator tool user interface
- 8 Annotating a disease process
- 8.1 Introduction
- 8.2 Researching a disease
- 8.3 Structuring the disease hierarchy and disease display
- 8.3.1 Overview
- 8.3.2 Structuring the disease pathway
- 8.3.3 Associating normal and disease pathways with the same diagram
- 8.3.4 Highlighting disease entities and events in red
- 8.4 Specifics of annotating disease events and entities
- 8.4.1 Assigning Disease tags: assigned to events and entities
- 8.4.2 Disease Event specific classes and inference
- 8.4.3 Disease entity specific annotation
- 9 The annotation process
- 9.1 Precuration
- 9.2 Data entry
- 9.2.1 Identify existing proteins/molecules
- 9.2.2 Create new proteins/molecules
- 9.2.3 Creating a Pathway
- 9.2.4 Creating a Reaction
- 9.2.5 Connection between a generic and specific reactions
- 9.2.6 Creating a Catalyst
- 9.2.7 Creating a new ChEBI entry
- 9.2.8 Annotating the regulation of a process
- 9.3 Modifying and Deleting
- 9.4 More curation examples
- 9.5 Project QA using the Curator Tool QA checks
- 9.6 Drawing a pathway diagram
- 10 Preparing for database releases
- 11 Curation Tools
- 12 Advanced Curation
- 13 Helpful Tips
- 14 FAQ
A Guide for Reactome Curators
The goal of Reactome is to describe the known biochemical details of human biological pathways. A Reactome curator's job is to work with experts in different fields of biology to identify and curate suitable human pathways (see below) by breaking them down into subpathways and reactions and describing them in a format that is compatible with the Reactome data model. The purpose of this guide is to describe each step of that curation process so that the curator fully understands what is involved. The guide is also a useful reference for the experienced curator, as it is nearly impossible to remember all of the steps and details of the curation process. Curators are actively encouraged to add to the guide any useful hints that are missing, for the benefit of current and future curators. Regular use of the guide, and regular additions to it, will make curation easier and increase annotation consistency.
If in doubt about any of these steps, do not hesitate to e-mail the internal list for clarification (and then add any helpful information you receive to the guide, to save someone else the trouble of asking later).
This guide is divided into the following sections.
- FAQ ("How do I ....?"): a list of common curation-related questions organized by keyword
- Choosing a topic
- Guidelines for naming entities and events
- How to use essential data model classes (using a curated example)
- Creating a "Reactome friendly" framework of your pathway module
- The essential QA process during curation
- Using the curator tool
- Drawing diagrams
- Preparing for database releases
Here we provide basic guidelines as to how data for relevant pathways should be collected, organized and entered into Reactome via the Curator Tool. The process of converting a biological topic should maintain the referential integrity of the reactions, both newly added as well as those already present in Reactome. It is assumed that you (the reader of the guide) have some familiarity with Reactome, and have read the Reactome papers. They are freely available as PDF files. Though this document is oriented toward the Reactome curator, there is much here for outside groups using Reactome. The entire dataset, website, curator, and author tools can be downloaded from the Reactome download page. Installation and configuration instructions are also provided here. If you have any problems with the install or tools please contact us at email@example.com.
FAQ or "How do I ...?"
A keyword-organized FAQ sheet for Reactome curation questions can be found here. Please add to it as you come up with solved examples for your own questions.
Choosing a topic/pathway for curation
Focus first on:
- Areas of biology that you know well (or one in which you have good contacts).
- Pathways that are well characterized at the molecular level AND for which there is considerable HUMAN biochemical data and proteins that are new to Reactome.
Points to consider:
- Give lower priority to topics that will require a large number of inferences from other species.
- Don't try to tackle huge pathways all at once.
- Break large topics into *manageable* pieces. Experts are generally unwilling to commit to a project bigger than a dozen reactions, and it is easy to get lost in the details and lose focus, which is costly in terms of time.
- Work on several (preferably related) small projects simultaneously. Experts often fail to come through at critical times so it is good to have a few projects at different stages to keep work flowing through the pipeline.
- Check the Editorial Calendar to ensure that your plans don't overlap with another curator's. If in doubt, contact that curator.
- Check with Peter - he may be aware of planned work that is not recorded in the Editorial Calendar.
Understanding and using the data model
A detailed understanding of the Reactome data model is important for accurate and consistent curation. The best way to start learning about the different data classes and how to use them is to browse the data model glossary. Some classes are used infrequently, but you still need to be aware of them. Some of the more important or confusing data classes and attributes are listed below, along with relevant use cases. All the classes have associated data fields, divided into several categories, which are described first. Below them are some usage examples from curated Reactome pathways (mostly from Apoptosis right now). We will continue to add to this list (and provide more actual examples), and we encourage you to add any informative examples of your own.
Key Reactome data classes and relevant use cases
Conceptual Approaches To Curating and other hints
Mandatory, Required, and Optional
All the classes of data in Reactome have a number of details that MUST be completed before they can be 'released', i.e. made visible to the public. Fields containing details have a symbol that indicates the type: either a red box with a letter superimposed, or a yellow diamond with n superimposed. The yellow diamond with n indicates fields that are completed automatically. The letter in the red box indicates the following:
- m - Mandatory - you must provide this information.
- r - Required - you must provide this information if it is possible - a good example of this is species, required for defined sets of proteins, but not required for defined sets of small molecules.
- o - Optional - not available for every instance, so cannot be 'Required' but in some circumstances may be a procedural requirement, e.g. inferredFrom must be completed for human reactions that are inferred from a model organism.
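The three markers can be read as a simple pre-release check. The following is a minimal sketch of that reading, not the Curator Tool's own logic: the `FieldSpec` class, the `applicable` flag, and the field names in the usage note are hypothetical and only illustrate the distinction between mandatory and required fields described above.

```python
from dataclasses import dataclass

# Hypothetical category codes mirroring the m / r / o markers.
MANDATORY, REQUIRED, OPTIONAL = "m", "r", "o"


@dataclass
class FieldSpec:
    name: str
    category: str            # "m", "r" or "o"
    applicable: bool = True  # for "r" fields: can this information be provided at all?


def missing_for_release(specs, values):
    """Return the names of unfilled fields that should block release."""
    missing = []
    for spec in specs:
        filled = values.get(spec.name) not in (None, "", [])
        if filled:
            continue
        if spec.category == MANDATORY:
            missing.append(spec.name)
        elif spec.category == REQUIRED and spec.applicable:
            missing.append(spec.name)
        # Optional fields never block release in this sketch.
    return missing
```

For example, a defined set of small molecules could be described with `FieldSpec("species", REQUIRED, applicable=False)`, so that an empty species field does not block release, whereas the same field on a defined set of proteins would keep the default `applicable=True`. Procedural requirements on optional fields, such as inferredFrom, are deliberately not modeled here.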
The main concepts in the Reactome data model are Event and PhysicalEntity. Events are of two types: Pathways or Reactions. Pathways in Reactome are multi-step events, whereas Reactions are single-step events (at the molecular or atomic level).
A concrete curated example is the Reactome Apoptosis pathway which is broken down into the subpathways "Extrinsic pathway", "Intrinsic pathway", "Activation of Effector Caspases", "Execution phase" and "Regulation of Apoptosis".
Events may be linked to other Events that precede them, regulate or are regulated by them. Reactions contain PhysicalEntities that take part in the Event. At present there are two subclasses of Event: ReactionlikeEvent and Pathway. The ReactionlikeEvent subclass has 4 subclasses: BlackBoxEvent, Depolymerisation, Polymerisation, and Reaction.
A Reaction is an event that converts inputs to outputs in a single step.
PhysicalEntities can be the inputs, outputs, catalysts, regulators, or requirements in Reactions. PhysicalEntities can be single entities, such as proteins, small molecules, RNA, DNA, carbohydrates, lipids, or sub-atomic particles. They can also be complexes consisting of a combination of any of the single entities, or polymers synthesized from the single entities. Related entities can be grouped into a set. *The use of sets is described below*.
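The class relationships described above can be summarized in a few lines of Python. This is an illustrative sketch only: the class names follow the data model as described in this guide, but the attribute names (`has_event`, `inputs`, `outputs`, `catalysts`, `preceding_events`) are simplified stand-ins, and the event names in the usage note are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhysicalEntity:
    name: str


@dataclass
class Event:
    name: str
    preceding_events: List["Event"] = field(default_factory=list)


@dataclass
class Pathway(Event):
    # A Pathway is a multi-step event made of other events.
    has_event: List[Event] = field(default_factory=list)


@dataclass
class ReactionlikeEvent(Event):
    # A single-step event with physical entities taking part in it.
    inputs: List[PhysicalEntity] = field(default_factory=list)
    outputs: List[PhysicalEntity] = field(default_factory=list)
    catalysts: List[PhysicalEntity] = field(default_factory=list)


# The four ReactionlikeEvent subclasses named in the text.
class BlackBoxEvent(ReactionlikeEvent): pass
class Depolymerisation(ReactionlikeEvent): pass
class Polymerisation(ReactionlikeEvent): pass
class Reaction(ReactionlikeEvent): pass


def reactions_in(event: Event) -> List[ReactionlikeEvent]:
    """Collect every reaction-like event beneath an event, depth first."""
    if isinstance(event, ReactionlikeEvent):
        return [event]
    if isinstance(event, Pathway):
        found: List[ReactionlikeEvent] = []
        for child in event.has_event:
            found.extend(reactions_in(child))
        return found
    return []
```

A binding reaction could then be written as `Reaction("GRB2 binds SOS1", inputs=[PhysicalEntity("GRB2"), PhysicalEntity("SOS1")], outputs=[PhysicalEntity("GRB2:SOS1")])` and placed in the `has_event` list of a `Pathway`; `reactions_in` walks nested pathways and returns the single-step events they contain.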
Here are some common examples of different types of reactions and blackbox event use cases. Clicking on the blue highlighted class name will bring you to some additional usage hints and warnings.
Simple binding/complex formation reaction
Simple dissociation reaction
A BlackBox event converts inputs to outputs in multiple steps that are not annotated because:
- We don't know all of the intervening steps.
- The intervening steps are known but we do not want to curate them for the purposes of the module. Two examples are:
- The transactivation of a gene leading to production of a protein, where we don't want to include all the steps of translation of that protein.
- The degradation of a protein.
Guidelines for Naming Entities and Events
Curators should use the following guidelines when naming entities:
1. EWAS: Protein entity names are derived from the gene symbol, which we copy from UniProt. This symbol is used as the name of the unmodified, canonical protein, as represented in UniProt by the feature 'Chain'. To learn how to extend the name to represent post-translational modifications and what to do if there are multiple Chain features in UniProt, look here
3. Small molecules are derived from ChEBI. Use the recommended abbreviation in the table below or if not present, the ChEBI recommended name.
Reactome makes frequent use of complexes and sets. In many cases these do not have obvious names in the literature. Here are the guidelines for naming sets and complexes, to maximize searchability and consistency and reduce the chances of ambiguity.
Complexes should be named as a concatenated list of their components, separated by colons with no spaces, e.g. GRB2:SOS1
If a complex contains more than one of an entity, the entity name is preceded by the count, e.g. 2xPPOX:FAD
Sets should be named as a concatenated list of the contained entity names, separated by commas with no spaces, e.g. ASXL1,ASXL2.
Candidate set names should enclose the names of candidates in parentheses e.g. BRPF1,(BRPF2,3).
These rules should always be used to generate a full name. As a general rule, you should not consider using an alternative name if your complex/set has fewer than 5 components/members. However, under some circumstances, an alternative name may be used as the first (uppermost) name in the list of names seen in the Curator Tool. The name is used as a label for the diagram object.
- In rare cases, a set or complex may have a universally recognized common name. These common names are acceptable if there is no possibility of ambiguity.
- Don't include the words "complex", "heterodimer", "homodimer", "associated with", "bound to" in the complex name. Dimer, trimer etc. are acceptable alternatives to the 2x, 3x prefixes.
- When a set contains entities that have a common prefix to the gene symbol, it is acceptable to use the prefix once, followed by the suffixes of additional members, e.g. AQP3,7,9,10. A full version of the name should be included as a second name for the set, for the preceding example the full name is AQP3,AQP7,AQP9,AQP10.
- If a set is believed to contain an entire family, it can be shortened to show only the common gene symbol prefix, e.g. all Wnt proteins in a set could be named Wnts. Note that the name must be written in plural form (ending in s).
- If the contents of a large set (more than 4 members) are mixed and there is no common gene symbol prefix, it is acceptable to use a name that indicates the set role, e.g. Ligands of CD36.
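The full-name rules for complexes and sets are mechanical enough to sketch in code. The helper functions below are hypothetical (they are not part of the Curator Tool) and assume that gene symbols consist of a letter prefix followed by a number; shortening the candidates inside the parentheses follows the BRPF1,(BRPF2,3) example rather than an explicit rule.

```python
import re


def complex_name(components):
    """components: list of (entity name, copy number) pairs, in display order."""
    parts = []
    for name, count in components:
        parts.append(f"{count}x{name}" if count > 1 else name)
    return ":".join(parts)  # colons, no spaces


def set_name(members):
    """Full set name: member names separated by commas with no spaces."""
    return ",".join(members)


def short_set_name(members):
    """Use a shared gene-symbol prefix once, e.g. AQP3,7,9,10."""
    matches = [re.fullmatch(r"([A-Za-z]+)(\d+)", m) for m in members]
    if len(members) < 2 or not all(matches):
        return set_name(members)
    prefixes = {m.group(1) for m in matches}
    if len(prefixes) != 1:
        return set_name(members)  # no common prefix: keep the full name
    suffixes = [m.group(2) for m in matches]
    return prefixes.pop() + ",".join(suffixes)


def candidate_set_name(members, candidates):
    """Members first, then the candidates enclosed in parentheses."""
    return set_name(members) + ",(" + short_set_name(candidates) + ")"
```

With these helpers, `complex_name([("PPOX", 2), ("FAD", 1)])` gives `2xPPOX:FAD` and `short_set_name(["AQP3", "AQP7", "AQP9", "AQP10"])` gives `AQP3,7,9,10`, while `set_name` on the same list gives the full version to record as the second name.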
5. Sequence variants: see the Human Genome Variation Society (HGVS) description of sequence changes at the protein level.
7. Genes and mRNA
Genes are named using the gene symbol followed by the word 'gene' in lowercase. Messenger RNA molecules are named using the gene symbol followed by the abbreviation 'mRNA'.
Using the curator tool
Downloading, Installing, And Maintaining The Curator Tool
Instructions for downloading, installing, and maintaining the Curator Tool can be found here.
Curator tool user interface
The curator tool has three panes to view the content. These different views can be switched using the tabs in the upper left hand corner of the curator tool window.
The Menu Bar
The menu bar contains a number of shortcut buttons. Hovering over a button displays a message describing its function.
The Schema View
The Schema view provides a hierarchical list of the data classes. In this view you can search, open, edit and create instances of these classes. This panel is used most often for creating new items, such as an EWAS derived from a ReferenceGeneProduct.
The Event View
This view is where most curation is done. The event view displays a hierarchical, alphabetical list of the events in the local project on the left, a graphic in the middle that shows the relationship between objects for selected events, and the details of selected events on the right. Unfurling a pathway by clicking on the + symbol to the left of its name in this view reveals all of its component Reaction events. DON'T CONFUSE these symbols with the check box: clicking on an unchecked box marks the event as ready for release onto the public website, and clicking on a checked box has the opposite effect. For pathways the check box selects the ENTIRE pathway. Doing this by mistake can be time-consuming and tedious to reverse, so be warned!
The Entity Level View (ELV)
The ELV is where pathway diagrams are created and subsequently represented. In this view the pathway is laid out graphically, showing the inputs, outputs, catalysts where appropriate, and regulatory molecules, each in the correct cellular compartment.
Synchronizing local projects with the database
Synchronize at least once a day. It is good practice to synchronize your local project with the gk_central database so that your work is not lost should the local copy become corrupted or be lost through hard drive failure. Synchronizing is also useful for other curators, who may be able to reuse EWASes or other instances that you have created. It is perfectly acceptable to 'check in' incomplete work unless it is marked as 'doRelease'.
To keep the data in the open project consistent with the database repository, you need to synchronize the local project with the database periodically. The "Database" menu provides the following database-related actions:
- "Match Instance in DB..." finds an instance in the database repository that matches the selected instance in the local repository. A matched instance has the same defining attributes as the local one. If a matched instance is found, you can merge the local one into the database one.
- "Compare Instance in DB..." compares the selected instance with the corresponding one in the database. Corresponding means having the same DB_ID, so you should not compare an instance checked out from database A with an instance in database B, because the same DB_ID might be assigned to different instances.
- "Update from DB" updates a checked-out instance from the database.
- "Check In" checks newly created or modified instances in to the database.
It is recommended that you use "Compare Instance in DB..." before "Update from DB" or "Check In". After a new or modified instance has been checked in to the database, the ">" marker is removed to indicate that the local and database copies are the same.
The "Synchronize with DB..." menu item checks all instances in the schema class selected in the schema view, or in the whole open project if no class is selected. There are four possible categories of inconsistency between the two repositories: instances that differ between the local and database repositories because they were modified either locally or in the database, instances created locally, instances deleted in the database by you or others, and instances deleted locally by you. You can choose an appropriate action for the selected instances in each category. Be aware that if you select instances from several categories at once, only the actions that apply to every selected category are enabled. For example, "Update from DB" and "Commit to DB" can both be applied to the instance "AIF-mediated response" in the first category, but only "Commit to DB" can be applied to the instance "TestPathway" in the second category. If you select both "AIF-mediated response" and "TestPathway", only "Commit to DB" is enabled and "Update from DB" is disabled. Double-clicking an instance in the first category pops up a dialog comparing the local and database instances, while double-clicking an instance in any other category shows the contents of that instance. To deselect a single instance, hold the Control key and click the selected instance.
There are three different cases in the first category (instances that differ between the local and database repositories): the instance is modified in the local project but not in the database, the instance is modified in the database but not in the local project, or the instance is modified in both. In the first case, the user can either commit the changes to the database or overwrite them by updating from the database. In the second and third cases, the user cannot commit changes but can update from the database.
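The rules for which actions are enabled can be expressed as a small lookup plus a set intersection. This is a sketch of the behavior described above, not code from the Curator Tool. The `SyncState` names are invented for illustration, and the two "deleted" categories are left out because this guide does not state which actions apply to them.

```python
from enum import Enum


class SyncState(Enum):
    MODIFIED_LOCALLY = "modified in the local project only"
    MODIFIED_IN_DB = "modified in the database only"
    MODIFIED_IN_BOTH = "modified in both places"
    CREATED_LOCALLY = "created locally"


# Actions available for a single instance in each state, as described in the text.
ALLOWED = {
    SyncState.MODIFIED_LOCALLY: {"Commit to DB", "Update from DB"},
    SyncState.MODIFIED_IN_DB: {"Update from DB"},
    SyncState.MODIFIED_IN_BOTH: {"Update from DB"},
    SyncState.CREATED_LOCALLY: {"Commit to DB"},
}


def enabled_actions(selected_states):
    """Only actions valid for every selected instance stay enabled."""
    states = list(selected_states)
    if not states:
        return set()
    return set.intersection(*(ALLOWED[state] for state in states))
```

Selecting a locally modified instance together with a locally created one, as in the "AIF-mediated response" and "TestPathway" example, leaves only "Commit to DB" enabled.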
The following is the list of icons used in the synchronization
What do I do if the curator tool flags instances as being duplicated in gk_central?
You will need to evaluate each instance that is flagged as being a duplicate and compare it to the instance that is flagged in gk_central.
This is easiest to do before synchronizing your project with the database, as described below:
Search the local project for all database instances that contain '-', then highlight each one in turn, right-click to open a pop-up menu, and choose "Match Instance in DB...". Any results returned by that query are already-existing instances that, by the rules of gk_central, are duplicated by the new local instance. Mostly, that's right and you should accept the option offered by the form, to replace the local instance in the local project with the already-existing one from gk_central. That cleanly gets rid of the duplication and preserves all references correctly in the local project and in gk_central. Sometimes (rarely) the form is wrong because, despite identical defining attributes, two instances are genuinely different, e.g., if a person instance has already been created for J(ohn) Doe and you have now created one for J(ane) Doe. In this case, you should refuse the offer made by the form and make a note to force the new instance into gk_central at synchronize-and-commit time.
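The matching step can be pictured as comparing only the defining attributes of two instances. The sketch below is an assumption-laden illustration: instances are plain dictionaries, the choice of `surname` and `initial` as the defining attributes of a person is hypothetical, and treating a negative `DB_ID` as "not yet in the database" is inferred from the search for '-' described above.

```python
def is_local_only(instance):
    """Assumption: instances not yet checked in carry a negative DB_ID."""
    return instance["DB_ID"] < 0


def defining_key(instance, defining_attributes):
    """The values that decide whether two instances count as the same."""
    return tuple(instance.get(attr) for attr in defining_attributes)


def find_matches(local, db_instances, defining_attributes):
    """Return database instances whose defining attributes equal the local ones."""
    key = defining_key(local, defining_attributes)
    return [db for db in db_instances
            if defining_key(db, defining_attributes) == key]


# Hypothetical defining attributes for a person instance.
PERSON_DEFINING = ("surname", "initial")

jane = {"DB_ID": -12, "surname": "Doe", "initial": "J", "firstname": "Jane"}
john = {"DB_ID": 1001, "surname": "Doe", "initial": "J", "firstname": "John"}

# The match is reported even though the two people are different,
# which is the rare case where the offered replacement should be refused.
matches = find_matches(jane, [john], PERSON_DEFINING)
```

Because `firstname` is not among the defining attributes in this sketch, `matches` contains the John Doe record. A curator still has to judge whether the match is a true duplicate.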
Annotating a disease process
We describe the annotation of disease in three steps. The first, Researching a disease, briefly describes some useful resources for accessing disease information for curation. The second, Structuring the disease hierarchy and disease display, describes how to structure a disease pathway and how this shows up on the live site. The third, Specifics of annotating disease events and entities, covers the nuts and bolts of disease event and entity curation.
Researching a disease
Resources for disease curation
Online Mendelian Inheritance in Man (OMIM) - A classic resource. The entry for SERPINC1 (antithrombin III) is a good example. The text rambles (it's a cumulative historical narrative of relevant published results) but as an annotated bibliography it is generally quite complete, with very good coverage of papers describing the molecular and cellular biology of the gene and the molecular bases of associated disease. The entry for each gene includes a list of mutations with phenotypic effects (under the table of contents link on the right of the page, choose allelic variants) and links to locus-specific databases, which sometimes have much more extensive catalogues. (For a while, the catalogue that had the most extensive coverage for the most human genes was HGMD, accessible from OMIM table of contents -> external links -> variation. It may still be, but it's hard to tell because a license is now needed to access it.) It looks like searches in OMIM for a UniProt ID fail (e.g., searching for P01008 in OMIM does not return SERPINC1), but the UniProt record for a protein includes links to its OMIM record.
Catalogue of Somatic Mutations in Cancer (COSMIC) - For cancer mutations, COSMIC is useful, since it provides information on mutations detected in individual tumors (although it is far from ideal, as not all types of mutations are equally covered). In addition, it is good to start with one or two good reviews just to get a bird's-eye view of the landscape, and then to focus on 1-3 research publications in which many patients were analyzed, which gives you an idea of the frequency of different mutations. This helps you create your list of top mutants, since we aren't capturing all of the mutants, of course. After that, you follow your favourites through PubMed. If you're lucky, you may find a group that maintains a database of mutations for a specific gene.
Gene Reviews - Entries provide concise reviews of the genetics and clinical biology of many human diseases, as well as links to clinical testing laboratories, support groups, and other clinical resources that are not of much use to us. Its coverage is less comprehensive than OMIM's and much more narrowly focused on material useful to the working doctor who has just encountered an affected patient and is wondering how to proceed.
Scriver's Metabolic and Molecular Bases of Inherited Disease - A comprehensive coverage of the genes and genetic mechanisms underlying human disease states. Based on Scriver's Metabolic and Molecular Bases of Inherited Disease, first published in 1960, OMMBID is a digital reference tool that provides geneticists, researchers, students, clinicians, and fellows involved with the causation and treatment of inherited diseases with information.
Structuring the disease hierarchy and disease display
There are two main classes of disease pathway.
The first type of disease pathway has a normal counterpart and shows perturbations to a normal biological process that result from disease entities with altered behavior relative to the WT entities. This type includes, for instance, cancer pathways that arise as the result of gain- or loss-of-function mutations in oncogenes or tumor suppressor genes. Both gain- and loss-of-function reactions with a corresponding normal biological pathway will be displayed (highlighted in red) in the context of the greyed-out WT pathway. Loss-of-function disease events are automatically overlaid on the corresponding normal reactions by virtue of their normal reaction attribute (see below). If the disease hierarchy is constructed appropriately (described below), these loss-of-function reactions are visible on the (curator and live) website only; a curator does not see them displayed in the ELV in the curator tool. In contrast, gain-of-function disease events must be manually placed in the ELV by the curator, and are visible both in the curator tool and on the curator/live site.
The second type of disease pathway has no normal counterpart. This type includes, for instance, events that occur after the introduction and expression of foreign proteins encoded by the genomes of infectious agents like viruses and intracellular parasites. Like the gain-of-function disease events described above, these novel events must be manually laid out in the curator tool, with foreign/disease entities highlighted in red. Because there is no normal WT pathway for these events, the ELV is unique to the disease pathway and does not have the greyed-out background.
Disease pathways, of whichever type, must be added to the Disease chapter.
To do this, check the disease chapter out, add the new disease pathway to the list of contained events, and update the disease chapter diagram by dragging the new disease pathway name onto the disease diagram to generate the green box for the new disease pathway.
Structuring the disease pathway
Disease Pathways without a corresponding "normal" pathway in Reactome
If the disease pathway will not have a corresponding "normal pathway" in Reactome, the pathway should still be placed under the disease chapter. These pathways are labeled with an appropriate disease term (see below) and with all species that contribute to the reactions and events (i.e., generally, the human host and the infecting species, which is entered in the related species slot, see below). Red lines (see below) should be used to emphasize disease entities and events in these pathways, e.g. viral proteins and their reactions with human proteins. Highlight any reaction that has to do with disease progression and any entity that is from another species. Host entities should be left black, but complexes that contain both host and other-species components are colored red. Coloring host entities red would be misleading, even if the host proteins are hijacked into doing something that has to do with disease progression. In these cases the reaction lines are red, but the host entities are not highlighted. Below is a screenshot of part of the Toxicity of botulinum toxin type A pathway, showing disease entities and disease reaction lines highlighted in red, but WT human entities outlined in black as usual. Because this is a disease-specific ELV, there is no greying out of the background.
The screenshot below shows the structure of the disease pathway. The parent pathway is labeled with a disease term (see below), and contains 5 disease events. The pathway is labeled with the two species that contribute to the reactions: Homo sapiens in the main species slot, and Clostridium botulinum in the new RelatedSpecies slot. This class was introduced to prevent inappropriate inferring of pathways to the other species during website release.
Disease pathways with a corresponding "normal" (wild type) pathway in Reactome
These disease pathways need to be linked to the WT ELV to allow proper display of these events. This is accomplished by structuring the disease pathway appropriately and by sharing the WT diagram with the disease pathway. Both of these are explained in more detail below.
Structuring the disease hierarchy
There are at least 2 different ways a disease pathway can be structured, and the display on the website will change accordingly. At the moment, it is largely at the curator's discretion as to which type of disease hierarchy to use. The simplest module that can correctly associate a disease pathway with the appropriate WT ELV is structured as in the following example:
- Processing-defective Hh variants abrogate ligand secretion (a holder disease parent pathway)
- Hh ligand biogenesis disease (disease pathway)
- Hh ligand biogenesis (WT pathway)
Here, all the disease events are contained in a single disease subpathway Hh ligand biogenesis disease, and are housed in a new disease parent pathway Processing-defective Hh variants abrogate ligand secretion along with the WT subpathway Hh ligand biogenesis. Once the WT ELV is shared with the new disease parent pathway (described below), all the disease events contained in the child disease pathway Hh ligand biogenesis disease will appear highlighted in red at the same time in the ELV as shown in the screenshot below:
This pathway happens to have a loss-of-function event (hence the red Xs to indicate WT products that are not generated) and some disease-specific gain of function events, laid out manually in the ELV.
Clicking through the disease hierarchy on the website highlights the selected disease event in blue, against the background of the remaining red highlighted disease events:
The essential point is that, with this hierarchy structure (one disease parent pathway holding a single disease subpathway and a single WT counterpart pathway), all the disease events show up highlighted in red in the ELV at the same time. This is the way all our initial disease curation was done, so the oldest disease pathways (Signaling by EGFR in cancer, Signaling by FGFR in disease) have this type of display. Other examples of more recent disease pathways with this layout include Diseases associated with visual transduction and PI3K/AKT Signaling in Cancer. In fact, this structure can be even simpler, with the parent disease pathway holding a single disease reaction along with the corresponding WT pathway.
More complicated hierarchies allow the display of subsets of disease events. This is achieved by stringing together several of these simple modules (each with a parent disease pathway holding a disease pathway/event and its WT counterpart) under a grandparent pathway, as in the example below for Mucopolysaccharidoses:
Unfurling the hierarchy shows that each disease subpathway now contains a disease pathway/reaction and the WT pathway.
The display on the website now shows in red only the particular subpathway disease event selected in the hierarchy; all other disease subpathway events have no red highlighting:
Clicking through the disease hierarchy allows successive highlighting in red of individual disease subpathways/events.
In some cases, this type of display may be preferred by external reviewers or experts as it allows unique display of mutations associated with different diseases, as in the mucopolysaccharidoses example above. In other cases, where multiple inputs for a single WT reaction are known to have mutations associated with them in disease, but the mutations are known (or believed) not to occur at the same time in the same patient, this type of display is required. As an example, WNT signaling can be aberrantly activated by destabilization of the destruction complex after mutation of a number of different genes, but these mutations generally occur exclusively of each other. This hierarchy allows individual display of each of these disease-causing events.
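The difference between the two hierarchy styles can be sketched as a tree in which selecting a node highlights the disease events beneath it. This is an illustration of the display behavior described above, not the website's implementation. The `Node` class is hypothetical, and the reaction and module names other than the Hh pathway names are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One entry in the event hierarchy (pathway or reaction)."""
    name: str
    is_disease: bool = False
    children: List["Node"] = field(default_factory=list)


def disease_events_under(node: Node) -> List[str]:
    """Names of the disease leaf events beneath the selected node."""
    if not node.children:
        return [node.name] if node.is_disease else []
    found: List[str] = []
    for child in node.children:
        found.extend(disease_events_under(child))
    return found


# Simple module: one parent holding one disease subpathway and the WT pathway.
hh_module = Node("Processing-defective Hh variants abrogate ligand secretion", True, [
    Node("Hh ligand biogenesis disease", True, [
        Node("placeholder disease reaction 1", True),
        Node("placeholder disease reaction 2", True),
    ]),
    Node("Hh ligand biogenesis"),  # WT pathway, never highlighted
])

# Grandparent holding several simple modules (placeholder names).
module_a = Node("placeholder module A", True,
                [Node("placeholder disease event A", True), Node("placeholder WT pathway A")])
module_b = Node("placeholder module B", True,
                [Node("placeholder disease event B", True), Node("placeholder WT pathway B")])
grandparent = Node("placeholder grandparent disease pathway", True, [module_a, module_b])
```

Selecting `hh_module` yields both disease reactions at once, which corresponds to the all-at-once display of the simple module, while selecting `module_a` under the grandparent yields only its own disease event.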
The hierarchy also allows the grouping of mutations that occur, for instance, in different domains of a protein, as in Signaling by NOTCH1 in cancer:
Associating normal and disease pathways with the same diagram
In order for the disease pathway to share a diagram with the normal pathway, the disease pathway must be added as a value in the representedPathway slot for the normal pathway diagram. For the Hh ligand biogenesis example (previously shown), the diagram for the normal pathway "Hedgehog ligand biogenesis" (called "Diagram of Hedgehog ligand biogenesis") is shared with the disease pathway by adding the parent disease pathway "Hh variants abrogate ligand biogenesis" as the second value in the representedPathway slot for the diagram.
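As a rough picture of what sharing a diagram means, the diagram instance can be thought of as holding a list of the pathways it represents. The dictionary layout and the `share_diagram` helper below are hypothetical; only the representedPathway slot name and the pathway names come from the text above.

```python
def share_diagram(diagram, pathway_name):
    """Add a pathway to the diagram's representedPathway values if it is not there yet."""
    represented = diagram.setdefault("representedPathway", [])
    if pathway_name not in represented:
        represented.append(pathway_name)
    return diagram


# The WT pathway is the first value; the parent disease pathway becomes the second.
hh_diagram = {
    "displayName": "Diagram of Hedgehog ligand biogenesis",  # hypothetical key
    "representedPathway": ["Hedgehog ligand biogenesis"],
}
share_diagram(hh_diagram, "Hh variants abrogate ligand biogenesis")
```

Calling the helper a second time with the same pathway name leaves the list unchanged, so the disease pathway is never listed twice.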
It is also possible to have different diagrams associated with different disease subpathways, as is shown in the Signaling by WNT in cancer pathway.
If the disease module is of the simplest type, with a single parent disease pathway holding a single disease subpathway/event and the WT pathway, clicking on the parent pathway in the hierarchy will immediately bring up the relevant ELV in the pathway panel on the website. If the disease hierarchy is more complicated and contains nested disease subpathways, as in the Mucopolysaccharidoses, NOTCH1 and other pathways described above, a green box diagram will need to be created for the highest level disease pathways.
Highlighting disease entities and events in red
Gain-of-function disease events/entities and disease events/entities with no normal counterparts are both laid out manually in the disease ELV and need to be manually colored red. To do this, drag the disease event into the WT diagram as usual, then right click on any disease entities or reaction lines to bring up the menu.
Note: Loss-of-function reactions are automatically colored red in the web display and don't need to be manually adjusted.
Specifics of annotating disease events and entities
We want to use disease in a broad sense: if a process has a specific bad outcome we can associate it with that outcome even if it hasn't yet progressed to the point of causing major disease. The association of Amyloidosis with the Amyloids pathway is an example. It is possible, for instance, that amyloid deposition, which is going on in all of us, is pathological, even though most of us will die of something else before the amyloidosis progresses to a point where it's symptomatic. Targeting these early pathological processes (propathology?) will likely be very important: good for us as patients because early intervention slows or stops the process and keeps us healthy longer; good for the drug companies because that intervention will likely be a continuing one (an additional drug to be taken for life) rather than a one-time course.
Reactome disease terms are taken from the Disease Ontology. If gk-central doesn't have the disease term you need, you can create a new one by opting to create a new instance. Browse the hierarchy here, then create a new record in Reactome by entering the corresponding DOID identifier (number only) in the identifier slot. You will be asked if you want to import the entry. Say yes. If the Disease Ontology itself doesn't have an appropriate term, we can request that one be made by contacting Lynn Schriml and Warren Kibbe.
Please be sure to label the top-level disease pathway and its sub-level disease pathway with a disease attribute.
A disease attribute should be added to all reactions involving disease-related physical entities, such as proteins of bacterial/viral/fungal pathogens, mutant human proteins and drugs used in disease management. If there are several related disease tags that are applicable to reactions and pathways, using the most general tag is preferable, and even if specific disease attributes are added, always include the most general attribute for a given disease type. For example, for cancer-related reactions and pathways, always include the cancer tag.
Disease attributes should be added to mutant proteins associated with disease and may be very specific, referring to the specific disease type(s) in which a particular mutation was found. For example, EGFR L861Q mutant (DB_ID 1177542), in which L-leucine at position 861 is replaced with L-glutamine, has been detected in non-small cell lung carcinoma and adult glioblastoma multiforme. Besides these specific disease tags, it may be advisable to also add more general disease tags to an EWAS, in this case lung cancer and cancer, to enable search of mutant EWASs using these general terms. When it comes to entity sets, such as EGFR KD mutants (DB_ID 1182966), which includes all kinase domain mutants of EGFR in cancer, a very general disease tag, such as cancer, is appropriate. This is because each member of the set has its own range of cancer types (while EGFR L861Q is found in lung cancer and glioblastoma, EGFR L858R is found in lung cancer, thymoma, thyroid cancer, breast cancer and ovarian cancer), but their biological behavior is identical/similar. For the same reasons, only the general cancer disease tag should be used for events in which cancer disease entities participate.
When annotating drugs used to treat a particular disease, curators should add an appropriate disease tag to a drug entity. For a cancer drug, the general cancer disease attribute should definitely be added, but when it comes to more specific disease tags, it may be advisable to add only those cancer types for which the treatment by a given drug is approved.
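The rule that specific disease tags should always be accompanied by the most general tag can be sketched as a small check. This is only an illustration: the PARENT map below is a hypothetical, hand-written stand-in for the Disease Ontology hierarchy, and the function names are not part of the curator tool.

```python
# Hypothetical fragment of a disease hierarchy (child -> parent).
# Real terms and relationships come from the Disease Ontology.
PARENT = {
    "non-small cell lung carcinoma": "lung cancer",
    "lung cancer": "cancer",
    "glioblastoma multiforme": "cancer",
}


def most_general(term):
    """Walk up the hierarchy until a term with no recorded parent is reached."""
    while term in PARENT:
        term = PARENT[term]
    return term


def missing_general_tags(tags):
    """Return the general tags implied by `tags` that are not yet present."""
    return sorted({most_general(t) for t in tags} - set(tags))


# An entity tagged only with a specific cancer type still needs "cancer".
print(missing_general_tags(["non-small cell lung carcinoma"]))
```

A check like this would only flag a candidate omission; whether a more general tag is appropriate remains a curator's decision.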
Disease Event specific classes and inference
Gain-of-function or loss-of-function events
Disease events are effectively either gain- or loss-of-function events. Gain-of-function events arise either as a result of expression of a variant of a human protein that has activity not seen in the WT protein, or as the result of expression of a foreign gene in the human host. Gain-of-function events are curated in a standard reaction event, and are labeled with a disease tag as described above. Gain-of-function events are added to the disease pathway hierarchy and dragged into the WT ELV in the normal manner. These events are manually colored red, as described above.
Loss-of-function reactions are housed in a new class of event called FailedReaction. These are similarly labeled with a disease tag, but have only inputs (the disease variant entity or complex or set, plus any WT entities that take part in the normal reaction), and do not have any outputs. These FailedReactions are placed in the appropriate disease pathway as for the gain-of-function events, but are not dragged into the ELV. As a result, a curator will not see the loss-of-function events displayed in the curator tool, but only on the curator/live site. The FailedReactions show up automatically on the web with the disease input and the reaction node highlighted in red and the outputs of the WT reaction X'd out.
In general, we are trying to name FailedReactions with the general structure "Defective [core protein name] does not [some description of WT function]", as in "Defective ALG9 does not add mannose to the N-glycan precursor". Some of us are doing better at this than others. Key is using the words "defective" and "variant" to describe the disease-related entities.
In addition to the disease tag, both gain- and loss-of-function disease events also have 2 additional attributes distinguishing them from WT, an EntityFunctionalStatus and a NormalEvent attribute.
EntityFunctionalStatus is used to mark a disease event as either a gain- or a loss-of-function, and to describe the underlying reason for the change in behavior. This tag, along with the NormalEvent attribute, is especially important for loss-of-function events, as it is needed for deploying the red lines and X-ing out of WT outputs in the website display.
The physical entity is simply the disease variant (or complex or set containing the relevant disease variant(s)) that is responsible for the phenotype.
Disease events are also labeled with their corresponding NormalEvent, when appropriate. This is required for loss-of-function events, as it tells the display where on the WT ELV to overlay the disease FailedEvent, but may also be used for gain-of-function events that have enhanced activity relative to a normal WT (rather than a totally novel activity not represented in the normal pathway).
Inferring disease events from WT pathways
Most disease-associated mutant proteins are not studied at the same level of detail as their wild-type counterparts. This means that not all reactions annotated for the wild-type protein will have been studied for a mutant protein. Frequently, only key interactions, post-translational modifications and pathway outputs will be checked for a mutant protein. For example, researchers studying the activation of SHC1 by EGFR cancer mutants will usually check if EGFR tyrosines that serve as SHC1 docking sites are phosphorylated (some studies may also demonstrate physical interaction between SHC1 and EGFR mutants) and if the MAP kinase cascade is activated. The in-between steps, such as the recruitment of the GRB2:SOS1 complex and RAS guanyl-nucleotide exchange, are not examined but are assumed to happen. The proper way to include these intermediate steps, not directly studied on mutant proteins, in disease-associated pathways is to infer disease events from wild-type events, applying the same strategy used for inferring events from other species.
Disease entity specific annotation
Like disease events, disease entities must be labeled with a disease tag (described above). In general, we are trying to label our disease entities as variants if a descriptor is being added (rather than mutants, as previously). The data model was updated to include a new class of modified residue, GeneticallyModifiedResidue, to distinguish them from post-translational modifications.
The FragmentModification class describes more extensive changes to the coding sequence through insertions and deletions in the gene, and includes three subclasses:
- FragmentDeletionModification is used for in-frame deletions of amino acids.
- FragmentInsertionModification is used for in-frame insertions of amino acids, including genomic events that result in fusion proteins.
- FragmentReplacedModification is used for frameshifts.
The ReplacedResidue class is used for amino-acid substitutions, as previously. This class is also used for simple nonsense mutations that change a coding amino acid for a stop codon.
The FragmentDeletionModification class is used for in-frame deletions of amino acids leading to internally truncated proteins. These variants are named core protein name [first amino acid of deletion_last amino acid of deletion]del.
The FragmentInsertionModification class is used for in-frame insertions of amino acids and fusion proteins.
These variants are named
core protein name [aa prior to insertion_aa following insertion]ins[inserted aas],
as shown below for EGFR V738_K739insKIPVAI. The amino-acid string of the inserted residues is added manually to the EWAS name.
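The deletion and insertion naming patterns described above can be written out as two small string helpers. This is a minimal sketch: the function names are hypothetical, and the deletion input (E746_A750) is an illustrative example that does not come from this guide.

```python
def deletion_name(core, first_aa, last_aa):
    """core protein name [first aa of deletion_last aa of deletion]del"""
    return f"{core} {first_aa}_{last_aa}del"


def insertion_name(core, aa_before, aa_after, inserted):
    """core protein name [aa prior to insertion_aa following insertion]ins[inserted aas]"""
    return f"{core} {aa_before}_{aa_after}ins{inserted}"


print(insertion_name("EGFR", "V738", "K739", "KIPVAI"))  # EGFR V738_K739insKIPVAI
print(deletion_name("EGFR", "E746", "A750"))             # EGFR E746_A750del
```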
Question: Should the end coordinate of the variant EWAS be left as WT, or altered to reflect the insertion?
This class is also used to annotate proteins that arise as the result of genomic changes that bring two genes together to result in a fusion protein, as in the ZMYM2-FGFR1 fusion described below. This fusion puts the ZMYM2 dimerization region (1-914) together with the kinase domain of the FGFR1 receptor (residues 429-822) and results in constitutive activation of the kinase domain by virtue of ligand-independent dimerization (for reference, the full-length aa sequences of these two proteins are 1-1377 and 1-822 for ZMYM2 and FGFR1, respectively).
By somewhat arbitrary convention, the N-terminal most partner of the fusion is set as the reference protein for the variant EWAS while the C-terminal fusion partner sequence is captured in the FragmentInsertionModification record.
These records together show the insertion of residues 429-822 of FGFR1 at position 914 of ZMYM2.
To date, these fusion proteins have been named by the rather minimalist approach "[N-terminal core name]-[C-terminal core name] fusion".
Question: Is this sufficient?
Post-translational modifications to either partner in the fusion protein are numbered according to each respective WT reference gene product and do not reflect the aa position in the fusion. For instance, if in the context of the fusion protein, the FGFR1 partner is phosphorylated at (WT FGFR1 position) Y766, the fusion EWAS would be ZMYM2-pY766-FGFR1, despite the fact that in linear sequence the phosphorylation occurs at residue 1250 of the fusion (913+(766-429)).
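The arithmetic quoted above for locating a C-terminal partner residue in the linear fusion sequence can be written as a one-line helper. This sketch simply reproduces the calculation as given in the text (offset 913, FGFR1 fragment starting at 429); the function and parameter names are hypothetical.

```python
def fusion_position(offset, c_term_start, wt_position):
    """Offset plus the distance of the residue from the start of the
    retained C-terminal fragment, as in 913 + (766 - 429)."""
    return offset + (wt_position - c_term_start)


print(fusion_position(913, 429, 766))  # 1250
```

Remember that this linear position is for reference only; the EWAS name and modified residue coordinates still use the WT numbering of each partner.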
Frameshift mutations require further deciphering before curating into Reactome. The literature often lacks the exact details of a frameshift mutation. Even if a paper gives the frameshifted protein name, the altered amino acids are usually not provided. The class FragmentReplacedModification (FRM) is used to create frameshifts in Reactome.
The mandatory attributes that need to be filled for FRMs are indicated by green ms in the greyed area of the instance.
The start and end positions together with the alteredAminoAcidFragment (the string of amino acids that extends from the frameshift until it reaches a stop) are the details that have to be deciphered. The basic steps are:
- alter the nucleotide sequence from w/t to the one causing the mutation
- translate the altered nucleotide sequence to the mutant peptide sequence
- compare the resultant mutant peptide sequence with w/t graphically to determine the frameshift.
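The three steps above can be sketched in a few lines of Python. This is a toy illustration only: it uses a short made-up coding sequence rather than EXT1, the hand-rolled functions stand in for Transeq and Needle, and the comparison is a simple position-by-position scan rather than a real alignment. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base changes slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def delete_base(cds, position):
    """Step 1: remove the 1-based nucleotide `position` from the coding sequence."""
    return cds[:position - 1] + cds[position:]


def translate(cds):
    """Step 2: translate up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def first_difference(wt, mutant):
    """Step 3: return (1-based position of the first changed residue,
    altered fragment up to the new stop), or None if no residue differs
    within the shared length."""
    for i, (a, b) in enumerate(zip(wt, mutant)):
        if a != b:
            return i + 1, mutant[i:]
    return None


wt_cds = "ATGGCTCATCCTCTGAAAGGCTAA"      # made-up sequence
mutant_cds = delete_base(wt_cds, 7)      # 1-bp deletion at coding position 7
print(translate(wt_cds))                 # MAHPLKG
print(translate(mutant_cds))             # MAIL
print(first_difference(translate(wt_cds), translate(mutant_cds)))  # (3, 'IL')
```

The position and fragment returned in the last step correspond to the start coordinate and alteredAminoAcidFragment that need to be entered in the FRM instance. Positions here count from the first base of the coding sequence, so any offset between the mRNA record and the coding sequence has to be handled before calling delete_base.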
UniProt records provide the normal peptide sequence in FASTA format. They also provide a link to RefSeq (under the Sequence databases section) which links to the NCBI reference mRNA sequence. This is the starting point from where the nucleotide sequence is copied to Transeq below.
Several EMBOSS programs are available to translate and compare sequences.
Transeq - Translates nucleic acid sequences to their corresponding peptide sequences.
Needle - Aligns two peptide sequences to provide a global sequence alignment.
An example below shows how these steps are utilized to decipher a frameshift mutation.
The disorder Hereditary multiple exostoses 1 (EXT1) is caused by loss of function mutations in Exostosin 1 (EXT1). One mutation is a 1-bp deletion at nucleotide 1469 in the EXT1 gene, resulting in a frameshift mutation with a premature stop codon (Philippe et al. 1997 [PubMed:9326317], Ahn et al. 1995 [PubMed:7550340]).
Go to the UniProt record for human EXT1 here, and scroll to the FASTA sequence for the w/t peptide sequence.
Click the FASTA link to obtain a text version of the sequence.
>sp|Q16394|EXT1_HUMAN Exostosin-1 OS=Homo sapiens GN=EXT1 PE=1 SV=2
MQAKKRYFILLSAGSCLALLFYFGGLQFRASRSHSRREEHSGRNGLHHPSPDHFWPRFPD
ALRPFVPWDQLENEDSSVHISPRQKRDANSSIYKGKKCRMESCFDFTLCKKNGFKVYVYP
QQKGEKIAESYQNILAAIEGSRFYTSDPSQACLFVLSLDTLDRDQLSPQYVHNLRSKVQS
LHLWNNGRNHLIFNLYSGTWPDYTEDVGFDIGQAMLAKASISTENFRPNFDVSIPLFSKD
HPRTGGERGFLKFNTIPPLRKYMLVFKGKRYLTGIGSDTRNALYHVHNGEDVVLLTTCKH
GKDWQKHKDSRCDRDNTEYEKYDYREMLHNATFCLVPRGRRLGSFRFLEALQAACVPVML
SNGWELPFSEVINWNQAAVIGDERLLLQIPSTIRSIHQDKILALRQQTQFLWEAYFSSVE
KIVLTTLEIIQDRIFKHISRNSLIWNKHPGGLFVLPQYSSYLGDFPYYYANLGLKPPSKF
TAVIHAVTPLVSQSQPVLKLLVAAAKSQYCAQIIVLWNCDKPLPAKHRWPATAVPVVVIE
GESKVMSSRFLPYDNIITDAVLSLDEDTVLSTTEVDFAFTVWQSFPERIVGYPARSHFWD
NSKERWGYTSKWTNDYSMVLTGAAIYHKYYHYLYSHYLPASLKNMVDQLANCEDILMNFL
VSAVTKLPPIKVTQKKQYKETMMGQTSRASRWADPDHFAQRQSCMNTFASWFGYMPLIHS
QMRLDPVLFKDQVSILRKKYRDIERL
Copy the peptide sequence to EMBOSS Needle and paste it into the first window.
A mutated nucleotide sequence needs to be created from which the translated mutant peptide sequence will be obtained.
In the UniProt record for EXT1, scroll to Cross-references and select the nucleotide sequence from RefSeq (format NM_xxxxxx.x)
This opens the NCBI nucleotide record for EXT1 mRNA. Scroll down to the ORIGIN section where you will find the nucleotide sequence conveniently numbered. Copy the sequence and paste it into EMBOSS Transeq.
In the Transeq window where the sequence is copied to, edit the nucleotide sequence according to evidence from literature. The coding sequence in the RefSeq record starts at position 774. The nucleotide change, from literature, is a 1-bp thymidine deletion at 1469 (Philippe et al. 1997 [table 2], Ahn et al. 1995 [Fig.6]).
Many times in the literature, researchers only use the coding sequence in their experiments and count from that point onwards to the mutation site. As the coding sequence starts at 774, add that to 1469 (the deletion position), which equals 2243. Check this region in the sequence pasted into Transeq. The region matches that found in the literature. Delete the t after the six c's and submit the job.
Click submit. An alignment result is returned that compares the w/t and mutant sequences that you input into Needle. Follow the sequence (vertical bars between the two sequences indicate similarity) until you reach the first dot (a difference between the two sequences).
The first amino acid to change due to the frameshift is leucine to arginine at position 490. The altered mutant sequence after the frameshift is RSLSPSQC then the mutant sequence terminates. We can now fill in our FragmentReplacedModification (FRM) instance.
After 8 altered amino acids, the 9th position is a stop (*). The name of the EWAS will reflect the mutation.
Missense mutation example
The disorder Homocystinuria-Megaloblastic Anemia, cblG Complementation Type (cblG) is caused by loss-of-function mutations in the methionine synthase (MTR) gene. One such mutation causing cblG is the missense mutation pro1173-to-leu (P1173L).
An appropriate modified EWAS must be created. The hasModifiedResidue of this EWAS is filled with a replacedResidue instance.
Clone the parent EWAS (MTR) and rename it with the mutant name.
Create a replacedResidue instance. Use PSI-MOD IDs to construct the replaced residue.
This mutant EWAS can now be used in a disease event where there is a loss of function of MTR.
Simple nonsense mutation example
The disorder Ehlers-Danlos syndrome, musculocontractural type 1 (EDSMC1) is caused by loss-of-function mutations in the carbohydrate sulfotransferase 14 (CHST14) gene. A nonsense mutation causing this disorder is a 205A-T transversion in the CHST14 gene, resulting in a lys69-to-ter (K69*) substitution.
An appropriate modified EWAS must be created. The hasModifiedResidue of this EWAS is filled with a replacedResidue instance.
Clone the parent EWAS (CHST14) and rename it with the mutant name.
Create a replacedResidue instance. Use PSI-MOD ID to indicate the residue which is being removed (in this case, L-lysine). Currently, a stop is indicated by not having a replacement residue in the psiMod slot. The displayName in this case reads L-lysine 69 replaced with unknown. This construct is being revised to change the currently misleading displayName to display a more accurate name taking into account a stop (*).
Question: Should the end coordinate be left as in the WT EWAS?
The annotation process
1. Create a basic outline of pathway: Regulation of Apoptosis as an example. Further description of the annotation process will focus on the subpathway highlighted below in yellow.
2. Flesh out outline with information including: molecules, compartment, species, text summary, references (PMIDs)
3. Create a table of molecules participating in the pathway
- Each modified form of a protein as a separate entry
- Look up/enter corresponding UniProt identifiers
Identify existing proteins/molecules
In many cases the proteins on your list will already exist in the database. You should make every effort to reuse existing instances wherever possible to avoid unnecessary and confusing duplications. Use the curator tool to search the ReferenceGeneProduct (RGP) class using Uniprot identifiers as follows:
Searching the database
Choose Class: ReferenceGeneProduct
Choose attribute: identifier
Attribute value: Use REGEXP
Enter your identifier in the search box. You can enter several as a pipe-separated list (e.g., A1A4S6|O43293|P43146|P43146....).
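If you keep your molecule table in a spreadsheet or text file, the pipe-separated search string can be generated rather than typed. The helper below is a minimal sketch with a hypothetical name; it only builds the string to paste into the search box and does not talk to the curator tool.

```python
def identifier_regexp(identifiers):
    """Join identifiers with '|', dropping blanks and repeats but keeping order."""
    seen = []
    for ident in identifiers:
        ident = ident.strip()
        if ident and ident not in seen:
            seen.append(ident)
    return "|".join(seen)


print(identifier_regexp(["A1A4S6", "O43293", "P43146", "P43146"]))
# A1A4S6|O43293|P43146
```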
Select the ReferenceGeneProduct (RGP) returned (if a list is returned, select them one at a time), right click and opt to "View referrers". The resulting Referrers dialog box lists referrers by property name. At the top of the list can be isoforms of the protein, if present in UniProt. Do not use isoforms unless you are certain that only specific isoforms have the functionality you intend to represent. Isoforms may have their own referrers and you should check this - if someone took the trouble to create an isoform-specific EWAS they probably had good reason to do so. Items listed with the property name referenceEntity are EntitiesWithAccessionedSequence (EWASs), a Reactome identifier for specific forms/locations of a protein. Often there will be more than one EWAS for a single RGP, because post-translationally modified forms of proteins and proteins in different cellular compartments each have a separate EWAS. If any of the listed EWASs correspond to your needs, right click and opt to "Check Out" that referrer. If the correct molecular compartment or post-translationally modified form is not present, you can still check out an EWAS and later use the Curator tool to clone it and modify it to your needs.
Create new proteins/molecules
Please see the guidelines on naming entities!
If you search for a RGP that does NOT have referrers in the database you will get this message:
In this case, you will need to create the EWAS.
To do this, first check out the RGP of interest into your local project. Then, in the curator tool, select RGP in the class list, then scroll or search for the RGP of interest. Right click and opt to create EWAS from RGP.
It will ask if you want to accept the end coordinates described by Uniprot. Only say yes if you can confirm them to be accurate. If the end coordinates are not certain, the convention is to represent the start as 1, and end as -1.
In the newly created EWAS, the ReferenceEntity will have a name and species entered by default. You can add an alternative name if this was specified by the author; to do this, right-click on the existing name and select Add. You must define the compartment. To do this, right click in the compartment slot and select the correct compartment in your local repository. Select the compartment and hit OK. If you don't see the desired compartment in your local project, click the "Browse database" button to search in gk_central.
If the protein you want to represent with an EWAS is post-translationally modified, that is represented by completing the modified residue slot of the EWAS. For example, to indicate that the protein has a phospho-serine at residue 126, right click on the modified residue slot. You will be prompted to choose a modified residue instance from your local project, browse gk_central or create a new modified residue instance. Almost always you will want to create a new instance, as modified residue instances are specific to the RGP and residue position. Enter the UniProt identifier for the ReferenceGeneProduct in the ReferenceSequence slot and right click in the PsiMod slot to select a modification type from within the local repository or by searching the gk_central database, and hit OK. Finally, enter the residue number in the coordinate slot and hit OK.
The modified residue instance can now be applied to the EWAS by clicking ok.
The modified EWAS is shown below.
If you are annotating a protein fragment, you can define the start and end coordinates of the fragment as shown below: Don't forget to change the default name to indicate that it is a fragment.
Creating a Complex
This is done in one of two ways. Either:
Go to the Schema view, select Complex, right-click and select Create Instance. The Create a New Instance dialog box appears. Enter a name in the field for Name. Typically you would also enter the Compartment, Species and identify the entities (EWASs, sets or complexes) that make up this complex using the field hasComponent. All of these fields are completed by either double-clicking to type, or right clicking to select Add and identify the correct item from the local project.
Or, by selecting the appropriate field in the details of an event or entity that contains a complex, right click and select Add. This will produce a 'Select Instance' dialog that preselects the allowed classes that can be added to that field. E.g. if you right click the Output field in a Reaction, the allowed classes include several types of set, complexes, polymers and EWAS. To create a new complex at this point, select Complex in the list of options on the left, and click the New button on the right. The process is then identical to that described above.
Creating a Set
Reactome has several types of set - refer to the Glossary and User Guide for definitions.
The most commonly used sets are Defined Sets and Candidate Sets.
Defined Set members should be proven equivalents, i.e. all of them have been demonstrated to perform the function that is described by the event they participate in.
Candidate Sets have two categories of inclusion, members, equivalent to defined set members, and candidates, members that are not proven to be functionally equivalent, but are believed to be equivalent based on phylogeny, domain structure etc.
Creating sets is a similar process for all subtypes: select the appropriate type in the Schema view, right-click and select Create Instance, fill in the Name and Species, then right click to add the set members.
- All members of a set must have the same compartment. The only time a set can have multiple compartment attributes is if its members themselves all have the same multiple compartment attributes, e.g., a set of membrane-spanning complexes with components explicitly located [on this side], in the membrane, and [on that side].
- While sets are, by default, homogeneous (members having the same PhysicalEntity class), they are not required to be. For example, the defined set platelet alpha granule contents (481033) contains, as members, EWASs, Complexes and Sets.
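The compartment rule for sets (all members share the same compartment, or the same combination of compartments) can be expressed as a simple check. This is a sketch under stated assumptions: members are represented as plain dictionaries with a "compartment" list, which is a hypothetical stand-in for real instances, and the function name is not part of the curator tool.

```python
def compartments_consistent(members):
    """True if every member carries exactly the same set of compartments."""
    compartment_sets = {frozenset(m["compartment"]) for m in members}
    return len(compartment_sets) <= 1


same = [
    {"name": "member 1", "compartment": ["cytosol"]},
    {"name": "member 2", "compartment": ["cytosol"]},
]
mixed = [
    {"name": "member 1", "compartment": ["cytosol"]},
    {"name": "member 2", "compartment": ["plasma membrane"]},
]
print(compartments_consistent(same))   # True
print(compartments_consistent(mixed))  # False
```

Comparing whole compartment sets rather than single values also covers the multi-compartment case: members that all span the same several compartments pass the check.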
Creating a Pathway
Please see *note* below if you are adding a pathway that will be a new top level pathway.
Here is a description of how the outlined mini pathway "Regulation of activated PAK-2p34 by proteasome mediated degradation" is built from its component events in the curator tool. The pathway consists of 2 reactions: "Ubiquitination of PAK-2p34" and "Proteasome mediated degradation of PAK-2p34". For simplicity, the reactions have already been created (see the section on creating an inferred reaction for an example).
You can then select the events that you need one at a time or, as a shortcut, you can search for your newly created events in your project if they have not yet been submitted to gk_central (they will all have negative DB_ID attributes). To do this, search in your project for events with DB_ID containing "-". From this list, you can hold the control key and select the events of interest.
The order of the events in a pathway is described through the use of the "precedingEvent" attribute on the reactions that are components of the pathway. If no preceding event is specified, the order of the events displayed on the webpage reflects the order in which they are listed as components in the pathway instance that you are creating.
Once the component events have been added, the remaining required attributes are added. If the pathway that you are describing corresponds to a GO biological process, right click on the goBiologicalProcess slot and select set. Select the appropriate GO term from your local repository of gk_central. If you can't find the term of interest, ask for help.
- note: If you are creating a top level pathway (check with Peter if this is appropriate), it must be listed as a frontPage item. To do this, check out the frontPage instance from gk_central and add your pathway in the frontPageItem slot. Also please mark this in the editorial calendar as a front page item.
Creating a Reaction
You can create a new reaction by selecting the Reaction class in the Schema view, right-click and select Create Instance, but it is perhaps better practice and more intuitive to create new reactions inside a pathway. To do this, select a pathway in the Event Hierarchical View, select the hasEvent property name in the details panel on the right, right-click and select Add. This leads to a dialogue for providing details of the reaction.
This may be enough detail for the moment; if you are simply creating a placeholder, click OK.
To set the species of the reaction right click and select add in the species field.
If the species that you happen to be working with is not in your local project you can opt to search for it in the database using the "Browse Database" Button in the dialog box. When you set/change the species or the compartment of a reaction, it will ask if you want to propagate the species/compartment to all of the component molecules as well.
Use caution when selecting Yes. ONLY say yes here if you know that the event and its contained molecules have no referrers in the database that would be affected. (In other words, ALL other reactions, complexes or sets in the database that make use of these now "changed" molecules would be affected.)
Again if you don't have the compartment you need in your project, you can Browse Database to find the one you need.
Now add the input and output molecules. Right click in the respective box and select "Add".
Select the molecule from the appropriate class:
Important: After adding your input and output molecules, it is important to verify that your reaction is balanced (all molecules represented as input are also present as output). See the QA section below for a description of how to do this.
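The idea behind the balance check can be illustrated outside the curator tool. The sketch below is not the tool's QA implementation: it assumes a hypothetical representation in which a simple entity is a string and a complex is a list of entities, flattens both sides of the reaction to their leaf components, and compares the counts.

```python
from collections import Counter


def flatten(entity):
    """Yield the leaf identifiers of an entity; a complex is a list of entities."""
    if isinstance(entity, str):
        yield entity
    else:
        for part in entity:
            yield from flatten(part)


def is_balanced(inputs, outputs):
    """True if the same leaf components, in the same numbers, appear on both sides."""
    return Counter(flatten(inputs)) == Counter(flatten(outputs))


# Two free entities A and B bind to form the complex A:B.
print(is_balanced(["A", "B"], [["A", "B"]]))  # True
# B is missing from the output side.
print(is_balanced(["A", "B"], [["A"]]))       # False
```

This toy comparison ignores modifications, compartments and deliberate exceptions, so it should be read as an explanation of what "balanced" means rather than a replacement for the QA check.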
Add the literature reference(s). The references associated with a reaction MUST provide direct experimental evidence for the occurrence of that reaction in the species you are annotating (i.e. human for human Reactome). If there is no direct experimental evidence in human then you need to create an inferred human reaction as described in the section below. Enter the PMID for journal articles, and say yes when prompted to have the details filled in automatically. A description of how to add other types of references (books, URLs) will be added soon.
If you don't see the reference in your local repository then opt to Browse Database.
If you can't find the literature reference in the database either, then you need to create a new one:
Enter the PMID (number only). Click out of the PMID box. You will be asked if you want to import the PMID record information. Say yes.
You will now see the full record:
Look back in your local repository and you will see the new literature reference. You can now add this as a reference for your reaction.
Once you have added your references, you can add a text summary for the reaction in the "summation" slot. Right click and select add.
Then input your text and citations. This text should describe the event and optionally provide some background information to give the pathway context. When describing a protein entity, introduce it by using the full UniProt descriptive name, exactly as it appears, followed in brackets by the abbreviated name used for the EWAS, which is normally based on the gene symbol. For example, 'Ras GTPase-activating protein 1 (RASA1) stimulates the GTPase of normal but not oncogenic Ras p21....'
Subsequently use the abbreviated (systematic) name. It's good practice to give the full name of every protein mentioned, in every summation.
Once the mandatory attributes have been filled, enter the remaining required attributes. Right click on edited to choose an instanceEdit from your project.
If you haven't created one previously, opt to create a new one by clicking on the New Instance button.
Then right click on author to select a person. If you don't see the one you want locally, browse the database.
The edited slot holds the name of the curator that should be credited with creating and editing the event and the date it was edited. This slot is filled with an instanceEdit instance that contains this information.
A date time stamp is created automatically for that instanceEdit and clicking "OK" will add this instanceEdit to the edited slot in your reaction.
The authored and reviewed slots hold the instanceEdits describing the author/reviewer and dates of authoring/reviewing respectively. These are created as described above for the "edited" slot. When the reaction is ready for release, the do_Release flag should be set to TRUE and the releaseDate slot should be filled with the appropriate release date.
Preceding events should be defined for an event whenever possible. Note that the preceding event for a reaction should always be a reaction or reaction-like event, not a pathway, and the preceding event for a pathway should be another pathway (when relevant), not a reaction or reaction-like event.
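The rule on preceding events amounts to "like precedes like", which can be sketched as a check over a list of events. The dictionary layout, the "kind" values and the function name below are hypothetical and used only for illustration.

```python
def preceding_event_violations(events):
    """Return (event, preceding event) name pairs whose kinds do not match.

    A preceding event that is not in `events` is also reported, since its
    kind cannot be confirmed.
    """
    kind = {e["name"]: e["kind"] for e in events}
    return [
        (e["name"], p)
        for e in events
        for p in e.get("precedingEvent", [])
        if kind.get(p) != e["kind"]
    ]


events = [
    {"name": "Regulation of activated PAK-2p34 by proteasome mediated degradation",
     "kind": "pathway"},
    {"name": "Ubiquitination of PAK-2p34", "kind": "reaction"},
    {"name": "Proteasome mediated degradation of PAK-2p34", "kind": "reaction",
     "precedingEvent": ["Ubiquitination of PAK-2p34"]},
]
print(preceding_event_violations(events))  # []
```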
Creating an inferred event
When constructing a human pathway, a curator will come across events that have no direct experimental evidence in humans, but have supporting experimental data from another/other species. If experts in the field believe that the event can in fact occur in humans, the 'other species event' can be used to infer the human event. In the case illustrated below, a human reaction is inferred from in vitro experimental results using proteins from human and Oryctolagus cuniculus (rabbit).
Here a reaction "Proteosome mediated degradation of PAK-2p34" is created and the species Homo sapiens and Oryctolagus cuniculus are assigned. To avoid having a human and a non-human reaction with identical names, it can be useful to use capitalized forms of object names for the non-human reaction and all-uppercase names for human, e.g. Jak2 and JAK2 for the non-human and human proteins respectively. A text summary and the literature reference providing evidence for this mixed species reaction are added.
...and the mixed species output complex "PAK-2p34" is selected as the output.
Now that the reaction to be used for inference has been created, create the same reaction for human, using human participating molecules. Note, however, that the literature reference is not associated with this human event! Instead, in the "inferredFrom" field, right click and select "Add" to enter the non-human reaction used for inference.
Selecting the mixed species reaction created previously...
...now the link to the inferred reaction has been made.
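The rule that the evidence stays with the source event can be sketched as a simple check. This is an illustration under stated assumptions: events are modelled as plain dictionaries, the attribute names mirror the slots mentioned above, and the reference value is a placeholder, not a real citation.

```python
def inference_problems(event):
    """Flag an inferred event that also carries the source event's literature reference."""
    problems = []
    if event.get("inferredFrom") and event.get("literatureReference"):
        problems.append(
            "literature reference should stay on the event used for inference"
        )
    return problems

# Hypothetical instances; the reference string is a placeholder.
mixed_species = {
    "name": "Degradation of Pak-2p34",
    "species": ["Homo sapiens", "Oryctolagus cuniculus"],
    "literatureReference": ["placeholder reference"],
}
human_ok = {
    "name": "Degradation of PAK-2p34",
    "species": ["Homo sapiens"],
    "inferredFrom": [mixed_species["name"]],
}
human_bad = dict(human_ok, literatureReference=["placeholder reference"])

ok_result = inference_problems(human_ok)
bad_result = inference_problems(human_bad)
```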
Adding a cross reference to a new database (not previously existing in Reactome) to an instance
If you want to add a crossReference attribute, you can do so as long as you create an instance for the linkable database.
Connection between a generic and specific reactions
You may come across a situation where it's convenient to create a generic, all-encompassing reaction in which a set of proteins perform the same function. Specific proteins from this set may be used elsewhere in other pathways as specific, single reactions so we want a way to indicate the specific reaction is one reaction derived from the generic reaction. The way to indicate this is to use the "hasMember" property of the generic reaction.
An example is the ABCC family of transporters mediating organic anion transport across the plasma membrane. Three of the proteins from the set of proteins in this reaction are involved in three specific reactions elsewhere. To show that there is a connection between them and this reaction, use the hasMember slot to indicate that these three reactions are specific examples of this generic reaction (as shown below).
Creating a Catalyst
Reactions that involve a catalyst should include this information. Within the Reaction Details, the field is called catalystActivity. To complete this you need two things: the physicalEntity or object that is acting as catalyst, and the Activity of that object, defined as a GO molecular function. The physicalEntity will be a molecule, set or complex, probably in your local project. The GO molecular function can be identified by consulting UniProt: look at the ontologies section for GO Molecular Function. If none of the listed terms seems to be appropriate, either Browse Database for terms in gk_central, or use the OLS website at http://www.ebi.ac.uk/ontology-lookup/ to identify the correct term first. Use the most specific term possible.
Creating a new ChEBI entry
Using SMILES strings as input for the ChEBI submission tool
To use the submission tool, you must get a user name and password, and log in. Go here to do that. Once you have logged in, click "create a new submission" from the choices at the bottom of the page. That will cause a new line to appear in the table of "your active submissions" on that page. Click the "edit submission" option on that line to open the actual submission form.
Under the ‘Name And Structure’ section of the Submission tool, select ‘Edit Structure’.
Under the ‘Edit’ Menu, select ‘Import Name’.
An input box named ‘The Source – Name’ appears. This is the place to paste your SMILES string.
In this example, the following string for 1-PP-IP5 is used: OP(O)(=O)O[C@H]1[C@H](OP(O)(O)=O)[C@@H](OP(O)(O)=O)[C@H](OP(O)(=O)OP(O)(O)=O)[C@H](OP(O)(O)=O)[C@@H]1OP(O)(O)=O
After the SMILES string has been pasted, select the ‘File’ menu in ‘The Source – Name’ input box. Select ‘Import As’.
The chemical structure defined by the SMILES string should now be present in the box, ready for editing. Press ‘Update structure’ to obtain details of the structure.
Structure details are now displayed in the right of the page. The structure on the left hand side can now be edited as you wish.
Annotating the regulation of a process
The following organization of regulation events works well for many kinds of processes, and it fits with our view that all parts of a process should be grouped, while respecting GO's view that regulatory events should be distinguishable from the rest of the process.
All about [process] (pathway)
--The steps of [process] (pathway)
----[process] reaction 1
----[process] reaction 2
--Regulation of [process] (pathway)
----[process] regulatory reaction 1
----[process] regulatory reaction 2
Regulation here can include reactions that are themselves concrete molecular transformations whose effect is to modulate one of the main process reactions by activating an enzyme, or by providing or sequestering an input molecule. Regulation can also include less specific things like "[this regulatory event] by an unknown molecular mechanism positively or negatively regulates [process reaction #]".
Modifying and Deleting
It is important to understand that if you locally modify or delete an instance you checked out from gk_central, it will also be modified/deleted in gk_central when you synchronize. You must ALWAYS CHECK FIRST that the instance you intend to modify/delete is not in use elsewhere, outside your local project. The best way to do this is to search for it using the Database Browser Schema View and select referrers. If it has any referrers you didn't know about, do not modify or delete it! There may be circumstances when you think something should be modified or removed, but if it has been used by another curator, check with them first, or contact an experienced curator for advice.
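What a referrer search does can be illustrated with a short sketch. This is not how the Database Browser is implemented: the data layout (a dictionary of instance identifiers mapped to attribute/reference lists) and all identifiers below are hypothetical.

```python
def find_referrers(instances, target_id):
    """Return (instance id, attribute) pairs that reference target_id.

    instances: dict mapping an instance id to a dict of
    attribute name -> list of referenced instance ids.
    """
    referrers = []
    for inst_id, attrs in instances.items():
        for attr, refs in attrs.items():
            if target_id in refs:
                referrers.append((inst_id, attr))
    return referrers

# Hypothetical instances: entity 101 is used by a reaction and by a complex.
instances = {
    201: {"input": [101, 102], "output": [103]},
    301: {"hasComponent": [101, 104]},
    401: {"input": [105], "output": [106]},
}

used_by = find_referrers(instances, 101)
unused = find_referrers(instances, 999)
```

An empty result suggests the instance is safe to change; any other result is a reason to stop and check with the curators concerned.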
Diagram checks after deleting entities or reactionlikeEvents
If and when you need to delete an entity from gk_central, you must run the "Deleted objects in diagrams" check over gk_central to find any diagrams that have used those instances. A description of how to run this check is shown here.
More curation examples
Another example of the annotation process can be found here.
Project QA using the Curator Tool QA checks
Within the Tools menu the "QA Check" menu can be found.
This menu has six separate QA script items within it.
- Imbalance Check (checks that the molecules present as input are also present as output)
- Mandatory Attributes Check (checks that the mandatory attributes for a class have been entered)
- Required Attributes Check (checks that the required attributes for a class have been entered)
- Compartment Check For:
- EntitySet (compartment of set members matches compartment of set)
- Complex (compartment of complex matches compartment of individual components)
- Reaction (compartment of reaction matches compartment of individual components)
- Diagram checks
Note: In order for the QA checks to effectively pick up errors, the project that you are working on must be fully extracted from the database. Instructions on how to do a full extraction can be found here.
You must select Reactions in the hierarchy to perform the imbalance check.
Reactions are flagged as cleavage reactions if the output differs from the input only in that the output contains "fragments" of the input molecule. A true imbalance is shown below:
Mandatory attribute check
A list of instances missing mandatory attributes (by class) is shown. To make the missing attributes of the instance easier to see, you can use the "order attribute" button (downward arrow with circle and triangle) in the upper right side of the tool. This orders the attributes by type (mandatory, required, optional, etc.).
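The logic of an attribute check can be sketched as follows. This is an illustration only: the attribute lists in MANDATORY are hypothetical placeholders (the real lists come from the Reactome schema), and instances are modelled as plain dictionaries.

```python
# Hypothetical mandatory attributes per class; not the real schema.
MANDATORY = {
    "Reaction": ["name", "input", "output"],
    "Complex": ["name", "hasComponent"],
}

def missing_mandatory(instance):
    """Return the mandatory attributes that are absent or empty."""
    required = MANDATORY.get(instance["class"], [])
    return [attr for attr in required if not instance.get(attr)]

def report_by_class(instances):
    """Group instances with missing attributes by their class."""
    report = {}
    for inst in instances:
        missing = missing_mandatory(inst)
        if missing:
            label = inst.get("name", "<unnamed>")
            report.setdefault(inst["class"], []).append((label, missing))
    return report

# Hypothetical instances
instances = [
    {"class": "Reaction", "name": "A binds B", "input": ["A", "B"], "output": []},
    {"class": "Complex", "name": "A:B", "hasComponent": ["A", "B"]},
]

report = report_by_class(instances)
```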
Required attribute check
This check works in the same way as the mandatory attribute check.
These checks work in the same way as the compartment checks, with the exception that Pathways are also checked for species conflicts.
- Deleted objects in diagrams
When an entity or reaction is deleted in the instanceview of the curator tool, it must also be removed manually from any diagram that it has been drawn into. This does not happen automatically. This check is run over gk_central and will look for any diagrams that are affected by the deletion of a reaction or reactionlikeevent. This check MUST be run after any deletions of reactionlike events or entities have been committed to gk_central so that the affected diagrams can be identified and the appropriate changes made in any affected diagrams.
Any affected diagrams will be flagged. The DB_ID of the deleted instance will be displayed, but to see the affected "objects" in the diagram, you will need to view the diagram in gk_central.
Select that diagram in gk_central, right click and opt to "Show diagram".
Affected objects will be flagged and the objects in the diagrams highlighted in red.
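What this check looks for can be sketched in Python. This is a minimal illustration, not the Curator Tool's implementation: it assumes each diagram is known simply as a list of the DB_IDs drawn in it, and the pathway names and DB_IDs are hypothetical.

```python
def affected_diagrams(diagrams, deleted_ids):
    """Map each affected diagram to the deleted DB_IDs it still contains.

    diagrams: dict mapping a diagram name to the DB_IDs drawn in it.
    """
    deleted = set(deleted_ids)
    result = {}
    for name, drawn_ids in diagrams.items():
        hits = sorted(deleted & set(drawn_ids))
        if hits:
            result[name] = hits
    return result

# Hypothetical diagrams and DB_IDs
diagrams = {
    "Pathway A": [1001, 1002, 1003],
    "Pathway B": [2001, 1002],
    "Pathway C": [3001],
}

flagged = affected_diagrams(diagrams, deleted_ids=[1002, 3002])
```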
QA of projects before release
Because of the nature of the release process and the growing number of curators submitting projects, the QA load has become greater. One of the solutions to this problem is for curators to enter this data correctly right from the beginning and to run the QA checks in the curator tool before finishing their projects.
- Top Six List Of Problems Identified During the Slice
- Complex Balances
- UniProt IDs
- Complex Compartment Checks
- Entity Compartments
- Balancing Of Reactions
1. All of the instances must be updated from gk_central in order for these checks to be meaningful
2. QA Checks should be run regularly, once you have created a reaction or even a bunch of EWASs.
Everyday QA includes:
- Complete check-outs (No shell instances)
- Match instance in DB
- QA Tools
Drawing a pathway diagram
Reactome pathway diagrams are drawn and viewed in the curator tool using the ELV pane of the tool.
This example shows how the superpathway "Regulation of Apoptosis" is diagrammed. If you are drawing a new diagram, see below. The pathway Regulation of Apoptosis is part of the supercanonical Apoptosis pathway. Here, you can tell that Regulation of Apoptosis has been annotated but not yet incorporated in the Apoptosis diagram, because the pathway is greyed out in the event hierarchy.
To incorporate this pathway in the Apoptosis diagram, simply click on, hold, and drag the pathway from the hierarchy to the diagram.
If you right click on any of the pathway boxes, you are offered the option to open the diagram. If one has been created, it will open.
If it has not, as in the case of Regulation of Apoptosis, you get the below message.
Select "No" and an empty diagram will be opened in the pathway editor pane.
To create the cellular compartments that you need for the diagram, click on the shaded square in the menu bar for the Pathway editor. It is best to create all the compartments that you will need before you start to lay out reactions. Here cytosol is created.
To enlarge the compartment, click on it to select it and then grab the compartment at one of its nodes in the corners. Then drag outward.
To begin drawing, select a reaction from the event hierarchy and drag it onto the diagram.
To see the names of the compartments of the molecules participating in the reaction, right click anywhere on the diagram and opt to show compartment names. This will make it easier to see that the molecules have been positioned in the correct compartment.
To reposition the reaction you can click and drag different components, or you can select the entire reaction by clicking and dragging a selection box over it. Then the reaction and all its component molecules can be moved as a unit.
Additional reactions are dragged out and positioned one at a time.
Reactions are of one of 5 types: Transition, Association, Dissociation, Omitted process, and Uncertain process. A Transition involves molecules changing state, an Association is a binding reaction, and a Dissociation is the dissociation of a complex. Omitted process and Uncertain process are currently not used. To apply a reaction type to a reaction, right click on the reaction and select change type.
Once all of the reactions have been laid out, the compartment names on molecules can be hidden by right clicking on the diagram anywhere and selecting "Hide compartment in names".
If you want to include a link to a pathway that is not actually "part of" the pathway that you are diagramming, you can do this by checking out both the pathway you are diagramming (pathway A) and the pathway you'd like to include as an icon (pathway B) using the Event view. Open the diagram of pathway A in the ELV view. Then, drag the icon of pathway B into the ELV. Save and commit the changes. Then, redeploy the pathway A diagram.
Drawing a new pathway diagram
If you are creating a diagram for a new top level pathway (check with Peter/Lisa if this is appropriate), remember that the pathway itself must be listed as a frontPage item in order to see the deployed diagram. To do this, check out the frontPage instance from gk_central and add your pathways in the frontPageItem slot. Also, please mark this in the editorial calendar as a front page item and inform. In order to see the changes, the Pathway hierarchy will need to be updated on the 8084 site. Please contact Peter or Lisa to do this.
If you are creating a new diagram for a pathway that is not a top level pathway, you will have to make sure that the pathway is represented (as an icon) in a diagram that represents (or is part of) a top level pathway.
Preparing for database releases
A full description of the release procedure can be found in the release SOP:
Remote Attribute Search Tool
or on live site:
Examples of how to use the remoteattsearch tool can be found here.
Identifying list members that are unique to one of two lists using Microsoft Excel
Here is a procedure that describes how to take two lists and compare entries to identify those that are present in only one of the two lists. This procedure can be useful, for example, to compare the list of proteins in the gk_central vs. live site to find those that are unreleased.
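As an alternative to the spreadsheet procedure, the same comparison can be done with Python sets. This is a minimal sketch: the two lists are assumed to have been exported as plain lists of identifiers, and the UniProt accessions below are illustrative values only, not the actual content of gk_central or the live site.

```python
def unique_to_each(list_a, list_b):
    """Return (entries only in list_a, entries only in list_b), sorted."""
    set_a, set_b = set(list_a), set(list_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)

# Illustrative identifiers only
gk_central_proteins = ["P04637", "P38398", "P00533"]
live_site_proteins = ["P04637", "P00533"]

only_gk_central, only_live = unique_to_each(gk_central_proteins, live_site_proteins)
```

In this example only_gk_central holds the entries present in gk_central but not on the live site, i.e. the unreleased ones.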
Create the framework for the pathway(s) you intend to curate before filling in the details. Start by creating a pathway, add to this new or existing reactions in the correct order, complete the summations and literature citations, then identify the EWASs, Complexes, Sets etc. required and complete the details of the reactions consecutively. Cascading signaling processes can involve very complicated Complexes; in these circumstances the Graphic Display in Entity Hierarchical View is very useful as an overview of the order of events.