Glossary Data Model



Data Model Glossary

Reactome Team

Updated: April 2011


Reactome uses a frame-based knowledge representation. The data model consists of classes (frames) that describe the different concepts (e.g., reaction, molecule). These classes are hierarchically arranged into classes and parental superclasses. Thus, the "PhysicalEntity" superclass has as children classes such as Complex and GenomeEncodedEntity. Superclasses are used to organize the data structure, but not for annotation by curators. Data are captured by creating instances of classes. Classes have attributes (slots) which hold properties of the instances (e.g., the names and numbers of copies of the molecules that make up a complex). "Defining" slots are used to identify and distinguish instances in the knowledgebase and help to ensure that essentially identical information is not represented in Reactome in multiple copies. In this glossary, class names are in boldface type (CatalystActivity) and attribute (slot) names are in italic type (text).
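As a rough illustration of how defining slots can keep essentially identical information from being stored twice, the sketch below models an instance as a Python dataclass and builds an identity key from its defining slots only. The Frame class, the DEFINING_SLOTS table, and the choice of which slots are defining are assumptions made for this example; they are not taken from the Reactome schema.

```python
from dataclasses import dataclass, field

# Hypothetical table: which slots are "defining" for each class.
DEFINING_SLOTS = {"Complex": ("hasComponent", "compartment")}


@dataclass
class Frame:
    cls: str                                   # class name, e.g. "Complex"
    slots: dict = field(default_factory=dict)  # slot name -> list of values

    def identity_key(self):
        """Build a hashable key from the defining slots only."""
        names = DEFINING_SLOTS.get(self.cls, ())
        return (self.cls,) + tuple(
            (name, tuple(sorted(self.slots.get(name, [])))) for name in names
        )


def find_duplicates(frames):
    """Return pairs of instances whose defining slots are identical."""
    seen, duplicates = {}, []
    for frame in frames:
        key = frame.identity_key()
        if key in seen:
            duplicates.append((seen[key], frame))
        else:
            seen[key] = frame
    return duplicates
```

Two instances that differ only in a non-defining slot (a name, for instance) produce the same key and are reported as a duplicate pair.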

Reactome Classes


A class used for database management. Should not be manually edited by curators.


A class used for database management. Should not be manually edited by curators.


[definition goes here]


The name and address of an institution. Any Person can have an affiliation, but at present this information is recorded only for Reactome authors, reviewers, and curators. Example: Cold Spring Harbor Laboratory / 1 Bungtown Road, Cold Spring Harbor NY USA ([1])


Associates a specific PhysicalEntity with a specific GO MolecularFunction. A PhysicalEntity has as many CatalystActivity instances associated with it as it has distinct activities. Example: GTPase activity of 80S ribosome \[cytosol\] ([2]) Overloading: The GO molecular function ontology recognizes fourteen kinds of function in addition to catalysis (e.g., transporter activity, signal transducer activity) and function terms from these fourteen classes also can be used as molecular functions. Example: glucose transporter activity of GLUT1 homotetramer \[plasma membrane\] ([3])


This class is scheduled for removal from the data model. Do not use it.
A ConcurrentEventSet is a set of simultaneous, non-competitive events which all involve the same PhysicalEntity. This entity is the FocusEntity of the EventSet. Example: ConcurrentEventSet:74667 links three Events in the pathway of insulin receptor activation and recycling via the FocusEntity, "activated insulin receptor \[integral to plasma membrane\]". IRS and Shc2 bind the active insulin receptor independently of each other, i.e., binding of IRS and the subsequent IRS-specific signaling events happen regardless of binding of Shc2, and vice versa. Without the ConcurrentEventSet the reactions of Shc2 binding and IRS binding would appear to compete with each other as alternative events, rather than as parallel ones. ([4])


Unique identifiers from external databases (ReferenceDatabases), used to link Reactome entities to these external records. Database identifiers for nucleotide or protein sequences are held in the subclass SequenceDatabaseIdentifier. Example: COMPOUND:C00114 ([5]) links the Reactome record for the molecule choline to the record maintained by KEGG.
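The example identifier above has the display form "database:identifier". A minimal sketch of splitting such a string into its two parts follows; treating this display form as a general rule, and the names DatabaseIdentifier and parse_database_identifier, are assumptions made for illustration.

```python
from typing import NamedTuple


class DatabaseIdentifier(NamedTuple):
    reference_database: str  # e.g. "COMPOUND"
    identifier: str          # e.g. "C00114"


def parse_database_identifier(text: str) -> DatabaseIdentifier:
    """Split 'DATABASE:IDENTIFIER' at the first colon."""
    database, separator, identifier = text.partition(":")
    if not separator or not database or not identifier:
        raise ValueError(f"not of the form DATABASE:IDENTIFIER: {text!r}")
    return DatabaseIdentifier(database, identifier)
```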


Used to specify the disease that is associated with a mutant protein or event. Disease instances include the EBI disease ontology (DO) identifier and name.

Domain \[superclass\]

This superclass and all of its child classes, ComplexDomain, GenericDomain, and SequenceDomain, are obsolete.
A part or subregion of a PhysicalEntity that has a distinctive function. It can be attached to a PhysicalEntity via the hasDomain slot.


A domain that consists of two or more non-contiguous parts of a physicalEntity. These individual parts can be subregions of a molecule (e.g., amino acid residues 1 to 109 of an immunoglobulin light chain) or entire molecules in a complex. The parts are identified as values of the hasMember slot of ComplexDomain. Example: Ceruloplasmin 3' UTR ([6])


A domain that is shared by multiple physicalEntities. A genericDomain can be attached to an EntitySet, with the relevant Domains of the individual entities that make up the set listed in the hasInstance slot of the genericDomain. A genericDomain without instances may be used when the Domain cannot be further specified, other than by its name. Example (with instances): IRS-PTB domain ([7]) Example (without instances): AUG start codon ([8])


A defined subregion of a polypeptide or polynucleotide sequence. The sequence is identified by specifying a referenceEntity and the domain is identified by specifying its startCoordinate and endCoordinate.


These terms associate a variant form of a protein, such as AKT1 with glutamate substituted for lysine at residue 17, with a functional status such as “gain of function via non-conservative missense variant”.

Event \[superclass\]

An event is any biological process in which input entities are converted to output entities in one or more steps.


Deprecated. Due to difficulties in finding clear-cut criteria to distinguish such events from Pathways, the ConceptualEvent class has been merged into Pathway, which now has a hasEvent slot rather than hasComponent. Used to be: A set of Events that accomplish conceptually similar things, i.e. where the inputs and outputs are not identical but similar. For example, Pol I, II and III dependent transcription, and mitochondrial transcription, are four distinct and independent processes, each producing a different type of RNA molecule. However, they all result in RNA (although different types) and it can be useful to annotate the concept of 'transcription' generally. The specific events to be linked in this way, which may be reactions or pathways, are entered as values of the hasSpecialisedForm slot. Example: Transcription \[Homo sapiens\] ([9])


Deprecated. A set of Events which accomplish exactly the same thing, i.e. inputs and outputs of all the set members are identical, as for a set of Reactions catalyzed by isozymes. However, the preferred way of dealing with such Reactions is to create a DefinedSet of the isozymes and to use that set as the physicalEntity of a catalystActivity of a single Reaction.


Any collection of related Events. The events in a pathway can be ReactionlikeEvents or other Pathways. This class is very broad because we have not been able to identify precisely defined, widely accepted, and distinct alternative strategies for grouping events. With such a set of strategies, this class could be subdivided. Note: Groups of Events that are very similar, e.g. have same inputs and outputs and only differ in the catalyst, should preferably be represented as ReactionlikeEvents using a DefinedSet for the entities that can vary. If an individual Reaction represented by this group has distinct features like literature references, or regulation, it can be spelled out and attached to the general Reaction via the hasMember slot. Slots: crossReference – Identifiers to point to the equivalent pathway in another database. Not used at present. definition – Deprecated. Do not use. hasEvent - defining attribute; holds reactions or pathways that make up this Pathway. List these events in the order in which they occur in the pathway. Example (a classical biochemical pathway): Fatty Acyl-CoA Biosynthesis \[Homo sapiens\] ([10]) Example (a less rigidly ordered group of events): Formation of Platelet plug \[Homo sapiens\] ([11])
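Because hasEvent can hold both ReactionlikeEvents and other Pathways, and its values are listed in order of occurrence, the ordered list of reactions in a pathway can be recovered with a depth-first walk. The sketch below assumes simplified stand-in classes (only a name and a has_event list); the function name ordered_reactions is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ReactionlikeEvent:
    name: str


@dataclass
class Pathway:
    name: str
    # Ordered values of the hasEvent slot: reactions or nested pathways.
    has_event: List[Union["Pathway", ReactionlikeEvent]] = field(default_factory=list)


def ordered_reactions(pathway: Pathway) -> List[str]:
    """Depth-first walk that preserves the order given in hasEvent."""
    names = []
    for event in pathway.has_event:
        if isinstance(event, Pathway):
            names.extend(ordered_reactions(event))
        else:
            names.append(event.name)
    return names
```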

ReactionlikeEvent \[superclass\]

Non-instantiable. Has four subclasses (Reaction, BlackBoxEvent, Polymerisation and Depolymerisation). Conversion of one or more input entities to output entities, possibly facilitated by a catalyst. Most reactions in Reactome involve a) the interaction of entities to form a complex, b) the movement of entities between compartments, or c) the chemical conversion of entities as part of a metabolic process. Example: a) 2 phosphorylated HSL monomers => phosphorylated HSL dimer \[Homo sapiens\] ([12]) b) adenine \[cytosol\] <=> adenine \[extracellular\] \[Homo sapiens\] ([13]) c) Adenine + PRPP => AMP + PPi \[Homo sapiens\] ([14]) Overloading: ReactionlikeEvent can be overloaded to serve as "shorthand" to represent complex processes, such as expression or degradation of a specific protein, i.e. pathways whose individual steps are not annotated in Reactome. Example: Insulin degradation \[Homo sapiens\] (


Bona fide reactions, i.e. reactions that have balanced input and output entities. Instances of this class are subjected to rigorous QA, among others by checking for imbalances.
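The glossary does not say how the QA procedure detects imbalances. One plausible reading, sketched below purely as an assumption, is an element-count comparison between the input and output sides. The function names are hypothetical, and the compositions are the neutral molecular formulas of the molecules in the reaction Adenine + PRPP => AMP + PPi.

```python
from collections import Counter


def total_composition(side):
    """Sum element counts over (stoichiometry, composition) pairs."""
    total = Counter()
    for count, composition in side:
        for element, n in composition.items():
            total[element] += count * n
    return total


def is_balanced(inputs, outputs):
    """True if both sides contain the same number of atoms of each element."""
    return total_composition(inputs) == total_composition(outputs)


# Neutral molecular formulas, used here only as illustrative data.
ADENINE = {"C": 5, "H": 5, "N": 5}
PRPP = {"C": 5, "H": 13, "O": 14, "P": 3}
AMP = {"C": 10, "H": 14, "N": 5, "O": 7, "P": 1}
PPI = {"H": 4, "O": 7, "P": 2}

balanced = is_balanced([(1, ADENINE), (1, PRPP)], [(1, AMP), (1, PPI)])
```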


Holds reactions that have imbalances for various reasons, or more complex processes for which we either don't know all details or don't want to describe each individual step. Instances of this class represent 'shortcut' reactions to make a connection between input and output, or to describe the appearance or disappearance of an entity (e.g. protein synthesis or degradation). Slots: hasEvent holds Events that represent steps between input and output, e.g. the Reactions forming one cycle of fatty acid beta oxidation, so this is where some mechanistic detail can be entered. templateEvent holds the general Event that serves as a template for this process, e.g. 'Gene Expression'. catalystActivity is a multivalue slot, as more than one catalyst may be involved in the event.


Reactions that essentially follow the pattern: Polymer + Unit -> Polymer (there may also be catalysts or other entities involved). Such reactions have an apparent conflict/imbalance in that a Polymer, even when another Unit is added, results in the same Polymer entity. Serves to describe the mechanistic detail of a polymerisation reaction.


Reactions that follow the pattern: Polymer -> Polymer + Unit (reverse situation of Polymerisation).


GO evidence codes ([15]). At present, used only for flagging inferred orthologous events (IEA, inferred by electronic annotation).


An image in jpg format used to illustrate a Reactome data object. So far, figures have only been associated with events, although they can be associated with entities, persons, and several other classes. The url slot of the figure instance holds the address of the image jpg file in the Reactome CVS website repository, and the figure slot of the data object to be illustrated holds the name of the figure instance. Example: /figures/apoptotic_factor_responses.jpg is used to illustrate the pathway "Apoptotic factor-mediated response \[Homo sapiens\]" ([16])


The topics on the front page of the Reactome website. The single instance of this class is normally maintained by the Editor-in-Chief and Managing Editor.


These terms associate a genetic phenotype such as “loss of function” with a causative molecular feature such as “point mutation”.


A local copy of the GO Biological_Process ontology: [17] Instances of this class are used to create goBiologicalProcess cross-references for Reactome events. Curators should never create new instances of the class, but should request the creation of an appropriate new term in the GO ontology itself.


A local copy of the GO Cellular_Component ontology: [18] Instances of this class are used to specify the compartment (subcellular location) of Reactome events. Curators should never create new instances of the class, but should request the creation of an appropriate new term in the GO ontology itself.


The subset of GO_CellularComponent terms that can be used to characterize the locations of physicalEntities in Reactome. These are non-overlapping so that Entities can be unequivocally assigned to only one compartment. Entities from different non-overlapping compartments are created as separate instances.


A local copy of the GO Molecular_Function ontology: [19] Instances of this class are used to specify the activity of a Reactome physicalEntity, to create an instance of catalystActivity. Curators should never create new instances of the class, but should request the creation of an appropriate new term in the GO ontology itself. Overloading: The Molecular_Function ontology identifies fourteen kinds of function in addition to catalysis (e.g., transporter activity, signal transducer activity). Function terms from these fourteen classes are also acceptable values of the activity slot.


Records the date and time when a Reactome instance was created or modified and identifies the person responsible for the creation or modification. InstanceEdit instances are automatically generated when the central database is updated from the curator tool, and should not be manually edited.


A publication, typically a journal article, cited in a summation or linked to an entity or event instance. Note: A literatureReference instance can be created for a publication in PubMed ([20]) simply by entering the PubMed ID (PMID) into the appropriate slot on the forms provided by the curator and author tools. The tools fetch the bibliographic information and create any person instances needed to annotate authorship. If the publication is not in PubMed, this information must be entered manually.

AbstractModifiedResidue \[superclass\]



The modification of a fragment of an EntityWithAccessionedSequence through deletion or insertion of consecutive amino acid residues, usually in a disease variant of a gene product.


The deletion of a continuous fragment of an EntityWithAccessionedSequence, e.g. deletion of amino acid residues 30 to 297 in the EGFR mutant protein EGFRvIII in glioblastoma. When creating a new FragmentDeletionModification instance, it is necessary to specify the referenceSequence, which should be equal to the referenceEntity (UniProt P00533 EGFR in the case of EGFRvIII), the startPositionInReferenceSequence, which is the first amino acid of the deleted fragment (30 in the case of EGFRvIII), and the endPositionInReferenceSequence, which equals the last amino acid of the deleted fragment (297 in the case of EGFRvIII).
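The coordinate convention described above (1-based, with both the first and the last deleted residue included) can be sketched as follows. The function name is hypothetical, and the sequence is passed in as a plain string rather than looked up from a referenceSequence.

```python
def apply_fragment_deletion(sequence: str, start: int, end: int) -> str:
    """Remove residues start..end (1-based, inclusive) from sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the reference sequence")
    return sequence[:start - 1] + sequence[end:]


def deleted_length(start: int, end: int) -> int:
    """Number of residues removed by an inclusive start..end deletion."""
    return end - start + 1
```

For the EGFRvIII coordinates given above, deleted_length(30, 297) evaluates to 268 residues.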


The modification of an EntityWithAccessionedSequence through insertion of a continuous fragment. The continuous fragment can originate from the same EntityWithAccessionedSequence, e.g. duplication of the kinase domain of EGFR in glioblastoma to create a tandem kinase domain TKD-EGFR, where amino acid residues 664 to 1030 are duplicated and inserted at position 1031 within EGFR. Alternatively, the continuous fragment can originate from a different EntityWithAccessionedSequence, resulting in a fusion protein, e.g. the BCR-ABL1 fusion protein in chronic myeloid leukemia, where amino acid residues 1 to 927 of BCR are fused to amino acid residues 27 to 1130 of ABL1. When creating a new FragmentInsertionModification instance, it is necessary to specify the referenceSequence, which is equal to the referenceEntity in the case of an internal duplication (UniProt P00533 EGFR for TKD-EGFR) or different from the referenceEntity in the case of a protein fusion (for BCR-ABL1, UniProt P00519 ABL1 is the referenceSequence for FragmentInsertionModification, while UniProt P11274 BCR is the referenceEntity for the EntityWithAccessionedSequence). The coordinate value of a FragmentInsertionModification instance is the position of the amino acid residue in the referenceEntity at which the inserted fragment starts (1031 for TKD-EGFR; 928 for BCR-ABL1). The startPositionInReferenceSequence and the endPositionInReferenceSequence represent the first and the last amino acid of the inserted fragment, respectively (664 and 1030 for TKD-EGFR; 27 and 1130 for BCR-ABL1).
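The interplay of the coordinate, startPositionInReferenceSequence, and endPositionInReferenceSequence values can be sketched with plain strings. The function below is hypothetical; it assumes that host is the sequence of the EntityWithAccessionedSequence (already trimmed to its own start and end coordinates), that donor is the sequence of the referenceSequence, and that all positions are 1-based and inclusive.

```python
def apply_fragment_insertion(host: str, donor: str, coordinate: int,
                             start: int, end: int) -> str:
    """Insert donor[start..end] so that it begins at position `coordinate`."""
    if not 1 <= coordinate <= len(host) + 1:
        raise ValueError("coordinate falls outside the host sequence")
    if not 1 <= start <= end <= len(donor):
        raise ValueError("fragment falls outside the donor sequence")
    fragment = donor[start - 1:end]
    return host[:coordinate - 1] + fragment + host[coordinate - 1:]
```

An internal tandem duplication is the case where host and donor are the same sequence and coordinate equals end + 1 (as with 664 to 1030 inserted at 1031). A fusion is the case where coordinate is one past the last host residue (as with 928 following residues 1 to 927 of BCR).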


The replacement of a conventional residue of a polypeptide or polynucleotide with a different conventional residue or residues. For instances of the ReplacedResidue class, the psiMod slot is multivalued (for ModifiedResidue and GroupModifiedResidue instances it is single-valued). The first psiMod slot value is the psiMod identifier for removal of the residue normally found at that position; the second (and optionally third, fourth, ...) value is that for the residue(s) replacing the removed one. The example shows the annotation of an insulin protein in which the proline normally found at position 52 has been replaced by an aspartate ([21]). The history of the modification process is not captured here: this variant insulin protein could equally well be used to annotate the post-translational modification of insulin to convert the proline residue to aspartate and the expression of a gene with a mutant codon. (Historical note: this replaces a previous annotation strategy in which replaced- and modified-residue instances were used to distinguish the degree of chemical change brought about by the modification: modifiedResidue instances were smaller changes (e.g., phosphorylation of tyrosine) and replacedResidue instances larger ones (e.g., replacement of lysine by hypusine). The PSI-MOD ontology does not make this distinction of degree, so we have changed the meaning of the replacedResidue class to enable us to capture substitutions of one amino acid for another in a protein, a form of variation that it was not previously possible to annotate in Reactome.)
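The cardinality rule for the psiMod slot (multivalued for ReplacedResidue, single-valued for ModifiedResidue and GroupModifiedResidue) can be expressed as a small check. The function name and the returned dictionary layout are assumptions made for illustration, and the values passed in are expected to be PSI-MOD identifiers supplied by the caller.

```python
def split_psimod_values(class_name, psi_mod):
    """Interpret the values of a psiMod slot according to the instance class."""
    if class_name == "ReplacedResidue":
        if len(psi_mod) < 2:
            raise ValueError(
                "ReplacedResidue needs a removal term and at least one replacement term"
            )
        # First value: removal of the original residue; rest: replacing residue(s).
        return {"removed": psi_mod[0], "replacements": list(psi_mod[1:])}
    if class_name in ("ModifiedResidue", "GroupModifiedResidue"):
        if len(psi_mod) != 1:
            raise ValueError(f"{class_name} takes exactly one psiMod value")
        return {"modification": psi_mod[0]}
    raise ValueError(f"unexpected class: {class_name}")
```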



The modification of an amino acid residue in a protein with a chemical entity that cannot be specified in atomic detail, e.g., the attachment of a dextrin or glycogen moiety to a tyrosine side chain in the protein glycogenin. Such incompletely specified chemical entities are beyond the scope of the PSI-MOD ontology but are available in the ChEBI ontology. The psiMod attribute would take a value such as MOD:00166 O4'-glucosyl-L-tyrosine, which describes the linkage between the protein and the modifying group. The modifying group is then specified with a ReferenceGroup instance (ChEBI terms) such as CHEBI:28912 "limit dextrin", as in the example: limit dextrins on L-Tyrosine [ChEBI:17895] 194 of UniProt:P46976 GYG1 ([22]).


A specific modification of any residue in an EntityWithAccessionedSequence, e.g. gamma-carboxylation of glutamate residue 47 of coagulation factor X, or the conversion of lysine residue 50 of EIF5A to hypusine. A modification instance associates the modification as specified in the PSI-MOD ontology with a specific coordinate, e.g., 47, of a specific EntityWithAccessionedSequence (a ReferenceSequence). The modification itself, an instance of the psiMod class, identifies both the original amino acid residue and the chemical change it has undergone, e.g. L-gamma-carboxyglutamic acid. (Historical note: this replaces a previous annotation strategy in which the identities of the modified amino acid and its modification were specified separately.) ModifiedResidue instances should only be created if the chemical natures of the modifying group and residue can be specified in atomic detail with a psiMod term. If the position is unknown (e.g., a protein is known to be phosphorylated on three of its seven serine residues), this ambiguity can be captured by leaving the coordinate slot empty. If the modification cannot be fully specified, e.g., O-dextrin-tyrosine, where the number of glucose residues in the dextrin is indeterminate, create a GroupModifiedResidue instance instead. Notes: The modifiedResidue instance has no subcellular location of its own, but inherits the location of the macromolecule with which it is associated. As shown in the example, this allows the modification details of a protein to be annotated once, and instances of the modified protein in different locations or involved in different complexes can be created without repeated annotation of the invariant modification features of the protein. When you create a new modifiedResidue instance, the form requires you to enter an EntityWithAccessionedSequence value. You must manually select the appropriate one. The curator tool does not prevent you from creating a modifiedResidue instance on hexokinase and then entering information for beta-globin into its EntityWithAccessionedSequence slot, and the confusion that results is substantial! Example: carboxyl group on L-Glutamate \[MOD:00041\] 47 ([23])


The name of a person. Used to identify curators, authors, reviewers, and literatureReference authors, and to associate curator persons with model organism pathway curation projects like FlyBase and Gallus Reactome. An author identifier such as an ORCID iD can be associated with a person via the CrossReference attribute. When adding a CrossReference to the person instance, you will be asked to create a DatabaseIdentifier. For an ORCID iD, the database to select is ORCID and the identifier is the number provided by ORCID in the format XXXX-XXXX-XXXX-XXXX.
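Before an identifier is entered, its shape can be checked against the XXXX-XXXX-XXXX-XXXX format. The sketch below is a format check only and does not verify the ORCID checksum; the function name is hypothetical. The final character may be the letter X, which ORCID uses as a check character.

```python
import re

# Four hyphen-separated groups of four; the last character may be 'X'.
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def looks_like_orcid(identifier: str) -> bool:
    """Return True if the string has the ORCID iD layout (no checksum test)."""
    return ORCID_PATTERN.fullmatch(identifier) is not None
```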

PhysicalEntity \[superclass\]

Something that can interact physically with something else, including all kinds of small molecules, proteins, nucleic acids, chemical compounds, complexes etc. (even photons).


A physicalEntity formed by the association of two or more other entities (which can themselves be complexes), which are its components. To create a valid complex instance, at least one component must be specified. Even if an entity is known to be composed of subunits, unless the subunits that allow it to be distinguished from other cellular entities can be identified, it cannot be annotated as a complex. Instead it must be an otherEntity (e.g., a generic bit of heterochromatin, or a lipid raft). However, complex instances can legitimately be created in cases in which experimental data suggest that additional subunits remain to be identified or that the stoichiometry of the complex is uncertain. Examples: A complex: Presenilin homodimer \[plasma membrane\] ([24]) A complex assembled from other complexes: pyruvate dehydrogenase complex \[mitochondrial matrix\] ([25]) A complex with several known subunits but uncertain complete composition and stoichiometry: GPI-N-acetylglucosaminyltransferase ([26])

EntitySet \[superclass\]

Two or more physicalEntities grouped because of a shared molecular feature. The superclass for CandidateSet, DefinedSet, and OpenSet. While sets are, by default, homogeneous (members having the same PhysicalEntity class), they are not required to be. For example, the defined set platelet alpha granule contents (481033) contains, as members, EWASs, Complexes and Sets.


A group of entities hypothesized to perform a specified function. These hypothetical members of the set are identified as values of the hasCandidate slot. Entities known to perform the function can be identified as values of the hasMember slot. One or more hasCandidate values are required; hasMember values are optional. Example: Raptor \[cytosol\]. Two splice variants of Raptor mRNA encode closely related proteins. One (member) has been shown to participate in formation of active mTORC complex; the other (candidate) is thought to do so. ([27])
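The slot requirements stated above (one or more hasCandidate values, optional hasMember values) can be sketched as a validating dataclass. The class layout is an assumption, and the member and candidate names used in the usage note are placeholders rather than actual Reactome instance names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CandidateSet:
    name: str
    has_candidate: List[str]                             # required, at least one
    has_member: List[str] = field(default_factory=list)  # optional

    def __post_init__(self):
        if not self.has_candidate:
            raise ValueError("a CandidateSet needs at least one hasCandidate value")

    def all_entities(self) -> List[str]:
        """Confirmed members first, then hypothesized candidates."""
        return list(self.has_member) + list(self.has_candidate)
```

For the Raptor example, CandidateSet("Raptor [cytosol]", has_candidate=["variant B"], has_member=["variant A"]) would be valid, whereas an empty has_candidate list is rejected.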


Two or more physicalEntities, grouped to denote interchangeable function. Thus the addition of a single nucleotide residue during RNA transcription could be annotated with the definedSet NTP \[nucleoplasm\] (members ATP, CTP, GTP, and UTP) as input. This is useful to prevent combinatorial explosion. Any kind of physicalEntity can belong to a definedSet. Example: Cdk4/6 \[nucleoplasm\], with Cdk4 and Cdk6 as hasMember values. This set is used to annotate the formation of a single complex, "Cyclin D:Cdk4/6 \[nucleoplasm\]", which in turn is an input entity for a single Reactome event, "Phosphorylation of Cyclin D:Cdk4/6 complexes". ([28]) The creation of single events that have entity sets as inputs, outputs, and the physicalEntity value of catalystActivities is preferred over the creation of eventSets in which each member event involves a different member of an isozyme family as its catalyst.
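To see the combinatorial explosion that a definedSet avoids, the sketch below enumerates every concrete combination that would otherwise need its own instance. Representing a set as a Python list of member names, and the function name expand_sets, are assumptions made for illustration.

```python
from itertools import product


def expand_sets(participants):
    """Enumerate concrete combinations when each set is replaced by one member.

    A participant is either a single entity name (str) or a list of
    interchangeable member names standing in for a definedSet.
    """
    options = [p if isinstance(p, (list, tuple)) else [p] for p in participants]
    return [tuple(combination) for combination in product(*options)]
```

With Cyclin D and the set Cdk4/6, expand_sets(["Cyclin D", ["Cdk4", "Cdk6"]]) yields two combinations; annotating the set once replaces both. The count multiplies with each additional set among the participants.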


Deprecated - all instances of this class have been moved to other classes (simpleEntity, genomeEncodedEntity) and this class will shortly be removed from the data model.


A well-characterized polypeptide or polynucleotide whose sequence is unknown and which thus cannot be linked to external sequence databases or used for orthology inference. Example: triokinase \[cytosol\] ([29])


A full-length protein, RNA, or DNA or fragments of them. It must be linked to a protein or polynucleotide sequence in an external database entered as the value of referenceSequence. By default, an EntityWithAccessionedSequence corresponds to the entire protein or polynucleotide described in the external database. To annotate a fragment, the numbers of its first and last residues, following the numbering scheme used in the external database, are entered as values of the startCoordinate and endCoordinate slots. A value of 0 indicates an unknown coordinate. Default start and end coordinates for a full-length sequence entity, assigned automatically if no values are provided by the curator, are 1 and -1, respectively. (-1 is a Perl usage which means the last element of an array; here, the last residue). Separate EntityWithAccessionedSequence instances are needed for each subcellular location (compartment) in which a molecule is found, e.g., kallikrein light chain \[extracellular\] and kallikrein light chain \[plasma membrane\]. Example: A full-length protein: name (url) Example: A protein fragment: name (url)
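The coordinate conventions above (1-based numbering, defaults of 1 and -1 for a full-length entity, and 0 for an unknown coordinate) can be sketched as follows. The function name is hypothetical, and returning None when a coordinate is unknown is a choice made for this example.

```python
from typing import Optional


def resolve_fragment(sequence: str, start: int = 1, end: int = -1) -> Optional[str]:
    """Return the residues start..end (1-based, inclusive).

    end == -1 means "last residue"; a 0 in either slot means the
    coordinate is unknown, so no fragment can be extracted.
    """
    if start == 0 or end == 0:
        return None
    last = len(sequence) if end == -1 else end
    if not 1 <= start <= last <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:last]
```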


Entities that we are unable or unwilling to describe in chemical detail and which, therefore, cannot be put in any other class. OtherEntity can be used to represent complex structures in the cell that take part in a reaction but which we can't/don't want to define molecularly.
Example 1: Cell membrane. In a case in which protein X associates with the membrane, but the actual membrane component(s) with which protein X interacts are unknown, the membrane can be represented as an "OtherEntity".
Example 2: Kinesin-1, a microtubule motor protein, is involved in all kinds of movement in the cell, 'walking' along microtubules while dragging things like mitochondria, secretory vesicles, parts of the Golgi, etc. It binds to these complicated structures, which we would not want to describe molecularly and which we can create as "otherEntities".
Example 3: Holliday structure \[nucleoplasm\] ([30])


Molecules that consist of indeterminate numbers of repeated units, and complexes whose stoichiometry is variable or unknown. The repeated unit(s) (identified in the repeatedUnit slot) can be any PhysicalEntity. The presence of more than one repeatedUnit value implies that the relative numbers of units in the polymer are unknown. If the units are present in known proportions, form a complex of the appropriate numbers of units and use it as the repeatedUnit. The size range of a polymer can be specified with minUnitCount and maxUnitCount values. Examples:
- 'glycogen' with 'glucose' as repeatedUnit.
- 'fibrin multimer' with 'fibrin "monomer"' (itself a Complex) as repeatedUnit.
- A microtubule consisting of equal amounts of alpha and beta tubulin would be constructed as a polymer containing a Complex of alpha and beta tubulins in the repeatedUnit slot.
- Completely hypothetical example: a complex consisting of 1 "part" of A and 4 "parts" of B (i.e. a 1:4 ratio) would be represented as a polymer with a complex of one A and four B as its repeatedUnit.
- Another hypothetical example: a complex where the ratio of individual building blocks A and B is unknown or variable is represented as a polymer containing A and B directly in the repeatedUnit slot.
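A minimal sketch of the Polymer slots described above follows. The dataclass layout and method names are assumptions made for illustration; only the slot names repeatedUnit, minUnitCount, and maxUnitCount come from the glossary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Polymer:
    name: str
    repeated_unit: List[str]              # values of the repeatedUnit slot
    min_unit_count: Optional[int] = None  # minUnitCount, if known
    max_unit_count: Optional[int] = None  # maxUnitCount, if known

    def __post_init__(self):
        if not self.repeated_unit:
            raise ValueError("a Polymer needs at least one repeatedUnit")
        if (self.min_unit_count is not None and self.max_unit_count is not None
                and self.min_unit_count > self.max_unit_count):
            raise ValueError("minUnitCount exceeds maxUnitCount")

    def relative_proportions_unknown(self) -> bool:
        """More than one repeatedUnit implies unknown relative numbers."""
        return len(self.repeated_unit) > 1

    def allows(self, unit_count: int) -> bool:
        """True if unit_count lies within the stated size range, if any."""
        if self.min_unit_count is not None and unit_count < self.min_unit_count:
            return False
        if self.max_unit_count is not None and unit_count > self.max_unit_count:
            return False
        return True
```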


A defined chemical species not encoded directly or indirectly in the genome, typically a small molecule such as ATP or ethanol. The detailed structure of a simpleEntity is specified by linking it to the information provided for the molecule in the ChEBI or KEGG external databases via the referenceEntity slot. (Use of KEGG is deprecated. Use ChEBI entities when available.) Separate simpleEntity instances are needed for each subcellular location (compartment) in which a molecule is found, e.g., ATP \[cytosol\] and ATP \[nucleoplasm\].


This attribute is associated with a person instance. It is used to specify the non-human Reactome project with which a Reactome curator is associated. A curator should create a separate "Person" instance for curation that is done for another project (e.g., Gallus) and should use that person instance in any curation done for the Gallus project. If the "project" attribute of a Person instance is not specified, the Person instance is assumed to be associated with the human Reactome project.


A local copy of the PSI-MOD ontology. Instances of this class are used to create descriptions of chemically modified residues of proteins (see xxx). If a needed modification is not already present in gk_central, look up its identifier (a five-digit number) at the PSI-MOD web site; copy-paste it into the "create new instance" form in the curator tool, and allow the wizard associated with the form to retrieve all other needed data from PSI-MOD. Once created, such an instance (like all other reference instances in gk_central) should not be modified. If an instance is needed that is not already in PSI-MOD, or if a PSI-MOD instance appears to be incorrect, contact PSI-MOD to resolve the problem.


This class holds the x-y coordinates of the arrows representing reactions in the "starry-sky" view of the Reactome data set. These coordinates are automatically generated by the visualization tool and should not be manually edited. This class will become obsolete when the new ELV-based web site is fully implemented.


This class describes the source (database) of an identifier in the DatabaseIdentifier (and SequenceDatabaseIdentifier) instance. Generally there shouldn't be a need to create ReferenceDatabase instances. Contact if you think you need a new ReferenceDatabase instance. Slots: AccessURL- template used to form the URL that links to a particular record in an external database. Generally this should not need to be touched. Contact the [] list if you think any changes are necessary. URL- URL for a site which gives "summary information" about the database, including a description of what information the database contains. This slot should not need to be modified manually.
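The way an AccessURL template combines with an identifier can be sketched as a simple substitution. The placeholder token "###ID###", the function name, and the example.org address are all assumptions made for this illustration; the glossary does not specify the template syntax.

```python
def build_record_link(access_url: str, identifier: str,
                      placeholder: str = "###ID###") -> str:
    """Substitute an identifier into an AccessURL-style template.

    The placeholder token is an assumed convention, not documented here.
    """
    if placeholder not in access_url:
        raise ValueError("template does not contain the placeholder")
    return access_url.replace(placeholder, identifier)
```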

ReferenceEntity \[superclass\]

A ReferenceEntity captures the invariant features of a molecule such as its names, molecular structure and links to external databases like UniProt or ChEBI. The ReferenceEntity forms an explicit link between PhysicalEntities like 'Glucose, extracellular' and 'Glucose, cytosolic', indicating their identical chemical nature. ReferenceEntities are not used in Reactions directly; they are attached to the PhysicalEntities involved. ReferenceEntities usually don't need to be created by curators; they are imported automatically. If a needed ChEBI entry does not yet exist, contact Bernard to request it from ChEBI.


Chemical groups from the ChEBI database.


Individual chemical molecules from the ChEBI database.


Obsolete (superseded by ChEBI hierarchy) - delete from glossary?
Classes of chemical molecules. Not implemented yet.

ReferenceSequence \[superclass\]

Molecules with an accessionedSequence. This class is a subclass of referenceEntity, but is also the superclass of ReferenceDNASequence, ReferenceGeneProduct, and ReferenceRNASequence, so instances of it should not be created manually.


DNA molecules with an accessionedSequence.


Protein molecules with an accessionedSequence. If the specific isoform of a protein involved in an annotated event is not known, the ReferenceGeneProduct is used as the referenceEntity for an EntityWithAccessionedSequence.


If experimental data show that a function is due to a specific isoform of a protein, then the referenceIsoform is used as the referenceEntity for an EntityWithAccessionedSequence.


RNA molecules with an accessionedSequence.


Contains group count for chemical characterization. Does not need to be filled in by curators.


In Reactome, reactionlike events may be regulated by PhysicalEntities. The description of an instance of regulation includes the regulated entity (Event) and the regulator (PhysicalEntity). Regulation may be positive (if the regulator facilitates an Event), negative (if the regulator inhibits an Event), or a requirement (if the regulator is required for the Event to occur). Note that this class is mainly to be used when the exact nature of the regulation is not yet known. If a mechanistic connection between the two instances can be established, the preferred way is to give these details by creating the appropriate reactions. No instances of the Regulation class need to be created in this case, as the connection becomes evident via the shared physical entities.
Attributes:
regulator - Event, PhysicalEntity, or CatalystActivity that regulates the regulatedEntity (required).
literatureReference - The reference that describes the Regulation "event".
regulatedEntity - Event or CatalystActivity that is regulated (required).
summation - Text description of the regulatory event (optional).
figure - The URL for the figure describing the regulation event (optional).
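
As an illustration of the attribute list above, here is a hedged Python sketch of a Regulation record that enforces the two required slots. The `RegulationKind` enum, the snake_case field names, and the example values are assumptions made for this sketch; they are not the Reactome schema or curator tool API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RegulationKind(Enum):
    """The three kinds of regulation named in the text (hypothetical encoding)."""
    POSITIVE = "positive"        # regulator facilitates the Event
    NEGATIVE = "negative"        # regulator inhibits the Event
    REQUIREMENT = "requirement"  # regulator is required for the Event


@dataclass
class Regulation:
    regulator: str         # required: Event, PhysicalEntity or CatalystActivity
    regulated_entity: str  # required: Event or CatalystActivity
    kind: RegulationKind
    literature_reference: List[str] = field(default_factory=list)
    summation: Optional[str] = None  # optional text description
    figure: Optional[str] = None     # optional figure URL

    def __post_init__(self) -> None:
        # Reject records that leave either required slot empty.
        if not self.regulator:
            raise ValueError("regulator is required")
        if not self.regulated_entity:
            raise ValueError("regulatedEntity is required")


# Hypothetical instance names, for illustration only.
example = Regulation(
    regulator="hypothetical inhibitor [cytosol]",
    regulated_entity="hypothetical reaction",
    kind=RegulationKind.NEGATIVE,
)
print(example.kind.value)
```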


This describes an Event that is negatively regulated by the Regulator (e.g., allosteric inhibition, competitive inhibition).


This describes an Event involving gene expression that is negatively regulated by the direct binding of the regulator entity to the gene or its mRNA.


This describes an Event that is positively regulated by the Regulator (e.g., allosteric activation).


A regulator that is absolutely required for an Event to happen.


This describes an Event involving gene expression that is positively regulated by the direct binding of the regulator entity to the gene or its mRNA.


A free text description of the nature of Regulation, e.g., allosteric inhibition.


A free text description of an event or PhysicalEntity. Citations that provide useful background information, but that are not sources of primary data for the event or entity, can be linked to summations. (LiteratureReferences providing experimental data for the details of an event should be linked directly to the event.)
Slots:
Text - description of the event.
LiteratureReference - paper(s) that support the text description, but are not the primary evidence for the event.
Example: "The binding of RAD51 to ssDNA may be facilitated by associat ..." ([31])


Taxon describes the organism in which Reactome events occur and PhysicalEntities exist. The list of Taxons in Reactome is a subset of those listed in the NCBI Taxonomy database. Although other groupings (SuperTaxons) are listed under Taxon (i.e., Class, Family, Genus), only the instances of species should be applied. The species slot should be filled out for all Events and all PhysicalEntities that are, or that involve, species-specific PhysicalEntities (i.e., EntityWithAccessionedSequence).


See Taxon. This class contains the instances that can actually be applied to reactions and entities.

Reactome Slots


This slot contains the name of the instance that will be displayed on the Reactome web site. It is automatically filled in by the curator tool each time a new instance is created, using slot values supplied for the instance by the curator. For example, the _displayName of an event consists of the event's manually generated name and its manually selected species.


Holds the numerical portions of identifiers assigned to GO terms in the local copies of the GO ontologies. Should not be manually edited.


Template used to form the URL that links to a particular record in an external database. Generally this should not need to be touched. Contact Imre if you think any changes are necessary.


Contains the GO Molecular Function ontology. An Activity term is used to cross-reference a Reactome CatalystActivity. Curators should never create instances of class Activity directly. Curators who want to use a GO term that is not yet in Reactome should mail the list about it.


This is a slot for the Affiliation class. Enter the address of the research institute or other institution.


An Affiliation is the name and address of an institution, e.g., Cold Spring Harbor Laboratory / 1 Bungtown Road, Cold Spring Harbor NY USA. Affiliations can be created for any Person, but at present are created only for Persons actively involved in the Reactome project as authors or curators.


This slot must be filled in for instances of LiteratureReference (to provide the names of the authors of a particular reference) and for InstanceEdit (to provide the name of the curator who generated the InstanceEdit).


This slot is used for SequenceDatabaseIdentifier class instances and contains any sequence-related comments that are provided by the reference database. It is filled in automatically and should not be edited by curators.



Choose a CP or a GP that has this CR as a component reaction. This slot will be removed, as the relationship is captured by HasComponent for the superpathway.


The concurrentEventSet class is obsolete - remove this slot definition from the glossary?
This slot contains the events that constitute a concurrentEventSet. A concurrent event set consists of two or more Events that occur simultaneously and that share one (or more?) PhysicalEntity, referred to as the FocusEntity, common to all of them.


Refers to the amino acid residue location at which a modification occurs within a protein.

created (on Summation)

Automatically filled in. This shows up on the frontpage if this summation belongs to an event on the Reactome front page.


This slot holds references to the equivalent things in other databases. If the instance this slot is attached to represents an event, the DatabaseIdentifier put into this slot must also point to some sort of event, process, or reaction in the other database.


Holds database identifiers from external databases like UniProt and ChEBI (but not GO – see "Accession" slot). Should not be manually edited.


Obsolete slot - delete from glossary?
Put in the date the topic was accepted, in the format YYYY-MM-DD


Obsolete slot - delete from glossary?
Put in the date the topic was revised, in the format YYYY-MM-DD


Obsolete slot - delete from glossary?
Put in the date the topic was submitted, in the format YYYY-MM-DD

dateTime (on InstanceEdit)

A timestamp generated automatically when the central database is updated from a curator tool project. Do not edit it manually.


These are the unique stable identifiers applied to each instance in the Reactome database. They are generated automatically when newly created instances are first submitted to the database. They should never be modified or manually created by a curator.


This slot is used in Event, PhysicalEntity, Activity, EvidenceType and GO_BiologicalProcess instances. The definition of a Reactome Activity is the official Gene Ontology definition for the equivalent GO Molecular Function term. The GO_BiologicalProcess definition is the official Gene Ontology definition for the corresponding Biological Process term. The EvidenceType definition is taken from the GO evidence code definitions.


This is found in the class 'SequenceDBI'. This is for the 'Description DE' line from SwissProt.


Choose or create one or more Person instances for the editor(s). Used to determine the information for the frontpage.


This entry corresponds to the e-mail address of a "person"


The amino acid residue location at which an EntityWithAccessionedSequence part ends.


Choose or add the URL for the figure that represents this CR, in the format figures/xxxx.jpg.


The first name of a person. Optional slot – can be left blank. Reactome uses only the first letter of the first name (entered separately in the initial slot) to create names of people for display on the web site.


The concurrentEventSet class is obsolete - remove this slot definition from the glossary?
In a ConcurrentEventSet, the focusEntity is the PhysicalEntity that is common to all of the concurrent events.


The chemical formula of a simpleEntity. For example, ATP = C10H16N5O13P3.


This slot stores the gene from which this transcript or protein originates. It is really just a shortcut. If the current instance is a gene identifier, this slot should be left empty. It accepts multiple values in order to be able to point to the SAME gene in different databases, e.g., EMBL, HUGO, Ensembl.


GN lines from sequence record.


holds instances of PhysicalEntity that are components of a Complex.


holds instances of Event that are grouped into a Pathway. They should be entered in the order as they appear in the Pathway. On BlackBoxEvent this slot holds Event instances to describe known steps within the BlackBoxEvent, e.g. the reactions of the fatty acid beta oxidation cycle.


takes individual reactions for an instance of ReactionlikeEvent that represents a group of reactions via the use of an EntitySet. Such individual reactions do not need to be spelled out, unless they have a distinct feature like a literature reference or specific regulation.


This slot holds the actual database identifier number for a given DatabaseIdentifier instance.


points to the event or entity in another species that this event/entity has been inferred from. If the inference is based on computation only, this is indicated under evidenceType (= IEA).


This slot holds the initials of a person.


The "input" PhysicalEntities of a given event are each entered individually.


Only used for GO classes

journal (on LiteratureReference)

The name of the scientific journal in which the reference was published (or the title of a book). For references retrieved from PubMed with the curator and author tools, this slot is filled automatically.


contains keywords associated with sequence. This is pulled in (if available) automatically from the external DB

literatureReference (on event)

The primary paper(s) that provide evidence for the event.

literatureReference (on summation)

Paper(s) that support the text description, but are not the primary evidence for the event. These show up as hyperlinked lines in a web browser.


For a given modifiedResidue instance, enter the simpleEntity that represents the specific modification of the residue within the modified protein. For example, for the "ModifiedResidue" instance: "Orthophosphate on Serine \[nucleus\] 428 of SPTREMBL:O46469", the modification would be "Orthophosphate".


Filled in automatically.


A short textual description of the event, entity, etc. For some classes it is a defining attribute, so it should be chosen carefully to be unique.


Points to equivalent events in other species. In contrast to 'inferredFrom', this attribute is attached to the events in both species; it only indicates equivalence, not inference.


The "output" PhysicalEntities of a given event are each entered individually.


The inclusive page numbers of a literatureReference. For references retrieved from PubMed with the curator and author tools, this slot is filled automatically.


Points to the preceding event(s), usually events whose output is used as input for the present event. The preceding event can also be a pathway, but make sure the connection is always given at the reaction level as well (this is important for visualization).
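
The usual case, in which a preceding event's output serves as input to the present event, can be checked mechanically. The Python sketch below is one hypothetical way to flag precedingEvent links that share no entity; the `Event` structure and the entity names A, B, and C are invented for illustration. Because the definition says "usually", a flagged link is a prompt for review rather than an error.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Event:
    """Simplified, hypothetical stand-in for a Reactome reaction."""
    name: str
    inputs: Set[str] = field(default_factory=set)
    outputs: Set[str] = field(default_factory=set)
    preceding_events: List["Event"] = field(default_factory=list)


def shared_entities(event: Event, preceding: Event) -> Set[str]:
    """Entities produced by the preceding event and consumed by this one."""
    return preceding.outputs & event.inputs


def unlinked_preceding_events(event: Event) -> List[str]:
    """Names of preceding events whose outputs are not inputs of this event."""
    return [p.name for p in event.preceding_events
            if not shared_entities(event, p)]


step1 = Event("step 1", inputs={"A"}, outputs={"B"})
step2 = Event("step 2", inputs={"B"}, outputs={"C"}, preceding_events=[step1])
odd = Event("step 3", inputs={"A"}, outputs={"C"}, preceding_events=[step1])

print(unlinked_preceding_events(step2))  # []
print(unlinked_preceding_events(odd))    # ['step 1']
```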


Obsolete slot - remove this definition from glossary?
This slot stores the protein(s) that this transcript or gene produces. It is really just a shortcut. If the current instance is a protein identifier, this slot should be left empty. It accepts multiple values in order to be able to point to the same proteins in different databases, e.g., EMBL, HUGO, Ensembl, RefSeq.

pubMedIdentifier (on literatureReference)

The PubMed identifier (number only). This information must be manually supplied by the curator.


Deprecated. Do not use this slot. Remove this entry from glossary?


Takes an input component that is essential for this event to happen, e.g., a defined domain of a protein. Use Domain, with the appropriate coordinates filled in.
Note: To be used only for 'non-trivial statements'. For example, repeating an entity here that is identical to a PhysicalEntity given for input is not helpful.

relatedSpecies (on event)

This slot is used in events involving more than one species (e.g., events involving host-pathogen or symbiotic interactions) to denote the "bystander" species in the event. For example, in an event such as flagellin of Escherichia coli binding to human TLR5, the process occurs on the human cell membrane and involves a protein from the bystander bacterial species. Thus, human would be entered in the "species" slot and E. coli would be entered as "relatedSpecies". For additional information and use cases, see this document.


This holds the residue (like serine or tyrosine) that is modified. The value is an instance of class ReferenceGroup or ReferenceMolecule, e.g., an instance representing serine.


Choose a Reaction that is the reverse of the current Reaction.

reviewer (on Summation)

The name of the person who reviewed the event described in the summation. These slot values are the source of the reviewers given for high-level events listed on the Reactome table of contents.


Generated automatically.


This slot holds the name of the species in which the described PhysicalEntity or Event occurs. The "species" slot on an event identifies the environment where the reaction is happening. This should almost always be human for human Reactome curation. The exceptions are when we have annotated pathways in other species for inference (e.g., Mus) or when we are specifically annotating another species' pathways as part of another project (e.g., Gallus and Mycobacterium tuberculosis). In disease or immune system processes, the host would be listed as the "species".
When annotating a process involving host-pathogen interactions, the relatedSpecies slot should be used to list the pathogen (bystander) species. See the relatedSpecies attribute definition and use cases for a more detailed description. Note: When creating a chimeric reaction to be used for inference (i.e., one that involves entities from multiple species and is usually observed in vitro), ALL species should be listed in the species slot.
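
The rules above for filling the species and relatedSpecies slots can be summarized in a short Python sketch. The function name, its parameters, and the dictionary layout are assumptions made for illustration; they do not describe the curator tool.

```python
from typing import Dict, Iterable, List


def species_slots(environment: str,
                  others: Iterable[str] = (),
                  chimeric: bool = False) -> Dict[str, List[str]]:
    """Return hypothetical slot values for an event.

    environment: species in which the reaction takes place (the host).
    others:      additional species contributing entities to the event.
    chimeric:    True for a chimeric reaction created for inference.
    """
    others = list(others)
    if chimeric:
        # Chimeric reactions list ALL species in the species slot.
        return {"species": [environment, *others], "relatedSpecies": []}
    # Otherwise the bystander species go into relatedSpecies.
    return {"species": [environment], "relatedSpecies": others}


# Flagellin of Escherichia coli binding to human TLR5:
binding = species_slots("Homo sapiens", ["Escherichia coli"])
print(binding)
# {'species': ['Homo sapiens'], 'relatedSpecies': ['Escherichia coli']}
```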


The amino acid residue location at which an EntityWithAccessionedSequence starts.


Text that succinctly describes the Event.


This refers to the "parent" of a given taxon within the taxonomy hierarchy. For example, the SuperTaxon for both Rattus and Mus is Murinae, and the SuperTaxon for Murinae is Muridae. These entries are assigned automatically as dictated by the imported NCBI_Taxonomy hierarchy and should not be altered manually.
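
Because each taxon points to a single parent, the chain of SuperTaxons can be followed upward until a taxon with no recorded parent is reached. The Python sketch below uses a hand-written dictionary containing only the rodent example (subfamily Murinae, family Muridae); in Reactome these links come from the imported NCBI taxonomy, and the function name is hypothetical.

```python
from typing import Dict, List

# Child -> parent links for the rodent example only (illustrative subset).
SUPER_TAXON: Dict[str, str] = {
    "Rattus": "Murinae",
    "Mus": "Murinae",
    "Murinae": "Muridae",
}


def lineage(taxon: str, super_taxon: Dict[str, str] = SUPER_TAXON) -> List[str]:
    """Follow superTaxon links upward from a taxon to the topmost known parent."""
    chain = [taxon]
    seen = {taxon}
    while chain[-1] in super_taxon:
        parent = super_taxon[chain[-1]]
        if parent in seen:
            # Guard against a malformed hierarchy that loops back on itself.
            raise ValueError(f"cycle detected at {parent}")
        chain.append(parent)
        seen.add(parent)
    return chain


print(lineage("Mus"))     # ['Mus', 'Murinae', 'Muridae']
print(lineage("Rattus"))  # ['Rattus', 'Murinae', 'Muridae']
```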


The last name of a person.


Holds the text of the summation.


Allows a reference to be made to a general process that underlies a BlackBoxEvent, e.g., 'Gene Expression' as a template for the synthesis of a specific protein.


The title of a literatureReference. For references retrieved from PubMed with the curator and author tools, this slot is filled automatically.


This slot stores the transcript from which this protein originates or that this gene produces. It is really just a shortcut. If the current instance is a transcript identifier, this slot should be left empty. It accepts multiple values in order to be able to point to the same transcript in different databases, e.g., EMBL, Ensembl, RefSeq.


Holds the URL of a site that gives "summary information" about the database, including a description of what information the database contains. This slot should not need to be modified manually. Contact Imre if any changes are necessary.


The volume number of the journal in a LiteratureReference. For references retrieved from PubMed with the curator and author tools, this slot is filled automatically.


The year a LiteratureReference was published. For references retrieved from PubMed with the curator and author tools, this slot is filled automatically.