Systematic Peptide Names

From ReactomeWiki

Introduction

The vast majority of the molecules participating in Reactome pathways are proteins. Surprisingly, there is no universal authoritative source of names for proteins and no agreed vocabulary that encompasses cleaved peptide fragments or post-translationally modified forms. Reactome frequently represents a protein in many forms: as the initial translated form, as fragments following processing or cleavage, or as a peptide that has a post-translational modification. To improve naming consistency and avoid ambiguity, we have developed a systematic nomenclature for naming peptides. We also have a simple set of rules for naming mRNA molecules, genes and small molecules.

Process

The majority of peptide names have been generated by a scripted process. New peptide instances are named manually and verified at the time they are first made visible as part of a Reactome quarterly update. Some peptides are exempt from the naming process, either to prevent name duplications or because the peptide represents a modification or state that is not currently included in the naming process. See the Exemptions section below for more details.

Explanation Of Systematic Names

Gene symbol core

Reactome peptide names use HGNC gene symbols as the 'core' of the name. We obtain these indirectly from UniProt via the Reactome referenceEntity.

Peptide coordinates suffix

Reactome peptides refer to UniProt. Unless otherwise indicated, the peptide sequence we represent is that given by UniProt's 'Chain' feature, part of an annotation group called Molecular Features. This represents the 'default' peptide. When the peptide represented in Reactome is identical to the peptide represented by the UniProt Chain, the name used in Reactome is the gene symbol alone. If the UniProt record has no Chain feature or more than one Chain feature, or if the start and end coordinates of the Reactome peptide do not agree with the UniProt Chain, the start and end coordinates of the peptide are added in brackets as a suffix to the gene symbol. Unknown coordinates are represented as '?' symbols.

For example, the caspase-9 precursor, with peptide coordinates start:1 end:416, is named CASP9. The large and small subunits of caspase-9 are named CASP9(1-315) and CASP9(316-416) respectively.

An N-terminal fragment of Aggrecan, where the exact cleavage position is unknown, would be named ACAN(17-?).

Note that Reactome peptide coordinates always refer to the UniProt peptide, even when the literature convention is to number a cleaved fragment following the removal of a signal peptide or initiating methionine. This combination of gene symbol and coordinates is usually sufficient to generate a unique name, but it can fail if a peptide is cleaved at multiple unknown locations. When this is the case, peptides are named manually, following the systematic naming as closely as possible.
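The coordinate suffix rule can be sketched in a few lines of Python. This is an illustrative sketch only, not the script Reactome uses: the function names and the representation of a UniProt record as a list of (start, end) Chain tuples are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# (start, end); None stands for an unknown coordinate.
Coords = Tuple[Optional[int], Optional[int]]


def coordinate_suffix(peptide: Coords, uniprot_chains: List[Tuple[int, int]]) -> str:
    """Return '' for the default peptide, otherwise '(start-end)'."""
    # The suffix is dropped only when UniProt has exactly one Chain feature
    # and the Reactome peptide matches it exactly.
    if len(uniprot_chains) == 1 and peptide == uniprot_chains[0]:
        return ""
    start, end = peptide
    show = lambda pos: "?" if pos is None else str(pos)
    return f"({show(start)}-{show(end)})"


def peptide_name(gene_symbol: str, peptide: Coords,
                 uniprot_chains: List[Tuple[int, int]]) -> str:
    """Gene symbol core plus the optional coordinate suffix."""
    return gene_symbol + coordinate_suffix(peptide, uniprot_chains)


casp9_chains = [(1, 416)]  # Chain coordinates as given in the example above
print(peptide_name("CASP9", (1, 416), casp9_chains))    # CASP9
print(peptide_name("CASP9", (1, 315), casp9_chains))    # CASP9(1-315)
print(peptide_name("CASP9", (316, 416), casp9_chains))  # CASP9(316-416)
```

Passing `(17, None)` as the peptide coordinates would give a suffix of `(17-?)`, matching the Aggrecan example.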

Post-translational modification

Post-translational modifications (PTMs) are shown as a prefix to the gene symbol. For the full list of post-translational modification prefixes, see the PTM Lookup Table below or http://wiki.reactome.org/index.php/Systematic_Peptide_Names

Reactome annotation identifies the coordinate positions of PTMs when these are known but, for brevity, most PTM prefixes do not include the modified peptide coordinate. The exceptions are di- and tri-lysine methylation, lysine acetylation, ubiquitination and phosphorylation. For these PTM types the coordinates are necessary to avoid name duplications.

PTM prefixes for phosphorylation include, when known, the coordinate and a residue letter to indicate the residue that is phosphorylated. Phosphorylations are ordered by peptide coordinate.

If there are more than 4 occurrences of any PTM type (or, for phosphorylation, of any subtype), the coordinates are not included; instead the prefix code is preceded by the number of occurrences and 'x'. For phosphorylation the count directly precedes the residue letter, as in the p-7Y-KIT example below.

The Reactome database represents PTMs as modifiedResidue annotations. These use PSI-MOD terms as their primary external reference. PSI-MOD terms can be searched here. PSI-MOD terms are cross-referenced to the RESID database. The PTM prefixes used in Reactome are listed in the lookup table (see below). Some infrequently used PTM types are not represented there.


Examples of phosphorylation prefixes:

  • p-Y139-DAPP1 is DAPP1 phosphorylated on tyrosine-139
  • p-Y150,S343,T346-WASF2 is WASF2 phosphorylated on tyrosine-150, serine-343 and threonine-346. Note that the phosphorylations are ordered by coordinate.
  • p-Y55,S112,S121,Y227-SPRY2 is SPRY2 phosphorylated on four residues. Note that the ordering is by coordinate; phosphorylations are not grouped by subtype.
  • p-Y-GAB2 is GAB2 phosphorylated on a tyrosine, but the coordinate position of this tyrosine is unknown.
  • p-GLI3 is GLI3 phosphorylated but both the subtype and position are unknown.
  • p-7Y-KIT is KIT phosphorylated on seven tyrosines. The coordinates are omitted from the name as there are more than 4 tyrosine phosphorylations.
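The examples above can be reproduced with a short Python sketch. It is not the Reactome naming script: the function name, the `(letter, coordinate)` tuple representation and the constant `MAX_LISTED` are assumptions for illustration. Names that mix a collapsed subtype with individually listed sites are not covered by the examples, so the ordering chosen for that case (counts first) is a guess.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (residue letter, coordinate); None stands for "unknown".
Site = Tuple[Optional[str], Optional[int]]

MAX_LISTED = 4  # more sites than this of one subtype are collapsed to a count


def phospho_prefix(sites: List[Site]) -> str:
    """Build the phosphorylation prefix, e.g. 'p-Y150,S343,T346'."""
    counts = Counter(letter for letter, _ in sites if letter)
    collapsed = {letter for letter, n in counts.items() if n > MAX_LISTED}

    # Collapsed subtypes become e.g. '7Y' (assumed here to come first).
    tokens = [f"{counts[letter]}{letter}" for letter in sorted(collapsed)]

    # Remaining sites with a known subtype are ordered by coordinate;
    # a site with an unknown coordinate contributes only its letter.
    listed = [(l, c) for l, c in sites if l and l not in collapsed]
    listed.sort(key=lambda s: (s[1] is None, s[1] or 0))
    tokens += [l if c is None else f"{l}{c}" for l, c in listed]

    if not tokens:  # neither subtype nor position is known
        return "p"
    return "p-" + ",".join(tokens)


def phospho_name(gene_symbol: str, sites: List[Site]) -> str:
    return f"{phospho_prefix(sites)}-{gene_symbol}"


print(phospho_name("WASF2", [("S", 343), ("Y", 150), ("T", 346)]))
# p-Y150,S343,T346-WASF2
print(phospho_name("KIT", [("Y", None)] * 7))
# p-7Y-KIT
```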

Ubiquitination commences with the attachment of ubiquitin to a lysine residue, often followed by the addition of multiple ubiquitin peptides, which can be cross-linked at several positions in the ubiquitin protein.

K63polyUb-13,57-p-Y200-XYZ1 is XYZ1 with K63 cross-linked polyubiquitin attached to residues 13 and 57 and a phosphorylation on Y200.

When phosphorylation and other PTMs occur in combination, the phosphorylations are detailed last in the prefix:

2xPalmC-MyrG-p-S1177-NOS3(2-1203) is NOS3 peptide fragment 2-1203 with two palmitoylated cysteines, one myristoylated glycine and a phosphorylation on serine-1177.
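How the parts of a full name fit together can be sketched as follows. This is an assumption-laden illustration rather than Reactome's own code: `counted` and `assemble_name` are hypothetical helpers, the '2x' form is inferred from the NOS3 example, and the order of the non-phosphorylation prefixes is simply the order the caller supplies, since the text only states that phosphorylations come last.

```python
from typing import List, Optional


def counted(code: str, occurrences: int) -> str:
    """Prefix a PTM code with its count, e.g. ('PalmC', 2) -> '2xPalmC'."""
    # A single occurrence keeps the bare code, as with MyrG in the NOS3 example.
    return code if occurrences == 1 else f"{occurrences}x{code}"


def assemble_name(other_ptms: List[str], phospho: Optional[str], core: str) -> str:
    """Join PTM prefixes and the core name with '-', phosphorylation last."""
    parts = list(other_ptms)
    if phospho:
        parts.append(phospho)
    parts.append(core)  # gene symbol plus any coordinate suffix
    return "-".join(parts)


print(assemble_name([counted("PalmC", 2), counted("MyrG", 1)],
                    "p-S1177", "NOS3(2-1203)"))
# 2xPalmC-MyrG-p-S1177-NOS3(2-1203)
```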

Exemptions

A small number of Reactome peptides do not currently follow the systematic naming described above.

Note that referenceEntity is a Reactome term describing a key external reference, from which our internal molecular records are derived. For most proteins this is UniProt.

Exemptions are made when:

  1. The peptide has a universally understood common name. In these cases the systematic name will be retained as an alias name.
  2. The peptide has the word 'mutant' in its name, indicating that the peptide has a disease-associated mutation.
  3. The peptide has an annotation in the Disease field, again indicating that it is an abnormal peptide associated with a disease process.
  4. The referenceEntity is a referenceIsoform with variantIdentifier > 1. This avoids applying coordinates for the canonical peptide to an isoform.
  5. The peptide has a modification that is not a simple modifiedResidue instance. This applies to peptides with unusual modifiedResidue types such as GroupModifiedResidues and Internal peptide crosslinks.
  6. The peptide name contains the word 'active', which is used in Reactome to indicate a peptide that has an active conformation, but has a peptide chain that is identical to an inactive precursor.
  7. The peptide is cleaved at more than one unknown position.
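The exemption rules above could be checked along these lines. The `Peptide` record and all of its field names are hypothetical stand-ins for the corresponding Reactome annotations, and treating 'mutant' and 'active' as case-insensitive whole words is an assumption; rule 1 is a curator's judgement, so it is represented as a plain flag.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Peptide:
    # Hypothetical fields, not Reactome's actual data model.
    name: str
    has_common_name: bool = False                 # rule 1
    disease: Optional[str] = None                 # rule 3
    variant_identifier: Optional[int] = None      # rule 4, isoforms only
    modification_classes: List[str] = field(default_factory=list)  # rule 5
    unknown_cleavage_positions: int = 0           # rule 7


def has_word(name: str, word: str) -> bool:
    """True if `word` occurs as a whole word, ignoring case."""
    return re.search(rf"\b{re.escape(word)}\b", name, re.IGNORECASE) is not None


def exemption_rules(p: Peptide) -> List[int]:
    """Return the numbers of the exemption rules that apply (empty if none)."""
    checks = {
        1: p.has_common_name,
        2: has_word(p.name, "mutant"),
        3: p.disease is not None,
        4: p.variant_identifier is not None and p.variant_identifier > 1,
        5: any(cls != "modifiedResidue" for cls in p.modification_classes),
        6: has_word(p.name, "active"),
        7: p.unknown_cleavage_positions > 1,
    }
    return [rule for rule, applies in checks.items() if applies]


print(exemption_rules(Peptide("CASP9")))                            # []
print(exemption_rules(Peptide("XYZ1 mutant", disease="example")))   # [2, 3]
```

A peptide is named systematically only when the returned list is empty.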

A spreadsheet listing all current exemptions is available here

Gene and mRNA names

Genes are named using the gene symbol followed by the word 'gene' in lowercase. Messenger RNA molecules are named using the gene symbol followed by the abbreviation 'mRNA'.
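As a minimal sketch of these two rules, assuming a single space separates the gene symbol from the suffix (the text does not state the separator) and using hypothetical function names:

```python
def gene_name(gene_symbol: str) -> str:
    """Gene symbol followed by the lowercase word 'gene'."""
    return f"{gene_symbol} gene"


def mrna_name(gene_symbol: str) -> str:
    """Gene symbol followed by the abbreviation 'mRNA'."""
    return f"{gene_symbol} mRNA"


print(gene_name("CASP9"))  # CASP9 gene
print(mrna_name("CASP9"))  # CASP9 mRNA
```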


PTM Lookup Table

MOD ID     Prefix     Letter  MOD preferred name
MOD:00036  3D                 (2S-3R)-3-hydroxyaspartic acid
MOD:00037  5Hyl               5-hydroxy-L-lysine
MOD:00038  3Hyp               3-hydroxy-L-proline
MOD:00039  4Hyp               4-hydroxy-L-proline
MOD:00041  CbxE               L-gamma-carboxyglutamic acid
MOD:00046  p-         S       O-phospho-L-serine
MOD:00047  p-         T       O-phospho-L-threonine
MOD:00048  p-         Y       O4'-phospho-L-tyrosine
MOD:00064  AcK                N6-acetyl-L-lysine
MOD:00065  AcC                S-acetyl-L-cysteine
MOD:00068  MyrG               N-myristoylglycine
MOD:00083  Me3K               N6,N6,N6-trimethyl-L-lysine
MOD:00084  Me2K               N6,N6-dimethyl-L-lysine
MOD:00085  MeK                N6-methyl-L-lysine
MOD:00087  Myri               N6-myristoyl-L-lysine
MOD:00091  ArgN               L-arginine amide
MOD:00111  FarC               S-farnesyl-L-cysteine
MOD:00113  GGC                S-geranylgeranyl-L-cysteine
MOD:00115  PalmC              S-palmitoyl-L-cysteine
MOD:00125  Hypu               Hypusine
MOD:00126  Btn                N6-biotinyl-L-lysine
MOD:00127  Lipo               N6-lipoyl-L-lysine
MOD:00128  PXLP               N6-pyridoxal phosphate-L-lysine
MOD:00130  Alys               L-allysine
MOD:00134  GlyK               N6-glycyl-L-lysine
MOD:00159  PpantS             O-phosphopantetheine-L-serine
MOD:00160  N4GlycN            N4-glycosyl-L-asparagine
MOD:00162  GlcGalHyl          O5-glucosylgalactosyl-L-hydroxylysine
MOD:00163  GalNAc             O-(N-acetylamino)galactosyl-L-serine
MOD:00164  GalNAc             O-(N-acetylamino)galactosyl-L-threonine
MOD:00166  GlcY               O4'-glucosyl-L-tyrosine
MOD:00167  GPIN               N-asparaginyl-glycosylphosphatidylinositolethanolamine
MOD:00168  GPID               N-aspartyl-glycosylphosphatidylinositolethanolamine
MOD:00170  GPIG               N-glycyl-glycosylphosphatidylinositolethanolamine
MOD:00171  GPIS               N-seryl-glycosylphosphatidylinositolethanolamine
MOD:00239  MetC               S-methyl-L-cysteine
MOD:00274  CysS               L-cysteine persulfide
MOD:00300  ADPRib             L-glutamyl-5-poly(ADP-ribose)
MOD:00314  CHOL               glycine cholesterol ester
MOD:00342  MeL                N-methyl-L-leucine
MOD:00369  AcS                O-acetyl-L-serine
MOD:00390  DecS               O-decanoyl-L-serine
MOD:00437  Far                farnesylated residue
MOD:00438  MYS                myristoylated residue
MOD:00465  dHF                dihydroxyphenylalanine (Phe)
MOD:00599  Me                 monomethylated residue
MOD:00685  dNQ                deamidated L-glutamine
MOD:00696  p-                 phosphorylated residue
MOD:00752  RibC               adenosine diphosphoribosyl (ADP-ribosyl) modified residue
MOD:00798  HC                 half cystine
MOD:00803  CysY               3-(S-L-cysteinyl)-L-tyrosine
MOD:00804  GlcS               O-glucosyl-L-serine
MOD:00812  FucS               O-fucosyl-L-serine
MOD:00813  FucT               O-fucosyl-L-threonine
MOD:00814  XylS               O-xylosyl-L-serine
MOD:00835  OxA                L-3-oxoalanine (Ser)
MOD:00971  OxoH               2-oxo-histidine
MOD:01024  HP                 monohydroxylated proline
MOD:01148  Ub                 ubiquitinylated lysine
MOD:01152  CO                 carboxylated residue
MOD:01228  IY                 monoiodinated tyrosine
MOD:01381  PalmS              O-palmitoleyl-L-serine
MOD:01625  SOG                1-thioglycine
MOD:01688  HN                 3-hydroxy-L-asparagine
MOD:01699  H+                 protonated residue
MOD:01777  CysO               S-(glycyl)-L-cysteine (Cys-Gly)
MOD:01880  Dhp                L-deoxyhypusine
MOD:01914  GalHyl             O5-galactosyl-L-hydroxylysine
MOD:00076  Me2sR              symmetric dimethyl-L-arginine
MOD:00077  Me2aR              asymmetric dimethyl-L-arginine
MOD:00078  MeR                omega-N-methyl-L-arginine
MOD:00219  Cit                L-citrulline
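
A script that needs this table would typically hold it as a mapping keyed by PSI-MOD identifier. The sketch below copies a handful of rows from the table; the dictionary name, the function name and the `(prefix, letter)` tuple layout are assumptions for illustration. Because some infrequently used PTM types are not in the table, the lookup returns `None` rather than failing for an unlisted term.

```python
from typing import Dict, Optional, Tuple

# MOD ID -> (prefix, residue letter); the letter is empty except for the
# phosphorylation subtypes. Only a few rows of the table are copied here.
PTM_PREFIXES: Dict[str, Tuple[str, str]] = {
    "MOD:00046": ("p-", "S"),
    "MOD:00047": ("p-", "T"),
    "MOD:00048": ("p-", "Y"),
    "MOD:00064": ("AcK", ""),
    "MOD:00068": ("MyrG", ""),
    "MOD:00115": ("PalmC", ""),
    "MOD:00696": ("p-", ""),
    "MOD:01148": ("Ub", ""),
}


def ptm_prefix(mod_id: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, letter) for a PSI-MOD ID, or None if it is not listed."""
    return PTM_PREFIXES.get(mod_id)


print(ptm_prefix("MOD:00115"))  # ('PalmC', '')
print(ptm_prefix("MOD:00048"))  # ('p-', 'Y')
```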