SBML At Reactome

Introduction

SBML stands for "Systems Biology Markup Language"; it is an XML format used for exchanging biological models. You can find out more about its development and uses at www.sbml.org. Reactome is not per se a database of models, but it captures in some detail the events which occur in a cell at a molecular level. Hence, it contains information that is likely to be useful to the modeling community. To make it easy for this community to extract information from Reactome, we have implemented tools for exporting Reactome content as SBML.

In Reactome, we think in terms of pathways and reactions. Pathways may contain sub-pathways, which may in turn contain sub-sub-pathways, and so on, in a hierarchical manner. The pathways at the roots of these hierarchical trees are our "canonical" pathways. These encompass areas of biology that scientific consensus has agreed upon. They also have curator-drawn pathway diagrams associated with them. Reactome has a well-developed concept of multi-component entities, which includes hierarchically constructed complexes, and sets.

An SBML file defines a single model, which is made up of reactions. Reactions can have kinetics, can be located in specific subcellular compartments, and can have reactants, products and catalysts ("species"). Composite species are allowed in SBML, though they have no internal structure. SBML has a plugin, called the "Layout Extension", which allows 2D layout information for reactions to be recorded.

Reactome does not know anything about kinetics, but it does know about compartments, reactants, products and catalysts. SBML does not have the concept of hierarchical pathways, nor does it know about hierarchical complexes or sets. So, the SBML that we generate from Reactome:

  • Contains a single model. If the model was generated from a Reactome pathway, it will be given the same name as the pathway.
  • Has no kinetics associated with any of the reactions - these will need to be added by hand.
  • Does not contain subpathway information.
  • Does not preserve the detailed structure that Reactome's composite entities provide.

How to Get SBML Out of Reactome

There are a couple of ways to have Reactome generate SBML for you; this section explains the options in some detail.

Pathway Based SBML Export

If you are interested in a specific area of biology, then there may already be a Reactome pathway that covers it. To start off with, go to the Reactome front page at www.reactome.org, and enter a few terms that you think are specific to your biology into the search box. If Reactome has something relevant, you may get many results back, possibly a mixture of proteins, small molecules, reactions and pathways. Generally, the pathways will appear right at the beginning of the list of results, but you can also use the filtering feature built into the search to limit the results to just pathways.

If you click on one of the pathways, you will be taken to a page which shows a diagram for that pathway. At the foot of the page will be a so-called details pane, showing a tabular summary of the pathway. If you do not see this, look for the orange arrow at the bottom of the page containing a "+" symbol. Clicking on this will display the details pane.

Scroll to the bottom of the pane. You will see a row labelled with "Download pathway in one of the formats". Click on the [SBML] link. This will generate SBML relevant to the selected pathway.

Identifier Based SBML Export

If you have a list of identifiers for the species in the model that you wish to build, then you could use these to query Reactome. Reactome is built mainly on human data, but uses orthology projection to construct pathways for about 20 other organisms, mainly popular reference organisms, such as mouse or yeast.

From the Reactome front page, click on the button labeled "Analyse Expression Data". This analysis can also deal with simple identifier lists, so don't worry about the "Expression" part of the name. You will be taken to a page where you can upload your identifier list. You can either paste the identifiers into the text area provided, or upload them as a file. Reactome can automatically recognise a wide range of identifier types, such as UniProt or Entrez Gene. Your best bet is to try it out with the identifiers that you have, and see what you get.

Click on the "Analyze" button and wait. It may take 30 seconds or more before the results are returned in tabular form. What you will see is a list of Reactome's canonical pathways, arranged alphabetically by name. The column that you will be most interested in is labeled "% in data". This tells you what percentage of the species in this pathway match the species in your identifier set. If you click on the title of the "% in data" column, the table will reorder itself, putting the pathways with the highest percentages at the top.

If you are lucky, one pathway will contain a much higher percentage of overlap than any of the others. This is the pathway to use as a starting point for creating your model. Click on the "View" button, and you will be taken to the pathway diagram page for that pathway. From this point on, you can follow the instructions in the section Pathway Based SBML Export to export SBML.

If more than one pathway has a high percentage, then you will need to repeat the above process, and combine the models by hand. This will require some caution. Reactome SBML generation will ensure that reactions, species and compartments have unique IDs, based on Reactome's internal IDs for those things, but the meta-IDs may overlap in unpredictable ways, and you will probably need to generate a new set of unique meta-IDs.
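One way to regenerate the meta-IDs before combining two exported files is sketched below in Python. It is only an illustration under stated assumptions: the helper name `reassign_metaids` and the prefix scheme are hypothetical, and the sketch assumes that annotations refer back to their element through the usual `rdf:about="#<metaid>"` convention.

```python
import xml.etree.ElementTree as ET

# Fully qualified name of the rdf:about attribute used in SBML annotations.
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

def reassign_metaids(root, prefix):
    """Give every element that carries a metaid a fresh, prefixed value.

    Returns a dict mapping each old metaid to its replacement.
    """
    mapping = {}
    for elem in root.iter():
        old = elem.get("metaid")
        if old is not None:
            new = f"{prefix}{len(mapping) + 1}"
            mapping[old] = new
            elem.set("metaid", new)
    # Keep annotation back-references (rdf:about="#old") in step with the renaming.
    for elem in root.iter():
        about = elem.get(RDF_ABOUT)
        if about and about.startswith("#") and about[1:] in mapping:
            elem.set(RDF_ABOUT, "#" + mapping[about[1:]])
    return mapping
```

Calling the function once per exported file, each time with a different prefix, should leave the two files with disjoint meta-IDs before their contents are combined by hand.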

Using the SBML Servlet

Reactome provides a servlet that generates SBML based on user-supplied parameters, e.g. pathway DB_ID. The SBML is returned to the sender of the request. Both GET and POST requests are accepted, and are dealt with in a similar way, though you get a small amount of extra functionality if you use POST requests (see below for more details). You can also access this servlet from within your tools and web pages, if you wish.

E.g. if you want to use this servlet for doing GET requests, you might have a URL like this:

http://www.reactome.org/ReactomeGWT/entrypoint/sbmlRetrieval?LAYOUT=SBGN&ID=535734&merge=1
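A minimal Python sketch of assembling such a GET URL is given here. The servlet address and the ID and LAYOUT parameters are copied from the example URL above; the helper name `build_sbml_url` is hypothetical, and the sketch only builds the URL without contacting the server.

```python
from urllib.parse import urlencode

# Servlet address as it appears in the example URL above.
SERVLET_URL = "http://www.reactome.org/ReactomeGWT/entrypoint/sbmlRetrieval"

def build_sbml_url(pathway_ids, **options):
    """Return a GET URL requesting SBML for the given pathway DB_IDs."""
    # Several DB_IDs are sent as one comma-separated value.
    params = {"ID": ",".join(str(i) for i in pathway_ids)}
    params.update(options)
    # urlencode percent-escapes anything that is not URL-safe.
    return SERVLET_URL + "?" + urlencode(params)

url = build_sbml_url([535734], LAYOUT="SBGN")
print(url)
```

The resulting string can be pasted into a browser or handed to any HTTP client.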

You can also use the servlet's URL as the action of an HTML form. In this case, you might wish to use a POST request. This makes it easier to send large numbers of identifiers, because the body of a POST request is not subject to the length limits that browsers and servers place on URLs.
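The same kind of request can be prepared from a script. The Python sketch below builds a form-encoded POST request with the standard library but does not send it; the helper name `build_sbml_post` is hypothetical, and the parameter names follow the list given in this section.

```python
from urllib.parse import urlencode
from urllib.request import Request

SERVLET_URL = "http://www.reactome.org/ReactomeGWT/entrypoint/sbmlRetrieval"

def build_sbml_post(pathway_ids, **options):
    """Prepare (but do not send) a POST request carrying pathway DB_IDs."""
    # In POST requests the ID list may be newline-separated as well as comma-separated.
    fields = {"id": "\n".join(str(i) for i in pathway_ids)}
    fields.update(options)
    body = urlencode(fields).encode("ascii")
    return Request(
        SERVLET_URL,
        data=body,  # supplying a body makes urllib issue a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

request = build_sbml_post([535734], layout="SBGN")
# Passing `request` to urllib.request.urlopen would send it and return the response.
```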

The following parameters are understood by this servlet (case is not important):

  • host The name of the host providing the MySQL server. You would not normally need to provide this.
  • db The name of the Reactome database that is being used as a source of information for generating SBML. You will normally want to provide this.
  • user Username for the MySQL server. You would not normally need to provide this.
  • pass Password for the MySQL server. You would not normally need to provide this.
  • port Port for the MySQL server. You would not normally need to provide this.
  • rid A list of Reactome reaction DB_IDs that will be used to create an SBML model. These may be comma-separated in both GET and POST requests; in POST requests, newlines may also be used as separators.
  • id A list of Reactome pathway DB_IDs that will be used to create an SBML model. These may be comma-separated in both GET and POST requests; in POST requests, newlines may also be used as separators.
  • level Specify the SBML level to be generated. Optional.
  • version Specify the SBML version to be generated. Optional.
  • engine Specify the SBML engine to be used. libSBML and JSBML are currently available; defaults to JSBML.
  • concat Merge chains of reactions into single reactions. Optional.
  • layout Include pathway layout information in the SBML. By default, no layout will be added. The options are "Extension" and "SBGN". Note that "Extension" is only available if you use the libSBML engine.
  • filter Define filters to constrain what gets put into the SBML. By default, no filters are applied.
  • squeeze Auto-generate plausible kinetics for reactions using SBMLsqueezer. By default, no kinetics are generated.
  • squeezesvlt Specify a servlet URL for SBMLsqueezer. By default, use the one running on www.reactome.org.
  • LIST_[_A-Z]+_NAMES Request information useful for building filters - only available with POST requests.

Filters allow you to filter reactions and pathways prior to SBML generation. E.g. you may only want pathways from a given species, such as Homo sapiens. Or you may wish to constrain your reactions to a given subcellular compartment, such as the cytosol. Filters are defined as lists of comma-separated parameters. These are repeating sequences of the form:

<filter type>, <Instance class>, <Attribute>, <Term1>, <Term2>, ....

E.g. "inc", "Pathway", "species", "Homo sapiens", "exc", "ReactionlikeEvent", "name", "Decarboxylation", "Electron transport", ....

Filter type can be either "inc" or "exc", standing for inclusion or exclusion, respectively. Instance class should be a Reactome instance class, and Attribute should be an attribute appropriate for that instance class. Any number of Terms can be specified; these are the values that are filtered against.

Note: "inc" and "exc" are reserved words in the filter parameter list, so you should avoid using these words in Instance classes, Attributes or Terms.
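The Python sketch below illustrates the filter format described above and why the two reserved words matter: in a flat list, "inc" and "exc" are the only markers of where one filter ends and the next begins. The function names are hypothetical, and the grouping function is an illustration of the format rather than the servlet's own parser. Because the exact quoting expected by the servlet is not specified here, the sketch stops at a list of tokens instead of producing the final parameter string.

```python
RESERVED = {"inc", "exc"}

def flatten_filters(filters):
    """Turn (type, instance class, attribute, terms) tuples into one flat token list."""
    tokens = []
    for filter_type, instance_class, attribute, terms in filters:
        if filter_type not in RESERVED:
            raise ValueError(f"filter type must be 'inc' or 'exc', got {filter_type!r}")
        values = [instance_class, attribute, *terms]
        clash = RESERVED.intersection(values)
        if clash:
            # A reserved word among the values would be misread as a new filter.
            raise ValueError(f"reserved word used as a value: {sorted(clash)}")
        tokens.extend([filter_type, *values])
    return tokens

def group_filters(tokens):
    """Split a flat token list back into filters; 'inc' or 'exc' starts a new one."""
    groups = []
    for token in tokens:
        if token in RESERVED:
            groups.append([token])
        elif groups:
            groups[-1].append(token)
        else:
            raise ValueError("token list must start with 'inc' or 'exc'")
    return groups

tokens = flatten_filters([
    ("inc", "Pathway", "species", ["Homo sapiens"]),
    ("exc", "ReactionlikeEvent", "name", ["Decarboxylation", "Electron transport"]),
])
```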

In order to help you to build forms incorporating filters, the following parameter can be sent to the servlet:

LIST_<Instance class>_<attribute>_NAMES

The Instance class should be one that is valid for Reactome, e.g. "Pathway", and the attribute should be appropriate for the instance class, e.g. "name". The servlet will return a list of all values known for this attribute, separated by newlines. You can use these values to populate e.g. a dropdown in your form, allowing the user to choose one.
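As a rough Python sketch of this workflow, the helpers below build the parameter name, split a newline-separated reply into values, and render those values as HTML option elements for a dropdown. All three function names are hypothetical, and the reply format is assumed to be exactly as described above (one value per line).

```python
from html import escape

def list_parameter_name(instance_class, attribute):
    """Build the LIST_<Instance class>_<attribute>_NAMES parameter name."""
    return f"LIST_{instance_class}_{attribute}_NAMES"

def parse_name_list(response_text):
    """Split a newline-separated reply into values, dropping blank lines."""
    return [line.strip() for line in response_text.splitlines() if line.strip()]

def to_option_tags(names):
    """Render values as <option> elements, escaping HTML special characters."""
    return "\n".join(
        f'<option value="{escape(name)}">{escape(name)}</option>' for name in names
    )

param = list_parameter_name("Pathway", "name")
```

The text passed to `parse_name_list` would be the body of the servlet's reply to a POST request that includes the parameter built by `list_parameter_name`.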

The Contents of a Reactome SBML File

The SBML generated by Reactome is structured roughly as follows:

  1. Reaction layout as SBGN, embedded in model annotation (optional).
  2. Authors of Reactome data, plus list of change dates, embedded in model annotation.
  3. Reaction layout as Layout Extension (optional).
  4. List of compartments.
  5. List of species.
  6. List of reactions.

Compartments, species and reactions will be described in more detail in the following sections.

List of Compartments

Reactome assigns all of its reactions and species to subcellular compartments, and the compartments relevant to the model are added to the list of compartments. A compartment generated from Reactome will provide the following information:

  1. A unique ID, based on Reactome's internal ID for the compartment (DB_ID).
  2. A name, e.g. cytosol.
  3. A meta-id.
  4. An SBO term, which basically says "this is a subcellular compartment".
  5. A GO ID for the compartment.

List of Species

The listed species will derive from the reactants, products and catalysts that participate in the listed reactions. Species can be subdivided into small compounds, proteins, nucleic acid polymers and composites. This will be reflected in the assigned SBO term. A species generated from Reactome will provide the following information:

  1. A unique ID, based on Reactome's internal ID for the species (DB_ID).
  2. A name, e.g. water.
  3. A meta-id.
  4. An SBO term, which reflects the species type.
  5. A compartment, specifying where the species is found.
  6. Notes, which indicate the Reactome type that the species was derived from, and, if the species is composite, provide information about the hierarchical composition of the set or complex.
  7. A list of bqbiol:is descriptors, which provide IDs for the species in external databases. This will include UniProt ID, if the species is a protein and ChEBI ID if the species is a small compound. The Reactome stable ID for this species is also included in the list.
  8. For composite species, a list of bqbiol:hasPart descriptors, which give the components that make up the composite.
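To show how this information might be read back out of an exported file, here is a small Python sketch using only the standard library. It assumes the standard RDF and biomodels.net qualifier namespaces for the bqbiol:is and bqbiol:hasPart descriptors, and it matches element names without their namespace so that it does not depend on a particular SBML level or version. The function name `species_cross_references` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
BQBIOL = "{http://biomodels.net/biology-qualifiers/}"

def local_name(elem):
    """Return the element name without its XML namespace."""
    return elem.tag.rsplit("}", 1)[-1]

def resources(species, qualifier):
    """Collect the rdf:resource URIs listed under one bqbiol qualifier."""
    return [
        li.get(RDF + "resource")
        for block in species.iter(BQBIOL + qualifier)
        for li in block.iter(RDF + "li")
    ]

def species_cross_references(sbml_text):
    """Map each species ID to its name, compartment and external references."""
    root = ET.fromstring(sbml_text)
    result = {}
    for elem in root.iter():
        if local_name(elem) == "species":
            result[elem.get("id")] = {
                "name": elem.get("name"),
                "compartment": elem.get("compartment"),
                "is": resources(elem, "is"),
                "hasPart": resources(elem, "hasPart"),
            }
    return result
```

Given the text of an exported file, the function returns one dictionary entry per species; for non-composite species the "hasPart" list is simply empty.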

List of Reactions

There is a close match between the information that Reactome stores about a reaction and the information that can be represented by SBML. A reaction generated from Reactome will provide the following information:

  1. A unique ID, based on Reactome's internal ID for the reaction (DB_ID).
  2. A name, e.g. "Cleavage of DNA by DFF40", or, in the case of a merged reaction, a name containing the concatenated IDs of all of the reactions making up the merge.
  3. A meta-id.
  4. reversible=false - Reactome's reactions are always uni-directional.
  5. Notes, either taken directly from the summary in the source Reactome reaction, or, in the case of a merged reaction, containing the summaries of all of the reactions making up the merge.
  6. A list of bqbiol:is descriptors, which provide IDs for the reaction in external databases. These will include the Reactome stable ID for the reaction, plus GO terms for Biological process and/or Molecular function.
  7. A list of bqbiol:isDescribedBy descriptors, which provide PubMed literature references supporting the reaction.
  8. A list of reactants, which will contain references to species (with stoichiometry, when appropriate).
  9. A list of products, which will contain references to species (with stoichiometry, when appropriate).
  10. A list of modifiers, which will contain references to species.
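The participant lists can be pulled out of an exported file with a few lines of Python. The sketch below assumes the usual SBML element names (listOfReactants, listOfProducts and listOfModifiers, each holding references with a species attribute) and ignores namespaces so that it is not tied to one level or version. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def local_name(elem):
    """Return the element name without its XML namespace."""
    return elem.tag.rsplit("}", 1)[-1]

def participants(reaction, list_name):
    """Return (species ID, stoichiometry) pairs from one participant list."""
    for child in reaction:
        if local_name(child) == list_name:
            return [
                (ref.get("species"), ref.get("stoichiometry"))
                for ref in child
                if ref.get("species") is not None
            ]
    return []

def summarise_reactions(sbml_text):
    """Map each reaction ID to its name, reversibility and participants."""
    root = ET.fromstring(sbml_text)
    summary = {}
    for elem in root.iter():
        if local_name(elem) == "reaction":
            summary[elem.get("id")] = {
                "name": elem.get("name"),
                "reversible": elem.get("reversible"),
                "reactants": participants(elem, "listOfReactants"),
                "products": participants(elem, "listOfProducts"),
                "modifiers": participants(elem, "listOfModifiers"),
            }
    return summary
```

Stoichiometry is returned as the raw attribute string, or None where the file does not state it (modifiers never carry one).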

Configurable Features of the SBML Generator

A number of options and tools are available within the Reactome SBML generator, which are described in the following sections.

SBML Engine

Reactome does not generate SBML from scratch; we prefer to use existing APIs, which we call SBML generation engines, or simply, engines. We currently have 2 engines available, which are user selectable:

  1. JSBML This is the one we use by default. It is written in native Java, which is convenient for us, because our code is also Java, and to use it, we only need to import a JAR file. It is open source.
  2. libSBML Offers a much wider range of functionality than JSBML, but it is written in C++, so we need to use JNI to connect to it, and the libraries need to be installed somewhere on the system. Also, we have found that it occasionally crashes our Tomcat server.

We also plan to support the CellDesigner engine at some point; watch this space.

SBML Level and Version

Both engines support levels 1 and 2 for all versions; JSBML tends to lag behind libSBML in the number of versions that it supports at level 3. Both level and version are user selectable; if you select something beyond the capabilities of the engine, expect some odd behavior.

Reaction Concatenation

In Reactome, we attempt to be as complete as possible about all of the reactions that occur in a given pathway. This can sometimes lead to very long unbranching chains of reactions - e.g. take a look at:

http://www.reactome.org/PathwayBrowser/#FOCUS_PATHWAY_ID=535734&ID=77286

This amount of detail may not be convenient for modelers, so we have incorporated a feature which concatenates all of the reactions in such chains into a single reaction. This reaction has as its reactants the reactants of the first reaction in the chain. Its products are the products of the last reaction in the chain. Its catalysts are the catalysts of all of the reactions in the chain.

If you use this tool, all unbranching chains in your selected pathway will automatically be merged into single reactions.
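The merge rule described above can be sketched in a few lines of Python. This is an illustration of the rule only, not the exporter's actual code: the `Reaction` class and `merge_chain` function are hypothetical, the chain is assumed to be already identified and ordered, and the underscore used to join the IDs is an arbitrary choice.

```python
from dataclasses import dataclass, field

@dataclass
class Reaction:
    id: str
    reactants: list
    products: list
    catalysts: list = field(default_factory=list)

def merge_chain(chain):
    """Collapse an ordered, unbranching chain of reactions into one reaction."""
    if not chain:
        raise ValueError("cannot merge an empty chain")
    # Catalysts of every step are kept, without duplicates, in order of appearance.
    catalysts = []
    for reaction in chain:
        for catalyst in reaction.catalysts:
            if catalyst not in catalysts:
                catalysts.append(catalyst)
    return Reaction(
        id="_".join(r.id for r in chain),    # concatenated IDs of the merged steps
        reactants=list(chain[0].reactants),  # inputs of the first step
        products=list(chain[-1].products),   # outputs of the last step
        catalysts=catalysts,
    )

chain = [
    Reaction("r1", ["A"], ["B"], ["E1"]),
    Reaction("r2", ["B"], ["C"], ["E1", "E2"]),
    Reaction("r3", ["C"], ["D"]),
]
merged = merge_chain(chain)
```

Here the intermediates B and C disappear from the merged reaction, which converts A to D with E1 and E2 as catalysts.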

Including Reaction Layout

Reactome curators create diagrams for the pathways that they build. The layout for these diagrams is stored, and the SBML generator has features which allow the layout to be added to the SBML file. There are currently 2 layout generators available:

  • SBGN This is included by default if the user generates SBML via Reactome's pathway browser. It is inserted into the model's annotation in its own namespace. This is not completely kosher, but it does at least produce valid SBML.
  • Layout Extension This is the "official" way of putting layout into SBML files. Unfortunately, the JSBML engine doesn't support it, so it is only available if you use the libSBML engine.

We are also planning to incorporate CellDesigner layout at some point.

Filtering

When building a model, you may find that Reactome is providing you with a lot more than you really need. Filtering might help you to reduce the unnecessary clutter in your exported models. You can use it to:

  • pick out reactions that occur only in specific subcellular compartments;
  • constrain the model to one specific organism;
  • and much much more.

In order to effectively use filtering, you need a reasonably good understanding of Reactome's data model.

Generating Reaction Kinetics

Reactome is primarily a pathway database, and as such, does not provide kinetics for the reactions that it contains. However, there is a utility, SBMLsqueezer (check it out), which can be used to find plausible kinetics for reactions. The Reactome SBML exporter gives you the option to run SBMLsqueezer, if you want to.

The current implementation uses SBMLsqueezer 1.3, encapsulated as a servlet. You can supply an optional servlet URL if you want to be specific about where the conversion gets done.

A new version of SBMLsqueezer is currently in development. It will not only provide improved parameter guessing, but will also give the option to import kinetics from SABIO-RK, if available. We intend to switch to this version once it is out; watch this space!