Reactome FI Cytoscape Plugin 4
- 1 Overview
- 2 Download and Launch the Reactome FI plugin
- 3 Use Reactome Pathways
- 4 Use the Reactome Functional Interaction (FI) Network
- 5 Other Features Related to the FI Network
The Reactome Cytoscape Plugin is designed to find pathways and network patterns related to cancer and other types of diseases. This plugin accesses the Reactome pathways stored in the database, helps you do pathway enrichment analysis for a set of genes, visualizes hit pathways using manually laid-out pathway diagrams directly in Cytoscape, and lets you investigate functional relationships among genes in hit pathways. The plugin can also access the Reactome Functional Interaction (FI) network, a highly reliable, manually curated pathway-based protein functional interaction network covering over 50% of human proteins. With the FI network you can construct a FI sub-network based on a set of genes, query the FI data source for the underlying evidence for an interaction, build and analyze network modules of highly-interacting groups of genes, perform functional enrichment analysis to annotate the modules, expand the network by finding genes related to the experimental data set, display pathway diagrams, and overlay a variety of information sources such as cancer gene index annotations. For an example of how we use Reactome FIs for cancer data analysis, please see our publication: A human functional protein interaction network and its application to cancer data analysis.
Download and Launch the Reactome FI plugin
Reactome FI Cytoscape Plugin 4 needs Cytoscape 3.0 or above. If you have not installed Cytoscape 3.0 or above, please download it from Cytoscape's web site: http://www.cytoscape.org. After launching Cytoscape, use menu "Apps/App Manager" to open the "App Manager" dialog, and search for "ReactomeFIPlugIn". You should see the Reactome FI plugin listed in the middle panel (See figure below. You may see a different version number). Choose the found plugin, and then click the "Install" button at the bottom of the dialog. Follow the procedures to finish the installation.
Note: The Java Web Start version supports Cytoscape 2.7 with Reactome FI Plugin 3.0 or below (See Download and Launch FI Plugin 3.0). To use our Reactome pathway features, please use plugin 4 with Cytoscape 3.0 by installing a desktop version of Cytoscape 3.0 or above.
Use Reactome Pathways
Showing Reactome pathways directly in Cytoscape is a new feature in the Reactome FI plugin version 4.0. Using this feature, you can load pathways in the Reactome database into Cytoscape, visualize Reactome pathways in either the native pathway diagram view or the FI network view, do pathway enrichment analysis for a set of genes, and check genes from your list in hit pathways.
Explore Reactome Pathways
- Load Reactome pathways: Use menu "Apps/Reactome FI/Reactome Pathways" to load pathways into Cytoscape. The loaded pathways are organized in a hierarchical way as in the Reactome web application (http://www.reactome.org/PathwayBrowser/), and listed in the left side "Control Panel" as a tab called "Reactome".
- View pathways in Reactome: After selecting a pathway in the pathway hierarchy, you can choose "View in Reactome" from the popup menu (right click in Windows or Control-click in Macs to get the popup menu).
Note: The ancestor pathways (container pathways) for a selected pathway are displayed in the middle panel, "Selected Event Branch", in the Reactome tab. You can click an ancestor pathway in this middle panel to view its location in the original pathway hierarchical tree. However, the ancestor pathway will not be selected in the original tree. This behavior is by design, so that the selection in the original tree is kept.
- Search pathways: Choose "Search" in the popup menu to bring up the search dialog. The found pathway(s) will be highlighted in blue in the pathway tree.
Note: Search will be against all loaded pathways, not limited to the selected pathway and its contained sub-pathways.
- Open pathway diagram: Pathways in Reactome are organized in a hierarchical way. Not all pathways have their own pathway diagrams. A smaller pathway (called a sub-pathway) may be drawn in a bigger pathway, which has its own pathway diagram. Most top-level pathways (called modules or super pathways) are used to organize related pathways (e.g. Disease, Signal Transduction), and therefore contain only rectangle boxes representing canonical pathways.
- Show Diagram: If a selected pathway has its own pathway diagram, you can choose "Show Diagram" in the popup menu to open its pathway diagram into the central Cytoscape desktop.
- View in Diagram: If a selected pathway is laid-out as a sub-pathway in a bigger one, you can choose "View in Diagram" in the popup menu to view its drawing in its container pathway. Reactions contained by the selected pathway will be highlighted in blue after the diagram is opened. For example, see pathway "G1/S DNA Damage Checkpoints" opened in pathway "Cell Cycle Checkpoints" below:
- Search diagram: Objects displayed in a pathway diagram can be searched using "Search Diagram" from the popup menu (Right click in Windows or Control click in Macs without selecting any object in the pathway diagram to get the popup menu). The found objects will be selected and highlighted in blue.
Note: reactions will not be searched in the diagram. Use the search feature in the pathway tree to search for reactions.
- Export diagram: Displayed diagram can be exported as a PDF, JPG or PNG file. Use "Export Diagram" from the popup menu to export the displayed diagram.
- View in Reactome: Select an object, and then right-click (or control click) to get the popup menu. Choose "View in Reactome" to view the selected object in the Reactome web application.
- List Genes: Genes contained by a complex or protein set, or the gene related to a displayed protein, can be viewed by using the menu item "List Genes" after selecting an object. For example, the following dialog shows genes contained by complex hBUBR1:hBUB3:MAD2*:CDC20. Clicking a gene symbol will bring you to the web page for that gene in the GeneCards web site.
Display Reactome Pathways in the FI Network View
- Display pathway in the FI network view: A Reactome pathway can be converted into a functional interaction network using the method we have established (see A human functional protein interaction network and its application to cancer data analysis). Use "Convert as FI Network" in the popup menu brought up by right-clicking (Windows) or control-clicking (Macs) an empty area without any selection in the pathway diagram panel. The original pathway diagram will be moved to the bottom-left corner, and a new FI network generated from the original pathway diagram will be displayed in a new network panel.
Note: sub-pathways contained by the displayed pathway will be extracted into the FI network too.
- Explore objects in the pathway and network views: Object selection is synchronized across three views: events in the pathway tree view, objects in the pathway view at the bottom-left corner, and genes and FIs in the network view. When you select an object in one of the three views, the corresponding objects in the other two views are selected too. The popup menus in each individual view remain available, so you can explore objects as you would in a single view.
Note: Using Cytoscape's built-in "Saving Session" feature can save the converted FI networks from pathways. However, displayed pathways cannot be saved into a session file for the time being. We will implement this function in a future release.
Pathway Enrichment Analysis
- Pathway enrichment analysis: A list of genes can be used to check if Reactome pathways have been enriched. To do this, use the popup menu item, "Analyze Pathway Enrichment" (below left figure), to get the dialog for choosing a gene set file (below right figure). You can use a gene set file in one of three file formats: one gene per line, all genes in the same line and delimited by commas, or all genes in the same line and delimited by tabs.
Note: Depending on the size of your gene list, the pathway enrichment analysis may take over 1 minute to run. Pathways used in this feature are different from the Reactome pathways used for annotating a FI network or network modules. Here all of the roughly 1,000 pathways are used. For annotation, only a subset of Reactome pathways, pre-selected for a certain size, is used.
- View enrichment analysis results: Pathway enrichment results are displayed as a table in the "Table Panel" at the bottom of the main Cytoscape window as pathway annotation results. You can use "View in Diagram" to view hit pathways in the pathway diagram view, and use "Export Annotations" to save the results in the table. Pathways in the Reactome pathway tree are highlighted in different colors based on their FDR values. Objects containing genes from your gene list are highlighted in a purple background with a white font in the pathway diagram view. Hit genes are displayed in a thick purple border in the FI network view for a hit pathway.
Note: Hit genes are displayed with the same colors in the "Gene List" dialog from the "List Genes" feature.
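The three gene set file layouts accepted by the pathway enrichment analysis (one gene per line, comma-delimited on one line, or tab-delimited on one line) can be read with a single routine. The following is a minimal Python sketch for preparing or checking such a file outside the plugin. The function names are hypothetical, and dropping duplicate symbols is an assumption of this sketch, not documented plugin behavior.

```python
from pathlib import Path


def parse_gene_set(text: str) -> list[str]:
    """Return gene symbols from text in any of the three layouts."""
    genes: list[str] = []
    seen: set[str] = set()
    for line in text.splitlines():
        # Treat tabs and commas alike, so one code path covers all layouts.
        for token in line.replace("\t", ",").split(","):
            gene = token.strip()
            # Assumption: skip blanks and keep only the first copy of a symbol.
            if gene and gene not in seen:
                seen.add(gene)
                genes.append(gene)
    return genes


def load_gene_set(path: str) -> list[str]:
    """Read a gene set file from disk and parse it."""
    return parse_gene_set(Path(path).read_text())
```

For example, parse_gene_set("TP53,EGFR,PTEN") and parse_gene_set("TP53\nEGFR\nPTEN") return the same three symbols in the same order.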
Probabilistic Graphical Model based Pathway Analysis
Warning: This is an experimental feature. Please use results with caution. The implementation of this feature may change in the future.
We adapted the PARADIGM approach for Reactome pathways by converting reactions drawn in pathway diagrams into factors in factor graphs, a type of probabilistic graphical model (PGM). For details about the PARADIGM approach, see: Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. For an introduction to factor graphs, see the Wikipedia entry: Factor Graph. For test purposes, you can download two sample data files for 100 TCGA ovarian cancer patients: CNVs and mRNA gene expression. The original TCGA OV files were downloaded from the Broad Institute Firehose web site.
- Convert a pathway to a factor graph: As before, open the diagram for a selected pathway. In the pathway diagram panel, use popup menu, "Convert to Graphical Model" to convert a pathway diagram into a factor graph. You can provide a list of small molecules specified by their names that should not be included in the converted factor graph to control the final size of the factor graph by using the following "Escape Names" dialog:
The implementation is based on the PARADIGM approach. A protein in the pathway diagram is expanded into four related nodes in the factor graph based on the central dogma: a node for protein activity, a node for protein expression, a node for the protein's mRNA, and a node for the protein's DNA. The following screenshot shows that protein "YAP1" in the pathway "YAP1- and WWTR1 (TAZ)-stimulated gene expression" has been converted into four nodes. We use YAP1's DB_ID as the label for its related nodes to keep the display of the factor graph compact.
You may view pre-assigned values for a selected factor node, which is rendered as a small open rectangle without a label in the factor graph view, by using the popup menu Reactome FI/View Factor Values. A dialog pops up to show the values for the selected factor (see the dialog at the bottom-right corner in the following screenshot).
- Load observation data: In order to calculate pathway activities for a data set, we need to load observation data. Use popup menu in the factor graph view, Reactome FI/Load Observation Data, to enter a CNV file and a gene expression file using the browse buttons in the "Load Observation Data" dialog (see below). (Note: you may load either a CNV file or a gene expression file, or both). The factor graph inference implemented in this app uses discrete values (three states 0, 1 and 2 for down, normal and up) by discretizing CNV and gene expression values based on the entered threshold values. (Note: you may try our example data files for testing by following links at the top of this section.)
After the observation data is loaded into the factor graph, extra variable nodes (i.e. nodes ending with _mRNA_obs and _DNA_obs) are created and attached to central dogma variable nodes. For example, for YAP1, two observation nodes and two factor nodes linking these observation nodes to central dogma nodes have been added: 1253322_mRNA_obs and 1253322_DNA_obs. The values for these two factors (see two dialogs at the bottom of the following screenshot) have been pre-trained by using the TCGA data sets with the EM algorithm. We will provide features for users to enter factor values and learn these parameters for their own data sets in the future.
Note: You may try one of the auto-layout features provided by Cytoscape to make the factor graph easier to view after loading observation data.
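The discretization of CNV and gene expression values into the three states (0 for down, 1 for normal, 2 for up) can be illustrated with a short Python sketch. This is not the plugin's code: the function name, the parameter names, and the treatment of values that fall exactly on a threshold (counted as normal here) are assumptions made for illustration.

```python
def discretize(value: float, lower: float, upper: float) -> int:
    """Map a continuous value to 0 (down), 1 (normal), or 2 (up).

    `lower` and `upper` stand in for the threshold values entered in the
    "Load Observation Data" dialog (hypothetical parameter names).
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if value < lower:
        return 0  # down
    if value > upper:
        return 2  # up
    return 1      # normal (assumption: boundary values count as normal)


def discretize_sample(values: dict[str, float], lower: float, upper: float) -> dict[str, int]:
    """Discretize every gene value for one sample."""
    return {gene: discretize(v, lower, upper) for gene, v in values.items()}
```

With illustrative thresholds of -0.5 and 0.5, a value of -1.2 maps to state 0, 0.1 to state 1, and 0.9 to state 2.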
- Run inference: Use popup menu, Reactome FI/Run Inference, to run inference. You can choose an inference algorithm from the "Inference Algorithm Configuration" dialog and then specify properties for the selected algorithm. We use an open source factor graph C++ library, libdai, for inference. For a detailed explanation about properties for each inference algorithm, please click the "Help" button in the dialog (see below):
Inference may take a while, depending on the structure of the factor graph and the sizes of your data files. You may abort the inference if it takes too long. After the inference is finished, you can investigate the results in three places: the IPA Pathway Analysis tab, the IPA Node Values tab, and the Output Analysis Results dialog (the dialog at the top-left corner of the second screenshot below). The two tabs are displayed in Cytoscape's Table Panel, and the dialog is invoked by clicking the "View Details" button at the top of the IPA Pathway Analysis tab.
Note: We use the Mann-Whitney U test to calculate p-values for IPA differences between the user's data set and a random data set generated from the user's file in the Output Analysis Results dialog. All other p-values are calculated based on results from 1000 random samples generated from the user's uploaded data files. FDRs are calculated using the Benjamini-Hochberg procedure. Use the popup menu, "Show/Hide Columns for pValues/FDRs" to show or hide p-Values and FDRs (see the above screenshots).
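For readers who want to reproduce FDR values from a list of p-values, the Benjamini-Hochberg adjustment mentioned in the note can be sketched in a few lines of Python. This is a generic textbook version of the procedure, not code taken from the plugin, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values (FDRs) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value can be
    # capped by the ones above it, which keeps the FDRs monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each p-value is multiplied by the number of tests divided by its rank, and the running minimum ensures that a smaller p-value never receives a larger FDR than a bigger one.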
Use the Reactome Functional Interaction (FI) Network
After the Reactome FI plugin is installed, you should see a menu item called "Reactome FI" under the Apps menu. Clicking this menu shows 5 sub-menus:
- Gene Set/Mutation Analysis: FI network-based data analysis for a set of genes or a mutation data file.
- HotNet Mutation Analysis: runs the HotNet algorithm to search for network modules (see http://compbio.cs.brown.edu/projects/hotnet/).
- Microarray Data Analysis: MCL (Markov Cluster algorithm, http://micans.org/mcl/) based FI network clustering analysis, which converts a non-weighted FI network to a weighted network using correlations among genes in the network.
- Reactome Pathways: loads pathways from the Reactome database, visualizes Reactome pathways directly in Cytoscape in their native way, and does pathway enrichment analysis.
- User Guide: brings you to this user guide.
Gene Set/Mutation Analysis
- Currently the FI plug-in supports three file formats for gene set/mutation analysis:
- Simple gene set: one line per gene. For example, GWASFuzzyGenes.txt, a list of T2D GWAS genes.
- Gene/sample number pair. For example, GeneSampleNumber.txt, which contains two required columns, gene and number of samples having gene mutated, and an optional third column listing sample names (delimited by ";").
- NCI MAF (mutation annotation file). For example, GlioblastomaMutationTable.txt, the mutation file from the TCGA GBM project.
- Choose a FI network version from the three listed versions.
Note: you may get different results using different FI network versions because a later version may contain more proteins/genes and more FIs. But based on our experience, a significant FI network module is usually stable across multiple versions.
- Choose a file containing genes you want to use to construct a functional interaction network. Select an appropriate file format and parameters to load genes and construct FI network in the dialog. Click the "OK" button to start the FI network building process.
- The constructed FI network will be displayed in the network view panel. A FI specific visual style will be created automatically for the FI network.
- The main features of Reactome FI plug-in should be invoked from a popup menu, which can be displayed by right clicking an empty space in the network view panel.
- Fetch FI annotations: query detailed information on selected FIs. Three FI related edge attributes will be created: FI Annotation, FI Direction, and FI Score. Edges will be displayed based on FI Direction attribute values. In the following screenshot, "->" stands for activating/catalyzing, "-|" for inhibition, "-" for FIs extracted from complexes or inputs, and "---" for predicted FIs. See the "VizMapper" tab, Edge Source Arrow Shape and Edge Target Arrow Shape values for details.
- Analyze network functions: pathway or GO term enrichment analysis for the displayed network. You can choose to filter enrichment results by a FDR cutoff value. You can also choose to display only the nodes in the network panel for a selected row or rows by checking "Hide nodes in not selected rows". The letter in parentheses after each pathway gene set name corresponds to the source of the pathway annotations: C - CellMap, R - Reactome, K - KEGG, N - NCI PID, and B - BioCarta. The following screenshot shows results from a pathway enrichment analysis.
Tip: To analyze pathway or GO term enrichment on a set of genes that are not linked together, select the "Show genes not linked to others" option in the "Set Parameters for FI Network" dialog.
- Cluster FI network: run a network clustering algorithm (spectral partition based network clustering by Newman 2006) on the displayed FI network. Nodes in different network modules will be shown in different colors (distinct colors are used only for the first 15 modules ranked by size).
- Analyze module functions: pathway or GO term enrichment analysis for each individual network module. You can select a size cutoff to filter out network modules that are too small, choose a FDR cutoff to view enriched pathways or GO terms under a certain FDR value, and view nodes in a selected row or rows only in the network diagram.
- Load Cancer Gene Index: load cancer gene index annotations. For details, see section Load Cancer Gene Index.
HotNet Mutation Analysis
Reactome FI Cytoscape plug-in implements the algorithm developed by Raphael's group at Brown University, called "HotNet", for doing cancer mutation data analysis. For details about this algorithm, please see Algorithms for detecting significantly mutated pathways in cancer, and Discovery of mutated subnetworks associated with clinical data in cancer.
- Select a mutation data file and run HotNet algorithm: After selecting sub-menu "HotNet Mutation Analysis" from menu Apps/Reactome FI, you will see the following dialog. Choose a version of the FI network, a mutation file from your local file system, and set the parameters required by the HotNet algorithm. Currently the plug-in supports the NCI MAF mutation file only. We are going to support more file formats in the future. If you are not sure what delta value should be used, you may choose "Auto" in the dialog. However, the algorithm takes much longer to run with "Auto". Random permutation is used to calculate p-values and FDR values. The maximum number of permutations is 1000. For details about permutation, please see the above two papers. After entering all required parameters, click the "OK" button to start the HotNet analysis, which may take several minutes. For a test run, you may use the TCGA GBM mutation file, GlioblastomaMutationTable.txt, and choose the 2012 version of the FI network with delta 1.0e-4.
- Select network modules and build a FI sub-network: The generated FI network modules from the HotNet analysis are listed in the HotNet result dialog (see below). In the dialog, you can choose a size cutoff, or a FDR cutoff. The displayed selected network modules will be used to build a FI sub-network after you click the "OK" button. In the dialog, you can also see the chosen delta value and the number of permutations. You may try different delta values for better results.
Microarray Data Analysis
The Reactome FI Cytoscape plugin can load a gene expression data file, calculate correlations among genes involved in the same FIs, use the calculated correlations as weights for edges (i.e. FIs) in the whole FI network, apply the MCL graph clustering algorithm to the weighted FI network, and generate a sub-network for a list of selected network modules based on module size and average correlation. The generated FI sub-network will be displayed in the network panel, and can be used for analysis as in Gene Set/Mutation Analysis. For details about this method, please see our publication: A network module-based method for identifying cancer prognostic signatures.
An array data file should be a tab-delimited text file with table headers. The first column should be gene names. All other columns should be expression values in different samples. The data set in the file should be pre-normalized. For example, see this gene expression file for breast cancer: NejmLogRatioNormGlobalZScore_070111.txt.zip. This data set was downloaded from van de Vijver et al in 2002, and has been normalized.
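The edge-weighting step described above, reading the array data file and turning correlations between FI partners into edge weights, can be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin's implementation: the guide does not say which correlation coefficient is used, so Pearson correlation is assumed here, the file is assumed to have no missing values, and all function names are hypothetical.

```python
import csv
import io
import math


def read_expression(handle) -> dict[str, list[float]]:
    """Read a tab-delimited array file: header row, then gene name + values."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row of sample names
    return {row[0]: [float(v) for v in row[1:]] for row in reader if row}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def weight_edges(expression, fis, use_absolute=True):
    """Assign a correlation weight to each FI whose two genes have data."""
    weights = {}
    for a, b in fis:
        if a in expression and b in expression:
            r = pearson(expression[a], expression[b])
            weights[(a, b)] = abs(r) if use_absolute else r
    return weights


# Small made-up data set, for illustration only.
example = io.StringIO("gene\ts1\ts2\ts3\nA\t1\t2\t3\nB\t2\t4\t6\nC\t3\t2\t1\n")
example_weights = weight_edges(read_expression(example), [("A", "B"), ("A", "C")])
```

The use_absolute flag mirrors the dialog option for using absolute values as edge weights. FIs with a gene missing from the expression file are simply left unweighted in this sketch.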
- Select a microarray data file and run MCL network clustering: After selecting sub-menu "Microarray Data Analysis" from menu Apps/Reactome FI, you should see the following dialog. Choose a microarray data file, check whether you want to use absolute values as weights for edges, and input an inflation parameter (-I) for the MCL clustering algorithm. The smaller the inflation parameter is, the bigger the average size of generated network modules. Based on our own experience, we use 5.0 for the inflation parameter, the highest recommended value, and choose the absolute value for edge weights. For more details on how to choose the inflation parameter, please see http://micans.org/mcl/. After you have set these parameters, click the OK button to load the data file, calculate correlations, and apply the MCL clustering algorithm.
- Select network modules and build a FI sub-network: The generated network modules are listed in the MCL clustering results dialog (see below). Only modules having more than 2 genes are listed and used in the FI sub-network building. You can choose a module size or an average correlation value (absolute value if absolute has been checked before) to filter out modules that may not be significant (Note: after setting these cutoff values, please press the "Enter" key to commit your changes.). In our analysis, we choose modules having 7 or more genes with average correlation values no less than 0.25. These values are used as defaults in the dialog. In the dialog, you can see how many modules and genes will be chosen for building the FI sub-network under your selected filter values. Click the OK button to start the sub-network building. The built sub-network will be displayed, and can be analyzed as with sub-networks generated from the gene set/mutation analysis.
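The module filter described above (size and average correlation cutoffs) amounts to a simple selection, sketched here in Python. The Module class and the function name are hypothetical and only serve the illustration. The default cutoffs of 7 genes and 0.25 average correlation are the values quoted in this guide.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """A network module from clustering (hypothetical container type)."""
    genes: list[str]
    average_correlation: float


def select_modules(modules: list[Module],
                   min_size: int = 7,
                   min_correlation: float = 0.25) -> list[Module]:
    """Keep modules with at least `min_size` genes and an average
    correlation no less than `min_correlation`."""
    return [
        m for m in modules
        if len(m.genes) >= min_size and m.average_correlation >= min_correlation
    ]


def count_genes(modules: list[Module]) -> int:
    """Number of distinct genes across modules, as reported in the dialog."""
    return len({g for m in modules for g in m.genes})
```

Raising either cutoff yields fewer, tighter modules, and therefore a smaller FI sub-network.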
Other Features Related to the FI Network
Query FI Source
Select an edge and right click it to get the popup menu for the edge. Select the menu item "Reactome FI/Query FI Source". If a FI is extracted from curated pathways or reactions, a dialog for the original data source(s) will be displayed. Double click a row in the displayed table to show a detailed web page for the source of the FI. If the selected FI is a predicted one, the evidence for this FI will be displayed.
Fetch FIs for Node
All FIs for a node can be queried. Select a node in the network panel, and right click it to get the popup menu for the node. Select the menu item "Reactome FI/Fetch FIs". FI partners for the selected node will be displayed in two sections: partners already displayed in the network and partners not displayed in the network. You can select partners from the second section to expand the displayed network.
Show Pathway Diagram
Pathway diagrams can be shown for pathway hits. Select a pathway in the "Pathways in Network" or "Pathways in Modules" tab, and right click to get the popup menu for the pathway. Select "Show Pathway Diagram" from the popup menu.
Load Cancer Gene Index Annotations
Reactome FI plug-in can load NCI cancer gene index annotations for genes/proteins displayed in the network. There are two ways to show these annotations: use a popup menu called "Load Cancer Gene Index" when no object is selected (left figure), and use another popup menu "Fetch Cancer Gene Index" for a selected node (right figure).
By using the first method, the user can load the tree of NCI disease terms and display the tree in the left panel. When the user selects a disease term in the tree, all genes or proteins that have been annotated for the selected disease and its sub-terms will be selected.
By using the second method, the user can view detailed annotations for the selected gene or protein. The user can sort these annotations based on PubMedID, Cancer type, and annotation status, and also filter annotations based on several criteria.
Survival Analysis
Survival analysis is based on a server-side R script that does either coxph or Kaplan-Meier survival analysis. To do survival analysis, a tab-delimited text file containing at least three columns should be provided. The names of the three columns should be: Samples, OSDURATION, and OSEVENT. For example, see this survival information file downloaded from van de Vijver et al in 2002: Nejm_Clin_Simple.txt, which has been simplified for our analysis purpose. To do survival analysis, use the popup menu "Analyze Module Functions/Survival Analysis..." (see below).
In the survival analysis dialog (below), double click the text field to select a file containing survival information for samples used to build the displayed FI sub-network (Note: you cannot do survival analysis if you use a gene set file only to construct the displayed FI sub-network). You can choose either the coxph or the Kaplan-Meier model to do survival analysis. If you choose the Kaplan-Meier model, you have to select a module for analysis. In the Kaplan-Meier analysis, all samples will be divided into two groups: samples having no mutated genes in the selected module (group 1) and samples having mutated genes in the module (group 2). It is recommended to run the coxph model first without selecting any module in order to see which module is most significantly related to survival times. After that, you can focus on some specific modules for survival analysis.
The results from survival analysis will be displayed in the right Results Panel with a tab labeled "Survival Analysis" (below left). You can do multiple survival analyses. All results returned from the server-side R script will be displayed in this panel with labels based on your parameter selections in the survival analysis dialog. The last result will be selected by default. At most three sections are displayed in the result panel for each analysis: Output, Error, and Plot. If no warning or error is returned from an analysis, the error section may not be shown. Rows for modules having p-values less than 0.05 from the coxph (all modules) analysis are displayed in blue with text underlined. You can click these modules to do a quick single-module based survival analysis without going through the above steps. Single module-based Kaplan-Meier analysis will show a plot file. You can click the file to view the actual plot (below right). You may need to save the plot file for your future use.
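To make the survival file layout and the Kaplan-Meier grouping concrete, here is a small Python sketch that checks the three required columns (Samples, OSDURATION, OSEVENT) and splits samples into the two groups described above. It only illustrates the idea and is not the server-side R script. The function names and the sample-to-mutated-genes mapping are hypothetical inputs invented for this sketch.

```python
import csv
import io

REQUIRED_COLUMNS = ("Samples", "OSDURATION", "OSEVENT")


def read_survival(handle) -> list[dict[str, str]]:
    """Read a tab-delimited survival file and verify the required columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    present = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    if missing:
        raise ValueError("missing required column(s): " + ", ".join(missing))
    return list(reader)


def split_by_module(rows, sample_to_mutated_genes, module_genes):
    """Return (group 1, group 2) sample names for one module.

    Group 1: samples with no mutated gene in the module.
    Group 2: samples with at least one mutated gene in the module.
    """
    module = set(module_genes)
    group1, group2 = [], []
    for row in rows:
        mutated = set(sample_to_mutated_genes.get(row["Samples"], ()))
        (group2 if module & mutated else group1).append(row["Samples"])
    return group1, group2


# Made-up rows, for illustration only.
example_rows = read_survival(io.StringIO(
    "Samples\tOSDURATION\tOSEVENT\nS1\t5.2\t1\nS2\t8.0\t0\nS3\t3.1\t1\n"
))
```

Samples that appear in the survival file but have no mutation record fall into group 1 in this sketch, which is an assumption rather than documented behavior.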