File/Import virtual machine
infgraph
, and the password inf135
etomp4v7
system-config-keyboard
system-config-language
your machine/configuration.../USB
, then uncheck everything.
start menu/control panel/system and security/system
, on Linux type cat /proc/cpuinfo
in a terminal). Your processor must be 64-bit with virtualization support. If your processor meets these requirements, you have to enable virtualization in your BIOS (next step); otherwise you will not be able to install the virtual machine on this host.
Devices/Shared clipboard/bidirectional
. Then try to copy and paste something.
HGNC_id
HGNC_id Sign
Database_id Molecule_id
Database_id Molecule_id Sign
HGNC_id is the alphanumeric identifier derived from the HUGO Gene Nomenclature Committee.
Sign is the observed variation of the molecule quantity between two conditions: + for an increase, - for a decrease and ? for unobserved or irrelevant variations.
Database_id describes where the ID comes from. Valid values are Transpath and HGNC.
Molecule_id is the ID of a molecule from the database.
You can add a comment by writing # your comment. If you do so, everything after the # is ignored until the end of the line. There is no way to escape the #, so do not use it in your IDs.
.gz files are assumed to be compressed; files with any other extension (.anything-else) are assumed to be uncompressed.
PPARA #The molecule PPARA in HGNC
PPARG + #The molecule PPARG in HGNC increases
Transpath MO000000327 #The molecule MO000000327 (here ATP) in Transpath
Transpath MO000000328 - #The molecule MO000000328 (here ADP) in Transpath decreases
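The list format described above can be checked with a small script before launching a job. The following is only a sketch, not the parser used by infgraph: the names Entry, parse_line and read_molecule_list are hypothetical, whitespace is assumed to be the field separator, and a missing sign is simply reported as None.

```python
import gzip
from typing import NamedTuple, Optional

SIGNS = {"+", "-", "?"}
DATABASES = {"HGNC", "Transpath"}


class Entry(NamedTuple):
    database: str          # "HGNC" or "Transpath"
    molecule: str          # the ID inside that database
    sign: Optional[str]    # "+", "-", "?" or None when no sign is given


def parse_line(line):
    """Parse one line of a molecule list; return None for blank/comment lines."""
    # Everything after the first '#' is a comment (it cannot be escaped).
    fields = line.split("#", 1)[0].split()
    if not fields:
        return None
    sign = None
    if fields[-1] in SIGNS:
        sign = fields.pop()
    if len(fields) == 1:
        # "HGNC_id" or "HGNC_id Sign": the database is implicitly HGNC.
        return Entry("HGNC", fields[0], sign)
    if len(fields) == 2 and fields[0] in DATABASES:
        # "Database_id Molecule_id" or "Database_id Molecule_id Sign".
        return Entry(fields[0], fields[1], sign)
    raise ValueError("unrecognized molecule line: %r" % line)


def read_molecule_list(path):
    """Read a list file; .gz files are treated as gzip-compressed."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return [entry for entry in map(parse_line, handle) if entry is not None]


if __name__ == "__main__":
    print(parse_line("PPARG + #The molecule PPARG in HGNC increases"))
    print(parse_line("Transpath MO000000327 #here ATP"))
```

A line that matches none of the four accepted forms raises ValueError, which makes typos in a list visible early.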
in/data
.
in/data/empty.txt
An empty file.
in/data/blacklist-remove.txt
Very common molecules that have been removed from our analysis before network construction (i.e. ATP, ADP, protein remnants, NDP, NTP, sp1, P, CoA, H2O and H+).
in/data/hubs/blacklist-xxx.txt
The top xxx molecules involved in the most reactions (so-called hubs). Available xxx values are 100, 500, 1000, 1500, 1800, 2000, 2200, 2500, 3000 and 4000.
A job is a text file containing one or more job{...} sections. Each of them describes where to write the results [1], how to select relevant regulated reactions from Transpath [2,3,4,5,6], how to convert them into an influence graph, and how to analyze this graph [7,8,9]. To get a working job file, copy and paste the following template (or download it). You only have to change the numbered commented lines.
This template searches for key regulators of the glycolysis enzymes described in in/data/glycolysis.txt (download). After defining where to put the result files [1], the first step completely ignores the very common molecules listed in in/data/blacklist-remove.txt (download) [2]. Then, a subgraph of Transpath is computed by taking the first two levels of neighborhood of the molecules involved in glycolysis [3,4], without using the top xxx (in this template, xxx = 2000) hubs [5]. This graph is then converted to an influence graph, which is used to find key regulators of the enzymes participating in glycolysis [8]. In this context, no further observations are available on variations of molecules [9].
job{
  name = out/glycolysis-results    #[1] default output files prefix
  filter_spaimr{    #Totally ignore some molecules
    type= no_effect
    blacklist = in/data/blacklist-remove.txt    #[2] Remove the molecules from this list
  }
  filter_spaimr{    #Work on the neighborhood of a list of molecules
    type = neighbor_nohub
    num = 1    #[3] neighborhood level (WARNING see details)
    max_hub = 0
    roles = spaim
    startlist = in/data/glycolysis.txt    #[4] take the neighborhood of molecules from this list
    blacklist = in/data/hubs/blacklist-2000.txt    #[5] compute the neighborhood without using these molecules
    output = out/glycolysis-neighbor.txt    #[6] log neighborhood results here
  }
  compute_filter_spaimr{}
  compute_influences{    #build an influence graph from the selected regulated reactions
    max_balance = 0
    compute_balance = no
    compute_prod_by_unknown = yes
  }
  stats_influence{}
  cneighbor{    #search for key regulators (you can write this block multiple times, with different output file prefixes)
    id=out/glycolysis-key    #[7] output file prefix for key regulators results
    targets =in/data/glycolysis.txt    #[8] a list of molecules describing the regulated molecules
    observed=in/data/empty.txt    #[9] a list of molecules (that may be empty), containing additional information on molecule variations
  }
  write_full_graph{}
}    #end job
num
reactions from the molecules described in the startlist
. The parameter value num=0 selects all molecules from the list, all the reactions in which these molecules are involved, and all molecules involved in those reactions. This is illustrated in the following picture, which describes two reactions (r1: A+B→C and r2: C→D), where the startlist contains only A and the blacklist is empty. With num=0, the reaction r1 and the molecules A, B and C are selected; with num=1, the reactions r1 and r2 are selected, and since all the molecules involved in these reactions are taken into account, A, B, C and D are selected.
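The meaning of num can be reproduced on the two-reaction example with a few lines of Python. This is a minimal sketch and not the infgraph implementation: the function name neighborhood and the dictionary representation of reactions are hypothetical. It also assumes that blacklisted molecules are kept in the result when they take part in a selected reaction but are never used to reach further reactions, which is only one possible reading of "without using these molecules".

```python
def neighborhood(reactions, startlist, num, blacklist=()):
    """Select reactions and molecules up to neighborhood level `num`.

    reactions: dict mapping a reaction name to the set of molecules
               involved in it (substrates and products alike).
    Returns (selected_reactions, selected_molecules) as two sets.
    """
    blocked = set(blacklist)
    molecules = set(startlist)
    selected = set()
    # Molecules from which the next level of reactions is searched.
    frontier = molecules - blocked
    for _ in range(num + 1):  # num=0 already means one level of reactions
        hit = {
            name
            for name, members in reactions.items()
            if name not in selected and members & frontier
        }
        if not hit:
            break
        selected |= hit
        reached = set().union(*(reactions[name] for name in hit))
        # Only newly reached, non-blacklisted molecules extend the search.
        frontier = reached - molecules - blocked
        molecules |= reached
    return selected, molecules


if __name__ == "__main__":
    # r1: A+B -> C and r2: C -> D, startlist = [A], empty blacklist
    example = {"r1": {"A", "B", "C"}, "r2": {"C", "D"}}
    for level in (0, 1):
        chosen, mols = neighborhood(example, ["A"], level)
        print(level, sorted(chosen), sorted(mols))
```

With the example above, level 0 yields r1 with A, B and C, and level 1 yields r1 and r2 with A, B, C and D, matching the description.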
infgraph/bin
. Then right-click on the folder background and choose open a terminal here
./cpp-4b-Graph in/your_job_file.conf
. You can pass multiple job files by separating the file paths with spaces. For example, to launch the glycolysis example, type ./cpp-4b-Graph in/glycolysis.conf.
out/joblog.txt collects the status of all launched jobs. To be sure that everything worked, you should start by reading it.
xxx.log contains the details of the job named xxx.
targets and observed of the cneighbor{...} block) are summarized in a file called xxx.mapping, where xxx is the file path defined in the id field of the cneighbor{...} block. Unmapped molecules are ignored during the analysis.
xxx-stats-yyy.txt files, where xxx is the job name.
nb_inf
Total number of influences
nb_inf+
Number of positive influences
nb_inf-
Number of negative influences
nb_inf?
Number of unknown influences
nb_nodes
Number of nodes
node_id
The node ID
edges_in
Number of edges where this node is a target
edges_out
Number of edges where this node is a source
edges_sub
edges_in + edges_out
score_coverage_all
is the total number of regulated molecules. A high score means a ubiquitous regulator of everything.
score_coverage_targets
is the number of regulated molecules among the molecules listed in the file given by targets =mylist.txt. A high score means a generic regulator of the targets, i.e., a regulator that regulates many targets but possibly many other things as well.
score_hypergeometric
is computed by a hypergeometric test between the number of regulated targets (i.e., score_coverage_targets) and the number of regulated molecules (i.e., score_coverage_all). A high value means that the molecule regulates more targets than non-targets. Note: a molecule that regulates only a single target and nothing else has a high score_hypergeometric value.
score_product
is the product of score_hypergeometric and score_coverage_targets. A high value means a specific regulator, i.e., a regulator that regulates many targets and few other things.
score_rnd
A random value, used for debugging purposes only.
xxx.scores
where xxx is the file path defined in the id field of the cneighbor{...} block. This file is a tab-separated table composed of the following columns.
molecule
The Transpath ID of the molecule
name
The Transpath name of the molecule
score_id
The corresponding score (IDs are explained in the section above)
score_value_all
When the molecule is unobserved, the maximum of score_value_plus and score_value_minus. When the molecule is observed, the score corresponding to its variation.
score_value_plus
The score value when the molecule increases
score_value_minus
The score value when the molecule decreases
xxx-score-score_zzz.graphml.gz
, where xxx is the output prefix used in the job description with name=xxx and zzz is the score type. Such files can be opened by any graph program compatible with the GraphML format, such as Cytoscape with the graphml plugin, and uncompressed with gunzip yyy.gz or 7zip. Each file contains a graph where molecules are annotated with the corresponding score. Depending on your analysis, the program can produce very large graphs that can exceed the capacity of your viewing software or your RAM, so check the generic statistics before loading these graphs.
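Besides the generic statistics, the size of a GraphML file can also be checked directly before opening it in a viewer. The sketch below uses only the Python standard library and relies on the generic GraphML element names node and edge; the function name count_graphml is hypothetical, and nothing is assumed about the attributes infgraph writes.

```python
import gzip
import xml.etree.ElementTree as ET


def count_graphml(path):
    """Count <node> and <edge> elements of a GraphML file (.gz or plain)."""
    opener = gzip.open if path.endswith(".gz") else open
    nodes = edges = 0
    with opener(path, "rb") as handle:
        # iterparse reads the file incrementally instead of loading it at once.
        for _, element in ET.iterparse(handle, events=("end",)):
            # Strip an XML namespace prefix such as "{http://...}node".
            tag = element.tag.rsplit("}", 1)[-1]
            if tag == "node":
                nodes += 1
            elif tag == "edge":
                edges += 1
            # Drop the text, attributes and children of finished elements
            # to limit memory use; the emptied elements stay in the tree.
            element.clear()
    return nodes, edges
```

Calling count_graphml on one of the xxx-score-score_zzz.graphml.gz files returns a (nodes, edges) pair, which can be compared with what your viewing software handles comfortably.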