TXG-MAPr Manual

Introduction
TXG-MAPr
Module
Gene
Enrichment
Transcription factor
Experiment correlation
Module correlation
Heatmap In development
Preservation In development
Pathology In development
Upload
Authors and contact
List of abbreviations

Introduction

Toxicogenomic data from safety testing are a critical resource for uncovering the mechanisms underlying drug-induced toxicities. Co-regulated gene network approaches can organize high-dimensional toxicogenomic data without being biased by prior gene annotations, and can help identify novel mechanisms and regulators of toxicity. We applied weighted gene co-expression network analysis (WGCNA) to the publicly available TG-GATEs datasets for primary human hepatocytes (PHH). The data were processed with an unsigned WGCNA that clustered the genes into functional modules, which form a bridge between individual gene variations and emergent global properties. The modules serve as a dynamic visualization of the transcriptome under experimental conditions (compounds, concentrations and timepoints). Using the R Shiny package, we developed a user-friendly tool, called the TXG-MAPr, for visualizing the toxicogenomic network and analyzing mechanisms of toxicity.

TXG-MAPr

The ‘TXG-MAPr’ tab helps users find modules with high eigengene scores (EGs) for a specific compound and condition: the user selects one of the tested compounds, together with a timepoint and concentration level, in the model system.

TXG-MAPr module dendrogram

The EGs x treatment data matrix is organized into a folded hierarchical tree, or dendrogram, based on Ward’s hierarchical clustering of pair-wise Pearson correlations for each module across all treatment conditions. The module dendrogram displays the module EGs with a color and size scale: high absolute EGs correspond to larger circles, while small absolute EGs correspond to smaller circles. EGs are depicted on a scale ranging from deep red (high positive EGs) to deep blue (high negative EGs). The user can filter the displayed modules by selecting an EG cut-off value with the slider on the left. When a (rectangular) area of the dendrogram is selected, a table with the selected modules is shown underneath the dendrogram (second table from above, on the right side). Double clicking on the rectangle selection zooms in on that area. To view the entire dendrogram again, double click on any part of the tree after the selection.
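As a rough illustration of the clustering described above, the sketch below builds a module tree from a small made-up EG matrix. The matrix values, the module names and the use of 1 − r as the dissimilarity passed to Ward linkage are assumptions made for illustration; the manual does not state how the tool converts correlations into distances.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)

# Hypothetical EG matrix: rows = modules, columns = treatment conditions
eg = rng.normal(size=(6, 20))
module_names = [f"module_{i}" for i in range(1, 7)]

# Pair-wise Pearson correlation between modules across all conditions
corr = np.corrcoef(eg)

# Assumed dissimilarity: 1 - r (0 for identical profiles, 2 for opposite ones)
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)
condensed = squareform(dist, checks=False)

# Ward linkage on the condensed dissimilarities
tree = linkage(condensed, method="ward")

# Leaf order of the resulting dendrogram, computed without plotting
leaf_order = dendrogram(tree, no_plot=True, labels=module_names)["ivl"]
print(leaf_order)
```

Modules with similar EG profiles across treatments end up close together in the leaf order, which is what allows neighbouring branches of the tree to be read as related responses.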

Example module dendrogram

Figure 1 shows an example of a module dendrogram generated in the ‘TXG-MAPr’ tab (A), an example of how to select modules from the module dendrogram (B), and an example of the generated table (C):

**Figure 1**

Module eigengene score table

Upon selection of a compound and a specific condition (time and concentration level), a table is generated (upper/first table) showing the modules ranked by their absolute module eigengene score (eg_score) for the chosen condition, highest first. The number of genes in each module (module size) and the annotation (functional information for the corresponding module) are also shown in the table. The number of entries in the table can be adjusted, and the user can search for a specific module using the search bar. When a module is selected (by clicking on its row), the position of the module in the dendrogram is highlighted and additional tables appear underneath, which are discussed in the next paragraphs.

Module gene expression table (log₂FC)

When a row in the module EGs table is selected, the gene expression data of the selected module genes will be shown in the second table (from above, on the left side). More details about the gene expression for all treatment conditions can be found in the “Genes” tab.

Module enrichment table

When a row in the module EGs table is selected, the terms enriched in the gene set (the module) are shown in the module enrichment table. More details about the module gene enrichment can be found in the “Enrichment” tab.

Enrichment

The ‘Enrichment’ tab allows the user to search for specific stress or toxicity response-associated terms and their associations with modules from the module dendrogram.

Modules were annotated by performing over-representation analysis (ORA) via ConsensusPathDB (CPDB version 34; Kamburov et al., 2013). We included enriched terms with a hypergeometric test p-value < 0.01 from the following databases: BioCarta, EHMN, HumanCyc, INOH, KEGG, NetPath, Reactome, SignaLink, SMPDB, WikiPathways, UniProt, InterPro and GO. Only term-module combinations with a p-value < 0.001 are shown.

Enrichment term table

The user can type any text into the text search bar (search GO term) to look for terms containing that text. After the search, a table is generated with the terms whose name contains the searched text, together with their source (database). The search text is matched literally but is not case sensitive, and every term that contains the search text anywhere in its name is returned. For example, a search for ‘endoplasmic reticulum’ returns all terms that contain this phrase (see Figure 2). A search for ‘UPR’, which stands for Unfolded Protein Response, also returns every term whose name contains the letters ‘upr’, so an abbreviation may be too broad and using the full term is advised instead (see Figure 3).
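The search behaviour described above can be pictured as a case-insensitive substring match. The sketch below is an assumption about how such matching could work, not the tool's actual implementation, and the term names are hypothetical examples.

```python
def search_terms(terms, query):
    """Return the terms whose name contains the query text, ignoring case."""
    needle = query.lower()
    return [term for term in terms if needle in term.lower()]


# Hypothetical term names, for illustration only
terms = [
    "unfolded protein response",
    "endoplasmic reticulum lumen",
    "Response to endoplasmic reticulum stress",
    "supramolecular fiber organization",
    "Unfolded Protein Response (UPR)",
]

upr_hits = search_terms(terms, "UPR")
full_hits = search_terms(terms, "unfolded protein response")

print(upr_hits)   # includes "supramolecular fiber organization" (contains "upr")
print(full_hits)  # only the two unfolded protein response terms
```

The abbreviation both misses a term that spells out the full name and picks up an unrelated term that happens to contain the same letters, which is why the full term is the safer search text.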

TXG-MAP module enrichment dendrogram

When a specific term is selected by clicking on a row in the table, the module dendrogram is populated with circles corresponding to the modules enriched for that term. The bigger and darker the circle, the lower the p-value. A second table also appears and shows the same information as the module dendrogram: the associated modules with their p-value and adjusted p-value.
Finally, the user can choose a specific log₁₀ p-adj threshold in order to show only a selection of the module-to-term associations.
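A minimal sketch of how adjusted p-values and such a threshold could interact is given below. It assumes that the Benjamini-Hochberg correction named in the list of abbreviations is applied across the modules tested for one term, and that the slider works on the −log₁₀(p-adj) scale; both points, the module names and the p-values are illustrative assumptions.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# Hypothetical p-values of one term across four modules
modules = ["module_a", "module_b", "module_c", "module_d"]
pvals = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(pvals)

# Assumed threshold on the -log10(p-adj) scale
threshold = 1.5
shown = [m for m, q in zip(modules, padj) if -np.log10(q) >= threshold]
print(dict(zip(modules, padj.round(3))), shown)
```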

Enriched modules table and plot

The (second) table describes the enrichment terms and shows the enriched modules, as mentioned previously. When one of the enriched modules is selected, the module EGs plot for that module is shown (on the right). In addition, a third table appears underneath that contains all information regarding the selected module. Selecting specific conditions in the graph (rectangle selection) shows only the EGs of the selected conditions.

Example of a text search output

Below, two example figures of a text search output are shown. The first shows the output table generated for the search text ‘endoplasmic reticulum’ (Figure 2). The second shows the output for the search texts ‘UPR’ and ‘unfolded protein response’ and the differences in the resulting terms (Figure 3). When a term is selected in the table, the modules enriched for that term are shown in the TXG-MAPr dendrogram; selecting one of these modules also shows the enriched module plot (Figure 4):

**Figure 2**

**Figure 3**

**Figure 4**

Module

The ‘Module’ tab allows the user to look at a specific module and learn, for example, which are the top activating compounds and which enrichment terms and TFs are associated with it.

Module circle plot

The user selects a module and can optionally add further modules. The gene structure of the chosen module is then shown schematically in a circle plot (see Figure 5). The plot shows all genes clustered in the module; the thickness of the edges between the nodes (genes) is proportional to the strength of the correlation between the genes across all conditions.

Module dose- and time-response plot

The dose- and time-response plots show the EGs of the selected modules for all treatment conditions. To limit the number of treatment conditions displayed, the user can select a subset of compounds in the menu on the left side. The graphs are interactive: individual data points can be selected, and the corresponding data are displayed as text in the table underneath (left side). When a specific condition is selected from this table, the nodes in the module gene plot are colored and sized according to their log₂FC: the higher the log₂FC of a gene, the bigger and the more red its node. An example is shown in Figure 5.

Module enrichment table

Below the module figure, the enrichment table is shown for the chosen module(s). Terms associated with the genes in the module are displayed, along with their source and number of overlapping genes, as was mentioned in the previous paragraph.

Transcription factor enrichment table

A transcription factor enrichment table for the module is shown at the bottom of the tab. This table lists the transcription factors associated with the chosen module, their p-values, and the module genes with which they are most strongly associated. More details about the TF enrichment can be found in the “Transcription factor” section of this help page.

Example of physical interaction module network

Below, two examples of the same module network are shown. Figure 5A shows a module network with its included genes (hPHH:62). Figure 5B shows the same module network after selecting a specific condition (tunicamycin, 0.4 µg/mL, 8 hr).

**Figure 5**

Gene

The ‘Genes’ tab allows the user to focus on specific genes, check which modules they belong to and inspect the effects of compounds on them.

The user can select a specific gene and can optionally add further genes to visualize. Either all compounds or one specific compound can be selected. After these choices, a table is generated showing the log₂FC and p-value of the gene for every compound and condition.

Dose- and time-response plots

Interactive graphs on the left side visualize the up- or downregulation of the selected gene, based on the log₂FC at the tested concentration levels and timepoints. The data used to generate these graphs can be found in the table below.

Module circle plot

The module network (on the right side) shows all genes in the module, together with the selected gene(s). When a specific condition is selected in the compound-gene interaction table, the plot is colored accordingly. A darker (red) color corresponds to a more positive log₂FC and a lighter (green) color corresponds to a more negative log₂FC, while the size of the circle corresponds to the EG scores.

Example of compound-gene interaction table

Figure 6 shows an example of a table that can be generated in the ‘Genes’ tab. For this example, the gene ATF6 was selected.

**Figure 6**

Transcription factor

The ‘TF’ tab allows the user to learn about specific transcription factors (TFs): which modules are enriched for targets of a selected TF, and what the activity of a selected TF is, based on the transcriptional variation of its targets.

The user can select a transcription factor and a specific compound. TF scores, or activities, were estimated as normalized enrichment scores using the function viper from the viper package (Alvarez et al., 2016) with two confidence sets of TF regulons from DoRothEA: “high confidence” = A,B,C and “high coverage” = A,B,C,D. The regulon set can be selected from the drop-down menu on the left side.

TF activity dose- and time-response plots

Dose- and time-response plots are shown for the selected TF across all available conditions (treatments, concentration levels and timepoints). These graphs are interactive, and specific data points can be selected for additional textual details.

TF-activity table

The TF-score for the selected transcription factor is shown for the selected compound(s) at all available dose-levels and time points.

TF hypergeometric enrichment table

Below the plots, a table shows the modules that are statistically associated with the chosen transcription factor. A hypergeometric test was performed on the gene members of each WGCNA module to identify its regulatory TFs, using the function phyper in the stats package in R. The gene sets of TFs and their regulated genes (regulons) are derived from DoRothEA (Garcia-Alonso et al., 2018) with two sets of confidence levels: “high confidence” = A,B,C and “high coverage” = A,B,C,D. Enriched TFs with a p-value below 0.01 were included. The number and identity of overlapping genes are provided for each TF-module pair.
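To make the hypergeometric test concrete, the sketch below computes an upper-tail p-value for the overlap between one module and one regulon. It uses SciPy rather than R, the gene names are hypothetical, and the choice of background gene set is an assumption, since the manual does not state which gene universe was used.

```python
from scipy.stats import hypergeom


def enrichment_pvalue(module_genes, regulon_genes, background_genes):
    """Upper-tail hypergeometric p-value for the module/regulon overlap."""
    background = set(background_genes)
    module = set(module_genes) & background
    regulon = set(regulon_genes) & background
    overlap = module & regulon
    # P(X >= k); corresponds to
    # phyper(k - 1, K, N - K, n, lower.tail = FALSE) in R, with
    # N = background size, K = regulon size, n = module size, k = overlap size
    p = hypergeom.sf(len(overlap) - 1, len(background), len(regulon), len(module))
    return p, sorted(overlap)


# Hypothetical gene identifiers, for illustration only
background = [f"g{i}" for i in range(1, 21)]
module = ["g1", "g2", "g3", "g4", "g5"]
regulon = ["g1", "g2", "g3", "g4", "g10"]

p_value, shared = enrichment_pvalue(module, regulon, background)
print(p_value, shared)
```

Returning the shared genes alongside the p-value mirrors the table described above, which reports both the number and the identity of the overlapping genes.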

Example of TF-score based graphs

Figure 7 shows an example of graphs that can be generated in the ‘TF’ tab. The graphs are based on the TF-score and cover all available dose and time conditions. For this example, the transcription factor ATF4 was chosen and its estimated activity is shown for all compounds.

**Figure 7**

Experiment correlation

The ‘Experiment correlation’ tab allows the user to compare different compounds based on the module EGs and to highlight similar or dissimilar mechanisms.

To investigate the correlation between compounds, the user can select two compounds, each at a specific concentration level and exposure time, in the menu on the left side.

Experiment correlation tables and frequency plots

For each of the two selected treatment conditions, a table is generated showing the Pearson correlation coefficients between the module EGs of the chosen condition and those of all other conditions in the database. The table is sorted by Pearson correlation coefficient in decreasing order. A frequency plot shows the Pearson correlation on the x-axis and the frequency of that value on the y-axis. The red vertical line indicates the Pearson correlation between the two selected conditions.
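The ranking in this table can be sketched as follows. The EG matrix is randomly generated, and the condition names are hypothetical; only the use of Pearson correlation and the decreasing sort come from the description above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical conditions and EG matrix (rows = modules, columns = conditions)
conditions = ["cmpdA_24_high", "cmpdB_24_high", "cmpdC_8_low", "cmpdD_24_mid"]
eg = rng.normal(size=(50, len(conditions)))
# Make the second condition resemble the first one
eg[:, 1] = eg[:, 0] + 0.1 * rng.normal(size=50)

# Pearson correlation between conditions, computed over all module EGs
corr = np.corrcoef(eg, rowvar=False)


def rank_conditions(query):
    """Return all other conditions sorted by decreasing Pearson correlation."""
    i = conditions.index(query)
    others = [(conditions[j], corr[i, j]) for j in range(len(conditions)) if j != i]
    return sorted(others, key=lambda pair: pair[1], reverse=True)


ranking = rank_conditions("cmpdA_24_high")
for name, r in ranking:
    print(f"{name}\t{r:.3f}")
```

A histogram of the correlation values in such a ranking gives the frequency plot, with the value for the second selected condition marked as the vertical line.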

Experiment correlation plot

Directly below the experiment correlation table, the experiment correlation graph visualizes the module EGs of the two chosen conditions. This graph illustrates how well the data points of both experiments agree with each other. The straight blue line represents a linear fit of the data points, and the grey shaded area represents the confidence interval. The graph is interactive: the user can select specific data points, which are then shown in a table on the right side. This table lists the EGs of the selected modules for the two treatment conditions; selecting one of its rows shows a more detailed overview of the module and its correlating genes in the lower part of the tab.

Example of experiment correlation graph

Figure 8 shows an example of an experiment correlation graph and the resulting table. The graph plots the two chosen compounds against each other to illustrate the similarity between the two data sets. For this example, the compounds acetaminophen and colchicine were selected, each at a 24-hour exposure time and a high dose. The table shows the data after manual selection of six individual data points in the graph.

**Figure 8**

Module correlation

The ‘Module Correlation’ tab allows the user to compare two different modules to find potential similarities in behaviour and function.

Module correlation tables and frequency plots

The user can choose two different modules to be displayed. For each chosen module, a table is generated containing both the Pearson and the Spearman correlation coefficients between the chosen module and every other module in the network. The table is sorted by Pearson correlation coefficient in decreasing order. The frequency graph shows the Pearson correlations on the x-axis and the accompanying frequencies on the y-axis. These graphs also contain a red vertical line that indicates the Pearson correlation between the two selected modules.
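The table reports two coefficients because they answer slightly different questions: Pearson measures linear agreement, whereas Spearman measures agreement in rank order. The sketch below uses two made-up module EG profiles with a monotonic but non-linear relationship to show how the two can differ; the profiles are not taken from the tool.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical EG profiles of two modules across ten treatment conditions
module_x = np.arange(1.0, 11.0)
module_y = module_x ** 3  # increases with module_x, but not linearly

pearson_r, _ = pearsonr(module_x, module_y)
spearman_rho, _ = spearmanr(module_x, module_y)

print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```

Here the rank order of the conditions is identical for both modules, so the Spearman coefficient is maximal while the Pearson coefficient stays below 1.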

Module correlation plot

Below the module correlation table, a scatter plot displaying the EGs of the two chosen modules is shown (left side). This graph illustrates how well the two modules agree with each other. The x-axis represents the first selected module and the y-axis represents the second selected module, with each dot representing one treatment condition. The straight blue line represents a linear fit of the data points, and the grey shaded area represents the confidence interval. This graph is also interactive: a selection can be made within the graph to pick specific data points. The table on the right side then shows more detailed information on the selected treatment conditions.

Example of module correlation graph

Figure 9 shows an example of the module correlation plot and the corresponding table. The graph plots the two chosen modules against each other to illustrate potential similarity between them. For this example, modules 13 and 109 were chosen at the conditions of 24 hr exposure time with the highest treatment dose (7500 µg/mL). The table shows the data after manual selection of two individual data points in the graph.

**Figure 9**

Heatmap

The ‘Heatmap’ tab is in development.

Preservation

The ‘Preservation’ tab is in development.

Pathology

The ‘Pathology’ tab is in development.

Upload

Background

In the ‘Upload’ tab it is possible to calculate new EGs for the modules from a new set of data uploaded by the user. To calculate new EGs for each module, the log₂FC data are processed as follows. For each gene, a modified Z-score of the log₂FC is calculated by dividing the log₂FC value by the standard deviation of that gene’s log₂FC values across all TG-GATEs data.

The Z-score of the log₂FC is not corrected for the mean log₂FC: the mean is not expected to be equal to zero in the TG-GATEs dataset, so subtracting it would introduce a bias in the externally uploaded data. The Z-score is further weighted by the gene correlation eigengene score (corEG), calculated as the correlation of each gene’s log₂FC profile with the module EGs.

The new EG of each module is calculated by summing the Z-scored (corEG-weighted) log₂FC values of the genes included in that module, and normalizing this raw score by the standard deviation of all raw scores for that module in the TG-GATEs dataset.

When data for certain genes in a module are not available in the uploaded dataset, the Z-score for those genes is assumed to be zero, which may lead to an underestimation of the final module EG. See Sutherland et al., 2016 for more details about the calculation of new EGs.
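The calculation described in this section can be sketched in a few lines. This is an illustration of the steps as worded here, not the tool's code: the function name, the gene names and all reference values (per-gene standard deviations, corEG weights and the standard deviation of the raw scores) are hypothetical, and the exact weighting used by the tool should be checked against Sutherland et al., 2016.

```python
def module_eg(uploaded_log2fc, gene_sd, cor_eg, raw_score_sd):
    """
    Sketch of a new module EG for one uploaded experiment.

    uploaded_log2fc: gene -> log2FC in the uploaded data (genes may be missing)
    gene_sd:         gene -> SD of that gene's log2FC across TG-GATEs (module genes)
    cor_eg:          gene -> correlation of the gene's log2FC profile with the module EG
    raw_score_sd:    SD of the raw module score across TG-GATEs
    """
    raw_score = 0.0
    for gene, sd in gene_sd.items():
        if gene not in uploaded_log2fc:
            continue  # missing gene: its Z-score is assumed to be zero
        z = uploaded_log2fc[gene] / sd  # modified Z-score, not mean-centred
        raw_score += cor_eg[gene] * z   # weight by corEG
    return raw_score / raw_score_sd


# Hypothetical reference values for a three-gene module
gene_sd = {"G1": 0.5, "G2": 2.0, "G3": 1.0}
cor_eg = {"G1": 0.8, "G2": 0.5, "G3": -0.4}
raw_score_sd = 2.2

# Uploaded experiment in which G3 was not measured
uploaded = {"G1": 1.0, "G2": -2.0}

eg = module_eg(uploaded, gene_sd, cor_eg, raw_score_sd)
print(round(eg, 3))
```

In this example G3 contributes nothing to the sum, which illustrates how missing genes can pull the module EG towards zero.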

Uploading new data

New data can be uploaded as a single .txt file or as multiple .txt files.

The user is REQUIRED to upload a .txt file with the following structure:

experiment	gene_id	log2fc	pvalue	padj
experiment	1	1	0.3	0.4
experiment	23	2	0.2	0.3
experiment	2	3	0.02	0.03
experiment	50	4	0.001	0.002

In addition, we encourage the user to also include time and conc to increase integration of uploaded data with the tool. Note: for time use the same time unit as the tool for optimal data integration.

experiment	gene_id	time	conc	log2fc	pvalue	padj
experiment	1	8	10	1	0.3	0.4
experiment	23	8	20	2	0.2	0.3
experiment	2	24	30	3	0.02	0.03
experiment	50	24	10	4	0.001	0.002

Rules for uploading

The experiment column needs to contain a character string that consists only of alphanumeric characters (a-z A-Z 0-9). Special symbols (/ : ; . and -) are not allowed. For example, a treatment condition of tunicamycin at 10 μM for 24 h results in the experiment ID tunicamycin_24_10, in which underscores separate the compound, time and concentration.

The gene_id column needs to contain gene symbols, Entrez identifiers or Ensembl identifiers.

If time and conc are unknown, do not include these columns.

Column names are case sensitive (please use exactly the same header/column names).

Upload a .txt file, tab-separated.

The uploaded conditions can be explored in the TXG-MAPr when processing of the upload file is completed.

To delete your uploaded data and abandon the session, press the Reset button. Data will not be stored or saved.

For every experiment uploaded, the user should include one row per measured gene. Genes are identified in the gene_id column. As gene identifier, the user can upload the Entrez ID (http://www.ncbi.nlm.nih.gov/gene) or the HGNC gene name (the first option is suggested). Different experiments may have different genes measured/included.

For every gene, the log₂ of the fold change is included in the column log2fc, together with the p-value (pvalue) and the p-value adjusted for multiple comparisons (padj). Only the log₂FC is used for the calculation of the EGs. When the other values are missing, the columns can be filled with NA or 1.
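Before uploading, a file can be checked against the rules above with a short script. The sketch below is a hypothetical pre-check, not part of the tool: the function name is made up, and it assumes that underscores are accepted in experiment IDs, as in the tunicamycin_24_10 example.

```python
import csv
import io
import re

REQUIRED_COLUMNS = ["experiment", "gene_id", "log2fc", "pvalue", "padj"]
# Assumption: letters, digits and underscores are accepted in experiment IDs
EXPERIMENT_ID = re.compile(r"^[A-Za-z0-9_]+$")


def validate_upload(text):
    """Return a list of problems found in tab-separated upload text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    # Column names are case sensitive, so compare them exactly
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems
    for line_no, row in enumerate(reader, start=2):
        if not EXPERIMENT_ID.match(row["experiment"]):
            problems.append(f"line {line_no}: invalid experiment ID {row['experiment']!r}")
        try:
            float(row["log2fc"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: log2fc is not a number")
    return problems


good = (
    "experiment\tgene_id\ttime\tconc\tlog2fc\tpvalue\tpadj\n"
    "tunicamycin_24_10\t1\t24\t10\t1.0\t0.3\t0.4\n"
)
bad = (
    "experiment\tgene_id\tlog2fc\tpvalue\tpadj\n"
    "tunicamycin-24-10\t23\t2.0\t0.2\t0.3\n"
)

print(validate_upload(good))
print(validate_upload(bad))
```

Only log2fc is checked for being numeric, because the p-value columns may legitimately contain NA.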

Authors and contact

Callegaro Giulia¹,# (g.callegaro@lacdr.leidenuniv.nl), Kunnen Steven J.¹,# (s.j.kunnen@lacdr.leidenuniv.nl), Mollon Jennifer², Trairatphisan Panuwat², den Hollander Wouter¹, Grosdidier Solène³, Guney Emre⁴, Piñero Gonzalez Janet⁴, Webster Yue⁵, Sutherland Jeffrey J.⁶, Stevens James L.¹,⁵, van de Water Bob¹

(#) Corresponding authors, Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands

¹ Leiden University, Leiden, The Netherlands

² AbbVie Deutschland GmbH & Co KG, Ludwigshafen, Germany

³ Erasmus MC, Rotterdam, The Netherlands

⁴ Hospital del Mar Research Institute (IMIM), Pompeu Fabra University (UPF), Barcelona, Spain

⁵ Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN, United States of America

⁶ Novartis Institutes for BioMedical Research (NIBR), Lincoln, Massachusetts, United States

The authors want to acknowledge Nadine Bergmann for her contribution to the help page.

This work was supported by the Cosmetics Europe/CEFIC liver ontology project and by the EU-EFPIA Innovative Medicines Initiative 2 (IMI2) Joint Undertaking as part of the TransQST project (grant number 116030), the Horizon 2020 EU-ToxRisk project (grant number 681002) and the eTRANSAFE project (grant number 777365). This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation program and EFPIA.

List of abbreviations

EG: eigengene score, a value that describes the level of (de)activation of a module in a particular condition and is equal to the first principal component of gene expression matrix of that module (see Horvath & Langfelder WGCNA tutorials 2011 for more information). EGs for uploaded data are calculated as described in the upload section.
TF-score: transcription factor score, this score quantifies the TF activity based on the expression level of its associated genes (see https://saezlab.github.io/dorothea/#dorothea for more information)
PearsonR: Pearson correlation, primary method to calculate module and compound correlation
log₂FC: log₂ fold change, measure used to illustrate the gene expression changes
FDR: false discovery rate, used in statistical tests to indicate the rate of type I errors. We here applied the Benjamini-Hochberg multiple testing correction to adjust the p-value.
