Sotiris Touliopoulos

Logo

Contributor to GSoC 2024 with the GeomScale organization

View the Project on GitHub SotirisTouliopoulos/dingo

Pre- and post-sampling features to leverage flux sampling at both the strain and the community level

A contribution to the Google Summer of Code 2024 program

Overview

This document summarizes the methods implemented and integrated into the dingo package:

Preprocess

The concept of reduction in metabolic models is illustrated in the figure below created with BioRender [1]:

Network Reduction Concept

I implemented a PreProcess class that identifies and removes 3 types of reactions:

Example of using the reduce function of the PreProcess class on the E. coli core model:

cobra_model = load_json_model("ext_data/e_coli_core.json")

obj = PreProcess(cobra_model, tol = 1e-6, open_exchanges = False)

removed_reactions, reduced_dingo_model = obj.reduce(extend = False)

Explanation of the parameters and returned objects:

Users can choose to remove an additional set of reactions, by setting the extend parameter to True. These reactions do not affect the value of the objective function, when removed.

Reduction with the PreProcess class has been tested with various models [2], some of which are included in dingo’s [3] publication article too. A figure below shows the number of remained reactions, after calling the reduce function with extend set both to False and True. Note that when extend is set to True, the reduction is based on sampling, so slight changes in the number of remaining reactions may occur. It is recommended to use extend set to False for large models due to higher computational time.

Reduction_Results_Plot Reduction_Results_Plot2

Correlated Reactions

I implemented a correlated_reactions function that calculates reactions steady states using dingo’s PolytopeSampler class and creates a correlation matrix based on the pearson correlation coefficient between pairwise reactions. This function also calculates a copula indicator to filter correlations greater than the pearson cutoff.

Example of using the correlated_reactions function on the E. coli core model:

dingo_model = MetabolicNetwork.from_json("ext_data/e_coli_core.json")

sampler = PolytopeSampler(dingo_model)
steady_states = sampler.generate_steady_states()

corr_matrix, indicator_dict = correlated_reactions(
                              steady_states,
                              pearson_cutoff = 0.99,
                              indicator_cutoff = 2,
                              cells = 10,
                              cop_coeff = 0.3,
                              lower_triangle = False)

Explanation of the parameters and returned objects:

I also implemented a plot_corr_matrix function to visualize the correlation matrix as a heatmap plot. In reduced models, there is an option to plot only the remaining reactions’ names if they are provided in a list.

Example of using the plot_corr_matrix function:

plot_corr_matrix(corr_matrix,
                 reactions,
                 format = "svg")

Explanation of the parameters:

We will examine the capabilities of the correlated_reactions function using heatmap plots from the E. coli core model.

Heatmap from a symmetrical correlation matrix without pearson and indicator filtering: corr_matrix_no_cutoffs

Heatmap from a symmetrical correlation matrix with pearson_cutoff = 0.7 and without indicator filtering: corr_matrix_pearson

Heatmap from a symmetrical correlation matrix with pearson_cutoff = 0.7 and with indicator_cutoff = 100: corr_matrix_pearson_indicator

Heatmap from a triangular correlation matrix with pearson_cutoff = 0.7 and with indicator_filtering = 100: corr_matrix_pearson_indicator_triangle

Now we will examine heatmap plots from reduced E. coli core models.

Heatmap from a symmetrical correlation matrix with pearson_cutoff = 0.7 and with indicator_filtering = 100, from a reduced E. coli core model with extend set to False: corr_matrix_extend_false

Heatmap from a symmetrical correlation matrix with pearson_cutoff = 0.7 and with indicator_filtering = 100, from a reduced E. coli core model with extend set to True: corr_matrix_extend_true.png

We observe that the additional reaction removed with extend set to True is FRD7, the least correlated across the matrix. FRD7 is a fumarate reductase, appearing in a loop with SUCDi in the E. coli core model, as seen in the following figure obtained from ESCHER [4]: escher_frd7.png

Clustering

I implemented a cluster_corr_reactions function that takes a correlation matrix as input and calculates a dissimilarity matrix. The dissimilarity matrix can be calculated in two ways: by subtracting each value from 1 or by subtracting each absolute value from 1.

Example of using the cluster_corr_reactions function:

dissimilarity_matrix, labels, clusters = cluster_corr_reactions(
                                         corr_matrix,
                                         reactions = reactions,
                                         linkage = "ward",
                                         t = 10.0,
                                         correction = True)

Explanation of the parameters and returned objects:

I also implemented a plot_dendrogram function to plot a dendrogram from a dissimilarity matrix created by the cluster_corr_reactions function.

Example of calling the plot_dendrogram function:

plot_dendrogram(dissimilarity_matrix,
                reactions,
                plot_labels = True,
                t = 10.0,
                linkage = "ward")

Explanation of the parameters and returned objects:

All dendrograms and graphs in this document are created from correlation matrices of the E. coli core model.

Dendrogram created from the plot_dendrogram function from a correlation matrix without pearson filtering and with correction = True: dendrogram_no_cutoffs.png

Dendrogram from a correlation matrix with pearson_cutoff = 0.9999 and with correction = True: dendrogram_pearson.png

We observe distinct clusters. Graphs will reveal if these clusters interact with other clusters or reactions.

Graphs

I implemented a graph_corr_matrix function that takes a correlation matrix as input and creates network graphs. This function also splits the initial graph into subgraphs.

Example of using the graph_corr_matrix function:

graphs, layouts = graph_corr_matrix(corr_matrix,
                                    reactions,
                                    correction = True,
                                    clusters = clusters)

Explanation of the parameters and returned objects:

I also implemented a plot_graph function to plot graphs given a graph object and its corresponding layout.

Example of using the plot_graph function:

plot_graph(graph, layout)

Explanation of the parameters:

Example of calling the plot_graph function recursively for every subgraph returned:

for i in range(len(graphs)):
    graph = graphs[i]
    layout = layouts[i]
    plot_graph(graph, layout)

Graph created from the plot_graph function from a correlation matrix without pearson filtering and with correction = True: graph_no_cutoffs.png

Graph created from a correlation matrix with pearson_cutoff = 0.9999 and with correction = True: graph_pearson.png

A subgraph created from a correlation matrix with pearson_cutoff = 0.9999 and with correction = True: subgraph_pearson.png

This subgraph has 9 nodes that correspond to 9 reactions close to each other in the topology of the E. coli core model. These reactions are: PGI, G6PDH2R, PGL, GND, RPE, RPI, TKT1, TALA, TKT2. Their topology can be seen in the figure below from ESCHER: escher_subgraph.png

We can see that PGI seems to contribute to a different pathway. However it shares a common metabolite with G6PDH2r. If we apply a stricter pearson cutoff (e.g., 0.99999), this reaction is removed from this subgraph, leaving only the remaining reactions. This is an important observation: looser cutoffs lead to wider sets of connected reactions, forming larger metabolic pathways.

Conclusion

References