Getting started

Load and Modify models

The function load_model is used to load a metabolic model (cobra.Model object) from a sbml format file and return it to the user (1st arg.) alongside the model’s reactions (2nd arg.) and their corresponding string ids (3rd arg.).

filepath is the path to the cobra model in sbml format

ec_cobra_model, ec_cobra_reactions, ec_cobra_reaction_ids = load_model(
    filepath = "../ext_data/models/e_coli_core.xml")

The function get_objective_functions is used to get the assigned objective function(s) from the given cobra model

cobra_model is the corresponding model in cobra format

objective_functions = get_objective_functions(
    cobra_model = ec_cobra_model)

print(objective_functions)

['BIOMASS_Ecoli_core_w_GAM']

The function get_reaction_bounds is used to create a python dictionary with reaction IDs as keys and the corresponding reaction bounds (lower/upper) as values.

cobra_model is the corresponding model in cobra format

default_reaction_bounds = get_reaction_bounds(
    cobra_model = ec_cobra_model)

print(default_reaction_bounds.get("BIOMASS_Ecoli_core_w_GAM"))

(0.0, 1000.0)

The modify_model function modifies a given cobra model. Users can change objective function, optimal percentage (lower bound of the objective will be fixed based on the optimal percentage), define custom reaction bounds and change objective direction.

cobra_model is a cobra model object
objective_function is a string with the requested objective function ID to be assigned to the model
optimal_percentage is the percentage of the FBA solution to be set as lower bound to the objectie function.
reaction_bounds is a dictionary of custom reaction bounds to be assigned to the model’s reactions
objective_direction defines the direction to optimize the model: max and min are available options.

This function enables the user to create metabolic models simulating different conditions. Here, 2 new models are created based on updating the initial model:

One that maximizes for biomass production (asking at least almost 100% of the biomass maximum FBA value)

ec_cobra_model_condition_100 = modify_model(
    cobra_model         = ec_cobra_model,
    objective_function  = "BIOMASS_Ecoli_core_w_GAM",
    optimal_percentage  = 100,
    objective_direction = "max"
)

One that maximize for biomass production (asking at least 0% of the biomass maximum FBA value)

ec_cobra_model_condition_0 = modify_model(
    cobra_model         = ec_cobra_model,
    objective_function  = "BIOMASS_Ecoli_core_w_GAM",
    optimal_percentage  = 0,
    objective_direction = "max"
)

Perform Sampling

For sampling, dingo-stats has various functions that integrate known samplers and make their outtput on a desired format for further analysis. However the user can use whichever sampler he wants, as long as final samples are concerted in a numpy 2D array.

The sample_optgp function performs sampling using the OptGp sampler. Users can change the n_samples, thinning, parameters:

cobra_model is the corresponding model in cobra format
n_samples – defines the number of samples returned to the user.
thinning – defines the thinning parameter. Default to 100 means samples are returned every 100th step.
reaction_in_rows – boolean variable that if True reactions are rows in the final sampling dataframe.

samples_optgp_condition_100 = sample_optgp(
    cobra_model = ec_cobra_model_condition_100, 
    n_samples = 3000, 
    thinning=100, 
    reaction_in_rows = True)

samples_optgp_condition_0 = sample_optgp(
    cobra_model = ec_cobra_model_condition_0, 
    n_samples = 3000, 
    thinning=100, 
    reaction_in_rows = True)

The sample_dingo performs sampling with dingo. Users can define the ess and psrf parameters.

dingo_model is the corresponding model in dingo format
ess stands for the effective sample size (ESS) (default value is 1000).
psrf is a flag to request an upper bound equal to 1.1 for the value of the potential scale reduction factor of each marginal flux (default option is False).
solver – defines the linear programming solver
final_n_samples – integer that defines the final number of samples returned to the user (post sampling step)
reaction_in_rows – boolean variable that if True reactions are rows in the sampling dataframe.

First convert the modified cobra models to dingo models:

from dingo import MetabolicNetwork

ec_dingo_model_condition_100 = MetabolicNetwork.from_cobra_model(
    ec_cobra_model_condition_100)

ec_dingo_model_condition_0 = MetabolicNetwork.from_cobra_model(
    ec_cobra_model_condition_0)

And then perform sampling on the dingo models:

samples_dingo_condition_100 = sample_dingo(
    dingo_model = ec_dingo_model_condition_100, 
    reaction_in_rows = True, 
    ess=1000, 
    psrf = False, 
    solver="gurobi", 
    final_n_samples = None)

samples_dingo_condition_0 =   sample_dingo(
    dingo_model = ec_dingo_model_condition_0, 
    reaction_in_rows = True, 
    ess=1000, 
    psrf = False, 
    solver="gurobi", 
    final_n_samples = None)

Inspect Sampling distributions

The plot_grid_reactions_flux function plots a grid of selected flux distributions and enables inspections of sampling results to get early insights. Distributions here are chosen based on a list of IDs derived from KEGG pathway information. The corresponding KEGG pathways tutorial is presented in the next section. From the grid we can detect: Left/Right Shifted, Normal, Fixed (transparent based on the tolerance parameter) or other uncommon distributions.

samples is a numpy 2D array of the samples
model_reactions is a list containing strings of reaction IDs
tolerance is a tolerance level to make distribution plot transparent based on flux range
nrows is the number of rows for the plot
ncols is the number of columns for the plot
title is the title of the plot

In the following chunk the grid from Glycolysis and Pentose-Phosphate Pathway (PPP) pathways is shown:

plot_grid_reactions_flux(
    samples=samples_dingo_condition_100,
    model_reactions=ec_cobra_reaction_ids, 
    nrows=5, 
    ncols=5,
    tolerance=1e-3,
    title="Sampling Distributions")

grid_flux_distributions

The sampling_statistics function calculate statistics on flux distributions for a reaction of interest. This information can be used to identify significantly altered flux distributions between different conditions

samples is a numpy 2D array of the samples
model_reactions is a list containing strings of reaction IDs
reaction_id is a reaction ID of the selected reaction to calculate statistics on

mean, min, max, std, skewness, kurtosis = sampling_statistics(
    samples = samples_optgp_condition_100, 
    model_reactions=ec_cobra_reaction_ids,
    reaction_id="FRD7")

print(mean, min, max, std, skewness, kurtosis)

479.1369619828837 0.14995320273677437 993.9157983127856 289.3315287517938 0.09951034266820558 -1.2064534542990524

mean, min, max, std, skewness, kurtosis = sampling_statistics(
    samples = samples_optgp_condition_0,
    model_reactions=ec_cobra_reaction_ids,
    reaction_id="FRD7")

print(mean, min, max, std, skewness, kurtosis)

474.7669865820229 0.10135434309104707 997.5267773446074 278.80950212031655 0.07570051318545512 -1.1733598969449195