Reduce thermodynamic infeasible samples

Identify and remove loopy reactions

The loops_enumeration_from_fva function computes 2 times a Flux Variability Analysis (with loopless parameter to False and True) and identifies possible loopy reaction in given model and returns them in a list. Each item in the list is a tuple of a string (reaction ID) and a float (loopy score returned from the function)

ec_cobra_model is a cobra model object
fraction_of_optimum is a float variable that defines the fraction_of_optimum parameter used in flux_variability_analysis

loopy_reactions_fva = loops_enumeration_from_fva(
    ec_cobra_model = ec_cobra_model, 
    fraction_of_optimum = 0.999)

print(loopy_reactions_fva)

[('SUCDi', 994.7794007141794), ('FRD7', 995.0539767141795)]

The loopy_reactions_turned_off_in_pfba function identifies which reactions from the loopy reactions set calculated with the loops_enumeration_from_fva function can be turned off when performing a loopless_solution applied to pFBA results

loopy_cobra_model is a cobra model object possibly containing loops (default model without any preproccess)
loopy_reactions is a list with loopy_reactions with reactions classified as loopy
tol is a tolerance value used to classify a numeric change as important or not

turned_off_reactions = loopy_reactions_turned_off_in_pfba(
    loopy_cobra_model = ec_cobra_model, 
    loopy_reactions = loopy_reactions, 
    tol = 1e-6)

print(turned_off_reactions)

['FRD7']

Now, we know which reaction to turn off in the given model to reduce the thermodynamic infeasible solutions from sampling

ec_cobra_model_frd7_removed = ec_cobra_model.copy()
ec_cobra_model_frd7_removed.reactions.get_by_id("FRD7").bounds = (0, 0)

Quantify the effect of loopy reactions in sampling

First we need to perform sampling to produce 2 sampling arrays: one from the default model and one from the default model but with the FRD7 reaction removed (model with no active loopy reactions).

samples_optgp_default = sample_optgp(
    cobra_model = ec_cobra_model, 
    n_samples = 3000,
    thinning = 100, 
    reaction_in_rows = True)

samples_optgp_frd7_removed = sample_optgp(
    cobra_model = ec_cobra_model_frd7_removed, 
    n_samples = 3000, 
    thinning = 100, 
    reaction_in_rows = True)

The get_loopless_solutions_from_samples function calculates the loopless_solution from each sample and saves results into a new numpy 2D array. Note that the new loopless sampling dataframe does not contain uniform samples

samples is a numpy 2D array of the samples
cobra_model is a cobra model object

samples_default_loopless_solutions = get_loopless_solutions_from_samples(
    samples = samples_default, 
    cobra_model = ec_cobra_model)

samples_frd7_removed_loopless_solutions = get_loopless_solutions_from_samples(
    samples = samples_frd7_removed,
    cobra_model = ec_cobra_model_frd7_removed)

We can also check if the FRD7 loopy reaction affected few or most of the samples. For this, we can calculate the difference across its samples before and after applying the get_loopless_solutions_from_samples function. If FRD7 affected few samples the histogram from these differences would have a peak near 0. If it affected most samples the peak would be far from 0 or the differences distribution would be uniform with a wide range.

change_per_reaction = abs(samples_default) - abs(samples_default_loopless_solutions)

reaction_index = ec_cobra_reaction_ids.index("FRD7")
change_FRD7 = change_per_reaction[reaction_index]

plt.hist(change_FRD7)
plt.show()

In the plot below, we can see a uniform distribution (ranging from 0 to 1000) that validates the effect of FRD7 in sampling results

frd7_sampling_differences_histogram

The calculate_affected_samples function works given the default and the samples from the get_loopless_solutions_from_samples function and identifies how many samples have significant differences. This is done by calculating (for each sample) the distance (change) of the reactions from the given arrays and normalizing by its range. Reactions that have a difference higher than a given threshold are counted and the corresponding sample can potentially be classified as a loopy sample. However, if user does not consider importance when only 1 reaction overcomes this tolerance, he can specify the least number of reactions needed for this classification (e.g 2, 5, 10 …).

initial_samples is a numpy 2D array of the default samples (before applying the get_loopless_solutions_from_samples)
loopless_samples is a numpy 2D array of the default samples after application of the get_loopless_solutions_from_samples
cobra_model is a cobra model object
tol_reaction_difference is a tolerance value to classify the difference in reactions before and after the get_loopless_solutions_from_samples function as important or not
tol_reactions_count is a tolerance value to classify a sample as loopy based on the least amount of significantly altered reactions

(affected_reactions_count_default,
total_affected_samples_default) = calculate_affected_samples(
    initial_samples = samples_default, 
    loopless_samples = samples_default_loopless_solutions, 
    cobra_model = ec_cobra_model, 
    tol_reaction_difference = 0.5, 
    tol_reactions_count = 1)

(affected_reactions_count_frd7_removed,
total_affected_samples_frd7_removed) = calculate_affected_samples(
    initial_samples = samples_frd7_removed, 
    loopless_samples = samples_frd7_removed_loopless_solutions, 
    cobra_model = ec_cobra_model, 
    tol_reaction_difference = 0.5, 
    tol_reactions_count = 1)

plt.hist(affected_reactions_count_default)
plt.show()

plt.hist(affected_reactions_count_frd7_removed)
plt.show()

affected_reactions_count_default

affected_reactions_count_frd7_removed

From the above plots, we can see that all samples from the modified model with FRD7 removed are loopless samples (all values at 0). However, samples from the non-modified model have a second peak outside 0, so loops affected them.

The calculate_distances_from_samples function is an alternative approach to quantify the thermodynamic infeasible samples. This function calculates the euclidean distance between a sampling dataset before and after applying the get_loopless_solutions_from_samples. Distances are calculated between different samples, so if user has 3000 samples, he will end up with 3000 distances.

initial_samples is a numpy 2D array of the default samples (before applying the get_loopless_solutions_from_samples)
loopless_samples is a numpy 2D array of the default samples after application of the get_loopless_solutions_from_samples
cobra_model is a cobra model object

distances_array_samples_default = calculate_distances_from_samples(
    initial_samples = samples_default, 
    loopless_samples = samples_default_loopless_solutions, 
    cobra_model = ec_cobra_model)

distances_array_samples_frd7_removed = calculate_distances_from_samples(
    initial_samples = samples_frd7_removed, 
    loopless_samples = samples_frd7_removed_loopless_solutions, 
    cobra_model = ec_cobra_model)

The violin_plot_samples_distances function creates a violin plot from a given distances array

distances_array is a numpy 1D array of either the reactions’ or samples’ euclidean distances before and after the get_loopless_solutions_from_samples
title is the title for the plot
exponentformat is a parameter used to determine the numeric scientific notation in the plot (determines a formatting rule for the tick exponents). Available options: “e”, “E”, “power”, “SI”, “B”

violin_plot_samples_distances(
    distances_array = distances_array_samples_default,
    title = "",
    exponentformat = "none")

violin_plot_samples_distances(
    distances_array = distances_array_samples_frd7_removed,
    title = "",
    exponentformat = "none")

violin_plot_distances_samples_default

violin_plot_distances_samples_frd7_removed

From the above plots, we observe that most samples from the modified model with FRD7 removed, have a 0 distance with themselves after applying the get_loopless_solutions_from_samples function. This is not the case for samples of the default (non-modified) model, which have a distances distribution shifted to significantly greater values.

The calculate_distances_from_reactions function calculates the euclidean distance between a sampling dataset before and after applying the get_loopless_solutions_from_samples. Distances are calculated between different reactions, so if user provides samples of 100 reactions, he will end up with 100 distances.

initial_samples is a numpy 2D array of the default samples (before applying the get_loopless_solutions_from_samples)
loopless_samples is a numpy 2D array of the default samples after application of the get_loopless_solutions_from_samples
cobra_model is a cobra model object

distances_array_reactions_default = calculate_distances_from_reactions(
    initial_samples = samples_default, 
    loopless_samples = samples_default_loopless_solutions, 
    cobra_model = ec_cobra_model)

distances_array_reactions_frd7_removed = calculate_distances_from_reactions(
    initial_samples = samples_frd7_removed,
    loopless_samples = samples_frd7_removed_loopless_solutions, 
    cobra_model = ec_cobra_model)

violin_plot_samples_distances(
    distances_array = distances_array_reactions_default,
    title = "",
    exponentformat = "none")

violin_plot_samples_distances(
    distances_array = distances_array_reactions_frd7_removed,
    title = "",
    exponentformat = "none")

violin_plot_distances_reactions_default

violin_plot_distances_reactions_frd7_removed

From the above plots, we observe that the samples of the default (non-modified) model, result in 2 reactions with a significantly higher distance than the rest. These are the FRD7 with SUCDi reactions pair. As expected, turning off FRD7 also reduces the distance of SUCDi.