KEGG pathways information
Map model reaction IDs to KEGG terms
The map_model_to_kegg_reactions_dictionary function creates a dictionary that maps BiGG/SEED IDs (keys) to KEGG terms (values) using only the information stored in the given model, without querying online databases.
cobra_model is a cobra model object
initial_bigg_to_kegg_dictionary = map_model_to_kegg_reactions_dictionary(
cobra_model = ec_cobra_model)
print(initial_bigg_to_kegg_dictionary.get("PFL"))
'R00212'
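A model-only mapping of this kind could, for instance, be built by reading the annotation each reaction already carries. The sketch below is an assumption about the approach, not the package's implementation: `ToyReaction` is a hypothetical stand-in for a cobra reaction, `map_annotations_to_kegg` is an illustrative name, and the `kegg.reaction` key follows the annotation convention commonly found in BiGG-derived models.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReaction:
    """Hypothetical stand-in for a cobra reaction (only `id` and `annotation`)."""
    id: str
    annotation: dict = field(default_factory=dict)


def map_annotations_to_kegg(reactions, key="kegg.reaction"):
    """Return {reaction_id: KEGG ID or None} using only the model's own annotations."""
    mapping = {}
    for rxn in reactions:
        value = rxn.annotation.get(key)
        # Annotations may hold a single string or a list of IDs; keep the first one.
        if isinstance(value, (list, tuple)):
            value = value[0] if value else None
        mapping[rxn.id] = value
    return mapping


toy_model_reactions = [
    ToyReaction("PFL", {"kegg.reaction": "R00212"}),
    ToyReaction("PFK", {}),  # no KEGG annotation stored in the model
]
toy_mapping = map_annotations_to_kegg(toy_model_reactions)
print(toy_mapping)
```

Reactions without a KEGG annotation are kept with a `None` value so that they can be filled in later from another source.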
The read_json_file function reads the JSON file where the KEGG information is stored and returns its contents both as a JSON object and as a pandas dataframe. This JSON file is available here and contains information that helps map between BiGG IDs, SEED IDs, KEGG terms and KEGG pathway names.
filepath is the path where the reactions.json file is located
reactions_json, reactions_pandas = read_json_file(
filepath = "../ext_data/reactions/reactions.json")
print(reactions_pandas['aliases'][0])
print(reactions_pandas['linked_reaction'][0])
['AraCyc: INORGPYROPHOSPHAT-RXN', 'BiGG: IPP1; PPA; PPA_1; PPAm', 'BrachyCyc: INORGPYROPHOSPHAT-RXN', 'KEGG: R00004', 'MetaCyc: INORGPYROPHOSPHAT-RXN', 'Name: Diphosphate phosphohydrolase; Inorganic diphosphatase; Inorganic pyrophosphatase; Pyrophosphate phosphohydrolase; diphosphate phosphohydrolase; inorganic diphosphatase; inorganic diphosphatase (one proton translocation); inorganicdiphosphatase; pyrophosphate phosphohydrolase']
rxn27946;rxn27947;rxn27948;rxn32487;rxn38157;rxn38158
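The two printed fields use simple text conventions: each alias entry has the form `Namespace: id1; id2; ...`, and linked reactions are separated by semicolons. A minimal sketch of how they could be parsed is shown below; `parse_aliases` is a hypothetical helper, not part of the package, and the input is a shortened copy of the output above.

```python
def parse_aliases(aliases):
    """Split entries such as 'BiGG: IPP1; PPA' into {'BiGG': ['IPP1', 'PPA']}."""
    parsed = {}
    for entry in aliases:
        # Everything before the first ': ' is the namespace, the rest is the ID list.
        namespace, _, values = entry.partition(": ")
        parsed[namespace] = [v.strip() for v in values.split(";") if v.strip()]
    return parsed


example_aliases = [
    "BiGG: IPP1; PPA; PPA_1; PPAm",
    "KEGG: R00004",
    "MetaCyc: INORGPYROPHOSPHAT-RXN",
]
example_linked = "rxn27946;rxn27947;rxn27948;rxn32487;rxn38157;rxn38158"

parsed_aliases = parse_aliases(example_aliases)
linked_ids = example_linked.split(";")

print(parsed_aliases["BiGG"])
print(parsed_aliases["KEGG"])
print(linked_ids)
```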
Given the pandas dataframe created from the reactions.json file, the dictionary_reaction_id_to_kegg_id function builds two dictionaries for fast lookup of KEGG reaction IDs from BiGG or SEED IDs. These dictionaries are used as input to the reaction_id_to_kegg_id function.
reactions_pandas is a pandas dataframe created by the read_json_file function from the reactions.json file
bigg_to_kegg, seed_to_kegg = dictionary_reaction_id_to_kegg_id(
reactions_pandas = reactions_pandas)
print(bigg_to_kegg.get("IPP1"))
print(seed_to_kegg.get("rxn27946"))
R00004
R00004
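The idea behind the two lookup dictionaries can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the records are assumed to be already parsed into `kegg`, `bigg` and `seed` ID lists (hypothetical field names), and when a record lists several KEGG IDs the first one is kept.

```python
def build_lookups(records):
    """Build BiGG->KEGG and SEED->KEGG dictionaries from pre-parsed records."""
    bigg_lookup, seed_lookup = {}, {}
    for record in records:
        kegg_ids = record.get("kegg", [])
        if not kegg_ids:
            continue  # record has no KEGG alias, so there is nothing to map to
        kegg_id = kegg_ids[0]  # assumption: keep the first KEGG ID
        for bigg_id in record.get("bigg", []):
            bigg_lookup.setdefault(bigg_id, kegg_id)  # first record wins on clashes
        for seed_id in record.get("seed", []):
            seed_lookup.setdefault(seed_id, kegg_id)
    return bigg_lookup, seed_lookup


toy_records = [
    {"kegg": ["R00004"], "bigg": ["IPP1", "PPA"], "seed": ["rxn27946", "rxn27947"]},
    {"kegg": [], "bigg": ["TOY_RXN"], "seed": []},  # hypothetical record without KEGG alias
]
toy_bigg_to_kegg, toy_seed_to_kegg = build_lookups(toy_records)

print(toy_bigg_to_kegg.get("IPP1"))
print(toy_seed_to_kegg.get("rxn27946"))
print(toy_bigg_to_kegg.get("TOY_RXN"))
```

Building the dictionaries once makes every later lookup a constant-time operation instead of a scan over the whole dataframe.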
The reaction_id_to_kegg_id function takes as arguments: a BiGG or a SEED id, the modeltype and the mapping dictionaries created above. It returns the corresponding KEGG id.
reaction_id is the BiGG or SEED reaction ID (e.g., “SUCDi” or “rxn12345”)
modeltype is the model type (either “BiGG” or “SEED”); it determines which dictionary to use
bigg_to_kegg is a dictionary mapping BiGG IDs to KEGG IDs
seed_to_kegg is a dictionary mapping SEED IDs to KEGG IDs
kegg_id_from_bigg = reaction_id_to_kegg_id(
reaction_id = "IPP1",
modeltype = "BiGG",
bigg_to_kegg = bigg_to_kegg,
seed_to_kegg = seed_to_kegg)
kegg_id_from_seed = reaction_id_to_kegg_id(
reaction_id = "rxn19264",
modeltype = "SEED",
bigg_to_kegg = bigg_to_kegg,
seed_to_kegg = seed_to_kegg)
print(kegg_id_from_bigg)
print(kegg_id_from_seed)
R00004
R00009
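Conceptually, this lookup only has to pick the right dictionary for the model type. A minimal sketch, assuming unknown IDs yield `None` and an unsupported model type raises an error (both are assumptions; `lookup_kegg_id` is a hypothetical name):

```python
def lookup_kegg_id(reaction_id, modeltype, bigg_lookup, seed_lookup):
    """Pick the dictionary that matches `modeltype` and look the reaction up."""
    lookups = {"BiGG": bigg_lookup, "SEED": seed_lookup}
    if modeltype not in lookups:
        raise ValueError(f"modeltype must be 'BiGG' or 'SEED', got {modeltype!r}")
    return lookups[modeltype].get(reaction_id)  # None when the ID is unknown


toy_bigg = {"IPP1": "R00004"}
toy_seed = {"rxn19264": "R00009"}

from_bigg = lookup_kegg_id("IPP1", "BiGG", toy_bigg, toy_seed)
from_seed = lookup_kegg_id("rxn19264", "SEED", toy_bigg, toy_seed)
unknown = lookup_kegg_id("IPP1", "SEED", toy_bigg, toy_seed)
print(from_bigg, from_seed, unknown)
```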
The fill_missing_kegg_ids_in_initial_dictionary function maps further BiGG/SEED IDs to KEGG IDs in cases where the initial model lacks some information. It fills in the missing KEGG IDs (None values) in the initial mapping dictionary.
initial_bigg_to_kegg_dictionary is a dictionary with reaction IDs as keys and KEGG IDs (or None) as values, created by the map_model_to_kegg_reactions_dictionary function; it includes only the default mapping information from the model
modeltype is the model type (either “BiGG” or “SEED”)
bigg_to_kegg is a dictionary mapping BiGG IDs to KEGG IDs
seed_to_kegg is a dictionary mapping SEED IDs to KEGG IDs
final_bigg_to_kegg_dictionary = fill_missing_kegg_ids_in_initial_dictionary(
initial_model_to_kegg_dictionary = initial_bigg_to_kegg_dictionary,
modeltype="BiGG",
bigg_to_kegg = bigg_to_kegg,
seed_to_kegg = seed_to_kegg)
print(initial_bigg_to_kegg_dictionary.get('PFK'))
print(final_bigg_to_kegg_dictionary.get('PFK'))
None
R00756
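The filling step can be pictured as a second pass over the initial dictionary that touches only the `None` entries. The sketch below assumes existing values are never overwritten and that the input dictionary is left unchanged; `fill_missing` and the `TOY_RXN` ID are hypothetical.

```python
def fill_missing(initial_mapping, fallback_lookup):
    """Return a copy of `initial_mapping` with None values filled from `fallback_lookup`."""
    filled = dict(initial_mapping)  # copy, so the initial dictionary stays untouched
    for reaction_id, kegg_id in initial_mapping.items():
        if kegg_id is None:
            filled[reaction_id] = fallback_lookup.get(reaction_id)
    return filled


toy_initial = {"PFL": "R00212", "PFK": None, "TOY_RXN": None}
toy_fallback = {"PFK": "R00756"}
toy_final = fill_missing(toy_initial, toy_fallback)

print(toy_initial["PFK"], toy_final["PFK"])
print(toy_final["TOY_RXN"])  # still None: no source knows this reaction
```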
The get_kegg_pathways_from_reaction_ids function fetches KEGG pathway information for a set of model reactions and creates a pandas dataframe with the following columns: ‘model_reaction’, ‘kegg_reaction’, ‘pathway_ids’, ‘pathway_names’.
final_bigg_to_kegg_dictionary is a dictionary with reaction IDs as keys and KEGG IDs (or None) as values, created by the fill_missing_kegg_ids_in_initial_dictionary function; it includes updated mapping information from the KEGG database
max_workers is the number of threads used for parallel downloading
df_kegg_pathways = get_kegg_pathways_from_reaction_ids(
final_model_to_kegg_dictionary = final_bigg_to_kegg_dictionary,
max_workers = 8)
print(df_kegg_pathways["model_reaction"].iloc[0])
print(df_kegg_pathways["kegg_reaction"].iloc[0])
print(df_kegg_pathways["pathway_ids"].iloc[0])
print(df_kegg_pathways["pathway_names"].iloc[0])
PFL
R00212
[rn00620, rn00650, rn01100, rn01120]
[Pyruvate metabolism, Butanoate metabolism, ...]
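Because each reaction needs its own lookup, the downloads are a natural fit for a thread pool. The following sketch shows the pattern with Python's `concurrent.futures`; it runs offline, using a hypothetical `FAKE_KEGG` table in place of real KEGG requests, and `collect_pathways` is an illustrative name rather than the package's function.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical offline stand-in for KEGG; a real version would query the database.
FAKE_KEGG = {
    "R00212": ["rn00620", "rn00650", "rn01100", "rn01120"],
}


def fetch_pathway_ids(kegg_id):
    """Return the pathway IDs of one KEGG reaction ([] if unknown or None)."""
    return FAKE_KEGG.get(kegg_id, [])


def collect_pathways(mapping, fetch, max_workers=4):
    """Fetch pathways for every reaction in parallel and return one row per reaction."""
    items = list(mapping.items())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `items`.
        results = list(pool.map(fetch, [kegg_id for _, kegg_id in items]))
    return [
        {"model_reaction": rid, "kegg_reaction": kid, "pathway_ids": pids}
        for (rid, kid), pids in zip(items, results)
    ]


toy_rows = collect_pathways({"PFL": "R00212", "TOY_RXN": None}, fetch_pathway_ids, max_workers=2)
for row in toy_rows:
    print(row)
```

Threads suit this task because the work is dominated by waiting on network responses rather than by computation.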
Subset reactions from pathways
Given a dataframe created with the get_kegg_pathways_from_reaction_ids function, the subset_model_reactions_from_pathway_info function returns all reaction IDs affiliated with a given KEGG pathway name or ID.
PPP_from_name = subset_model_reactions_from_pathway_info(
kegg_info_df = df_kegg_pathways,
pathway_query = "Pentose phosphate pathway")
Glycolysis_from_name = subset_model_reactions_from_pathway_info(
kegg_info_df = df_kegg_pathways,
pathway_query = "Glycolysis / Gluconeogenesis")
Glycolysis_from_id = subset_model_reactions_from_pathway_info(
kegg_info_df = df_kegg_pathways,
pathway_query = "rn00010")
print(PPP_from_name)
print(Glycolysis_from_name)
print(Glycolysis_from_id)
['FBA', 'FBP', 'GND', 'PFK', 'PGL', 'RPE', 'RPI', 'TKT1']
['ALCD2x', 'ENO', 'FBA', 'FBP', 'GAPD', 'PFK', 'PGK', 'PGM', 'PPCK', 'PPS', 'PYK', 'TPI']
['ALCD2x', 'ENO', 'FBA', 'FBP', 'GAPD', 'PFK', 'PGK', 'PGM', 'PPCK', 'PPS', 'PYK', 'TPI']
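Since the query may be either a pathway name or a pathway ID, the subsetting amounts to checking both list columns of each row. A minimal sketch with plain dictionaries standing in for dataframe rows is given below; the toy rows and the `reactions_in_pathway` helper are illustrative assumptions.

```python
def reactions_in_pathway(rows, pathway_query):
    """Return the model reactions whose pathway IDs or names contain the query."""
    return [
        row["model_reaction"]
        for row in rows
        if pathway_query in row["pathway_ids"] or pathway_query in row["pathway_names"]
    ]


# Illustrative rows; a real table would come from the pathway dataframe.
toy_pathway_rows = [
    {"model_reaction": "ENO", "pathway_ids": ["rn00010"],
     "pathway_names": ["Glycolysis / Gluconeogenesis"]},
    {"model_reaction": "GND", "pathway_ids": ["rn00030"],
     "pathway_names": ["Pentose phosphate pathway"]},
    {"model_reaction": "PFK", "pathway_ids": ["rn00010", "rn00030"],
     "pathway_names": ["Glycolysis / Gluconeogenesis", "Pentose phosphate pathway"]},
]

by_name = reactions_in_pathway(toy_pathway_rows, "Glycolysis / Gluconeogenesis")
by_id = reactions_in_pathway(toy_pathway_rows, "rn00010")
ppp = reactions_in_pathway(toy_pathway_rows, "Pentose phosphate pathway")
print(by_name, by_id, ppp)
```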
The sort_reactions_by_model_order function flattens the lists passed as subsets (reactions from different pathways) into a single list and then orders its elements by the position of each reaction in the initial model. Duplicates are not removed by this function, so an additional step is required to exclude duplicate reaction IDs.
full_list is the reference list that defines the desired order, usually the model reactions
*subsets is one or more subset lists to be merged and ordered, usually the reactions from pathways of interest
reactions_in_pathways_ordered_duplicates = sort_reactions_by_model_order(
ec_cobra_reaction_ids,  # full_list: passed positionally so the *subsets can follow
Glycolysis,
PPP)
# Additional step to remove duplicates while preserving order
reactions_in_pathways_ordered = list(dict.fromkeys(reactions_in_pathways_ordered_duplicates))
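The merge-then-order behaviour described above can be reproduced in a few lines of plain Python. This is a sketch, not the package's implementation: `sort_by_reference_order` is a hypothetical name, and placing IDs that are missing from the reference list at the end is an assumption.

```python
def sort_by_reference_order(full_list, *subsets):
    """Flatten `subsets` and order the result by position in `full_list`."""
    position = {rid: i for i, rid in enumerate(full_list)}
    merged = [rid for subset in subsets for rid in subset]
    # IDs missing from full_list are placed last (assumption); sorted() is stable.
    return sorted(merged, key=lambda rid: position.get(rid, len(full_list)))


toy_model_order = ["PFK", "FBA", "GND", "ENO"]
toy_glycolysis = ["ENO", "FBA", "PFK"]
toy_ppp = ["FBA", "GND", "PFK"]

with_duplicates = sort_by_reference_order(toy_model_order, toy_glycolysis, toy_ppp)
without_duplicates = list(dict.fromkeys(with_duplicates))  # keeps first occurrences

print(with_duplicates)
print(without_duplicates)
```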
The dictionary_reaction_id_to_pathway function takes one or more lists of reaction IDs (each corresponding to a KEGG pathway) and creates a dictionary that maps the IDs to pathway names. If a reaction appears in more than one pathway, it is labelled Multiple-Pathways. This is useful when plotting subsets of reactions, and for replacing names from the df_kegg_pathways dataframe with shorter ones, e.g. Glycolysis / Gluconeogenesis with Glycolysis and Pentose phosphate pathway with PPP.
**named_lists are keyword arguments where each value is a list of reaction IDs and each argument name is the pathway name.
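The labelling rule just described can be sketched in plain Python using keyword arguments. The helper name `label_reactions` and the toy lists are hypothetical; only the rule itself (argument name as label, Multiple-Pathways for shared reactions) is taken from the description.

```python
def label_reactions(**named_lists):
    """Map each reaction ID to its pathway name, or 'Multiple-Pathways' if shared."""
    labels = {}
    for pathway_name, reaction_ids in named_lists.items():
        for rid in reaction_ids:
            if rid in labels and labels[rid] != pathway_name:
                labels[rid] = "Multiple-Pathways"  # already seen in another pathway
            else:
                labels.setdefault(rid, pathway_name)
    return labels


toy_labels = label_reactions(
    Glycolysis=["ENO", "FBA"],
    PPP=["GND", "FBA"],
)
print(toy_labels)
```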
bigg_to_pathway_dict = dictionary_reaction_id_to_pathway(
Glycolysis = Glycolysis,
PPP = PPP)
print(bigg_to_pathway_dict.get("GND"))
print(bigg_to_pathway_dict.get("ENO"))
print(bigg_to_pathway_dict.get("FBA"))
"Pentose phosphate pathway"
"Glycolysis / Gluconeogenesis"
"Multiple-Pathways"
The reaction_in_pathway_binary_matrix function creates a new pandas dataframe with reactions as rows and pathways as columns. Each cell shows whether a reaction belongs to a pathway (1) or not (0). If a reaction belongs to more than one pathway, a Multiple-Pathways column is created and the reaction gets 1 only there, not in the individual pathway columns (e.g. 1 in Multiple-Pathways, 0 in Glycolysis and 0 in PPP).
reaction_id_to_pathway_dict is a dictionary mapping reaction IDs to pathway names, created with the dictionary_reaction_id_to_pathway function
binary_df = reaction_in_pathway_binary_matrix(
reaction_id_to_pathway_dict = bigg_to_pathway_dict)
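Because every reaction carries exactly one label, the binary table is a one-hot encoding of the dictionary values. A sketch using nested dictionaries in place of a dataframe is shown below; `binary_membership` and the toy labels are hypothetical.

```python
def binary_membership(labels):
    """One-hot encode {reaction: label} into {reaction: {label: 0 or 1}}."""
    columns = sorted(set(labels.values()))
    return {
        rid: {col: int(label == col) for col in columns}
        for rid, label in labels.items()
    }


toy_reaction_labels = {"ENO": "Glycolysis", "FBA": "Multiple-Pathways", "GND": "PPP"}
toy_matrix = binary_membership(toy_reaction_labels)

for rid, row in toy_matrix.items():
    print(rid, row)
```

Each row sums to 1, which is why a shared reaction shows 0 in the individual pathway columns.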
The plot_reaction_in_pathway_heatmap function is used to plot a heatmap of the binary_df created from the reaction_in_pathway_binary_matrix function to better illustrate the connection between reactions and pathways.
binary_df is a pandas dataframe with binary values (0 or 1)
font_size is the font size for axis labels and ticks
fig_width is the width of the figure in pixels
fig_height is the height of the figure in pixels
title is the title of the plot
plot_reaction_in_pathway_heatmap(
binary_df = binary_df,
font_size = 8,
fig_width = 600,
fig_height = 600,
title = "" )

The subset_sampling_array_from_reaction_ids function subsets a sampling 2D array (with reactions as rows and samples as columns) to include only reactions of interest.
samples is a sampling 2D array with reactions as rows and samples as columns
model_reactions is a list containing the model’s reactions
subset_reactions is a list containing the reactions of interest used to subset the sampling array
subset_pathways_optgp_condition_100 = subset_sampling_array_from_reaction_ids(
samples = samples_optgp_condition_100,
model_reactions = ec_cobra_reaction_ids,
subset_reactions = reactions_in_pathways_ordered)
subset_pathways_optgp_condition_0 = subset_sampling_array_from_reaction_ids(
samples = samples_optgp_condition_0,
model_reactions = ec_cobra_reaction_ids,
subset_reactions = reactions_in_pathways_ordered)
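Subsetting the sampling array comes down to translating reaction IDs into row indices. The sketch below uses nested lists so that it runs without extra dependencies; with a NumPy array the same indices could be used for row selection. The helper name `subset_rows` and the toy numbers are hypothetical, and returning rows in the order of `subset_reactions` is an assumption.

```python
def subset_rows(samples, model_reactions, subset_reactions):
    """Keep only the rows of `samples` that belong to `subset_reactions`."""
    row_of = {rid: i for i, rid in enumerate(model_reactions)}
    return [samples[row_of[rid]] for rid in subset_reactions]


toy_samples = [
    [0.1, 0.2],  # PFK
    [1.0, 1.1],  # FBA
    [2.0, 2.1],  # ENO
]
toy_reaction_ids = ["PFK", "FBA", "ENO"]

toy_subset = subset_rows(toy_samples, toy_reaction_ids, ["ENO", "PFK"])
print(toy_subset)
```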