pathways_utils

Functions

read_json_file(→ Tuple[Dict, pandas.DataFrame])

Function that loads and returns the JSON file with KEGG information in JSON and pandas dataframe format

map_model_to_kegg_reactions_dictionary(→ Dict[str, str])

Function that creates a dictionary that will assign KEGG terms (values) to BiGG/SEED ids (keys)

dictionary_reaction_id_to_kegg_id(→ Tuple[Dict[str, ...)

Function that given the reactions.json file builds two dictionaries for fast lookup of KEGG reaction IDs from BiGG or SEED IDs.

reaction_id_to_kegg_id(→ str)

Function that performs a lookup to get the KEGG ID for a given BiGG or SEED reaction ID.

fill_missing_kegg_ids_in_initial_dictionary(→ Dict)

Function that fills in missing KEGG IDs (NAs) in the initial mapping dictionary using a function that maps BiGG/SEED IDs to KEGG IDs.

fetch_kegg_pathway(→ Tuple[str, List, List])

Function that extracts associated KEGG pathways from a given KEGG id.

get_kegg_pathways_from_reaction_ids(→ pandas.DataFrame)

Function that fetches KEGG pathway information for a set of model reactions using parallel requests.

subset_model_reactions_from_pathway_info(→ List)

Function that, given the DataFrame created by get_kegg_pathways_from_reaction_ids, returns all reaction IDs affiliated with a given KEGG pathway name or ID.

dictionary_reaction_id_to_pathway(→ Dict[str, str])

Function that takes one or multiple lists containing reaction IDs (corresponding to KEGG pathways) and creates a dictionary that maps the IDs to pathway names.

reaction_in_pathway_binary_matrix(→ pandas.DataFrame)

Function that, given a mapping dictionary, builds a binary matrix where rows (reactions) are keys, columns (pathways) are unique values, and a cell is 1 if the key maps to that value.

plot_reaction_in_pathway_heatmap(binary_df[, ...])

Function that plots a binary mapping matrix created from the reaction_in_pathway_binary_matrix function.

sort_reactions_by_model_order(→ List)

Function that flattens the lists provided in the subsets argument (corresponding to reactions from different pathways) into a single list ordered as in the initial model.

subset_sampling_array_from_reaction_ids(...)

Function that takes a sampling array with reactions as rows and samples as columns and subsets it to include only reactions of interest.

dictionary_map_reverse_reaction_id_to_pathway(...)

Function that is used when we split bidirectional reactions to separate forward and reverse reactions.

Module Contents

pathways_utils.read_json_file(filepath: str) → Tuple[Dict, pandas.DataFrame]

Function that loads the JSON file with KEGG information and returns its content both as a Python dictionary and as a pandas DataFrame.

Keyword arguments: filepath (str) – path to the JSON file

Returns: Tuple[dict, pd.DataFrame]

reactions_json (dict) – The raw content of the JSON file as a Python dictionary.
reactions_pandas (pd.DataFrame) – A pandas DataFrame constructed from the JSON content.

pathways_utils.map_model_to_kegg_reactions_dictionary(cobra_model: cobra.Model) → Dict[str, str]

Function that creates a dictionary that will assign KEGG terms (values) to BiGG/SEED ids (keys) only from model information (without searching in online databases or in external files)

Keyword arguments: cobra_model (cobra.Model) – cobra model object

Returns: Dictionary (dict) – maps each model’s BiGG/SEED ID → KEGG ID
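The mapping "only from model information" could be sketched as below. This is a minimal illustration, not the module's implementation: it assumes KEGG IDs are stored in each reaction's annotation under the key "kegg.reaction" and that missing IDs are marked "NA". The helper name and the toy reaction IDs are hypothetical, and stand-in objects replace real cobra reactions so the snippet runs without a model file.

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def sketch_map_model_to_kegg(reactions: Iterable) -> Dict[str, str]:
    """Hypothetical helper: map reaction IDs to KEGG IDs from annotations only."""
    mapping = {}
    for rxn in reactions:
        # Assumption: the KEGG ID is stored under the "kegg.reaction" key.
        kegg = rxn.annotation.get("kegg.reaction", "NA")
        # An annotation may hold a list of IDs; keep the first one in that case.
        if isinstance(kegg, list):
            kegg = kegg[0] if kegg else "NA"
        mapping[rxn.id] = kegg
    return mapping


# Stand-ins for cobra reactions; only .id and .annotation are used (toy values).
toy_reactions = [
    SimpleNamespace(id="RXN_A", annotation={"kegg.reaction": "R00010"}),
    SimpleNamespace(id="RXN_B", annotation={}),
]
toy_mapping = sketch_map_model_to_kegg(toy_reactions)
```

With a real model, the same helper would be called on cobra_model.reactions.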

pathways_utils.dictionary_reaction_id_to_kegg_id(reactions_pandas: pandas.DataFrame) → Tuple[Dict[str, str], Dict[str, str]]

Function that, given the DataFrame built from the reactions.json file, builds two dictionaries for fast lookup of KEGG reaction IDs from BiGG or SEED IDs. These dictionaries are used as input to the reaction_id_to_kegg_id function.

Keyword arguments: reactions_pandas (pd.DataFrame) – DataFrame with columns including:

‘aliases’ (list of strings): may contain entries like “BiGG:SUCDi” or “KEGG:R00010”
‘linked_reaction’ (str): may contain SEED reaction IDs like “rxn12345;rxn67890”

Returns: Tuple[dict, dict]:

bigg_to_kegg (dict) – maps each BiGG ID → KEGG ID
seed_to_kegg (dict) – maps each SEED ID → KEGG ID
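A rough sketch of how the two lookup dictionaries could be derived from the ‘aliases’ and ‘linked_reaction’ columns follows. It works on plain row dictionaries (with pandas, reactions_pandas.to_dict("records") would yield that shape) and assumes the "BiGG:" and "KEGG:" prefixes and the ";" separator shown above. The helper name is hypothetical, and the toy row simply reuses the format examples from the description rather than a verified mapping.

```python
from typing import Dict, List, Tuple


def sketch_build_lookups(records: List[dict]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Hypothetical helper: build BiGG->KEGG and SEED->KEGG lookup tables."""
    bigg_to_kegg: Dict[str, str] = {}
    seed_to_kegg: Dict[str, str] = {}
    for rec in records:
        aliases = rec.get("aliases") or []
        # Collect KEGG IDs from entries such as "KEGG:R00010".
        kegg_ids = [a.split(":", 1)[1] for a in aliases if a.startswith("KEGG:")]
        if not kegg_ids:
            continue  # nothing to map for this row
        kegg_id = kegg_ids[0]  # assumption: keep the first KEGG ID
        for alias in aliases:
            if alias.startswith("BiGG:"):
                bigg_to_kegg[alias.split(":", 1)[1]] = kegg_id
        # SEED IDs are assumed to be ";"-separated in 'linked_reaction'.
        linked = rec.get("linked_reaction") or ""
        for seed_id in linked.split(";"):
            if seed_id:
                seed_to_kegg[seed_id] = kegg_id
    return bigg_to_kegg, seed_to_kegg


toy_records = [
    {"aliases": ["BiGG:SUCDi", "KEGG:R00010"], "linked_reaction": "rxn12345;rxn67890"},
    {"aliases": ["BiGG:NO_KEGG"], "linked_reaction": None},
]
toy_bigg, toy_seed = sketch_build_lookups(toy_records)
```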

pathways_utils.reaction_id_to_kegg_id(reaction_id: str, modeltype: str, bigg_to_kegg: Dict[str, str], seed_to_kegg: Dict[str, str]) → str

Function that performs a lookup to get the KEGG ID for a given BiGG or SEED reaction ID. This function is used inside the fill_missing_kegg_ids_in_initial_dictionary function.

Keyword arguments:

reaction_id (str) – The BiGG or SEED reaction ID (e.g., “SUCDi” or “rxn12345”)
modeltype (str) – Either “BiGG” or “SEED” (determines which dictionary to use)
bigg_to_kegg (dict) – Dictionary mapping BiGG IDs to KEGG IDs
seed_to_kegg (dict) – Dictionary mapping SEED IDs to KEGG IDs

Returns: str – The corresponding KEGG ID (e.g., “R00010”), or “NA” if not found
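The lookup described above amounts to choosing a dictionary by model type and falling back to "NA". A minimal sketch, with a hypothetical helper name and toy dictionaries; raising on an unknown model type is an assumption, since the documentation does not say how that case is handled:

```python
from typing import Dict


def sketch_lookup_kegg_id(
    reaction_id: str,
    modeltype: str,
    bigg_to_kegg: Dict[str, str],
    seed_to_kegg: Dict[str, str],
) -> str:
    """Hypothetical helper: pick the table by model type, default to "NA"."""
    if modeltype == "BiGG":
        table = bigg_to_kegg
    elif modeltype == "SEED":
        table = seed_to_kegg
    else:
        # Assumption: unknown model types are treated as an error.
        raise ValueError(f"Unknown modeltype: {modeltype!r}")
    return table.get(reaction_id, "NA")


toy_bigg_table = {"SUCDi": "R00010"}       # toy pairing, not a verified mapping
toy_seed_table = {"rxn12345": "R00010"}
found = sketch_lookup_kegg_id("SUCDi", "BiGG", toy_bigg_table, toy_seed_table)
missing = sketch_lookup_kegg_id("rxn99999", "SEED", toy_bigg_table, toy_seed_table)
```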

pathways_utils.fill_missing_kegg_ids_in_initial_dictionary(initial_model_to_kegg_dictionary: dict[str, str], bigg_to_kegg: Dict[str, str], seed_to_kegg: Dict[str, str], modeltype: str = 'BiGG') → Dict

Function that fills in missing KEGG IDs (NAs) in the initial mapping dictionary using a function that maps BiGG/SEED IDs to KEGG IDs.

Keyword arguments:

initial_model_to_kegg_dictionary (dict) – Dictionary with reaction IDs as keys and KEGG IDs (or None) as values.
bigg_to_kegg (dict) – Dictionary mapping BiGG IDs to KEGG IDs
seed_to_kegg (dict) – Dictionary mapping SEED IDs to KEGG IDs
modeltype (str) – Either “BiGG” or “SEED” (determines which dictionary to use)

Returns: final_model_to_kegg_dictionary (dict) – Updated dictionary with KEGG IDs filled where possible.
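The fill-in step can be pictured as a single pass over the initial dictionary. The sketch below is illustrative only: the helper name is hypothetical, it takes one already-selected lookup table instead of both tables plus modeltype, and it treats both None and "NA" as missing, since the description mentions both.

```python
from typing import Dict, Optional


def sketch_fill_missing(
    initial: Dict[str, Optional[str]],
    lookup_table: Dict[str, str],
) -> Dict[str, str]:
    """Hypothetical helper: fill missing KEGG IDs from a lookup table."""
    filled = {}
    for rxn_id, kegg_id in initial.items():
        # Assumption: None and "NA" both mean "no KEGG ID yet".
        if kegg_id in (None, "NA"):
            kegg_id = lookup_table.get(rxn_id, "NA")
        filled[rxn_id] = kegg_id
    return filled


toy_initial = {"RXN_A": "R00010", "RXN_B": "NA", "RXN_C": None}
toy_lookup = {"RXN_B": "R00020"}  # toy values
toy_final = sketch_fill_missing(toy_initial, toy_lookup)
```

The sketch returns a new dictionary and leaves the input untouched; whether the documented function copies or updates in place is not stated.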

pathways_utils.fetch_kegg_pathway(kegg_rxn: str) → Tuple[str, List, List]

Function that extracts associated KEGG pathways from a given KEGG id.

Keyword arguments:

kegg_rxn (str) – KEGG reaction ID (e.g., ‘R00010’).

Returns: Tuple[str, List, List]

kegg_rxn (str) – The KEGG reaction ID
pathway_ids (list) – The corresponding KEGG pathway IDs
pathway_names (list) – The corresponding KEGG pathway names
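The documentation does not say how the pathway information is retrieved. One plausible route, assumed here, is to request the reaction entry from the KEGG REST service (its get operation returns a flat-file text record) and read the PATHWAY field. The sketch covers only the parsing step, on a hand-written sample that illustrates the flat-file layout; the helper name is hypothetical and no network call is made.

```python
from typing import List, Tuple


def sketch_parse_pathways(flat_text: str) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: read the PATHWAY field of a KEGG flat-file entry."""
    pathway_ids: List[str] = []
    pathway_names: List[str] = []
    in_pathway = False
    for line in flat_text.splitlines():
        if line.startswith("PATHWAY"):
            in_pathway = True
            content = line[len("PATHWAY"):]
        elif in_pathway and line.startswith(" "):
            content = line  # indented continuation line of the same field
        else:
            in_pathway = False  # a new field (or the terminator) ends the block
            continue
        parts = content.split(None, 1)
        if parts:
            pathway_ids.append(parts[0])
            pathway_names.append(parts[1].strip() if len(parts) > 1 else "")
    return pathway_ids, pathway_names


# Hand-written sample in flat-file layout (illustrative, not fetched).
sample_entry = """\
ENTRY       R00010                      Reaction
PATHWAY     rn00500  Starch and sucrose metabolism
            rn01100  Metabolic pathways
///
"""
sample_ids, sample_names = sketch_parse_pathways(sample_entry)
```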

pathways_utils.get_kegg_pathways_from_reaction_ids(final_model_to_kegg_dictionary: Dict[str, str], max_workers: int = 8) → pandas.DataFrame

Function that fetches KEGG pathway information for a set of model reactions using parallel requests.

Keyword arguments:

final_model_to_kegg_dictionary (dict) – Dictionary where keys are model reaction IDs and values are KEGG reaction IDs (e.g., ‘R00010’).
max_workers (int) – Number of threads for parallel downloading (default = 8).

Returns: kegg_info_df (pd.DataFrame) – A DataFrame with columns: [‘model_reaction’, ‘kegg_reaction’, ‘pathway_ids’, ‘pathway_names’]
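The parallel fetching could look roughly like the following thread-pool sketch. It is not the module's code: the helper name is hypothetical, the fetcher is passed in as a parameter and replaced here by an offline stub with toy data, reactions without a KEGG ID are skipped (an assumption), and the result is a list of row dictionaries rather than a DataFrame.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

FetchResult = Tuple[str, List[str], List[str]]


def sketch_fetch_all(
    model_to_kegg: Dict[str, str],
    fetch: Callable[[str], FetchResult],
    max_workers: int = 8,
) -> List[dict]:
    """Hypothetical helper: fetch pathway info for many reactions in parallel."""
    # Assumption: reactions without a KEGG ID are left out of the result.
    pairs = [(m, k) for m, k in model_to_kegg.items() if k not in (None, "NA")]
    # Query each distinct KEGG ID once, even if several reactions share it.
    unique_kegg = sorted({k for _, k in pairs})
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as its input.
        results = dict(zip(unique_kegg, pool.map(fetch, unique_kegg)))
    rows = []
    for model_rxn, kegg_rxn in pairs:
        _, ids, names = results[kegg_rxn]
        rows.append({
            "model_reaction": model_rxn,
            "kegg_reaction": kegg_rxn,
            "pathway_ids": ids,
            "pathway_names": names,
        })
    return rows


def stub_fetch(kegg_rxn: str) -> FetchResult:
    """Offline stand-in for a real fetcher (toy data)."""
    table = {"R00010": (["rn00500"], ["Starch and sucrose metabolism"])}
    ids, names = table.get(kegg_rxn, ([], []))
    return kegg_rxn, ids, names


toy_rows = sketch_fetch_all({"RXN_A": "R00010", "RXN_B": "NA"}, stub_fetch, max_workers=2)
```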

pathways_utils.subset_model_reactions_from_pathway_info(kegg_info_df: pandas.DataFrame, pathway_query: str) → List

Function that, given a DataFrame with columns [‘model_reaction’, ‘kegg_reaction’, ‘pathway_ids’, ‘pathway_names’] created with the get_kegg_pathways_from_reaction_ids function, returns all reaction IDs affiliated with a given KEGG pathway name or ID.

Keyword arguments:

kegg_info_df (pd.DataFrame) – Output from get_kegg_pathways_from_reaction_ids; must contain ‘pathway_ids’ and ‘pathway_names’.
pathway_query (str) – Exact KEGG pathway name or ID to match (e.g., ‘Glycolysis / Gluconeogenesis’ or ‘rn00010’).

Returns: List[str] – List of reaction IDs affiliated with the exact given pathway.
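Exact matching against either the pathway IDs or the pathway names can be sketched as below, using row dictionaries in place of DataFrame rows. The helper name and toy reaction IDs are hypothetical; the assumption is that ‘pathway_ids’ and ‘pathway_names’ each hold a list per reaction.

```python
from typing import List


def sketch_subset_by_pathway(rows: List[dict], pathway_query: str) -> List[str]:
    """Hypothetical helper: reactions whose pathway IDs or names contain the query."""
    return [
        row["model_reaction"]
        for row in rows
        # List membership gives an exact match, not a substring match.
        if pathway_query in row["pathway_ids"] or pathway_query in row["pathway_names"]
    ]


toy_info = [
    {"model_reaction": "RXN_A", "pathway_ids": ["rn00010"],
     "pathway_names": ["Glycolysis / Gluconeogenesis"]},
    {"model_reaction": "RXN_B", "pathway_ids": ["rn00500"],
     "pathway_names": ["Starch and sucrose metabolism"]},
]
by_id = sketch_subset_by_pathway(toy_info, "rn00010")
by_name = sketch_subset_by_pathway(toy_info, "Glycolysis / Gluconeogenesis")
partial = sketch_subset_by_pathway(toy_info, "Glycolysis")
```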

pathways_utils.dictionary_reaction_id_to_pathway(**named_lists: List[str]) → Dict[str, str]

Function that takes one or multiple lists containing reaction IDs (corresponding to KEGG pathways) and creates a dictionary that maps the IDs to pathway names. If a reaction appears in more than one pathway, it is classified with the term “Multiple-Pathways”.

Keyword arguments: **named_lists: List[str] – Named keyword arguments where each argument is a list of reaction IDs and the argument name represents the pathway name.

Returns: reaction_id_to_pathway_dict (dict) – dictionary mapping reaction id to pathway name
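A minimal sketch of the keyword-argument pattern and the “Multiple-Pathways” rule, using a hypothetical helper name and toy pathway and reaction names:

```python
from typing import Dict, List


def sketch_reaction_to_pathway(**named_lists: List[str]) -> Dict[str, str]:
    """Hypothetical helper: map each reaction ID to its pathway name."""
    mapping: Dict[str, str] = {}
    for pathway, reactions in named_lists.items():
        for rxn in reactions:
            if rxn in mapping and mapping[rxn] != pathway:
                # Already seen under a different pathway name.
                mapping[rxn] = "Multiple-Pathways"
            else:
                mapping[rxn] = pathway
    return mapping


toy_pathway_map = sketch_reaction_to_pathway(
    PathwayA=["R1", "R2"],
    PathwayB=["R2", "R3"],
)
```

Because the pathway name is the argument name, it has to be a valid Python identifier when passed this way.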

pathways_utils.reaction_in_pathway_binary_matrix(reaction_id_to_pathway_dict: Dict) → pandas.DataFrame

Function that given a mapping dictionary, builds a binary matrix where rows (reactions) are keys, columns (pathways) are unique values, and the cell is 1 if the key maps to that value.

Keyword arguments: reaction_id_to_pathway_dict (Dict) – dictionary mapping reaction id to pathway name

Returns: binary_df (pd.DataFrame) – DataFrame with binary values (0 or 1) matching reactions to pathways
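The construction of the binary matrix can be illustrated without pandas as a dictionary of rows. The helper name is hypothetical and the alphabetical column order is an assumption; passing the result to pandas.DataFrame.from_dict(rows, orient="index") would give a DataFrame with reactions as the index and pathways as columns.

```python
from typing import Dict


def sketch_binary_rows(mapping: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: one row per reaction, one 0/1 entry per pathway."""
    pathways = sorted(set(mapping.values()))  # assumption: alphabetical columns
    return {
        rxn: {p: int(p == pathway) for p in pathways}
        for rxn, pathway in mapping.items()
    }


toy_binary = sketch_binary_rows({
    "R1": "PathwayA",
    "R2": "Multiple-Pathways",
    "R3": "PathwayB",
})
```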

pathways_utils.plot_reaction_in_pathway_heatmap(binary_df: pandas.DataFrame, font_size: int = 12, fig_width: int = 600, fig_height: int = 400, title: str = '')

Function that plots a binary mapping matrix created from the reaction_in_pathway_binary_matrix function.

Keyword arguments:

binary_df (pd.DataFrame) – DataFrame with binary values (0 or 1)
font_size (int) – Font size for axis labels and ticks
fig_width (int) – Width of the figure in pixels
fig_height (int) – Height of the figure in pixels
title (str) – Title of the plot

pathways_utils.sort_reactions_by_model_order(full_list: List, *subsets: List) → List

Function that flattens the lists provided in the subsets argument (corresponding to reactions from different pathways) into a single list and then orders the elements of the new list based on the order of the reactions in the initial model.

Keyword arguments:

full_list (List) – The reference list that defines the desired order.
*subsets (List) – One or more subset lists to be merged and ordered.

Returns: sorted_merged – a single merged list of all subsets sorted by the order in full_list.

Example usage:

Glycolysis = ["PGI", "PFK", "FBA", "TPI", "GAPD", "PGK", "PGM", "ENO", "PYK"]
PPP = ["G6PDH2r", "PGL", "GND", "RPE", "RPI", "TKT1", "TKT2", "TALA"]
reactions_ordered = sort_reactions_by_model_order(ec_cobra_reactions_str, Glycolysis, PPP)
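The merge-and-order behaviour can be approximated by walking the reference list once and keeping the entries that occur in any subset. This sketch uses a hypothetical helper name and a toy model order; it also assumes that reactions absent from the reference list are dropped and that duplicates across subsets appear once, neither of which the documentation confirms.

```python
from typing import List


def sketch_sort_by_model_order(full_list: List[str], *subsets: List[str]) -> List[str]:
    """Hypothetical helper: merge subsets, ordered as in full_list."""
    wanted = {rxn for subset in subsets for rxn in subset}
    # Walking full_list preserves the model's order.
    return [rxn for rxn in full_list if rxn in wanted]


toy_model_order = ["PGI", "G6PDH2r", "PFK", "PGL", "FBA", "ATPM"]  # toy order
toy_glycolysis = ["FBA", "PGI", "PFK"]
toy_ppp = ["PGL", "G6PDH2r"]
toy_ordered = sketch_sort_by_model_order(toy_model_order, toy_glycolysis, toy_ppp)
```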

pathways_utils.subset_sampling_array_from_reaction_ids(samples: numpy.typing.NDArray[numpy.float64], model_reactions: List, subset_reactions: List = []) → numpy.typing.NDArray[numpy.float64]

Function that takes a sampling array with reactions as rows and samples as columns and subsets it to include only reactions of interest

Keyword arguments:

samples (Numpy 2D array) – A sampling 2D array with reactions as rows and samples as columns
model_reactions (List) – A list containing the model’s reactions
subset_reactions (List) – A list containing reactions of interest to subset the sampling array

Returns: subset_samples (NDArray[np.float64]) – subset of the sampling array containing only reactions of interest
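Row subsetting by reaction ID reduces to translating IDs into row indices. A small NumPy sketch with a hypothetical helper name and toy data; it assumes the output rows follow the order of subset_reactions and that an unknown ID raises a KeyError, which may differ from the documented function.

```python
from typing import List

import numpy as np


def sketch_subset_samples(
    samples: np.ndarray,
    model_reactions: List[str],
    subset_reactions: List[str],
) -> np.ndarray:
    """Hypothetical helper: keep only the rows of the reactions of interest."""
    row_index = {rxn: i for i, rxn in enumerate(model_reactions)}
    # Assumption: output rows follow the order of subset_reactions.
    rows = [row_index[rxn] for rxn in subset_reactions]
    return samples[rows, :]


# 4 reactions (rows) x 3 samples (columns), toy values 0..11.
toy_samples = np.arange(12, dtype=np.float64).reshape(4, 3)
toy_model_reactions = ["R1", "R2", "R3", "R4"]
toy_subset = sketch_subset_samples(toy_samples, toy_model_reactions, ["R4", "R2"])
```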

pathways_utils.dictionary_map_reverse_reaction_id_to_pathway(reaction_id_to_pathway_dict: Dict, for_rev_reactions: List) → Dict[str, str]

Function that is used when bidirectional reactions are split into separate forward and reverse reactions. It maps each reverse reaction to the corresponding pathway (the one that the forward reaction maps to). It enriches the dictionary created by the dictionary_reaction_id_to_pathway function.

Keyword arguments:

reaction_id_to_pathway_dict (Dict) – Dict mapping reaction IDs to pathway names
for_rev_reactions (List) – List of the split reactions

Returns: reaction_id_forward_reverse_to_pathway_dict (Dict) – Dictionary containing reaction-pathway mapping information for forward and reverse reactions separately
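How reverse reactions are named after splitting is not stated here. The sketch below assumes, purely for illustration, that a reverse reaction ID is the forward ID with a "_rev" suffix; the helper name and the rev_suffix parameter are hypothetical.

```python
from typing import Dict, List


def sketch_add_reverse_reactions(
    mapping: Dict[str, str],
    for_rev_reactions: List[str],
    rev_suffix: str = "_rev",  # assumption: naming convention of reverse IDs
) -> Dict[str, str]:
    """Hypothetical helper: give each reverse reaction its forward pathway."""
    enriched = dict(mapping)  # copy, so the input stays unchanged
    for rxn in for_rev_reactions:
        if rxn.endswith(rev_suffix):
            forward = rxn[: -len(rev_suffix)]
            if forward in mapping:
                enriched[rxn] = mapping[forward]
    return enriched


toy_enriched = sketch_add_reverse_reactions(
    {"PGI": "Glycolysis"},
    ["PGI", "PGI_rev", "XYZ_rev"],
)
```

Reverse reactions whose forward counterpart has no pathway entry are left out in this sketch.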