pathways_utils
==============

.. py:module:: pathways_utils


Functions
---------

.. autoapisummary::

   pathways_utils.read_json_file
   pathways_utils.map_model_to_kegg_reactions_dictionary
   pathways_utils.dictionary_reaction_id_to_kegg_id
   pathways_utils.reaction_id_to_kegg_id
   pathways_utils.fill_missing_kegg_ids_in_initial_dictionary
   pathways_utils.fetch_kegg_pathway
   pathways_utils.get_kegg_pathways_from_reaction_ids
   pathways_utils.subset_model_reactions_from_pathway_info
   pathways_utils.dictionary_reaction_id_to_pathway
   pathways_utils.reaction_in_pathway_binary_matrix
   pathways_utils.plot_reaction_in_pathway_heatmap
   pathways_utils.sort_reactions_by_model_order
   pathways_utils.subset_sampling_array_from_reaction_ids
   pathways_utils.dictionary_map_reverse_reaction_id_to_pathway


Module Contents
---------------

.. py:function:: read_json_file(filepath: str) -> Tuple[Dict, pandas.DataFrame]

   Function that loads the JSON file with KEGG information and returns it both as a dictionary and as a pandas DataFrame.

   Keyword arguments:

   * filepath (str) -- path to the JSON file

   Returns:

   Tuple[dict, pd.DataFrame]:

   * reactions_json (dict) -- The raw content of the JSON file as a Python dictionary.
   * reactions_pandas (pd.DataFrame) -- A pandas DataFrame constructed from the JSON content.


.. py:function:: map_model_to_kegg_reactions_dictionary(cobra_model: cobra.Model) -> Dict[str, str]

   Function that creates a dictionary assigning KEGG terms (values) to BiGG/SEED IDs (keys) using only the information stored in the model (without searching online databases or external files).

   Keyword arguments:

   * cobra_model (cobra.Model) -- cobra model object

   Returns:

   * Dictionary (dict) -- maps each of the model's BiGG/SEED IDs → KEGG ID


.. py:function:: dictionary_reaction_id_to_kegg_id(reactions_pandas: pandas.DataFrame) -> Tuple[Dict[str, str], Dict[str, str]]

   Function that, given the content of the `reactions.json` file, builds two dictionaries for fast lookup of KEGG reaction IDs from BiGG or SEED IDs.
   These dictionaries are used as input to the `reaction_id_to_kegg_id` function.

   Keyword arguments:

   * reactions_pandas (pd.DataFrame) -- DataFrame with columns including:

     * 'aliases' (list of strings): may contain entries like "BiGG:SUCDi" or "KEGG:R00010"
     * 'linked_reaction' (str): may contain SEED reaction IDs like "rxn12345;rxn67890"

   Returns:

   Tuple[dict, dict]:

   * bigg_to_kegg (dict) -- maps each BiGG ID → KEGG ID
   * seed_to_kegg (dict) -- maps each SEED ID → KEGG ID


.. py:function:: reaction_id_to_kegg_id(reaction_id: str, modeltype: str, bigg_to_kegg: Dict[str, str], seed_to_kegg: Dict[str, str]) -> str

   Function that looks up the KEGG ID for a given BiGG or SEED reaction ID.
   This function is used inside the `fill_missing_kegg_ids_in_initial_dictionary` function.

   Keyword arguments:

   * reaction_id (str) -- The BiGG or SEED reaction ID (e.g., "SUCDi" or "rxn12345")
   * modeltype (str) -- Either "BiGG" or "SEED" (determines which dictionary to use)
   * bigg_to_kegg (dict) -- Dictionary mapping BiGG IDs to KEGG IDs
   * seed_to_kegg (dict) -- Dictionary mapping SEED IDs to KEGG IDs

   Returns:

   * str -- The corresponding KEGG ID (e.g., "R00010"), or "NA" if not found


.. py:function:: fill_missing_kegg_ids_in_initial_dictionary(initial_model_to_kegg_dictionary: dict[str, str], bigg_to_kegg: Dict[str, str], seed_to_kegg: Dict[str, str], modeltype: str = 'BiGG') -> Dict

   Function that fills in missing KEGG IDs (NAs) in the initial mapping dictionary by looking up each BiGG/SEED ID in the BiGG-to-KEGG or SEED-to-KEGG dictionary.

   Keyword arguments:

   * initial_model_to_kegg_dictionary (dict) -- Dictionary with reaction IDs as keys and KEGG IDs (or None) as values.
   * bigg_to_kegg (dict) -- Dictionary mapping BiGG IDs to KEGG IDs
   * seed_to_kegg (dict) -- Dictionary mapping SEED IDs to KEGG IDs
   * modeltype (str) -- Either "BiGG" or "SEED" (determines which dictionary to use)

   Returns:

   * final_model_to_kegg_dictionary (dict) -- Updated dictionary with KEGG IDs filled in where possible.


.. py:function:: fetch_kegg_pathway(kegg_rxn: str) -> Tuple[str, List, List]

   Function that retrieves the KEGG pathways associated with a given KEGG reaction ID.

   Keyword arguments:

   * kegg_rxn (str) -- KEGG reaction ID (e.g., 'R00010').

   Returns:

   Tuple[str, List, List]:

   * kegg_rxn (str) -- The KEGG reaction ID
   * pathway_ids (list) -- The corresponding KEGG pathway IDs
   * pathway_names (list) -- The corresponding KEGG pathway names


.. py:function:: get_kegg_pathways_from_reaction_ids(final_model_to_kegg_dictionary: Dict[str, str], max_workers: int = 8) -> pandas.DataFrame

   Function that fetches KEGG pathway information for a set of model reactions using parallel requests.

   Keyword arguments:

   * final_model_to_kegg_dictionary (dict) -- Dictionary where keys are model reaction IDs and values are KEGG reaction IDs (e.g., 'R00010').
   * max_workers (int) -- Number of threads for parallel downloading (default = 8).

   Returns:

   * kegg_info_df (pd.DataFrame) -- A DataFrame with columns: ['model_reaction', 'kegg_reaction', 'pathway_ids', 'pathway_names']


.. py:function:: subset_model_reactions_from_pathway_info(kegg_info_df: pandas.DataFrame, pathway_query: str) -> List

   Function that, given a DataFrame with columns ['model_reaction', 'kegg_reaction', 'pathway_ids', 'pathway_names'] created with the `get_kegg_pathways_from_reaction_ids` function, returns all reaction IDs affiliated with a given KEGG pathway name or ID.

   Keyword arguments:

   * kegg_info_df (pd.DataFrame) -- Output from `get_kegg_pathways_from_reaction_ids`; must contain 'pathway_ids' and 'pathway_names'.
   * pathway_query (str) -- Exact KEGG pathway name or ID to match (e.g., 'Glycolysis / Gluconeogenesis' or 'rn00010').

   Returns:

   * List[str] -- List of reaction IDs affiliated with the **exact** given pathway.


.. py:function:: dictionary_reaction_id_to_pathway(**named_lists: List[str]) -> Dict[str, str]

   Function that takes one or multiple lists containing reaction IDs (corresponding to KEGG pathways) and creates a dictionary that maps the IDs to pathway names.
   If a reaction appears in more than one pathway, it is classified with the term "Multiple-Pathways".

   Keyword arguments:

   * ``**named_lists`` (List[str]) -- Named keyword arguments where each argument is a list of reaction IDs and the argument name represents the pathway name.

   Returns:

   * reaction_id_to_pathway_dict (dict) -- Dictionary mapping reaction ID to pathway name


.. py:function:: reaction_in_pathway_binary_matrix(reaction_id_to_pathway_dict: Dict) -> pandas.DataFrame

   Function that, given a mapping dictionary, builds a binary matrix where rows (reactions) are the keys, columns (pathways) are the unique values, and a cell is 1 if the key maps to that value and 0 otherwise.

   Keyword arguments:

   * reaction_id_to_pathway_dict (Dict) -- Dictionary mapping reaction ID to pathway name

   Returns:

   * binary_df (pd.DataFrame) -- DataFrame with binary values (0 or 1) matching reactions to pathways


.. py:function:: plot_reaction_in_pathway_heatmap(binary_df: pandas.DataFrame, font_size: int = 12, fig_width: int = 600, fig_height: int = 400, title: str = '')

   Function that plots a binary mapping matrix created with the `reaction_in_pathway_binary_matrix` function.

   Keyword arguments:

   * binary_df (pd.DataFrame) -- DataFrame with binary values (0 or 1)
   * font_size (int) -- Font size for axis labels and ticks
   * fig_width (int) -- Width of the figure in pixels
   * fig_height (int) -- Height of the figure in pixels
   * title (str) -- Title of the plot


.. py:function:: sort_reactions_by_model_order(full_list: List, *subsets: List) -> List

   Function that flattens the lists provided in the `subsets` argument (corresponding to reactions from different pathways) into a single list and then orders the elements of the new list by the order of the reactions in the initial model.

   Keyword arguments:

   * full_list (List) -- The reference list that defines the desired order.
   * ``*subsets`` (List) -- One or more subset lists to be merged and ordered.

   Returns:

   * sorted_merged -- A single merged list of all subsets, sorted by the order in full_list.

   Example usage:

   .. code-block:: python

      Glycolysis = ["PGI", "PFK", "FBA", "TPI", "GAPD", "PGK", "PGM", "ENO", "PYK"]
      PPP = ["G6PDH2r", "PGL", "GND", "RPE", "RPI", "TKT1", "TKT2", "TALA"]
      reactions_ordered = sort_reactions_by_model_order(ec_cobra_reactions_str, Glycolysis, PPP)


.. py:function:: subset_sampling_array_from_reaction_ids(samples: numpy.typing.NDArray[numpy.float64], model_reactions: List, subset_reactions: List = []) -> numpy.typing.NDArray[numpy.float64]

   Function that takes a sampling array with reactions as rows and samples as columns and subsets it to include only the reactions of interest.

   Keyword arguments:

   * samples (NumPy 2D array) -- A 2D sampling array with reactions as rows and samples as columns
   * model_reactions (List) -- A list containing the model's reactions
   * subset_reactions (List) -- A list containing the reactions of interest used to subset the sampling array

   Returns:

   * subset_samples (NDArray[np.float64]) -- Subset of the sampling array containing only the reactions of interest


.. py:function:: dictionary_map_reverse_reaction_id_to_pathway(reaction_id_to_pathway_dict: Dict, for_rev_reactions: List) -> Dict[str, str]

   Function that is used when bidirectional reactions have been split into separate forward and reverse reactions.
   It maps each reverse reaction to the pathway that its forward reaction maps to, and so enriches the dictionary created with the `dictionary_reaction_id_to_pathway` function.

   Keyword arguments:

   * reaction_id_to_pathway_dict (Dict) -- Dictionary mapping reaction IDs to pathway names
   * for_rev_reactions (List) -- List of the split reactions

   Returns:

   * reaction_id_forward_reverse_to_pathway_dict (Dict) -- Dictionary containing reaction-to-pathway mapping information for forward and reverse reactions separately
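
The behaviour described for `dictionary_reaction_id_to_pathway` and `sort_reactions_by_model_order` can be illustrated with a minimal, standard-library-only sketch. The helpers ``map_reactions_to_pathways`` and ``order_by_model`` below are hypothetical stand-ins written for this illustration, not the module's actual implementation, and the reaction lists are made up. The sketch also assumes that reactions missing from the reference list are dropped and that duplicates are collapsed, which the documentation does not specify.

.. code-block:: python

   from typing import Dict, List

   MULTIPLE = "Multiple-Pathways"


   def map_reactions_to_pathways(**named_lists: List[str]) -> Dict[str, str]:
       """Hypothetical stand-in: map each reaction ID to its pathway name."""
       mapping: Dict[str, str] = {}
       for pathway, reactions in named_lists.items():
           for rxn in reactions:
               if rxn in mapping and mapping[rxn] != pathway:
                   # Reaction already seen under a different pathway.
                   mapping[rxn] = MULTIPLE
               else:
                   mapping[rxn] = pathway
       return mapping


   def order_by_model(full_list: List[str], *subsets: List[str]) -> List[str]:
       """Hypothetical stand-in: merge subsets, keep the order of full_list."""
       wanted = {rxn for subset in subsets for rxn in subset}
       # Assumption: reactions absent from full_list are silently dropped.
       return [rxn for rxn in full_list if rxn in wanted]


   # Illustrative inputs only; "ENO" is in the model but in neither subset.
   model_reactions = ["PGI", "G6PDH2r", "PFK", "ENO", "TKT1", "FBA"]
   glycolysis = ["FBA", "PGI", "PFK"]
   ppp = ["TKT1", "G6PDH2r", "PGI"]

   pathway_of = map_reactions_to_pathways(Glycolysis=glycolysis, PPP=ppp)
   ordered = order_by_model(model_reactions, glycolysis, ppp)

Here ``PGI`` appears in both lists, so it is labelled "Multiple-Pathways", while ``ordered`` follows the order of ``model_reactions`` rather than the order in which the subsets were passed.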