# Graphs from correlation matrix

## Prerequisites for annotating the produced graph

The prerequisites for annotating the produced graph are the same as those described in the `## Prerequisites for annotating the produced dendrogram` section of the `Clustering from correlation matrix` unit:

- Make sure you have used the `correlated_reactions` function after splitting the reversible reactions into separate forward and reverse reactions
- Create a dictionary with the `dictionary_map_reverse_reaction_id_to_pathway` function
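To illustrate what these two prerequisites involve, here is a minimal pure-Python sketch. It is not the library's implementation. It splits each reversible flux column into a forward part `max(v, 0)` and a reverse part `max(-v, 0)`, and it gives every reverse reaction the pathway of its forward counterpart. The `_rev` suffix and the helper names `split_reversible` and `extend_group_map` are hypothetical.

```python
# Hypothetical naming convention: reverse copies get the suffix "_rev".
REV_SUFFIX = "_rev"


def split_reversible(samples, reactions, reversible):
    """Split each reversible flux column into forward and reverse parts.

    samples: list of flux vectors (one per sample), ordered like `reactions`.
    reversible: set of reaction IDs that can carry negative flux.
    Returns (new_samples, new_reactions).
    """
    new_reactions = []
    for rxn in reactions:
        new_reactions.append(rxn)
        if rxn in reversible:
            new_reactions.append(rxn + REV_SUFFIX)

    new_samples = []
    for vec in samples:
        row = []
        for rxn, v in zip(reactions, vec):
            if rxn in reversible:
                row.append(max(v, 0.0))   # forward part (positive flux)
                row.append(max(-v, 0.0))  # reverse part (magnitude of negative flux)
            else:
                row.append(v)
        new_samples.append(row)
    return new_samples, new_reactions


def extend_group_map(group_map, reactions):
    """Give each reverse reaction the pathway of its forward counterpart."""
    extended = dict(group_map)
    for rxn in reactions:
        if rxn.endswith(REV_SUFFIX):
            forward = rxn[: -len(REV_SUFFIX)]
            if forward in group_map:
                extended[rxn] = group_map[forward]
    return extended
```

For example, splitting samples `[[1.5, -2.0], [0.5, 3.0]]` for reactions `["PFK", "PGI"]`, with only `PGI` reversible, yields the reactions `["PFK", "PGI", "PGI_rev"]` and the samples `[[1.5, 0.0, 2.0], [0.5, 3.0, 0.0]]`.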

## Construct a graph

The `construct_graph` function creates a NetworkX graph from a linear correlation matrix, or from both a linear correlation matrix and a non-linear copula dependencies matrix. In this graph, reactions are nodes and correlation values are used as weighted edges between them. Users can also provide reaction-pathway mapping information in the `group_map` dictionary parameter, which is stored in the graph for later visualization.

- `linear_correlation_matrix` is a NumPy 2D array containing the linear correlation matrix
- `non_linear_correlation_matrix` is an optional NumPy 2D array containing the non-linear copula dependencies matrix
- `reactions` is a list of reaction names, ordered like the matrix indices
- `remove_unconnected_nodes` is a boolean variable that, if `True`, removes isolated nodes
- `correction` is a boolean variable that, if `True`, uses the absolute values of the correlations as edge weights
- `group_map` is a dictionary mapping reaction names to group names (pathways)

```python
G100_full, pos100_full = construct_graph(
    linear_correlation_matrix = linear_correlation_matrix_100_full,
    non_linear_correlation_matrix = non_linear_correlation_matrix_100_full,
    reactions = extended_reactions_100,
    remove_unconnected_nodes = False,
    correction = False,
    group_map = group_map_100_full)
```
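The sketch below shows how a correlation matrix can be turned into weighted edges with `weight` and `source` attributes. It is written in plain Python and is not the library's code. It assumes that every non-zero off-diagonal linear correlation becomes a `"linear"` edge and that a copula value is only used, as a `"copula"` edge, when the linear correlation for that pair is zero. This combination rule and the helper name `build_edges` are assumptions.

```python
def build_edges(linear, reactions, non_linear=None, correction=False,
                remove_unconnected_nodes=False):
    """Sketch: turn correlation matrices into a node list and weighted edges.

    Returns (nodes, edges), where edges maps (u, v) -> {"weight", "source"}.
    """
    n = len(reactions)
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: the matrix is symmetric
            w = linear[i][j]
            source = "linear"
            # Assumption: fall back to the copula value only when no linear edge exists
            if w == 0 and non_linear is not None:
                w = non_linear[i][j]
                source = "copula"
            if w == 0:
                continue  # no dependency, no edge
            if correction:
                w = abs(w)  # mirror the `correction` flag: absolute weights
            edges[(reactions[i], reactions[j])] = {"weight": w, "source": source}

    nodes = list(reactions)
    if remove_unconnected_nodes:
        connected = {node for pair in edges for node in pair}
        nodes = [r for r in nodes if r in connected]
    return nodes, edges
```

With NetworkX installed, each entry can be added to a `networkx.Graph` with `G.add_edge(u, v, **attrs)`, which gives the edges the `weight` and `source` attributes that the functions below expect.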


The `compute_nodes_centrality_metrics` function computes centrality measures for the nodes (reactions) of the graph: (A) weighted degree centrality (normalized by the number of nodes), (B) betweenness centrality and (C) clustering coefficient. The graph should be built from the full correlation matrices, not from a subset restricted to certain reactions or pathways.

- `G` is a `NetworkX` graph with nodes and weighted edges; edges should have 'weight' and 'source' attributes

```python
centrality_dict_100 = compute_nodes_centrality_metrics(
    G = G100_full)

print(centrality_dict_100.get("betweenness").get("PFK"))
print(centrality_dict_100.get("degree").get("PFK"))
print(centrality_dict_100.get("clustering").get("PFK"))
```

```python
0.00809659410427305
0.22683071469001767
0.6179079858733398
```
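To make two of these metrics concrete, here is a minimal pure-Python sketch on an edge list of `(u, v, weight)` tuples. It is not how the library computes them. The weighted degree here sums absolute edge weights and divides by `n - 1`, following NetworkX's convention for `degree_centrality`, and the clustering coefficient shown is the unweighted one. The exact normalization and weighting used by `compute_nodes_centrality_metrics` may differ.

```python
from collections import defaultdict
from itertools import combinations


def weighted_degree(nodes, edges):
    """Sum of |weight| over incident edges, divided by (n - 1) (assumed normalization)."""
    total = dict.fromkeys(nodes, 0.0)
    for u, v, w in edges:
        total[u] += abs(w)
        total[v] += abs(w)
    n = len(nodes)
    return {node: s / (n - 1) for node, s in total.items()} if n > 1 else total


def clustering(nodes, edges):
    """Unweighted local clustering: fraction of neighbour pairs that are linked."""
    nbrs = defaultdict(set)
    for u, v, _ in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    result = {}
    for node in nodes:
        k = len(nbrs[node])
        if k < 2:
            result[node] = 0.0  # fewer than two neighbours: no triangles possible
            continue
        links = sum(1 for a, b in combinations(nbrs[node], 2) if b in nbrs[a])
        result[node] = links / (k * (k - 1) / 2)
    return result
```

On the edges `("A","B",0.5)`, `("A","C",-0.5)`, `("B","C",1.0)` and `("C","D",0.25)`, node `B` gets a weighted degree of `1.5 / 3 = 0.5`. Node `C`, whose three neighbours share only one link, gets a clustering coefficient of `1/3`.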


Now, suppose we had created a second graph called `G0_full` from a different sampling dataset (one that requires at least 0% of the maximum biomass value as an objective) and computed its centralities, `centrality_dict_0`, with `compute_nodes_centrality_metrics`. The `compare_node_centralities` function compares node centralities between two centrality dictionaries for their shared nodes. It returns the differences obtained by subtracting the metric score in graph 2 (second argument) from the score in graph 1 (first argument).

- `centrality_dict_1` is a dictionary mapping node IDs to centrality values (first graph)
- `centrality_dict_2` is a dictionary mapping node IDs to centrality values (second graph)

```python
sorted_betwenness_nodes = compare_node_centralities(
    centrality_dict_1 = centrality_dict_100.get("betweenness"), 
    centrality_dict_2 = centrality_dict_0.get("betweenness"))

sorted_weighted_degree_nodes = compare_node_centralities(
    centrality_dict_1 = centrality_dict_100.get("degree"), 
    centrality_dict_2 = centrality_dict_0.get("degree"))

print(dict(sorted_betwenness_nodes).get("CYTBD"))
print(dict(sorted_weighted_degree_nodes).get("CYTBD"))
```

```python
0.1663975459581592
0.09272427760787
```
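The comparison described above can be sketched in a few lines of plain Python. This is an illustration, not the library's implementation. It keeps only shared nodes, computes `graph 1 - graph 2`, and returns a list of `(node, difference)` pairs, which is why `dict(...)` can be applied to the result. Sorting by signed difference, largest first, is an assumption, and `compare_centralities` is a hypothetical name.

```python
def compare_centralities(c1, c2):
    """Difference c1 - c2 for nodes present in both dicts, sorted largest first."""
    shared = c1.keys() & c2.keys()  # nodes present in both graphs
    diffs = [(node, c1[node] - c2[node]) for node in shared]
    # Assumption: sort by signed difference, descending
    return sorted(diffs, key=lambda item: item[1], reverse=True)
```

For example, `compare_centralities({"CYTBD": 0.75, "PFK": 0.25, "X": 0.5}, {"CYTBD": 0.25, "PFK": 0.5, "Y": 0.4})` ignores `X` and `Y` and returns `[("CYTBD", 0.5), ("PFK", -0.25)]`.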

Both centrality metrics agree that the `CYTBD` node has a higher centrality in the `G100_full` graph than in `G0_full`, since both differences are positive. Users are encouraged to examine the nodes with the most extreme differences between the two graphs to gain insight into the model's behavior.
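One way to follow that suggestion is to pull out the largest gains and losses from a list of `(node, difference)` pairs, such as the one returned by `compare_node_centralities`. The helper `extreme_differences` below is a hypothetical, pure-Python sketch. Positive differences mark nodes that are more central in graph 1, and negative differences mark nodes that are more central in graph 2.

```python
def extreme_differences(diffs, k=3):
    """Return the k largest gains and the k largest losses.

    diffs: list of (node, difference) pairs, difference = graph 1 - graph 2.
    """
    if k <= 0:
        return [], []
    by_value = sorted(diffs, key=lambda item: item[1])  # ascending
    losses = by_value[:k]                   # most negative: more central in graph 2
    gains = list(reversed(by_value[-k:]))   # most positive: more central in graph 1
    return gains, losses
```

With `diffs = [("A", 0.5), ("B", 0.1), ("C", -0.2), ("D", -0.4)]` and `k=2`, the gains are `[("A", 0.5), ("B", 0.1)]` and the losses are `[("D", -0.4), ("C", -0.2)]`.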


## Visualize the constructed graph

The `plot_graph` function plots a correlation-based NetworkX graph with multiple visual features, including node annotations, edge styles and clique-based shadowing. Here we plot a subgraph (`G100_glycolysis_ppp`, with node positions `pos100_glycolysis_ppp`) restricted to the two pathways used in the previous sections: Glycolysis and the Pentose Phosphate Pathway. The centrality metrics passed in the `centralities` parameter should be computed from the full network, not from the subgraph.

- `G` is a NetworkX graph with nodes and weighted edges; edges should have 'weight' and 'source' attributes
- `pos` is a dictionary of node positions
- `remove_clique_edges` is a boolean variable that, if `True`, removes the edges within positive cliques and between negative cliques and replaces them with shadow areas (green for positive and red for negative)
- `include_matrix2_in_cliques` is a boolean variable that, if `False`, uses only matrix1 (linear edges) for positive clique detection. If `True`, cliques may be computed from a combination of linear correlations (matrix1) and copula dependencies (matrix2)
- `min_clique_size` is the minimum size for a clique to be considered (default = 5)
- `shadow_edges` is the visualization mode for clique shadows:
    - `None` : no shadows
    - `positive`: show shadows around positive cliques
    - `negative`: show shadows between cliques with negative edges
    - `mixed`: show both types of shadows
- `centralities` is a dictionary containing precomputed centrality metrics
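The pathway-restricted graph passed as `G` is not constructed in this section. As a rough sketch, you could restrict edges and positions to the reactions whose `group_map` entry is one of the selected pathways. Centralities are still taken from the full graph. The helper `pathway_subgraph`, the `{(u, v): attrs}` edge format and the pathway labels are hypothetical. With NetworkX, `G.subgraph(nodes)` returns the induced subgraph directly, as a view that `.copy()` turns into an independent graph.

```python
def pathway_subgraph(edges, pos, group_map, pathways):
    """Keep only reactions mapped to the selected pathways and the edges among them.

    edges: dict {(u, v): attrs}; pos: {node: (x, y)}; group_map: {reaction: pathway}.
    Returns (sorted node list, filtered edges, filtered positions).
    """
    keep = {rxn for rxn, group in group_map.items() if group in pathways}
    sub_edges = {pair: attrs for pair, attrs in edges.items()
                 if pair[0] in keep and pair[1] in keep}  # induced edges only
    sub_pos = {node: xy for node, xy in pos.items() if node in keep}
    return sorted(keep), sub_edges, sub_pos
```

For example, with `group_map = {"PFK": "Glycolysis", "PGI": "Glycolysis", "G6PDH2r": "PPP", "CYTBD": "Oxidative Phosphorylation"}` and the pathways `{"Glycolysis", "PPP"}`, `CYTBD` and its edges are dropped.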

```python
plot_graph(
    G = G100_glycolysis_ppp, 
    pos = pos100_glycolysis_ppp, 
    remove_clique_edges = True, 
    include_matrix2_in_cliques = False, 
    min_clique_size = 5, 
    shadow_edges = "mixed", 
    centralities = centrality_dict_100)
```

![graph_plot](/img/graph_plot.png)
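The clique-based shadowing relies on detecting cliques. The pure-Python sketch below shows one standard way to do that, and it is not necessarily what `plot_graph` does internally. It finds maximal cliques on positive edges with the Bron-Kerbosch algorithm with pivoting and keeps those with at least `min_clique_size` nodes, which corresponds to `include_matrix2_in_cliques = False`. It then flags pairs of cliques joined by at least one negative edge, which are the candidates for red shadows. The helper names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def maximal_cliques(adjacency):
    """Bron-Kerbosch with pivoting; adjacency maps node -> set of neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is a maximal clique
            return
        pivot = max(p | x, key=lambda u: len(adjacency[u] & p))
        for v in list(p - adjacency[pivot]):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return cliques


def positive_cliques(edges, min_clique_size=5):
    """Maximal cliques built only from edges with positive weight."""
    adjacency = defaultdict(set)
    for u, v, w in edges:
        if w > 0:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return [c for c in maximal_cliques(adjacency) if len(c) >= min_clique_size]


def negative_clique_pairs(cliques, edges):
    """Index pairs of cliques linked by at least one negative edge."""
    negative = {frozenset((u, v)) for u, v, w in edges if w < 0}
    pairs = []
    for i, j in combinations(range(len(cliques)), 2):
        if any(frozenset((a, b)) in negative
               for a in cliques[i] for b in cliques[j]):
            pairs.append((i, j))
    return pairs
```

On two positive triangles `{A, B, C}` and `{D, E, F}` joined by a negative edge `C-D`, `positive_cliques(edges, min_clique_size=3)` finds both triangles, and `negative_clique_pairs` reports that the two cliques are linked. With the default size of 5, neither triangle qualifies.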