Custom Function Definitions
- class functions.GCN(input_dim, hidden_feats, num_classes)
Graph Convolutional Network (GCN) model.
Inherits from PyTorch's nn.Module.
- Parameters:
input_dim (int) – The dimensionality of the input features.
hidden_feats (list) – A list of integers representing the number of hidden units in each layer.
num_classes (int) – The number of output classes.
- functions.abs_bicorr(data)
Calculate the absolute biweight midcorrelation matrix for the given data.
- Parameters:
data (pandas.DataFrame) – The input DataFrame containing numeric data.
- Returns:
The absolute biweight midcorrelation matrix.
- Return type:
pandas.DataFrame
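As a point of reference, the absolute biweight midcorrelation between two variables can be sketched in plain Python. This is a minimal illustration under assumptions, not the body of abs_bicorr: the helper names `_biweight_terms` and `abs_bicor_pair` are hypothetical, plain lists stand in for DataFrame columns, and the tuning constant 9 is the conventional choice rather than something the documentation states.

```python
from math import sqrt
from statistics import median


def _biweight_terms(values):
    """Return the normalised, biweight-weighted deviations of one variable."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; the biweight midcorrelation is undefined")
    terms = []
    for v in values:
        u = (v - med) / (9 * mad)
        # Points far from the median (|u| >= 1) get zero weight.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        terms.append((v - med) * w)
    norm = sqrt(sum(t * t for t in terms))
    return [t / norm for t in terms]


def abs_bicor_pair(x, y):
    """Absolute biweight midcorrelation of two equal-length sequences."""
    tx, ty = _biweight_terms(x), _biweight_terms(y)
    return abs(sum(a * b for a, b in zip(tx, ty)))


x = [1, 2, 3, 4, 10]
same = abs_bicor_pair(x, x)                    # perfectly correlated
flipped = abs_bicor_pair(x, [-v for v in x])   # perfectly anti-correlated
```

Taking the absolute value means strongly negative and strongly positive relationships are treated alike, so both calls above give a value of 1 up to floating-point error.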
- functions.analyse_and_plot_density(graph)
Calculates and plots the density of the graph for a predefined series of thresholds.
- Parameters:
graph (nx.Graph) – The original NetworkX graph.
- Returns:
Densities of the graph at each threshold.
- Return type:
list of float
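The density of an undirected simple graph with n nodes and m edges is 2m / (n(n - 1)). The sketch below shows how density changes as a weight threshold rises. It is an illustration under assumptions: the helper names, the rule that an edge survives when its weight is at least the threshold, and the threshold values are all hypothetical, since the predefined series used by analyse_and_plot_density is not documented here.

```python
def graph_density(num_nodes, num_edges):
    """Density of an undirected simple graph: 2m / (n(n - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


def densities_by_threshold(nodes, weighted_edges, thresholds):
    """Density after keeping only edges with weight >= each threshold."""
    densities = []
    for t in thresholds:
        kept = sum(1 for _, _, w in weighted_edges if w >= t)
        # The node set is held fixed; only edges are dropped (assumption).
        densities.append(graph_density(len(nodes), kept))
    return densities


nodes = ["a", "b", "c", "d"]
edges = [("a", "b", 0.9), ("b", "c", 0.7), ("c", "d", 0.5)]
densities = densities_by_threshold(nodes, edges, [0.4, 0.6, 0.8])
# 3, 2 and 1 surviving edges out of 6 possible -> 1/2, 1/3, 1/6
```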
- functions.calc_abs_bicorr(data)
Calculate the absolute biweight midcorrelation matrix for numeric data.
- Parameters:
data (pd.DataFrame) – Input DataFrame with numeric data.
- Returns:
DataFrame containing the absolute biweight midcorrelation matrix.
- Return type:
pd.DataFrame
- functions.clean_graph(G, degree_threshold=1, keep_largest_component=True)
Cleans the graph by performing several cleaning steps:
- Removes unconnected nodes (isolates)
- Removes self-loops
- Removes nodes with a degree below a specified threshold
- Keeps only the largest connected component (optional)
- Parameters:
G (nx.Graph) – The NetworkX graph to clean.
degree_threshold (int) – Minimum degree for nodes to keep.
keep_largest_component (bool) – Whether to keep only the largest connected component.
- Returns:
The cleaned graph.
- Return type:
nx.Graph
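The cleaning steps listed for clean_graph can be sketched on a plain adjacency mapping. This is a hedged illustration rather than the function's implementation: `clean_adjacency` is a hypothetical name, a dict of neighbour sets stands in for the NetworkX graph, and applying the degree filter in a single pass is an assumption.

```python
from collections import deque


def clean_adjacency(adj, degree_threshold=1, keep_largest_component=True):
    """Apply the four cleaning steps to a symmetric {node: set(neighbours)} map."""
    # Work on a copy so the caller's graph is left untouched.
    adj = {node: set(nbrs) for node, nbrs in adj.items()}

    # 1. Remove isolates (nodes with no neighbours at all).
    adj = {node: nbrs for node, nbrs in adj.items() if nbrs}

    # 2. Remove self-loops.
    for node in adj:
        adj[node].discard(node)

    # 3. Remove nodes whose degree is below the threshold. This is a single
    #    pass: neighbours that drop below the threshold as a result are not
    #    rechecked.
    keep = {node for node, nbrs in adj.items() if len(nbrs) >= degree_threshold}
    adj = {node: adj[node] & keep for node in keep}

    # 4. Optionally keep only the largest connected component (BFS).
    if keep_largest_component and adj:
        seen, best = set(), set()
        for start in adj:
            if start in seen:
                continue
            component, queue = {start}, deque([start])
            while queue:
                current = queue.popleft()
                for nb in adj[current]:
                    if nb not in component:
                        component.add(nb)
                        queue.append(nb)
            seen |= component
            if len(component) > len(best):
                best = component
        adj = {node: adj[node] & best for node in best}
    return adj


graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},  # triangle
    "d": {"e"}, "e": {"d"},                             # separate pair
    "f": set(),                                         # isolate
    "g": {"g"},                                         # self-loop only
}
cleaned = clean_adjacency(graph)
all_components = clean_adjacency(graph, keep_largest_component=False)
```

With the defaults only the triangle a, b, c survives; with keep_largest_component=False the pair d, e is retained as well.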
- functions.create_graph_from_correlation(correlation_matrix, threshold=0.8)
Creates a graph from a correlation matrix using a specified threshold.
- Parameters:
correlation_matrix (pd.DataFrame) – DataFrame containing the correlation matrix.
threshold (float) – Threshold for including edges based on correlation value.
- Returns:
Graph created from the correlation matrix.
- Return type:
nx.Graph
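Thresholding a correlation matrix into an edge list might look like the sketch below. Several details are assumptions, because the documentation does not state them: the comparison uses the absolute correlation, the boundary is inclusive (>=), and the diagonal is skipped. `edges_from_correlation` is a hypothetical helper, and a nested dict stands in for the DataFrame.

```python
def edges_from_correlation(corr, threshold=0.8):
    """Return (u, v, weight) for each pair whose |correlation| >= threshold."""
    names = list(corr)
    edges = []
    for i, u in enumerate(names):
        # Only look at the upper triangle so each pair appears once and the
        # diagonal (self-correlation of 1.0) never becomes a self-loop.
        for v in names[i + 1:]:
            weight = corr[u][v]
            if abs(weight) >= threshold:
                edges.append((u, v, weight))
    return edges


corr = {
    "g1": {"g1": 1.0, "g2": 0.85, "g3": 0.10},
    "g2": {"g1": 0.85, "g2": 1.0, "g3": -0.90},
    "g3": {"g1": 0.10, "g2": -0.90, "g3": 1.0},
}
edges = edges_from_correlation(corr)
# -> [("g1", "g2", 0.85), ("g2", "g3", -0.9)]
```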
- functions.draw_network_with_node_attrs(G, node_attributes, communities=None, title='Network Visualization', color_attr=None, shape_attr=None, figsize=(20, 10), layout='spring', cmap_name='tab20', with_labels=False, savefig=False, save_path_prefix='')
Draws a network graph with nodes colored and shaped based on their attributes, and optionally colored by community membership.
- Parameters:
G (networkx.Graph) – The graph to be drawn.
node_attributes (dict) – A dictionary where keys are node names and values are dictionaries of attributes.
communities (List[List[Any]], optional) – A list where each sublist contains the nodes belonging to a community. Default is None.
title (str, optional) – The title of the plot. Default is 'Network Visualization'.
color_attr (str, optional) – Node attribute to color nodes by. Default is None.
shape_attr (str, optional) – Node attribute to shape nodes by. Default is None.
figsize (tuple, optional) – The size of the figure. Default is (20, 10).
layout (str, optional) – The layout algorithm for positioning nodes ('spring', 'circular', etc.). Default is 'spring'.
cmap_name (str, optional) – The name of the colormap to use for coloring. Default is 'tab20'.
with_labels (bool, optional) – Whether to draw labels for the nodes. Default is False.
savefig (bool, optional) – Whether to save the figure. Default is False.
save_path_prefix (str, optional) – Prefix of the path used when saving the figure. Default is ''.
- Raises:
ValueError – If the graph G is empty or not defined.
ValueError – If node_attributes is empty or not defined.
The function draws the graph with nodes colored and/or shaped based on their attributes. If communities are provided, nodes are colored by their community memberships. A legend is added to indicate the mapping of attributes to colors and shapes.
- functions.evaluate(split, device, g, h, model, labels)
Evaluate the performance of a model on a given dataset split.
- Parameters:
split (Tensor) – The index of the dataset split to evaluate.
device – The device to perform the evaluation on.
g – The graph input to the model.
h – The node features input to the model.
model – The model to evaluate.
labels – The ground truth labels for the dataset.
- Returns:
- A tuple containing the evaluation metrics:
loss (float): The loss value.
acc (float): The accuracy value.
F1 (float): The F1 score.
PRC (float): The precision-recall curve value.
SNS (float): The sensitivity value.
- Return type:
tuple
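To make the accuracy, F1, and sensitivity entries concrete, the sketch below computes them from hard predictions for a binary task. It is illustrative only and rests on assumptions: `binary_metrics` is a hypothetical helper, the task is treated as binary with positive class 1, and it does not attempt to reproduce the loss or PRC values, whose exact definitions are not given in this reference.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall) and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "acc": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
# 2 true positives, 1 false positive, 1 false negative, 4 of 6 correct
```

For a multi-class problem the per-class values would need to be averaged (macro, micro, or weighted); which scheme evaluate uses is not stated here.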
- functions.fetch_gene_info(gene_list)
Fetches gene information from MyGene.info.
- Parameters:
gene_list (list) – List of gene symbols or Ensembl IDs.
- Returns:
List of dictionaries containing gene information.
- Return type:
list
- functions.filter_high_variance_genes(data, threshold)
Filter out genes with variance below the specified threshold.
Calculates the variance of each gene across samples and drops genes whose variance is below the threshold.
- Parameters:
data (DataFrame) – Gene expression data with genes as columns and samples as rows.
threshold (float) – Minimum variance level to retain a gene.
- Returns:
Filtered data containing only the genes that pass the variance threshold.
- Return type:
DataFrame
- functions.filter_low_expression_genes(data, threshold=1.0)
Filter out low-expressed genes from the dataset.
Calculates the mean expression level of each gene and drops genes whose mean is below the threshold.
- Parameters:
data (DataFrame) – Expression data with genes as columns.
threshold (float) – Minimum mean expression level to retain a gene. Default is 1.0.
- Returns:
Filtered data containing only the genes that pass the expression threshold.
- Return type:
DataFrame
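The two gene filters can be sketched in plain Python, with a dict of per-gene value lists standing in for the DataFrame. The helper names are hypothetical, the use of sample variance (n - 1 denominator, the pandas default) is an assumption, and so is the inclusive boundary: a gene exactly at the threshold is kept here, which the documentation does not specify.

```python
from statistics import mean, variance


def filter_low_expression(data, threshold=1.0):
    """Keep genes whose mean expression is at least `threshold`."""
    return {gene: vals for gene, vals in data.items() if mean(vals) >= threshold}


def filter_high_variance(data, threshold):
    """Keep genes whose sample variance is at least `threshold`."""
    return {gene: vals for gene, vals in data.items() if variance(vals) >= threshold}


expression = {
    "geneA": [0.1, 0.2, 0.3],  # low mean, removed by the expression filter
    "geneB": [5.0, 5.1, 4.9],  # high mean but nearly constant
    "geneC": [1.0, 9.0, 5.0],  # high mean and high variance
}
expressed = filter_low_expression(expression, threshold=1.0)
variable = filter_high_variance(expressed, threshold=1.0)
# expressed keeps geneB and geneC; variable keeps only geneC
```

Running the expression filter first keeps the variance step from retaining genes that vary only because of noise around zero, although the order is a workflow choice rather than a requirement.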
- functions.gen_graph_legend(node_colours, G, attr)
Generate a legend for a graph based on node colors and attributes.
- Parameters:
node_colours (pd.Series) – A series of node colors.
G (networkx.Graph) – The graph object.
attr (str) – The attribute to use for labeling.
- Returns:
A list of matplotlib patches representing the legend.
- Return type:
list
- functions.get_edge_attributes(G)
Extracts edge attributes from a graph.
- Parameters:
G (networkx.Graph) – The graph from which to extract edge attributes.
- Returns:
A list of edge attributes.
- Return type:
list
- Raises:
ValueError – If the graph G is empty or not defined.
- functions.get_highest_degree_nodes(graph, top_n=10)
Returns the nodes with the highest degree in the graph.
- Parameters:
graph (nx.Graph) – The NetworkX graph.
top_n (int) – The number of top nodes to return.
- Returns:
A list in which each tuple contains a node and its degree.
- Return type:
list of tuple
- functions.get_k_neighbours(df, k)
Returns a dictionary of k-nearest neighbors for each node in the dataframe.
- Parameters:
df (DataFrame) – The similarity matrix dataframe.
k (int) – The number of neighbors to retrieve.
- Returns:
A dictionary where the keys are the nodes and the values are lists of k-nearest neighbors.
- Return type:
dict
- functions.knn_sparsification(graph, k)
Sparsifies the graph by keeping only the top-k edges with the highest weights for each node.
- Parameters:
graph (nx.Graph) – The original NetworkX graph.
k (int) – The number of nearest neighbors to keep for each node.
- Returns:
The sparsified graph.
- Return type:
nx.Graph
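A k-nearest-neighbour sparsification of a weighted edge list could be sketched as follows. The helper `knn_sparsify` is hypothetical, and the rule for combining per-node choices is an assumption: an edge is kept if either endpoint ranks it among its k heaviest edges, so some nodes may end up with more than k neighbours.

```python
from collections import defaultdict


def knn_sparsify(edges, k):
    """Keep each edge that is among the k heaviest edges of either endpoint."""
    incident = defaultdict(list)
    for u, v, w in edges:
        incident[u].append((w, u, v))
        incident[v].append((w, u, v))

    kept = set()
    for node_edges in incident.values():
        # Heaviest edges first, then take this node's top k.
        node_edges.sort(key=lambda e: e[0], reverse=True)
        kept.update(node_edges[:k])
    return sorted((u, v, w) for w, u, v in kept)


edges = [
    ("a", "b", 0.9), ("a", "c", 0.8), ("a", "d", 0.2),
    ("b", "c", 0.7), ("c", "d", 0.1),
]
sparse = knn_sparsify(edges, k=1)
# -> [("a", "b", 0.9), ("a", "c", 0.8), ("a", "d", 0.2)]
```

In the example node a keeps three edges even though k is 1, because b, c, and d each choose their edge to a as their strongest link. Requiring both endpoints to agree would give a sparser result.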
- functions.message_passing(node, G)
Perform message passing for a given node in a graph.
- Parameters:
node (int) – The node for which message passing is performed.
G (networkx.Graph) – The graph containing the node and its neighbors.
- Returns:
The aggregated message from the neighboring nodes.
- Return type:
numpy.ndarray
Notes
This function gathers the messages for a single node and is used by message_passing_iteration. It performs two steps:
Propagation: gathers the node features of all neighboring nodes.
Aggregation: aggregates the gathered messages using the median.
- functions.message_passing_iteration(G)
Perform message passing iteration on a graph.
- Parameters:
G (networkx.Graph) – The input graph.
- Returns:
None
- Description:
Iterates through all nodes in the graph and updates each node's features based on the aggregated message from its neighboring nodes.
Note
The function message_passing is defined above.
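The propagation and median aggregation described for message_passing, and the per-node loop of message_passing_iteration, can be sketched without NetworkX. This is a minimal illustration under assumptions: the names are hypothetical, features live in a separate dict rather than as node attributes, the aggregated message replaces the old features outright, all nodes are updated from the same snapshot, and a new dict is returned instead of updating in place.

```python
from statistics import median


def aggregate_neighbours(node, adj, features):
    """Median-aggregate the feature vectors of a node's neighbours."""
    # Propagation: gather the feature vectors of all neighbours.
    neighbour_feats = [features[nb] for nb in adj[node]]
    if not neighbour_feats:
        # A node without neighbours receives no message; keep its features.
        return list(features[node])
    # Aggregation: element-wise median across the gathered vectors.
    return [median(dim) for dim in zip(*neighbour_feats)]


def message_passing_step(adj, features):
    """One synchronous round: every node is updated from the old features."""
    return {node: aggregate_neighbours(node, adj, features) for node in adj}


adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
features = {
    0: [0.0, 0.0],
    1: [1.0, 10.0],
    2: [2.0, 20.0],
    3: [9.0, 30.0],
}
updated = message_passing_step(adj, features)
# Node 0 receives [2.0, 20.0], the element-wise median of its neighbours.
```

The median makes the aggregate robust to a single outlying neighbour: node 3's first feature of 9.0 does not pull node 0's update the way a mean would.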
- functions.pearson_corr(data)
Calculate the Pearson correlation coefficient matrix for a given DataFrame.
- Parameters:
data (pandas.DataFrame) – The input DataFrame containing numeric data.
- Returns:
The correlation coefficient matrix.
- Return type:
pandas.DataFrame
- functions.plot_degree_distribution(G)
Plots the degree distribution of the graph.
- Parameters:
G (nx.Graph) – The NetworkX graph.
- functions.plot_knn_network(df, K, labels, node_colours=['skyblue'], node_size=300)
Plots a k-nearest neighbors network based on the given dataframe and parameters.
- Parameters:
df (pandas.DataFrame) – The input dataframe.
K (int) – The number of nearest neighbors to consider.
labels (pandas.Series) – The labels for each node in the dataframe.
node_colours (list, optional) – The colors for the nodes. Defaults to ['skyblue'].
node_size (int, optional) – The size of the nodes. Defaults to 300.
- Returns:
The NetworkX graph representing the k-nearest neighbors network.
- Return type:
nx.Graph
- functions.print_gene_info_with_degree(top_genes_with_degrees, gene_info)
Prints gene information including the degree.
- Parameters:
top_genes_with_degrees (list) – List of tuples containing gene symbols and their degrees.
gene_info (list) – List of dictionaries containing gene information.
- functions.print_graph_info(G)
Print basic information about a NetworkX graph.
- Parameters:
G (nx.Graph) – The NetworkX graph.
- functions.remove_by_degree(graph, min_degree)
Sparsifies the graph by removing nodes with degree below the specified threshold.
- Parameters:
graph (nx.Graph) – The original NetworkX graph.
min_degree (int) – The minimum degree threshold.
- Returns:
The sparsified graph.
- Return type:
nx.Graph
- functions.spanning_tree_sparsification(graph)
Sparsifies the graph by creating a minimum spanning tree.
- Parameters:
graph (nx.Graph) – The original NetworkX graph.
- Returns:
The sparsified graph.
- Return type:
nx.Graph
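For readers unfamiliar with minimum spanning trees, the sketch below builds one with Kruskal's algorithm and a union-find structure. It illustrates the technique only: `minimum_spanning_tree` here is a standalone hypothetical helper operating on an edge list, and which algorithm spanning_tree_sparsification uses internally is not stated in this reference.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm on (u, v, weight) edges; returns the tree's edges."""
    parent = {node: node for node in nodes}

    def find(node):
        # Walk to the root of the node's set, halving the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    # Consider the lightest edges first.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The edge joins two separate components, so it cannot form a cycle.
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


nodes = ["a", "b", "c", "d"]
edges = [
    ("a", "b", 0.9), ("a", "c", 0.2), ("b", "c", 0.5),
    ("c", "d", 0.7), ("b", "d", 0.3),
]
tree = minimum_spanning_tree(nodes, edges)
# -> [("a", "c", 0.2), ("b", "d", 0.3), ("b", "c", 0.5)]
```

A spanning tree of a connected graph with n nodes always has n - 1 edges, which makes this the most aggressive of the sparsification options that still keeps every node reachable. Because a minimum tree prefers the lightest edges, it retains the weakest links when weights represent similarity.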
- functions.threshold_sparsification(graph, threshold)
Sparsifies the graph by removing edges below the specified weight threshold.
- Parameters:
graph (nx.Graph) – The original NetworkX graph.
threshold (float) – The weight threshold.
- Returns:
The sparsified graph.
- Return type:
nx.Graph
- functions.top_percentage_sparsification(graph, top_percentage)
Sparsifies the graph by keeping the top percentage of edges by weight.
- Parameters:
graph (nx.Graph) – The original NetworkX graph.
top_percentage (float) – The percentage of top-weight edges to keep.
- Returns:
The sparsified graph.
- Return type:
nx.Graph
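Keeping the top percentage of edges by weight amounts to ranking the edges and truncating the list. The sketch below assumes the percentage is given on a 0 to 100 scale and that the edge count is rounded up; neither detail is specified in this reference, and `top_percentage_edges` is a hypothetical helper.

```python
import math


def top_percentage_edges(edges, top_percentage):
    """Return the heaviest `top_percentage` percent of (u, v, weight) edges."""
    ranked = sorted(edges, key=lambda e: e[2], reverse=True)
    # Round up so that a small positive percentage keeps at least one edge.
    n_keep = math.ceil(len(ranked) * top_percentage / 100)
    return ranked[:n_keep]


edges = [
    ("a", "b", 0.9), ("a", "c", 0.8), ("a", "d", 0.2),
    ("b", "c", 0.7), ("c", "d", 0.1),
]
strongest = top_percentage_edges(edges, 40)
# 40% of 5 edges -> the 2 heaviest: [("a", "b", 0.9), ("a", "c", 0.8)]
```

Unlike a fixed weight threshold, this approach fixes the number of surviving edges, which keeps graph density comparable across datasets whose weight distributions differ.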
- functions.train(g, h, train_split, val_split, device, model, labels, epochs, lr)
Trains a model on the given graph and node features for a fixed number of epochs, tracking loss on the training and validation splits.
- Parameters:
g (Graph) – The graph object.
h (Tensor) – The node features tensor.
train_split (Tensor) – The train split tensor.
val_split (Tensor) – The validation split tensor.
device (str) – The device to train the model on.
model (nn.Module) – The model to train.
labels (Tensor) – The labels tensor.
epochs (int) – The number of training epochs.
lr (float) – The learning rate.
- Returns:
A tuple containing the figures for training and validation loss.
- Return type:
tuple