Custom Function Definitions

class functions.GCN(input_dim, hidden_feats, num_classes)[source]#

Graph Convolutional Network (GCN) class.

This class represents a Graph Convolutional Network (GCN) model. It inherits from the nn.Module class of PyTorch.

Parameters:
  • input_dim (int) – The dimensionality of the input features.

  • hidden_feats (list) – A list of integers representing the number of hidden units in each layer.

  • num_classes (int) – The number of output classes.

forward(g, h)[source]#

Forward pass of the GCN model.

This method performs the forward pass of the GCN model for an arbitrary number of layers.

Parameters:
  • g (Graph) – The input graph.

  • h (Tensor) – The input node features.

Returns:

The output scores of the model.

Return type:

Tensor
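
To make the "arbitrary number of layers" behaviour concrete, here is a minimal NumPy sketch of GCN propagation. It is not the class's PyTorch implementation: the symmetric normalisation with added self-loops, the ReLU between layers, and the helper names normalised_adjacency and gcn_forward are all assumptions made for illustration.

```python
import numpy as np


def normalised_adjacency(A):
    """Return D^-1/2 (A + I) D^-1/2 (assumed normalisation, with self-loops)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(A, H, weights):
    """Apply one graph convolution per weight matrix; ReLU on all but the last."""
    A_norm = normalised_adjacency(A)
    for i, W in enumerate(weights):
        H = A_norm @ H @ W
        if i < len(weights) - 1:
            H = np.maximum(H, 0.0)  # ReLU (assumed activation)
    return H


# Hypothetical sizes mirroring the constructor arguments.
input_dim, hidden_feats, num_classes = 5, [8, 4], 3
dims = [input_dim] + hidden_feats + [num_classes]

rng = np.random.default_rng(0)
weights = [rng.normal(size=(m, n)) for m, n in zip(dims[:-1], dims[1:])]

# A 4-node path graph and random node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, input_dim))

scores = gcn_forward(A, H, weights)  # one row of class scores per node
```

The list hidden_feats only determines how many weight matrices are built, which is why the forward loop works for any depth.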

functions.abs_bicorr(data)[source]#

Calculate the absolute biweight midcorrelation matrix for the given data.

Parameters:

data (pandas.DataFrame) – The input DataFrame containing numeric data.

Returns:

The absolute biweight midcorrelation matrix.

Return type:

pandas.DataFrame
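
For readers unfamiliar with the statistic, the following is a minimal sketch of the biweight midcorrelation in its usual median/MAD form, applied to every pair of columns. It assumes genes are stored as columns and that no column has a median absolute deviation of zero; the helper names are hypothetical and this is not necessarily how abs_bicorr is implemented.

```python
import numpy as np
import pandas as pd


def biweight_midcorrelation(x, y):
    """Biweight midcorrelation of two 1-D arrays (assumes a nonzero MAD)."""
    def weighted_deviations(v):
        v = np.asarray(v, dtype=float)
        med = np.median(v)
        mad = np.median(np.abs(v - med))  # median absolute deviation
        u = (v - med) / (9.0 * mad)
        # Points 9 or more MADs away from the median get zero weight.
        w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
        return (v - med) * w

    a, b = weighted_deviations(x), weighted_deviations(y)
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))


def abs_bicorr_sketch(data):
    """Absolute biweight midcorrelation between all pairs of columns."""
    cols = list(data.columns)
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, ci in enumerate(cols):
        for cj in cols[i + 1:]:
            r = abs(biweight_midcorrelation(data[ci], data[cj]))
            out.loc[ci, cj] = out.loc[cj, ci] = r
    return out


demo = pd.DataFrame({
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [-1.0, -2.0, -3.0, -4.0, -5.0],  # perfectly anti-correlated with g1
    "g3": [2.0, 1.0, 4.0, 3.0, 6.0],
})
corr = abs_bicorr_sketch(demo)
```

Because the absolute value is taken, the anti-correlated pair g1/g2 scores 1, the same as a perfectly correlated pair.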

functions.analyse_and_plot_density(graph)[source]#

Calculates and plots the density of the graph for a predefined series of thresholds.

Parameters: graph (nx.Graph): The original NetworkX graph.

Returns: densities (list of float): Densities of the graph at each threshold.
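
As an illustration of the calculation (plotting omitted), the sketch below recomputes the density after discarding edges lighter than each threshold. The threshold values, the "weight" edge attribute, and the choice to keep all nodes are assumptions; the function's predefined thresholds are not listed on this page.

```python
import networkx as nx


def densities_over_thresholds(graph, thresholds):
    """Density of the graph after dropping edges lighter than each threshold."""
    densities = []
    for t in thresholds:
        pruned = nx.Graph()
        pruned.add_nodes_from(graph.nodes)  # keep every node, even if isolated
        pruned.add_edges_from(
            (u, v, d) for u, v, d in graph.edges(data=True)
            if d.get("weight", 0.0) >= t  # assumed attribute name and cut-off rule
        )
        densities.append(nx.density(pruned))
    return densities


G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 0.9), ("b", "c", 0.7), ("c", "d", 0.5)])
densities = densities_over_thresholds(G, [0.4, 0.6, 0.8])
```

With four nodes there are six possible edges, so keeping 3, 2, and 1 edges gives densities of 1/2, 1/3, and 1/6.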

functions.calc_abs_bicorr(data)[source]#

Calculate the absolute biweight midcorrelation matrix for numeric data.

Parameters: data (pd.DataFrame): Input DataFrame with numeric data.

Returns: pd.DataFrame: DataFrame containing the absolute biweight midcorrelation matrix.

functions.clean_graph(G, degree_threshold=1, keep_largest_component=True)[source]#

Cleans the graph by performing several cleaning steps:
  • Removes unconnected nodes (isolates)

  • Removes self-loops

  • Removes nodes with a degree below a specified threshold

  • Keeps only the largest connected component (optional)

Parameters:
  • G (nx.Graph) – The NetworkX graph to clean.

  • degree_threshold (int) – Minimum degree for nodes to keep.

  • keep_largest_component (bool) – Whether to keep only the largest connected component.

Returns: G (nx.Graph): Cleaned graph.
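
The cleaning steps can be sketched with standard NetworkX calls as follows. The step order follows the list above, and working on a copy rather than in place is an assumption; clean_graph itself may differ.

```python
import networkx as nx


def clean_graph_sketch(G, degree_threshold=1, keep_largest_component=True):
    """Apply the four cleaning steps to a copy of G (copying is an assumption)."""
    G = G.copy()
    G.remove_nodes_from(list(nx.isolates(G)))        # unconnected nodes
    G.remove_edges_from(list(nx.selfloop_edges(G)))  # self-loops
    low_degree = [n for n, deg in G.degree() if deg < degree_threshold]
    G.remove_nodes_from(low_degree)
    if keep_largest_component and G.number_of_nodes() > 0:
        largest = max(nx.connected_components(G), key=len)
        G = G.subgraph(largest).copy()
    return G


G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"),  # triangle
                  ("d", "e"),                          # small second component
                  ("a", "a")])                         # self-loop
G.add_node("f")                                        # isolate
cleaned = clean_graph_sketch(G)
```

Only the triangle survives: the isolate and the self-loop are removed first, and the two-node component is dropped by the largest-component step.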

functions.create_graph_from_correlation(correlation_matrix, threshold=0.8)[source]#

Creates a graph from a correlation matrix using a specified threshold.

Parameters:
  • correlation_matrix (pd.DataFrame) – DataFrame containing the correlation matrix.

  • threshold (float) – Threshold for including edges based on correlation value.

Returns: G (nx.Graph): Graph created from the correlation matrix.
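
A minimal sketch of thresholding a correlation matrix into a graph is shown below. It assumes an inclusive comparison, that every gene becomes a node even without edges, and that the correlation is stored as a "weight" edge attribute; none of these details are confirmed by this page.

```python
import networkx as nx
import pandas as pd


def graph_from_correlation_sketch(correlation_matrix, threshold=0.8):
    """Add an edge for every pair whose correlation reaches the threshold."""
    G = nx.Graph()
    genes = list(correlation_matrix.index)
    G.add_nodes_from(genes)
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:  # upper triangle only, so no self-loops
            weight = correlation_matrix.loc[gi, gj]
            if weight >= threshold:  # inclusive cut-off (assumption)
                G.add_edge(gi, gj, weight=float(weight))
    return G


genes = ["g1", "g2", "g3"]
corr = pd.DataFrame(
    [[1.00, 0.85, 0.30],
     [0.85, 1.00, 0.80],
     [0.30, 0.80, 1.00]],
    index=genes, columns=genes,
)
G = graph_from_correlation_sketch(corr, threshold=0.8)
```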

functions.draw_network_with_node_attrs(G, node_attributes, communities=None, title='Network Visualization', color_attr=None, shape_attr=None, figsize=(20, 10), layout='spring', cmap_name='tab20', with_labels=False, savefig=False, save_path_prefix='')[source]#

Draws a network graph with nodes colored and shaped based on their attributes, and optionally colored by community membership.

Parameters:
  • G (networkx.Graph) – The graph to be drawn.

  • node_attributes (dict) – A dictionary where keys are node names and values are dictionaries of attributes.

  • communities (List[List[Any]], optional) – A list where each sublist contains the nodes belonging to a community. Default is None.

  • title (str, optional) – The title of the plot. Default is ‘Network Visualization’.

  • color_attr (str, optional) – Node attribute to color nodes by. Default is None.

  • shape_attr (str, optional) – Node attribute to shape nodes by. Default is None.

  • figsize (tuple, optional) – The size of the figure. Default is (20, 10).

  • layout (str, optional) – The layout algorithm for positioning nodes (‘spring’, ‘circular’, etc.). Default is ‘spring’.

  • cmap_name (str, optional) – The name of the colormap to use for coloring. Default is ‘tab20’.

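  • with_labels (bool, optional) – Whether to draw labels for the nodes. Default is False.

  • savefig (bool, optional) – Whether to save the figure to disk. Default is False.

  • save_path_prefix (str, optional) – Prefix for the path of the saved figure. Default is ‘’.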

Raises:
  • ValueError – If the graph G is empty or not defined.

  • ValueError – If node_attributes is empty or not defined.

The function draws the graph with nodes colored and/or shaped based on their attributes. If communities are provided, nodes are colored by their community memberships. A legend is added to indicate the mapping of attributes to colors and shapes.

functions.evaluate(split, device, g, h, model, labels)[source]#

Evaluate the performance of a model on a given dataset split.

Parameters:
  • split (Tensor) – The index of the dataset split to evaluate.

  • device – The device to perform the evaluation on.

  • g – The graph input to the model.

  • h – The node features input to the model.

  • model – The model to evaluate.

  • labels – The ground truth labels for the dataset.

Returns:

A tuple containing the evaluation metrics:
  • loss (float): The loss value.

  • acc (float): The accuracy value.

  • F1 (float): The F1 score.

  • PRC (float): The precision-recall curve value.

  • SNS (float): The sensitivity value.

Return type:

tuple
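
To show how three of the returned metrics relate to each other, here is a plain-Python sketch for binary labels. The binary setting and the helper name binary_metrics are assumptions; the page does not state how many classes evaluate handles, and the loss and PRC values are left out because their exact definitions are not given.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, F1 score and sensitivity for 0/1 labels (binary case assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return accuracy, f1, sensitivity


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
acc, f1, sns = binary_metrics(y_true, y_pred)
```

In this example there are two true positives, one false negative, and one false positive, so accuracy, F1, and sensitivity all come out at 2/3.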

functions.fetch_gene_info(gene_list)[source]#

Fetches gene information from MyGene.info.

Parameters: gene_list (list): List of gene symbols or Ensembl IDs.

Returns: list: List of dictionaries containing gene information.

functions.filter_high_variance_genes(data, threshold)[source]#

Filter out genes with variance below the specified threshold.

Calculates the variance for each gene and filters out genes whose variance is below the specified threshold.

Parameters:
  • data (DataFrame) – Gene expression data with genes as columns and samples as rows.

  • threshold (float) – Minimum variance level to retain a gene.

Returns: DataFrame: Filtered data with genes having variance above the threshold.

functions.filter_low_expression_genes(data, threshold=1.0)[source]#

Filter out low-expressed genes from the dataset.

Calculates the mean expression level for each gene and filters out genes whose mean expression level is below the specified threshold.

Parameters:
  • data (DataFrame) – Expression data with genes as columns.

  • threshold (float) – Minimum mean expression level to retain a gene. Default is 1.0.

Returns: DataFrame: Filtered data with genes above the threshold.
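
The mean-expression filter and the variance filter documented above can both be sketched as column selections in pandas. The inclusive comparison and the use of the pandas default sample variance (ddof=1) are assumptions; the actual functions may treat the boundary or the variance estimator differently.

```python
import pandas as pd


def filter_low_expression_sketch(data, threshold=1.0):
    """Keep gene columns whose mean expression reaches the threshold."""
    return data.loc[:, data.mean(axis=0) >= threshold]


def filter_high_variance_sketch(data, threshold):
    """Keep gene columns whose sample variance reaches the threshold."""
    return data.loc[:, data.var(axis=0) >= threshold]  # ddof=1 by default


# Rows are samples, columns are genes.
data = pd.DataFrame({
    "geneA": [0.1, 0.2, 0.3],   # low mean expression
    "geneB": [5.0, 5.0, 5.0],   # expressed, but constant
    "geneC": [2.0, 6.0, 10.0],  # expressed and variable
})
expressed = filter_low_expression_sketch(data, threshold=1.0)
variable = filter_high_variance_sketch(expressed, threshold=1.0)
```

Chaining the two filters leaves only geneC: geneA fails the mean filter and geneB has zero variance.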

functions.gen_graph_legend(node_colours, G, attr)[source]#

Generate a legend for a graph based on node colors and attributes.

Parameters:
  • node_colours (pd.Series) – A series of node colors.

  • G (networkx.Graph) – The graph object.

  • attr (str) – The attribute to use for labeling.

Returns:

A list of matplotlib patches representing the legend.

Return type:

list

functions.get_edge_attributes(G)[source]#

Extracts edge attributes from a graph.

Parameters:

G (networkx.Graph) – The graph from which to extract edge attributes.

Returns:

A list of edge attributes.

Return type:

list

Raises:

ValueError – If the graph G is empty or not defined.

functions.get_highest_degree_nodes(graph, top_n=10)[source]#

Returns the nodes with the highest degree in the graph.

Parameters:
  • graph (nx.Graph) – The NetworkX graph.

  • top_n (int) – The number of top nodes to return.

Returns: List of tuples: Each tuple contains a node and its degree.
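
One straightforward way to obtain such a list is to sort the graph's degree view, as in this sketch. How ties are ordered in the real function is not documented; here they keep node insertion order.

```python
import networkx as nx


def highest_degree_nodes_sketch(graph, top_n=10):
    """Return the top_n (node, degree) tuples, highest degree first."""
    return sorted(graph.degree(), key=lambda item: item[1], reverse=True)[:top_n]


G = nx.star_graph(4)  # node 0 is connected to nodes 1..4
G.add_edge(1, 2)      # give nodes 1 and 2 one extra edge each
top = highest_degree_nodes_sketch(G, top_n=3)
```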

functions.get_k_neighbours(df, k)[source]#

Returns a dictionary of k-nearest neighbors for each node in the dataframe.

Parameters:
  • df (DataFrame) – The similarity matrix dataframe.

  • k (int) – The number of neighbors to retrieve.

Returns:

A dictionary where the keys are the nodes and the values are lists of k-nearest neighbors.

Return type:

dict
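
The lookup can be sketched as a per-row selection of the k largest similarities. This assumes a square matrix with identical row and column labels, that larger values mean more similar, and that a node is excluded from its own neighbour list.

```python
import pandas as pd


def k_neighbours_sketch(df, k):
    """Map each node to its k most similar other nodes."""
    neighbours = {}
    for node in df.index:
        others = df.loc[node].drop(labels=node)  # a node is not its own neighbour
        neighbours[node] = list(others.nlargest(k).index)
    return neighbours


labels = ["a", "b", "c", "d"]
sim = pd.DataFrame(
    [[1.0, 0.9, 0.2, 0.4],
     [0.9, 1.0, 0.3, 0.8],
     [0.2, 0.3, 1.0, 0.7],
     [0.4, 0.8, 0.7, 1.0]],
    index=labels, columns=labels,
)
knn = k_neighbours_sketch(sim, k=2)
```

Neighbour relations are not necessarily mutual: here "d" is among the two nearest neighbours of "a", but "a" is not among those of "d".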

functions.knn_sparsification(graph, k)[source]#

Sparsifies the graph by keeping, for each node, only its k highest-weight edges.

Parameters:
  • graph (nx.Graph) – The original NetworkX graph.

  • k (int) – The number of nearest neighbors to keep for each node.

Returns: nx.Graph: The sparsified graph.
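
A possible reading of this procedure is sketched below: an edge survives if it is among the k heaviest edges of at least one of its endpoints. That union rule and the "weight" attribute name are assumptions, so some nodes can end up with more than k edges.

```python
import networkx as nx


def knn_sparsification_sketch(graph, k):
    """Keep each node's k heaviest incident edges (union over all nodes)."""
    sparse = nx.Graph()
    sparse.add_nodes_from(graph.nodes(data=True))
    for node in graph.nodes:
        incident = sorted(
            graph.edges(node, data=True),
            key=lambda e: e[2].get("weight", 0.0),  # assumed attribute name
            reverse=True,
        )
        sparse.add_edges_from(incident[:k])
    return sparse


G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 0.9), ("a", "c", 0.2), ("a", "d", 0.4),
    ("b", "c", 0.3), ("b", "d", 0.8), ("c", "d", 0.7),
])
sparse = knn_sparsification_sketch(G, k=1)
```

With k=1, node "d" keeps two edges: its own strongest edge to "b", plus the edge to "c" that "c" selected.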

functions.message_passing(node, G)[source]#

Perform message passing for a given node in a graph.

Parameters:
  • node (int) – The node for which message passing is performed.

  • G (networkx.Graph) – The graph containing the node and its neighbors.

Returns:

The aggregated message from the neighboring nodes.

Return type:

numpy.ndarray

Notes

This function gathers the messages for a single node and is used by message_passing_iteration. It performs two steps:
  • Propagation: gather the node features of all neighboring nodes.

  • Aggregation: aggregate the gathered messages by taking their median.

functions.message_passing_iteration(G)[source]#

Perform message passing iteration on a graph.

Parameters:

G (networkx.Graph) – The input graph.

Returns:

None

Description:

This function iterates through all nodes in the graph and updates each node's features with the aggregated message from its neighboring nodes.

Note

The function message_passing is defined above.
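
The propagation and median aggregation described for message_passing, together with the per-node update in message_passing_iteration, can be sketched as follows. The node attribute name "feature", the synchronous update (all messages computed before any node is overwritten), and the requirement that every node has at least one neighbour are assumptions.

```python
import networkx as nx
import numpy as np


def message_passing_sketch(node, G):
    """Median of the neighbours' feature vectors (element-wise)."""
    # Propagation: gather the features of all neighbouring nodes.
    messages = [G.nodes[nbr]["feature"] for nbr in G.neighbors(node)]
    # Aggregation: element-wise median over the gathered messages.
    return np.median(np.stack(messages), axis=0)


def message_passing_iteration_sketch(G):
    """Update every node in place; returns None."""
    # Compute all messages first so the update is synchronous (assumption).
    updated = {node: message_passing_sketch(node, G) for node in G.nodes}
    nx.set_node_attributes(G, updated, name="feature")


G = nx.star_graph(3)  # centre 0 with leaves 1, 2, 3
features = {0: [0.0, 0.0], 1: [1.0, 10.0], 2: [2.0, 20.0], 3: [6.0, 60.0]}
for node, value in features.items():
    G.nodes[node]["feature"] = np.array(value)

message_passing_iteration_sketch(G)
```

After one iteration the centre holds the median of its leaves, [2, 20], and each leaf holds the centre's previous feature, [0, 0].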

functions.pearson_corr(data)[source]#

Calculate the Pearson correlation coefficient matrix for a given DataFrame.

Parameters:

data (pandas.DataFrame) – The input DataFrame containing numeric data.

Returns:

The correlation coefficient matrix.

Return type:

pandas.DataFrame

functions.plot_degree_distribution(G)[source]#

Plots the degree distribution of the graph.

Parameters: G (nx.Graph): The NetworkX graph.

functions.plot_knn_network(df, K, labels, node_colours=['skyblue'], node_size=300)[source]#

Plots a k-nearest neighbors network based on the given dataframe and parameters.

Parameters:
  • df (pandas.DataFrame) – The input dataframe.

  • K (int) – The number of nearest neighbors to consider.

  • labels (pandas.Series) – The labels for each node in the dataframe.

  • node_colours (list, optional) – The colors for the nodes. Defaults to [‘skyblue’].

  • node_size (int, optional) – The size of the nodes. Defaults to 300.

Returns:

The NetworkX graph representing the k-nearest neighbors network.

Return type:

nx.Graph

functions.print_gene_info_with_degree(top_genes_with_degrees, gene_info)[source]#

Prints gene information including the degree.

Parameters:
  • top_genes_with_degrees (list) – List of tuples containing gene symbols and their degrees.

  • gene_info (list) – List of dictionaries containing gene information.

functions.print_graph_info(G)[source]#

Print basic information about a NetworkX graph.

Parameters: G (nx.Graph): The NetworkX graph.

functions.remove_by_degree(graph, min_degree)[source]#

Sparsifies the graph by removing nodes with degree below the specified threshold.

Parameters:
  • graph (nx.Graph) – The original NetworkX graph.

  • min_degree (int) – The minimum degree threshold.

Returns: nx.Graph: The sparsified graph.

functions.spanning_tree_sparsification(graph)[source]#

Sparsifies the graph by creating a minimum spanning tree.

Parameters: graph (nx.Graph): The original NetworkX graph.

Returns: nx.Graph: The sparsified graph.
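
NetworkX provides a minimum spanning tree routine directly, so the idea can be sketched in a few lines. Whether the real function transforms the weights first (for example, converting correlations to distances) is not stated here; this sketch uses the raw "weight" attribute.

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 3.0), ("c", "d", 4.0),
])

# Keep the lightest set of edges that still connects every node.
tree = nx.minimum_spanning_tree(G, weight="weight")
total_weight = tree.size(weight="weight")
```

A spanning tree of a connected graph with n nodes always has n - 1 edges, which makes this the most aggressive of the sparsification options that still keeps the graph connected.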

functions.threshold_sparsification(graph, threshold)[source]#

Sparsifies the graph by removing edges below the specified weight threshold.

Parameters:
  • graph (nx.Graph) – The original NetworkX graph.

  • threshold (float) – The weight threshold.

Returns: nx.Graph: The sparsified graph.

functions.top_percentage_sparsification(graph, top_percentage)[source]#

Sparsifies the graph by keeping the top percentage of edges by weight.

Parameters:
  • graph (nx.Graph) – The original NetworkX graph.

  • top_percentage (float) – The percentage of top-weight edges to keep.

Returns: nx.Graph: The sparsified graph.
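
The two weight-based strategies, a fixed threshold and a top percentage, can be compared with the sketch below. It assumes a "weight" edge attribute, an inclusive threshold, a percentage expressed on a 0 to 100 scale, and rounding up when the edge count is fractional; the real functions may make different choices.

```python
import math

import networkx as nx


def _empty_copy(graph):
    """New graph with the same nodes and no edges."""
    sparse = nx.Graph()
    sparse.add_nodes_from(graph.nodes(data=True))
    return sparse


def threshold_sparsification_sketch(graph, threshold):
    """Keep edges whose weight reaches the threshold."""
    sparse = _empty_copy(graph)
    sparse.add_edges_from(
        (u, v, d) for u, v, d in graph.edges(data=True)
        if d.get("weight", 0.0) >= threshold
    )
    return sparse


def top_percentage_sparsification_sketch(graph, top_percentage):
    """Keep the heaviest top_percentage percent of edges (0-100 scale assumed)."""
    ranked = sorted(graph.edges(data=True),
                    key=lambda e: e[2].get("weight", 0.0), reverse=True)
    n_keep = math.ceil(len(ranked) * top_percentage / 100.0)
    sparse = _empty_copy(graph)
    sparse.add_edges_from(ranked[:n_keep])
    return sparse


G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 0.9), ("b", "c", 0.7), ("c", "d", 0.5), ("d", "a", 0.3),
])
by_threshold = threshold_sparsification_sketch(G, threshold=0.6)
by_percentage = top_percentage_sparsification_sketch(G, top_percentage=50)
```

On this toy graph both calls keep the same two edges; on real data the percentage variant fixes the number of edges, while the threshold variant fixes the minimum weight.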

functions.train(g, h, train_split, val_split, device, model, labels, epochs, lr)[source]#

Trains a model on the given graph and node features for a fixed number of epochs, using the training split to fit the model and the validation split to monitor it.

Parameters:
  • g (Graph) – The graph object.

  • h (Tensor) – The node features tensor.

  • train_split (Tensor) – The train split tensor.

  • val_split (Tensor) – The validation split tensor.

  • device (str) – The device to train the model on.

  • model (nn.Module) – The model to train.

  • labels (Tensor) – The labels tensor.

  • epochs (int) – The number of training epochs.

  • lr (float) – The learning rate.

Returns:

A tuple containing the figures for training and validation loss.

Return type:

tuple

functions.visualise_edge_weight_distribution(G)[source]#

Visualizes the distribution of edge weights.

Parameters: G (nx.Graph): The NetworkX graph whose edge weights are visualized.

functions.visualise_graph(G, title='Gene Co-expression Network')[source]#

Visualizes the graph using Matplotlib and NetworkX.

Parameters:
  • G (nx.Graph) – Graph to visualize.

  • title (str) – Title of the plot.