spatialfusion.utils.embed_gcn_utils

Utility functions for extracting GCN embeddings and metadata for downstream analysis.

This module provides:

- extract_gcn_embeddings_with_metadata: Extracts GCN embeddings and merges with cell type, spatial, and ligand-receptor metadata.

extract_gcn_embeddings_with_metadata(model, graphs, sample_list, base_path, z_joint, device='cuda:0', spatial_key='spatial_px', celltype_key='celltypes', adata_by_sample=None)

Extracts GCN embeddings along with metadata such as spatial coordinates and cell types. Supports both in-memory AnnData inputs and on-disk loading via base_path.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| model | Module | Trained GCN model with an encode method that takes (graph, features) as input. | required |
| graphs | list[DGLGraph] | List of DGL graphs, one per sample, containing node features in ndata['feat']. | required |
| sample_list | list[str] | List of sample IDs corresponding to the graphs. | required |
| base_path | str or Path | Root directory containing per-sample subdirectories. Used only if adata_by_sample is not provided. | required |
| z_joint | DataFrame | Joint embeddings from the autoencoder step. Used to align indices. | required |
| device | str | Device string (e.g., 'cuda:0' or 'cpu') for model inference. | 'cuda:0' |
| spatial_key | str | Key name for spatial coordinates stored in adata.obsm. | 'spatial_px' |
| celltype_key | str | Column name in adata.obs or celltypes.csv for cell type annotation. | 'celltypes' |
| adata_by_sample | Optional[Dict[str, AnnData]] | Optional mapping from sample IDs to AnnData objects already loaded in memory. If provided, this takes precedence over disk loading. | None |
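The precedence between adata_by_sample and base_path described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper resolve_sample_source is hypothetical, the per-sample fallback to disk is an assumption, and so is the layout in which each sample's subdirectory is named after its sample ID.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Tuple, Union


def resolve_sample_source(
    sample_id: str,
    base_path: Union[str, Path],
    adata_by_sample: Optional[Dict[str, Any]] = None,
) -> Tuple[str, Any]:
    """Hypothetical helper: prefer an in-memory object, else point at disk."""
    # In-memory data wins whenever the mapping contains this sample.
    if adata_by_sample is not None and sample_id in adata_by_sample:
        return "memory", adata_by_sample[sample_id]
    # Assumption: each sample lives in a subdirectory named after its ID.
    return "disk", Path(base_path) / sample_id


# A plain object stands in for an AnnData instance in this sketch.
in_memory = {"sampleA": object()}
sources = {
    sid: resolve_sample_source(sid, "data/root", in_memory)[0]
    for sid in ["sampleA", "sampleB"]
}
print(sources)
```

Here sampleA resolves to the in-memory object and sampleB falls back to a path under the root directory. Whether the real function falls back per sample or expects every sample to be present in the mapping is not stated in this reference.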

Returns:

pd.DataFrame: A concatenated DataFrame of GCN embeddings across all samples, including metadata:

- sample_id, cell_id
- celltype (and optional subtype/niche labels)
- spatial coordinates (X_coord, Y_coord)
- optional ligand–receptor features if present
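To make the shape of the returned table concrete, the sketch below builds a toy DataFrame with the metadata columns listed above and concatenates it across two samples. It does not call the library. The helper make_sample_frame, the gcn_* embedding column names, the column order, and the use of ignore_index are assumptions for illustration only, and the values are random.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)


def make_sample_frame(sample_id, cell_ids, celltypes, coords, emb):
    """Hypothetical helper: one row per cell, embeddings plus metadata."""
    # Embedding dimensions become columns; the gcn_* names are made up.
    df = pd.DataFrame(emb, columns=[f"gcn_{i}" for i in range(emb.shape[1])])
    df.insert(0, "sample_id", sample_id)  # scalar is broadcast to all rows
    df.insert(1, "cell_id", cell_ids)
    df["celltype"] = celltypes
    df["X_coord"] = coords[:, 0]
    df["Y_coord"] = coords[:, 1]
    return df


frames = [
    make_sample_frame(
        "s1", ["c1", "c2", "c3"], ["T cell", "B cell", "T cell"],
        rng.random((3, 2)), rng.random((3, 4)),
    ),
    make_sample_frame(
        "s2", ["c1", "c2"], ["B cell", "T cell"],
        rng.random((2, 2)), rng.random((2, 4)),
    ),
]

# Stack the per-sample frames into one table, as the return value describes.
combined = pd.concat(frames, ignore_index=True)
print(combined.shape)
print(combined.groupby("sample_id")["celltype"].value_counts())
```

Once the table is in this long format, per-sample or per-cell-type summaries reduce to ordinary groupby operations on sample_id and celltype.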