Reading SpaceRanger Output
This section covers how to read spatial transcriptomics data from SpaceRanger output.
Reading Cell Segmentation Data
TrackCell can read the cell segmentation output that SpaceRanger produces for 10x Visium HD data, using tcl.io.read_hd_cellseg().
Required Directory Structure
read_hd_cellseg() expects the SpaceRanger segmented_outputs directory to have the following structure:
segmented_outputs/
├── graphclust_annotated_cell_segmentations.geojson # Cell segmentation polygons (default)
│ OR
├── cell_segmentations.geojson # Alternative filename (auto-detected)
├── filtered_feature_cell_matrix.h5 # Expression matrix
└── spatial/
    ├── tissue_hires_image.png # High-resolution tissue image
    ├── tissue_lowres_image.png # Low-resolution tissue image
    └── scalefactors_json.json # Image scaling factors
Note: The function will automatically try alternative filenames if the default
graphclust_annotated_cell_segmentations.geojson is not found. Supported alternatives include:
cell_segmentations.geojson, cell_segmentations_annotated.geojson, and
annotated_cell_segmentations.geojson. You can also explicitly specify the filename using
the cell_segmentations_file parameter.
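The fallback lookup described in the note amounts to a short search over candidate filenames. The sketch below is illustrative only: find_segmentation_file and CANDIDATE_NAMES are hypothetical names, and both the search order (default first, then the alternatives in the order listed) and the rule that an explicit cell_segmentations_file disables auto-detection are assumptions, not documented behavior of read_hd_cellseg().

```python
from pathlib import Path
from typing import Optional

# Default name first, then the alternatives listed in the note.
# ASSUMPTION: read_hd_cellseg() may check these in a different order.
CANDIDATE_NAMES = (
    "graphclust_annotated_cell_segmentations.geojson",
    "cell_segmentations.geojson",
    "cell_segmentations_annotated.geojson",
    "annotated_cell_segmentations.geojson",
)


def find_segmentation_file(datapath: str, explicit: Optional[str] = None) -> Path:
    """Hypothetical helper: return the first segmentation GeoJSON that exists."""
    folder = Path(datapath)
    # ASSUMPTION: an explicitly requested filename is the only one tried.
    names = (explicit,) if explicit else CANDIDATE_NAMES
    for name in names:
        candidate = folder / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no cell segmentation GeoJSON in {folder}; tried: {', '.join(names)}")
```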
Usage
import trackcell as tcl
# Read SpaceRanger cell segmentation output
# The function will auto-detect the segmentation file if default name is not found
adata = tcl.io.read_hd_cellseg(
datapath="SpaceRanger4.0/Cse1/outs/segmented_outputs",
sample="Cse1",
# cell_segmentations_file is optional, default is "graphclust_annotated_cell_segmentations.geojson"
)
# Or explicitly specify the segmentation file name
adata = tcl.io.read_hd_cellseg(
datapath="SpaceRanger4.0/Cse1/outs/segmented_outputs",
sample="Cse1",
cell_segmentations_file="cell_segmentations.geojson" # Custom filename
)
The resulting AnnData object contains:
- Expression matrix in .X
- Cell metadata in .obs
- Gene metadata in .var
- Spatial coordinates in .obsm["spatial"]
- Tissue images in .uns["spatial"][sample]["images"]
- Scale factors in .uns["spatial"][sample]["scalefactors"]
- Cell geometries as a GeoDataFrame in .uns["spatial"][sample]["geometries"]
- Cell geometries as WKT strings (for serialization) in .obs["geometry"]
Reading Bin-Level Data (2 µm, 8 µm, 16 µm)
TrackCell can read 10x Visium HD bin-level SpaceRanger output at 2 µm, 8 µm, or 16 µm bin sizes, using tcl.io.read_hd_bin().
Required Directory Structure
read_hd_bin() expects each bin-size directory (for example, square_016um/) to have the following structure:
square_016um/
├── filtered_feature_bc_matrix.h5 # Expression matrix (H5 format, preferred)
│ OR
├── filtered_feature_bc_matrix/ # Expression matrix (directory format)
│ ├── barcodes.tsv.gz
│ ├── features.tsv.gz
│ └── matrix.mtx.gz
└── spatial/
    ├── tissue_positions.parquet # Spatial coordinates (Parquet format, preferred)
    │ OR
    ├── tissue_positions.csv # Spatial coordinates (CSV format)
    ├── tissue_hires_image.png # High-resolution tissue image
    ├── tissue_lowres_image.png # Low-resolution tissue image
    └── scalefactors_json.json # Image scaling factors
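The "preferred" annotations in the tree imply a lookup order. This sketch makes that order explicit, under the assumption that read_hd_bin() resolves its inputs roughly this way; resolve_bin_inputs and first_existing are hypothetical helpers that only locate files and do not parse them.

```python
from pathlib import Path
from typing import Dict, Optional


def first_existing(*candidates: Path) -> Optional[Path]:
    """Return the first path that exists, or None."""
    return next((p for p in candidates if p.exists()), None)


def resolve_bin_inputs(bin_dir: str) -> Dict[str, Path]:
    """Hypothetical helper: locate the inputs of one bin directory,
    preferring H5 over the MTX directory and Parquet over CSV."""
    root = Path(bin_dir)
    spatial = root / "spatial"
    found = {
        "matrix": first_existing(root / "filtered_feature_bc_matrix.h5",
                                 root / "filtered_feature_bc_matrix"),
        "positions": first_existing(spatial / "tissue_positions.parquet",
                                    spatial / "tissue_positions.csv"),
        "scalefactors": first_existing(spatial / "scalefactors_json.json"),
    }
    # Report every missing input at once rather than failing on the first.
    missing = [name for name, path in found.items() if path is None]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")
    return found
```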
Usage
import trackcell as tcl
# Read SpaceRanger bin-level output (2um/8um/16um bins)
adata = tcl.io.read_hd_bin(
datapath="SpaceRanger4.0/Cse1/binned_outputs/square_016um",
sample="Cse1",
binsize=16 # Bin size in micrometers (default: 16, common values: 2, 8, or 16)
)
# Access the bin size information
print(f"Bin size: {adata.uns['spatial']['Cse1']['binsize']} um")
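Because datapath and binsize are passed separately, it is easy to point at square_008um while leaving binsize=16. One way to keep them consistent is to derive the directory from the bin size. This sketch assumes the standard SpaceRanger layout, in which bin directories sit under outs/binned_outputs/ and are named square_<size>um with the size zero-padded to three digits; bin_directory is a hypothetical helper, and this page does not say whether read_hd_bin() infers the bin size from the directory name.

```python
from pathlib import Path


def bin_directory(outs: str, binsize: int) -> Path:
    """Hypothetical helper: path of the binned output for a bin size in micrometers."""
    if binsize <= 0:
        raise ValueError(f"bin size must be positive, got {binsize}")
    # ASSUMPTION: SpaceRanger naming, zero-padded to three digits (square_002um, square_016um).
    return Path(outs) / "binned_outputs" / f"square_{binsize:03d}um"


print(bin_directory("SpaceRanger4.0/Cse1/outs", 16).as_posix())
# SpaceRanger4.0/Cse1/outs/binned_outputs/square_016um
```

You could then call tcl.io.read_hd_bin(datapath=str(bin_directory(outs, 8)), sample="Cse1", binsize=8) so that both arguments come from a single value.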
Subsetting Data and Synchronizing Geometries
When you subset an AnnData object loaded with read_hd_cellseg(), the cell geometries
stored in adata.uns["spatial"][sample]["geometries"] are not automatically updated.
This can cause errors when plotting subsetted data.
Important: Always call sync_geometries_after_subset() after subsetting data loaded
with read_hd_cellseg() to ensure geometries are synchronized.
Usage
import trackcell as tcl
import numpy as np
# Read data
adata = tcl.io.read_hd_cellseg(
datapath="SpaceRanger4.0/Cse1/outs/segmented_outputs",
sample="Cse1"
)
# Method 1: Subset by spatial region
x_min, x_max = 16000, 18000
y_min, y_max = 14000, 18000
spatial_coords = adata.obsm['spatial']
mask = ((spatial_coords[:, 0] >= x_min) & (spatial_coords[:, 0] <= x_max) &
(spatial_coords[:, 1] >= y_min) & (spatial_coords[:, 1] <= y_max))
adata_subset = adata[mask].copy()
# IMPORTANT: Synchronize geometries after subsetting
tcl.io.sync_geometries_after_subset(adata_subset, sample="Cse1")
# Now you can safely plot the subset
tcl.pl.spatial_cell(adata_subset, color="classification")
# Method 2: Subset by cell metadata
adata_subset2 = adata[adata.obs['classification'] == 'Cluster-1'].copy()
# IMPORTANT: Synchronize geometries after subsetting
tcl.io.sync_geometries_after_subset(adata_subset2, sample="Cse1")
# Now you can safely plot the subset
tcl.pl.spatial_cell(adata_subset2, color="classification")
What Gets Synchronized
The sync_geometries_after_subset() function:
- Filters adata.uns["spatial"][sample]["geometries"] (GeoDataFrame) to include only the cells present in the subsetted adata.obs_names
- Ensures the geometries match the subsetted data
Note: adata.obs["geometry"] (WKT strings) is automatically subset when you subset
the AnnData object, so it doesn’t need manual synchronization. However, the plotting function
prefers using the GeoDataFrame format for better performance.
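Conceptually, the synchronization is an index filter on the geometry table. The sketch below is not TrackCell's implementation: sync_geometries_sketch is a hypothetical function, and it assumes the GeoDataFrame is indexed by the same cell IDs as adata.obs_names and that rows should follow obs_names order. It works on a plain pandas DataFrame as well as a GeoDataFrame, since GeoDataFrame subclasses DataFrame; the "POLYGON ..." strings are placeholders, not real geometries.

```python
import pandas as pd


def sync_geometries_sketch(geometries: pd.DataFrame, obs_names) -> pd.DataFrame:
    """Hypothetical re-implementation for illustration only.

    Keeps the geometry rows whose index is in obs_names, ordered like obs_names.
    ASSUMPTION: the table is indexed by the same cell IDs as adata.obs_names.
    """
    wanted = pd.Index(obs_names)
    # Skip IDs that have no geometry instead of raising a KeyError.
    present = wanted[wanted.isin(geometries.index)]
    return geometries.loc[present]


geoms = pd.DataFrame(
    {"geometry": ["POLYGON A", "POLYGON B", "POLYGON C"]},
    index=["cell_1", "cell_2", "cell_3"],
)
print(list(sync_geometries_sketch(geoms, ["cell_3", "cell_1"]).index))  # ['cell_3', 'cell_1']
```

In real code, call tcl.io.sync_geometries_after_subset() instead; the sketch only shows why the filter has to follow adata.obs_names.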
Why This Is Necessary
When you subset an AnnData object:
- adata.obs and adata.obsm are subset automatically, because they are aligned with the observation (cell) axis.
- adata.uns["spatial"][sample]["geometries"] is NOT subset automatically, because adata.uns holds unstructured data that AnnData does not align with cells; the GeoDataFrame stored there is a separate object.
If you plot without synchronizing, the plotting function may:
- Fail with errors such as ValueError: aspect must be finite and positive
- Attempt to access geometries for cells that no longer exist in the subset
- Produce incorrect visualizations
Always call sync_geometries_after_subset() after subsetting to avoid these issues.
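To check whether a subset is safe to plot, you can compare the geometry IDs with the cell IDs directly. A minimal standard-library sketch; geometries_out_of_sync is a hypothetical diagnostic, and it assumes the GeoDataFrame index holds cell IDs, so in practice you would pass adata.uns["spatial"][sample]["geometries"].index and adata.obs_names.

```python
from typing import Dict, Iterable, List


def geometries_out_of_sync(geometry_ids: Iterable[str], obs_names: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical diagnostic: compare geometry cell IDs with adata.obs_names."""
    geom = set(geometry_ids)
    obs = set(obs_names)
    return {
        "stale": sorted(geom - obs),    # geometries for cells the subset dropped
        "missing": sorted(obs - geom),  # cells that have no geometry at all
    }


report = geometries_out_of_sync(["c1", "c2", "c3"], ["c1", "c3"])
print(report)  # {'stale': ['c2'], 'missing': []}
```

A non-empty "stale" list after subsetting suggests that sync_geometries_after_subset() has not been called yet.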