Visualization#

This section covers visualization of spatial transcriptomics data using TrackCell.

Basic Plotting with Cell Polygons#

TrackCell provides a specialized plotting function, tcl.pl.spatial_cell, that draws cells as polygons instead of points, which represents cell boundaries more accurately:

import trackcell as tcl

# Plot cells as polygons (requires data loaded with read_hd_cellseg)
tcl.pl.spatial_cell(
    adata,
    color="classification",  # Color by cell type
    groups=['Cluster-2', 'Cluster-3'],  # Optional: filter specific groups
    figsize=(6, 6),
    edges_width=0.5,
    edges_color="black",
    alpha=0.8
)

# Plot continuous values (e.g., distance to a label)
tcl.pl.spatial_cell(
    adata,
    color="Cluster-2_dist",  # Distance to Cluster-2
    cmap="Reds",
    figsize=(6, 6)
)

# Plot gene expression
tcl.pl.spatial_cell(
    adata,
    color="PDPN",  # Gene name
    cmap="viridis",
    figsize=(6, 6)
)

# Plot multiple variables in subplots
tcl.pl.spatial_cell(
    adata,
    color=["classification", "Cluster-2_dist"],  # Two subplots
    figsize=(12, 6)
)

Custom Color Palettes#

Customize colors for categorical variables with the palette parameter, which accepts either a dictionary or a list/array of colors.

Using a Dictionary (Category-to-Color Mapping)

When using a dictionary, you explicitly map each category to a color:

# Define custom color palette as dictionary
custom_palette = {
    'Cluster-1': 'red',
    'Cluster-2': 'blue',
    'Cluster-3': 'green',
    'Cluster-4': 'orange'
}

tcl.pl.spatial_cell(
    adata,
    color="classification",
    palette=custom_palette,
    figsize=(6, 6)
)

Using a List/Array (Sequential Color Assignment)

When using a list or array, colors are assigned to categories in sorted (alphabetical) order:

# Define custom color palette as list
# Colors will be assigned to categories in sorted order
color_list = ['#FF0000', '#0000FF', '#00FF00', '#FFA500']

tcl.pl.spatial_cell(
    adata,
    color="classification",
    palette=color_list,
    figsize=(6, 6)
)

# Or using numpy array
import numpy as np
color_array = np.array(['red', 'blue', 'green', 'yellow'])

tcl.pl.spatial_cell(
    adata,
    color="classification",
    palette=color_array,
    figsize=(6, 6)
)

Note: If the palette has fewer colors than categories, the colors are cycled and a warning is issued.
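The list-based assignment and cycling described above can be sketched in plain Python. This is an illustration only, not TrackCell's implementation: assign_palette is a hypothetical helper, and it assumes categories are ordered by ordinary string sorting.

```python
import warnings
from itertools import cycle


def assign_palette(categories, colors):
    """Map sorted categories to colors, reusing colors if the list is short.

    Hypothetical helper for illustration; not part of TrackCell.
    """
    ordered = sorted(set(categories))
    if len(colors) < len(ordered):
        warnings.warn(
            f"Palette has {len(colors)} colors for {len(ordered)} categories; "
            "colors will be reused."
        )
    # zip stops at the shorter input, so cycling the colors covers every category
    return dict(zip(ordered, cycle(colors)))


mapping = assign_palette(
    ["Cluster-2", "Cluster-1", "Cluster-3", "Cluster-1"],
    ["#FF0000", "#0000FF", "#00FF00", "#FFA500"],
)
print(mapping)
```

Plain string sorting places 'Cluster-10' before 'Cluster-2', so with ten or more clusters a dictionary palette is the safer way to pin each category to a known color.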

Performance Optimization for Large Datasets#

Polygon plotting can be slow on large datasets (e.g., 200,000+ cells). The strategies in this section reduce rendering time.

Performance Comparison#

Expected performance for different approaches with ~230,000 cells:

  • Full polygon plotting: Several minutes to tens of minutes

  • Using groups (filtering to ~10% of cells): ~10-30 seconds

  • Point-based plotting: ~1-5 seconds

  • Downsampling to 10,000 cells: ~5-15 seconds
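The downsampling approach in the list above amounts to picking a random subset of cells before plotting. A minimal sketch, assuming a reproducible draw of row indices is enough; downsample_indices is a hypothetical helper rather than a TrackCell function, and the commented usage assumes the AnnData object accepts a list of integer row indices.

```python
import random


def downsample_indices(n_cells, n_keep, seed=0):
    """Return a sorted, reproducible random subset of row indices.

    Hypothetical helper for illustration; not part of TrackCell.
    """
    if n_keep >= n_cells:
        return list(range(n_cells))
    rng = random.Random(seed)  # fixed seed so the same cells are drawn each run
    return sorted(rng.sample(range(n_cells), n_keep))


idx = downsample_indices(n_cells=230_000, n_keep=10_000)
print(len(idx))

# Assumed usage with an AnnData object (not executed here):
# adata_small = adata[idx].copy()
# tcl.io.sync_geometries_after_subset(adata_small, sample="Cse1")
# tcl.pl.spatial_cell(adata_small, color="classification")
```

Random downsampling thins the tissue uniformly, so it suits overviews. For a dense view of one area, spatial cropping keeps every cell in the region.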

Optimization Strategies#

Strategy 1: Filter by Cell Groups

An effective way to speed up polygon plotting is to draw only the cells of interest using the groups parameter:

# Plot only specific cell types
tcl.pl.spatial_cell(
    adata,
    color="classification",
    groups=['Cluster-2', 'Cluster-3', 'Cluster-5']  # Only these cell types
)
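How much time a groups filter saves depends on what share of cells it retains. The sketch below estimates that share from a sequence of labels before plotting; fraction_kept is a hypothetical helper, and passing adata.obs["classification"] as the labels is an assumption about how the data is stored.

```python
from collections import Counter


def fraction_kept(labels, groups):
    """Return the fraction of cells whose label is in `groups`.

    Hypothetical helper for illustration; not part of TrackCell.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # set() avoids double counting if a group is listed twice
    kept = sum(counts[g] for g in set(groups))
    return kept / total


# Toy labels standing in for adata.obs["classification"]
labels = ["Cluster-1"] * 6 + ["Cluster-2"] * 3 + ["Cluster-3"] * 1
share = fraction_kept(labels, ["Cluster-2", "Cluster-3"])
print(f"{share:.0%} of cells would be plotted")
```

If the retained share is still large, combining the filter with spatial cropping or disabled edges is likely to help more than the filter alone.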

Strategy 2: Spatial Region Cropping

Crop to a specific spatial region of interest:

import numpy as np
import trackcell as tcl

# Define region of interest
x_min, x_max = 1000, 5000
y_min, y_max = 1000, 5000

# Create mask for spatial coordinates
spatial_coords = adata.obsm['spatial']
mask = ((spatial_coords[:, 0] >= x_min) & (spatial_coords[:, 0] <= x_max) &
        (spatial_coords[:, 1] >= y_min) & (spatial_coords[:, 1] <= y_max))

# Create subset
adata_subset = adata[mask].copy()

# IMPORTANT: Synchronize geometries after subsetting
# This is required when data was loaded with read_hd_cellseg()
tcl.io.sync_geometries_after_subset(adata_subset, sample="Cse1")

# Plot subset
tcl.pl.spatial_cell(adata_subset, color="classification")

Strategy 3: Use Point-based Visualization

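On large datasets, point-based visualization with scanpy's sc.pl.spatial is much faster than polygon rendering (roughly 1-5 seconds for ~230,000 cells, per the comparison above):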

import scanpy as sc

# Point-based visualization (much faster)
sc.pl.spatial(
    adata,
    color='classification',
    spot_size=1,      # Small spot diameter
    size=0.5,         # Scale factor applied on top of spot_size
    groups=['Cluster-2', 'Cluster-3']
)

Strategy 4: Disable Edge Rendering

Disable polygon edges to improve rendering performance:

tcl.pl.spatial_cell(
    adata,
    color="classification",
    edges_width=0,  # Disable edges
    groups=['Cluster-2', 'Cluster-3']
)

Strategy 5: Use Low-Resolution Images

Use low-resolution background images when available:

tcl.pl.spatial_cell(
    adata,
    color="classification",
    img_key="lowres",  # Use low-resolution image
    groups=['Cluster-2', 'Cluster-3']
)

Best Practices#

  1. Always use GeoDataFrame format: Store cell geometries as a GeoDataFrame in adata.uns['spatial'][sample]['geometries'] rather than as WKT strings, for better performance.

  2. Start with point-based visualization: Use sc.pl.spatial() for initial exploration, then switch to polygon-based visualization for detailed analysis.

  3. Filter before polygon plotting: Use groups or spatial cropping to reduce the number of cells before calling tcl.pl.spatial_cell.

  4. Combine strategies: Use multiple optimization strategies together for best results.

  5. Save intermediate results: For repeated visualization, consider saving filtered subsets so the filtering does not have to be rerun each time.

Example Workflow#

Here’s a recommended workflow for visualizing large datasets:

import trackcell as tcl
import scanpy as sc

# Step 1: Quick overview with point-based plot
sc.pl.spatial(
    adata,
    color='classification',
    spot_size=0.5,
    size=0.3
)

# Step 2: Detailed view of specific cell types with polygons
tcl.pl.spatial_cell(
    adata,
    color="classification",
    groups=['Cluster-2', 'Cluster-3'],  # Focus on specific types
    edges_width=0,                      # Optimize performance
    figsize=(6, 6)
)

# Step 3: High-resolution view of specific region
# (Use spatial cropping as shown in Strategy 2)