Visualization
=============

This section covers visualization of spatial transcriptomics data using TrackCell.

Basic Plotting with Cell Polygons
---------------------------------

TrackCell provides a specialized plotting function that draws cells as polygons instead of points, which represents cell boundaries more accurately:

.. code-block:: python

    import trackcell as tcl

    # Plot cells as polygons (requires data loaded with read_hd_cellseg)
    tcl.pl.spatial_cell(
        adata,
        color="classification",             # Color by cell type
        groups=['Cluster-2', 'Cluster-3'],  # Optional: filter specific groups
        figsize=(6, 6),
        edges_width=0.5,
        edges_color="black",
        alpha=0.8
    )

    # Plot continuous values (e.g., distance to a label)
    tcl.pl.spatial_cell(
        adata,
        color="Cluster-2_dist",  # Distance to Cluster-2
        cmap="Reds",
        figsize=(6, 6)
    )

    # Plot gene expression
    tcl.pl.spatial_cell(
        adata,
        color="PDPN",  # Gene name
        cmap="viridis",
        figsize=(6, 6)
    )

    # Plot multiple variables in subplots
    tcl.pl.spatial_cell(
        adata,
        color=["classification", "Cluster-2_dist"],  # Two subplots
        figsize=(12, 6)
    )

Custom Color Palettes
---------------------

You can customize colors for categorical variables with the ``palette`` parameter, which accepts either a dictionary or a list/array of colors.

**Using a Dictionary (Category-to-Color Mapping)**

With a dictionary, you explicitly map each category to a color:

.. code-block:: python

    # Define custom color palette as dictionary
    custom_palette = {
        'Cluster-1': 'red',
        'Cluster-2': 'blue',
        'Cluster-3': 'green',
        'Cluster-4': 'orange'
    }

    tcl.pl.spatial_cell(
        adata,
        color="classification",
        palette=custom_palette,
        figsize=(6, 6)
    )

**Using a List/Array (Sequential Color Assignment)**

With a list or array, colors are assigned to categories in sorted (alphabetical) order:

.. code-block:: python

    # Define custom color palette as list
    # Colors will be assigned to categories in sorted order
    color_list = ['#FF0000', '#0000FF', '#00FF00', '#FFA500']

    tcl.pl.spatial_cell(
        adata,
        color="classification",
        palette=color_list,
        figsize=(6, 6)
    )

    # Or using numpy array
    import numpy as np

    color_array = np.array(['red', 'blue', 'green', 'yellow'])

    tcl.pl.spatial_cell(
        adata,
        color="classification",
        palette=color_array,
        figsize=(6, 6)
    )

**Note**: If the palette has fewer colors than categories, the colors are cycled and a warning is issued.

Performance Optimization for Large Datasets
-------------------------------------------

With large datasets (e.g., 200,000+ cells), visualization can be slow. The strategies below are recommended for improving performance.

Recommended Combination Strategy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For datasets with 200,000+ cells, we recommend combining several optimization techniques:

**Option 1: Optimized Polygon Plotting**

Use the ``groups`` parameter to filter cells, disable edges, and use low-resolution images:

.. code-block:: python

    import trackcell as tcl

    # Optimized polygon plotting for large datasets
    tcl.pl.spatial_cell(
        adata,
        color="classification",
        groups=['Cluster-2', 'Cluster-3'],  # Only plot cells of interest
        edges_width=0,                      # Disable edges for better performance
        img_key="lowres",                   # Use low-resolution image
        figsize=(10, 10)
    )

**Option 2: Fast Point-based Preview**

For quick exploration, use point-based visualization, which is much faster:

.. code-block:: python

    import scanpy as sc

    # Fast point-based preview
    sc.pl.spatial(
        adata,
        color='classification',
        spot_size=0.5,
        size=0.3,
        groups=['Cluster-2', 'Cluster-3']
    )

Performance Comparison
~~~~~~~~~~~~~~~~~~~~~~

Expected performance for different approaches with ~230,000 cells:

* **Full polygon plotting**: Several minutes to tens of minutes
* **Using groups** (filtering to ~10% of cells): ~10-30 seconds
* **Point-based plotting**: ~1-5 seconds
* **Downsampling to 10,000 cells**: ~5-15 seconds

Optimization Strategies
~~~~~~~~~~~~~~~~~~~~~~~

**Strategy 1: Filter by Cell Groups**

The most effective optimization is to plot only the cells of interest using the ``groups`` parameter:

.. code-block:: python

    # Plot only specific cell types
    tcl.pl.spatial_cell(
        adata,
        color="classification",
        groups=['Cluster-2', 'Cluster-3', 'Cluster-5']  # Only these cell types
    )

**Strategy 2: Spatial Region Cropping**

Crop to a specific spatial region of interest:

.. code-block:: python

    import trackcell as tcl

    # Define region of interest
    x_min, x_max = 1000, 5000
    y_min, y_max = 1000, 5000

    # Create mask for spatial coordinates (column 0 is x, column 1 is y)
    spatial_coords = adata.obsm['spatial']
    mask = (
        (spatial_coords[:, 0] >= x_min)
        & (spatial_coords[:, 0] <= x_max)
        & (spatial_coords[:, 1] >= y_min)
        & (spatial_coords[:, 1] <= y_max)
    )

    # Create subset
    adata_subset = adata[mask].copy()

    # IMPORTANT: Synchronize geometries after subsetting
    # This is required when data was loaded with read_hd_cellseg()
    tcl.io.sync_geometries_after_subset(adata_subset, sample="Cse1")

    # Plot subset
    tcl.pl.spatial_cell(adata_subset, color="classification")

**Strategy 3: Use Point-based Visualization**

For large datasets, point-based visualization is much faster than polygon plotting:

.. code-block:: python

    import scanpy as sc

    # Point-based visualization (much faster)
    sc.pl.spatial(
        adata,
        color='classification',
        spot_size=1,  # Small spots
        size=0.5,     # Further reduce size
        groups=['Cluster-2', 'Cluster-3']
    )

**Strategy 4: Disable Edge Rendering**

Disable polygon edges to improve rendering performance:

.. code-block:: python

    tcl.pl.spatial_cell(
        adata,
        color="classification",
        edges_width=0,  # Disable edges
        groups=['Cluster-2', 'Cluster-3']
    )

**Strategy 5: Use Low-Resolution Images**

Use low-resolution background images when available:

.. code-block:: python

    tcl.pl.spatial_cell(
        adata,
        color="classification",
        img_key="lowres",  # Use low-resolution image
        groups=['Cluster-2', 'Cluster-3']
    )

Best Practices
~~~~~~~~~~~~~~

1. **Always use GeoDataFrame format**: Ensure your data uses ``adata.uns['spatial'][sample]['geometries']`` (GeoDataFrame) rather than WKT strings for better performance.

2. **Start with point-based visualization**: Use ``sc.pl.spatial()`` for initial exploration, then switch to polygon-based visualization for detailed analysis.

3. **Filter before plotting**: Always use ``groups`` or spatial cropping to reduce the number of cells before plotting.

4. **Combine strategies**: Use multiple optimization strategies together for best results.

5. **Save intermediate results**: For repeated visualization, consider saving filtered subsets so the filtering does not have to be repeated.

Example Workflow
~~~~~~~~~~~~~~~~

A recommended workflow for visualizing large datasets:

.. code-block:: python

    import trackcell as tcl
    import scanpy as sc

    # Step 1: Quick overview with point-based plot
    sc.pl.spatial(
        adata,
        color='classification',
        spot_size=0.5,
        size=0.3
    )

    # Step 2: Detailed view of specific cell types with polygons
    tcl.pl.spatial_cell(
        adata,
        color="classification",
        groups=['Cluster-2', 'Cluster-3'],  # Focus on specific types
        edges_width=0,                      # Optimize performance
        figsize=(6, 6)
    )

    # Step 3: High-resolution view of specific region
    # (Use spatial cropping as shown in Strategy 2)
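
**Optional: Random Downsampling**

The performance comparison lists downsampling to 10,000 cells, but none of the strategies shows how to do it. The following is a minimal sketch, assuming a uniform random subsample is acceptable for a preview. ``sample_cell_indices`` is a hypothetical helper written for this example and is not part of TrackCell; it uses only the Python standard library.

.. code-block:: python

    import random


    def sample_cell_indices(n_cells, n_keep=10_000, seed=0):
        """Return sorted row indices for a uniform random subsample of cells.

        Hypothetical helper for illustration; not part of TrackCell.
        """
        if n_keep >= n_cells:
            # Nothing to drop: keep every cell.
            return list(range(n_cells))
        rng = random.Random(seed)  # Fixed seed keeps the preview reproducible
        # Sample without replacement, then sort to preserve the original cell order
        return sorted(rng.sample(range(n_cells), n_keep))


    # Example with the ~230,000 cells used in the performance comparison
    idx = sample_cell_indices(230_000, n_keep=10_000)
    print(len(idx))  # 10000

With a loaded dataset, the indices could be applied as ``adata_small = adata[sample_cell_indices(adata.n_obs)].copy()``. As in Strategy 2, data loaded with ``read_hd_cellseg()`` would then need ``tcl.io.sync_geometries_after_subset(adata_small, sample="Cse1")`` before calling ``tcl.pl.spatial_cell``.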