# Caching System
This guide explains the caching system used in the Tracker API: how it works, best practices, and how to troubleshoot common issues.
## Overview
The Tracker API uses Redis for caching API responses to improve performance and reduce database load. The caching system is implemented in two layers:
- Backend (Redis) Caching: Server-side caching of API responses using Redis
- Frontend (React Query) Caching: Client-side caching in the Admin Panel and Frontend using React Query
This guide focuses on the backend Redis caching system.
## Caching Strategy

### CRUD-Based Cache Invalidation Strategy

**Current implementation**: The system uses a CRUD-based cache invalidation strategy instead of TTL-based expiration. Caches are invalidated immediately when data is created, updated, or deleted, ensuring real-time data consistency.

#### Key Principles
- **Event-driven invalidation**: Caches are invalidated based on data modification events, not time-based expiration
- **Immediate consistency**: Changes appear in the frontend immediately after database updates
- **Pattern-based clearing**: Related cache patterns are invalidated together to maintain data consistency
- **No TTL dependency**: Cache entries remain valid until explicitly invalidated by CRUD operations
### Aggressive Cache Invalidation Strategy

**Current implementation**: The system invalidates caches aggressively to ensure data freshness, prioritizing real-time data consistency over cache performance.
#### Background Service Cache Invalidation
Background services that update tracker data automatically invalidate all tracker-related caches to prevent stale data issues:
| Service | Triggers Cache Invalidation When | Invalidated Patterns |
|---|---|---|
| Tracker Status Service | Updates tracker status or creates status history; processes batch status updates | `tracker:*`, `production_run:*`, `locations:*`, `map_data:*` |
| Unified Geofence Service | Processes location reports and updates status; creates geofence events; batch-processes location reports | `tracker:*`, `production_run:*`, `locations:*`, `map_data:*` |
| Tracker Fetcher Service | Updates `last_report_received` timestamps; stores new location reports | `tracker:*`, `production_run:*`, `locations:*`, `map_data:*` |
#### Shared Cache Invalidation Utility

All background services use the shared cache invalidation utility (`services/shared/cache_invalidation.py`):

```python
from services.shared.cache_invalidation import invalidate_tracker_caches

# Called after any tracker data update
invalidate_tracker_caches("service_name")
```
This utility invalidates all tracker-related cache patterns:
- `tracker:*` - All tracker data caches
- `production_run:*` - Production run caches (contain tracker data)
- `locations:*` - Location data caches
- `map_data:*` - Map visualization caches
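For reference, here is a minimal sketch of what such a utility could look like, assuming a `redis_client` wrapper that exposes `keys()` and `delete()`; the actual implementation in `services/shared/cache_invalidation.py` may differ:

```python
import logging

from app.core.redis import redis_client  # assumed import path

logger = logging.getLogger(__name__)

# The four tracker-related patterns listed above
TRACKER_CACHE_PATTERNS = ["tracker:*", "production_run:*", "locations:*", "map_data:*"]


def invalidate_tracker_caches(service_name: str) -> int:
    """Invalidate all tracker-related cache patterns without ever raising."""
    total = 0
    for pattern in TRACKER_CACHE_PATTERNS:
        try:
            keys = redis_client.keys(pattern)
            if keys:
                redis_client.delete(*keys)
                total += len(keys)
        except Exception as e:
            # Invalidation failures must not break the calling service
            logger.error(f"{service_name}: failed to invalidate {pattern}: {e}")
    logger.info(f"{service_name}: invalidated {total} cache keys")
    return total
```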
### API Route Cache Invalidation Matrix
When entity X is modified via API routes, invalidate cache patterns for X and dependent entities:
| Modified Entity | Invalidate Patterns |
|---|---|
| Production Run | `production_run:*:{id}:*`, `locations:*:*:production_run_{id}:*`, `map_data:*:production_run_{id}:*` |
| Location | `{location_type}:*:{id}:*`, `{location_type}:*` (all caches for that location type), `locations:*:*:*`, `map_data:*:*location_{id}*` |
| Brand/Client | `{entity}:*:{id}:*`, `{entity}:list:*`, `production_run:*:client_{id}:*` |
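As an illustration, applying the first row of this matrix after a production-run update could look like the sketch below. It assumes `invalidate_cache_by_patterns()` (discussed in the cluster-support notes later in this guide) accepts a list of glob-style patterns:

```python
from app.core.cache_utils import invalidate_cache_by_patterns  # assumed signature


def invalidate_production_run_related(production_run_id: int) -> None:
    """Clear the production run's caches plus dependent location/map caches."""
    invalidate_cache_by_patterns([
        f"production_run:*:{production_run_id}:*",
        f"locations:*:*:production_run_{production_run_id}:*",
        f"map_data:*:production_run_{production_run_id}:*",
    ])
```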
### Frontend-Backend Alignment
- **Event-driven invalidation**: backend CRUD operations trigger immediate frontend cache clearing
- **No TTL dependency**: frontend caches remain valid until backend invalidation events
- **Consistent cache keys**: frontend query keys must match backend cache key patterns
- **Mutation invalidation**: all mutations must invalidate the relevant frontend queries immediately
## Redis Cache Implementation

### Architecture
The Redis caching system consists of the following components:
- `RedisClient`: A wrapper around the Redis client library that handles connection management and provides basic operations such as get, set, and delete.
- `CacheManager`: A generic cache manager for Pydantic models that handles serialization/deserialization and provides higher-level caching operations.
- **Cache Utilities**: Standardized utilities for cache key generation and invalidation.
### Cache Key Generation

Cache keys are generated using a standardized format to ensure consistency across different parts of the application. The format is:

```
{entity_type}:id:{entity_id}:user:{user_id}:admin:{is_admin}:{additional_parameters}
```
For example:
- `production_runs:user:1:admin:true:skip:0:limit:10`
- `production_run:id:123:user:1:admin:false`
- `production_run_trackers:id:123:user:1:admin:true:skip:0:limit:10`

The standardized cache key generation is implemented in the `generate_cache_key` function in `app/core/cache_utils.py`.
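A hypothetical implementation consistent with the format above might look like this; the canonical version in `app/core/cache_utils.py` may differ in signature and details:

```python
from typing import Optional


def generate_cache_key(
    entity_type: str,
    user_id: int,
    is_admin: bool,
    entity_id: Optional[int] = None,
    **params: object,
) -> str:
    """Build keys like production_run:id:123:user:1:admin:false:skip:0:limit:10."""
    parts = [entity_type]
    if entity_id is not None:
        parts += ["id", str(entity_id)]
    parts += ["user", str(user_id), "admin", str(is_admin).lower()]
    # Additional parameters keep their given order (e.g. skip, then limit),
    # so callers must pass them consistently to get identical keys
    for name, value in params.items():
        parts += [name, str(value)]
    return ":".join(parts)
```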
### CRUD-Based Cache Invalidation
Cache invalidation is performed immediately when data is modified through CRUD operations (created, updated, or deleted). The cache invalidation system is designed to invalidate not only the specific entity that was modified but also related entities that might be affected by the change.
All cache invalidation is triggered by CRUD operations, not TTL expiration:
| CRUD Operation | Trigger | Invalidated Patterns |
|---|---|---|
| CREATE | New entity created via API or background service | Entity-specific patterns + related patterns |
| UPDATE | Entity modified via API or background service | Entity-specific patterns + related patterns |
| DELETE | Entity removed via API or background service | Entity-specific patterns + related patterns |
For example, when a production run is updated:
- The specific production run's cache is invalidated immediately
- The list of production runs cache is invalidated immediately
- The cache of trackers associated with the production run is invalidated immediately
The standardized cache invalidation is implemented in the `invalidate_entity_cache` function and entity-specific helper functions in `app/core/cache_utils.py`.
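Sketched roughly, and assuming the same `redis_client` wrapper as before, the generic invalidator and one entity-specific helper might look like this (hypothetical; the real functions in `app/core/cache_utils.py` may differ):

```python
from typing import Optional

from app.core.redis import redis_client  # assumed import path


def invalidate_entity_cache(entity_type: str, entity_id: Optional[int] = None) -> None:
    """Invalidate one entity's caches plus the entity type's list caches."""
    patterns = [f"{entity_type}:list:*"]
    if entity_id is not None:
        patterns.append(f"{entity_type}:*:{entity_id}:*")
    for pattern in patterns:
        keys = redis_client.keys(pattern)
        if keys:
            redis_client.delete(*keys)


def invalidate_production_run_cache(production_run_id: int) -> None:
    """Entity-specific helper: also clears the run's tracker listings."""
    invalidate_entity_cache("production_run", production_run_id)
    invalidate_entity_cache("production_run_trackers", production_run_id)
```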
## Using the Caching System

### Safe Caching with SQLAlchemy Objects

**Important**: When caching SQLAlchemy objects, you must use the safe caching helpers to avoid serialization errors caused by database sessions and locks.

#### Safe Caching Helpers

The application provides safe caching helpers in `app/core/cache_helpers.py`:
- `prepare_for_cache()`: Converts SQLAlchemy objects to Pydantic schemas
- `safe_cache_set()`: Safely caches data with proper serialization
- `create_paginated_cache_data()`: Creates paginated response data for caching
- `sqlalchemy_to_dict()`: Converts SQLAlchemy objects to dictionaries
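As a quick illustration, caching a single object from a route might look like the snippet below; the exact `safe_cache_set()` signature and the `schemas.ProductionRun` name are assumptions here:

```python
from app.core.cache_helpers import prepare_for_cache, safe_cache_set

# Convert the ORM object to a plain, picklable structure first
# (schemas.ProductionRun is a placeholder schema name)
prepared = prepare_for_cache(production_run, schemas.ProductionRun)
if prepared:
    # Logs and swallows serialization errors instead of raising
    safe_cache_set(cache_manager, cache_key, prepared)
```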
### In API Routes

To use the caching system in API routes:
1. Import the cache utilities and safe helpers:

   ```python
   from app.core.cache_utils import generate_cache_key, invalidate_production_run_cache
   from app.core.cache_helpers import create_paginated_cache_data, prepare_for_cache
   ```

2. Generate a cache key for GET requests:

   ```python
   # Extract the user ID as an integer
   user_id = int(current_user.id)
   cache_key = generate_cache_key(
       entity_type="production_run",
       entity_id=production_run_id,
       user_id=user_id,
       is_admin=crud.user.is_admin(current_user),
   )
   ```

3. Try to get the data from the cache first:

   ```python
   try:
       cached_data = cache_manager.get(cache_key, request)
       if cached_data:
           # Add performance headers
           request.state.query_time = round((time.time() - start_time) * 1000, 2)
           request.state.cache_status = "hit"
           request.state.query_count = 0
           return cached_data
   except Exception as e:
       logger.warning(f"Cache get failed: {str(e)}")
       # Continue without cache
   ```

4. If the data is not cached, get it from the database and cache the result safely:

   ```python
   # Get data from the database (SQLAlchemy objects)
   sqlalchemy_objects = get_data_from_database()

   # For single objects - convert to a dict first, then cache
   if not isinstance(sqlalchemy_objects, list):
       obj_dict = {
           c.name: getattr(sqlalchemy_objects, c.name)
           for c in sqlalchemy_objects.__table__.columns
       }

       # Process any special fields (like image URLs)
       if obj_dict.get("image_url"):
           obj_dict["image_url"] = get_full_image_url(obj_dict["image_url"])

       # Convert to a Pydantic schema for the response
       response_data = schemas.YourSchema.model_validate(obj_dict)

       # Cache the dict, not the SQLAlchemy object
       try:
           prepared_data = prepare_for_cache(obj_dict, schemas.YourSchema)
           cache_manager.set(cache_key, prepared_data)
       except Exception as e:
           logger.error(f"Error caching data: {str(e)}")

   # For paginated lists - use the helper function
   else:
       # Convert SQLAlchemy objects to dicts
       object_dicts = []
       for obj in sqlalchemy_objects:
           obj_dict = {c.name: getattr(obj, c.name) for c in obj.__table__.columns}
           object_dicts.append(obj_dict)

       # Create the paginated response
       response_data = create_paginated_response(object_dicts, total, page, limit, pages)

       # Cache using the safe helper
       try:
           cache_data = create_paginated_cache_data(
               object_dicts,  # Use dicts, not SQLAlchemy objects
               total_count,
               current_page,
               limit,
               total_pages,
               schemas.YourSchema,
               process_image_urls=True,
           )
           if cache_data:
               cache_manager.set(cache_key, cache_data)
       except Exception as e:
           logger.error(f"Error caching paginated data: {str(e)}")

   return response_data
   ```

5. Invalidate the cache when data is modified:

   ```python
   # Update the data in the database
   updated_data = update_data_in_database()

   # Invalidate the cache
   invalidate_production_run_cache(production_run_id)

   return updated_data
   ```
### Common Serialization Issues and Solutions

#### Problem: "cannot pickle '_thread.RLock' object"
This error occurs when trying to cache SQLAlchemy objects that contain database session locks.
**Solution**: Always convert SQLAlchemy objects to dictionaries or Pydantic schemas before caching:

```python
# ❌ DON'T: Cache SQLAlchemy objects directly
cache_manager.set(cache_key, sqlalchemy_object)

# ✅ DO: Convert to a dict first
obj_dict = {c.name: getattr(sqlalchemy_object, c.name)
            for c in sqlalchemy_object.__table__.columns}
prepared_data = prepare_for_cache(obj_dict, schemas.YourSchema)
cache_manager.set(cache_key, prepared_data)
```
#### Problem: Geographic/Spatial Data Serialization

Geographic data (`WKBElement`, `Point`, `Polygon`) can cause serialization issues.

**Solution**: The safe caching helpers automatically skip non-serializable geographic data:

```python
# The helpers handle this automatically
prepared_data = prepare_for_cache(data_with_geo_fields, schemas.YourSchema)
```
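One generic way to implement that kind of skipping is a pickle round-trip test; this sketch is illustrative and not necessarily how the helpers do it internally:

```python
import pickle
from typing import Any, Dict


def _is_cache_serializable(value: Any) -> bool:
    """Return True only if the value survives a pickle round trip."""
    try:
        pickle.dumps(value)
        return True
    except Exception:
        # WKBElement geometries, sessions, and thread locks all land here
        return False


def strip_unserializable(data: Dict[str, Any]) -> Dict[str, Any]:
    """Drop fields (e.g., geometry columns) that cannot be cached."""
    return {k: v for k, v in data.items() if _is_cache_serializable(v)}
```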
#### Problem: Relationship Objects
SQLAlchemy relationship objects contain references to database sessions.
**Solution**: Use `exclude_relations=True` (the default) in `sqlalchemy_to_dict()`:

```python
# Automatically excludes relationship objects
obj_dict = sqlalchemy_to_dict(sqlalchemy_object, exclude_relations=True)
```
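A hypothetical `sqlalchemy_to_dict()` along these lines shows why relationship exclusion is the default; the real helper in `app/core/cache_helpers.py` may differ:

```python
from typing import Any, Dict


def sqlalchemy_to_dict(obj: Any, exclude_relations: bool = True) -> Dict[str, Any]:
    """Copy mapped column values into a plain dict."""
    # Iterating __table__.columns naturally skips relationship attributes
    data = {c.name: getattr(obj, c.name) for c in obj.__table__.columns}
    if not exclude_relations:
        # Opt-in only: loaded relationship objects carry session references
        # and must themselves be converted before caching
        for rel in obj.__mapper__.relationships:
            data[rel.key] = getattr(obj, rel.key)
    return data
```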
### In CLI Scripts

CLI scripts that modify data should also invalidate the cache to ensure consistency. For example:

```python
from app.core.cache_utils import invalidate_production_run_cache

# Import trackers from CSV
import_trackers_from_csv(csv_path, production_run_id)

# Invalidate the cache
invalidate_production_run_cache(production_run_id)
```
## Best Practices

1. **Use Standardized Cache Keys**: Always use the `generate_cache_key` function to generate cache keys to ensure consistency.
2. **Invalidate Related Caches**: When modifying data, invalidate not only the specific entity's cache but also related entities that might be affected by the change.
3. **Handle Cache Invalidation Errors Gracefully**: Cache invalidation should not prevent the operation from completing. If cache invalidation fails, log the error and continue (see the sketch after this list).
4. **CRUD-Based Invalidation**: Ensure all CRUD operations (create, update, delete) trigger appropriate cache invalidation immediately after database commits.
5. **Monitor Cache Hit Rate**: Monitor the cache hit rate to ensure the caching system is effective. A low hit rate might indicate issues with cache key generation or overly aggressive invalidation.
6. **Background Service Integration**: Ensure background services that modify data use the shared cache invalidation utilities to maintain consistency.
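For the error-handling practice above, a small wrapper keeps invalidation failures from aborting the request; this is a minimal sketch, not an existing utility:

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def invalidate_quietly(invalidate_fn: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
    """Run an invalidation helper; log failures instead of raising.

    A stale cache entry is recoverable later; a failed write request is not.
    """
    try:
        invalidate_fn(*args, **kwargs)
    except Exception as e:
        logger.error(f"Cache invalidation failed (continuing): {e}")


# Usage after a successful commit:
# invalidate_quietly(invalidate_production_run_cache, production_run_id)
```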
## Troubleshooting

### Stale Data

**Note**: As of August 2025, the system implements aggressive cache invalidation in background services to prevent stale data issues. If you're still seeing stale data, it could be due to:
- **Missing cache invalidation in API routes**: Ensure that cache invalidation runs when data is modified via the web interface.
- **Incorrect cache keys**: Ensure that cache keys are generated consistently.
- **CLI scripts bypassing cache invalidation**: Ensure that CLI scripts that modify data also invalidate the cache.
- **Background service issues**: Check that background services are running and processing data correctly.
To fix stale data issues:

1. Check the background service logs for cache invalidation messages:

   ```bash
   # Look for cache invalidation log entries
   docker logs tracker-api | grep "invalidated.*cache keys"
   ```

2. Flush the Redis cache as a temporary fix:

   ```bash
   redis-cli FLUSHALL
   ```

3. Verify that the background services are running:

   ```bash
   # Check whether the services are processing data
   docker logs tracker-api | grep -E "(tracker_status_service|unified_geofence_service|tracker_fetcher_service)"
   ```

4. Check the cache invalidation code in the relevant API routes and CLI scripts.

5. Ensure that the `invalidate_entity_cache` function is called with the correct parameters.
### Background Service Cache Invalidation Debugging
If background services are not invalidating caches properly:
1. Check the service logs for cache invalidation calls:

   ```bash
   # Look for service-specific cache invalidation
   docker logs tracker-api | grep "invalidated.*cache keys"
   ```

2. Verify that the shared cache invalidation utility is imported correctly in the services:

   ```python
   from services.shared.cache_invalidation import invalidate_tracker_caches
   ```

3. Check Redis connectivity from the background services; cache invalidation failures should be logged as errors.
### High Cache Miss Rate
If you're seeing a high cache miss rate, it could be due to:
- **Inconsistent cache keys**: Ensure that cache keys are generated consistently.
- **Aggressive cache invalidation**: The current CRUD-based strategy prioritizes data freshness over cache performance. This is expected behavior.
- **Frequent data modifications**: High miss rates are normal when data is updated often, since caches are invalidated immediately on every CRUD operation.
### Redis Connection Issues

If you're experiencing Redis connection issues:

1. Check the Redis connection parameters in the `.env` file.
2. Ensure that Redis is running and accessible from the API server.
3. Check the Redis logs for any errors.
4. Verify that the Redis password is correct.
## Recent Improvements

### Redis Cluster Support and Cache Invalidation Fix (October 2025)
We've implemented comprehensive fixes for Redis Cluster mode to resolve cache invalidation issues that were preventing proper cache clearing across the application.
#### Problem
The application experienced cache invalidation failures in Redis Cluster mode due to two critical issues:
1. **Cluster node scanning failure**: The `keys()` method couldn't properly iterate over cluster primary nodes, resulting in "No targets were found to execute SCAN command" errors.
2. **Hash tag pattern mismatch**: Cache keys stored with hash tags (e.g., `{production_run:list:...}`) weren't matched by invalidation patterns (e.g., `production_run:*`).

**Impact**: Cache invalidation failed silently, requiring a manual `FLUSHALL` to clear stale data after creating, updating, or deleting entities such as production runs, storage locations, and brands.
#### Solution

**1. Enhanced Redis Cluster node scanning (`app/core/redis.py`)**

- Improved the `_get_primary_nodes()` method to properly detect cluster nodes using the redis-py 6.1+ API
- Enhanced the `keys()` method with robust error handling for per-node scanning
- Added comprehensive logging for cluster operations
```python
def _get_primary_nodes(self) -> List[Redis]:
    """Get all primary nodes from the Redis cluster."""
    if not self.is_cluster or self.client is None:
        return []
    try:
        # For redis-py 6.1+, use get_nodes() and filter for primaries
        if hasattr(self.client, "get_nodes"):
            nodes = self.client.get_nodes()
            primary_nodes = []
            for node in nodes:
                if hasattr(node, "redis_connection"):
                    primary_nodes.append(node.redis_connection)
            return primary_nodes
        # Fall back to the older API
        if hasattr(self.client, "get_primaries"):
            return self.client.get_primaries()
        logger.warning("Cannot get cluster nodes - no suitable API found")
        return []
    except Exception as e:
        logger.error(f"Error getting primary nodes: {e}")
        return []
```
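For context, a cluster-aware `keys()` built on top of `_get_primary_nodes()` might scan each primary separately, roughly as follows (a sketch, not the exact implementation):

```python
def keys(self, pattern: str) -> List[str]:
    """Collect keys matching a pattern across all cluster primaries."""
    if not self.is_cluster:
        return list(self.client.scan_iter(match=pattern))
    matched: List[str] = []
    for node in self._get_primary_nodes():
        try:
            # SCAN per node sidesteps the cluster-wide
            # "No targets were found to execute SCAN command" failure
            matched.extend(node.scan_iter(match=pattern))
        except Exception as e:
            logger.error(f"Error scanning node for pattern {pattern}: {e}")
    return matched
```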
**2. Automatic hash tag pattern handling (`app/core/cache_utils.py`)**

- Added a `generate_invalidation_pattern()` function that automatically transforms patterns for cluster mode
- Updated `invalidate_cache_by_patterns()` to use the pattern transformation
- Added comprehensive debug logging
```python
def generate_invalidation_pattern(pattern: str, is_cluster: Optional[bool] = None) -> str:
    """
    Generate a cache invalidation pattern for both standalone and cluster modes.

    In cluster mode, CacheManager wraps keys with hash tags like {key},
    so invalidation patterns must match those hash-tagged keys.
    """
    if is_cluster is None:
        is_cluster = redis_client.is_cluster
    if is_cluster:
        # In cluster mode, keys are wrapped with hash tags: {pattern}
        return f"{{{pattern}}}"
    return pattern
```
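Downstream, `invalidate_cache_by_patterns()` can then transform each pattern before scanning, along these lines (a sketch; the actual function in `app/core/cache_utils.py` may differ):

```python
from typing import List


def invalidate_cache_by_patterns(patterns: List[str]) -> int:
    """Delete all keys matching the given patterns in either Redis mode."""
    total = 0
    for raw_pattern in patterns:
        # Hash-tag wrapping happens only when running against a cluster
        pattern = generate_invalidation_pattern(raw_pattern)
        keys = redis_client.keys(pattern)
        if keys:
            redis_client.delete(*keys)
            total += len(keys)
    logger.debug(f"Invalidated {total} keys for patterns: {patterns}")
    return total
```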
#### Cache Invalidation Benefits

- **App-wide fix**: Cache invalidation now works correctly for all entity types (production runs, brands, clients, trackers, storage locations, delivery locations)
- **Automatic handling**: Pattern transformation happens automatically; no changes are needed in individual routes
- **Backward compatible**: Works correctly in both standalone and cluster Redis modes
- **Better logging**: Comprehensive debug logging helps troubleshoot cache issues
#### Testing
After implementation, verify cache invalidation works by:
- Creating a production run → it should appear in the list immediately
- Updating a production run → changes should appear immediately
- Deleting a production run → it should disappear immediately
- Adding/editing storage locations → should work without errors
- No manual `FLUSHALL` should be needed to see changes
#### Technical Details

**Affected functions** (all now work correctly in cluster mode):

- `invalidate_production_run_cache()` - production runs
- `invalidate_brand_cache()` - brands
- `invalidate_client_cache()` - clients
- `invalidate_tracker_cache()` - trackers
- `invalidate_location_cache()` - storage/delivery locations
- All smart invalidation functions

**Logs to monitor**:

```
DEBUG: Invalidating pattern: production_run:* -> {production_run:*}
DEBUG: Scanning 3 primary nodes for pattern: {production_run:*}
DEBUG: Found 5 keys on node 192.168.1.10:6379
DEBUG: Found 12 keys matching pattern {production_run:*}
INFO: Total keys invalidated: 12
```
### Safe Caching Implementation (2025)

We've implemented a comprehensive safe caching system to resolve serialization issues with SQLAlchemy objects.

#### New Safe Caching Helpers (`app/core/cache_helpers.py`)

- `sqlalchemy_to_dict()`: Safely converts SQLAlchemy objects to dictionaries, excluding non-serializable attributes such as database sessions and relationship objects.
- `prepare_for_cache()`: Converts SQLAlchemy objects to Pydantic schemas before caching, with automatic handling of image URLs and other special fields.
- `safe_cache_set()`: A wrapper function for safe caching with comprehensive error handling.
- `create_paginated_cache_data()`: Creates paginated response data suitable for caching, handling lists of SQLAlchemy objects safely.
#### Enhanced Redis Cache Manager
The Redis cache manager has been improved with:
- **Advanced serialization detection**: Automatically detects and skips non-serializable objects such as thread locks, database sessions, and SQLAlchemy relationship objects.
- **Comprehensive error handling**: Serialization failures are handled gracefully without breaking the API.
- **Geographic data support**: Spatial data types that can cause serialization issues are handled automatically.
#### Performance Middleware Improvements
The performance middleware has been enhanced to:
- **Safe JSON response modification**: Properly handles responses containing non-serializable objects.
- **Fallback mechanisms**: Continues operating even when performance metrics injection fails.
- **Better error logging**: Improved error reporting without exposing sensitive data.
#### Resolved Issues

- **Fixed the "cannot pickle '_thread.RLock' object" error**: The primary serialization issue that occurred when caching SQLAlchemy objects holding database session locks.
- **Improved cache reliability**: Caching now works consistently across all endpoints without breaking the API.
- **Better error handling**: Cache failures no longer cause API endpoints to return 500 errors.
### Standardized Cache Utilities

We've implemented standardized cache utilities in `app/core/cache_utils.py` to ensure consistent cache key generation and invalidation across the application. These utilities include:

- `generate_cache_key`: Generates standardized cache keys.
- `invalidate_entity_cache`: Invalidates cache entries for an entity and its related entities.
- Entity-specific helper functions such as `invalidate_production_run_cache`, `invalidate_tracker_cache`, and `invalidate_location_cache`.
### CLI Script Cache Invalidation

We've updated CLI scripts that modify data to properly invalidate the cache. For example, the `import_trackers.py` script now invalidates the cache after importing trackers, so newly imported trackers appear immediately without needing a cache flush.
### Background Service Cache Invalidation (August 2025)
We've implemented aggressive cache invalidation in background services to resolve stale data issues where the frontend showed outdated information while the database contained current data.
#### Implementation Details
- **Shared Cache Invalidation Utility** (`services/shared/cache_invalidation.py`):
  - Centralized cache invalidation logic following DRY principles
  - Pattern-based invalidation for all tracker-related data
  - Comprehensive error handling and logging
- **Service Integration**:
  - Tracker Status Service: invalidates caches after status updates and batch processing
  - Unified Geofence Service: invalidates caches after location processing and status changes
  - Tracker Fetcher Service: invalidates caches after updating `last_report_received` timestamps
- **Aggressive Strategy**: Prioritizes data freshness over cache performance by invalidating all tracker-related cache patterns when any tracker data changes.
#### Code Example

```python
# In background services
from services.shared.cache_invalidation import invalidate_tracker_caches

# After updating tracker data in the database
self.db.commit()

# Invalidate all tracker-related caches
invalidate_tracker_caches("service_name")
```
#### Benefits

- **Eliminates stale data issues**: The frontend always shows current data
- **Real-time consistency**: Changes appear immediately in the UI
- **Simplified debugging**: No complex cache invalidation logic to troubleshoot
- **DRY implementation**: The shared utility prevents code duplication
#### Performance Considerations

The aggressive approach trades cache performance for data consistency. Future optimizations may include the following (a speculative sketch follows the list):
- Smart invalidation based on specific tracker IDs
- Selective pattern invalidation
- Cache warming strategies
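For example, smart invalidation might narrow the cleared patterns to a single tracker rather than `tracker:*`; this is purely speculative and not implemented today:

```python
def invalidate_single_tracker(tracker_id: int) -> None:
    """Speculative: clear one tracker's caches instead of all tracker data."""
    invalidate_cache_by_patterns([
        f"tracker:*:{tracker_id}:*",
        f"map_data:*:*tracker_{tracker_id}*",
    ])
```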
### API Routes Cache Management
We've updated API routes to use the standardized cache key generation and invalidation functions, along with the new safe caching helpers, ensuring consistent and reliable cache management across the application.