# Caching System
This guide explains the caching system used in the Tracker API: how it works, best practices, and how to troubleshoot common issues.
## Overview
The Tracker API uses Redis for caching API responses to improve performance and reduce database load. The caching system is implemented in two layers:
- Backend (Redis) Caching: Server-side caching of API responses using Redis
- Frontend (React Query) Caching: Client-side caching in the Admin Panel and Frontend using React Query
This guide focuses on the backend Redis caching system.
## Caching Strategy

### CRUD-Based Cache Invalidation Strategy

**Current implementation**: The system uses a CRUD-based cache invalidation strategy instead of TTL-based expiration. Caches are invalidated immediately when data is created, updated, or deleted, ensuring real-time data consistency.

#### Key Principles
- **Event-driven invalidation**: Caches are invalidated based on data modification events, not time-based expiration
- **Immediate consistency**: Changes appear in the frontend immediately after database updates
- **Pattern-based clearing**: Related cache patterns are invalidated together to maintain data consistency
- **No TTL dependency**: Cache entries remain valid until explicitly invalidated by CRUD operations
### Aggressive Cache Invalidation Strategy

**Current implementation**: The system invalidates caches aggressively to ensure data freshness, prioritizing real-time data consistency over cache performance.
#### Background Service Cache Invalidation
Background services that update tracker data automatically invalidate all tracker-related caches to prevent stale data issues:
| Service | Triggers Cache Invalidation When | Invalidated Patterns |
|---|---|---|
| Tracker Status Service | Updates tracker status or creates status history; processes batch status updates | `tracker:*`, `production_run:*`, `locations:*`, `map_data:*` |
| Unified Geofence Service | Processes location reports and updates status; creates geofence events; batch-processes location reports | `tracker:*`, `production_run:*`, `locations:*`, `map_data:*` |
| Tracker Fetcher Service | Updates `last_report_received` timestamps; stores new location reports | `tracker:*`, `production_run:*`, `locations:*`, `map_data:*` |
#### Shared Cache Invalidation Utility

All background services use the shared cache invalidation utility (`services/shared/cache_invalidation.py`):

```python
from services.shared.cache_invalidation import invalidate_tracker_caches

# Called after any tracker data update
invalidate_tracker_caches("service_name")
```
This utility invalidates all tracker-related cache patterns:
- `tracker:*` - All tracker data caches
- `production_run:*` - Production run caches (contain tracker data)
- `locations:*` - Location data caches
- `map_data:*` - Map visualization caches
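For reference, here is a minimal sketch of what such a utility could look like, assuming a `redis_client` wrapper that exposes `keys()` and `delete()`; the actual implementation in `services/shared/cache_invalidation.py` may differ:

```python
import logging

from app.core.redis import redis_client  # assumed import path

logger = logging.getLogger(__name__)

# The four tracker-related patterns listed above
TRACKER_CACHE_PATTERNS = ["tracker:*", "production_run:*", "locations:*", "map_data:*"]


def invalidate_tracker_caches(service_name: str) -> int:
    """Invalidate all tracker-related cache patterns without ever raising."""
    total = 0
    for pattern in TRACKER_CACHE_PATTERNS:
        try:
            keys = redis_client.keys(pattern)
            if keys:
                redis_client.delete(*keys)
                total += len(keys)
        except Exception as e:
            # Invalidation failures must not break the calling service
            logger.error(f"{service_name}: failed to invalidate {pattern}: {e}")
    logger.info(f"{service_name}: invalidated {total} cache keys")
    return total
```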
### API Route Cache Invalidation Matrix
When entity X is modified via API routes, invalidate cache patterns for X and dependent entities:
| Modified Entity | Invalidate Patterns |
|---|---|
| Production Run | `production_run:*:{id}:*`, `locations:*:*:production_run_{id}:*`, `map_data:*:production_run_{id}:*` |
| Location | `{location_type}:*:{id}:*`, `{location_type}:*` (all caches for that location type), `locations:*:*:*`, `map_data:*:*location_{id}*` |
| Brand/Client | `{entity}:*:{id}:*`, `{entity}:list:*`, `production_run:*:client_{id}:*` |
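As an illustration, applying the first row of this matrix after a production-run update could look like the sketch below. It assumes `invalidate_cache_by_patterns()` (discussed in the cluster-support notes later in this guide) accepts a list of glob-style patterns:

```python
from app.core.cache_utils import invalidate_cache_by_patterns  # assumed signature


def invalidate_production_run_related(production_run_id: int) -> None:
    """Clear the production run's caches plus dependent location/map caches."""
    invalidate_cache_by_patterns([
        f"production_run:*:{production_run_id}:*",
        f"locations:*:*:production_run_{production_run_id}:*",
        f"map_data:*:production_run_{production_run_id}:*",
    ])
```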
### Frontend-Backend Alignment
- **Event-driven invalidation**: backend CRUD operations trigger immediate frontend cache clearing
- **No TTL dependency**: frontend caches remain valid until backend invalidation events
- **Consistent cache keys**: frontend query keys must match backend cache key patterns
- **Mutation invalidation**: all mutations must invalidate the relevant frontend queries immediately
## Redis Cache Implementation

### Architecture
The Redis caching system consists of the following components:
- `RedisClient`: A wrapper around the Redis client library that handles connection management and provides basic operations such as get, set, and delete.
- `CacheManager`: A generic cache manager for Pydantic models that handles serialization/deserialization and provides higher-level caching operations.
- **Cache Utilities**: Standardized utilities for cache key generation and invalidation.
### Cache Key Generation

Cache keys are generated using a standardized format to ensure consistency across different parts of the application. The format is:

```
{entity_type}:id:{entity_id}:user:{user_id}:admin:{is_admin}:{additional_parameters}
```
For example:
- `production_runs:user:1:admin:true:skip:0:limit:10`
- `production_run:id:123:user:1:admin:false`
- `production_run_trackers:id:123:user:1:admin:true:skip:0:limit:10`

The standardized cache key generation is implemented in the `generate_cache_key` function in `app/core/cache_utils.py`.
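A hypothetical implementation consistent with the format above might look like this; the canonical version in `app/core/cache_utils.py` may differ in signature and details:

```python
from typing import Optional


def generate_cache_key(
    entity_type: str,
    user_id: int,
    is_admin: bool,
    entity_id: Optional[int] = None,
    **params: object,
) -> str:
    """Build keys like production_run:id:123:user:1:admin:false:skip:0:limit:10."""
    parts = [entity_type]
    if entity_id is not None:
        parts += ["id", str(entity_id)]
    parts += ["user", str(user_id), "admin", str(is_admin).lower()]
    # Additional parameters keep their given order (e.g. skip, then limit),
    # so callers must pass them consistently to get identical keys
    for name, value in params.items():
        parts += [name, str(value)]
    return ":".join(parts)
```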
### CRUD-Based Cache Invalidation
Cache invalidation is performed immediately when data is modified through CRUD operations (created, updated, or deleted). The cache invalidation system is designed to invalidate not only the specific entity that was modified but also related entities that might be affected by the change.
All cache invalidation is triggered by CRUD operations, not TTL expiration:
| CRUD Operation | Trigger | Invalidated Patterns |
|---|---|---|
| CREATE | New entity created via API or background service | Entity-specific patterns + related patterns |
| UPDATE | Entity modified via API or background service | Entity-specific patterns + related patterns |
| DELETE | Entity removed via API or background service | Entity-specific patterns + related patterns |
For example, when a production run is updated:
- The specific production run's cache is invalidated immediately
- The list of production runs cache is invalidated immediately
- The cache of trackers associated with the production run is invalidated immediately
The standardized cache invalidation is implemented in the `invalidate_entity_cache` function and entity-specific helper functions in `app/core/cache_utils.py`.
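Sketched roughly, and assuming the same `redis_client` wrapper as before, the generic invalidator and one entity-specific helper might look like this (hypothetical; the real functions in `app/core/cache_utils.py` may differ):

```python
from typing import Optional

from app.core.redis import redis_client  # assumed import path


def invalidate_entity_cache(entity_type: str, entity_id: Optional[int] = None) -> None:
    """Invalidate one entity's caches plus the entity type's list caches."""
    patterns = [f"{entity_type}:list:*"]
    if entity_id is not None:
        patterns.append(f"{entity_type}:*:{entity_id}:*")
    for pattern in patterns:
        keys = redis_client.keys(pattern)
        if keys:
            redis_client.delete(*keys)


def invalidate_production_run_cache(production_run_id: int) -> None:
    """Entity-specific helper: also clears the run's tracker listings."""
    invalidate_entity_cache("production_run", production_run_id)
    invalidate_entity_cache("production_run_trackers", production_run_id)
```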
## Using the Caching System

### Safe Caching with SQLAlchemy Objects

**Important**: When caching SQLAlchemy objects, you must use the safe caching helpers to avoid serialization errors caused by database sessions and locks.

#### Safe Caching Helpers

The application provides safe caching helpers in `app/core/cache_helpers.py`:
- `prepare_for_cache()`: Converts SQLAlchemy objects to Pydantic schemas
- `safe_cache_set()`: Safely caches data with proper serialization
- `create_paginated_cache_data()`: Creates paginated response data for caching
- `sqlalchemy_to_dict()`: Converts SQLAlchemy objects to dictionaries
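As a quick illustration, caching a single object from a route might look like the snippet below; the exact `safe_cache_set()` signature and the `schemas.ProductionRun` name are assumptions here:

```python
from app.core.cache_helpers import prepare_for_cache, safe_cache_set

# Convert the ORM object to a plain, picklable structure first
# (schemas.ProductionRun is a placeholder schema name)
prepared = prepare_for_cache(production_run, schemas.ProductionRun)
if prepared:
    # Logs and swallows serialization errors instead of raising
    safe_cache_set(cache_manager, cache_key, prepared)
```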
### In API Routes

To use the caching system in API routes:
1. Import the cache utilities and safe helpers:

   ```python
   from app.core.cache_utils import generate_cache_key, invalidate_production_run_cache
   from app.core.cache_helpers import create_paginated_cache_data, prepare_for_cache
   ```

2. Generate a cache key for GET requests:

   ```python
   # Extract the user ID as an integer
   user_id = int(current_user.id)
   cache_key = generate_cache_key(
       entity_type="production_run",
       entity_id=production_run_id,
       user_id=user_id,
       is_admin=crud.user.is_admin(current_user),
   )
   ```

3. Try to get the data from the cache first:

   ```python
   try:
       cached_data = cache_manager.get(cache_key, request)
       if cached_data:
           # Add performance headers
           request.state.query_time = round((time.time() - start_time) * 1000, 2)
           request.state.cache_status = "hit"
           request.state.query_count = 0
           return cached_data
   except Exception as e:
       logger.warning(f"Cache get failed: {str(e)}")
       # Continue without cache
   ```

4. If the data is not cached, get it from the database and cache the result safely:

   ```python
   # Get data from the database (SQLAlchemy objects)
   sqlalchemy_objects = get_data_from_database()

   # For single objects - convert to a dict first, then cache
   if not isinstance(sqlalchemy_objects, list):
       obj_dict = {
           c.name: getattr(sqlalchemy_objects, c.name)
           for c in sqlalchemy_objects.__table__.columns
       }

       # Process any special fields (like image URLs)
       if obj_dict.get("image_url"):
           obj_dict["image_url"] = get_full_image_url(obj_dict["image_url"])

       # Convert to a Pydantic schema for the response
       response_data = schemas.YourSchema.model_validate(obj_dict)

       # Cache the dict, not the SQLAlchemy object
       try:
           prepared_data = prepare_for_cache(obj_dict, schemas.YourSchema)
           cache_manager.set(cache_key, prepared_data)
       except Exception as e:
           logger.error(f"Error caching data: {str(e)}")

   # For paginated lists - use the helper function
   else:
       # Convert SQLAlchemy objects to dicts
       object_dicts = []
       for obj in sqlalchemy_objects:
           obj_dict = {c.name: getattr(obj, c.name) for c in obj.__table__.columns}
           object_dicts.append(obj_dict)

       # Create the paginated response
       response_data = create_paginated_response(object_dicts, total, page, limit, pages)

       # Cache using the safe helper
       try:
           cache_data = create_paginated_cache_data(
               object_dicts,  # Use dicts, not SQLAlchemy objects
               total_count,
               current_page,
               limit,
               total_pages,
               schemas.YourSchema,
               process_image_urls=True,
           )
           if cache_data:
               cache_manager.set(cache_key, cache_data)
       except Exception as e:
           logger.error(f"Error caching paginated data: {str(e)}")

   return response_data
   ```

5. Invalidate the cache when data is modified:

   ```python
   # Update the data in the database
   updated_data = update_data_in_database()

   # Invalidate the cache
   invalidate_production_run_cache(production_run_id)

   return updated_data
   ```
### Common Serialization Issues and Solutions

#### Problem: "cannot pickle '_thread.RLock' object"
This error occurs when trying to cache SQLAlchemy objects that contain database session locks.
**Solution**: Always convert SQLAlchemy objects to dictionaries or Pydantic schemas before caching:

```python
# ❌ DON'T: Cache SQLAlchemy objects directly
cache_manager.set(cache_key, sqlalchemy_object)

# ✅ DO: Convert to a dict first
obj_dict = {c.name: getattr(sqlalchemy_object, c.name)
            for c in sqlalchemy_object.__table__.columns}
prepared_data = prepare_for_cache(obj_dict, schemas.YourSchema)
cache_manager.set(cache_key, prepared_data)
```
#### Problem: Geographic/Spatial Data Serialization

Geographic data (`WKBElement`, `Point`, `Polygon`) can cause serialization issues.

**Solution**: The safe caching helpers automatically skip non-serializable geographic data:

```python
# The helpers handle this automatically
prepared_data = prepare_for_cache(data_with_geo_fields, schemas.YourSchema)
```
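One generic way to implement that kind of skipping is a pickle round-trip test; this sketch is illustrative and not necessarily how the helpers do it internally:

```python
import pickle
from typing import Any, Dict


def _is_cache_serializable(value: Any) -> bool:
    """Return True only if the value survives a pickle round trip."""
    try:
        pickle.dumps(value)
        return True
    except Exception:
        # WKBElement geometries, sessions, and thread locks all land here
        return False


def strip_unserializable(data: Dict[str, Any]) -> Dict[str, Any]:
    """Drop fields (e.g., geometry columns) that cannot be cached."""
    return {k: v for k, v in data.items() if _is_cache_serializable(v)}
```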
#### Problem: Relationship Objects
SQLAlchemy relationship objects contain references to database sessions.
**Solution**: Use `exclude_relations=True` (the default) in `sqlalchemy_to_dict()`:

```python
# Automatically excludes relationship objects
obj_dict = sqlalchemy_to_dict(sqlalchemy_object, exclude_relations=True)
```
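A hypothetical `sqlalchemy_to_dict()` along these lines shows why relationship exclusion is the default; the real helper in `app/core/cache_helpers.py` may differ:

```python
from typing import Any, Dict


def sqlalchemy_to_dict(obj: Any, exclude_relations: bool = True) -> Dict[str, Any]:
    """Copy mapped column values into a plain dict."""
    # Iterating __table__.columns naturally skips relationship attributes
    data = {c.name: getattr(obj, c.name) for c in obj.__table__.columns}
    if not exclude_relations:
        # Opt-in only: loaded relationship objects carry session references
        # and must themselves be converted before caching
        for rel in obj.__mapper__.relationships:
            data[rel.key] = getattr(obj, rel.key)
    return data
```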
### In CLI Scripts

CLI scripts that modify data should also invalidate the cache to ensure consistency. For example:

```python
from app.core.cache_utils import invalidate_production_run_cache

# Import trackers from CSV
import_trackers_from_csv(csv_path, production_run_id)

# Invalidate the cache
invalidate_production_run_cache(production_run_id)
```
## Best Practices

1. **Use Standardized Cache Keys**: Always use the `generate_cache_key` function to generate cache keys to ensure consistency.
2. **Invalidate Related Caches**: When modifying data, invalidate not only the specific entity's cache but also related entities that might be affected by the change.
3. **Handle Cache Invalidation Errors Gracefully**: Cache invalidation should not prevent the operation from completing. If cache invalidation fails, log the error and continue (see the sketch after this list).
4. **CRUD-Based Invalidation**: Ensure all CRUD operations (create, update, delete) trigger appropriate cache invalidation immediately after database commits.
5. **Monitor Cache Hit Rate**: Monitor the cache hit rate to ensure the caching system is effective. A low hit rate might indicate issues with cache key generation or overly aggressive invalidation.
6. **Background Service Integration**: Ensure background services that modify data use the shared cache invalidation utilities to maintain consistency.
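For the error-handling practice above, a small wrapper keeps invalidation failures from aborting the request; this is a minimal sketch, not an existing utility:

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def invalidate_quietly(invalidate_fn: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
    """Run an invalidation helper; log failures instead of raising.

    A stale cache entry is recoverable later; a failed write request is not.
    """
    try:
        invalidate_fn(*args, **kwargs)
    except Exception as e:
        logger.error(f"Cache invalidation failed (continuing): {e}")


# Usage after a successful commit:
# invalidate_quietly(invalidate_production_run_cache, production_run_id)
```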
## Troubleshooting

### Stale Data

**Note**: As of August 2025, the system implements aggressive cache invalidation in background services to prevent stale data issues. If you're still seeing stale data, it could be due to:
- **Missing cache invalidation in API routes**: Ensure that cache invalidation runs when data is modified via the web interface.
- **Incorrect cache keys**: Ensure that cache keys are generated consistently.
- **CLI scripts bypassing cache invalidation**: Ensure that CLI scripts that modify data also invalidate the cache.
- **Background service issues**: Check that background services are running and processing data correctly.
To fix stale data issues:

1. Check the background service logs for cache invalidation messages:

   ```bash
   # Look for cache invalidation log entries
   docker logs tracker-api | grep "invalidated.*cache keys"
   ```

2. Flush the Redis cache as a temporary fix:

   ```bash
   redis-cli FLUSHALL
   ```

3. Verify that the background services are running:

   ```bash
   # Check whether the services are processing data
   docker logs tracker-api | grep -E "(tracker_status_service|unified_geofence_service|tracker_fetcher_service)"
   ```

4. Check the cache invalidation code in the relevant API routes and CLI scripts.

5. Ensure that the `invalidate_entity_cache` function is called with the correct parameters.
### Background Service Cache Invalidation Debugging
If background services are not invalidating caches properly:
1. Check the service logs for cache invalidation calls:

   ```bash
   # Look for service-specific cache invalidation
   docker logs tracker-api | grep "invalidated.*cache keys"
   ```

2. Verify that the shared cache invalidation utility is imported correctly in the services:

   ```python
   from services.shared.cache_invalidation import invalidate_tracker_caches
   ```

3. Check Redis connectivity from the background services; cache invalidation failures should be logged as errors.
### High Cache Miss Rate
If you're seeing a high cache miss rate, it could be due to:
- **Inconsistent cache keys**: Ensure that cache keys are generated consistently.
- **Aggressive cache invalidation**: The current CRUD-based strategy prioritizes data freshness over cache performance. This is expected behavior.
- **Frequent data modifications**: High miss rates are normal when data is updated often, since caches are invalidated immediately on every CRUD operation.
### Redis Connection Issues

If you're experiencing Redis connection issues:

1. Check the Redis connection parameters in the `.env` file.
2. Ensure that Redis is running and accessible from the API server.
3. Check the Redis logs for any errors.
4. Verify that the Redis password is correct.
## Recent Improvements

### Redis Cluster Support and Cache Invalidation Fix (October 2025)
We've implemented comprehensive fixes for Redis Cluster mode to resolve cache invalidation issues that were preventing proper cache clearing across the application.
#### Problem
The application experienced cache invalidation failures in Redis Cluster mode due to two critical issues:
1. **Cluster node scanning failure**: The `keys()` method couldn't properly iterate over cluster primary nodes, resulting in "No targets were found to execute SCAN command" errors.
2. **Hash tag pattern mismatch**: Cache keys stored with hash tags (e.g., `{production_run:list:...}`) weren't matched by invalidation patterns (e.g., `production_run:*`).

**Impact**: Cache invalidation failed silently, requiring a manual `FLUSHALL` to clear stale data after creating, updating, or deleting entities such as production runs, storage locations, and brands.
#### Solution

**1. Enhanced Redis Cluster node scanning (`app/core/redis.py`)**

- Improved the `_get_primary_nodes()` method to properly detect cluster nodes using the redis-py 6.1+ API
- Enhanced the `keys()` method with robust error handling for per-node scanning
- Added comprehensive logging for cluster operations
```python
def _get_primary_nodes(self) -> List[Redis]:
    """Get all primary nodes from the Redis cluster."""
    if not self.is_cluster or self.client is None:
        return []
    try:
        # For redis-py 6.1+, use get_nodes() and filter for primaries
        if hasattr(self.client, "get_nodes"):
            nodes = self.client.get_nodes()
            primary_nodes = []
            for node in nodes:
                if hasattr(node, "redis_connection"):
                    primary_nodes.append(node.redis_connection)
            return primary_nodes
        # Fall back to the older API
        if hasattr(self.client, "get_primaries"):
            return self.client.get_primaries()
        logger.warning("Cannot get cluster nodes - no suitable API found")
        return []
    except Exception as e:
        logger.error(f"Error getting primary nodes: {e}")
        return []
```
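For context, a cluster-aware `keys()` built on top of `_get_primary_nodes()` might scan each primary separately, roughly as follows (a sketch, not the exact implementation):

```python
def keys(self, pattern: str) -> List[str]:
    """Collect keys matching a pattern across all cluster primaries."""
    if not self.is_cluster:
        return list(self.client.scan_iter(match=pattern))
    matched: List[str] = []
    for node in self._get_primary_nodes():
        try:
            # SCAN per node sidesteps the cluster-wide
            # "No targets were found to execute SCAN command" failure
            matched.extend(node.scan_iter(match=pattern))
        except Exception as e:
            logger.error(f"Error scanning node for pattern {pattern}: {e}")
    return matched
```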
**2. Automatic hash tag pattern handling (`app/core/cache_utils.py`)**

- Added a `generate_invalidation_pattern()` function that automatically transforms patterns for cluster mode
- Updated `invalidate_cache_by_patterns()` to use the pattern transformation
- Added comprehensive debug logging
```python
def generate_invalidation_pattern(pattern: str, is_cluster: Optional[bool] = None) -> str:
    """
    Generate a cache invalidation pattern for both standalone and cluster modes.

    In cluster mode, CacheManager wraps keys with hash tags like {key},
    so invalidation patterns must match those hash-tagged keys.
    """
    if is_cluster is None:
        is_cluster = redis_client.is_cluster
    if is_cluster:
        # In cluster mode, keys are wrapped with hash tags: {pattern}
        return f"{{{pattern}}}"
    return pattern
```
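Downstream, `invalidate_cache_by_patterns()` can then transform each pattern before scanning, along these lines (a sketch; the actual function in `app/core/cache_utils.py` may differ):

```python
from typing import List


def invalidate_cache_by_patterns(patterns: List[str]) -> int:
    """Delete all keys matching the given patterns in either Redis mode."""
    total = 0
    for raw_pattern in patterns:
        # Hash-tag wrapping happens only when running against a cluster
        pattern = generate_invalidation_pattern(raw_pattern)
        keys = redis_client.keys(pattern)
        if keys:
            redis_client.delete(*keys)
            total += len(keys)
    logger.debug(f"Invalidated {total} keys for patterns: {patterns}")
    return total
```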
#### Cache Invalidation Benefits

- **App-wide fix**: Cache invalidation now works correctly for all entity types (production runs, brands, clients, trackers, storage locations, delivery locations)
- **Automatic handling**: Pattern transformation happens automatically; no changes are needed in individual routes
- **Backward compatible**: Works correctly in both standalone and cluster Redis modes
- **Better logging**: Comprehensive debug logging helps troubleshoot cache issues
#### Testing
After implementation, verify cache invalidation works by:
- Creating a production run → it should appear in the list immediately
- Updating a production run → changes should appear immediately
- Deleting a production run → it should disappear immediately
- Adding/editing storage locations → should work without errors
- No manual `FLUSHALL` should be needed to see changes
#### Technical Details

**Affected functions** (all now work correctly in cluster mode):

- `invalidate_production_run_cache()` - production runs
- `invalidate_brand_cache()` - brands
- `invalidate_client_cache()` - clients
- `invalidate_tracker_cache()` - trackers
- `invalidate_location_cache()` - storage/delivery locations
- All smart invalidation functions

**Logs to monitor**:

```
DEBUG: Invalidating pattern: production_run:* -> {production_run:*}
DEBUG: Scanning 3 primary nodes for pattern: {production_run:*}
DEBUG: Found 5 keys on node 192.168.1.10:6379
DEBUG: Found 12 keys matching pattern {production_run:*}
INFO: Total keys invalidated: 12
```
### Safe Caching Implementation (2025)

We've implemented a comprehensive safe caching system to resolve serialization issues with SQLAlchemy objects.

#### New Safe Caching Helpers (`app/core/cache_helpers.py`)

- `sqlalchemy_to_dict()`: Safely converts SQLAlchemy objects to dictionaries, excluding non-serializable attributes such as database sessions and relationship objects.
- `prepare_for_cache()`: Converts SQLAlchemy objects to Pydantic schemas before caching, with automatic handling of image URLs and other special fields.
- `safe_cache_set()`: A wrapper function for safe caching with comprehensive error handling.
- `create_paginated_cache_data()`: Creates paginated response data suitable for caching, handling lists of SQLAlchemy objects safely.
#### Enhanced Redis Cache Manager
The Redis cache manager has been improved with:
- **Advanced serialization detection**: Automatically detects and skips non-serializable objects such as thread locks, database sessions, and SQLAlchemy relationship objects.
- **Comprehensive error handling**: Serialization failures are handled gracefully without breaking the API.
- **Geographic data support**: Spatial data types that can cause serialization issues are handled automatically.
#### Performance Middleware Improvements
The performance middleware has been enhanced to:
- **Safe JSON response modification**: Properly handles responses containing non-serializable objects.
- **Fallback mechanisms**: Continues operating even when performance metrics injection fails.
- **Better error logging**: Improved error reporting without exposing sensitive data.
#### Resolved Issues

- **Fixed the "cannot pickle '_thread.RLock' object" error**: The primary serialization issue that occurred when caching SQLAlchemy objects holding database session locks.
- **Improved cache reliability**: Caching now works consistently across all endpoints without breaking the API.
- **Better error handling**: Cache failures no longer cause API endpoints to return 500 errors.
### Standardized Cache Utilities

We've implemented standardized cache utilities in `app/core/cache_utils.py` to ensure consistent cache key generation and invalidation across the application. These utilities include:

- `generate_cache_key`: Generates standardized cache keys.
- `invalidate_entity_cache`: Invalidates cache entries for an entity and its related entities.
- Entity-specific helper functions such as `invalidate_production_run_cache`, `invalidate_tracker_cache`, and `invalidate_location_cache`.
### CLI Script Cache Invalidation

We've updated CLI scripts that modify data to properly invalidate the cache. For example, the `import_trackers.py` script now invalidates the cache after importing trackers, so newly imported trackers appear immediately without needing a cache flush.
### Background Service Cache Invalidation (August 2025)
We've implemented aggressive cache invalidation in background services to resolve stale data issues where the frontend showed outdated information while the database contained current data.
#### Implementation Details
- **Shared Cache Invalidation Utility** (`services/shared/cache_invalidation.py`):
  - Centralized cache invalidation logic following DRY principles
  - Pattern-based invalidation for all tracker-related data
  - Comprehensive error handling and logging
- **Service Integration**:
  - Tracker Status Service: invalidates caches after status updates and batch processing
  - Unified Geofence Service: invalidates caches after location processing and status changes
  - Tracker Fetcher Service: invalidates caches after updating `last_report_received` timestamps
- **Aggressive Strategy**: Prioritizes data freshness over cache performance by invalidating all tracker-related cache patterns when any tracker data changes.
#### Code Example

```python
# In background services
from services.shared.cache_invalidation import invalidate_tracker_caches

# After updating tracker data in the database
self.db.commit()

# Invalidate all tracker-related caches
invalidate_tracker_caches("service_name")
```
#### Benefits

- **Eliminates stale data issues**: The frontend always shows current data
- **Real-time consistency**: Changes appear immediately in the UI
- **Simplified debugging**: No complex cache invalidation logic to troubleshoot
- **DRY implementation**: The shared utility prevents code duplication
#### Performance Considerations

The aggressive approach trades cache performance for data consistency. Future optimizations may include the following (a speculative sketch follows the list):
- Smart invalidation based on specific tracker IDs
- Selective pattern invalidation
- Cache warming strategies
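For example, smart invalidation might narrow the cleared patterns to a single tracker rather than `tracker:*`; this is purely speculative and not implemented today:

```python
def invalidate_single_tracker(tracker_id: int) -> None:
    """Speculative: clear one tracker's caches instead of all tracker data."""
    invalidate_cache_by_patterns([
        f"tracker:*:{tracker_id}:*",
        f"map_data:*:*tracker_{tracker_id}*",
    ])
```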
### API Routes Cache Management
We've updated API routes to use the standardized cache key generation and invalidation functions, along with the new safe caching helpers, ensuring consistent and reliable cache management across the application.