Location Aggregation Fix
Problem Description
The tracker geographic data aggregation system was averaging positional data from duplicate devices, causing incorrect location calculations. For example:
- Device A reports from Milton Keynes, UK
- Device B (duplicate) reports from China
- System calculates average position → Romania (incorrect)
Additionally, the system was including manufacturing locations in China, which are not relevant for tracking USA and EU customers.
Root Cause
The TimescaleDB continuous aggregates were using ST_Centroid(ST_Union(...)) to combine location data, which:
- Averages all locations for trackers with the same
hashed_advertisement_key - Includes China manufacturing locations in the aggregation
- Creates geographically impossible results when duplicate trackers report from distant locations
Solution
1. Use Most Recent Location Instead of Averaging
- Replaced
ST_Centroid(ST_Union(...))with(array_agg(location ORDER BY timestamp DESC))[1] - This selects the most recent location report instead of averaging all locations
2. Exclude China Geofence
- Added geographic filter to exclude China (73.6°E-134.8°E longitude, 18.2°N-53.5°N latitude)
- Prevents manufacturing locations from affecting customer tracking
Files Modified
scripts/create_timescaledb_views.sql- Updated for new installations with geography casting fixesscripts/apply_location_aggregation_fix.sql- Migration script to apply fixes with proper error handlingscripts/refresh_location_aggregates.sql- Script to refresh aggregates after applying fixscripts/diagnose_duplicate_trackers.sql- Diagnostic queries with geography casting fixesalembic/versions/170085d95316_fix_location_aggregation_averaging.py- Alembic migration
Implementation Steps
Step 1: Diagnose Current State
# Connect to your database and run diagnostic queries
psql -d your_database -f scripts/diagnose_duplicate_trackers.sql
This will show:
- Number of duplicate trackers
- Geographic spread of duplicates
- Impact of China exclusion
- Current vs proposed aggregation comparison
Step 2: Run Database Migration
# Run the Alembic migration (this creates a reminder function)
python -m alembic upgrade head
Step 3: Apply the TimescaleDB Fix
# IMPORTANT: Run during maintenance window as this drops/recreates views
# This step must be done manually as TimescaleDB operations cannot run in transactions
psql -d your_database -f scripts/apply_location_aggregation_fix.sql
Step 4: Refresh the Aggregates
# Complete the fix by refreshing the continuous aggregates
# This step is separate because refresh operations cannot run in transaction blocks
psql -d your_database -f scripts/refresh_location_aggregates.sql
Step 5: Verify the Fix
# Run diagnostics again to confirm the fix
psql -d your_database -f scripts/diagnose_duplicate_trackers.sql
Expected Results
After applying the fix:
- Duplicate trackers will show the most recent location instead of averaged position
- China locations will be excluded from aggregation
- Geographic accuracy will be significantly improved
- No more impossible locations like Romania when devices are in UK and China
Technical Details
Before (Problematic)
ST_Centroid(ST_Union(lr.location::geometry)) AS location
After (Fixed)
(array_agg(lr.location ORDER BY lr.timestamp DESC))[1] AS location
WHERE lr.confidence >= 2
AND NOT (
ST_X(lr.location::geometry) BETWEEN 73.6 AND 134.8
AND ST_Y(lr.location::geometry) BETWEEN 18.2 AND 53.5
)
China Exclusion Bounds
- Longitude: 73.6°E to 134.8°E
- Latitude: 18.2°N to 53.5°N
- Rationale: Covers mainland China manufacturing regions
Monitoring
After deployment, monitor:
- Location accuracy - Verify trackers show realistic positions
- Duplicate detection - Run diagnostic queries periodically
- Performance - Ensure aggregation performance remains acceptable
Rollback Plan
If issues arise, you can rollback by:
- Running the original
scripts/recreate_continuous_aggregates.sql - This will restore the previous averaging behavior
Future Improvements
Consider implementing:
- Duplicate tracker cleanup - Remove or merge duplicate tracker records
- Configurable geofences - Make exclusion zones configurable
- Data quality monitoring - Automated detection of geographic anomalies