Skip to content

Location Aggregation Fix

Problem Description

The tracker geographic data aggregation system was averaging positional data from duplicate devices, causing incorrect location calculations. For example:

  • Device A reports from Milton Keynes, UK
  • Device B (duplicate) reports from China
  • System calculates average position → Romania (incorrect)

Additionally, the system was including manufacturing locations in China, which are not relevant for tracking USA and EU customers.

Root Cause

The TimescaleDB continuous aggregates were using ST_Centroid(ST_Union(...)) to combine location data, which:

  1. Averages all locations for trackers with the same hashed_advertisement_key
  2. Includes China manufacturing locations in the aggregation
  3. Creates geographically impossible results when duplicate trackers report from distant locations

Solution

1. Use Most Recent Location Instead of Averaging

  • Replaced ST_Centroid(ST_Union(...)) with (array_agg(location ORDER BY timestamp DESC))[1]
  • This selects the most recent location report instead of averaging all locations

2. Exclude China Geofence

  • Added geographic filter to exclude China (73.6°E-134.8°E longitude, 18.2°N-53.5°N latitude)
  • Prevents manufacturing locations from affecting customer tracking

Files Modified

  1. scripts/create_timescaledb_views.sql - Updated for new installations with geography casting fixes
  2. scripts/apply_location_aggregation_fix.sql - Migration script to apply fixes with proper error handling
  3. scripts/refresh_location_aggregates.sql - Script to refresh aggregates after applying fix
  4. scripts/diagnose_duplicate_trackers.sql - Diagnostic queries with geography casting fixes
  5. alembic/versions/170085d95316_fix_location_aggregation_averaging.py - Alembic migration

Implementation Steps

Step 1: Diagnose Current State

# Connect to your database and run diagnostic queries
psql -d your_database -f scripts/diagnose_duplicate_trackers.sql

This will show:

  • Number of duplicate trackers
  • Geographic spread of duplicates
  • Impact of China exclusion
  • Current vs proposed aggregation comparison

Step 2: Run Database Migration

# Run the Alembic migration (this creates a reminder function)
python -m alembic upgrade head

Step 3: Apply the TimescaleDB Fix

# IMPORTANT: Run during maintenance window as this drops/recreates views
# This step must be done manually as TimescaleDB operations cannot run in transactions
psql -d your_database -f scripts/apply_location_aggregation_fix.sql

Step 4: Refresh the Aggregates

# Complete the fix by refreshing the continuous aggregates
# This step is separate because refresh operations cannot run in transaction blocks
psql -d your_database -f scripts/refresh_location_aggregates.sql

Step 5: Verify the Fix

# Run diagnostics again to confirm the fix
psql -d your_database -f scripts/diagnose_duplicate_trackers.sql

Expected Results

After applying the fix:

  • Duplicate trackers will show the most recent location instead of averaged position
  • China locations will be excluded from aggregation
  • Geographic accuracy will be significantly improved
  • No more impossible locations like Romania when devices are in UK and China

Technical Details

Before (Problematic)

ST_Centroid(ST_Union(lr.location::geometry)) AS location

After (Fixed)

(array_agg(lr.location ORDER BY lr.timestamp DESC))[1] AS location
WHERE lr.confidence >= 2
  AND NOT (
    ST_X(lr.location::geometry) BETWEEN 73.6 AND 134.8
    AND ST_Y(lr.location::geometry) BETWEEN 18.2 AND 53.5
  )

China Exclusion Bounds

  • Longitude: 73.6°E to 134.8°E
  • Latitude: 18.2°N to 53.5°N
  • Rationale: Covers mainland China manufacturing regions

Monitoring

After deployment, monitor:

  1. Location accuracy - Verify trackers show realistic positions
  2. Duplicate detection - Run diagnostic queries periodically
  3. Performance - Ensure aggregation performance remains acceptable

Rollback Plan

If issues arise, you can rollback by:

  1. Running the original scripts/recreate_continuous_aggregates.sql
  2. This will restore the previous averaging behavior

Future Improvements

Consider implementing:

  1. Duplicate tracker cleanup - Remove or merge duplicate tracker records
  2. Configurable geofences - Make exclusion zones configurable
  3. Data quality monitoring - Automated detection of geographic anomalies