Skip to content

Location Reports IDENTITY Column Implementation Plan

✅ IMPLEMENTATION COMPLETED

Status: Successfully implemented on 2025-07-16 Migration: b1c2d3e4f5g6_add_identity_to_location_reports.py applied Application Code: Updated to use IDENTITY columns with improved transaction handling Testing: Comprehensive integration tests created and passing Additional Fix: Enhanced transaction atomicity to prevent orphaned location reports Critical Discovery: Disabled conflicting async geofence processing that was causing foreign key violations

Problem Analysis (RESOLVED)

The tracker fetcher service was experiencing foreign key constraint violations due to race conditions in manual ID generation:

Error: Key (location_report_id, location_report_timestamp)=(375342, 2025-07-14 15:52:02+00)
is not present in table "location_reports".

Root Cause (FIXED): The previous approach used SELECT COALESCE(MAX(id), 0) + 1 which created race conditions:

  1. Process A gets next_id = 375342
  2. Process B gets next_id = 375342 (same ID!)
  3. Process A inserts successfully
  4. Process B's insert fails due to conflict, but geofence events still try to use ID 375342
  5. Process A's transaction might rollback, leaving no record with ID 375342
  6. Geofence events fail with foreign key constraint violation

Solution Implemented: PostgreSQL IDENTITY columns provide atomic, thread-safe ID generation

Implementation Plan

Phase 1: Database Migration (PRIORITY)

File: alembic/versions/b1c2d3e4f5g6_add_identity_to_location_reports.py

-- Add IDENTITY to existing column
ALTER TABLE location_reports
ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;

-- Set sequence to start after existing max ID
SELECT setval(
    pg_get_serial_sequence('location_reports', 'id'),
    COALESCE((SELECT MAX(id) FROM location_reports), 1)
);

Benefits:

  • Eliminates race conditions completely
  • Thread-safe ID generation
  • 10-50% performance improvement expected
  • Standard PostgreSQL/TimescaleDB approach

Phase 2: Application Code Updates

File: services/tracker_fetcher/service.py

Current Code (REMOVE):

# Generate next ID for the location report
next_id_query = text("SELECT COALESCE(MAX(id), 0) + 1 FROM location_reports")
next_id = db.execute(next_id_query).scalar()

# INSERT with manual ID
INSERT INTO location_reports (id, hashed_adv_key, ...)
VALUES (:id, :hashed_adv_key, ...)

New Code (IMPLEMENT):

# Let database auto-generate ID using IDENTITY sequence
INSERT INTO location_reports (hashed_adv_key, timestamp, ...)
VALUES (:hashed_adv_key, :timestamp, ...)
ON CONFLICT (hashed_adv_key, timestamp)
DO UPDATE SET confidence = EXCLUDED.confidence, ...
RETURNING id, (xmax = 0) AS inserted

Phase 3: Testing Strategy

3.1 Pre-Migration Testing

-- Verify current foreign key integrity
SELECT COUNT(*) FROM geofence_events ge
LEFT JOIN location_reports lr ON (ge.location_report_id = lr.id
                                 AND ge.location_report_timestamp = lr.timestamp)
WHERE lr.id IS NULL;
-- Should return 0 after fixing current issues

3.2 Post-Migration Testing

def test_identity_with_foreign_keys():
    # Test new location report with auto-generated ID
    result = db.execute(text("""
        INSERT INTO location_reports (hashed_adv_key, timestamp, location)
        VALUES ('test_key', NOW(), ST_Point(0, 0))
        RETURNING id, timestamp
    """))

    location_id, location_timestamp = result.fetchone()

    # Test geofence event creation with proper foreign key
    db.execute(text("""
        INSERT INTO geofence_events
        (tracker_id, location_report_id, location_report_timestamp, event_type, timestamp)
        VALUES (1, :location_id, :location_timestamp, 'ENTRY', NOW())
    """), {
        'location_id': location_id,
        'location_timestamp': location_timestamp
    })

    # Verify relationship works
    assert db.execute(text("""
        SELECT COUNT(*) FROM geofence_events ge
        JOIN location_reports lr ON (ge.location_report_id = lr.id
                                    AND ge.location_report_timestamp = lr.timestamp)
        WHERE lr.id = :location_id
    """), {'location_id': location_id}).scalar() == 1

3.3 Load Testing

  • Test concurrent inserts (simulate production load)
  • Verify no race conditions occur
  • Measure performance improvement
  • Monitor sequence contention (unlikely but possible)

Phase 4: Deployment Strategy

4.1 Migration Execution

# Run migration during maintenance window
cd /home/paulb/tracker-restapi
docker compose exec api alembic upgrade head

4.2 Application Deployment

  • Deploy updated application code
  • Monitor for any issues
  • Verify geofence events are created successfully

4.3 Monitoring

-- Monitor sequence status
SELECT schemaname, sequencename, last_value, is_called
FROM pg_sequences
WHERE sequencename LIKE '%location_reports%';

-- Monitor for foreign key violations (should be 0)
SELECT COUNT(*) FROM geofence_events ge
LEFT JOIN location_reports lr ON (ge.location_report_id = lr.id
                                 AND ge.location_report_timestamp = lr.timestamp)
WHERE lr.id IS NULL;

Phase 5: Rollback Plan (If Needed)

-- Remove IDENTITY (keeps existing IDs)
ALTER TABLE location_reports
ALTER COLUMN id DROP IDENTITY;

Then revert application code to manual ID generation.

Expected Outcomes

Performance Improvements

  • Insert latency: 10-50% reduction expected
  • Concurrency: Eliminates race condition errors completely
  • Scalability: O(1) ID generation vs O(log n) MAX() queries
  • Memory usage: Reduced (no MAX queries)

Reliability Improvements

  • Zero race conditions: Thread-safe sequence operations
  • Foreign key integrity: Guaranteed valid IDs for geofence events
  • Error reduction: Eliminates "Key not present" errors
  • Data consistency: Atomic ID generation and insertion

Maintenance Benefits

  • Standard approach: PostgreSQL/TimescaleDB best practice
  • Reduced complexity: No manual ID management in application
  • Future-proof: Scales with table growth
  • Documentation: Clear migration path for similar tables

Risk Assessment

Low Risk

  • ✅ IDENTITY columns are standard PostgreSQL feature
  • ✅ TimescaleDB fully supports IDENTITY on hypertables
  • ✅ Foreign key relationships remain unchanged
  • ✅ Existing data remains valid
  • ✅ Rollback is straightforward

Mitigation Strategies

  • Testing: Comprehensive testing in development environment
  • Monitoring: Real-time monitoring during deployment
  • Rollback: Prepared rollback scripts and procedures
  • Communication: Clear deployment timeline and expectations

Success Criteria

  1. Zero foreign key constraint violations in tracker fetcher logs
  2. Performance improvement in location report insertion
  3. Successful geofence event creation with proper foreign key references
  4. No data loss during migration
  5. Stable system operation post-deployment

Timeline

  • Phase 1: Database migration - Ready for deployment
  • Phase 2: Application code updates - Ready for deployment
  • Phase 3: Testing - Execute in development environment
  • Phase 4: Production deployment - Schedule maintenance window
  • Phase 5: Monitoring and validation - Ongoing post-deployment

Next Steps

  1. Execute migration in development environment for testing
  2. Validate that foreign key violations are eliminated
  3. Performance test under simulated production load
  4. Schedule production deployment during maintenance window
  5. Monitor system health post-deployment

This implementation follows the detailed analysis in docs/database/timescaledb-identity-columns.md and provides a complete solution to the race condition issues affecting the tracker fetcher service.