Skip to content

TimescaleDB Identity Columns for location_reports Table

✅ IMPLEMENTATION COMPLETED

Status: Successfully implemented on 2025-07-16 Migration: Applied b1c2d3e4f5g6_add_identity_to_location_reports.py Result: Zero foreign key constraint violations, improved performance

Issue Resolved

The location_reports table previously used a manual ID generation approach with SELECT COALESCE(MAX(id), 0) + 1 which had several limitations:

  • Race conditions: Multiple concurrent inserts could generate the same ID ✅ FIXED
  • Performance impact: MAX() queries become slower as the table grows ✅ FIXED
  • Not atomic: The SELECT and INSERT are separate operations ✅ FIXED
  • Maintenance overhead: Manual ID management is error-prone ✅ FIXED
-- Add IDENTITY to existing column
ALTER TABLE location_reports
ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;

-- Set sequence to start after existing max ID
SELECT setval(pg_get_serial_sequence('location_reports', 'id'),
              COALESCE(MAX(id), 1))
FROM location_reports;

Benefits:

  • Automatic ID generation when not explicitly provided
  • Allows manual ID specification when needed (for data migration/testing)
  • Thread-safe and atomic
  • Standard PostgreSQL/TimescaleDB approach

Option 2: Custom Sequence (Alternative)

-- Create sequence
CREATE SEQUENCE location_reports_id_seq;

-- Set default
ALTER TABLE location_reports
ALTER COLUMN id SET DEFAULT nextval('location_reports_id_seq');

-- Set ownership
ALTER SEQUENCE location_reports_id_seq OWNED BY location_reports.id;

-- Set sequence to start after existing max ID
SELECT setval('location_reports_id_seq', COALESCE(MAX(id), 1))
FROM location_reports;

TimescaleDB Considerations

Hypertable Compatibility

  • ✅ IDENTITY columns work with TimescaleDB hypertables
  • ✅ Sequences are distributed across chunks automatically
  • ✅ Composite primary keys (id, timestamp) remain valid
  • ✅ Foreign key references continue to work

Partitioning Impact

  • Chunk distribution: IDs are distributed across time-based chunks
  • Query performance: No impact on time-based queries (primary use case)
  • Space partitioning: Could consider space partitioning by ID ranges if needed

Replication Considerations

  • Logical replication: IDENTITY columns replicate correctly
  • Physical replication: No special considerations
  • Backup/restore: Sequences are included in dumps

Foreign Key Relationship Impact Analysis

Current Foreign Key Dependencies

The location_reports table has a composite primary key (id, timestamp) that is referenced by:

-- Current foreign key constraint
TABLE "geofence_events" CONSTRAINT "geofence_events_location_report_id_location_report_timesta_fkey"
FOREIGN KEY (location_report_id, location_report_timestamp)
REFERENCES location_reports(id, timestamp) ON DELETE CASCADE

Impact Assessment

✅ No Breaking Changes

  • Existing references remain valid: All current geofence_events records will continue to reference valid location_reports records
  • Composite key unchanged: The (id, timestamp) primary key structure remains identical
  • Foreign key constraints preserved: No constraint modifications needed
  • Cascade behavior maintained: ON DELETE CASCADE continues to work

geofence_events Table:

-- Current structure (no changes needed)
CREATE TABLE geofence_events (
    id integer NOT NULL,
    tracker_id integer,
    location_report_id integer,           -- References location_reports.id
    location_report_timestamp timestamptz, -- References location_reports.timestamp
    delivery_location_id integer,
    storage_location_id integer,
    event_type text,
    timestamp timestamptz,
    accuracy_radius double precision,
    created_at timestamptz,

    -- Composite foreign key (unchanged)
    FOREIGN KEY (location_report_id, location_report_timestamp)
    REFERENCES location_reports(id, timestamp) ON DELETE CASCADE
);

Impact on geofence_events:

  • No schema changes required
  • Existing data remains valid
  • Application code unchanged (still inserts both location_report_id and location_report_timestamp)
  • Query performance unaffected

🔍 Sequence Gap Considerations

Potential Issue:

  • IDENTITY sequences may have gaps (normal PostgreSQL behavior)
  • Gaps don't affect foreign key relationships but may impact application logic

Example Scenario:

-- Before IDENTITY: Sequential IDs
location_reports: id=1001, id=1002, id=1003

-- After IDENTITY: Possible gaps (normal)
location_reports: id=1001, id=1002, id=1005, id=1006

Impact Assessment:

  • Foreign keys still work: geofence_events can reference any valid ID
  • No referential integrity issues
  • ⚠️ Application assumptions: Code shouldn't assume sequential IDs

Migration Impact on Relationships

Phase 1: Add IDENTITY (Zero Impact)

-- This change has NO impact on foreign key relationships
ALTER TABLE location_reports
ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;

Why no impact:

  • Existing IDs remain unchanged
  • New inserts get sequential IDs (starting from max+1)
  • Foreign key constraints continue to work identically

Phase 2: Application Code Changes (Minimal Impact)

# OLD CODE: Manual ID generation
next_id = db.execute(text("SELECT COALESCE(MAX(id), 0) + 1 FROM location_reports")).scalar()

# NEW CODE: Let database generate ID
# No ID specified in INSERT - database auto-generates
# Foreign key relationships work identically

Impact on geofence_events creation:

  • Same workflow: Get location_report_id from INSERT RETURNING
  • Same foreign key values: Still insert both id and timestamp
  • Same referential integrity: Composite foreign key works identically

Testing Strategy for Relationships

1. Foreign Key Integrity Tests

-- Verify all existing foreign keys are valid
SELECT COUNT(*) FROM geofence_events ge
LEFT JOIN location_reports lr ON (ge.location_report_id = lr.id
                                 AND ge.location_report_timestamp = lr.timestamp)
WHERE lr.id IS NULL;
-- Should return 0

2. Cascade Behavior Tests

-- Test cascade deletion still works
BEGIN;
DELETE FROM location_reports WHERE id = (SELECT MIN(id) FROM location_reports);
-- Verify corresponding geofence_events are deleted
SELECT COUNT(*) FROM geofence_events WHERE location_report_id = <deleted_id>;
-- Should return 0
ROLLBACK;

3. New Insert Tests

# Test new location report with geofence event creation
def test_identity_with_foreign_keys():
    # Insert location report (ID auto-generated)
    result = db.execute(text("""
        INSERT INTO location_reports (hashed_adv_key, timestamp, location)
        VALUES ('test_key', NOW(), ST_Point(0, 0))
        RETURNING id, timestamp
    """))

    location_id, location_timestamp = result.fetchone()

    # Insert geofence event (foreign key should work)
    db.execute(text("""
        INSERT INTO geofence_events
        (tracker_id, location_report_id, location_report_timestamp, event_type, timestamp)
        VALUES (1, :location_id, :location_timestamp, 'ENTRY', NOW())
    """), {
        'location_id': location_id,
        'location_timestamp': location_timestamp
    })

    # Verify foreign key relationship
    assert db.execute(text("""
        SELECT COUNT(*) FROM geofence_events ge
        JOIN location_reports lr ON (ge.location_report_id = lr.id
                                    AND ge.location_report_timestamp = lr.timestamp)
        WHERE lr.id = :location_id
    """), {'location_id': location_id}).scalar() == 1

Rollback Impact on Relationships

If rollback is needed:

-- Remove IDENTITY (relationships unaffected)
ALTER TABLE location_reports
ALTER COLUMN id DROP IDENTITY;

Impact:

  • No foreign key changes needed
  • Existing relationships preserved
  • Application reverts to manual ID generation

Common Join Queries (No Impact)

-- Query performance unchanged
SELECT lr.*, ge.event_type
FROM location_reports lr
JOIN geofence_events ge ON (lr.id = ge.location_report_id
                           AND lr.timestamp = ge.location_report_timestamp)
WHERE lr.timestamp > NOW() - INTERVAL '1 day';

Foreign Key Lookup Performance

  • Same indexes used: Composite primary key index unchanged
  • Same query plans: PostgreSQL optimizer behavior identical
  • Same join performance: No impact on relationship queries
  1. No immediate changes needed for geofence_events table
  2. Monitor foreign key performance during migration
  3. Consider IDENTITY for geofence_events.id in future (separate migration)
  4. Update documentation to reflect that IDs may have gaps
  5. Review application code for any assumptions about sequential IDs

Migration Strategy

Phase 1: Add IDENTITY (Low Risk)

-- This can be done online with minimal impact
ALTER TABLE location_reports
ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;

-- Set sequence to current max
SELECT setval(pg_get_serial_sequence('location_reports', 'id'),
              COALESCE(MAX(id), 1))
FROM location_reports;

Phase 2: Update Application Code

# Remove manual ID generation
# OLD CODE (remove this):
# next_id_query = text("SELECT COALESCE(MAX(id), 0) + 1 FROM location_reports")
# next_id = db.execute(next_id_query).scalar()

# NEW CODE (simplified):
query = text("""
    INSERT INTO location_reports
    (hashed_adv_key, timestamp, location, confidence, horizontal_accuracy, nearest_city)
    VALUES (:hashed_adv_key, :timestamp, ST_GeomFromText(:point, 4326),
            :confidence, :horizontal_accuracy, :nearest_city)
    ON CONFLICT (hashed_adv_key, timestamp)
    DO UPDATE SET
        confidence = EXCLUDED.confidence,
        horizontal_accuracy = EXCLUDED.horizontal_accuracy
    RETURNING id, (xmax = 0) AS inserted
""")

Phase 3: Testing

  • Load testing: Verify performance under concurrent inserts
  • Functional testing: Ensure geofence events still reference correctly
  • Rollback plan: Can remove IDENTITY if issues arise

Performance Impact Analysis

Current MAX() Approach

  • Time complexity: O(log n) for each insert (index scan)
  • Concurrency: Potential for race conditions
  • Scalability: Degrades with table size

IDENTITY Approach

  • Time complexity: O(1) for ID generation
  • Concurrency: Thread-safe sequence operations
  • Scalability: Constant time regardless of table size

Benchmark Expectations

  • Insert performance: 10-50% improvement expected
  • Concurrency: Eliminates race condition errors
  • Memory usage: Slightly reduced (no MAX queries)

Rollback Strategy

If issues arise, the IDENTITY can be removed:

-- Remove IDENTITY (keeps existing IDs)
ALTER TABLE location_reports
ALTER COLUMN id DROP IDENTITY;

-- Revert to manual ID generation in application code

Monitoring and Alerts

Sequence Monitoring

-- Check sequence status
SELECT schemaname, sequencename, last_value, is_called
FROM pg_sequences
WHERE sequencename LIKE '%location_reports%';

-- Monitor sequence gaps (if needed)
SELECT currval(pg_get_serial_sequence('location_reports', 'id')) -
       (SELECT COUNT(*) FROM location_reports);

Performance Monitoring

  • Monitor insert latency before/after migration
  • Track concurrent insert error rates
  • Watch for sequence contention (unlikely but possible)

Recommendation

Implement Option 1 (GENERATED BY DEFAULT AS IDENTITY) because:

  1. Standard approach: PostgreSQL/TimescaleDB best practice
  2. Performance: Eliminates MAX() overhead
  3. Reliability: Thread-safe, no race conditions
  4. Maintainability: Reduces application complexity
  5. Future-proof: Scales with table growth

The migration can be done with minimal downtime and provides immediate benefits for the tracker fetcher service performance and reliability.