TimescaleDB Identity Columns for location_reports Table
✅ IMPLEMENTATION COMPLETED
Status: Successfully implemented on 2025-07-16
Migration: Applied b1c2d3e4f5g6_add_identity_to_location_reports.py
Result: Zero foreign key constraint violations, improved performance
Issue Resolved
The location_reports table previously used a manual ID generation approach with SELECT COALESCE(MAX(id), 0) + 1 which had several limitations:
- Race conditions: Multiple concurrent inserts could generate the same ID ✅ FIXED
- Performance impact: MAX() queries become slower as the table grows ✅ FIXED
- Not atomic: The SELECT and INSERT are separate operations ✅ FIXED
- Maintenance overhead: Manual ID management is error-prone ✅ FIXED
Recommended Solution: IDENTITY Columns
Option 1: GENERATED BY DEFAULT AS IDENTITY (Recommended)
-- Add IDENTITY to existing column
ALTER TABLE location_reports
ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;
-- Set sequence to start after existing max ID
SELECT setval(pg_get_serial_sequence('location_reports', 'id'),
COALESCE(MAX(id), 1))
FROM location_reports;
Benefits:
- Automatic ID generation when not explicitly provided
- Allows manual ID specification when needed (for data migration/testing)
- Thread-safe and atomic
- Standard PostgreSQL/TimescaleDB approach
Option 2: Custom Sequence (Alternative)
-- Create sequence
CREATE SEQUENCE location_reports_id_seq;
-- Set default
ALTER TABLE location_reports
ALTER COLUMN id SET DEFAULT nextval('location_reports_id_seq');
-- Set ownership
ALTER SEQUENCE location_reports_id_seq OWNED BY location_reports.id;
-- Set sequence to start after existing max ID
SELECT setval('location_reports_id_seq', COALESCE(MAX(id), 1))
FROM location_reports;
TimescaleDB Considerations
Hypertable Compatibility
- ✅ IDENTITY columns work with TimescaleDB hypertables
- ✅ Sequences are distributed across chunks automatically
- ✅ Composite primary keys
(id, timestamp)remain valid - ✅ Foreign key references continue to work
Partitioning Impact
- Chunk distribution: IDs are distributed across time-based chunks
- Query performance: No impact on time-based queries (primary use case)
- Space partitioning: Could consider space partitioning by ID ranges if needed
Replication Considerations
- Logical replication: IDENTITY columns replicate correctly
- Physical replication: No special considerations
- Backup/restore: Sequences are included in dumps
Foreign Key Relationship Impact Analysis
Current Foreign Key Dependencies
The location_reports table has a composite primary key (id, timestamp) that is referenced by:
-- Current foreign key constraint
TABLE "geofence_events" CONSTRAINT "geofence_events_location_report_id_location_report_timesta_fkey"
FOREIGN KEY (location_report_id, location_report_timestamp)
REFERENCES location_reports(id, timestamp) ON DELETE CASCADE
Impact Assessment
✅ No Breaking Changes
- Existing references remain valid: All current
geofence_eventsrecords will continue to reference validlocation_reportsrecords - Composite key unchanged: The
(id, timestamp)primary key structure remains identical - Foreign key constraints preserved: No constraint modifications needed
- Cascade behavior maintained:
ON DELETE CASCADEcontinues to work
⚠️ Considerations for Related Tables
geofence_events Table:
-- Current structure (no changes needed)
CREATE TABLE geofence_events (
id integer NOT NULL,
tracker_id integer,
location_report_id integer, -- References location_reports.id
location_report_timestamp timestamptz, -- References location_reports.timestamp
delivery_location_id integer,
storage_location_id integer,
event_type text,
timestamp timestamptz,
accuracy_radius double precision,
created_at timestamptz,
-- Composite foreign key (unchanged)
FOREIGN KEY (location_report_id, location_report_timestamp)
REFERENCES location_reports(id, timestamp) ON DELETE CASCADE
);
Impact on geofence_events:
- ✅ No schema changes required
- ✅ Existing data remains valid
- ✅ Application code unchanged (still inserts both
location_report_idandlocation_report_timestamp) - ✅ Query performance unaffected
🔍 Sequence Gap Considerations
Potential Issue:
- IDENTITY sequences may have gaps (normal PostgreSQL behavior)
- Gaps don't affect foreign key relationships but may impact application logic
Example Scenario:
-- Before IDENTITY: Sequential IDs
location_reports: id=1001, id=1002, id=1003
-- After IDENTITY: Possible gaps (normal)
location_reports: id=1001, id=1002, id=1005, id=1006
Impact Assessment:
- ✅ Foreign keys still work:
geofence_eventscan reference any valid ID - ✅ No referential integrity issues
- ⚠️ Application assumptions: Code shouldn't assume sequential IDs
Migration Impact on Relationships
Phase 1: Add IDENTITY (Zero Impact)
-- This change has NO impact on foreign key relationships
ALTER TABLE location_reports
ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;
Why no impact:
- Existing IDs remain unchanged
- New inserts get sequential IDs (starting from max+1)
- Foreign key constraints continue to work identically
Phase 2: Application Code Changes (Minimal Impact)
# OLD CODE: Manual ID generation
next_id = db.execute(text("SELECT COALESCE(MAX(id), 0) + 1 FROM location_reports")).scalar()
# NEW CODE: Let database generate ID
# No ID specified in INSERT - database auto-generates
# Foreign key relationships work identically
Impact on geofence_events creation:
- ✅ Same workflow: Get
location_report_idfrom INSERT RETURNING - ✅ Same foreign key values: Still insert both
idandtimestamp - ✅ Same referential integrity: Composite foreign key works identically
Testing Strategy for Relationships
1. Foreign Key Integrity Tests
-- Verify all existing foreign keys are valid
SELECT COUNT(*) FROM geofence_events ge
LEFT JOIN location_reports lr ON (ge.location_report_id = lr.id
AND ge.location_report_timestamp = lr.timestamp)
WHERE lr.id IS NULL;
-- Should return 0
2. Cascade Behavior Tests
-- Test cascade deletion still works
BEGIN;
DELETE FROM location_reports WHERE id = (SELECT MIN(id) FROM location_reports);
-- Verify corresponding geofence_events are deleted
SELECT COUNT(*) FROM geofence_events WHERE location_report_id = <deleted_id>;
-- Should return 0
ROLLBACK;
3. New Insert Tests
# Test new location report with geofence event creation
def test_identity_with_foreign_keys():
# Insert location report (ID auto-generated)
result = db.execute(text("""
INSERT INTO location_reports (hashed_adv_key, timestamp, location)
VALUES ('test_key', NOW(), ST_Point(0, 0))
RETURNING id, timestamp
"""))
location_id, location_timestamp = result.fetchone()
# Insert geofence event (foreign key should work)
db.execute(text("""
INSERT INTO geofence_events
(tracker_id, location_report_id, location_report_timestamp, event_type, timestamp)
VALUES (1, :location_id, :location_timestamp, 'ENTRY', NOW())
"""), {
'location_id': location_id,
'location_timestamp': location_timestamp
})
# Verify foreign key relationship
assert db.execute(text("""
SELECT COUNT(*) FROM geofence_events ge
JOIN location_reports lr ON (ge.location_report_id = lr.id
AND ge.location_report_timestamp = lr.timestamp)
WHERE lr.id = :location_id
"""), {'location_id': location_id}).scalar() == 1
Rollback Impact on Relationships
If rollback is needed:
-- Remove IDENTITY (relationships unaffected)
ALTER TABLE location_reports
ALTER COLUMN id DROP IDENTITY;
Impact:
- ✅ No foreign key changes needed
- ✅ Existing relationships preserved
- ✅ Application reverts to manual ID generation
Performance Impact on Related Queries
Common Join Queries (No Impact)
-- Query performance unchanged
SELECT lr.*, ge.event_type
FROM location_reports lr
JOIN geofence_events ge ON (lr.id = ge.location_report_id
AND lr.timestamp = ge.location_report_timestamp)
WHERE lr.timestamp > NOW() - INTERVAL '1 day';
Foreign Key Lookup Performance
- ✅ Same indexes used: Composite primary key index unchanged
- ✅ Same query plans: PostgreSQL optimizer behavior identical
- ✅ Same join performance: No impact on relationship queries
Recommendations for Related Tables
- No immediate changes needed for
geofence_eventstable - Monitor foreign key performance during migration
- Consider IDENTITY for geofence_events.id in future (separate migration)
- Update documentation to reflect that IDs may have gaps
- Review application code for any assumptions about sequential IDs
Migration Strategy
Phase 1: Add IDENTITY (Low Risk)
-- This can be done online with minimal impact
ALTER TABLE location_reports
ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;
-- Set sequence to current max
SELECT setval(pg_get_serial_sequence('location_reports', 'id'),
COALESCE(MAX(id), 1))
FROM location_reports;
Phase 2: Update Application Code
# Remove manual ID generation
# OLD CODE (remove this):
# next_id_query = text("SELECT COALESCE(MAX(id), 0) + 1 FROM location_reports")
# next_id = db.execute(next_id_query).scalar()
# NEW CODE (simplified):
query = text("""
INSERT INTO location_reports
(hashed_adv_key, timestamp, location, confidence, horizontal_accuracy, nearest_city)
VALUES (:hashed_adv_key, :timestamp, ST_GeomFromText(:point, 4326),
:confidence, :horizontal_accuracy, :nearest_city)
ON CONFLICT (hashed_adv_key, timestamp)
DO UPDATE SET
confidence = EXCLUDED.confidence,
horizontal_accuracy = EXCLUDED.horizontal_accuracy
RETURNING id, (xmax = 0) AS inserted
""")
Phase 3: Testing
- Load testing: Verify performance under concurrent inserts
- Functional testing: Ensure geofence events still reference correctly
- Rollback plan: Can remove IDENTITY if issues arise
Performance Impact Analysis
Current MAX() Approach
- Time complexity: O(log n) for each insert (index scan)
- Concurrency: Potential for race conditions
- Scalability: Degrades with table size
IDENTITY Approach
- Time complexity: O(1) for ID generation
- Concurrency: Thread-safe sequence operations
- Scalability: Constant time regardless of table size
Benchmark Expectations
- Insert performance: 10-50% improvement expected
- Concurrency: Eliminates race condition errors
- Memory usage: Slightly reduced (no MAX queries)
Rollback Strategy
If issues arise, the IDENTITY can be removed:
-- Remove IDENTITY (keeps existing IDs)
ALTER TABLE location_reports
ALTER COLUMN id DROP IDENTITY;
-- Revert to manual ID generation in application code
Monitoring and Alerts
Sequence Monitoring
-- Check sequence status
SELECT schemaname, sequencename, last_value, is_called
FROM pg_sequences
WHERE sequencename LIKE '%location_reports%';
-- Monitor sequence gaps (if needed)
SELECT currval(pg_get_serial_sequence('location_reports', 'id')) -
(SELECT COUNT(*) FROM location_reports);
Performance Monitoring
- Monitor insert latency before/after migration
- Track concurrent insert error rates
- Watch for sequence contention (unlikely but possible)
Recommendation
Implement Option 1 (GENERATED BY DEFAULT AS IDENTITY) because:
- Standard approach: PostgreSQL/TimescaleDB best practice
- Performance: Eliminates MAX() overhead
- Reliability: Thread-safe, no race conditions
- Maintainability: Reduces application complexity
- Future-proof: Scales with table growth
The migration can be done with minimal downtime and provides immediate benefits for the tracker fetcher service performance and reliability.