TimescaleDB Features

This project uses TimescaleDB to efficiently store and query time-series data. TimescaleDB is an extension to PostgreSQL that provides specialized features for time-series data.

Overview

TimescaleDB is used in this project for:

Hypertables: Partitioning time-series data for efficient storage and querying
Continuous Aggregates: Pre-computing aggregated data for faster queries
Materialized Views: Storing pre-computed query results for efficient access
Geospatial Functions: Working with location data using PostGIS integration

Database Schema

The following tables use TimescaleDB features:

location_reports: Raw location data from trackers, stored in a hypertable
location_history_hourly: Hourly aggregated location data (continuous aggregate)
location_history_daily: Daily aggregated location data (continuous aggregate)
location_history: Materialized view combining recent hourly data with older daily data

Applying TimescaleDB Features

Some TimescaleDB features (like continuous aggregates and materialized views) cannot be created inside a transaction. These features are not included in the regular Alembic migrations and need to be applied separately.

In ECS, the database migration task now runs scripts/run_migrations.sh, which applies alembic upgrade heads and then executes scripts/apply_timescaledb_features.sh so the schema bootstrap is fully automated. Using heads avoids failing when the revision graph briefly has more than one active branch.

Prerequisites

Before applying TimescaleDB features, ensure that:

The TimescaleDB extension is installed in your PostgreSQL database
The regular database migrations have been applied
PostgreSQL client tools (psql) are installed if you're using the provided script

Using the Script

A script is provided to apply TimescaleDB features:

# With Docker (requires PostgreSQL client tools in the API container)
docker compose exec api ./scripts/apply_timescaledb_features.sh

# Without Docker (requires PostgreSQL client tools)
./scripts/apply_timescaledb_features.sh

If PostgreSQL client tools are not available in your environment, you can run the SQL script directly in the PostgreSQL container:

# With Docker, run directly in the PostgreSQL container
docker compose exec db psql -U postgres -d tracker -f /path/to/scripts/create_timescaledb_views.sql

For AWS ECS staging, the repeatable path is the helper script:

./scripts/run_staging_migrations.sh

That wrapper runs the staging migrations ECS task, waits for it to stop, and prints the CloudWatch logs for the migration container.

What the Script Does

The script performs the following operations:

Creates a hypertable for the location_reports table using the timestamp column for partitioning
Creates a continuous aggregate for hourly location data (location_history_hourly)
Creates a continuous aggregate for daily location data (location_history_daily)
Creates a view that combines recent hourly data with older daily data (location_history_view)
Creates a materialized view based on that view (location_history)
Creates functions for geofence matching and materialized view refreshing

Best Practices

The database schema follows these best practices:

Using TEXT instead of VARCHAR for string columns
In PostgreSQL, there's no performance difference between TEXT and VARCHAR(n)
Both use the same internal storage mechanism
TEXT is more flexible if requirements change (no need to alter column length)
VARCHAR(n) is appropriate when there's a specific need to enforce a length constraint
Using TIMESTAMPTZ (timestamp with time zone) instead of TIMESTAMP for timestamp columns
TIMESTAMPTZ stores the time zone information, which is important for accurate time-series data
This prevents issues with daylight saving time and time zone conversions
Including the partitioning column (timestamp) in the primary key of hypertables
This is required by TimescaleDB for efficient partitioning
It allows for better query performance on time-based queries
Using continuous aggregates for efficient querying of aggregated time-series data
Pre-computed aggregates significantly improve query performance
TimescaleDB automatically updates these aggregates as new data arrives
Using materialized views for efficient access to pre-computed query results
Materialized views store the results of complex queries
They can be refreshed periodically to include new data

Maintenance

Refreshing Materialized Views

The materialized views need to be refreshed periodically to include new data. A function is provided for this purpose:

SELECT refresh_location_history();

This function can be called manually or scheduled to run periodically using a cron job or similar mechanism.

Geofence Matching

A function is provided for matching tracker locations with geofences:

SELECT match_geofences();

This function:

Finds trackers with location history in the last 24 hours
Checks if they are within a storage or delivery location geofence
Updates the tracker status if they have been in a geofence for a sufficient amount of time
Adds a status history entry for the tracker

This function can also be called manually or scheduled to run periodically.

Troubleshooting

Common Issues

"function create_hypertable does not exist": This error indicates that the TimescaleDB extension is not installed or not enabled in the database. Ensure that the extension is installed and enabled.
"relation does not exist": This error indicates that the tables haven't been created yet. Run the Alembic migrations first to create the tables.
"CREATE MATERIALIZED VIEW ... WITH DATA cannot run inside a transaction block": This error occurs when trying to create a continuous aggregate or materialized view inside a transaction. Use the provided script to apply these features outside of a transaction.

Checking TimescaleDB Status

To check if TimescaleDB is installed and enabled:

SELECT * FROM pg_extension WHERE extname = 'timescaledb';

To check the version of TimescaleDB:

SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';

To list all hypertables:

SELECT * FROM timescaledb_information.hypertables;

To list all continuous aggregates:

SELECT * FROM timescaledb_information.continuous_aggregates;