TimescaleDB Production Migration Runbook
This document covers the migration of the current production TimescaleDB workload to its own dedicated EC2 instance, with PostgreSQL, PostGIS, and TimescaleDB installed on the target host.
The intent is to sign off on the plan before the migration begins.
Executive Summary
We are moving the production tracker database away from the shared production PostgreSQL host onto a dedicated EC2 instance.
Key constraints and assumptions:
- The current source host shares a disk with a separate 1.1 TB database, so disk-level snapshots are not a viable migration method.
- The tracker database is approximately 11 GB.
- The team can accept a maintenance window of 3 to 4 hours.
- Tracker data can be re-fetched from Apple for any missing records after cutover.
- Existing pgBackRest infrastructure should be retained.
- The dedicated Timescale EC2 does not need Patroni or a replication cluster for this migration.
- The current production source is PostgreSQL 16.10 with PostGIS 3.5.2 and TimescaleDB 2.18.2.
- The target should initially match those same PostgreSQL, PostGIS, and TimescaleDB versions unless a separate upgrade is approved.
- Security groups and any VPC-adjacent infrastructure outside our VPC are managed by Makimo.
- The new EC2 should start with the same instance type used by staging and only be adjusted if capacity requires it.
- The new dedicated EC2 should use a 25 GB OS volume and a 50 GB PostgreSQL/TimescaleDB data volume.
- The cutover should be scheduled ASAP.
Recommended migration method:
- Take a final backup with pgBackRest.
- Restore onto the new EC2.
- Use the existing pgBackRest repository for WAL archiving and recovery.
- Cut over during the planned outage.
This is the simplest plan that fits the available downtime and avoids introducing unnecessary replication complexity.
Decision
We are not using:
- disk snapshots
- a streaming-replication cluster
- a live logical-replication catch-up phase
We are using:
- a dedicated PostgreSQL 16 + PostGIS + TimescaleDB EC2 instance
- the existing pgBackRest repository host at pg-backrest.timescale.glimpse.internal (172.31.203.209)
- SSH-based pgBackRest communication between the database host and the repository host
- WAL archiving for recovery and backup integrity
Current State
The current production setup is effectively split across the shared PostgreSQL host and the existing pgBackRest repository infrastructure.
Relevant current characteristics:
- pgBackRest pushes backups and WAL to a repository host over SSH.
- The repository host also publishes an S3-backed secondary repository.
- Patroni is present on the cluster that currently uses pgBackRest, but the new dedicated Timescale host does not need Patroni unless a separate standby is introduced later.
- PostgreSQL archiving uses archive_command = 'pgbackrest --stanza=production archive-push "%p"'.
- Recovery uses restore_command = 'pgbackrest --stanza=production archive-get %f "%p"'.
Target State
The target system is a single dedicated EC2 instance that runs:
- PostgreSQL 16
- PostGIS
- TimescaleDB
- pgBackRest client configuration
The target instance should:
- accept application traffic for the tracker database only
- push WAL to the repository host
- allow future restore/PITR operations through the existing repository
- remain independent from the unrelated 1.1 TB database
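The archiving side of the target state can be sketched as a postgresql.conf fragment on the new host. Treat this as a sketch to verify against the current source host, not a final configuration; the library list in particular should be compared with what the source preloads:

# postgresql.conf sketch for the new dedicated host (verify all values)
shared_preload_libraries = 'timescaledb'
archive_mode = on
archive_command = 'pgbackrest --stanza=production archive-push %p'
# used only during recovery:
restore_command = 'pgbackrest --stanza=production archive-get %f "%p"'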
What pgBackRest Is Doing Here
In this migration design, pgBackRest provides:
- backups
- WAL archiving
- restore support
- point-in-time recovery
It is not the same thing as PostgreSQL streaming replication.
That distinction matters because:
- the current cluster configs reference replication for Patroni-managed nodes
- the new dedicated Timescale EC2 does not need a Patroni cluster to satisfy the migration goal
- for this migration, backup and archive continuity are enough
Repository Host
The repository host is:
pg-backrest.timescale.glimpse.internal (172.31.203.209)
It serves as the SSH-accessed repo1 target for pgBackRest.
The current design also has:
- repo2 backed by S3 in eu-west-2
- long-term retention stored separately from the fast SSH repository
Network Requirements
The new EC2 must be able to talk to the repository host over SSH.
Minimum required connectivity:
- TCP 22 from the new Timescale EC2 to 172.31.203.209
- TCP 22 from 172.31.203.209 to the new Timescale EC2 if the repository host needs to initiate backup or restore checks
- HTTPS egress to S3 if repo2 is used from that host
Recommended security group posture:
- allow SSH only between the database host and the repository host
- keep the rule scoped to the specific private IPs or security groups
- do not open SSH broadly across the VPC
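The SSH path can be confirmed from the new host before any pgBackRest work starts. This is a sketch using standard tools, with the IP taken from above:

# from the new Timescale EC2
nc -zv 172.31.203.209 22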
Proposed pgBackRest Layout
The existing configuration model is broadly reusable.
Example shape:
[global]
process-max=4
log-level-console=info
log-level-file=debug
log-path=/var/log/pgbackrest
compress-type=zst
compress-level=6
delta=y
start-fast=y
repo1-host=pg-backrest.timescale.glimpse.internal
repo1-host-user=postgres
repo2-type=s3
repo2-s3-bucket=glimpse-timescale-backups
repo2-s3-region=eu-west-2
repo2-path=/pgbackrest
[production]
pg1-path=/var/pg_data/16/patroni
archive-timeout=60
archive-async=y
archive-push-queue-max=4GB
spool-path=/var/spool/pgbackrest
Notes:
- Keep the repository-host SSH configuration in place.
- Keep the S3 repository in place if it is already part of the backup strategy.
- Update pg1-path to match the actual data directory on the new host; the Patroni-era path above only shows the current shape.
- Do not copy live secrets into documentation or source control.
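With the configuration in place, pgBackRest's own check commands confirm that the new host can reach the repository and see the stanza. These assume the stanza name used above:

sudo -u postgres pgbackrest --stanza=production check
sudo -u postgres pgbackrest --stanza=production info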
Migration Approach
Why This Approach
The database is small enough that a logical restore is practical, and the maintenance window is long enough to avoid introducing replication complexity.
This approach has the fewest moving parts:
- no snapshot dependency
- no temporary dual-write logic
- no logical-replication slot management
- no cluster failover complexity
Migration Sequence
- Provision the new EC2 with the correct OS, PostgreSQL version, PostGIS, and TimescaleDB.
- Configure SSH trust between the new EC2 and the pgBackRest repository host.
- Configure pgBackRest on the new host.
- Restore the latest good backup onto the new EC2.
- Validate the restored database before the cutover window.
- Stop writes on the old production system.
- Take a final backup if needed for safety.
- Restore or recover onto the new EC2 to the agreed cutover point.
- Switch the application connection string to the new EC2.
- Verify application health and database behavior.
- Keep the old system available for rollback until signoff is complete.
Detailed Runbook
1. Pre-Migration Preparation
Confirm the following before the maintenance window:
- the target EC2 instance is created
- PostgreSQL 16 is installed
- PostGIS is installed
- TimescaleDB is installed and matches the intended production version
- pgBackRest is installed on the target
- the repository host can be reached over SSH
- the application knows the target DB hostname or IP
- the team has a rollback decision point
- the target EC2 starts with the staging instance type, which is currently t3.medium
- the target EC2 uses a 25 GB OS volume and a 50 GB data volume
Also confirm:
- the current backup set is valid
- the backup repository contains at least one recent full backup
- WAL archiving is healthy
- the application is prepared to tolerate a short database outage
2. Validate the Repository Path
From the new EC2, verify access to the repository host:
ssh postgres@pg-backrest.timescale.glimpse.internal
Then verify pgBackRest connectivity and stanza visibility:
pgbackrest --stanza=production info
If SSH or stanza access fails, do not proceed to cutover.
3. Restore Onto the New EC2
Restore the latest backup to the new PostgreSQL data directory.
The restore flow should follow the normal Timescale-safe pattern:
- create the database cluster
- enable the required extensions
- run
timescaledb_pre_restore()before restore - restore the backup
- run
timescaledb_post_restore()after restore
Example pattern:
sudo -u postgres psql -d postgres -c "CREATE DATABASE tracker;"
sudo -u postgres psql -d tracker -c "CREATE EXTENSION IF NOT EXISTS timescaledb;"
sudo -u postgres psql -d tracker -c "SELECT timescaledb_pre_restore();"
sudo -u postgres pgbackrest --stanza=production restore --type=none
sudo -u postgres psql -d tracker -c "SELECT timescaledb_post_restore();"
Use the actual database name and restore target used by the environment.
Ensure the target instance has enough disk for:
- the restored 11 GB database
- future WAL growth during normal operation
- indexes and autovacuum overhead
- temporary growth during restore and maintenance tasks
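A quick way to confirm headroom after the restore, assuming the data-volume mount from the configuration above and a database named tracker:

df -h /var/pg_data
sudo -u postgres psql -d tracker -c "SELECT pg_size_pretty(pg_database_size('tracker'));"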
4. Verify the Restored Database
Before cutover, check:
- PostgreSQL starts cleanly
- TimescaleDB loads in shared_preload_libraries
- the schema exists
- hypertables are present
- continuous aggregates exist
- the expected tables contain data
- application login works against the target
Useful checks:
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('timescaledb', 'postgis');
SELECT *
FROM timescaledb_information.hypertables;
SELECT *
FROM timescaledb_information.continuous_aggregates;
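A data-freshness check also helps confirm that the restore captured recent writes. The table and column names below are placeholders for the real tracker schema:

-- placeholder names; substitute the actual hypertable and time column
SELECT max(recorded_at) FROM tracker_events;
SELECT count(*) FROM tracker_events
WHERE recorded_at > now() - interval '1 day';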
5. Cutover Window
During the maintenance window:
- stop application writes to the old database
- let any in-flight jobs finish
- take a final backup if the procedure requires one
- restore or recover the final state to the new EC2
- validate the restored data
- switch the application connection details
- bring the application back online
Do not shorten this step by guessing. Wait until validation passes.
6. Post-Cutover Verification
Check the following after cutover:
- application health endpoints succeed
- API reads and writes succeed
- tracker ingest works
- Timescale jobs are running
- WAL archiving to pgBackRest is working from the new host
- backup retention behaves as expected
Recommended smoke tests:
- create a new tracker write
- confirm the row lands in the database
- confirm recent historical queries work
- confirm the application can recover missing data from Apple as expected
7. Rollback Plan
Rollback should remain simple:
- stop writes to the new EC2
- point the application back to the old database
- re-enable the old system
- investigate the restore issue offline
Rollback is only acceptable until the old system is decommissioned.
Data Safety and Recovery
The existing pgBackRest design should remain in place after migration.
Goals:
- keep WAL archiving active
- keep the SSH repository reachable
- keep the S3 repository usable for longer retention
- preserve point-in-time recovery capability
If future needs change, the pgBackRest repository can support:
- a fresh restore after instance failure
- a point-in-time recovery to a known good time
- new backup rotations on the dedicated EC2
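As an illustration, a point-in-time recovery against this repository would look like the following sketch; the timestamp and service name are placeholders:

sudo systemctl stop postgresql-16
sudo -u postgres pgbackrest --stanza=production restore --delta \
  --type=time --target="2025-06-01 12:00:00+00" --target-action=promote
sudo systemctl start postgresql-16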
What Is Intentionally Not Included
This migration plan does not include:
- Patroni bootstrap
- etcd cluster configuration
- streaming replication setup
- read replicas
- logical-replication catch-up
- dual-write application changes
Those can be added later if the system needs higher availability.
Open Questions to Resolve Before Signoff
- None. The migration design is now fully specified.
Signoff Checklist
- New EC2 exists and is reachable
- PostgreSQL, PostGIS, and TimescaleDB versions are agreed
- pgBackRest can reach pg-backrest.timescale.glimpse.internal
- Backup repository contains a recent valid full backup
- WAL archiving is healthy
- Target EC2 disk sizing is approved
- Target EC2 uses 25 GB OS and 50 GB data volumes
- Restore procedure is tested or rehearsed
- App cutover time is scheduled
- Rollback decision point is documented
- Secrets and passwords are not stored in documentation