TimescaleDB Production Migration Runbook
This document covers the migration of the current production TimescaleDB workload to its own dedicated EC2 instance, with PostgreSQL, PostGIS, and TimescaleDB installed on the target host.
The intent is to sign off on the plan before the migration begins.
Executive Summary
We are moving the production tracker database away from the shared production PostgreSQL host onto a dedicated EC2 instance.
Key constraints and assumptions:
- The current source host shares a disk with a separate 1.1 TB database, so disk-level snapshots are not a viable migration method.
- The tracker database is approximately 11 GB.
- The team can accept a maintenance window of 3 to 4 hours.
- Tracker data can be re-fetched from Apple for any missing records after cutover.
- Existing
pgBackRestinfrastructure already exists and should be retained. - The dedicated Timescale EC2 does not need Patroni or a replication cluster for this migration.
- The current production source is PostgreSQL 16.10 with PostGIS 3.5.2 and TimescaleDB 2.18.2.
- The target should initially match those same PostgreSQL, PostGIS, and TimescaleDB versions unless a separate upgrade is approved.
- Security groups and any VPC-adjacent infrastructure outside our VPC are managed by Makimo.
- We can make the new EC2 side of the
pgBackRestpath operational in our VPC while Makimo handles the repository-side VPC and connectivity. - The new EC2 should start with the same instance type used by staging and only be adjusted if capacity requires it.
- The new dedicated EC2 should use a 25 GB OS volume and a 50 GB PostgreSQL/TimescaleDB data volume sized for the tracker database restore and working space only.
- The cutover should be scheduled ASAP.
Recommended migration method:
- Provision the new EC2 and its two EBS volumes.
- Take a final logical backup of the tracker database only.
- Restore that database onto the new EC2.
- Use the existing
pgBackRestrepository for WAL archiving and recovery on the new host after cutover. - Cut over during the planned outage.
This is the simplest plan that fits the available downtime and avoids introducing unnecessary replication complexity.
Important constraint:
- Do not use the existing
pgBackRestrepository as the source of truth for the tracker migration restore, because it contains backups for the shared 1.1 TB production database as well. - Use
pg_dump/pg_restorefor the tracker database migration itself. - Keep
pgBackReston the new host for ongoing archiving and future recovery after the cutover.
Decision
We are not using:
- disk snapshots
- a streaming-replication cluster
- a live logical-replication catch-up phase
We are using:
- a dedicated PostgreSQL 16 + PostGIS + TimescaleDB EC2 instance
- the existing
pgBackRestrepository host atpg-backrest.timescale.glimpse.internal(172.31.203.209) - SSH-based pgBackRest communication between the database host and the repository host
- WAL archiving for recovery and backup integrity
Current State
The current production setup is effectively split across the shared PostgreSQL host and the existing pgBackRest repository infrastructure.
Relevant current characteristics:
pgBackRestpushes backups and WAL to a repository host over SSH.- The repository host also publishes an S3-backed secondary repository.
- Patroni is present on the cluster that currently uses
pgBackRest, but the new dedicated Timescale host does not need Patroni unless a separate standby is introduced later. - PostgreSQL archiving uses
archive_command = pgbackrest --stanza=tracker archive-push "%p". - Recovery uses
restore_command = pgbackrest --stanza=tracker archive-get %f "%p". - That repository is shared with the 1.1 TB production database, so it is not suitable for a tracker-only migration restore.
Target State
The target system is a single dedicated EC2 instance that runs:
- PostgreSQL 16
- PostGIS
- TimescaleDB
pgBackRestclient configuration- a mounted 50 GB data volume at
/var/lib/postgresqlfor the database cluster
The target instance should:
- accept application traffic for the tracker database only
- push WAL to the repository host
- allow future restore/PITR operations through the existing repository
- keep the PostgreSQL data directory on the dedicated EBS volume rather than the root disk
- remain independent from the unrelated 1.1 TB database
What pgBackRest Is Doing Here
In this migration design, pgBackRest provides:
- backups
- WAL archiving
- restore support
- point-in-time recovery
It is not the same thing as PostgreSQL streaming replication.
That distinction matters because:
- the current cluster configs reference replication for Patroni-managed nodes
- the new dedicated Timescale EC2 does not need a Patroni cluster to satisfy the migration goal
- for this migration, backup and archive continuity are enough
Repository Host
The repository host is:
pg-backrest.timescale.glimpse.internal172.31.203.209
It serves as the SSH-accessed repo3 target for pgBackRest.
The current design also has:
repo2backed by S3 ineu-west-2- long-term retention stored separately from the fast SSH repository
Network Requirements
The new EC2 must be able to talk to the repository host over SSH.
Our scope is the new Timescale EC2, its subnet routing, and its security groups. Makimo can manage the repository-side VPC and any required reciprocal rules there.
Minimum required connectivity:
- TCP 22 from the new Timescale EC2 to
172.31.203.209 - TCP 22 from
172.31.203.209to the new Timescale EC2 if the repository host needs to initiate backup or restore checks - HTTPS egress to S3 if
repo2is used from that host
Recommended security group posture:
- allow SSH only between the database host and the repository host
- keep the rule scoped to the specific private IPs or security groups
- do not open SSH broadly across the VPC
Proposed pgBackRest Layout
The existing configuration model is broadly reusable.
Example shape:
[global]
process-max=4
log-level-console=info
log-level-file=debug
log-path=/var/log/pgbackrest
compress-type=zst
compress-level=6
delta=y
start-fast=y
repo3-host=pg-backrest.timescale.glimpse.internal
repo3-host-user=postgres
repo3-path=/var/pg_backrest_data/repo3
repo3-cipher-type=aes-256-cbc
repo3-cipher-pass=<from-secret>
repo2-type=s3
repo2-s3-bucket=glimpse-timescale-backups
repo2-s3-region=eu-west-2
repo2-path=/pgbackrest
[tracker]
pg1-path=/var/lib/postgresql/16/main
archive-timeout=60
archive-async=y
archive-push-queue-max=4GB
spool-path=/var/spool/pgbackrest
Notes:
- Keep the repository-host SSH configuration in place.
- Keep the S3 repository in place if it is already part of the backup strategy.
- Do not copy live secrets into documentation or source control.
- Supply the
repo3-cipher-passvalue from Secrets Manager or another external secret store. In prodtfvars, use the secret name rather than the ARN.
Migration Approach
Why This Approach
The database is small enough that a logical restore is practical, and the maintenance window is long enough to avoid introducing replication complexity.
This approach has the fewest moving parts:
- no snapshot dependency
- no temporary dual-write logic
- no logical-replication slot management
- no cluster failover complexity
Migration Sequence
- Provision the new EC2 with the correct OS, PostgreSQL version, PostGIS, and TimescaleDB.
- Configure SSH trust between the new EC2 and the pgBackRest repository host.
- Configure
pgBackReston the new host. - Take a logical backup of the tracker database only.
- Restore the tracker database onto the new EC2.
- Validate the restored database before the cutover window.
- Stop writes on the old production system.
- Take a final logical backup if needed for safety.
- Restore the final tracker state to the new EC2.
- Switch the application connection string to the new EC2.
- Verify application health and database behavior.
- Keep the old system available for rollback until signoff is complete.
Detailed Runbook
1. Pre-Migration Preparation
Confirm the following before the maintenance window:
- the target EC2 instance is created
- PostgreSQL 16 is installed
- PostGIS is installed
- TimescaleDB is installed and matches the intended production version
pgBackRestis installed on the target- the repository host can be reached over SSH
- the application knows the target DB hostname or IP
- the team has a rollback decision point
- the target EC2 starts with the staging instance type, which is currently
t3.medium - the target EC2 uses a 25 GB OS volume and a 50 GB data volume
- the target EC2 data volume is sized for the 11 GB tracker restore plus indexes, WAL, and temporary growth
Also confirm:
- the current backup set is valid
- the backup repository contains at least one recent full backup
- WAL archiving is healthy
- the application is prepared to tolerate a short database outage
2. Validate the Repository Path
From the new EC2, verify access to the repository host:
ssh postgres@pg-backrest.timescale.glimpse.internal
Then verify pgBackRest connectivity and stanza visibility:
pgbackrest --stanza=tracker info
This validates the repository path for future backup and recovery on the new host.
If SSH or stanza access fails, do not proceed with post-cutover backup wiring.
Do not use this repository to restore the tracker database for the migration itself.
3. Restore Onto the New EC2
Create a logical backup of the tracker database on the old host, then restore that dump onto the new EC2.
The restore flow should follow the normal Timescale-safe pattern for a logical dump:
- create the database cluster
- create the tracker database
- enable the required extensions
- run
timescaledb_pre_restore()before restore - restore the tracker database dump
- run
timescaledb_post_restore()after restore - reassign ownership and baseline privileges back to the
trackerrole using targeted object ownership changes
Do not attempt a pgBackRest restore for the migration cutover. The repository contains backup history for the shared 1.1 TB host, not a tracker-only backup set.
Example pattern:
On the old host:
sudo -u postgres pg_dump -Fc --no-owner --no-privileges --file=/tmp/tracker.dump tracker
On the new EC2:
sudo -u postgres psql -d postgres -c "CREATE DATABASE tracker;"
sudo -u postgres psql -d tracker -c "CREATE EXTENSION IF NOT EXISTS timescaledb;"
sudo -u postgres psql -d tracker -c "SELECT timescaledb_pre_restore();"
sudo -u postgres pg_restore --dbname=tracker --no-owner --no-privileges /path/to/tracker.dump
sudo -u postgres psql -d tracker -c "SELECT timescaledb_post_restore();"
sudo -u postgres psql -f /path/to/post_restore_reassign_ownership.sql
Use the actual database name and restore target used by the environment.
Ensure the target instance has enough disk for:
- the restored 11 GB database
- future WAL growth during normal operation
- indexes and autovacuum overhead
- temporary growth during restore and maintenance tasks
4. Verify the Restored Database
Before cutover, check:
- PostgreSQL starts cleanly
- TimescaleDB loads in
shared_preload_libraries - the schema exists
- hypertables are present
- continuous aggregates exist
- ownership has been reassigned to
tracker - the expected tables contain data
- application login works against the target
Useful checks:
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('timescaledb', 'postgis');
SELECT *
FROM timescaledb_information.hypertables;
SELECT *
FROM timescaledb_information.continuous_aggregates;
5. Cutover Window
During the maintenance window:
- stop application writes to the old database
- let any in-flight jobs finish
- take a final logical backup if the procedure requires one
- restore the final tracker dump to the new EC2
- validate the restored data
- repair database ownership and baseline grants if the restore ran with
--no-owner - switch the application connection details
- bring the application back online
Do not shorten this step by guessing. Wait until validation passes.
6. Post-Cutover Verification
Check the following after cutover:
- application health endpoints succeed
- API reads and writes succeed
- tracker ingest works
- Timescale jobs are running
- WAL archiving to pgBackRest is working from the new host
- backup retention behaves as expected
Recommended smoke tests:
- create a new tracker write
- confirm the row lands in the database
- confirm recent historical queries work
- confirm the application can recover missing data from Apple as expected
7. Rollback Plan
Rollback should remain simple:
- stop writes to the new EC2
- point the application back to the old database
- re-enable the old system
- investigate the restore issue offline
Rollback is only acceptable until the old system is decommissioned.
Data Safety and Recovery
The existing pgBackRest design should remain in place after migration.
Goals:
- keep WAL archiving active
- keep the SSH repository reachable
- keep the S3 repository usable for longer retention
- preserve point-in-time recovery capability
If future needs change, the pgBackRest repository can support:
- a fresh restore after instance failure
- a point-in-time recovery to a known good time
- new backup rotations on the dedicated EC2
What Is Intentionally Not Included
This migration plan does not include:
- Patroni bootstrap
- etcd cluster configuration
- streaming replication setup
- read replicas
- logical-replication catch-up
- dual-write application changes
Those can be added later if the system needs higher availability.
Open Questions to Resolve Before Signoff
- None. The migration design is now fully specified.
Signoff Checklist
- New EC2 exists and is reachable
- PostgreSQL, PostGIS, and TimescaleDB versions are agreed
-
pgBackRestcan reachpg-backrest.timescale.glimpse.internal - Tracker logical backup process is tested or rehearsed
- WAL archiving is healthy
- Target EC2 disk sizing is approved
- Target EC2 uses 25 GB OS and 50 GB data volumes
- Restore procedure is tested or rehearsed
- App cutover time is scheduled
- Rollback decision point is documented
- Secrets and passwords are not stored in documentation