
TimescaleDB Production Migration Runbook

This document covers the migration of the current production TimescaleDB workload to its own dedicated EC2 instance, with PostgreSQL, PostGIS, and TimescaleDB installed on the target host.

The intent is for this plan to be reviewed and signed off before the migration begins.

Executive Summary

We are moving the production tracker database away from the shared production PostgreSQL host onto a dedicated EC2 instance.

Key constraints and assumptions:

  • The current source host shares a disk with a separate 1.1 TB database, so disk-level snapshots are not a viable migration method.
  • The tracker database is approximately 11 GB.
  • The team can accept a maintenance window of 3 to 4 hours.
  • Tracker data can be re-fetched from Apple for any missing records after cutover.
  • pgBackRest infrastructure already exists and should be retained.
  • The dedicated Timescale EC2 does not need Patroni or a replication cluster for this migration.
  • The current production source is PostgreSQL 16.10 with PostGIS 3.5.2 and TimescaleDB 2.18.2.
  • The target should initially match those same PostgreSQL, PostGIS, and TimescaleDB versions unless a separate upgrade is approved.
  • Security groups and any adjacent infrastructure outside our VPC are managed by Makimo.
  • The new EC2 should start with the same instance type used by staging and only be adjusted if capacity requires it.
  • The new dedicated EC2 should use a 25 GB OS volume and a 50 GB PostgreSQL/TimescaleDB data volume.
  • The cutover should be scheduled ASAP.

Recommended migration method:

  1. Take a final backup with pgBackRest.
  2. Restore onto the new EC2.
  3. Use the existing pgBackRest repository for WAL archiving and recovery.
  4. Cut over during the planned outage.

This is the simplest plan that fits the available downtime and avoids introducing unnecessary replication complexity.

Decision

We are not using:

  • disk snapshots
  • a streaming-replication cluster
  • a live logical-replication catch-up phase

We are using:

  • a dedicated PostgreSQL 16 + PostGIS + TimescaleDB EC2 instance
  • the existing pgBackRest repository host at pg-backrest.timescale.glimpse.internal (172.31.203.209)
  • SSH-based pgBackRest communication between the database host and the repository host
  • WAL archiving for recovery and backup integrity

Current State

The current production setup is effectively split across the shared PostgreSQL host and the existing pgBackRest repository infrastructure.

Relevant current characteristics:

  • pgBackRest pushes backups and WAL to a repository host over SSH.
  • The repository host also publishes an S3-backed secondary repository.
  • Patroni is present on the cluster that currently uses pgBackRest, but the new dedicated Timescale host does not need Patroni unless a separate standby is introduced later.
  • PostgreSQL archiving uses archive_command = pgbackrest --stanza=production archive-push "%p".
  • Recovery uses restore_command = pgbackrest --stanza=production archive-get %f "%p".

Target State

The target system is a single dedicated EC2 instance that runs:

  • PostgreSQL 16
  • PostGIS
  • TimescaleDB
  • pgBackRest client configuration

The target instance should:

  • accept application traffic for the tracker database only
  • push WAL to the repository host
  • allow future restore/PITR operations through the existing repository
  • remain independent from the unrelated 1.1 TB database
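
A minimal sketch of the PostgreSQL settings the target needs in order to push WAL and support restore through the existing repository, assuming the stanza name stays production (confirm exact values against the current host's configuration):

shared_preload_libraries = 'timescaledb'
wal_level = replica
archive_mode = on
archive_command = 'pgbackrest --stanza=production archive-push %p'
# restore_command is only consulted during recovery (restore or PITR)
restore_command = 'pgbackrest --stanza=production archive-get %f "%p"'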

What pgBackRest Is Doing Here

In this migration design, pgBackRest provides:

  • backups
  • WAL archiving
  • restore support
  • point-in-time recovery

It is not the same thing as PostgreSQL streaming replication.

That distinction matters because:

  • the current cluster configs reference replication for Patroni-managed nodes
  • the new dedicated Timescale EC2 does not need a Patroni cluster to satisfy the migration goal
  • for this migration, backup and archive continuity are enough

Repository Host

The repository host is:

  • pg-backrest.timescale.glimpse.internal
  • 172.31.203.209

It serves as the SSH-accessed repo1 target for pgBackRest.

The current design also has:

  • repo2 backed by S3 in eu-west-2
  • long-term retention stored separately from the fast SSH repository

Network Requirements

The new EC2 must be able to talk to the repository host over SSH.

Minimum required connectivity:

  • TCP 22 from the new Timescale EC2 to 172.31.203.209
  • TCP 22 from 172.31.203.209 to the new Timescale EC2 if the repository host needs to initiate backup or restore checks
  • HTTPS egress to S3 if repo2 is used from that host

Recommended security group posture:

  • allow SSH only between the database host and the repository host
  • keep the rule scoped to the specific private IPs or security groups
  • do not open SSH broadly across the VPC
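
Before any pgBackRest configuration is attempted, the connectivity requirements above can be confirmed from the new EC2 with a quick sketch like this (assumes the nc and curl utilities are available on the host):

# SSH reachability to the repository host
nc -vz 172.31.203.209 22
# HTTPS egress to S3 in eu-west-2, only relevant if repo2 is used from this host
curl -sI https://s3.eu-west-2.amazonaws.com | head -n 1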

Proposed pgBackRest Layout

The existing configuration model is broadly reusable.

Example shape:

[global]
process-max=4
log-level-console=info
log-level-file=debug
log-path=/var/log/pgbackrest
compress-type=zst
compress-level=6
delta=y
start-fast=y

repo1-host=pg-backrest.timescale.glimpse.internal
repo1-host-user=postgres

repo2-type=s3
repo2-s3-bucket=glimpse-timescale-backups
repo2-s3-region=eu-west-2
repo2-s3-endpoint=s3.eu-west-2.amazonaws.com
repo2-path=/pgbackrest

[production]
pg1-path=/var/pg_data/16/patroni
archive-timeout=60
archive-async=y
archive-push-queue-max=4GB
spool-path=/var/spool/pgbackrest

Notes:

  • Keep the repository-host SSH configuration in place.
  • Keep the S3 repository in place if it is already part of the backup strategy.
  • Update pg1-path to the data directory actually used on the new dedicated host; the /patroni suffix in the example comes from the current Patroni-managed layout.
  • Do not copy live secrets into documentation or source control.
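
Once pgBackRest is configured on the new host, the stanza and archiving path can be validated with the built-in check command, assuming the stanza name remains production:

sudo -u postgres pgbackrest --stanza=production check

The check confirms the repository is reachable and that a test WAL segment can be archived and retrieved, which is a useful gate before relying on the configuration during cutover.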

Migration Approach

Why This Approach

The database is small enough that a full restore onto the new host is practical, and the maintenance window is long enough to avoid introducing replication complexity.

This approach has the fewest moving parts:

  • no snapshot dependency
  • no temporary dual-write logic
  • no logical-replication slot management
  • no cluster failover complexity

Migration Sequence

  1. Provision the new EC2 with the correct OS, PostgreSQL version, PostGIS, and TimescaleDB.
  2. Configure SSH trust between the new EC2 and the pgBackRest repository host.
  3. Configure pgBackRest on the new host.
  4. Restore the latest good backup onto the new EC2.
  5. Validate the restored database before the cutover window.
  6. Stop writes on the old production system.
  7. Take a final backup if needed for safety.
  8. Restore or recover onto the new EC2 to the agreed cutover point.
  9. Switch the application connection string to the new EC2.
  10. Verify application health and database behavior.
  11. Keep the old system available for rollback until signoff is complete.

Detailed Runbook

1. Pre-Migration Preparation

Confirm the following before the maintenance window:

  • the target EC2 instance is created
  • PostgreSQL 16 is installed
  • PostGIS is installed
  • TimescaleDB is installed and matches the intended production version
  • pgBackRest is installed on the target
  • the repository host can be reached over SSH
  • the application knows the target DB hostname or IP
  • the team has a rollback decision point
  • the target EC2 starts with the staging instance type, which is currently t3.medium
  • the target EC2 uses a 25 GB OS volume and a 50 GB data volume

Also confirm:

  • the current backup set is valid
  • the backup repository contains at least one recent full backup
  • WAL archiving is healthy
  • the application is prepared to tolerate a short database outage
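
The backup and archiving items above can be verified on the current source before the window; a sketch, assuming the stanza name and psql superuser access match the existing setup:

# recent backups and WAL coverage for the stanza
sudo -u postgres pgbackrest --stanza=production info
# archiver health on the source database
sudo -u postgres psql -c "SELECT last_archived_wal, last_archived_time, failed_count FROM pg_stat_archiver;"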

2. Validate the Repository Path

From the new EC2, verify access to the repository host:

ssh postgres@pg-backrest.timescale.glimpse.internal

Then verify pgBackRest connectivity and stanza visibility:

pgbackrest --stanza=production info

If SSH or stanza access fails, do not proceed to cutover.
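
If the SSH trust between the postgres OS users is not yet in place, a typical setup looks like this sketch (the key type, key path, and home directory are assumptions that depend on the distribution; distribute the public key however keys are normally managed in this environment):

# on the new EC2, as the postgres OS user
sudo -u postgres mkdir -p -m 700 ~postgres/.ssh
sudo -u postgres ssh-keygen -t ed25519 -N "" -f ~postgres/.ssh/id_ed25519
# append the resulting .pub key to ~postgres/.ssh/authorized_keys on
# pg-backrest.timescale.glimpse.internal, then re-test:
sudo -u postgres ssh postgres@pg-backrest.timescale.glimpse.internal hostname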

3. Restore Onto the New EC2

Restore the latest backup to the new PostgreSQL data directory.

A pgBackRest restore is a physical restore of the whole cluster, so the flow is:

  1. stop PostgreSQL on the target if it is already running
  2. restore the backup into the data directory (add --delta if the directory is not empty)
  3. start PostgreSQL and let recovery complete
  4. confirm the required extensions and data are present

Note that timescaledb_pre_restore() and timescaledb_post_restore() are only needed when loading a logical dump with pg_dump/pg_restore; a physical pgBackRest restore does not use them.

Example pattern (the service name depends on the packaging in use):

sudo systemctl stop postgresql
sudo -u postgres pgbackrest --stanza=production restore --delta
sudo systemctl start postgresql
sudo -u postgres psql -d tracker -c "SELECT extname, extversion FROM pg_extension;"

Use the actual database name and restore target used by the environment.

Ensure the target instance has enough disk for:

  • the restored 11 GB database
  • future WAL growth during normal operation
  • indexes and autovacuum overhead
  • temporary growth during restore and maintenance tasks
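
A quick sanity check of those sizing assumptions after the restore completes; a sketch where the mount point and database name are assumptions and should match the actual environment:

df -h /var/pg_data
sudo -u postgres psql -d tracker -c "SELECT pg_size_pretty(pg_database_size('tracker'));"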

4. Verify the Restored Database

Before cutover, check:

  • PostgreSQL starts cleanly
  • TimescaleDB loads in shared_preload_libraries
  • the schema exists
  • hypertables are present
  • continuous aggregates exist
  • the expected tables contain data
  • application login works against the target

Useful checks:

SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('timescaledb', 'postgis');

SELECT *
FROM timescaledb_information.hypertables;

SELECT *
FROM timescaledb_information.continuous_aggregates;
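
The shared_preload_libraries item in the checklist above is not covered by these queries; a minimal additional check:

SHOW shared_preload_libraries;

The output should list timescaledb; if it does not, set shared_preload_libraries and restart PostgreSQL before proceeding, because the extension errors out when used without being preloaded.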

5. Cutover Window

During the maintenance window:

  1. stop application writes to the old database
  2. let any in-flight jobs finish
  3. take a final backup if the procedure requires one
  4. restore or recover the final state to the new EC2
  5. validate the restored data
  6. switch the application connection details
  7. bring the application back online

Do not shorten this step by guessing. Wait until validation passes.
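
If step 3 calls for a final backup, a minimal sketch with the existing tooling, assuming the backup is run from whichever host normally drives backups for this stanza and that an incremental on top of a recent full is sufficient:

# force the current WAL segment to be archived so the final writes are covered
sudo -u postgres psql -c "SELECT pg_switch_wal();"
# take a final incremental backup against the production stanza
sudo -u postgres pgbackrest --stanza=production backup --type=incr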

6. Post-Cutover Verification

Check the following after cutover:

  • application health endpoints succeed
  • API reads and writes succeed
  • tracker ingest works
  • Timescale jobs are running
  • WAL archiving to pgBackRest is working from the new host
  • backup retention behaves as expected
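
The Timescale jobs item above can be checked directly on the new host, and WAL archiving can be re-confirmed with the same pgbackrest check used during setup. A sketch of the jobs query, where the database name is an assumption carried over from earlier examples:

sudo -u postgres psql -d tracker -c "SELECT job_id, job_status, last_run_status, next_start FROM timescaledb_information.job_stats;"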

Recommended smoke tests:

  • create a new tracker write
  • confirm the row lands in the database
  • confirm recent historical queries work
  • confirm the application can recover missing data from Apple as expected

7. Rollback Plan

Rollback should remain simple:

  1. stop writes to the new EC2
  2. point the application back to the old database
  3. re-enable the old system
  4. investigate the restore issue offline

Rollback is only available until the old system is decommissioned.

Data Safety and Recovery

The existing pgBackRest design should remain in place after migration.

Goals:

  • keep WAL archiving active
  • keep the SSH repository reachable
  • keep the S3 repository usable for longer retention
  • preserve point-in-time recovery capability

If future needs change, the pgBackRest repository can support:

  • a fresh restore after instance failure
  • a point-in-time recovery to a known good time
  • new backup rotations on the dedicated EC2
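
A sketch of what a point-in-time recovery on the dedicated EC2 would look like if it is ever needed (the timestamp is a placeholder, and PostgreSQL must be stopped before the restore and started again afterwards):

# recover the cluster to a known good time, then promote
sudo -u postgres pgbackrest --stanza=production restore --delta \
    --type=time --target="2025-06-01 09:00:00+00" --target-action=promote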

What Is Intentionally Not Included

This migration plan does not include:

  • Patroni bootstrap
  • etcd cluster configuration
  • streaming replication setup
  • read replicas
  • logical-replication catch-up
  • dual-write application changes

Those can be added later if the system needs higher availability.

Open Questions to Resolve Before Signoff

  1. None. The migration design is now fully specified.

Signoff Checklist

  • New EC2 exists and is reachable
  • PostgreSQL, PostGIS, and TimescaleDB versions are agreed
  • pgBackRest can reach pg-backrest.timescale.glimpse.internal
  • Backup repository contains a recent valid full backup
  • WAL archiving is healthy
  • Target EC2 disk sizing is approved
  • Target EC2 uses 25 GB OS and 50 GB data volumes
  • Restore procedure is tested or rehearsed
  • App cutover time is scheduled
  • Rollback decision point is documented
  • Secrets and passwords are not stored in documentation
