
TimescaleDB Production Migration Runbook

This document covers the migration of the current production TimescaleDB workload to its own dedicated EC2 instance, with PostgreSQL, PostGIS, and TimescaleDB installed on the target host.

The intent is for this plan to be reviewed and signed off before the migration begins.

Executive Summary

We are moving the production tracker database away from the shared production PostgreSQL host onto a dedicated EC2 instance.

Key constraints and assumptions:

  • The current source host shares a disk with a separate 1.1 TB database, so disk-level snapshots are not a viable migration method.
  • The tracker database is approximately 11 GB.
  • The team can accept a maintenance window of 3 to 4 hours.
  • Tracker data can be re-fetched from Apple for any missing records after cutover.
  • pgBackRest infrastructure already exists and should be retained.
  • The dedicated Timescale EC2 does not need Patroni or a replication cluster for this migration.
  • The current production source is PostgreSQL 16.10 with PostGIS 3.5.2 and TimescaleDB 2.18.2.
  • The target should initially match those same PostgreSQL, PostGIS, and TimescaleDB versions unless a separate upgrade is approved.
  • Security groups and any adjacent infrastructure outside our VPC are managed by Makimo.
  • The new EC2 should start with the same instance type used by staging and only be adjusted if capacity requires it.
  • The new dedicated EC2 should use a 25 GB OS volume and a 50 GB PostgreSQL/TimescaleDB data volume.
  • The cutover should be scheduled ASAP.

Recommended migration method:

  1. Take a final backup with pgBackRest.
  2. Restore onto the new EC2.
  3. Use the existing pgBackRest repository for WAL archiving and recovery.
  4. Cut over during the planned outage.

This is the simplest plan that fits the available downtime and avoids introducing unnecessary replication complexity.

Decision

We are not using:

  • disk snapshots
  • a streaming-replication cluster
  • a live logical-replication catch-up phase

We are using:

  • a dedicated PostgreSQL 16 + PostGIS + TimescaleDB EC2 instance
  • the existing pgBackRest repository host at pg-backrest.timescale.glimpse.internal (172.31.203.209)
  • SSH-based pgBackRest communication between the database host and the repository host
  • WAL archiving for recovery and backup integrity

Current State

The current production setup is effectively split across the shared PostgreSQL host and the existing pgBackRest repository infrastructure.

Relevant current characteristics:

  • pgBackRest pushes backups and WAL to a repository host over SSH.
  • The repository host also publishes an S3-backed secondary repository.
  • Patroni is present on the cluster that currently uses pgBackRest, but the new dedicated Timescale host does not need Patroni unless a separate standby is introduced later.
  • PostgreSQL archiving uses archive_command = pgbackrest --stanza=production archive-push "%p".
  • Recovery uses restore_command = pgbackrest --stanza=production archive-get %f "%p".

Target State

The target system is a single dedicated EC2 instance that runs:

  • PostgreSQL 16
  • PostGIS
  • TimescaleDB
  • pgBackRest client configuration

The target instance should:

  • accept application traffic for the tracker database only
  • push WAL to the repository host
  • allow future restore/PITR operations through the existing repository
  • remain independent from the unrelated 1.1 TB database
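
A minimal sketch of the PostgreSQL settings the target needs in order to push WAL and support restore through the existing repository, assuming the stanza name stays production (confirm exact values against the current host's configuration):

shared_preload_libraries = 'timescaledb'
wal_level = replica
archive_mode = on
archive_command = 'pgbackrest --stanza=production archive-push %p'
# restore_command is only consulted during recovery (restore or PITR)
restore_command = 'pgbackrest --stanza=production archive-get %f "%p"'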

What pgBackRest Is Doing Here

In this migration design, pgBackRest provides:

  • backups
  • WAL archiving
  • restore support
  • point-in-time recovery

It is not the same thing as PostgreSQL streaming replication.

That distinction matters because:

  • the current cluster configs reference replication for Patroni-managed nodes
  • the new dedicated Timescale EC2 does not need a Patroni cluster to satisfy the migration goal
  • for this migration, backup and archive continuity are enough

Repository Host

The repository host is:

  • pg-backrest.timescale.glimpse.internal
  • 172.31.203.209

It serves as the SSH-accessed repo1 target for pgBackRest.

The current design also has:

  • repo2 backed by S3 in eu-west-2
  • long-term retention stored separately from the fast SSH repository

Network Requirements

The new EC2 must be able to talk to the repository host over SSH.

Minimum required connectivity:

  • TCP 22 from the new Timescale EC2 to 172.31.203.209
  • TCP 22 from 172.31.203.209 to the new Timescale EC2 if the repository host needs to initiate backup or restore checks
  • HTTPS egress to S3 if repo2 is used from that host

Recommended security group posture:

  • allow SSH only between the database host and the repository host
  • keep the rule scoped to the specific private IPs or security groups
  • do not open SSH broadly across the VPC
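
Before any pgBackRest configuration is attempted, the connectivity requirements above can be confirmed from the new EC2 with a quick sketch like this (assumes the nc and curl utilities are available on the host):

# SSH reachability to the repository host
nc -vz 172.31.203.209 22
# HTTPS egress to S3 in eu-west-2, only relevant if repo2 is used from this host
curl -sI https://s3.eu-west-2.amazonaws.com | head -n 1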

Proposed pgBackRest Layout

The existing configuration model is broadly reusable.

Example shape:

[global]
process-max=4
log-level-console=info
log-level-file=debug
log-path=/var/log/pgbackrest
compress-type=zst
compress-level=6
delta=y
start-fast=y

repo1-host=pg-backrest.timescale.glimpse.internal
repo1-host-user=postgres

repo2-type=s3
repo2-s3-bucket=glimpse-timescale-backups
repo2-s3-region=eu-west-2
repo2-s3-endpoint=s3.eu-west-2.amazonaws.com
repo2-path=/pgbackrest

[production]
pg1-path=/var/pg_data/16/patroni
archive-timeout=60
archive-async=y
archive-push-queue-max=4GB
spool-path=/var/spool/pgbackrest

Notes:

  • Keep the repository-host SSH configuration in place.
  • Keep the S3 repository in place if it is already part of the backup strategy.
  • Update pg1-path to the data directory actually used on the new dedicated host; the /patroni suffix in the example comes from the current Patroni-managed layout.
  • Do not copy live secrets into documentation or source control.
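
Once pgBackRest is configured on the new host, the stanza and archiving path can be validated with the built-in check command, assuming the stanza name remains production:

sudo -u postgres pgbackrest --stanza=production check

The check confirms the repository is reachable and that a test WAL segment can be archived and retrieved, which is a useful gate before relying on the configuration during cutover.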

Migration Approach

Why This Approach

The database is small enough that a full restore onto the new host is practical, and the maintenance window is long enough to avoid introducing replication complexity.

This approach has the fewest moving parts:

  • no snapshot dependency
  • no temporary dual-write logic
  • no logical-replication slot management
  • no cluster failover complexity

Migration Sequence

  1. Provision the new EC2 with the correct OS, PostgreSQL version, PostGIS, and TimescaleDB.
  2. Configure SSH trust between the new EC2 and the pgBackRest repository host.
  3. Configure pgBackRest on the new host.
  4. Restore the latest good backup onto the new EC2.
  5. Validate the restored database before the cutover window.
  6. Stop writes on the old production system.
  7. Take a final backup if needed for safety.
  8. Restore or recover onto the new EC2 to the agreed cutover point.
  9. Switch the application connection string to the new EC2.
  10. Verify application health and database behavior.
  11. Keep the old system available for rollback until signoff is complete.

Detailed Runbook

1. Pre-Migration Preparation

Confirm the following before the maintenance window:

  • the target EC2 instance is created
  • PostgreSQL 16 is installed
  • PostGIS is installed
  • TimescaleDB is installed and matches the intended production version
  • pgBackRest is installed on the target
  • the repository host can be reached over SSH
  • the application knows the target DB hostname or IP
  • the team has a rollback decision point
  • the target EC2 starts with the staging instance type, which is currently t3.medium
  • the target EC2 uses a 25 GB OS volume and a 50 GB data volume

Also confirm:

  • the current backup set is valid
  • the backup repository contains at least one recent full backup
  • WAL archiving is healthy
  • the application is prepared to tolerate a short database outage
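
The backup and archiving items above can be verified on the current source before the window; a sketch, assuming the stanza name and psql superuser access match the existing setup:

# recent backups and WAL coverage for the stanza
sudo -u postgres pgbackrest --stanza=production info
# archiver health on the source database
sudo -u postgres psql -c "SELECT last_archived_wal, last_archived_time, failed_count FROM pg_stat_archiver;"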

2. Validate the Repository Path

From the new EC2, verify access to the repository host:

ssh postgres@pg-backrest.timescale.glimpse.internal

Then verify pgBackRest connectivity and stanza visibility:

pgbackrest --stanza=production info

If SSH or stanza access fails, do not proceed to cutover.
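
If the SSH trust between the postgres OS users is not yet in place, a typical setup looks like this sketch (the key type, key path, and home directory are assumptions that depend on the distribution; distribute the public key however keys are normally managed in this environment):

# on the new EC2, as the postgres OS user
sudo -u postgres mkdir -p -m 700 ~postgres/.ssh
sudo -u postgres ssh-keygen -t ed25519 -N "" -f ~postgres/.ssh/id_ed25519
# append the resulting .pub key to ~postgres/.ssh/authorized_keys on
# pg-backrest.timescale.glimpse.internal, then re-test:
sudo -u postgres ssh postgres@pg-backrest.timescale.glimpse.internal hostname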

3. Restore Onto the New EC2

Restore the latest backup to the new PostgreSQL data directory.

A pgBackRest restore is a physical restore of the whole cluster, so the flow is:

  1. stop PostgreSQL on the target if it is already running
  2. restore the backup into the data directory (add --delta if the directory is not empty)
  3. start PostgreSQL and let recovery complete
  4. confirm the required extensions and data are present

Note that timescaledb_pre_restore() and timescaledb_post_restore() are only needed when loading a logical dump with pg_dump/pg_restore; a physical pgBackRest restore does not use them.

Example pattern (the service name depends on the packaging in use):

sudo systemctl stop postgresql
sudo -u postgres pgbackrest --stanza=production restore --delta
sudo systemctl start postgresql
sudo -u postgres psql -d tracker -c "SELECT extname, extversion FROM pg_extension;"

Use the actual database name and restore target used by the environment.

Ensure the target instance has enough disk for:

  • the restored 11 GB database
  • future WAL growth during normal operation
  • indexes and autovacuum overhead
  • temporary growth during restore and maintenance tasks
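
A quick sanity check of those sizing assumptions after the restore completes; a sketch where the mount point and database name are assumptions and should match the actual environment:

df -h /var/pg_data
sudo -u postgres psql -d tracker -c "SELECT pg_size_pretty(pg_database_size('tracker'));"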

4. Verify the Restored Database

Before cutover, check:

  • PostgreSQL starts cleanly
  • TimescaleDB loads in shared_preload_libraries
  • the schema exists
  • hypertables are present
  • continuous aggregates exist
  • the expected tables contain data
  • application login works against the target

Useful checks:

SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('timescaledb', 'postgis');

SELECT *
FROM timescaledb_information.hypertables;

SELECT *
FROM timescaledb_information.continuous_aggregates;
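
The shared_preload_libraries item in the checklist above is not covered by these queries; a minimal additional check:

SHOW shared_preload_libraries;

The output should list timescaledb; if it does not, set shared_preload_libraries and restart PostgreSQL before proceeding, because the extension errors out when used without being preloaded.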

5. Cutover Window

During the maintenance window:

  1. stop application writes to the old database
  2. let any in-flight jobs finish
  3. take a final backup if the procedure requires one
  4. restore or recover the final state to the new EC2
  5. validate the restored data
  6. switch the application connection details
  7. bring the application back online

Do not shorten this step by guessing. Wait until validation passes.
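
If step 3 calls for a final backup, a minimal sketch with the existing tooling, assuming the backup is run from whichever host normally drives backups for this stanza and that an incremental on top of a recent full is sufficient:

# force the current WAL segment to be archived so the final writes are covered
sudo -u postgres psql -c "SELECT pg_switch_wal();"
# take a final incremental backup against the production stanza
sudo -u postgres pgbackrest --stanza=production backup --type=incr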

6. Post-Cutover Verification

Check the following after cutover:

  • application health endpoints succeed
  • API reads and writes succeed
  • tracker ingest works
  • Timescale jobs are running
  • WAL archiving to pgBackRest is working from the new host
  • backup retention behaves as expected
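
The Timescale jobs item above can be checked directly on the new host, and WAL archiving can be re-confirmed with the same pgbackrest check used during setup. A sketch of the jobs query, where the database name is an assumption carried over from earlier examples:

sudo -u postgres psql -d tracker -c "SELECT job_id, job_status, last_run_status, next_start FROM timescaledb_information.job_stats;"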

Recommended smoke tests:

  • create a new tracker write
  • confirm the row lands in the database
  • confirm recent historical queries work
  • confirm the application can recover missing data from Apple as expected

7. Rollback Plan

Rollback should remain simple:

  1. stop writes to the new EC2
  2. point the application back to the old database
  3. re-enable the old system
  4. investigate the restore issue offline

Rollback is only available until the old system is decommissioned.

Data Safety and Recovery

The existing pgBackRest design should remain in place after migration.

Goals:

  • keep WAL archiving active
  • keep the SSH repository reachable
  • keep the S3 repository usable for longer retention
  • preserve point-in-time recovery capability

If future needs change, the pgBackRest repository can support:

  • a fresh restore after instance failure
  • a point-in-time recovery to a known good time
  • new backup rotations on the dedicated EC2
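
A sketch of what a point-in-time recovery on the dedicated EC2 would look like if it is ever needed (the timestamp is a placeholder, and PostgreSQL must be stopped before the restore and started again afterwards):

# recover the cluster to a known good time, then promote
sudo -u postgres pgbackrest --stanza=production restore --delta \
    --type=time --target="2025-06-01 09:00:00+00" --target-action=promote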

What Is Intentionally Not Included

This migration plan does not include:

  • Patroni bootstrap
  • etcd cluster configuration
  • streaming replication setup
  • read replicas
  • logical-replication catch-up
  • dual-write application changes

Those can be added later if the system needs higher availability.

Open Questions to Resolve Before Signoff

  1. None. The migration design is now fully specified.

Signoff Checklist

  • New EC2 exists and is reachable
  • PostgreSQL, PostGIS, and TimescaleDB versions are agreed
  • pgBackRest can reach pg-backrest.timescale.glimpse.internal
  • Backup repository contains a recent valid full backup
  • WAL archiving is healthy
  • Target EC2 disk sizing is approved
  • Target EC2 uses 25 GB OS and 50 GB data volumes
  • Restore procedure is tested or rehearsed
  • App cutover time is scheduled
  • Rollback decision point is documented
  • Secrets and passwords are not stored in documentation
