Tracker REST API — Terraform Deployment Guide
Single source of truth for the complete AWS infrastructure lifecycle: from first-time bootstrap through daily operations, container rollouts, database migrations, and seeding.
Table of Contents
- Architecture Overview
- Directory Layout
- Module Reference
- Prerequisites
- First-Time Bootstrap
- Daily Developer Workflow
- Deploying Container Updates
- Running Database Migrations
- Seeding Default Users
- Destroying the Environment
- Troubleshooting
1. Architecture Overview
All services run inside a single VPC (10.40.0.0/24) in eu-west-2 (London). There are no publicly routable IPs on any service; all external traffic enters through the ALB. The database host is reached exclusively via SSM Session Manager.
This guide describes the current staging-shaped stack. For the account-separated staging and production requirements, see Terraform Environment Compatibility.
Internet
│
▼
Cloudflare DNS
(tracker.staging.glimpse.technology)
│
▼
WAF WebACL
│
▼
Application Load Balancer (public subnets)
├── /api/* → ECS API service (port 8000)
├── tracker-admin.* → ECS Admin service (port 80)
└── default → ECS Frontend service (port 80)
│
▼ (private subnets)
ECS Fargate Cluster
├── api (FastAPI, 512 CPU / 1024 MB)
├── frontend (Nginx SPA, 256 CPU / 512 MB)
├── admin (Nginx SPA, 256 CPU / 512 MB)
├── anisette-v3 * (Anisette server, 256 CPU / 512 MB)
├── tracker-fetcher-2 *
├── unified-geofence *
├── notification-service *
└── materialized-view-service *
* Conditional on enable_anisette / enable_workers
EFS (persistent NFS)
└── /anisette → anisette-v3 service
└── /data → worker services
EC2 Database Host (private subnet, t3.medium)
├── PostgreSQL 16 + TimescaleDB + PostGIS (port 5432)
└── Valkey (port 6379)
Key Properties
- No SSH: The database EC2 instance has no SSH; access is via `aws ssm start-session`.
- No Public IPs: ECS tasks and the database host run with `assignPublicIp = DISABLED`.
- IMDSv2 enforced: EC2 metadata service requires signed tokens (hop limit 2).
- Immutable image tags: ECR is configured with `imageTagMutability = IMMUTABLE`. Every push requires a new tag.
- Single KMS key: One CMK encrypts EBS, ECR, EFS, S3 (ALB logs), CloudWatch Logs, Secrets Manager.
- Managed secrets: The PostgreSQL password, Redis/Valkey password, and the app `SECRET_KEY` are randomly generated by Terraform and stored in Secrets Manager. They are never in `.tfvars` or environment files.
2. Directory Layout
infra/
├── terraform-guide.md ← this file
├── infrastructure.md ← module reference
├── README.md ← quick-start summary
├── envs/
│ └── staging/
│ ├── main.tf # Module composition — the "wiring"
│ ├── locals.tf # resource_prefix, common_tags
│ ├── variables.tf # All input variable declarations
│ ├── outputs.tf # All stack outputs
│ ├── providers.tf # AWS provider (region, profile, default tags)
│ ├── versions.tf # terraform >= 1.7, aws ~> 5.0, random ~> 3.6
│ ├── backend.tf # S3 backend stub (config loaded from backend.hcl)
│ ├── backend.hcl.example # Template — copy to backend.hcl, do not commit
│ ├── terraform.tfvars # Non-secret variable values
│ ├── terraform.tfvars.example # Template for the above
│ └── image-tags.auto.tfvars.json # Auto-updated container image tag map
└── modules/
├── kms/ # Customer-managed KMS key
├── network/ # VPC, subnets, IGW, NAT, flow logs
├── security/ # Security groups (ALB, ECS, DB, Cache, EFS)
├── ecr/ # ECR repositories (api, frontend, admin, services, anisette)
├── acm/ # ACM TLS certificates (DNS validated)
├── alb/ # Application Load Balancer, listeners, target groups
├── waf/ # WAFv2 with AWSManagedRulesCommonRuleSet
├── database/ # EC2 host running PostgreSQL 16 + Valkey (bootstrapped via user-data)
├── ecs/ # ECS Fargate cluster, task definitions, services, job definitions
├── efs/ # EFS file systems for Anisette and worker persistent storage
├── logs/ # CloudWatch log groups for all services and jobs
└── secrets/ # Secrets Manager secrets with randomly generated passwords
3. Module Reference
3.1 KMS (modules/kms/)
Creates one CMK for the entire stack.
- Alias: `alias/tracker-restapi-staging`
- Annual automatic rotation enabled
- Key policy grants access to: root account, CloudWatch Logs service, ELB log delivery service
- Output `key_arn` is passed to every other module that encrypts data at rest
3.2 Network (modules/network/)
VPC CIDR 10.40.0.0/24 split across two AZs:
| Subnet | CIDR | AZ |
|---|---|---|
| Public AZ-a | 10.40.0.0/26 | eu-west-2a |
| Public AZ-b | 10.40.0.64/26 | eu-west-2b |
| Private AZ-a | 10.40.0.128/26 | eu-west-2a |
| Private AZ-b | 10.40.0.192/26 | eu-west-2b |
- NAT Gateway in Public AZ-a (single NAT for cost; upgrade to per-AZ for HA)
- VPC Flow Logs → CloudWatch (`/aws/vpc/tracker-restapi-staging-flow-logs`, 365-day retention)
- Interface VPC endpoints for ECR, Secrets Manager, CloudWatch Logs, SSM (keeps traffic inside the AWS network)
3.3 Security (modules/security/)
Five security groups with least-privilege rules:
| Group | Inbound | Outbound |
|---|---|---|
| ALB | TCP 80, 443 from 0.0.0.0/0 | TCP 80 → ECS SG; TCP 8000 → ECS SG |
| ECS | Service ports from ALB; port 6969 from ECS (Anisette internal) | DB 5432; Cache 6379; EFS 2049; HTTPS 443 |
| Database | TCP 5432 from ECS SG | TCP 80, 443 (apt repos) |
| Cache/MemoryDB | TCP 6379 from ECS SG | None |
| EFS | TCP 2049 from ECS SG | None |

Note: All groups have `ignore_changes = [ingress, egress]` — manual console changes are preserved across `terraform apply`.
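In module code this corresponds to a `lifecycle` block on each security-group resource — a sketch, assuming the resource is named `this` (the actual name inside `modules/security/` may differ):

```hcl
resource "aws_security_group" "this" {
  # ... name, description, vpc_id, rules ...

  lifecycle {
    # Keep manual console edits to rules from being reverted on apply
    ignore_changes = [ingress, egress]
  }
}
```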
3.4 ECR (modules/ecr/)
Five repositories, KMS-encrypted, scan-on-push, immutable tags, lifecycle: retain 30 most recent images.
| Logical Key | Repository |
|---|---|
| `api` | `tracker-api` |
| `frontend` | `tracker-frontend` |
| `admin` | `tracker-admin` |
| `services` | `tracker-services` |
| `anisette` | `tracker-anisette` |
3.5 ACM (modules/acm/)
Two certificates via DNS validation:
- `tracker.staging.glimpse.technology`
- `tracker-admin.staging.glimpse.technology`

Terraform outputs `acm_validation_records` — these CNAME records must be added to Cloudflare before the certificates can be issued. The `aws_acm_certificate_validation` resource waits for validation to complete before proceeding.
3.6 WAF (modules/waf/)
WAFv2 WebACL (regional, attached to ALB):
| Priority | Rule | Blocks |
|---|---|---|
| 10 | AWSManagedRulesCommonRuleSet | SQLi, XSS, bad user-agents |
| 20 | AWSManagedRulesKnownBadInputsRuleSet | Log4Shell, SSRF, malformed input |

Logs to `aws-waf-logs-tracker-restapi-staging`, 365-day retention.
3.7 ALB (modules/alb/)
Internet-facing ALB in both public subnets:
- Port 80 → redirect 301 to HTTPS
- Port 443 → route by host/path:
| Priority | Condition | Target Group | Container Port |
|---|---|---|---|
| 10 | Path `/api/*`, `/api/v1/*`, `/docs`, `/redoc`, `/openapi.json` | api | 8000 |
| 15 | Host `tracker-admin.staging.glimpse.technology` | admin | 80 |
| 20 | Path `/admin*`, `/health*` | admin | 80 |
| — | Default (all others) | frontend | 80 |
Access logs stored in S3 (`glimpse-tracker-restapi-staging-alb-logs-{account-id}`), 90-day expiry, KMS-encrypted. Deletion protection enabled.
3.8 Database (modules/database/)
Single EC2 instance (t3.medium) in the first private subnet. No RDS — the host bootstraps via user-data on first boot.
Software installed via user-data:
- PostgreSQL 16 + `postgresql-contrib`
- TimescaleDB 2 for PostgreSQL 16
- PostGIS 3
- Valkey (Redis-compatible, from the Redis 7 lineage)

Bootstrap sequence:
- Installs packages from the official PostgreSQL apt repo
- Fetches `postgres_password` and `redis_password` from Secrets Manager
- Writes `postgresql.conf`:
  - `shared_preload_libraries = 'timescaledb,pg_stat_statements'`
  - `max_connections = 100`
  - `shared_buffers = 256MB`
  - `effective_cache_size = 768MB`
- Writes `pg_hba.conf` — scram-sha-256 auth, VPC CIDR only
- Creates role `tracker` and database `tracker`
- Enables extensions: `timescaledb`, `postgis`, `postgis_topology`, `pg_stat_statements`
- Configures Valkey: binds `0.0.0.0`, port 6379, password auth, AOF persistence
Forcing a re-bootstrap: Increment `bootstrap_revision` in `terraform.tfvars`. This replaces the EC2 instance (new AMI + re-runs user-data). Use with caution — all data is on the instance's EBS volume and will be lost unless backed up first.
Access: No SSH. Use SSM:
aws ssm start-session \
--profile glimpse-staging \
--region eu-west-2 \
--target $(terraform -chdir=infra/envs/staging output -raw database_instance_id)
3.9 ECS (modules/ecs/)
Fargate cluster with Container Insights enabled.
IAM roles:
- Execution role — pulls images from ECR, fetches secrets from Secrets Manager, writes logs to CloudWatch
- Task role — SSM exec access, EFS mount permissions (when enabled)
Long-running services:
| Service | Image | CPU | Mem | Port | ALB |
|---|---|---|---|---|---|
| api | `tracker-api:{tag}` | 512 | 1024 MB | 8000 | Yes |
| frontend | `tracker-frontend:{tag}` | 256 | 512 MB | 80 | Yes |
| admin | `tracker-admin:{tag}` | 256 | 512 MB | 80 | Yes |
| anisette * | `tracker-anisette:{tag}` | 256 | 512 MB | 6969 | No (private DNS) |
| tracker-fetcher-2 * | `tracker-services:{tag}` | 256 | 512 MB | — | No |
| unified-geofence * | `tracker-services:{tag}` | 256 | 512 MB | — | No |
| notification-service * | `tracker-services:{tag}` | 256 | 512 MB | — | No |
| materialized-view-service * | `tracker-services:{tag}` | 256 | 512 MB | — | No |
* Conditional on enable_anisette / enable_workers
One-off job task definitions (pre-created, not auto-run):
| Job | Image | Command |
|---|---|---|
| migrations | `tracker-api:{tag}` | `bash ./scripts/run_migrations.sh` |
| seed-users | `tracker-api:{tag}` | `python ./scripts/seed_default_users.py` |
Jobs use the same ECS cluster, security group, and subnets as services. Their task definitions are updated whenever the api image tag changes.
Secrets injected as environment variables (via Secrets Manager `valueFrom`):
- `POSTGRES_PASSWORD` — fetched at task start, not baked into the image
- `SECRET_KEY` — app signing key
- `REDIS_PASSWORD` — Valkey auth token
3.10 EFS (modules/efs/)
Created when `enable_workers || enable_anisette`:

| Instance | Root Dir | Used By |
|---|---|---|
| anisette_storage | /anisette | anisette-v3 service |
| worker_storage | /data | tracker-fetcher-2 service |
Access points use UID/GID 1000, permissions 0750. Mount targets in both private subnets. KMS-encrypted.
3.11 Logs (modules/logs/)
CloudWatch log group per service and per job, all KMS-encrypted, 365-day retention.
| Group | Path |
|---|---|
| api | /aws/ecs/tracker-restapi-staging/api |
| frontend | /aws/ecs/tracker-restapi-staging/frontend |
| admin | /aws/ecs/tracker-restapi-staging/admin |
| anisette | /aws/ecs/tracker-restapi-staging/anisette |
| migrations | /aws/ecs/tracker-restapi-staging/migrations |
| seed-users | /aws/ecs/tracker-restapi-staging/seed-users |
| tracker-fetcher-2 | /aws/ecs/tracker-restapi-staging/tracker-fetcher-2 |
| unified-geofence | /aws/ecs/tracker-restapi-staging/unified-geofence |
| notification-service | /aws/ecs/tracker-restapi-staging/notification-service |
| materialized-view-service | /aws/ecs/tracker-restapi-staging/materialized-view-service |
3.12 Secrets (modules/secrets/)
Three secrets created with randomly generated 32-character passwords (mixed case, digits, specials):
| Logical Key | Secret Name in Secrets Manager |
|---|---|
| `postgres_password` | `tracker/staging/postgres-password` |
| `secret_key` | `tracker/staging/secret-key` |
| `redis_password` | `tracker/staging/redis-password` |

Passwords are generated once by Terraform. They are never stored in `.tfvars`, `.env`, or any repository file. Applications and the database bootstrap script fetch them from Secrets Manager at runtime.
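For debugging, a secret can be read back with the standard Secrets Manager CLI call — note this prints the plaintext value, so use it sparingly:

```shell
# Read the generated PostgreSQL password (prints the secret — handle with care)
aws secretsmanager get-secret-value \
  --profile glimpse-staging \
  --region eu-west-2 \
  --secret-id tracker/staging/postgres-password \
  --query SecretString \
  --output text
```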
4. Prerequisites
AWS CLI & Profile
Configure an AWS CLI profile named glimpse-staging:
aws configure --profile glimpse-staging
# AWS Access Key ID: ...
# AWS Secret Access Key: ...
# Default region: eu-west-2
# Default output: json
Verify access:
aws sts get-caller-identity --profile glimpse-staging
Terraform
Version >= 1.7.0. Install via tfenv or directly:
tfenv install 1.7.0
tfenv use 1.7.0
terraform --version
Docker Buildx
Required for multi-platform builds:
docker buildx version
# If missing: docker buildx install
S3 Backend Bootstrap (one-time, before first terraform init)
The S3 bucket and DynamoDB lock table must exist before Terraform can store state. Create them manually or with a bootstrap script:
aws s3api create-bucket \
--profile glimpse-staging \
--region eu-west-2 \
--bucket glimpse-staging-tracker-restapi-tfstate \
--create-bucket-configuration LocationConstraint=eu-west-2
aws s3api put-bucket-versioning \
--profile glimpse-staging \
--bucket glimpse-staging-tracker-restapi-tfstate \
--versioning-configuration Status=Enabled
aws dynamodb create-table \
--profile glimpse-staging \
--region eu-west-2 \
--table-name glimpse-staging-tracker-restapi-tflock \
--attribute-definitions AttributeName=LockID,AttributeType=S \
--key-schema AttributeName=LockID,KeyType=HASH \
--billing-mode PAY_PER_REQUEST
5. First-Time Bootstrap
Step 1 — Prepare local config files
cd infra/envs/staging
# Backend config (never commit this file — contains account-specific paths)
cp backend.hcl.example backend.hcl
Edit backend.hcl:
bucket = "glimpse-staging-tracker-restapi-tfstate"
key = "staging/terraform.tfstate"
region = "eu-west-2"
profile = "glimpse-staging"
dynamodb_table = "glimpse-staging-tracker-restapi-tflock"
encrypt = true
Edit or verify terraform.tfvars:
project_name = "tracker-restapi"
environment = "staging"
aws_region = "eu-west-2"
vpc_cidr = "10.40.0.0/24"
public_hostname = "tracker.staging.glimpse.technology"
admin_hostname = "tracker-admin.staging.glimpse.technology"
enable_anisette = true
enable_workers = true
secret_names = {
postgres_password = "tracker/staging/postgres-password"
secret_key = "tracker/staging/secret-key"
redis_password = "tracker/staging/redis-password"
}
Step 2 — Set placeholder image tags
Before any images exist in ECR, use the bootstrap tag so task definitions are valid:
cat > image-tags.auto.tfvars.json <<'EOF'
{
"image_tags": {
"api": "sha-bootstrap",
"frontend": "sha-bootstrap",
"admin": "sha-bootstrap",
"services": "sha-bootstrap",
"anisette": "sha-bootstrap"
}
}
EOF
Step 3 — Initialise Terraform
terraform init -backend-config=backend.hcl
Step 4 — Plan and apply
terraform plan -out=tfplan
terraform apply tfplan
The first apply provisions everything: KMS, VPC, subnets, security groups, ECR, ACM certificates, WAF, ALB, the database EC2 instance, EFS, log groups, Secrets Manager secrets, and ECS task definitions. Services do not start successfully until images exist in ECR (see next steps).
Expected apply time: 12–18 minutes (ACM DNS validation is the longest step).
Step 5 — Add DNS validation records to Cloudflare
After apply completes, get the validation records:
terraform output -json acm_validation_records
terraform output -json acm_admin_validation_records
Each record has name, type (CNAME), and value. Add both to Cloudflare for glimpse.technology. ACM validates automatically once the records propagate (usually < 5 minutes).
Step 6 — Add ALB hostname to Cloudflare
terraform output -raw alb_dns_name
Create a CNAME in Cloudflare:
- `tracker.staging` → `<alb_dns_name>`
- `tracker-admin.staging` → `<alb_dns_name>` (or a CNAME of the above)
Set proxy mode to DNS only (grey cloud) to avoid Cloudflare interfering with ALB SSL termination.
Step 7 — Authenticate with ECR and build images
For a shorter developer-oriented summary of this workflow, see Deployment Pipeline.
make staging-build
This command:
- builds the current commit once
- pushes the immutable tag to the staging ECR
- mirrors the same tag into the production ECR
- writes `infra/envs/staging/image-tags.auto.tfvars.json`
Or use the script directly:
python scripts/build_staging_images.py \
--push \
--aws-profile glimpse-staging \
--mirror-aws-profile glimpse-prod \
--target api
If you need the lower-level `docker buildx` command, the scripts pass these variables through: they set the registry prefix to the target environment's ECR host and pin the image with the immutable `sha-<commit>` tag.
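As a sketch only — the real invocation lives in `scripts/build_staging_images.py`, and the registry host, platform, and target below are illustrative assumptions:

```shell
REGISTRY="<account-id>.dkr.ecr.eu-west-2.amazonaws.com"  # assumption: your staging ECR host
TAG="sha-$(git rev-parse --short HEAD)"                  # immutable commit-pinned tag

docker buildx build \
  --platform linux/amd64 \
  --tag "${REGISTRY}/tracker-api:${TAG}" \
  --push \
  .
```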
Step 8 — Update image tags and re-apply
Apply Terraform in staging after the build script has refreshed the tag map:
terraform -chdir=infra/envs/staging apply -auto-approve
This registers new task definition revisions pointing at the real image tags. ECS rolling-deploys each service automatically (50% min healthy, 200% max).
Step 9 — Promote to production
Once staging is validated, promote the exact same tag map to production:
make prod-promote
terraform -chdir=infra/envs/prod apply -auto-approve
This keeps production on the same commit-tested image set as staging without rebuilding.
Step 10 — Direct production operations
make prod-build, make prod-plan, and make prod-apply remain available for manual production workflows, but the normal release path is staging-first promotion.
Step 11 — Run database migrations
On first deploy, run migrations before traffic reaches the API:
# Get required values from Terraform outputs
CLUSTER=$(terraform -chdir=infra/envs/staging output -raw ecs_cluster_arn)
TASK_DEF=$(terraform -chdir=infra/envs/staging output -raw migration_task_definition_arn)
SUBNETS=$(terraform -chdir=infra/envs/staging output -json private_subnet_ids | jq -r 'join(",")')
SG=$(terraform -chdir=infra/envs/staging output -raw ecs_security_group_id)
aws ecs run-task \
--profile glimpse-staging \
--region eu-west-2 \
--cluster "$CLUSTER" \
--task-definition "$TASK_DEF" \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[$SUBNETS],securityGroups=[$SG],assignPublicIp=DISABLED}"
Monitor the migration logs:
aws logs tail /aws/ecs/tracker-restapi-staging/migrations \
--profile glimpse-staging \
--region eu-west-2 \
--follow
Step 12 — Seed default users
SEED_TASK_DEF=$(terraform -chdir=infra/envs/staging output -json ecs_task_definition_arns | jq -r '."seed-users"')
aws ecs run-task \
--profile glimpse-staging \
--region eu-west-2 \
--cluster "$CLUSTER" \
--task-definition "$SEED_TASK_DEF" \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[$SUBNETS],securityGroups=[$SG],assignPublicIp=DISABLED}"
Monitor:
aws logs tail /aws/ecs/tracker-restapi-staging/seed-users \
--profile glimpse-staging \
--region eu-west-2 \
--follow
6. Daily Developer Workflow
Check current state
terraform -chdir=infra/envs/staging show | grep -A2 "image_tag"
# or
cat infra/envs/staging/image-tags.auto.tfvars.json
# or
cat infra/envs/prod/image-tags.auto.tfvars.json
View service logs
# API service
aws logs tail /aws/ecs/tracker-restapi-staging/api \
--profile glimpse-staging --region eu-west-2 --follow
# Worker (swap group name as needed)
aws logs tail /aws/ecs/tracker-restapi-staging/tracker-fetcher-2 \
--profile glimpse-staging --region eu-west-2 --follow
Connect to a running container (ECS Exec)
CLUSTER=$(terraform -chdir=infra/envs/staging output -raw ecs_cluster_arn)
# Find the task ID
TASK_ID=$(aws ecs list-tasks \
--profile glimpse-staging \
--region eu-west-2 \
--cluster "$CLUSTER" \
--service-name tracker-restapi-staging-api \
--query 'taskArns[0]' --output text | awk -F/ '{print $NF}')
aws ecs execute-command \
--profile glimpse-staging \
--region eu-west-2 \
--cluster "$CLUSTER" \
--task "$TASK_ID" \
--container api \
--command "/bin/bash" \
--interactive
Bootstrap an Anisette account with ECS Exec
Use ./scripts/bootstrap_anisette_account.sh when you need to create or refresh the stored Anisette account for the worker stack.
The script starts a temporary Fargate task from the tracker-fetcher-2 task definition with ECS Exec enabled, opens an interactive shell in the container, and then runs the bootstrap command inside that container.
By default it targets the staging Terraform root. If you export AWS_PROFILE=glimpse-prod, it will default to infra/envs/prod unless you override TERRAFORM_DIR explicitly.
For production use, the prod Terraform root must already have the worker services and Anisette enabled so the script can read ecs_task_definition_arns, ecs_security_group_id, private_subnet_ids, and anisette_service_url from state.
./scripts/bootstrap_anisette_account.sh
What the script does:
- Reads the staging Terraform outputs from `infra/envs/staging`.
- Starts a one-off `tracker-fetcher-2` task with `--enable-execute-command`.
- Waits for the ECS Exec agent to become ready.
- Opens `/bin/sh` in the running container.
- Inside the shell, exports `ANISETTE_SERVER` and `ACCOUNT_STORE_PATH`.
- Runs `python scripts/authenticate_findmy.py`.
The default account file path is /data/account.json. If the bootstrap succeeds, the task can optionally trigger a fresh deployment of the worker service so it picks up the new account state.
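Run by hand inside the temporary worker container, the final steps reduce to the following (the server URL and account path are the defaults described above):

```shell
# Inside the ECS Exec shell of the temporary tracker-fetcher-2 task
export ANISETTE_SERVER="http://anisette-v3.anisette-v3.local:6969"
export ACCOUNT_STORE_PATH="/data/account.json"
python scripts/authenticate_findmy.py
```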
Requirements:
- `aws`, `jq`, and `terraform` available on the machine running the script
- AWS credentials with permission to run ECS tasks, use ECS Exec, and stop the temporary task
- ECS Exec enabled for the task definition and cluster
- `enable_anisette = true` in the target Terraform environment
If you need to run the same steps manually, use the same temporary worker task pattern rather than the long-running API service. The bootstrap script depends on the container image that includes `scripts/authenticate_findmy.py`.
Connect to the database host
DB_INSTANCE=$(terraform -chdir=infra/envs/staging output -raw database_instance_id)
aws ssm start-session \
--profile glimpse-staging \
--region eu-west-2 \
--target "$DB_INSTANCE"
# Once inside:
sudo -u postgres psql -d tracker
Check service health
# List all running tasks
aws ecs list-tasks \
--profile glimpse-staging \
--region eu-west-2 \
--cluster tracker-restapi-staging
# Describe specific service
aws ecs describe-services \
--profile glimpse-staging \
--region eu-west-2 \
--cluster tracker-restapi-staging \
--services tracker-restapi-staging-api
Apply infrastructure changes (no image changes)
terraform -chdir=infra/envs/staging plan
terraform -chdir=infra/envs/staging apply
7. Deploying Container Updates
Build and push a new image
make staging-build
Update the image tag map
cat infra/envs/staging/image-tags.auto.tfvars.json
Or edit infra/envs/staging/image-tags.auto.tfvars.json directly:
{
"image_tags": {
"api": "sha-abc1234",
"frontend": "sha-abc1234",
"admin": "sha-abc1234",
"services": "sha-abc1234",
"anisette": "sha-bootstrap"
}
}
Apply — registers new task definition revisions and triggers rolling deploy
terraform -chdir=infra/envs/staging apply -auto-approve
ECS rolls each service to the new task definition. Deployment settings: 50% minimum healthy, 200% maximum — so new tasks start before old ones stop.
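If you ever need to restart tasks on the current image tag without a Terraform change, the standard ECS force-redeploy call works (service name shown for the API; adjust per service):

```shell
aws ecs update-service \
  --profile glimpse-staging \
  --region eu-west-2 \
  --cluster tracker-restapi-staging \
  --service tracker-restapi-staging-api \
  --force-new-deployment
```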
If the new image requires a schema migration
Run migrations before or immediately after applying the new image tag. The migration job uses the same image as the API service:
# (re-run Step 11 from the bootstrap section)
Roll back to a previous image tag
ECR tags are immutable, but previously pushed images remain in the repository (the lifecycle policy retains the 30 most recent). Update `image-tags.auto.tfvars.json` to the previous tag and re-apply:
jq '.image_tags.api = "sha-previoussha"' \
infra/envs/staging/image-tags.auto.tfvars.json > /tmp/t.json \
&& mv /tmp/t.json infra/envs/staging/image-tags.auto.tfvars.json
terraform -chdir=infra/envs/staging apply -auto-approve
8. Running Database Migrations
Migrations run as a Fargate one-off task using the migrations job task definition. The task runs bash ./scripts/run_migrations.sh inside the tracker-api container.
When to run
- After the first deploy (bootstrap)
- After any deploy that includes a schema-altering change
- After manually rolling back to a previous API version (check if a down-migration is needed)
Run command
CLUSTER=$(terraform -chdir=infra/envs/staging output -raw ecs_cluster_arn)
TASK_DEF=$(terraform -chdir=infra/envs/staging output -raw migration_task_definition_arn)
SUBNETS=$(terraform -chdir=infra/envs/staging output -json private_subnet_ids | jq -r 'join(",")')
SG=$(terraform -chdir=infra/envs/staging output -raw ecs_security_group_id)
aws ecs run-task \
--profile glimpse-staging \
--region eu-west-2 \
--cluster "$CLUSTER" \
--task-definition "$TASK_DEF" \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[$SUBNETS],securityGroups=[$SG],assignPublicIp=DISABLED}"
Monitor
aws logs tail /aws/ecs/tracker-restapi-staging/migrations \
--profile glimpse-staging \
--region eu-west-2 \
--follow
Verify success
Check the task exit code:
aws ecs describe-tasks \
--profile glimpse-staging \
--region eu-west-2 \
--cluster "$CLUSTER" \
--tasks <TASK_ARN>
A `stopCode` of `EssentialContainerExited` with `exitCode: 0` means success.
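To pull those fields out of the `describe-tasks` response with `jq` — the JSON below is a trimmed sample illustrating the relevant response shape:

```shell
# Trimmed sample of a describe-tasks response (illustrative)
RESPONSE='{"tasks":[{"stopCode":"EssentialContainerExited","containers":[{"name":"api","exitCode":0}]}]}'

# Extract the stop code and the container exit code
echo "$RESPONSE" | jq -r '.tasks[0].stopCode'
echo "$RESPONSE" | jq -r '.tasks[0].containers[0].exitCode'
```

In practice, pipe the real `aws ecs describe-tasks` output instead of the sample variable.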
9. Seeding Default Users
Seeds initial users/roles needed for the application to be usable. Runs python ./scripts/seed_default_users.py inside the tracker-api container.
When to run
- Once after the first deploy
- When adding a new environment that needs baseline data
Run command
CLUSTER=$(terraform -chdir=infra/envs/staging output -raw ecs_cluster_arn)
SEED_TASK_DEF=$(terraform -chdir=infra/envs/staging output -json ecs_task_definition_arns | jq -r '."seed-users"')
SUBNETS=$(terraform -chdir=infra/envs/staging output -json private_subnet_ids | jq -r 'join(",")')
SG=$(terraform -chdir=infra/envs/staging output -raw ecs_security_group_id)
aws ecs run-task \
--profile glimpse-staging \
--region eu-west-2 \
--cluster "$CLUSTER" \
--task-definition "$SEED_TASK_DEF" \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[$SUBNETS],securityGroups=[$SG],assignPublicIp=DISABLED}"
Monitor
aws logs tail /aws/ecs/tracker-restapi-staging/seed-users \
--profile glimpse-staging \
--region eu-west-2 \
--follow
10. Destroying the Environment
The ALB has deletion protection enabled. Terraform will fail if you try to destroy without first disabling it.
# Step 1: Remove ALB deletion protection
terraform -chdir=infra/envs/staging apply \
-target=module.alb.aws_lb.this \
-var="enable_deletion_protection=false"
# Step 2: Destroy everything
terraform -chdir=infra/envs/staging destroy
Warning: Destroying the database module permanently deletes the EC2 instance and its EBS volume. Take a manual snapshot or pg_dump backup first if the data is needed.
11. Troubleshooting
ECS service stuck in PENDING or tasks keep stopping
- Check the service events:
aws ecs describe-services \
--profile glimpse-staging --region eu-west-2 \
--cluster tracker-restapi-staging \
--services tracker-restapi-staging-api \
--query 'services[0].events[0:5]'
- Check container logs for the stopped task — tasks log even when they exit immediately.
- Common causes:
  - Image tag doesn't exist in ECR → push the image or fix the tag in `image-tags.auto.tfvars.json`
  - Secret ARN wrong → `terraform output secret_arns` and compare with the task definition
  - Not enough capacity in subnets → unlikely with Fargate, but check VPC endpoints
ACM certificate stuck in PENDING_VALIDATION
DNS CNAME records haven't been added to Cloudflare, or haven't propagated yet. Run:
dig CNAME <validation_record_name>
If no answer, check Cloudflare. ACM checks every few minutes once records are present.
Database bootstrap failed (instance running but PostgreSQL not available)
SSM into the instance and check the user-data log:
sudo cat /var/log/cloud-init-output.log | tail -100
If the aws secretsmanager get-secret-value call failed (IAM not ready at boot time), increment bootstrap_revision in terraform.tfvars and re-apply. This replaces the instance and re-runs user-data.
Terraform plan shows unexpected replacements
If Terraform wants to replace the database EC2 instance unexpectedly, check:
- `bootstrap_revision` hasn't changed accidentally
- The AMI data source hasn't resolved to a new AMI (pin the AMI ID to prevent this)
Can't connect to database from ECS
- Verify security group rules (ECS SG → DB SG on port 5432)
- Check the `POSTGRES_SERVER` environment variable in the task definition matches the `database_private_dns_name` output
- SSM into the DB host and verify PostgreSQL is listening:
sudo ss -tlnp | grep 5432
Worker services not processing tasks
- Check Anisette is running (workers depend on it for Apple auth):
aws ecs describe-services \
--profile glimpse-staging --region eu-west-2 \
--cluster tracker-restapi-staging \
--services tracker-restapi-staging-anisette
- Verify the `ANISETTE_SERVER` env var in the worker task definition points to `http://anisette-v3.anisette-v3.local:6969`
- Check the EFS mount is healthy — if the `/data` EFS mount fails, the task won't start
Viewing all Terraform outputs
terraform -chdir=infra/envs/staging output
# or a specific value:
terraform -chdir=infra/envs/staging output -raw alb_dns_name