
Database Connection Pooling and PgBouncer

Overview

This document describes the PostgreSQL connection pooling strategy used in the Tracker API, including integration with PgBouncer for production deployments.

What is Connection Pooling?

Connection pooling maintains a pool of reusable database connections rather than creating a new connection for each request. This provides:

  • Performance: Eliminates the overhead of establishing new connections
  • Resource Management: Limits total connections to prevent database overload
  • Connection Reuse: Efficiently shares connections across application workers
  • Stability: Handles connection failures with automatic retry logic

Implementation

SQLAlchemy Connection Pooling

We use SQLAlchemy's built-in QueuePool for connection management:

  • Sync Engine: PostgreSQL connections via psycopg3
  • Async Engine: PostgreSQL connections via asyncpg
  • Connection Lifecycle: Automatic management with retry logic

Configuration Modes

The application automatically configures connection pooling based on the environment:
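
The mode selection can be sketched as follows. This is a hypothetical illustration of the detection logic; the variable names mirror the settings below, but the actual implementation in app/core/config.py may differ:

```python
import os

# Hypothetical sketch of environment-based pool configuration;
# the real detection logic lives in app/core/config.py.
ENVIRONMENT = os.getenv("ENVIRONMENT", "development")
USE_PGBOUNCER = ENVIRONMENT == "production"

if USE_PGBOUNCER:
    # Larger pool: PgBouncer multiplexes down to few real connections
    pool_kwargs = {"pool_size": 20, "max_overflow": 30}
else:
    # Conservative pool: every connection hits PostgreSQL directly
    pool_kwargs = {"pool_size": 10, "max_overflow": 10}

# Shared settings for both modes
pool_kwargs.update({"pool_timeout": 30, "pool_recycle": 3600})
```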

Development Mode (Direct PostgreSQL)

ENVIRONMENT = "development"  # or "dev" or "test"
USE_PGBOUNCER = False

# Pool configuration
pool_size = 10          # Base pool size
max_overflow = 10       # Additional connections when needed
pool_timeout = 30       # Seconds to wait for connection
pool_recycle = 3600     # Recycle connections after 1 hour

Connection Math:

  • 4 workers × 10 base = 40 connections
  • 4 workers × 10 overflow = 40 additional (max 80 total)
  • Leaves room for monitoring, admin connections

Production Mode (With PgBouncer)

ENVIRONMENT = "production"
USE_PGBOUNCER = True

# Pool configuration
pool_size = 20          # Larger pool with PgBouncer
max_overflow = 30       # More aggressive overflow
pool_timeout = 30       # Seconds to wait for connection
pool_recycle = 3600     # Recycle connections after 1 hour
prepare_threshold = 0   # CRITICAL: Disable prepared statements!

Connection Math:

  • 4 workers × 20 base = 80 connections to PgBouncer
  • 4 workers × 30 overflow = 120 additional (max 200 to PgBouncer)
  • PgBouncer multiplexes down to ~25-50 actual PostgreSQL connections
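
The connection math for both modes is simple enough to verify directly:

```python
# Worked connection math for both modes (pure arithmetic,
# mirroring the figures listed above).
WORKERS = 4

# Development: direct PostgreSQL
dev_base = WORKERS * 10        # 40 base connections
dev_max = WORKERS * (10 + 10)  # 80 connections at full overflow

# Production: via PgBouncer
prod_base = WORKERS * 20        # 80 base connections to PgBouncer
prod_max = WORKERS * (20 + 30)  # 200 connections to PgBouncer at full overflow
```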

PgBouncer Integration

Why PgBouncer?

PgBouncer is a lightweight connection pooler that sits between the application and PostgreSQL:

[App Workers] → [PgBouncer] → [PostgreSQL]
200 app conns → 25-50 DB conns

Benefits:

  1. Connection Multiplexing: Many app connections → few DB connections
  2. Resource Efficiency: PostgreSQL uses less memory per connection
  3. Better Scaling: Support more application workers
  4. Connection Limiting: Protect PostgreSQL from connection exhaustion

Critical Configuration

Application Side (SQLAlchemy)

Most Important Setting:

connect_args = {
    "prepare_threshold": 0,  # CRITICAL for PgBouncer transaction mode
    "options": "-c idle_in_transaction_session_timeout=60000",
    "connect_timeout": 10,
}
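
Putting the pool settings and connect_args together, the engine construction might look like the sketch below. The create_engine call is shown commented out because it requires the psycopg driver and a reachable DSN; the URL and keyword names are assumptions, not the application's actual code:

```python
# from sqlalchemy import create_engine

connect_args = {
    "prepare_threshold": 0,  # CRITICAL: disable prepared statements for PgBouncer
    "options": "-c idle_in_transaction_session_timeout=60000",
    "connect_timeout": 10,
}

# Production-mode pool settings from the section above
engine_kwargs = dict(
    pool_size=20,
    max_overflow=30,
    pool_timeout=30,
    pool_recycle=3600,
    pool_pre_ping=True,
    connect_args=connect_args,
)

# Hypothetical assembly (needs psycopg and a running PgBouncer):
# engine = create_engine(
#     "postgresql+psycopg://tracker:...@pgbouncer-host:6432/tracker",
#     **engine_kwargs,
# )
```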

Why prepare_threshold=0?

PgBouncer's transaction mode doesn't support prepared statements because:

  • Prepared statements are session-specific
  • PgBouncer multiplexes connections, breaking the session continuity
  • Without this setting, you'll get "prepared statement does not exist" errors

PgBouncer Side

Recommended Configuration:

[pgbouncer]
# Pool mode - MUST be transaction for our setup
pool_mode = transaction

# Connection limits
max_client_conn = 1000        # Total app connections
default_pool_size = 25        # PostgreSQL connections per database
reserve_pool_size = 5         # Emergency reserve

# Timeouts
server_idle_timeout = 600     # Close idle server connections (10 min)
server_lifetime = 3600        # Recycle connections (1 hour)
query_timeout = 0             # No query timeout (handle in app)

# Authentication
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

Pool Mode Comparison

Mode          Use Case              Prepared Statements   Transaction Scope
transaction   Recommended           ❌ Not supported      Per transaction
session       Long transactions     ✅ Supported          Per session
statement     Very simple queries   ❌ Not supported      Per statement

Monitoring

Pool Health Endpoint

Development/Test Only:

GET /health/db-pool

Response:

{
  "status": "healthy",
  "pool_stats": {
    "size": 10,
    "checked_in": 8,
    "checked_out": 2,
    "overflow": 0,
    "total": 10,
    "utilization_percent": 20.0
  },
  "configuration": {
    "using_pgbouncer": false,
    "environment": "development"
  }
}

Status Levels:

  • healthy: Utilization < 80%, connections available
  • warning: Utilization > 80%
  • critical: No idle connections available (checked_in = 0; every pooled connection is checked out)
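
The status classification can be expressed as a small function. This is a sketch of the thresholds listed above, not the endpoint's actual code:

```python
def classify_pool(size: int, checked_in: int, checked_out: int, overflow: int):
    """Map pool statistics to a health status using the thresholds above."""
    total = size + overflow
    utilization = (checked_out / total * 100) if total else 0.0
    if checked_in == 0:
        return "critical", utilization   # nothing left to hand out
    if utilization > 80:
        return "warning", utilization    # pool nearly exhausted
    return "healthy", utilization
```

With the example response above (size 10, 8 in, 2 out, 0 overflow), this yields ("healthy", 20.0).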

Debug Logging

In development, connection lifecycle events are logged:

DEBUG: New database connection established (pid: 140234567890)
DEBUG: Connection checked out from pool (pid: 140234567890)
DEBUG: Connection returned to pool (pid: 140234567890)
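
Lifecycle logging like this is typically wired up with SQLAlchemy pool event listeners. The sketch below uses an in-memory SQLite engine with an explicit QueuePool so it runs without a PostgreSQL server; the listener pattern is the same for the real psycopg engine, though the application's actual handlers may differ:

```python
import logging
from sqlalchemy import create_engine, event, text
from sqlalchemy.pool import QueuePool

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("db.pool")

# SQLite stand-in so the sketch runs anywhere; swap in the
# PostgreSQL URL for real use.
engine = create_engine("sqlite://", poolclass=QueuePool, pool_size=5)

events_seen = []

@event.listens_for(engine, "connect")
def on_connect(dbapi_conn, record):
    events_seen.append("connect")
    log.debug("New database connection established (id: %s)", id(dbapi_conn))

@event.listens_for(engine, "checkout")
def on_checkout(dbapi_conn, record, proxy):
    events_seen.append("checkout")
    log.debug("Connection checked out from pool (id: %s)", id(dbapi_conn))

@event.listens_for(engine, "checkin")
def on_checkin(dbapi_conn, record):
    events_seen.append("checkin")
    log.debug("Connection returned to pool (id: %s)", id(dbapi_conn))

# One round trip fires connect -> checkout -> checkin
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
```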

PgBouncer Monitoring

Check PgBouncer stats:

psql -h localhost -p 6432 -U pgbouncer pgbouncer
SHOW POOLS;
SHOW STATS;
SHOW CLIENTS;
SHOW SERVERS;

Configuration Files

Modified Files

  1. app/core/config.py - USE_PGBOUNCER detection
  2. app/core/database.py - Main API engine configuration
  3. services/shared/config.py - Services USE_PGBOUNCER detection
  4. services/shared/database.py - Services engine configuration
  5. services/materialized_view_service/resilient_jobstore.py - APScheduler engine

Environment Variables

Required:

POSTGRES_SERVER=localhost         # Or PgBouncer host
POSTGRES_PORT=5432               # Or PgBouncer port (e.g., 6432)
POSTGRES_USER=tracker
POSTGRES_PASSWORD=SuperSecretKey
POSTGRES_DB=tracker

For PgBouncer:

ENVIRONMENT=production           # Enables USE_PGBOUNCER=True
POSTGRES_SERVER=pgbouncer-host  # PgBouncer hostname
POSTGRES_PORT=6432              # PgBouncer port (not 5432)
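
One plausible way these variables combine into a SQLAlchemy URL is sketched below; the real assembly in app/core/config.py may differ, and the fallback values are only the examples from this section:

```python
import os

# Hypothetical DSN assembly from the environment variables above.
host = os.getenv("POSTGRES_SERVER", "pgbouncer-host")
port = os.getenv("POSTGRES_PORT", "6432")
user = os.getenv("POSTGRES_USER", "tracker")
password = os.getenv("POSTGRES_PASSWORD", "SuperSecretKey")
db = os.getenv("POSTGRES_DB", "tracker")

# Sync engine URL (psycopg); the async engine would use
# the postgresql+asyncpg:// scheme instead.
url = f"postgresql+psycopg://{user}:{password}@{host}:{port}/{db}"
```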

Best Practices

Connection Management

DO:

  • Use context managers for database sessions
  • Keep transactions short
  • Handle exceptions properly with rollback
  • Close sessions in finally blocks (automatic with our setup)

DON'T:

  • Hold connections during external API calls
  • Keep transactions open during long operations
  • Share connections across threads
  • Return unconsumed cursors
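
The get_db context manager used in the examples below might look roughly like this sketch. It uses an in-memory SQLite engine so it runs anywhere; the application's real get_db is bound to the PostgreSQL engine and may differ in detail:

```python
from contextlib import contextmanager
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

# SQLite stand-in; the real engine is the pooled PostgreSQL engine.
engine = create_engine("sqlite://")
SessionLocal = sessionmaker(bind=engine)

@contextmanager
def get_db():
    """Yield a session; commit on success, roll back on error, always close."""
    db = SessionLocal()
    try:
        yield db
        db.commit()
    except Exception:
        db.rollback()
        raise
    finally:
        db.close()  # returns the connection to the pool

with get_db() as db:
    result = db.execute(text("SELECT 1")).scalar()
```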

Example: Good Transaction Management

# ❌ BAD - Long transaction holding locks
def bad_process_order(order_id: int):
    with get_db() as db:
        order = db.query(Order).filter_by(id=order_id).first()

        # Transaction open during slow external call!
        payment_result = stripe_api.charge(order.amount)  # 3-5 seconds

        order.status = 'completed'
        db.commit()

# ✅ GOOD - Minimal transaction scope
def good_process_order(order_id: int):
    # Read data without holding a transaction
    with get_db() as db:
        order = db.query(Order).filter_by(id=order_id).first()

    # External operations outside transaction
    payment_result = stripe_api.charge(order.amount)

    # Quick transaction just for the update
    with get_db() as db:
        order = db.query(Order).filter_by(id=order_id).first()
        order.status = 'completed'
        db.commit()

Troubleshooting

Common Issues

"prepared statement does not exist"

Cause: Prepared statements enabled with PgBouncer transaction mode

Solution: Ensure prepare_threshold=0 in connect_args

connect_args = {
    "prepare_threshold": 0,  # Disable prepared statements
}

"connection timeout" errors

Cause: Pool exhaustion or slow queries

Solutions:

  1. Check pool utilization: GET /health/db-pool
  2. Increase pool size (with PgBouncer only)
  3. Optimize slow queries
  4. Reduce transaction scope

Stuck transactions

Cause: Transactions held open too long

Solution: Already configured with idle_in_transaction_session_timeout=60000

This automatically terminates sessions that sit idle inside an open transaction for more than 60 seconds.

High connection count

Cause: Connections not being returned to pool

Solutions:

  1. Ensure all database sessions use context managers
  2. Check for connection leaks in custom code
  3. Review debug logs for unreturned connections

Diagnostic Queries

Check active connections:

SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    query,
    state_change
FROM pg_stat_activity
WHERE datname = 'tracker'
ORDER BY state_change DESC;

Find long-running queries:

SELECT
    pid,
    now() - query_start as duration,
    state,
    query
FROM pg_stat_activity
WHERE state != 'idle'
  AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY duration DESC
LIMIT 10;

Architecture Diagram

Without PgBouncer (Development)

┌─────────────────────┐
│ App Worker 1        │──┐
│ Pool: 10 + 10       │  │
├─────────────────────┤  │
│ App Worker 2        │  ├──→ ┌──────────────────┐
│ Pool: 10 + 10       │  │    │ PostgreSQL       │
├─────────────────────┤  │    │ max_connections  │
│ App Worker 3        │  │    │ = 100            │
│ Pool: 10 + 10       │  │    └──────────────────┘
├─────────────────────┤  │
│ App Worker 4        │  │
│ Pool: 10 + 10       │──┘
└─────────────────────┘

Total: 40-80 connections to PostgreSQL

With PgBouncer (Production)

┌─────────────────────┐
│ App Worker 1        │──┐
│ Pool: 20 + 30       │  │
├─────────────────────┤  │
│ App Worker 2        │  │    ┌──────────────┐
│ Pool: 20 + 30       │  ├───→│  PgBouncer   │
├─────────────────────┤  │    │  max: 1000   │
│ App Worker 3        │  │    │  pool: 25    │──→ ┌──────────────┐
│ Pool: 20 + 30       │  │    └──────────────┘    │ PostgreSQL   │
├─────────────────────┤  │           ↓            │ 25-50 conns  │
│ App Worker 4        │  │    Connection          └──────────────┘
│ Pool: 20 + 30       │──┘    Multiplexing
└─────────────────────┘

Total: 80-200 app connections → 25-50 PostgreSQL connections

Performance Tuning

Connection Pool Sizing

Formula for Direct PostgreSQL:

pool_size = (PostgreSQL max_connections - admin_reserve) / num_workers / safety_margin

Example:

  • PostgreSQL max_connections = 100
  • Admin reserve = 10
  • Workers = 4
  • Safety margin = 1.5

Result: pool_size = (100 - 10) / 4 / 1.5 = 15 per worker

With PgBouncer:

Sizing can be more aggressive because PgBouncer caps the number of real PostgreSQL connections; treat the formula's result as an upper bound rather than a target (the production configuration above uses pool_size = 20):

pool_size = PgBouncer max_client_conn / num_workers * 0.4
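
Both sizing formulas are easy to check in code; with the example numbers above, the direct formula reproduces the 15-per-worker result:

```python
def direct_pool_size(max_connections: int, admin_reserve: int,
                     num_workers: int, safety_margin: float = 1.5) -> int:
    """Per-worker pool size when connecting straight to PostgreSQL."""
    return int((max_connections - admin_reserve) / num_workers / safety_margin)

def pgbouncer_pool_size(max_client_conn: int, num_workers: int,
                        fraction: float = 0.4) -> int:
    """Per-worker upper bound when connecting through PgBouncer."""
    return int(max_client_conn / num_workers * fraction)

direct = direct_pool_size(100, 10, 4)          # (100 - 10) / 4 / 1.5 = 15
via_pgbouncer = pgbouncer_pool_size(1000, 4)   # 1000 / 4 * 0.4 = 100
```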

Pool Recycling

Current Settings:

  • pool_recycle = 3600 (1 hour)
  • Balances stability vs connection churn
  • Prevents stale connections
  • Aligns with PgBouncer server_lifetime

Pre-Ping

Enabled: pool_pre_ping=True

Tests connections before use, automatically handling:

  • Network interruptions
  • Database restarts
  • Stale connections

References