
Database Connection Pooling and PgBouncer

Overview

This document describes the PostgreSQL connection pooling strategy used in the Tracker API, including integration with PgBouncer for production deployments.

What is Connection Pooling?

Connection pooling maintains a pool of reusable database connections rather than creating a new connection for each request. This provides:

  • Performance: Eliminates the overhead of establishing new connections
  • Resource Management: Limits total connections to prevent database overload
  • Connection Reuse: Efficiently shares connections across application workers
  • Stability: Handles connection failures with automatic retry logic

Implementation

SQLAlchemy Connection Pooling

We use SQLAlchemy's built-in QueuePool for connection management:

  • Sync Engine: PostgreSQL connections via psycopg3
  • Async Engine: PostgreSQL connections via asyncpg
  • Connection Lifecycle: Automatic management with retry logic

Configuration Modes

The application automatically configures connection pooling based on the environment:
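
The mode selection can be sketched as follows. This is a hypothetical illustration of the detection logic; the variable names mirror the settings below, but the actual implementation in app/core/config.py may differ:

```python
import os

# Hypothetical sketch of environment-based pool configuration;
# the real detection logic lives in app/core/config.py.
ENVIRONMENT = os.getenv("ENVIRONMENT", "development")
USE_PGBOUNCER = ENVIRONMENT == "production"

if USE_PGBOUNCER:
    # Larger pool: PgBouncer multiplexes down to few real connections
    pool_kwargs = {"pool_size": 20, "max_overflow": 30}
else:
    # Conservative pool: every connection hits PostgreSQL directly
    pool_kwargs = {"pool_size": 10, "max_overflow": 10}

# Shared settings for both modes
pool_kwargs.update({"pool_timeout": 30, "pool_recycle": 3600})
```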

Development Mode (Direct PostgreSQL)

ENVIRONMENT = "development"  # or "dev" or "test"
USE_PGBOUNCER = False

# Pool configuration
pool_size = 10          # Base pool size
max_overflow = 10       # Additional connections when needed
pool_timeout = 30       # Seconds to wait for connection
pool_recycle = 3600     # Recycle connections after 1 hour

Connection Math:

  • 4 workers × 10 base = 40 connections
  • 4 workers × 10 overflow = 40 additional (max 80 total)
  • Leaves room for monitoring, admin connections

Production Mode (With PgBouncer)

ENVIRONMENT = "production"
USE_PGBOUNCER = True

# Pool configuration
pool_size = 20          # Larger pool with PgBouncer
max_overflow = 30       # More aggressive overflow
pool_timeout = 30       # Seconds to wait for connection
pool_recycle = 3600     # Recycle connections after 1 hour
prepare_threshold = 0   # CRITICAL: Disable prepared statements!

Connection Math:

  • 4 workers × 20 base = 80 connections to PgBouncer
  • 4 workers × 30 overflow = 120 additional (max 200 to PgBouncer)
  • PgBouncer multiplexes down to ~25-50 actual PostgreSQL connections
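
The connection math for both modes is simple enough to verify directly:

```python
# Worked connection math for both modes (pure arithmetic,
# mirroring the figures listed above).
WORKERS = 4

# Development: direct PostgreSQL
dev_base = WORKERS * 10        # 40 base connections
dev_max = WORKERS * (10 + 10)  # 80 connections at full overflow

# Production: via PgBouncer
prod_base = WORKERS * 20        # 80 base connections to PgBouncer
prod_max = WORKERS * (20 + 30)  # 200 connections to PgBouncer at full overflow
```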

PgBouncer Integration

Why PgBouncer?

PgBouncer is a lightweight connection pooler that sits between the application and PostgreSQL:

[App Workers] → [PgBouncer] → [PostgreSQL]
200 app conns → 25-50 DB conns

Benefits:

  1. Connection Multiplexing: Many app connections → few DB connections
  2. Resource Efficiency: PostgreSQL uses less memory per connection
  3. Better Scaling: Support more application workers
  4. Connection Limiting: Protect PostgreSQL from connection exhaustion

Critical Configuration

Application Side (SQLAlchemy)

Most Important Setting:

connect_args = {
    "prepare_threshold": 0,  # CRITICAL for PgBouncer transaction mode
    "options": "-c idle_in_transaction_session_timeout=60000",
    "connect_timeout": 10,
}
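
Putting the pool settings and connect_args together, the engine construction might look like the sketch below. The create_engine call is shown commented out because it requires the psycopg driver and a reachable DSN; the URL and keyword names are assumptions, not the application's actual code:

```python
# from sqlalchemy import create_engine

connect_args = {
    "prepare_threshold": 0,  # CRITICAL: disable prepared statements for PgBouncer
    "options": "-c idle_in_transaction_session_timeout=60000",
    "connect_timeout": 10,
}

# Production-mode pool settings from the section above
engine_kwargs = dict(
    pool_size=20,
    max_overflow=30,
    pool_timeout=30,
    pool_recycle=3600,
    pool_pre_ping=True,
    connect_args=connect_args,
)

# Hypothetical assembly (needs psycopg and a running PgBouncer):
# engine = create_engine(
#     "postgresql+psycopg://tracker:...@pgbouncer-host:6432/tracker",
#     **engine_kwargs,
# )
```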

Why prepare_threshold=0?

PgBouncer's transaction mode doesn't support prepared statements because:

  • Prepared statements are session-specific
  • PgBouncer multiplexes connections, breaking the session continuity
  • Without this setting, you'll get "prepared statement does not exist" errors

PgBouncer Side

Recommended Configuration:

[pgbouncer]
# Pool mode - MUST be transaction for our setup
pool_mode = transaction

# Connection limits
max_client_conn = 1000        # Total app connections
default_pool_size = 25        # PostgreSQL connections per database
reserve_pool_size = 5         # Emergency reserve

# Timeouts
server_idle_timeout = 600     # Close idle server connections (10 min)
server_lifetime = 3600        # Recycle connections (1 hour)
query_timeout = 0             # No query timeout (handle in app)

# Authentication
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

Pool Mode Comparison

Mode          Use Case              Prepared Statements   Transaction Scope
transaction   Recommended           ❌ Not supported      Per transaction
session       Long transactions     ✅ Supported          Per session
statement     Very simple queries   ❌ Not supported      Per statement

Monitoring

Pool Health Endpoint

Development/Test Only:

GET /health/db-pool

Response:

{
  "status": "healthy",
  "pool_stats": {
    "size": 10,
    "checked_in": 8,
    "checked_out": 2,
    "overflow": 0,
    "total": 10,
    "utilization_percent": 20.0
  },
  "configuration": {
    "using_pgbouncer": false,
    "environment": "development"
  }
}

Status Levels:

  • healthy: Utilization < 80%, connections available
  • warning: Utilization > 80%
  • critical: No idle connections available (checked_in = 0; every pooled connection is checked out)
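
The status classification can be expressed as a small function. This is a sketch of the thresholds listed above, not the endpoint's actual code:

```python
def classify_pool(size: int, checked_in: int, checked_out: int, overflow: int):
    """Map pool statistics to a health status using the thresholds above."""
    total = size + overflow
    utilization = (checked_out / total * 100) if total else 0.0
    if checked_in == 0:
        return "critical", utilization   # nothing left to hand out
    if utilization > 80:
        return "warning", utilization    # pool nearly exhausted
    return "healthy", utilization
```

With the example response above (size 10, 8 in, 2 out, 0 overflow), this yields ("healthy", 20.0).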

Debug Logging

In development, connection lifecycle events are logged:

DEBUG: New database connection established (pid: 140234567890)
DEBUG: Connection checked out from pool (pid: 140234567890)
DEBUG: Connection returned to pool (pid: 140234567890)
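
Lifecycle logging like this is typically wired up with SQLAlchemy pool event listeners. The sketch below uses an in-memory SQLite engine with an explicit QueuePool so it runs without a PostgreSQL server; the listener pattern is the same for the real psycopg engine, though the application's actual handlers may differ:

```python
import logging
from sqlalchemy import create_engine, event, text
from sqlalchemy.pool import QueuePool

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("db.pool")

# SQLite stand-in so the sketch runs anywhere; swap in the
# PostgreSQL URL for real use.
engine = create_engine("sqlite://", poolclass=QueuePool, pool_size=5)

events_seen = []

@event.listens_for(engine, "connect")
def on_connect(dbapi_conn, record):
    events_seen.append("connect")
    log.debug("New database connection established (id: %s)", id(dbapi_conn))

@event.listens_for(engine, "checkout")
def on_checkout(dbapi_conn, record, proxy):
    events_seen.append("checkout")
    log.debug("Connection checked out from pool (id: %s)", id(dbapi_conn))

@event.listens_for(engine, "checkin")
def on_checkin(dbapi_conn, record):
    events_seen.append("checkin")
    log.debug("Connection returned to pool (id: %s)", id(dbapi_conn))

# One round trip fires connect -> checkout -> checkin
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
```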

PgBouncer Monitoring

Check PgBouncer stats:

psql -h localhost -p 6432 -U pgbouncer pgbouncer
SHOW POOLS;
SHOW STATS;
SHOW CLIENTS;
SHOW SERVERS;

Configuration Files

Modified Files

  1. app/core/config.py - USE_PGBOUNCER detection
  2. app/core/database.py - Main API engine configuration
  3. services/shared/config.py - Services USE_PGBOUNCER detection
  4. services/shared/database.py - Services engine configuration
  5. services/materialized_view_service/resilient_jobstore.py - APScheduler engine

Environment Variables

Required:

POSTGRES_SERVER=localhost         # Or PgBouncer host
POSTGRES_PORT=5432               # Or PgBouncer port (e.g., 6432)
POSTGRES_USER=tracker
POSTGRES_PASSWORD=SuperSecretKey
POSTGRES_DB=tracker

For PgBouncer:

ENVIRONMENT=production           # Enables USE_PGBOUNCER=True
POSTGRES_SERVER=pgbouncer-host  # PgBouncer hostname
POSTGRES_PORT=6432              # PgBouncer port (not 5432)
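
One plausible way these variables combine into a SQLAlchemy URL is sketched below; the real assembly in app/core/config.py may differ, and the fallback values are only the examples from this section:

```python
import os

# Hypothetical DSN assembly from the environment variables above.
host = os.getenv("POSTGRES_SERVER", "pgbouncer-host")
port = os.getenv("POSTGRES_PORT", "6432")
user = os.getenv("POSTGRES_USER", "tracker")
password = os.getenv("POSTGRES_PASSWORD", "SuperSecretKey")
db = os.getenv("POSTGRES_DB", "tracker")

# Sync engine URL (psycopg); the async engine would use
# the postgresql+asyncpg:// scheme instead.
url = f"postgresql+psycopg://{user}:{password}@{host}:{port}/{db}"
```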

Best Practices

Connection Management

DO:

  • Use context managers for database sessions
  • Keep transactions short
  • Handle exceptions properly with rollback
  • Close sessions in finally blocks (automatic with our setup)

DON'T:

  • Hold connections during external API calls
  • Keep transactions open during long operations
  • Share connections across threads
  • Return unconsumed cursors
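
The get_db context manager used in the examples below might look roughly like this sketch. It uses an in-memory SQLite engine so it runs anywhere; the application's real get_db is bound to the PostgreSQL engine and may differ in detail:

```python
from contextlib import contextmanager
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

# SQLite stand-in; the real engine is the pooled PostgreSQL engine.
engine = create_engine("sqlite://")
SessionLocal = sessionmaker(bind=engine)

@contextmanager
def get_db():
    """Yield a session; commit on success, roll back on error, always close."""
    db = SessionLocal()
    try:
        yield db
        db.commit()
    except Exception:
        db.rollback()
        raise
    finally:
        db.close()  # returns the connection to the pool

with get_db() as db:
    result = db.execute(text("SELECT 1")).scalar()
```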

Example: Good Transaction Management

# ❌ BAD - Long transaction holding locks
def bad_process_order(order_id: int):
    with get_db() as db:
        order = db.query(Order).filter_by(id=order_id).first()

        # Transaction open during slow external call!
        payment_result = stripe_api.charge(order.amount)  # 3-5 seconds

        order.status = 'completed'
        db.commit()

# ✅ GOOD - Minimal transaction scope
def good_process_order(order_id: int):
    # Read data without holding a transaction
    with get_db() as db:
        order = db.query(Order).filter_by(id=order_id).first()

    # External operations outside transaction
    payment_result = stripe_api.charge(order.amount)

    # Quick transaction just for the update
    with get_db() as db:
        order = db.query(Order).filter_by(id=order_id).first()
        order.status = 'completed'
        db.commit()

Troubleshooting

Common Issues

"prepared statement does not exist"

Cause: Prepared statements enabled with PgBouncer transaction mode

Solution: Ensure prepare_threshold=0 in connect_args

connect_args = {
    "prepare_threshold": 0,  # Disable prepared statements
}

"connection timeout" errors

Cause: Pool exhaustion or slow queries

Solutions:

  1. Check pool utilization: GET /health/db-pool
  2. Increase pool size (with PgBouncer only)
  3. Optimize slow queries
  4. Reduce transaction scope

Stuck transactions

Cause: Transactions held open too long

Solution: Already configured with idle_in_transaction_session_timeout=60000

This automatically terminates sessions that sit idle inside an open transaction for more than 60 seconds.

High connection count

Cause: Connections not being returned to pool

Solutions:

  1. Ensure all database sessions use context managers
  2. Check for connection leaks in custom code
  3. Review debug logs for unreturned connections

Diagnostic Queries

Check active connections:

SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    query,
    state_change
FROM pg_stat_activity
WHERE datname = 'tracker'
ORDER BY state_change DESC;

Find long-running queries:

SELECT
    pid,
    now() - query_start as duration,
    state,
    query
FROM pg_stat_activity
WHERE state != 'idle'
  AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY duration DESC
LIMIT 10;

Architecture Diagram

Without PgBouncer (Development)

┌─────────────────────┐
│ App Worker 1        │──┐
│ Pool: 10 + 10       │  │
├─────────────────────┤  │
│ App Worker 2        │  ├──→ ┌──────────────────┐
│ Pool: 10 + 10       │  │    │ PostgreSQL       │
├─────────────────────┤  │    │ max_connections  │
│ App Worker 3        │  │    │ = 100            │
│ Pool: 10 + 10       │  │    └──────────────────┘
├─────────────────────┤  │
│ App Worker 4        │  │
│ Pool: 10 + 10       │──┘
└─────────────────────┘

Total: 40-80 connections to PostgreSQL

With PgBouncer (Production)

┌─────────────────────┐
│ App Worker 1        │──┐
│ Pool: 20 + 30       │  │
├─────────────────────┤  │
│ App Worker 2        │  │    ┌──────────────┐
│ Pool: 20 + 30       │  ├───→│  PgBouncer   │
├─────────────────────┤  │    │  max: 1000   │
│ App Worker 3        │  │    │  pool: 25    │──→ ┌──────────────┐
│ Pool: 20 + 30       │  │    └──────────────┘    │ PostgreSQL   │
├─────────────────────┤  │           ↓            │ 25-50 conns  │
│ App Worker 4        │  │    Connection          └──────────────┘
│ Pool: 20 + 30       │──┘    Multiplexing
└─────────────────────┘

Total: 80-200 app connections → 25-50 PostgreSQL connections

Performance Tuning

Connection Pool Sizing

Formula for Direct PostgreSQL:

pool_size = (PostgreSQL max_connections - admin_reserve) / num_workers / safety_margin

Example:

  • PostgreSQL max_connections = 100
  • Admin reserve = 10
  • Workers = 4
  • Safety margin = 1.5

Result: pool_size = (100 - 10) / 4 / 1.5 = 15 per worker

With PgBouncer:

Sizing can be more aggressive because PgBouncer caps the number of real PostgreSQL connections; treat the formula's result as an upper bound rather than a target (the production configuration above uses pool_size = 20):

pool_size = PgBouncer max_client_conn / num_workers * 0.4
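
Both sizing formulas are easy to check in code; with the example numbers above, the direct formula reproduces the 15-per-worker result:

```python
def direct_pool_size(max_connections: int, admin_reserve: int,
                     num_workers: int, safety_margin: float = 1.5) -> int:
    """Per-worker pool size when connecting straight to PostgreSQL."""
    return int((max_connections - admin_reserve) / num_workers / safety_margin)

def pgbouncer_pool_size(max_client_conn: int, num_workers: int,
                        fraction: float = 0.4) -> int:
    """Per-worker upper bound when connecting through PgBouncer."""
    return int(max_client_conn / num_workers * fraction)

direct = direct_pool_size(100, 10, 4)          # (100 - 10) / 4 / 1.5 = 15
via_pgbouncer = pgbouncer_pool_size(1000, 4)   # 1000 / 4 * 0.4 = 100
```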

Pool Recycling

Current Settings:

  • pool_recycle = 3600 (1 hour)
  • Balances stability vs connection churn
  • Prevents stale connections
  • Aligns with PgBouncer server_lifetime

Pre-Ping

Enabled: pool_pre_ping=True

Tests connections before use, automatically handling:

  • Network interruptions
  • Database restarts
  • Stale connections

References