Database Connection Pooling and PgBouncer
Overview
This document describes the PostgreSQL connection pooling strategy used in the Tracker API, including integration with PgBouncer for production deployments.
What is Connection Pooling?
Connection pooling maintains a pool of reusable database connections rather than creating a new connection for each request. This provides:
- Performance: Eliminates the overhead of establishing new connections
- Resource Management: Limits total connections to prevent database overload
- Connection Reuse: Efficiently shares connections across application workers
- Stability: Handles connection failures with automatic retry logic
Implementation
SQLAlchemy Connection Pooling
We use SQLAlchemy's built-in QueuePool for connection management:
- Sync Engine: PostgreSQL connections via psycopg3
- Async Engine: PostgreSQL connections via asyncpg
- Connection Lifecycle: Automatic management with retry logic
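The engine setup can be sketched roughly as follows (a minimal sketch, not the project's actual code; the URL, credentials, and pool numbers are illustrative):

```python
from sqlalchemy import create_engine

# Illustrative URL; real host/credentials come from environment variables.
DATABASE_URL = "postgresql+psycopg://tracker:SuperSecretKey@localhost:5432/tracker"

# QueuePool is SQLAlchemy's default pool class for PostgreSQL engines,
# so these keyword arguments configure it directly.
engine = create_engine(
    DATABASE_URL,
    pool_size=10,        # base pool size per worker
    max_overflow=10,     # extra connections under load
    pool_timeout=30,     # seconds to wait for a free connection
    pool_recycle=3600,   # recycle connections after 1 hour
    pool_pre_ping=True,  # test connections before handing them out
)
```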
Configuration Modes
The application automatically configures connection pooling based on the environment:
Development Mode (Direct PostgreSQL)
ENVIRONMENT = "development" # or "dev" or "test"
USE_PGBOUNCER = False
# Pool configuration
pool_size = 10 # Base pool size
max_overflow = 10 # Additional connections when needed
pool_timeout = 30 # Seconds to wait for connection
pool_recycle = 3600 # Recycle connections after 1 hour
Connection Math:
- 4 workers × 10 base = 40 connections
- 4 workers × 10 overflow = 40 additional (max 80 total)
- Leaves room for monitoring, admin connections
Production Mode (With PgBouncer)
ENVIRONMENT = "production"
USE_PGBOUNCER = True
# Pool configuration
pool_size = 20 # Larger pool with PgBouncer
max_overflow = 30 # More aggressive overflow
pool_timeout = 30 # Seconds to wait for connection
pool_recycle = 3600 # Recycle connections after 1 hour
prepare_threshold = 0 # CRITICAL: Disable prepared statements!
Connection Math:
- 4 workers × 20 base = 80 connections to PgBouncer
- 4 workers × 30 overflow = 120 additional (max 200 to PgBouncer)
- PgBouncer multiplexes down to ~25-50 actual PostgreSQL connections
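The connection math for both modes reduces to one formula, sketched here with the worker and pool numbers quoted above:

```python
def max_app_connections(workers: int, pool_size: int, max_overflow: int) -> int:
    """Worst-case number of connections the app can open to its target
    (PostgreSQL directly in development, PgBouncer in production)."""
    return workers * (pool_size + max_overflow)

# Development: direct to PostgreSQL
assert max_app_connections(4, 10, 10) == 80
# Production: to PgBouncer, which multiplexes down to ~25-50 real connections
assert max_app_connections(4, 20, 30) == 200
```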
PgBouncer Integration
Why PgBouncer?
PgBouncer is a lightweight connection pooler that sits between the application and PostgreSQL:
[App Workers] → [PgBouncer] → [PostgreSQL]
200 app conns → 25-50 DB conns
Benefits:
- Connection Multiplexing: Many app connections → few DB connections
- Resource Efficiency: PostgreSQL uses less memory per connection
- Better Scaling: Support more application workers
- Connection Limiting: Protect PostgreSQL from connection exhaustion
Critical Configuration
Application Side (SQLAlchemy)
Most Important Setting:
connect_args = {
    "prepare_threshold": 0,  # CRITICAL for PgBouncer transaction mode
    "options": "-c idle_in_transaction_session_timeout=60000",
    "connect_timeout": 10,
}
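Putting it together for the sync engine, the PgBouncer-facing configuration might look like this (a sketch with illustrative host and credentials). Note that prepare_threshold is a psycopg setting; for the asyncpg engine the analogous knob is SQLAlchemy's asyncpg dialect prepared-statement cache (assumption: prepared_statement_cache_size=0, so verify against your SQLAlchemy version):

```python
from sqlalchemy import create_engine

# Illustrative: host and port point at PgBouncer, not PostgreSQL directly.
engine = create_engine(
    "postgresql+psycopg://tracker:SuperSecretKey@pgbouncer-host:6432/tracker",
    pool_size=20,
    max_overflow=30,
    pool_timeout=30,
    pool_recycle=3600,
    connect_args={
        "prepare_threshold": 0,  # psycopg: never promote queries to prepared statements
        "options": "-c idle_in_transaction_session_timeout=60000",
        "connect_timeout": 10,
    },
)
```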
Why prepare_threshold=0?
PgBouncer's transaction mode doesn't support prepared statements because:
- Prepared statements are session-specific
- PgBouncer multiplexes connections, breaking the session continuity
- Without this setting, you'll get "prepared statement does not exist" errors
PgBouncer Side
Recommended Configuration:
[pgbouncer]
# Pool mode - MUST be transaction for our setup
pool_mode = transaction
# Connection limits
max_client_conn = 1000 # Total app connections
default_pool_size = 25 # PostgreSQL connections per database
reserve_pool_size = 5 # Emergency reserve
# Timeouts
server_idle_timeout = 600 # Close idle server connections (10 min)
server_lifetime = 3600 # Recycle connections (1 hour)
query_timeout = 0 # No query timeout (handle in app)
# Authentication
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
Pool Mode Comparison
| Mode | Use Case | Prepared Statements | Transaction Scope |
|---|---|---|---|
| transaction | ✅ Recommended | ❌ Not supported | Per transaction |
| session | Long transactions | ✅ Supported | Per session |
| statement | Very simple queries | ❌ Not supported | Per statement |
Monitoring
Pool Health Endpoint
Development/Test Only:
GET /health/db-pool
Response:
{
  "status": "healthy",
  "pool_stats": {
    "size": 10,
    "checked_in": 8,
    "checked_out": 2,
    "overflow": 0,
    "total": 10,
    "utilization_percent": 20.0
  },
  "configuration": {
    "using_pgbouncer": false,
    "environment": "development"
  }
}
Status Levels:
- healthy: Utilization < 80%, connections available
- warning: Utilization > 80%
- critical: No connections available (checked_in = 0)
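The status rules map naturally onto a small classification helper (a sketch; the field names follow the response shown above):

```python
def classify_pool_health(size: int, checked_in: int,
                         checked_out: int, overflow: int) -> dict:
    """Mirror the /health/db-pool status rules described above."""
    total = size + overflow
    utilization = (checked_out / total * 100) if total else 0.0
    if checked_in == 0:
        status = "critical"  # nothing left to hand out
    elif utilization > 80:
        status = "warning"
    else:
        status = "healthy"
    return {"status": status, "utilization_percent": round(utilization, 1)}

# Matches the example response: 2 of 10 connections in use
assert classify_pool_health(10, 8, 2, 0) == {
    "status": "healthy", "utilization_percent": 20.0,
}
```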
Debug Logging
In development, connection lifecycle events are logged:
DEBUG: New database connection established (pid: 140234567890)
DEBUG: Connection checked out from pool (pid: 140234567890)
DEBUG: Connection returned to pool (pid: 140234567890)
PgBouncer Monitoring
Check PgBouncer stats:
psql -h localhost -p 6432 -U pgbouncer pgbouncer
SHOW POOLS;
SHOW STATS;
SHOW CLIENTS;
SHOW SERVERS;
Configuration Files
Modified Files
- app/core/config.py - USE_PGBOUNCER detection
- app/core/database.py - Main API engine configuration
- services/shared/config.py - Services USE_PGBOUNCER detection
- services/shared/database.py - Services engine configuration
- services/materialized_view_service/resilient_jobstore.py - APScheduler engine
Environment Variables
Required:
POSTGRES_SERVER=localhost # Or PgBouncer host
POSTGRES_PORT=5432 # Or PgBouncer port (e.g., 6432)
POSTGRES_USER=tracker
POSTGRES_PASSWORD=SuperSecretKey
POSTGRES_DB=tracker
For PgBouncer:
ENVIRONMENT=production # Enables USE_PGBOUNCER=True
POSTGRES_SERVER=pgbouncer-host # PgBouncer hostname
POSTGRES_PORT=6432 # PgBouncer port (not 5432)
Best Practices
Connection Management
✅ DO:
- Use context managers for database sessions
- Keep transactions short
- Handle exceptions properly with rollback
- Close sessions in finally blocks (automatic with our setup)
❌ DON'T:
- Hold connections during external API calls
- Keep transactions open during long operations
- Share connections across threads
- Return unconsumed cursors
Example: Good Transaction Management
# ❌ BAD - Long transaction holding locks
def bad_process_order(order_id: int):
    with get_db() as db:
        order = db.query(Order).filter_by(id=order_id).first()
        # Transaction open during slow external call!
        payment_result = stripe_api.charge(order.amount)  # 3-5 seconds
        order.status = 'completed'
        db.commit()

# ✅ GOOD - Minimal transaction scope
def good_process_order(order_id: int):
    # Read data without holding a transaction
    with get_db() as db:
        order = db.query(Order).filter_by(id=order_id).first()

    # External operations outside transaction
    payment_result = stripe_api.charge(order.amount)

    # Quick transaction just for the update
    with get_db() as db:
        order = db.query(Order).filter_by(id=order_id).first()
        order.status = 'completed'
        db.commit()
Troubleshooting
Common Issues
"prepared statement does not exist"
Cause: Prepared statements enabled with PgBouncer transaction mode
Solution: Ensure prepare_threshold=0 in connect_args
connect_args = {
    "prepare_threshold": 0,  # Disable prepared statements
}
"connection timeout" errors
Cause: Pool exhaustion or slow queries
Solutions:
- Check pool utilization: GET /health/db-pool
- Increase pool size (with PgBouncer only)
- Optimize slow queries
- Reduce transaction scope
Stuck transactions
Cause: Transactions held open too long
Solution: Already configured with idle_in_transaction_session_timeout=60000
This automatically terminates any session whose transaction has been idle for more than 60 seconds.
High connection count
Cause: Connections not being returned to pool
Solutions:
- Ensure all database sessions use context managers
- Check for connection leaks in custom code
- Review debug logs for unreturned connections
Diagnostic Queries
Check active connections:
SELECT
pid,
usename,
application_name,
client_addr,
state,
query,
state_change
FROM pg_stat_activity
WHERE datname = 'tracker'
ORDER BY state_change DESC;
Find long-running queries:
SELECT
pid,
now() - query_start as duration,
state,
query
FROM pg_stat_activity
WHERE state != 'idle'
AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY duration DESC
LIMIT 10;
Architecture Diagram
Without PgBouncer (Development)
┌─────────────────────┐
│ App Worker 1 │──┐
│ Pool: 10 + 10 │ │
├─────────────────────┤ │
│ App Worker 2 │ ├──→ ┌──────────────────┐
│ Pool: 10 + 10 │ │ │ PostgreSQL │
├─────────────────────┤ │ │ max_connections │
│ App Worker 3 │ │ │ = 100 │
│ Pool: 10 + 10 │ │ └──────────────────┘
├─────────────────────┤ │
│ App Worker 4 │ │
│ Pool: 10 + 10 │──┘
└─────────────────────┘
Total: 40-80 connections to PostgreSQL
With PgBouncer (Production)
┌─────────────────────┐
│ App Worker 1 │──┐
│ Pool: 20 + 30 │ │
├─────────────────────┤ │
│ App Worker 2 │ │ ┌──────────────┐
│ Pool: 20 + 30 │ ├───→│ PgBouncer │
├─────────────────────┤ │ │ max: 1000 │
│ App Worker 3 │ │ │ pool: 25 │──→ ┌──────────────┐
│ Pool: 20 + 30 │ │ └──────────────┘ │ PostgreSQL │
├─────────────────────┤ │ ↓ │ 25-50 conns │
│ App Worker 4 │ │ Connection └──────────────┘
│ Pool: 20 + 30 │──┘ Multiplexing
└─────────────────────┘
Total: 80-200 app connections → 25-50 PostgreSQL connections
Performance Tuning
Connection Pool Sizing
Formula for Direct PostgreSQL:
pool_size = (PostgreSQL max_connections - admin_reserve) / num_workers / safety_margin
Example:
- PostgreSQL max_connections = 100
- Admin reserve = 10
- Workers = 4
- Safety margin = 1.5
Result: pool_size = (100 - 10) / 4 / 1.5 = 15 per worker
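The sizing formula can be sanity-checked in a few lines (a sketch using the numbers from the example above):

```python
def direct_pool_size(max_connections: int, admin_reserve: int,
                     workers: int, safety_margin: float) -> int:
    """Per-worker pool_size when connecting straight to PostgreSQL."""
    return int((max_connections - admin_reserve) / workers / safety_margin)

# max_connections=100, admin reserve=10, 4 workers, 1.5x safety margin
assert direct_pool_size(100, 10, 4, 1.5) == 15
```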
With PgBouncer:
Can be more aggressive since PgBouncer handles PostgreSQL connections:
pool_size = PgBouncer max_client_conn / num_workers * 0.4
Pool Recycling
Current Settings:
pool_recycle = 3600 (1 hour)
- Balances stability vs connection churn
- Prevents stale connections
- Aligns with PgBouncer server_lifetime
Pre-Ping
Enabled: pool_pre_ping=True
Tests connections before use, automatically handling:
- Network interruptions
- Database restarts
- Stale connections