
Circuit Breaker Pattern - Resilient Connection Management

Overview

The circuit breaker pattern is a critical component of our resilient connection management system, designed to prevent cascading failures when database or Redis connections become unstable. This guide explains how the circuit breaker works and why it's essential for production environments.

What is a Circuit Breaker?

A circuit breaker is a design pattern that monitors connection health and automatically prevents further attempts when a service becomes unavailable. Think of it like an electrical circuit breaker - when there's a fault, it "trips" to prevent damage.

How Our Circuit Breaker Works

States and Transitions

Our circuit breaker operates in three distinct states:

graph LR
    A[Closed] -->|Failures reach threshold| B[Open]
    B -->|Recovery timeout| C[Half-Open]
    C -->|Success| A
    C -->|Failure| B

State Descriptions

Closed State (Normal Operation)

  • Behavior: All requests are allowed to pass through
  • Monitoring: Tracks consecutive failures
  • Transition: Opens after reaching failure threshold (default: 3 failures)

Open State (Failing)

  • Behavior: Immediately rejects all requests without attempting connection
  • Purpose: Prevents overwhelming the failing service
  • Duration: Stays open for recovery timeout (default: 30 seconds)
  • Transition: Automatically moves to half-open after timeout

Half-Open State (Testing Recovery)

  • Behavior: Allows a single test request to check if service recovered
  • Purpose: Gracefully tests if the service is back online
  • Transition:
      • Success → Returns to closed state
      • Failure → Returns to open state
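
The production implementation lives in services.shared.connection_resilience; the following is a minimal sketch of the three-state logic described above. The class name, method names, and the use of a plain RuntimeError for rejected calls are illustrative assumptions, not the real API.

import time

class CircuitBreakerSketch:
    """Illustrative three-state breaker; not the production class."""

    def __init__(self, failure_threshold=3, recovery_timeout=30.0,
                 expected_exception=Exception):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.expected_exception = expected_exception
        self.state = "closed"
        self.failure_count = 0
        self.opened_at = 0.0

    async def call(self, operation):
        # Open: reject immediately until the recovery timeout has elapsed,
        # then allow exactly one test request (half-open).
        if self.state == "open":
            if time.monotonic() - self.opened_at < self.recovery_timeout:
                raise RuntimeError("circuit breaker is open")
            self.state = "half-open"

        try:
            result = await operation()
        except self.expected_exception:
            self.failure_count += 1
            # A failure in half-open, or reaching the threshold in closed,
            # (re)opens the circuit.
            if self.state == "half-open" or self.failure_count >= self.failure_threshold:
                self.state = "open"
                self.opened_at = time.monotonic()
            raise
        else:
            # Any success resets the breaker to closed.
            self.state = "closed"
            self.failure_count = 0
            return result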

Configuration Parameters

Circuit Breaker Settings

# Database connections
failure_threshold=3                  # Failures before opening
recovery_timeout=30.0                # Seconds before attempting recovery
expected_exception=OperationalError  # Exception type to monitor

# Redis connections
failure_threshold=5                  # More lenient for Redis
recovery_timeout=60.0                # Longer recovery time
expected_exception=redis.ConnectionError
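
Assuming a constructor shaped like the sketch above (the real class in services.shared.connection_resilience may take different arguments), these two profiles would be instantiated roughly as follows:

import redis
from sqlalchemy.exc import OperationalError

# Database: fail fast and probe for recovery after a short window.
db_breaker = CircuitBreakerSketch(
    failure_threshold=3,
    recovery_timeout=30.0,
    expected_exception=OperationalError,
)

# Redis: tolerate more failures and wait longer before probing again.
redis_breaker = CircuitBreakerSketch(
    failure_threshold=5,
    recovery_timeout=60.0,
    expected_exception=redis.ConnectionError,
)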

Exponential Backoff Settings

base_delay=0.5      # Initial retry delay (seconds)
max_delay=10.0      # Maximum retry delay (seconds)
multiplier=1.5      # Delay multiplier for each attempt
jitter=True         # Add randomness to prevent thundering herd
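
With these settings, the delay before retry attempt n is roughly base_delay * multiplier^n, capped at max_delay, with jitter randomizing the final value. A small sketch of that schedule (the helper name and the exact jitter strategy are assumptions, not the module's actual code):

import random

def backoff_delay(attempt, base_delay=0.5, max_delay=10.0,
                  multiplier=1.5, jitter=True):
    """Delay in seconds before retry number `attempt` (0-based)."""
    delay = min(max_delay, base_delay * (multiplier ** attempt))
    if jitter:
        # Spread retries across [delay/2, delay] to avoid a thundering herd.
        delay = random.uniform(delay / 2, delay)
    return delay

# Without jitter the schedule is 0.5s, 0.75s, 1.125s, ~1.7s, ... capped at 10s.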

Real-World Example

Scenario: Database Connection Timeout

import asyncio

from sqlalchemy.exc import OperationalError  # assuming SQLAlchemy's OperationalError

# Without circuit breaker
for _ in range(10):
    try:
        # Each attempt fails immediately, keeping load on the struggling database
        await db.execute("SELECT 1")
    except OperationalError:
        # Immediate retry causes even more load
        await asyncio.sleep(1)

# With circuit breaker
result = await db_resilience.execute_with_resilience(
    lambda: db.execute("SELECT 1")
)
# Circuit breaker handles retries with exponential backoff and stops
# calling the database entirely once it trips

State Transitions in Action

  1. Initial State: Closed - Normal operation
  2. Failure Detected: 3 consecutive database timeouts
  3. Circuit Opens: All requests rejected for 30 seconds
  4. Recovery Period: System has time to recover
  5. Half-Open Test: Single test request allowed
  6. Outcome: If the test succeeds, the circuit returns to closed; if the service is still down, it reopens and waits out another recovery timeout
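
Using the illustrative CircuitBreakerSketch class from earlier, this sequence can be reproduced end to end. The flaky_db_call and healthy_db_call coroutines below are stand-ins for real queries, not actual database calls.

import asyncio

async def flaky_db_call():
    raise TimeoutError("database timed out")   # simulated outage

async def healthy_db_call():
    return 1                                   # simulated recovered database

async def demo():
    breaker = CircuitBreakerSketch(failure_threshold=3, recovery_timeout=30.0,
                                   expected_exception=TimeoutError)

    # Steps 1-3: three consecutive failures trip the breaker.
    for _ in range(3):
        try:
            await breaker.call(flaky_db_call)
        except TimeoutError:
            pass
    assert breaker.state == "open"

    # Step 4: while open, requests are rejected without touching the database.
    try:
        await breaker.call(flaky_db_call)
    except RuntimeError:
        pass  # rejected by the open circuit

    # Steps 5-6: pretend the recovery timeout has passed, then let one
    # successful test request close the circuit again.
    breaker.opened_at -= 30.0
    await breaker.call(healthy_db_call)
    assert breaker.state == "closed"

asyncio.run(demo())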

Benefits in Production

1. Prevents Cascading Failures

  • Stops overwhelming failing services
  • Provides time for recovery
  • Reduces load on infrastructure

2. Improves User Experience

  • Fast failure detection
  • Graceful degradation
  • Predictable behavior during outages

3. Resource Protection

  • Prevents connection pool exhaustion
  • Reduces CPU/memory usage during failures
  • Protects against retry storms

4. Monitoring and Observability

  • Clear state transitions
  • Configurable thresholds
  • Detailed logging for debugging

Integration with Existing Code

Database Operations

from services.shared.connection_resilience import db_resilience

# Resilient database query
async def get_tracker_count():
    return await db_resilience.execute_with_resilience(
        lambda: session.execute("SELECT COUNT(*) FROM trackers")
    )

Redis Operations

from services.shared.taskiq_resilience import resilient_redis_operation

# Resilient Redis operation
await resilient_redis_operation(
    "cache_set",
    redis_client.set,
    "key", "value"
)

Monitoring and Alerting

Key Metrics to Monitor

  • Circuit breaker state changes (open/closed/half-open)
  • Failure rates before circuit opens
  • Recovery times after circuit opens
  • Success rates during half-open state
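
The resilience module itself does not ship a metrics exporter, so one option is a thin hook that turns state transitions into counters. The sketch below assumes prometheus_client is installed and that you call the hook wherever a breaker changes state; the record_state_change helper is hypothetical.

from prometheus_client import Counter

# One time series per (service, new_state) combination.
STATE_CHANGES = Counter(
    "circuit_breaker_state_changes_total",
    "Number of circuit breaker state transitions",
    ["service", "state"],
)

def record_state_change(service: str, new_state: str) -> None:
    """Call whenever a breaker moves between closed, open, and half-open."""
    STATE_CHANGES.labels(service=service, state=new_state).inc()

# Example: the database breaker just opened.
record_state_change("database", "open")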

Log Analysis

# Circuit breaker opening
logger.warning("database: Connection failed (attempt 3), retrying in 4.5s")

# Circuit breaker recovery
logger.info("database: Successfully reconnected after 4 attempts")

Troubleshooting Guide

Common Issues and Solutions

Circuit Breaker Opens Too Frequently

  • Symptom: Frequent state changes between open/closed
  • Solution: Increase failure_threshold or recovery_timeout
  • Example: Change failure threshold from 3 to 5

Recovery Takes Too Long

  • Symptom: Long periods in open state
  • Solution: Decrease recovery_timeout
  • Example: Change from 30s to 15s

Too Many Retries

  • Symptom: Excessive retry attempts
  • Solution: Adjust backoff parameters
  • Example: Increase base_delay or decrease multiplier

Configuration Examples

Conservative Settings (Production)

# For critical database connections
circuit_breaker_config={
    "failure_threshold": 5,
    "recovery_timeout": 60.0,
    "expected_exception": OperationalError,
}

Aggressive Settings (Development)

# For faster feedback during development
circuit_breaker_config={
    "failure_threshold": 2,
    "recovery_timeout": 10.0,
    "expected_exception": OperationalError,
}

Summary

The circuit breaker pattern is a critical component for building resilient systems. By automatically detecting failures and preventing cascading issues, it ensures your tracker-fetcher service remains stable even when database or Redis connections become unreliable. Combined with exponential backoff, it provides a robust solution for handling the aggressive idle connection timeouts common in production environments.