Behavioral Testing Strategy

Overview

This document outlines our behavioral testing approach for the tracker system, which prioritizes user workflows and business scenarios over coverage of code structure. The strategy ensures our tests validate real-world functionality while still meeting the >80% coverage requirement for QA.

Philosophy: Behavior-Driven vs Coverage-Driven Testing

Traditional Coverage-Driven Approach (What We Moved Away From)

tests/
├── crud/
│   ├── test_brand.py      # Tests CRUD operations by file location
│   └── test_client.py     # Tests CRUD operations by file location
├── models/
│   └── test_status.py     # Tests models by file location
└── schemas/
    └── test_client.py     # Tests schemas by file location

Problems with this approach:

  • Tests are organized around code structure, not user needs
  • Hard to understand what business functionality is being validated
  • Difficult to identify gaps in workflow coverage
  • Tests often become brittle and disconnected from real usage

New Behavioral Approach (Current Strategy)

tests/
├── behaviors/                    # Business workflow tests
│   ├── brand_management/
│   │   ├── test_brand_crud_operations.py      # Brand lifecycle workflows
│   │   ├── test_brand_client_relationships.py # Brand-client associations
│   │   └── test_brand_validation.py           # Brand business rules
│   ├── client_management/
│   │   ├── test_client_crud_operations.py     # Client account workflows
│   │   ├── test_client_brand_associations.py  # Multi-brand scenarios
│   │   └── test_client_validation.py          # Client business rules
│   └── tracker_lifecycle/
│       ├── test_tracker_status_management.py  # Status transition workflows
│       └── test_tracker_operations.py         # Tracker CRUD workflows
├── integration/                  # Cross-feature workflows
│   └── test_complete_workflows.py             # End-to-end scenarios
└── unit/                        # Simple utility tests
    ├── test_config.py           # Configuration utilities
    └── test_url_utils.py        # URL processing utilities

Benefits of this approach:

  • Tests tell a story about system behavior
  • Easy to identify missing business scenarios
  • Tests are organized around user workflows
  • Easier maintenance when features change
  • Better gap identification in functionality coverage

Test Structure and Naming Conventions

File Organization

  • tests/behaviors/: Main behavioral tests organized by business domain
  • tests/integration/: Cross-domain workflow tests
  • tests/unit/: Simple utility and configuration tests

Test Class Naming

Use descriptive names that reflect the business behavior being tested:

# ✅ Good: Describes business behavior
class TestBrandLifecycleManagement:
class TestMultiBrandClientManagement:
class TestClientAccountManagement:

# ❌ Bad: Describes code structure
class TestBrandCRUD:
class TestClientModel:

Test Method Naming

Use descriptive names that tell a story about the business scenario:

# ✅ Good: Describes business scenario
def test_client_creates_new_brand_for_product_line(self):
def test_client_updates_brand_information_for_rebranding(self):
def test_system_handles_request_for_nonexistent_brand(self):

# ❌ Bad: Describes technical operation
def test_create_brand(self):
def test_update_brand(self):
def test_get_brand_not_found(self):

Test Documentation Format

Each test should include comprehensive documentation:

def test_client_creates_new_brand_for_product_line(self, db: Session, test_client: Client) -> None:
    """
    BEHAVIOR: When a client wants to launch a new product line, they create a brand

    BUSINESS SCENARIO: A client (e.g., "Acme Corp") wants to launch a new product line
    called "EcoFriendly Products" with their own branding and logo.

    COVERAGE: app/crud/brand.py create() method
    """

Coverage Strategy

Maintaining QA Requirements

  • Target: >80% overall code coverage
  • Method: Behavioral tests naturally exercise multiple code paths
  • Tracking: Coverage is tracked by business domain, not just by file

Coverage Mapping

Each behavioral test documents which code it covers:

"""
Brand Management CRUD Operations - Behavioral Tests

BEHAVIOR FOCUS: Tests the complete brand management lifecycle from a business perspective.
This covers how clients create, manage, update, and organize their brands within the system.

COVERAGE: Provides 100% coverage for app/crud/brand.py (13/13 lines)
"""

Integration Test Coverage

Integration tests exercise multiple modules in realistic workflows:

def test_complete_brand_creation_to_tracker_assignment(self):
    """
    BEHAVIOR: Complete workflow from brand creation to tracker assignment

    COVERAGE: Exercises multiple modules in realistic sequence:
    - app/crud/brand.py
    - app/crud/client.py
    - app/crud/tracker.py
    - app/api/routes/brands.py
    """

Implementation Guidelines

1. Starting a New Behavioral Test Suite

When creating tests for a new business domain:

  1. Identify the business workflows - What are the main user scenarios?
  2. Create the domain directory - tests/behaviors/domain_name/
  3. Organize by workflow type - CRUD operations, relationships, validations
  4. Write scenario-based tests - Focus on user stories, not code coverage
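
A new domain might therefore start from a skeleton like the one below (the location_management domain is taken from the planned Phase 3 work; file, class, and module paths are illustrative):

# tests/behaviors/location_management/test_location_crud_operations.py
import pytest
from sqlalchemy.orm import Session


@pytest.mark.behavioral
class TestLocationLifecycleManagement:
    """Core location registration and lifecycle workflows."""

    def test_client_registers_new_warehouse_location(self, db: Session) -> None:
        """
        BEHAVIOR: When a client opens a new warehouse, they register it as a location

        BUSINESS SCENARIO: "Acme Corp" opens a Manchester warehouse that trackers
        will be delivered to.

        COVERAGE: app/crud/location.py (assumed module path)
        """
        ...  # arrange real data, call real CRUD/service code, assert business outcomes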

2. Converting Existing Tests

When converting coverage-driven tests to behavioral tests:

  1. Preserve all test logic - Don't lose existing coverage
  2. Add behavioral context - Enhance with business scenario documentation
  3. Reorganize by workflow - Group related tests by business function
  4. Enhance with integration - Add workflow tests that span multiple components
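
A small illustration of such a conversion, using the brand update scenario from the naming examples above (the crud.brand.update() signature is an assumption):

# ❌ Before: coverage-driven name, no business context
def test_update_brand(self, db: Session, test_brand: Brand) -> None:
    updated = crud.brand.update(db, db_obj=test_brand, obj_in={"name": "New Name"})
    assert updated.name == "New Name"

# ✅ After: identical logic, behavioral framing, grouped under TestBrandLifecycleManagement
def test_client_updates_brand_information_for_rebranding(self, db: Session, test_brand: Brand) -> None:
    """
    BEHAVIOR: When a client rebrands a product line, they update the brand's details

    BUSINESS SCENARIO: "Acme Corp" renames "EcoFriendly Products" after a rebrand.

    COVERAGE: app/crud/brand.py update() method
    """
    updated = crud.brand.update(db, db_obj=test_brand, obj_in={"name": "EcoFriendly Home"})
    assert updated.name == "EcoFriendly Home"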

3. Test Class Organization

Organize test classes around business behaviors:

class TestBrandLifecycleManagement:
    """Core brand CRUD operations and lifecycle"""

class TestMultiBrandClientManagement:
    """Scenarios with multiple brands per client"""

class TestBrandProductionIntegration:
    """How brands integrate with production workflows"""

class TestBrandBusinessRules:
    """Business rules and edge cases"""

class TestBrandSystemIntegrity:
    """System integrity and configuration validation"""

Running Tests

Running Behavioral Tests

# Run all behavioral tests
./run_tests_with_coverage.sh tests/behaviors/

# Run specific business domain
./run_tests_with_coverage.sh tests/behaviors/brand_management/

# Run specific workflow tests
./run_tests_with_coverage.sh tests/behaviors/brand_management/test_brand_crud_operations.py

Running Integration Tests

# Run all integration tests
./run_tests_with_coverage.sh tests/integration/

# Run specific integration workflow
./run_tests_with_coverage.sh tests/integration/test_complete_workflows.py

Coverage Analysis

# Generate coverage report
./run_tests_with_coverage.sh

# View coverage in VSCode
# Install 'Coverage Gutters' extension
# Use Ctrl+Shift+P -> 'Coverage Gutters: Display Coverage'

Benefits for Developers

1. Better Understanding

  • Tests clearly communicate what the system should do
  • New developers can understand business logic by reading tests
  • Tests serve as living documentation of system behavior

2. Easier Maintenance

  • When business requirements change, relevant tests are grouped together
  • Test failures clearly indicate which business workflows are affected
  • Easier to identify missing test coverage for new features

3. Improved Quality

  • Tests validate complete user workflows, not just isolated functions
  • Integration tests catch issues that unit tests might miss
  • Business rules and edge cases are explicitly tested

4. Better Debugging

  • Test names clearly indicate what business scenario failed
  • Test organization makes it easy to find related functionality
  • Comprehensive scenario documentation aids in troubleshooting

Migration Strategy

Phase 1: Foundation (Completed)

  • ✅ Created behavioral directory structure
  • ✅ Converted brand management tests to behavioral approach
  • ✅ Maintained 100% coverage for converted modules
  • ✅ Documented testing strategy

Phase 2: Core Domains (In Progress)

  • ✅ Health monitoring behavioral tests (41 comprehensive tests)
  • ✅ Geofence service behavioral tests (40% coverage, real testing)
      • Fixed critical over-mocking: Replaced mocked service instances with real service testing
      • Fixed SQLAlchemy issues: Corrected func.case() and func.or_() usage
      • Fixed datetime deprecations: Replaced datetime.utcnow() with datetime.now(UTC)
  • ✅ Geocoding service behavioral tests (81% coverage, 21 comprehensive tests)
      • Fixed critical over-mocking: Transformed from 0% to 81% real coverage
      • Real service testing: Tests now use actual GeocodingService instances
      • External API isolation: Only Nominatim API calls are mocked
      • Database integration: Real database interactions with transaction rollback
  • 🔄 Convert client management tests
  • 🔄 Convert tracker lifecycle tests
  • 🔄 Fix remaining over-mocked services (tracker fetcher, health monitoring)
  • 🔄 Add integration workflow tests
  • 🔄 Update TODO tracking to reflect behavioral organization

Phase 3: Advanced Workflows (Planned)

  • ⏳ Production run workflows
  • ⏳ Location management workflows
  • ⏳ Complex multi-client scenarios
  • ⏳ End-to-end system workflows

Phase 4: Optimization (Planned)

  • ⏳ Remove redundant coverage-driven tests
  • ⏳ Optimize test performance
  • ⏳ Add advanced integration scenarios
  • ⏳ Create test data factories for complex scenarios

Best Practices

1. Test Documentation

  • Always include BEHAVIOR, BUSINESS SCENARIO, and COVERAGE sections
  • Use real-world examples in scenario descriptions
  • Document the business value being tested

2. Test Data

  • Use meaningful test data that reflects real business scenarios
  • Create test data that tells a story (e.g., "EcoFriendly Products" brand)
  • Use existing fixtures when possible to maintain consistency

3. Assertions

  • Assert business outcomes, not just technical correctness
  • Include assertions that validate the complete business scenario
  • Test both positive and negative business scenarios

4. Error Handling

  • Test business error scenarios (e.g., "client tries to access deleted brand")
  • Ensure error handling preserves business logic integrity
  • Test edge cases that could occur in real usage

5. Fixture Usage and Transaction Management

  • Use shared fixtures: Always use fixtures from tests/behaviors/fixtures.py and conftest.py
  • Leverage automatic rollbacks: The db fixture automatically handles transaction rollbacks - no manual cleanup needed
  • No test data cruft: Tests should not leave behind data or require manual cleanup
  • Consistent test users: Use admin_user, regular_user, test_client, test_brand fixtures for consistency
  • Secure test passwords: Use the test_password fixture for secure, randomly generated passwords
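
A minimal sketch of these conventions in use (fixture names as listed above; the Brand model fields are assumptions):

def test_client_organizes_multiple_brands_under_account(self, db: Session, test_client: Client, test_brand: Brand) -> None:
    # Shared fixtures provide the client and an initial brand - no manual setup
    assert test_brand.client_id == test_client.id

    # Data created here lives inside the fixture-managed transaction
    extra = Brand(name="Seasonal Line", client_id=test_client.id)
    db.add(extra)
    db.commit()

    assert db.query(Brand).filter(Brand.client_id == test_client.id).count() >= 2
    # No teardown code: the db fixture rolls everything back after the test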

6. Mocking Strategy

  • Mock external dependencies only: Mock external APIs, services, and network calls
  • Don't mock internal business logic: Test actual business logic paths, not mocked versions
  • Use proper async mocking: Use AsyncMock for async methods, MagicMock for sync methods
  • Mock at the right level: Mock at service boundaries (e.g., HTTP clients, external providers)
  • Verify mock interactions: Assert that mocks were called with correct parameters
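
A compact sketch combining these rules (HealthMonitor and check_service_health() echo the examples later in this document; the requests.get boundary and the status.healthy attribute are assumptions):

from unittest.mock import patch

def test_health_check_reports_upstream_outage(self, db: Session) -> None:
    # Mock only the external HTTP boundary; exercise our real monitoring logic
    with patch("requests.get") as mock_http:
        mock_http.return_value.status_code = 503

        monitor = HealthMonitor()                # real internal component
        status = monitor.check_service_health()  # real business logic path

        # Assert the business outcome AND verify the mock interaction
        assert status.healthy is False
        mock_http.assert_called_once()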

7. Test Class Organization

  • Use @pytest.mark.behavioral: All behavioral test classes must have this decorator
  • Group by business workflow: Organize test classes around business behaviors, not technical structure
  • Descriptive class names: Use names that describe business scenarios (e.g., TestBrandLifecycleManagement)
  • Logical test grouping: Group related business scenarios within the same test class
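
For example (the behavioral marker is assumed to be registered in the project's pytest configuration):

import pytest


@pytest.mark.behavioral
class TestClientAccountManagement:
    """Client account creation, update, and deactivation workflows."""

    def test_client_opens_account_with_primary_brand(self, db: Session, test_password: str) -> None:
        ...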

8. Database and State Management

  • No manual database cleanup: Rely on automatic transaction rollbacks
  • Use database session properly: Always use the db: Session fixture parameter
  • Test data isolation: Each test should be independent and not rely on other test data
  • Commit when needed: Use db.commit() when testing scenarios that require committed data
  • Refresh objects: Use db.refresh(obj) after commits to get updated state
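
For example (a sketch; the Tracker fields used here are illustrative):

def test_tracker_persists_committed_state(self, db: Session, test_brand: Brand) -> None:
    tracker = Tracker(name="Pallet 42", brand_id=test_brand.id)
    db.add(tracker)
    db.commit()          # commit because this scenario requires committed data
    db.refresh(tracker)  # reload server-generated state (id, defaults, timestamps)

    assert tracker.id is not None
    # No manual cleanup: the db fixture rolls the transaction back afterwards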

9. Async Testing Patterns

  • Use asyncio.run(): For testing async service methods in sync test functions
  • Use @pytest.mark.asyncio: For tests that are themselves async functions
  • Mock async dependencies: Use AsyncMock with new_callable=AsyncMock parameter
  • Handle event loops: Be aware of event loop management in async tests
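
Both patterns side by side, inside a behavioral test class (a sketch; the service calls follow the geocoding example in the next section):

import asyncio
from unittest.mock import AsyncMock, patch

# Pattern 1: sync test driving an async method via asyncio.run()
def test_geocoding_from_sync_test(self, db: Session) -> None:
    service = GeocodingService("test")
    with patch.object(service.provider, "geocode", new_callable=AsyncMock) as mock_api:
        mock_api.return_value = {"nearest_city": "London"}
        result = asyncio.run(service.geocode_coordinate(51.5074, -0.1278))
    assert result.nearest_city == "London"

# Pattern 2: the test itself is async
@pytest.mark.asyncio
async def test_geocoding_from_async_test(self, db: Session) -> None:
    service = GeocodingService("test")
    with patch.object(service.provider, "geocode", new_callable=AsyncMock) as mock_api:
        mock_api.return_value = {"nearest_city": "London"}
        result = await service.geocode_coordinate(51.5074, -0.1278)
    assert result.nearest_city == "London"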

10. Critical Anti-Pattern: Over-Mocking Our Own Code

⚠️ CRITICAL ISSUE: One of the most dangerous testing anti-patterns is over-mocking our own business logic instead of testing it. This creates a false sense of security where tests pass but real bugs go undetected.

❌ What NOT to Mock (Our Own Code)

Never mock these internal components:

  • Our service classes and business logic methods
  • Our database models and relationships
  • Our internal APIs and processing logic
  • Our queue management systems
  • Our health monitoring logic
  • Our caching and optimization logic

❌ Bad Example: Over-Mocking Our Service

# ❌ DANGEROUS: Mocking our own service
def test_geocoding_workflow(self, db: Session):
    with patch("services.geocoding_service.service.GeocodingService") as mock_service:
        mock_instance = mock_service.return_value
        mock_instance.geocode_coordinate.return_value = expected_result

        # This tests NOTHING - just mock behavior!
        result = mock_instance.geocode_coordinate(lat, lon)
        assert result == expected_result  # Always passes!

Problems with this approach:

  • Tests validate mock behavior, not real functionality
  • Real bugs in business logic go undetected
  • 0% coverage of actual service code
  • False confidence in test suite
  • Regressions not caught until production

✅ What TO Mock (External Dependencies)

Always mock these external components:

  • External APIs (Nominatim, Apple services, payment gateways)
  • File system operations and network calls
  • Third-party libraries and services
  • Infrastructure components (when testing business logic)
  • Time-dependent operations (for deterministic tests)

✅ Good Example: Testing Real Service

# ✅ CORRECT: Test real service, mock external APIs only
@pytest.mark.asyncio
async def test_geocoding_workflow(self, db: Session):
    from services.geocoding_service.service import GeocodingService

    # Create REAL service instance
    service = GeocodingService("test_geocoding")

    # Mock ONLY external API (Nominatim)
    with patch.object(service.provider, "geocode", new_callable=AsyncMock) as mock_api:
        mock_api.return_value = {"nearest_city": "London"}

        # Test REAL service method
        result = await service.geocode_coordinate(51.5074, -0.1278)

        # Verify REAL business logic
        assert result.nearest_city == "London"
        assert result.cache_hit is False
        assert result.lat_rounded == 51.51  # Real coordinate rounding

        # Verify cache was created in database
        cache_entry = db.query(GeocodingCache).filter(...).first()
        assert cache_entry.nearest_city == "London"

🎯 The Correct Testing Pattern

1. Test Real Business Logic:

# ✅ Create real service instances
service = GeocodingService("test")
tracker_service = TrackerService(db)
health_monitor = HealthMonitor()

# ✅ Test real methods with real parameters
result = service.process_batch_geocoding(max_locations=10)
status = tracker_service.update_tracker_status(tracker_id, "DELIVERED")
health = health_monitor.check_service_health()

2. Mock External Dependencies Only:

# ✅ Mock external APIs
with patch.object(service.provider, "geocode") as mock_api:
    mock_api.return_value = {"city": "London"}

# ✅ Mock external services
with patch("requests.get") as mock_http:
    mock_http.return_value.json.return_value = {"status": "ok"}

# ✅ Mock file operations
with patch("builtins.open", mock_open(read_data="test")) as mock_file:

3. Use Real Database with Transaction Rollback:

# ✅ Real database interactions
def test_service_workflow(self, db: Session):
    # Create real test data (automatically rolled back)
    tracker = Tracker(name="Test Tracker")
    db.add(tracker)
    db.commit()

    # Test real service with real database
    service = TrackerService(db)
    result = service.process_tracker(tracker.id)

    # Verify real database changes
    updated_tracker = db.query(Tracker).filter(Tracker.id == tracker.id).first()
    assert updated_tracker.status == "PROCESSED"

    # Transaction automatically rolled back by fixture

🚨 Warning Signs of Over-Mocking

Red flags that indicate over-mocking:

  • Mocking service constructors or initialization
  • Patching multiple internal modules in one test
  • 0% coverage of service files despite "passing" tests
  • Tests that never call real business logic methods
  • Mocking database sessions instead of using real ones
  • Tests that only verify mock.assert_called_with()

📊 Coverage Impact

Over-Mocked Services (Before Fix):

  • Geocoding Service: 0% real coverage
  • Tracker Fetcher: 0% real coverage
  • Health Monitoring: 0% real coverage

Properly Tested Services (After Fix):

  • Geocoding Service: 81% real coverage
  • Geofence Service: 40% real coverage

🔧 Migration Strategy

When you find over-mocked tests:

  1. Identify the real service being mocked
  2. Replace mock with real service instance
  3. Mock only external dependencies
  4. Use real database with transaction rollback
  5. Verify real business logic outcomes

Example migration:

# ❌ Before: Over-mocked
with patch("services.geocoding_service.service.GeocodingService") as mock_service:
    mock_instance = mock_service.return_value
    mock_instance.geocode_coordinate.return_value = mock_result

# ✅ After: Real testing (inside an async test, as in the example above)
from services.geocoding_service.service import GeocodingService

service = GeocodingService("test")
with patch.object(service.provider, "geocode", new_callable=AsyncMock) as mock_external_api:
    mock_external_api.return_value = {"city": "London"}
    result = await service.geocode_coordinate(lat, lon)

📋 Code Review Checklist

Before approving any test, verify:

  • Are we testing real service instances?
  • Are we mocking only external dependencies?
  • Does the test exercise actual business logic?
  • Is the service coverage >40% for the tested module?
  • Do assertions verify real business outcomes?
  • Are we using real database with transaction rollback?

🎯 Success Metrics

Healthy test suite indicators:

  • Service files have >40% real coverage
  • Tests catch real bugs during development
  • External dependencies properly isolated
  • Business logic thoroughly validated
  • Database interactions tested with real data

11. Other Anti-Patterns to Avoid

  • No hardcoded credentials: Never use hardcoded passwords or API keys
  • No test skipping: All tests must run - use pytest.mark.skip only for temporary issues
  • No empty except blocks: Always handle specific exceptions with proper error messages
  • No print statements: Use logging or test output mechanisms instead
  • No manual test data creation: Use fixtures and factory functions (see the sketch after this list)
  • No test interdependencies: Tests should not depend on execution order or other test results
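
Where a scenario needs more data than the shared fixtures provide, a small factory fixture keeps setup consistent and still relies on automatic rollback (a sketch; model fields are assumptions):

import pytest

@pytest.fixture
def make_brand(db: Session, test_client: Client):
    """Factory fixture: create brands that tell a story, cleaned up by rollback."""
    def _make_brand(name: str = "EcoFriendly Products") -> Brand:
        brand = Brand(name=name, client_id=test_client.id)
        db.add(brand)
        db.commit()
        db.refresh(brand)
        return brand
    return _make_brand

# Usage inside a behavioral test:
#   brand_a = make_brand("EcoFriendly Products")
#   brand_b = make_brand("Premium Line")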

Conclusion

This behavioral testing strategy ensures our test suite validates real business functionality while maintaining technical coverage requirements. By organizing tests around user workflows rather than code structure, we create a more maintainable, understandable, and effective test suite that serves both QA requirements and developer productivity.

The strategy preserves all existing test coverage while making tests more meaningful and easier to maintain. As we continue to migrate and enhance our test suite, we'll build a comprehensive validation system that truly reflects how the tracker system is used in practice.