Behavioral Testing Strategy
Overview
This document outlines our behavioral testing approach for the tracker system, which prioritizes testing user workflows and business scenarios over code structure coverage. This strategy ensures our tests validate real-world functionality while maintaining the >80% coverage requirement for QA.
Philosophy: Behavior-Driven vs Coverage-Driven Testing
Traditional Coverage-Driven Approach (What We Moved Away From)
tests/
├── crud/
│ ├── test_brand.py # Tests CRUD operations by file location
│ └── test_client.py # Tests CRUD operations by file location
├── models/
│ └── test_status.py # Tests models by file location
└── schemas/
└── test_client.py # Tests schemas by file location
Problems with this approach:
- Tests are organized around code structure, not user needs
- Hard to understand what business functionality is being validated
- Difficult to identify gaps in workflow coverage
- Tests often become brittle and disconnected from real usage
New Behavioral Approach (Current Strategy)
tests/
├── behaviors/ # Business workflow tests
│ ├── brand_management/
│ │ ├── test_brand_crud_operations.py # Brand lifecycle workflows
│ │ ├── test_brand_client_relationships.py # Brand-client associations
│ │ └── test_brand_validation.py # Brand business rules
│ ├── client_management/
│ │ ├── test_client_crud_operations.py # Client account workflows
│ │ ├── test_client_brand_associations.py # Multi-brand scenarios
│ │ └── test_client_validation.py # Client business rules
│ └── tracker_lifecycle/
│ ├── test_tracker_status_management.py # Status transition workflows
│ └── test_tracker_operations.py # Tracker CRUD workflows
├── integration/ # Cross-feature workflows
│ └── test_complete_workflows.py # End-to-end scenarios
└── unit/ # Simple utility tests
├── test_config.py # Configuration utilities
└── test_url_utils.py # URL processing utilities
Benefits of this approach:
- Tests tell a story about system behavior
- Easy to identify missing business scenarios
- Tests are organized around user workflows
- Easier maintenance when features change
- Better gap identification in functionality coverage
Test Structure and Naming Conventions
File Organization
- tests/behaviors/: Main behavioral tests organized by business domain
- tests/integration/: Cross-domain workflow tests
- tests/unit/: Simple utility and configuration tests
Test Class Naming
Use descriptive names that reflect the business behavior being tested:
# ✅ Good: Describes business behavior
class TestBrandLifecycleManagement:
class TestMultiBrandClientManagement:
class TestClientAccountManagement:
# ❌ Bad: Describes code structure
class TestBrandCRUD:
class TestClientModel:
Test Method Naming
Use descriptive names that tell a story about the business scenario:
# ✅ Good: Describes business scenario
def test_client_creates_new_brand_for_product_line(self):
def test_client_updates_brand_information_for_rebranding(self):
def test_system_handles_request_for_nonexistent_brand(self):
# ❌ Bad: Describes technical operation
def test_create_brand(self):
def test_update_brand(self):
def test_get_brand_not_found(self):
Test Documentation Format
Each test should include comprehensive documentation:
def test_client_creates_new_brand_for_product_line(self, db: Session, test_client: Client) -> None:
    """
    BEHAVIOR: When a client wants to launch a new product line, they create a brand
    BUSINESS SCENARIO: A client (e.g., "Acme Corp") wants to launch a new product line
    called "EcoFriendly Products" with their own branding and logo.
    COVERAGE: app/crud/brand.py create() method
    """
Coverage Strategy
Maintaining QA Requirements
- Target: >80% overall code coverage
- Method: Behavioral tests naturally exercise multiple code paths
- Tracking: Coverage is tracked by business domain, not just by file
Coverage Mapping
Each behavioral test documents which code it covers:
"""
Brand Management CRUD Operations - Behavioral Tests
BEHAVIOR FOCUS: Tests the complete brand management lifecycle from a business perspective.
This covers how clients create, manage, update, and organize their brands within the system.
COVERAGE: Provides 100% coverage for app/crud/brand.py (13/13 lines)
"""
Integration Test Coverage
Integration tests exercise multiple modules in realistic workflows:
def test_complete_brand_creation_to_tracker_assignment(self):
    """
    BEHAVIOR: Complete workflow from brand creation to tracker assignment
    COVERAGE: Exercises multiple modules in realistic sequence:
    - app/crud/brand.py
    - app/crud/client.py
    - app/crud/tracker.py
    - app/api/routes/brands.py
    """
Implementation Guidelines
1. Starting a New Behavioral Test Suite
When creating tests for a new business domain:
- Identify the business workflows - What are the main user scenarios?
- Create the domain directory - tests/behaviors/domain_name/
- Organize by workflow type - CRUD operations, relationships, validations
- Write scenario-based tests - Focus on user stories, not code coverage
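A minimal skeleton for a new domain might look like the sketch below (the location_management domain, file name, and app/crud/location.py path are hypothetical placeholders; substitute the real module names):

# tests/behaviors/location_management/test_location_crud_operations.py (hypothetical)
import pytest
from sqlalchemy.orm import Session


@pytest.mark.behavioral
class TestLocationLifecycleManagement:
    """Core location registration workflows"""

    def test_client_registers_new_warehouse_location(self, db: Session, test_client) -> None:
        """
        BEHAVIOR: When a client opens a new warehouse, they register it as a location
        BUSINESS SCENARIO: "Acme Corp" opens a distribution warehouse and wants its trackers grouped there.
        COVERAGE: app/crud/location.py create() method (hypothetical path)
        """
        ...  # scenario-based assertions go here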
2. Converting Existing Tests
When converting coverage-driven tests to behavioral tests:
- Preserve all test logic - Don't lose existing coverage
- Add behavioral context - Enhance with business scenario documentation
- Reorganize by workflow - Group related tests by business function
- Enhance with integration - Add workflow tests that span multiple components
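As a rough before/after sketch of such a conversion (the crud.brand.create() call and BrandCreate schema are illustrative names, not a confirmed API; the test logic is carried over unchanged):

# Before: tests/crud/test_brand.py - organized by code structure
def test_create_brand(self, db: Session) -> None:
    brand = crud.brand.create(db, obj_in=BrandCreate(name="Brand A"))  # illustrative helper
    assert brand.id is not None

# After: tests/behaviors/brand_management/test_brand_crud_operations.py - same logic, behavioral framing
def test_client_creates_new_brand_for_product_line(self, db: Session, test_client: Client) -> None:
    """
    BEHAVIOR: When a client wants to launch a new product line, they create a brand
    BUSINESS SCENARIO: "Acme Corp" launches "EcoFriendly Products" with its own branding.
    COVERAGE: app/crud/brand.py create() method
    """
    brand = crud.brand.create(db, obj_in=BrandCreate(name="EcoFriendly Products"))
    assert brand.id is not None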
3. Test Class Organization
Organize test classes around business behaviors:
class TestBrandLifecycleManagement:
    """Core brand CRUD operations and lifecycle"""

class TestMultiBrandClientManagement:
    """Scenarios with multiple brands per client"""

class TestBrandProductionIntegration:
    """How brands integrate with production workflows"""

class TestBrandBusinessRules:
    """Business rules and edge cases"""

class TestBrandSystemIntegrity:
    """System integrity and configuration validation"""
Running Tests
Running Behavioral Tests
# Run all behavioral tests
./run_tests_with_coverage.sh tests/behaviors/
# Run specific business domain
./run_tests_with_coverage.sh tests/behaviors/brand_management/
# Run specific workflow tests
./run_tests_with_coverage.sh tests/behaviors/brand_management/test_brand_crud_operations.py
Running Integration Tests
# Run all integration tests
./run_tests_with_coverage.sh tests/integration/
# Run specific integration workflow
./run_tests_with_coverage.sh tests/integration/test_complete_workflows.py
Coverage Analysis
# Generate coverage report
./run_tests_with_coverage.sh
# View coverage in VSCode
# Install 'Coverage Gutters' extension
# Use Ctrl+Shift+P -> 'Coverage Gutters: Display Coverage'
Benefits for Developers
1. Better Understanding
- Tests clearly communicate what the system should do
- New developers can understand business logic by reading tests
- Tests serve as living documentation of system behavior
2. Easier Maintenance
- When business requirements change, relevant tests are grouped together
- Test failures clearly indicate which business workflows are affected
- Easier to identify missing test coverage for new features
3. Improved Quality
- Tests validate complete user workflows, not just isolated functions
- Integration tests catch issues that unit tests might miss
- Business rules and edge cases are explicitly tested
4. Better Debugging
- Test names clearly indicate what business scenario failed
- Test organization makes it easy to find related functionality
- Comprehensive scenario documentation aids in troubleshooting
Migration Strategy
Phase 1: Foundation (Completed)
- ✅ Created behavioral directory structure
- ✅ Converted brand management tests to behavioral approach
- ✅ Maintained 100% coverage for converted modules
- ✅ Documented testing strategy
Phase 2: Core Domains (In Progress)
- ✅ Health monitoring behavioral tests (41 comprehensive tests)
- ✅ Geofence service behavioral tests (40% coverage, real testing)
- Fixed critical over-mocking: Replaced mocked service instances with real service testing
- Fixed SQLAlchemy issues: Corrected func.case() and func.or_() usage
- Fixed datetime deprecations: Replaced datetime.utcnow() with datetime.now(UTC)
- ✅ Geocoding service behavioral tests (81% coverage, 21 comprehensive tests)
- Fixed critical over-mocking: Transformed from 0% to 81% real coverage
- Real service testing: Tests now use actual GeocodingService instances
- External API isolation: Only Nominatim API calls are mocked
- Database integration: Real database interactions with transaction rollback
- 🔄 Convert client management tests
- 🔄 Convert tracker lifecycle tests
- 🔄 Fix remaining over-mocked services (tracker fetcher, health monitoring)
- 🔄 Add integration workflow tests
- 🔄 Update TODO tracking to reflect behavioral organization
Phase 3: Advanced Workflows (Planned)
- ⏳ Production run workflows
- ⏳ Location management workflows
- ⏳ Complex multi-client scenarios
- ⏳ End-to-end system workflows
Phase 4: Optimization (Planned)
- ⏳ Remove redundant coverage-driven tests
- ⏳ Optimize test performance
- ⏳ Add advanced integration scenarios
- ⏳ Create test data factories for complex scenarios
Best Practices
1. Test Documentation
- Always include BEHAVIOR, BUSINESS SCENARIO, and COVERAGE sections
- Use real-world examples in scenario descriptions
- Document the business value being tested
2. Test Data
- Use meaningful test data that reflects real business scenarios
- Create test data that tells a story (e.g., "EcoFriendly Products" brand)
- Use existing fixtures when possible to maintain consistency
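For example, a sketch of story-driven test data (the BrandCreate schema and its fields are assumptions used for illustration):

def test_client_launches_seasonal_product_line(self, db: Session, test_client: Client) -> None:
    # Data that tells a story instead of "brand1" / "foo" (field names are illustrative)
    brand_in = BrandCreate(
        name="EcoFriendly Products",
        description="Sustainable packaging line for the autumn launch",
        client_id=test_client.id,  # reuse the shared test_client fixture
    )
    brand = crud.brand.create(db, obj_in=brand_in)
    assert brand.name == "EcoFriendly Products"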
3. Assertions
- Assert business outcomes, not just technical correctness
- Include assertions that validate the complete business scenario
- Test both positive and negative business scenarios
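A short sketch of business-outcome assertions (the brands relationship attribute and helper name are assumptions about the models):

brand = crud.brand.create(db, obj_in=brand_in)   # illustrative helper name
assert brand.client_id == test_client.id         # the brand belongs to the right client
assert brand in test_client.brands               # the client sees it in their portfolio (assumed relationship)
assert brand.name == "EcoFriendly Products"      # the exact brand the client asked for exists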
4. Error Handling
- Test business error scenarios (e.g., "client tries to access deleted brand")
- Ensure error handling preserves business logic integrity
- Test edge cases that could occur in real usage
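A minimal sketch of one such scenario, assuming the CRUD layer returns None for missing records (adjust if it raises instead):

def test_system_handles_request_for_deleted_brand(self, db: Session, test_brand) -> None:
    """
    BEHAVIOR: Accessing a brand that was removed must fail safely
    BUSINESS SCENARIO: A client deletes a discontinued brand, then an old bookmark requests it.
    """
    crud.brand.remove(db, id=test_brand.id)        # illustrative helper names
    result = crud.brand.get(db, id=test_brand.id)
    assert result is None                          # assumed "not found" contract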
5. Fixture Usage and Transaction Management
- Use shared fixtures: Always use fixtures from tests/behaviors/fixtures.py and conftest.py
- Leverage automatic rollbacks: The db fixture automatically handles transaction rollbacks - no manual cleanup needed
- No test data cruft: Tests should not leave behind data or require manual cleanup
- Consistent test users: Use the admin_user, regular_user, test_client, and test_brand fixtures for consistency
- Secure test passwords: Use the test_password fixture for secure, randomly generated passwords
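Pulled together, a fixture-driven test might look like this sketch (the query helper name is illustrative; rollback is handled entirely by the db fixture):

def test_client_sees_their_existing_brands(self, db: Session, test_client: Client, test_brand) -> None:
    # No manual setup or teardown: fixtures provide the client and brand, the db fixture rolls everything back
    brands = crud.brand.get_multi_by_client(db, client_id=test_client.id)  # illustrative helper
    assert test_brand in brands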
6. Mocking Strategy
- Mock external dependencies only: Mock external APIs, services, and network calls
- Don't mock internal business logic: Test actual business logic paths, not mocked versions
- Use proper async mocking: Use AsyncMock for async methods, MagicMock for sync methods
- Mock at the right level: Mock at service boundaries (e.g., HTTP clients, external providers)
- Verify mock interactions: Assert that mocks were called with correct parameters
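As a compact sketch, reusing the geocoding example from the anti-pattern section below (the provider call signature is an assumption):

service = GeocodingService("test")

# Mock at the service boundary: only the external provider call is replaced
with patch.object(service.provider, "geocode", new_callable=AsyncMock) as mock_api:
    mock_api.return_value = {"nearest_city": "London"}
    result = asyncio.run(service.geocode_coordinate(51.5074, -0.1278))

# Verify the mock interaction: the provider received the coordinates we passed in
mock_api.assert_called_once_with(51.5074, -0.1278)  # assumed provider signature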
7. Test Class Organization
- Use @pytest.mark.behavioral: All behavioral test classes must have this decorator
- Group by business workflow: Organize test classes around business behaviors, not technical structure
- Descriptive class names: Use names that describe business scenarios (e.g., TestBrandLifecycleManagement)
- Logical test grouping: Group related business scenarios within the same test class
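For example, a sketch combining the marker and business-oriented grouping:

@pytest.mark.behavioral
class TestBrandLifecycleManagement:
    """Core brand CRUD operations and lifecycle"""

    def test_client_creates_new_brand_for_product_line(self, db: Session, test_client: Client) -> None: ...
    def test_client_updates_brand_information_for_rebranding(self, db: Session, test_brand) -> None: ...

@pytest.mark.behavioral
class TestMultiBrandClientManagement:
    """Scenarios with multiple brands per client"""

    def test_client_manages_portfolio_of_brands(self, db: Session, test_client: Client) -> None: ...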
8. Database and State Management
- No manual database cleanup: Rely on automatic transaction rollbacks
- Use database session properly: Always use the db: Session fixture parameter
- Test data isolation: Each test should be independent and not rely on other test data
- Commit when needed: Use db.commit() when testing scenarios that require committed data
- Refresh objects: Use db.refresh(obj) after commits to get updated state
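A brief sketch of commit-and-refresh handling, with cleanup left to the fixture:

def test_brand_rename_is_persisted(self, db: Session, test_brand) -> None:
    test_brand.name = "EcoFriendly Products"
    db.commit()               # this scenario needs committed data
    db.refresh(test_brand)    # pick up database-side defaults or triggers
    assert test_brand.name == "EcoFriendly Products"
    # No cleanup: the db fixture rolls the transaction back after the test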
9. Async Testing Patterns
- Use asyncio.run(): For testing async service methods in sync test functions
- Use @pytest.mark.asyncio: For tests that are themselves async functions
- Mock async dependencies: Use AsyncMock with the new_callable=AsyncMock parameter
- Handle event loops: Be aware of event loop management in async tests
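Both patterns in one sketch, mirroring the geocoding example below (the constructor argument and result shape are taken from that example, not a confirmed API):

# Pattern 1: drive an async service method from a sync test function
def test_geocode_from_sync_test(self, db: Session) -> None:
    service = GeocodingService("test")
    with patch.object(service.provider, "geocode", new_callable=AsyncMock) as mock_api:
        mock_api.return_value = {"nearest_city": "London"}
        result = asyncio.run(service.geocode_coordinate(51.5074, -0.1278))
    assert result.nearest_city == "London"

# Pattern 2: the test itself is async
@pytest.mark.asyncio
async def test_geocode_from_async_test(self, db: Session) -> None:
    service = GeocodingService("test")
    with patch.object(service.provider, "geocode", new_callable=AsyncMock) as mock_api:
        mock_api.return_value = {"nearest_city": "London"}
        result = await service.geocode_coordinate(51.5074, -0.1278)
    assert result.nearest_city == "London"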
10. Critical Anti-Pattern: Over-Mocking Our Own Code
⚠️ CRITICAL ISSUE: One of the most dangerous testing anti-patterns is over-mocking our own business logic instead of testing it. This creates a false sense of security where tests pass but real bugs go undetected.
❌ What NOT to Mock (Our Own Code)
Never mock these internal components:
- Our service classes and business logic methods
- Our database models and relationships
- Our internal APIs and processing logic
- Our queue management systems
- Our health monitoring logic
- Our caching and optimization logic
❌ Bad Example: Over-Mocking Our Service
# ❌ DANGEROUS: Mocking our own service
def test_geocoding_workflow(self, db: Session):
    with patch("services.geocoding_service.service.GeocodingService") as mock_service:
        mock_instance = mock_service.return_value
        mock_instance.geocode_coordinate.return_value = expected_result

        # This tests NOTHING - just mock behavior!
        result = mock_instance.geocode_coordinate(lat, lon)
        assert result == expected_result  # Always passes!
Problems with this approach:
- Tests validate mock behavior, not real functionality
- Real bugs in business logic go undetected
- 0% coverage of actual service code
- False confidence in test suite
- Regressions not caught until production
✅ What TO Mock (External Dependencies)
Always mock these external components:
- External APIs (Nominatim, Apple services, payment gateways)
- File system operations and network calls
- Third-party libraries and services
- Infrastructure components (when testing business logic)
- Time-dependent operations (for deterministic tests)
✅ Good Example: Testing Real Service
# ✅ CORRECT: Test real service, mock external APIs only
@pytest.mark.asyncio
async def test_geocoding_workflow(self, db: Session):
    from services.geocoding_service.service import GeocodingService

    # Create REAL service instance
    service = GeocodingService("test_geocoding")

    # Mock ONLY external API (Nominatim)
    with patch.object(service.provider, "geocode", new_callable=AsyncMock) as mock_api:
        mock_api.return_value = {"nearest_city": "London"}

        # Test REAL service method
        result = await service.geocode_coordinate(51.5074, -0.1278)

    # Verify REAL business logic
    assert result.nearest_city == "London"
    assert result.cache_hit is False
    assert result.lat_rounded == 51.51  # Real coordinate rounding

    # Verify cache was created in database
    cache_entry = db.query(GeocodingCache).filter(...).first()
    assert cache_entry.nearest_city == "London"
🎯 The Correct Testing Pattern
1. Test Real Business Logic:
# ✅ Create real service instances
service = GeocodingService("test")
tracker_service = TrackerService(db)
health_monitor = HealthMonitor()
# ✅ Test real methods with real parameters
result = service.process_batch_geocoding(max_locations=10)
status = tracker_service.update_tracker_status(tracker_id, "DELIVERED")
health = health_monitor.check_service_health()
2. Mock External Dependencies Only:
# ✅ Mock external APIs
with patch.object(service.provider, "geocode") as mock_api:
    mock_api.return_value = {"city": "London"}

# ✅ Mock external services
with patch("requests.get") as mock_http:
    mock_http.return_value.json.return_value = {"status": "ok"}

# ✅ Mock file operations
with patch("builtins.open", mock_open(read_data="test")) as mock_file:
    ...
3. Use Real Database with Transaction Rollback:
# ✅ Real database interactions
def test_service_workflow(self, db: Session):
    # Create real test data (automatically rolled back)
    tracker = Tracker(name="Test Tracker")
    db.add(tracker)
    db.commit()

    # Test real service with real database
    service = TrackerService(db)
    result = service.process_tracker(tracker.id)

    # Verify real database changes
    updated_tracker = db.query(Tracker).filter(Tracker.id == tracker.id).first()
    assert updated_tracker.status == "PROCESSED"

    # Transaction automatically rolled back by fixture
🚨 Warning Signs of Over-Mocking
Red flags that indicate over-mocking:
- Mocking service constructors or initialization
- Patching multiple internal modules in one test
- 0% coverage of service files despite "passing" tests
- Tests that never call real business logic methods
- Mocking database sessions instead of using real ones
- Tests that only verify mock.assert_called_with()
📊 Coverage Impact
Over-Mocked Services (Before Fix):
- Geocoding Service: 0% real coverage
- Tracker Fetcher: 0% real coverage
- Health Monitoring: 0% real coverage
Properly Tested Services (After Fix):
- Geocoding Service: 81% real coverage
- Geofence Service: 40% real coverage
🔧 Migration Strategy
When you find over-mocked tests:
- Identify the real service being mocked
- Replace mock with real service instance
- Mock only external dependencies
- Use real database with transaction rollback
- Verify real business logic outcomes
Example migration:
# ❌ Before: Over-mocked
with patch("services.geocoding_service.service.GeocodingService") as mock_service:
mock_instance = mock_service.return_value
mock_instance.geocode_coordinate.return_value = mock_result
# ✅ After: Real testing
from services.geocoding_service.service import GeocodingService
service = GeocodingService("test")
with patch.object(service.provider, "geocode") as mock_external_api:
mock_external_api.return_value = {"city": "London"}
result = await service.geocode_coordinate(lat, lon)
📋 Code Review Checklist
Before approving any test, verify:
- Are we testing real service instances?
- Are we mocking only external dependencies?
- Does the test exercise actual business logic?
- Is the service coverage >40% for the tested module?
- Do assertions verify real business outcomes?
- Are we using real database with transaction rollback?
🎯 Success Metrics
Healthy test suite indicators:
- Service files have >40% real coverage
- Tests catch real bugs during development
- External dependencies properly isolated
- Business logic thoroughly validated
- Database interactions tested with real data
11. Other Anti-Patterns to Avoid
- No hardcoded credentials: Never use hardcoded passwords or API keys
- No test skipping: All tests must run - use pytest.mark.skip only for temporary issues
- No empty except blocks: Always handle specific exceptions with proper error messages
- No print statements: Use logging or test output mechanisms instead
- No manual test data creation: Use fixtures and factory functions
- No test interdependencies: Tests should not depend on execution order or other test results
Conclusion
This behavioral testing strategy ensures our test suite validates real business functionality while maintaining technical coverage requirements. By organizing tests around user workflows rather than code structure, we create a more maintainable, understandable, and effective test suite that serves both QA requirements and developer productivity.
The strategy preserves all existing test coverage while making tests more meaningful and easier to maintain. As we continue to migrate and enhance our test suite, we'll build a comprehensive validation system that truly reflects how the tracker system is used in practice.