UTF-8 Encoding Error Fix

Issue Description

Users were experiencing a UTF-8 encoding error when editing production runs, particularly when applying images. The error manifested as:

Database session error: 'utf-8' codec can't decode byte 0xb4 in position 7: invalid start byte
INFO:     172.31.27.121:35786 - "PUT /api/v1/production-runs/3 HTTP/1.1" 500 Internal Server Error

Root Cause

The issue was caused by two related problems:

  1. Unsafe error handling: In database session management and CRUD operations, the error handlers formatted caught exceptions with str(e). When an exception carried binary data that was not valid UTF-8, that call could itself raise a UnicodeDecodeError, which replaced the original error in the log.

  2. Geographic data serialization: The application uses PostGIS/geographic data types (WKBElement) that hold binary spatial data. Caching these objects failed because their binary content cannot be serialized directly to JSON.
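
Both failure modes can be reproduced without a database. The sketch below is illustrative only: BinaryPayloadError is a hypothetical stand-in for an exception whose message is built by decoding raw bytes, and the payload is chosen so that the secondary error matches the logged message. The exception type raised in the application may differ.

```python
import json


class BinaryPayloadError(Exception):
    """Hypothetical exception whose message is built by decoding raw bytes."""

    def __init__(self, payload: bytes):
        super().__init__(payload)
        self.payload = payload

    def __str__(self) -> str:
        # Decoding happens lazily, so the failure surfaces only when the
        # exception is formatted, for example inside an error handler.
        return self.payload.decode("utf-8")


def describe_failure(exc: Exception) -> str:
    """Return the message of the secondary error raised by str(exc), if any."""
    try:
        str(exc)
    except UnicodeDecodeError as decode_error:
        return str(decode_error)
    return "no decoding error"


# Seven ASCII bytes followed by 0xb4, which is not a valid UTF-8 start byte.
error = BinaryPayloadError(b"\x00" * 7 + b"\xb4")
print(describe_failure(error))

# Raw bytes, such as WKB geometry data, are likewise rejected by a JSON encoder.
try:
    json.dumps({"geom": b"\x01\x01\x00\x00\x00"})
except TypeError as type_error:
    print(f"JSON serialization failed: {type_error}")
```

The first print shows that formatting the exception is what fails, not the database operation that raised it.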

Solution

The fix adds safe error handling to the affected modules so that exceptions containing binary data can be logged without raising a second error, and it skips spatial objects during cache serialization. The error handling uses a three-tier fallback approach:

  1. First attempt: Try to convert the exception to a string using str(e)
  2. Second attempt: If that fails with UnicodeDecodeError, use repr(e) which safely represents binary data
  3. Final fallback: If all else fails, use a generic error message
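
The three tiers above can be collected into a single helper. This is a minimal sketch, not the code from the modified files: safe_error_message, UndecodableError, and BrokenReprError are hypothetical names introduced here for illustration. Unlike the inline pattern shown in the sections below, the sketch also guards the repr(e) call.

```python
FALLBACK_MESSAGE = "Unable to decode error message"


def safe_error_message(exc: BaseException) -> str:
    """Return a printable description of exc without raising encoding errors."""
    try:
        return str(exc)  # Tier 1: the normal message
    except UnicodeDecodeError:
        try:
            return repr(exc)  # Tier 2: escaped representation of the args
        except Exception:
            return FALLBACK_MESSAGE  # Tier 3: repr failed as well
    except Exception:
        return FALLBACK_MESSAGE  # Tier 3: str failed for another reason


class UndecodableError(Exception):
    """Hypothetical exception whose __str__ decodes invalid UTF-8."""

    def __str__(self) -> str:
        return b"\xb4".decode("utf-8")


class BrokenReprError(UndecodableError):
    """Hypothetical exception for which repr() fails too."""

    def __repr__(self) -> str:
        raise RuntimeError("repr is unavailable")


print(safe_error_message(ValueError("row not found")))
print(safe_error_message(UndecodableError(b"\xb4")))
print(safe_error_message(BrokenReprError(b"\xb4")))
```

A helper like this keeps each except block to one line, at the cost of one more import in every module that logs errors.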

Files Modified

1. app/api/deps.py

Updated the get_db() function to safely handle exceptions:

except Exception as e:
    # If there's an error, rollback the transaction
    db.rollback()
    # Safely handle exceptions that might contain binary data
    try:
        error_msg = str(e)
        print(f"Database session error: {error_msg}")
    except UnicodeDecodeError:
        # If the exception contains binary data that can't be decoded as UTF-8
        print(f"Database session error: {repr(e)}")
    except Exception:
        # Fallback for any other encoding issues
        print("Database session error: Unable to decode error message")
    raise

2. app/crud/base.py

Updated the base CRUD methods (get() and get_multi()) with safe error handling:

except Exception as e:
    # If there's an error, rollback the transaction and try again
    db.rollback()
    # Safely handle exceptions that might contain binary data
    try:
        error_msg = str(e)
        print(f"Error in base get method: {error_msg}")
    except UnicodeDecodeError:
        print(f"Error in base get method: {repr(e)}")
    except Exception:
        print("Error in base get method: Unable to decode error message")
    # Try one more time with a fresh transaction
    return db.query(self.model).filter(self.model.id == id).first()

3. app/crud/production.py

Updated all error handling blocks in the production run CRUD operations to use the same safe error handling pattern.

4. app/core/redis.py

Enhanced cache serialization to handle geographic/spatial data types:

# Handle WKBElement (geographic/spatial data): skip these because they contain binary data.
# type() works on every object, so no hasattr check is needed.
if 'WKBElement' in str(type(value)):
    return None  # Skip WKBElement objects to avoid serialization issues

# Handle other geographic/spatial types that might contain binary data.
# Note: this is a substring match, so any type whose name contains one of these words is also skipped.
if any(geo_type in str(type(value)) for geo_type in ('Geometry', 'Point', 'Polygon', 'LineString')):
    return None  # Skip geographic objects that might contain binary data

Also updated the orjson_default function to handle these types safely.
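
A fallback hook of this kind might look like the sketch below. It is written against the standard library json module so that it runs without extra dependencies; orjson.dumps accepts a default callable in the same way. The names spatial_safe_default and SPATIAL_TYPE_NAMES, and the stand-in WKBElement class, are assumptions for illustration and are not taken from app/core/redis.py. The sketch matches class names exactly, which is a deliberate deviation from the substring check above.

```python
import json

# Class names treated as spatial types (assumed list, taken from the checks above).
SPATIAL_TYPE_NAMES = {"WKBElement", "Geometry", "Point", "Polygon", "LineString"}


def is_spatial(value) -> bool:
    """True if the value's class, or any of its base classes, has a spatial name."""
    return any(cls.__name__ in SPATIAL_TYPE_NAMES for cls in type(value).__mro__)


def spatial_safe_default(value):
    """Fallback hook for values the JSON encoder cannot serialize natively."""
    if is_spatial(value):
        return None  # Spatial objects are written as null instead of failing
    # Anything else is still reported as unserializable.
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


class WKBElement:
    """Stand-in for a spatial element that wraps binary WKB data."""

    def __init__(self, data: bytes):
        self.data = data


record = {"id": 3, "location": WKBElement(b"\x01\x01\x00\x00\x00")}
encoded = json.dumps(record, default=spatial_safe_default)
print(encoded)  # {"id": 3, "location": null}
```

Writing null keeps the rest of the record cacheable, but a reader of the cache cannot tell a skipped geometry from a missing one, so callers that need the geometry have to reload it from the database.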

Testing

A comprehensive test script (test_utf8_encoding_fix.py) was created to verify the fix:

  • Tests that UTF-8 encoding errors are handled gracefully
  • Tests that normal exceptions still work correctly
  • Simulates the exact error condition that was causing the issue
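
The contents of test_utf8_encoding_fix.py are not shown here. As a rough sketch of what such checks could look like, the example below captures the printed log line and confirms that the original exception still propagates. UndecodableError, log_and_reraise, and run_and_capture are hypothetical names; log_and_reraise only mirrors the handler pattern used in get_db().

```python
import contextlib
import io


class UndecodableError(Exception):
    """Hypothetical exception whose __str__ fails like the reported error."""

    def __str__(self) -> str:
        return (b"\x00" * 7 + b"\xb4").decode("utf-8")


def log_and_reraise(operation):
    """Run operation; on failure, log safely and re-raise the original error."""
    try:
        return operation()
    except Exception as e:
        try:
            print(f"Database session error: {str(e)}")
        except UnicodeDecodeError:
            print(f"Database session error: {repr(e)}")
        except Exception:
            print("Database session error: Unable to decode error message")
        raise


def run_and_capture(operation):
    """Return (exception raised by the handler or None, captured stdout)."""
    buffer = io.StringIO()
    raised = None
    with contextlib.redirect_stdout(buffer):
        try:
            log_and_reraise(operation)
        except Exception as exc:
            raised = exc
    return raised, buffer.getvalue()


def test_undecodable_error_is_logged_and_reraised():
    def operation():
        raise UndecodableError(b"\xb4")

    raised, output = run_and_capture(operation)
    assert isinstance(raised, UndecodableError)  # original error, not UnicodeDecodeError
    assert "UndecodableError" in output  # repr() fallback was logged


def test_normal_error_message_is_unchanged():
    def operation():
        raise ValueError("row not found")

    raised, output = run_and_capture(operation)
    assert isinstance(raised, ValueError)
    assert output == "Database session error: row not found\n"
```

Both functions follow the naming convention that pytest collects automatically, and they can also be called directly from a plain script.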

Benefits

  1. Prevents crashes: Error handlers no longer fail with a secondary UnicodeDecodeError when an exception contains binary data
  2. Better debugging: Error messages are still logged, but safely
  3. Graceful degradation: The application continues to function even when encountering encoding issues
  4. Geographic data handling: WKBElement and other spatial data types are now handled safely in cache serialization
  5. Backward compatibility: Normal exceptions continue to work as expected

Usage

The fix applies automatically to database operations that go through get_db() and the updated CRUD classes. No changes are required in application code or configuration.

Verification

To verify the fix is working:

  1. Run the test script: python test_utf8_encoding_fix.py
  2. All UTF-8 encoding error handling tests should pass
  3. Try editing production runs with images - the operation should complete without UTF-8 encoding errors

Summary

This fix addresses the failure where editing production runs with images returned a 500 error caused by a UTF-8 decoding error. The root cause was a combination of:

  1. Unsafe error handling when exceptions contained binary data
  2. Cache serialization failures when serializing geographic data (WKBElement objects)

The safer error handling and spatial data serialization make the whole application more robust against encoding-related issues, particularly when working with spatial/geographic data.