Error Handling & Retry Logic
Municipal GIS data pipelines operate in inherently unstable environments. County APIs enforce strict rate limits, zoning PDFs shift encoding standards mid-cycle, and coordinate reference systems drift across jurisdictions. When architecting an Automated Zoning Change & Municipal GIS Tracking system, transient failures are not exceptions—they are baseline conditions. Without deterministic error handling and retry logic, partial geometries, orphaned parcel records, and silent compliance gaps will corrupt downstream PropTech analytics. This guide details production-grade patterns for isolating failures, recovering gracefully, and maintaining spatial integrity across automated ingestion cycles.
Pipeline Architecture and Failure Boundaries jump to heading
Retry logic must be scoped to specific pipeline stages rather than applied globally. A blanket retry on a malformed GeoJSON wastes compute cycles and masks systemic data quality issues. Implement tiered error boundaries:
- Network/Transport Layer: HTTP 429, 502, 504, TLS drops, and socket timeouts. Safe to retry with exponential backoff.
- Parsing/Extraction Layer: Malformed XML, broken PDF encodings, missing spatial references. Requires validation gates and fallback parsers.
- Spatial/Topology Layer: Invalid polygons, self-intersections, CRS mismatches. Requires geometric repair routines or dead-letter routing.
- Compliance/Write Layer: Database constraint violations, duplicate zoning IDs, transaction rollbacks. Requires idempotent upserts and immutable audit logging.
Integrating these boundaries into your PDF & HTML Scraping Pipelines ensures that document extraction failures do not cascade into geometry corruption. The retry strategy should be declarative, state-aware, and bounded by maximum attempts and total timeout windows.
Exponential Backoff with Jitter in Python jump to heading
Municipal servers frequently throttle bulk requests. Naive linear retries trigger IP bans or queue starvation. Use tenacity for declarative retry decorators that combine exponential backoff, jitter, and custom retry predicates. Refer to the official Tenacity documentation for advanced circuit breaker integrations.
import tenacity
import requests
import logging
from requests.exceptions import HTTPError, ConnectionError, Timeout
logger = logging.getLogger(__name__)
def is_retryable_error(exception: Exception) -> bool:
if isinstance(exception, (ConnectionError, Timeout)):
return True
if isinstance(exception, HTTPError):
status = exception.response.status_code
return status in (429, 500, 502, 503, 504)
return False
@tenacity.retry(
retry=tenacity.retry_if_exception(is_retryable_error),
wait=tenacity.wait_exponential(multiplier=2, min=1, max=30),
stop=tenacity.stop_after_attempt(5) | tenacity.stop_after_delay(120),
before_sleep=tenacity.before_sleep_log(logger, logging.WARNING),
reraise=True
)
def fetch_zoning_parcel(parcel_id: str, endpoint: str, headers: dict) -> dict:
response = requests.get(f"{endpoint}/{parcel_id}", headers=headers, timeout=15)
response.raise_for_status()
return response.json()
The stop condition enforces both attempt and wall-clock limits, preventing runaway processes during prolonged municipal outages. Adding jitter (via tenacity.wait_random_exponential or custom wait chains) prevents thundering herd problems when multiple workers restart simultaneously.
Spatial Validation and Dead-Letter Routing jump to heading
Network retries are useless if the payload contains invalid geometries. Spatial validation must occur immediately after parsing. Use shapely and geopandas to enforce OGC Simple Features compliance before any database write. See the Shapely validation module for topology repair specifications.
import geopandas as gpd
from shapely.validation import make_valid
from shapely.geometry import shape
def validate_and_repair_geometry(feature: dict, target_crs: str = "EPSG:4326") -> gpd.GeoDataFrame:
try:
geom = shape(feature["geometry"])
if not geom.is_valid:
geom = make_valid(geom)
if geom.is_empty or geom.area == 0:
raise ValueError("Degenerate geometry after repair")
return gpd.GeoDataFrame([feature], geometry=[geom], crs=target_crs)
except Exception as e:
logger.error(f"Unrepairable geometry for feature {feature.get('id')}: {e}")
raise ValueError("Dead-letter routing required")
When make_valid fails or produces degenerate geometries, route the record to a quarantine table rather than retrying. Spatial corruption requires human-in-the-loop review or automated topology correction via GIS Export Sync Workflows.
Idempotent Upserts and Transactional Safety jump to heading
Database writes must survive partial failures. Implement idempotent upserts using PostgreSQL ON CONFLICT clauses or SQLAlchemy equivalents. Combine this with transactional isolation to prevent phantom reads during zoning boundary updates:
from sqlalchemy import text
from sqlalchemy.exc import IntegrityError
def upsert_zoning_record(engine, record: dict):
stmt = text("""
INSERT INTO zoning_parcels (parcel_id, zoning_code, geometry, updated_at)
VALUES (:parcel_id, :zoning_code, ST_GeomFromText(:geom, 4326), NOW())
ON CONFLICT (parcel_id) DO UPDATE SET
zoning_code = EXCLUDED.zoning_code,
geometry = EXCLUDED.geometry,
updated_at = NOW()
""")
try:
with engine.begin() as conn:
conn.execute(stmt, record)
except IntegrityError as e:
logger.warning(f"Constraint violation for parcel {record['parcel_id']}: {e}")
# Log to audit trail, do not retry blindly
Idempotency guarantees that retrying a network timeout won’t duplicate records or violate unique constraints. Always pair upserts with immutable audit logs tracking source hash, retry count, and final disposition.
Observability and Circuit Breakers jump to heading
Error handling extends beyond code. Integrate retry metrics into your observability stack. Track retry_attempts, dead_letter_count, and spatial_repair_rate per municipality. Implement circuit breakers to halt ingestion when failure rates exceed thresholds (e.g., >15% over 5 minutes). This prevents resource exhaustion during county system migrations or planned maintenance windows.
Resilient municipal GIS pipelines treat errors as first-class citizens. By scoping retries to transport layers, enforcing spatial validation gates, and implementing idempotent writes, teams can maintain data integrity across volatile county feeds. Pair these patterns with structured logging and dead-letter routing to ensure that every zoning change, parcel split, and boundary adjustment is captured, validated, and traceable.