ADR-005: Observability Boundaries

Status: ACCEPTED (Retrospectively documented)
Date: 2026-04-03
Phase: 3 (Observability) - Delivered 2026-03-27
Decision Date: 2026-03-27
Author: Forward Team
Supersedes: n/a
Superseded by: n/a

Context

Phase 3 (Observability) required a clear separation between SAFETY and OBSERVABILITY concerns to prevent observability failures from blocking trading operations. The system needed to handle monitoring, reporting, and alerting without interfering with the critical execution path.

Decision

We establish a strict boundary between SAFETY and OBSERVABILITY domains:

Boundary Rules

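| Domain        | Responsibility          | Failure Impact | Examples                                         |
|---------------|-------------------------|----------------|--------------------------------------------------|
| SAFETY        | Trading-critical checks | BLOCKS trading | Circuit breaker, position limits, balance checks |
| OBSERVABILITY | Monitoring & reporting  | Never blocks   | Health checks, logs, reports, metrics            |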

Implementation

┌─────────────────────────────────────────────────────────────┐
│                      SAFETY DOMAIN                          │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │ Circuit      │  │ Position     │  │ Balance      │        │
│  │ Breaker      │  │ Risk Engine  │  │ Projection   │        │
│  └──────────────┘  └──────────────┘  └──────────────┘        │
│                                                             │
│  Failure → BLOCK trading → Event store → Recovery         │
└─────────────────────────────────────────────────────────────┘
                            ▼ (triggers, never blocks)
┌─────────────────────────────────────────────────────────────┐
│                   OBSERVABILITY DOMAIN                       │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │ Health       │  │ Logger       │  │ Report       │        │
│  │ Checker      │  │ Service      │  │ Service      │        │
│  └──────────────┘  └──────────────┘  └──────────────┘        │
│                                                             │
│  Failure → WARN only → Never affects SAFETY                 │
└─────────────────────────────────────────────────────────────┘
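The "triggers, never blocks" arrow can be sketched as a one-way notification channel. This is a minimal illustration, not the delivered implementation: ObservabilityBus, subscribe, publish, and the event shape are all hypothetical names. The point is that a failing OBSERVABILITY subscriber, whether it throws synchronously or returns a rejected promise, cannot raise an error back into the SAFETY caller.

```javascript
// Hypothetical sketch: ObservabilityBus and its methods are illustrative
// names, not taken from the codebase described in this ADR.
class ObservabilityBus {
  constructor() {
    this.subscribers = [];
    this.dropped = 0; // number of subscriber failures that were swallowed
  }

  subscribe(handler) {
    this.subscribers.push(handler);
  }

  // Called from SAFETY code. Never throws, and returns nothing to await.
  publish(event) {
    for (const handler of this.subscribers) {
      try {
        const result = handler(event);
        // Swallow asynchronous failures as well as synchronous ones.
        if (result && typeof result.catch === 'function') {
          result.catch(() => { this.dropped += 1; });
        }
      } catch (err) {
        this.dropped += 1;
      }
    }
  }
}

const bus = new ObservabilityBus();
const seen = [];
bus.subscribe(() => { throw new Error('report service down'); });
bus.subscribe((event) => { seen.push(event.type); });

// SAFETY side: publish cannot raise, so the blocking decision is not
// affected by the failing subscriber registered above.
bus.publish({ type: 'circuit_opened', reason: 'memory_critical' });
```

One limitation of this sketch: a subscriber that runs slowly and synchronously would still delay the caller, so handlers are assumed to be cheap or to defer their own work.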

Consequences

Positive

  • Trading reliability: OBSERVABILITY failures never halt trading
  • Clear operational model: Teams know which failures require immediate action
  • Testable separation: G3 Recovery tests verify boundary adherence
  • Simplified incident response: SAFETY alerts = stop trading; OBSERVABILITY alerts = investigate

Negative

  • Duplication: Some checks exist in both domains (e.g., memory is watched by the health checker for reporting and also enforced by the mandatory circuit breaker)
  • Monitoring blind spots: If OBSERVABILITY fails, we may not detect issues until SAFETY triggers

Boundary Enforcement

Code Patterns

// SAFETY check - can block
circuitBreaker.recordSafetyFailure('memory_critical', details);
if (circuitBreaker.isOpen()) {
  blockTrading(); // HARD STOP
}

// OBSERVABILITY check - never blocks
health.checkMemory()
  .then(result => {
    if (result.status === 'CRITICAL') {
      alertEngine.warn('Memory high'); // Notification only
    }
  })
  .catch(err => {
    logger.error('Health check failed', err); // Log only
    // NEVER blocks trading
  });
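The OBSERVABILITY pattern above relies on the check returning a promise; a check that throws synchronously, or an alert call that itself fails, would escape the chain. One way to close those gaps is a small wrapper, sketched here under stated assumptions: runObservabilityCheck and its parameters are hypothetical, and the result is assumed to carry a status field as in the pattern above.

```javascript
// Hypothetical helper, not part of the modules listed in this ADR.
// The returned promise always resolves, so callers never see a rejection.
function runObservabilityCheck(name, check, onWarn) {
  // A broken alert channel must not propagate either.
  const safeWarn = (message) => {
    try {
      onWarn(message);
    } catch (_) {
      // Intentionally ignored: warning delivery is best effort.
    }
  };

  return Promise.resolve()
    .then(check) // running check inside then() turns a sync throw into a rejection
    .then((result) => {
      if (result.status === 'CRITICAL') safeWarn(`${name}: CRITICAL`);
      return result;
    })
    .catch((err) => {
      safeWarn(`${name}: check failed (${err.message})`);
      return { status: 'UNKNOWN' };
    });
}

const warnings = [];
const done = runObservabilityCheck(
  'memory',
  () => { throw new Error('probe unavailable'); }, // synchronous failure
  (message) => warnings.push(message),
);
```

Here done resolves to a status of UNKNOWN and a single warning is recorded; nothing in the wrapper has access to the circuit breaker, which keeps the boundary structural rather than a matter of convention.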

Module Responsibility Matrix

| Module             | Domain        | Blocks Trading? | File                  |
|--------------------|---------------|-----------------|-----------------------|
| circuit_breaker.js | SAFETY        | YES             | src/circuit_breaker.js |
| risk_engine.js     | SAFETY        | YES             | src/risk_engine.js    |
| health_checker.js  | OBSERVABILITY | NO              | src/health_checker.js |
| logger.js          | OBSERVABILITY | NO              | src/logger.js         |
| report_service.js  | OBSERVABILITY | NO              | src/report_service.js |
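The matrix can also be expressed as data so that failure handling is driven by a module's domain rather than by ad hoc decisions at each call site. The following is a sketch only: the domain assignments are copied from the matrix, while MODULE_DOMAINS, handleFailure, and the hooks object are hypothetical names.

```javascript
// Domain assignments mirror the matrix above; everything else is hypothetical.
const MODULE_DOMAINS = Object.freeze({
  'circuit_breaker.js': 'SAFETY',
  'risk_engine.js': 'SAFETY',
  'health_checker.js': 'OBSERVABILITY',
  'logger.js': 'OBSERVABILITY',
  'report_service.js': 'OBSERVABILITY',
});

function handleFailure(moduleName, error, hooks) {
  const domain = MODULE_DOMAINS[moduleName];
  if (domain === 'OBSERVABILITY') {
    hooks.warn(`${moduleName}: ${error.message}`);
    return 'WARNED';
  }
  // SAFETY modules block. Unclassified modules also block here: failing
  // closed is an assumption of this sketch, not a rule stated in the ADR.
  hooks.blockTrading(`${moduleName}: ${error.message}`);
  return 'BLOCKED';
}

const actions = [];
const hooks = {
  blockTrading: (reason) => actions.push(['BLOCK', reason]),
  warn: (message) => actions.push(['WARN', message]),
};

handleFailure('report_service.js', new Error('disk full'), hooks);
handleFailure('risk_engine.js', new Error('limit exceeded'), hooks);
```

After these two calls, actions holds one WARN entry followed by one BLOCK entry, matching the "Blocks Trading?" column.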

Validation

Acceptance Gate G3 (Recovery Scenarios) validates this boundary:

  • G3.1: Circuit breaker opens (SAFETY) and recovers correctly
  • G3.2: State projection reset (OBSERVABILITY) with no zombie state
  • G3.5: Recovery is observable (OBSERVABILITY never affects SAFETY logic)
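A boundary check in the spirit of G3.5 might look like the sketch below. It is not the actual G3 suite: FakeCircuitBreaker is a stand-in that exposes only the two methods shown in this ADR (recordSafetyFailure and isOpen), and its open-after-one-failure behaviour, canTrade, and verify are assumptions made for illustration.

```javascript
// Minimal stand-in for the breaker. Opening after a single failure is an
// assumption of this sketch, not documented behaviour.
class FakeCircuitBreaker {
  constructor() {
    this.failures = 0;
    this.lastFailure = null;
  }
  recordSafetyFailure(reason, details) {
    this.failures += 1;
    this.lastFailure = { reason, details };
  }
  isOpen() {
    return this.failures >= 1;
  }
}

// Hypothetical trading gate: its only input is the breaker state.
function canTrade(breaker) {
  return !breaker.isOpen();
}

function verify(condition, message) {
  if (!condition) throw new Error(`boundary test failed: ${message}`);
}

const breaker = new FakeCircuitBreaker();
const log = [];

// Simulated OBSERVABILITY fault: the report service throws, the caller logs.
const brokenReportService = {
  generate() { throw new Error('template missing'); },
};
try {
  brokenReportService.generate();
} catch (err) {
  log.push(err.message); // log only, no call into the breaker
}
verify(canTrade(breaker), 'OBSERVABILITY failure must not block trading');

// Simulated SAFETY fault: recorded on the breaker, which then blocks.
breaker.recordSafetyFailure('memory_critical', { source: 'test' });
verify(!canTrade(breaker), 'SAFETY failure must block trading');
```

The two assertions mirror the two rows of the Boundary Rules table: the first would fail if any OBSERVABILITY path reached the breaker, the second if a SAFETY failure stopped blocking.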

This ADR was documented retrospectively on 2026-04-03, after the Phase 3 implementation was completed on 2026-03-27. The architecture has been operational since 2026-03-27.