ADR-005: Observability Boundaries
Status: ACCEPTED (Retrospectively documented)
Date: 2026-04-03
Phase: 3 (Observability) - Delivered 2026-03-27
Decision Date: 2026-03-27
Author: Forward Team
Supersedes: n/a
Superseded by: n/a
Context
Phase 3 (Observability) required a clear separation between SAFETY and OBSERVABILITY concerns to prevent observability failures from blocking trading operations. The system needed to handle monitoring, reporting, and alerting without interfering with the critical execution path.
Decision
We establish a strict boundary between SAFETY and OBSERVABILITY domains:
Boundary Rules
| Domain | Responsibility | Failure Impact | Examples |
|---|---|---|---|
| SAFETY | Trading-critical checks | BLOCKS trading | Circuit breaker, position limits, balance checks |
| OBSERVABILITY | Monitoring & reporting | Never blocks | Health checks, logs, reports, metrics |
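The two rows above amount to a single dispatch rule on failure. The following is a minimal sketch of that rule, not the project's implementation: `Domain`, `runCheck`, the `trading` object, and the `warnings` array are hypothetical names introduced only for illustration.

```javascript
// Hypothetical names throughout: Domain, runCheck, trading, and warnings
// are illustrative and are not taken from the codebase.
const Domain = Object.freeze({ SAFETY: 'SAFETY', OBSERVABILITY: 'OBSERVABILITY' });

// Runs one check and applies the boundary rule to its failure.
function runCheck(check, trading, warnings) {
  try {
    check.run();
    return true;
  } catch (err) {
    if (check.domain === Domain.SAFETY) {
      trading.blocked = true; // SAFETY failure: block trading
    } else {
      warnings.push(`${check.name}: ${err.message}`); // OBSERVABILITY failure: warn only
    }
    return false;
  }
}

const trading = { blocked: false };
const warnings = [];

// A failing OBSERVABILITY check leaves trading untouched.
runCheck(
  { name: 'health', domain: Domain.OBSERVABILITY, run() { throw new Error('probe timeout'); } },
  trading,
  warnings
);
const blockedAfterObservability = trading.blocked;

// A failing SAFETY check blocks trading.
runCheck(
  { name: 'position_limit', domain: Domain.SAFETY, run() { throw new Error('limit exceeded'); } },
  trading,
  warnings
);
const blockedAfterSafety = trading.blocked;
```

In this sketch a check with an unrecognized domain is treated like OBSERVABILITY. A real system might prefer the opposite default, which is a design choice this ADR does not settle.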
Implementation
┌─────────────────────────────────────────────────────────────┐
│                        SAFETY DOMAIN                        │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ Circuit      │  │ Position     │  │ Balance      │       │
│  │ Breaker      │  │ Risk Engine  │  │ Projection   │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│                                                             │
│  Failure → BLOCK trading → Event store → Recovery           │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼ (triggers, never blocks)
┌─────────────────────────────────────────────────────────────┐
│                    OBSERVABILITY DOMAIN                     │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ Health       │  │ Logger       │  │ Report       │       │
│  │ Checker      │  │ Service      │  │ Service      │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│                                                             │
│  Failure → WARN only → Never affects SAFETY                 │
└─────────────────────────────────────────────────────────────┘
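The "triggers, never blocks" arrow in the diagram implies that the SAFETY side can publish events without being exposed to listener failures. One way this could look is sketched below. `createObservabilityBus`, `subscribe`, `publish`, and `dropped` are hypothetical names, and the event type is invented for the example.

```javascript
// Hypothetical sketch: createObservabilityBus and its members are illustrative.
function createObservabilityBus() {
  const listeners = [];
  const dropped = []; // errors raised by listeners, kept for inspection
  return {
    dropped,
    subscribe(listener) {
      listeners.push(listener);
    },
    // Called from the SAFETY side. Never throws and never awaits a listener.
    publish(event) {
      for (const listener of listeners) {
        try {
          // Promise.resolve() also absorbs rejections from async listeners.
          Promise.resolve(listener(event)).catch((err) => dropped.push(err));
        } catch (err) {
          dropped.push(err); // synchronous listener failure
        }
      }
    },
  };
}

const bus = createObservabilityBus();
const received = [];
bus.subscribe((event) => received.push(event.type));
bus.subscribe(() => { throw new Error('report service down'); });

// SAFETY-side code publishes and carries on regardless of listener failures.
bus.publish({ type: 'circuit_breaker_opened' });
```

Listeners still run synchronously in this sketch, so a slow synchronous listener would delay the caller. Deferring each call (for example with `setImmediate`) would be one way to remove that coupling, at the cost of ordering guarantees.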
Consequences
Positive
- Trading reliability: OBSERVABILITY failures never halt trading
- Clear operational model: Teams know which failures require immediate action
- Testable separation: G3 Recovery tests verify boundary adherence
- Simplified incident response: SAFETY alerts = stop trading; OBSERVABILITY alerts = investigate
Negative
- Duplication: Some conditions are checked in both domains (e.g., memory is monitored by the OBSERVABILITY health check and is also a mandatory SAFETY input to the circuit breaker)
- Monitoring blind spots: If OBSERVABILITY fails, we may not detect issues until SAFETY triggers
Boundary Enforcement
Code Patterns
// SAFETY check - can block
circuitBreaker.recordSafetyFailure('memory_critical', details);
if (circuitBreaker.isOpen()) {
  blockTrading(); // HARD STOP
}

// OBSERVABILITY check - never blocks
health.checkMemory()
  .then(result => {
    if (result.status === 'CRITICAL') {
      alertEngine.warn('Memory high'); // Notification only
    }
  })
  .catch(err => {
    logger.error('Health check failed', err); // Log only
    // NEVER blocks trading
  });
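One gap in the OBSERVABILITY pattern above: `.catch` only handles rejections, so a check that throws synchronously before returning a promise would still propagate to the caller. A small wrapper can close that gap. The sketch below assumes a hypothetical helper named `observeSafely` and an invented `UNKNOWN` fallback status. Neither appears in the codebase.

```javascript
// Hypothetical helper: observeSafely and the 'UNKNOWN' status are assumptions.
function observeSafely(name, check, log) {
  return Promise.resolve()
    .then(() => check()) // a synchronous throw becomes a rejection here
    .catch((err) => {
      log(`${name} failed: ${err.message}`);
      return { status: 'UNKNOWN' }; // fallback so callers never see a rejection
    });
}

const logged = [];
const brokenCheck = () => { throw new Error('probe unavailable'); };

// The returned promise always fulfills, even though brokenCheck throws.
const pending = observeSafely('memory', brokenCheck, (line) => logged.push(line));
```

Because the wrapper always fulfills, callers on the trading path can ignore the returned promise without risking an unhandled rejection.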
Module Responsibility Matrix
| Module | Domain | Blocks Trading? | File |
|---|---|---|---|
| circuit_breaker.js | SAFETY | YES | src/circuit_breaker.js |
| risk_engine.js | SAFETY | YES | src/risk_engine.js |
| health_checker.js | OBSERVABILITY | NO | src/health_checker.js |
| logger.js | OBSERVABILITY | NO | src/logger.js |
| report_service.js | OBSERVABILITY | NO | src/report_service.js |
Validation
Acceptance Gate G3 (Recovery Scenarios) validates this boundary:
- G3.1: Circuit breaker opens (SAFETY) and recovers correctly
- G3.2: State projection reset (OBSERVABILITY) with no zombie state
- G3.5: Recovery is observable (OBSERVABILITY never affects SAFETY logic)
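A boundary test in the spirit of G3.1 and G3.5 could run the same recovery scenario twice, once with a working observer and once with a failing one, and compare the SAFETY state sequence. The sketch below is not the actual G3 suite: `createBreaker`, its threshold, `recordFailure`, and `reset` are hypothetical stand-ins, and only `isOpen()` mirrors a name used in this ADR.

```javascript
// Hypothetical stand-in for the real circuit breaker. The threshold and all
// method names other than isOpen() are assumptions made for this sketch.
function createBreaker(threshold, observer) {
  let failures = 0;
  let open = false;
  const notify = (event) => {
    try {
      observer(event);
    } catch (_) {
      // An observability failure is ignored and never changes breaker state.
    }
  };
  return {
    isOpen: () => open,
    recordFailure() {
      failures += 1;
      if (failures >= threshold && !open) {
        open = true;
        notify('opened');
      }
    },
    reset() {
      failures = 0;
      if (open) {
        open = false;
        notify('recovered');
      }
    },
  };
}

// Drives open -> recover and records the breaker state after each step.
function runScenario(observer) {
  const breaker = createBreaker(2, observer);
  const states = [];
  breaker.recordFailure();
  states.push(breaker.isOpen());
  breaker.recordFailure();
  states.push(breaker.isOpen());
  breaker.reset();
  states.push(breaker.isOpen());
  return states;
}

const seen = [];
const withHealthyObserver = runScenario((event) => seen.push(event));
const withBrokenObserver = runScenario(() => { throw new Error('logger offline'); });
```

If the boundary holds, both runs produce the same state sequence, and the healthy observer sees both the open and the recovery events.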
References
- Phase 3 Deliverables
- G3 Acceptance Tests
- src/health.js - Health service with explicit domain classification
- src/circuit_breaker.js - SAFETY boundary enforcer
This ADR was documented retrospectively on 2026-04-03, after the Phase 3 implementation was completed on 2026-03-27. The architecture has been operational since 2026-03-27.