Ops platform · 2026 · Live

Rampart

Enforcement-first operational OS for field service ops. Deterministic workflow engine plus AI-augmented incident command.

Tech stack
Python · FastAPI · Pydantic v2 · PostgreSQL · Redis Streams · APScheduler · React · Vite · Groq · Docker Compose

The problem

Field service operations sit on top of a workflow that has to survive incomplete closeouts, SLA games, manager overrides made for the wrong reason, and a 3am incident where nobody is sure who acknowledged what. Most teams reach for a CRM with a status dropdown and hope the audit story works itself out. It does not. The control system has to refuse a bad transition before it lands, capture every override with justification and approval, escalate on its own when an SLA breaks, and prove all of it after the fact.

Goals

  • Make the workflow a real state machine with guards, side effects, and atomic transitions
  • Centralize every authorization through one enforcement engine that returns allow, deny, allow_with_override, or escalate, with reason codes
  • Capture every override with actor, role, justification, supervisor approval, and expiry
  • Watch SLAs in the background and raise warning then breach events, escalating up the on-call ladder if nobody acknowledges
  • Open an incident room automatically on breach so the bridge has the job, events, responders, timeline, and chat in one place
  • Add an AI layer for triage, dispatch ranking, closeout drafting, and audit Q&A without letting it touch the deterministic core
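The override-capture goal lists five fields, which can be sketched as a small record type. This is a minimal illustration and not Rampart's actual model: the class name, field names, and the `is_effective` helper are assumptions, and plain dataclasses are used here rather than Pydantic to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Override:
    """Hypothetical override record carrying the five fields named in the goals."""

    actor: str
    role: str
    justification: str
    approved_by: Optional[str]  # supervisor id, or None while approval is pending
    expires_at: datetime

    def is_effective(self, now: datetime) -> bool:
        # Usable only when justified, approved, and not yet expired.
        return (
            bool(self.justification.strip())
            and self.approved_by is not None
            and now < self.expires_at
        )


if __name__ == "__main__":
    now = datetime(2026, 1, 1, 9, 0, tzinfo=timezone.utc)
    override = Override(
        actor="tech-7",
        role="technician",
        justification="Customer signed on paper; photo upload failed on site",
        approved_by="sup-2",
        expires_at=now + timedelta(hours=4),
    )
    print(override.is_effective(now))
```

Keeping expiry on the record itself means a stale override fails the same check as an unapproved one, so there is one predicate to audit.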

The solution

  • Declarative FSM in Python: states and transitions, guards that run pre-transition, side effects that run post-transition, and each transition committed with its audit rows in the same Postgres transaction
  • Enforcement engine with a versioned rule catalog (R001 closeout evidence, R003 override capture, more to come), returning structured decisions with reason codes
  • Redis Streams event bus that every subscriber (dashboard, SLA watcher, AI layer) reads from, decoupling the operational primitives
  • SLA watcher as a background worker that emits sla.warning then sla.breach and auto-opens an incident in the same transaction
  • Incident command engine with severity-based escalation ladder, on-call rotation lookup, responder tracking, and a system timeline
  • Provider abstraction for the AI layer: a deterministic Echo provider for tests and offline demos, a Groq provider (llama-3.3-70b-versatile) for real LLM use, swappable through a single env var
  • Four agents (triage, dispatch, closeout, audit chat) each writing to an ai_recommendations table the deterministic core never reads, so an LLM can suggest but never decide
  • React command-centre dashboard polling the API, surfacing live job board, event stream, incident bridge, triage card, and audit chat panel

My role

  • Solo architect and engineer, system design through deploy
  • Five-layer architecture (engine, ops, AI, API, dashboard) and the JD-aligned module layout
  • FSM, enforcement engine, audit model, and override flow
  • SLA watcher, escalation ladder, and incident command bridge
  • Provider abstraction plus the four AI agents and their schemas
  • Phase plan, test strategy (51 passing tests against real Postgres), and screenshot-driven case study

UI direction

Operator-first command-centre dashboard. Left column is a colour-coded live job board, right column is the Redis Streams event tail. Click into a job and the incident room opens with timeline, responders, chat, triage card, and the action ladder. A floating audit chat panel sits bottom-right for natural-language questions over the audit log.

User flows

False closeout, denied and audited

  1. Technician POSTs a closeout transition with missing photo, missing checklist, or out-of-radius geo
  2. FSM consults the enforcement engine; R001 returns deny with the exact reason codes for each missing piece of evidence
  3. Denied transition still writes to the audit log alongside a per-rule row listing what was missing
  4. Job stays at closeout_pending; the audit story is complete (who tried, when, why blocked) even though state did not advance
  5. Dashboard event stream shows transition.denied with the rule that fired
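The R001 check in this flow might look something like the sketch below. The four outcomes and the rule id come from the description above. Everything else is an assumption made for illustration: the `CloseoutEvidence` fields, the reason-code strings, and the 150 m default radius.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(str, Enum):
    ALLOW = "allow"
    DENY = "deny"
    ALLOW_WITH_OVERRIDE = "allow_with_override"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Decision:
    outcome: Outcome
    rule_id: str
    reason_codes: tuple = ()


@dataclass(frozen=True)
class CloseoutEvidence:
    """Hypothetical evidence payload attached to a closeout request."""

    photo_count: int
    checklist_complete: bool
    distance_from_site_m: float


def rule_r001(evidence: CloseoutEvidence, max_radius_m: float = 150.0) -> Decision:
    """Closeout evidence rule: one reason code per missing piece of evidence."""
    reasons = []
    if evidence.photo_count < 1:
        reasons.append("R001_MISSING_PHOTO")
    if not evidence.checklist_complete:
        reasons.append("R001_MISSING_CHECKLIST")
    if evidence.distance_from_site_m > max_radius_m:
        reasons.append("R001_OUT_OF_RADIUS")
    outcome = Outcome.DENY if reasons else Outcome.ALLOW
    return Decision(outcome, "R001", tuple(reasons))


if __name__ == "__main__":
    bad = CloseoutEvidence(photo_count=0, checklist_complete=True, distance_from_site_m=900.0)
    decision = rule_r001(bad)
    print(decision.outcome.value, decision.reason_codes)
```

Collecting every failing reason, rather than stopping at the first, is what lets the audit row list exactly what was missing.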

SLA breach to incident bridge

  1. SLA watcher (background worker) sees an open job approaching its deadline and emits sla.warning to the Redis stream
  2. Deadline passes with no closeout; watcher emits sla.breach and the incident bridge opens a HIGH incident in the same transaction
  3. On-call dispatcher is seated as the level-1 responder, and a system message lands in the incident chat
  4. Escalation to level 2 pulls the on-call supervisor in; every responder change goes through the ladder, every message persists
  5. Supervisor approves a manual override (R003): justification recorded, expiry set, the override row links the denied transition to the new allow_with_override transition
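The watcher's decision in steps 1 and 2, and the ladder step in 3 and 4, reduce to two small pure functions. The sketch below assumes a fixed warning window before the deadline and a two-level ladder; the function names, the window, and the `LADDER` mapping are illustrative, and the Redis publish and incident creation are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify_sla(now: datetime, deadline: datetime, warning_window: timedelta) -> Optional[str]:
    """Return the event an open job should raise at `now`, if any."""
    if now >= deadline:
        return "sla.breach"
    if now >= deadline - warning_window:
        return "sla.warning"
    return None


# Hypothetical ladder: level number -> on-call role seated at that level.
LADDER = {1: "dispatcher", 2: "supervisor"}


def next_level(current_level: int, acknowledged: bool) -> int:
    """Move one rung up when nobody has acknowledged; stop at the top rung."""
    if acknowledged:
        return current_level
    return min(current_level + 1, max(LADDER))


if __name__ == "__main__":
    deadline = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
    window = timedelta(minutes=30)
    for minutes_before in (60, 10, 0):
        now = deadline - timedelta(minutes=minutes_before)
        print(minutes_before, classify_sla(now, deadline, window))
    print(LADDER[next_level(1, acknowledged=False)])
```

A real watcher would also need to remember which events it has already emitted per job, so that a warning is raised once and not on every tick.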

AI triage and audit chat, deterministic core untouched

  1. Incident opens; a POST to /ai/triage/incidents/{id} builds a structured context (job, recent events, severity, responders) in a single transaction
  2. Provider.generate_json runs the triage schema; output (severity tier, recommended action, confidence, rationale) lands in ai_recommendations
  3. Dashboard triage card polls /ai/recommendations/by-target and surfaces the recommendation with a re-run button
  4. Operator asks an audit question in the audit chat panel; the agent hands a fixed-window slice (last 50 transitions, last 10 incidents, last 30 stream events) plus the question to the provider
  5. Answer and citations are written back. The deterministic core never imports the AI module, the LLM has no SQL access, and every output is auditable
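The provider swap behind step 2 can be sketched as a small protocol plus a factory keyed on one environment variable. Only the `generate_json` method name and the four triage fields come from the description above. The method's parameters, the `AI_PROVIDER` variable name, the schema-as-dict convention, and `check_shape` are assumptions for illustration. A Groq-backed class would register in `PROVIDERS` alongside the echo one and is omitted here.

```python
import os
from typing import Any, Dict, Mapping, Optional, Protocol

# Triage output fields named in the flow; the Python types are assumed.
TRIAGE_SCHEMA = {
    "severity_tier": str,
    "recommended_action": str,
    "confidence": float,
    "rationale": str,
}


class Provider(Protocol):
    def generate_json(self, prompt: str, schema: Mapping[str, type]) -> Dict[str, Any]: ...


class EchoProvider:
    """Deterministic provider: same schema in, same structured output out."""

    def generate_json(self, prompt: str, schema: Mapping[str, type]) -> Dict[str, Any]:
        out: Dict[str, Any] = {}
        for name, kind in schema.items():
            if kind is str:
                out[name] = f"echo:{name}"
            elif kind is float:
                out[name] = 0.5
            else:
                out[name] = kind()  # zero value for any other type
        return out


# Registry the factory reads from; real LLM providers would be added here.
PROVIDERS = {"echo": EchoProvider}


def get_provider(env: Optional[Mapping[str, str]] = None) -> Provider:
    env = os.environ if env is None else env
    name = env.get("AI_PROVIDER", "echo")  # hypothetical variable name
    if name not in PROVIDERS:
        raise ValueError(f"unknown AI provider: {name!r}")
    return PROVIDERS[name]()


def check_shape(output: Mapping[str, Any], schema: Mapping[str, type]) -> bool:
    """True when the output has exactly the schema's keys with matching types."""
    return set(output) == set(schema) and all(
        isinstance(output[key], kind) for key, kind in schema.items()
    )


if __name__ == "__main__":
    provider = get_provider({"AI_PROVIDER": "echo"})
    recommendation = provider.generate_json("Triage incident 42", TRIAGE_SCHEMA)
    print(check_shape(recommendation, TRIAGE_SCHEMA), recommendation["confidence"])
```

Because the echo provider needs no network or API key, tests can exercise the full agent path and still run offline.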

Key learnings

  • Atomic transition plus audit-row commit is the single design decision that buys the whole audit story; trying to log after the fact loses the ordering guarantee
  • Returning a structured decision (allow, deny, allow_with_override, escalate) with reason codes turns the enforcement engine into the one place to reason about authorization, which is worth more than the rules themselves
  • An Echo provider that produces plausible structured outputs is not a stub. It is the proof that swapping providers is a one-line factory change, and it keeps the test suite hermetic
  • Keeping the deterministic core blind to ai_recommendations is what lets the AI layer ship without changing the safety story: an LLM can suggest, a human commits, and the existing rules run on the commit
  • Phases as commits (Phase 1, 2, 3, 4 each a single commit on main) make the progression legible to a reviewer who scrolls git log before reading any code

Want something like Rampart?

I'm open to senior contract work. Let's talk about what you're building.

Get in touch