Rampart
Enforcement-first operational OS for field service ops. Deterministic workflow engine plus AI-augmented incident command.
The problem
Field service operations sit on top of a workflow that has to survive incomplete closeouts, SLA games, manager overrides made for the wrong reason, and a 3am incident where nobody is sure who acknowledged what. Most teams reach for a CRM with a status dropdown and hope the audit story works itself out. It does not. The control system has to refuse a bad transition before it lands, capture every override with justification and approval, escalate on its own when an SLA breaks, and prove all of it after the fact.
Goals
- Make the workflow a real state machine with guards, side effects, and atomic transitions
- Centralize every authorization through one enforcement engine that returns allow, deny, allow_with_override, or escalate, with reason codes
- Capture every override with actor, role, justification, supervisor approval, and expiry
- Watch SLAs in the background and raise warning then breach events, escalating up the on-call ladder if nobody acknowledges
- Open an incident room automatically on breach so the bridge has the job, events, responders, timeline, and chat in one place
- Add an AI layer for triage, dispatch ranking, closeout drafting, and audit Q&A without letting it touch the deterministic core
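The override goal implies a record shape along these lines. This is a minimal sketch that assumes a Python dataclass stands in for the override row. The field names, the `request_override` helper, and the 60-minute default expiry are hypothetical and not Rampart's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Override:
    job_id: str
    rule_id: str                 # rule whose denial is being overridden, e.g. "R001"
    actor: str
    actor_role: str
    justification: str
    approved_by: Optional[str]   # supervisor id; None while approval is pending
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        """Usable only after a supervisor has approved it and before it expires."""
        return self.approved_by is not None and now < self.expires_at


def request_override(job_id: str, rule_id: str, actor: str, actor_role: str,
                     justification: str, approved_by: Optional[str] = None,
                     ttl: timedelta = timedelta(minutes=60),   # hypothetical default
                     now: Optional[datetime] = None) -> Override:
    # Refuse to record an override that has no written justification.
    if not justification.strip():
        raise ValueError("override requires a justification")
    now = now or datetime.now(timezone.utc)
    return Override(job_id, rule_id, actor, actor_role, justification.strip(),
                    approved_by, now + ttl)
```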
The solution
- Declarative FSM in Python: guards run before a transition, side effects run after it, and the state change and its audit rows commit in the same Postgres transaction
- Enforcement engine with a versioned rule catalog (R001 closeout evidence, R003 override capture, more to come), returning structured decisions with reason codes
- Redis Streams event bus that every subscriber (dashboard, SLA watcher, AI layer) reads from, decoupling the operational primitives
- SLA watcher as a background worker that emits sla.warning then sla.breach and auto-opens an incident in the same transaction
- Incident command engine with severity-based escalation ladder, on-call rotation lookup, responder tracking, and a system timeline
- Provider abstraction for the AI layer: a deterministic Echo provider for tests and offline demos, a Groq provider (llama-3.3-70b-versatile) for real LLM use, swappable through a single env var
- Four agents (triage, dispatch, closeout, audit chat) each writing to an ai_recommendations table the deterministic core never reads, so an LLM can suggest but never decide
- React command-centre dashboard polling the API, surfacing live job board, event stream, incident bridge, triage card, and audit chat panel
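The core path through the FSM and the enforcement engine can be sketched as follows, using `sqlite3` as a stand-in for Postgres. The `Verdict`/`Decision` types, the `TRANSITIONS` table, the `jobs`/`audit` tables, and the `transition.applied` event name are hypothetical. The sketch illustrates the ordering: the guard returns a structured decision, any state change and its audit row commit together, and side effects fire only after the commit.

```python
import sqlite3
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Tuple


class Verdict(str, Enum):
    ALLOW = "allow"
    DENY = "deny"
    ALLOW_WITH_OVERRIDE = "allow_with_override"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Decision:
    verdict: Verdict
    reason_codes: Tuple[str, ...] = ()


Guard = Callable[[dict], Decision]

# Hypothetical transition table: (from_state, to_state) -> guard run before the transition.
TRANSITIONS: Dict[Tuple[str, str], Guard] = {
    ("in_progress", "closeout_pending"): lambda ctx: Decision(Verdict.ALLOW),
    ("closeout_pending", "closed"): lambda ctx: (
        Decision(Verdict.ALLOW) if ctx.get("photo")
        else Decision(Verdict.DENY, ("R001.MISSING_PHOTO",))
    ),
}


def transition(conn: sqlite3.Connection, job_id: str, to_state: str, ctx: dict,
               emit: Callable[[dict], None] = lambda event: None) -> Decision:
    # Sketch only: in Postgres this read would be SELECT ... FOR UPDATE inside the
    # same transaction, so two concurrent requests cannot both pass the guard.
    row = conn.execute("SELECT state FROM jobs WHERE id = ?", (job_id,)).fetchone()
    if row is None:
        raise KeyError(job_id)
    from_state = row[0]
    guard = TRANSITIONS.get((from_state, to_state))
    decision = guard(ctx) if guard else Decision(Verdict.DENY, ("ILLEGAL_TRANSITION",))
    applied = decision.verdict in (Verdict.ALLOW, Verdict.ALLOW_WITH_OVERRIDE)

    # `with conn` commits on success and rolls back on exception, so the state
    # change and the audit row land together or not at all. Denials are audited too.
    with conn:
        if applied:
            conn.execute("UPDATE jobs SET state = ? WHERE id = ?", (to_state, job_id))
        conn.execute(
            "INSERT INTO audit (job_id, from_state, to_state, verdict, reasons) "
            "VALUES (?, ?, ?, ?, ?)",
            (job_id, from_state, to_state, decision.verdict.value,
             ",".join(decision.reason_codes)),
        )

    # Post-transition side effects run only once the commit has succeeded.
    emit({"type": "transition.applied" if applied else "transition.denied",
          "job_id": job_id, "reasons": list(decision.reason_codes)})
    return decision
```

Because the audit INSERT sits in the same `with conn:` block as the UPDATE, a failure between them rolls both back rather than leaving a state change with no audit trail.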
My role
- → Solo architect and engineer, system design through deploy
- → Five-layer architecture (engine, ops, AI, API, dashboard) and the JD-aligned module layout
- → FSM, enforcement engine, audit model, and override flow
- → SLA watcher, escalation ladder, and incident command bridge
- → Provider abstraction plus the four AI agents and their schemas
- → Phase plan, test strategy (51 passing tests against real Postgres), and screenshot-driven case study
UI direction
Operator-first command-centre dashboard. Left column is a colour-coded live job board, right column is the Redis Streams event tail. Click into a job and the incident room opens with timeline, responders, chat, triage card, and the action ladder. A floating audit chat panel sits bottom-right for natural-language questions over the audit log.
User flows
False closeout, denied and audited
- 1 Technician POSTs a closeout transition with a missing photo, a missing checklist, or a geolocation outside the allowed radius
- 2 FSM consults the enforcement engine; R001 returns deny with the exact reason codes for each missing piece of evidence
- 3 The denied transition is still written to the audit log, alongside a per-rule row listing what was missing
- 4 Job stays at closeout_pending; the audit story is complete (who tried, when, why blocked) even though state did not advance
- 5 Dashboard event stream shows transition.denied with the rule that fired
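An R001-style rule might turn each missing piece of evidence into its own reason code, as in the sketch below. The field names, reason-code strings, and the 200 m radius are assumptions. The geo check uses the standard haversine great-circle distance.

```python
import math
from dataclasses import dataclass
from typing import Tuple

EARTH_RADIUS_M = 6_371_000
MAX_DISTANCE_M = 200  # hypothetical allowed distance from the job site


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass(frozen=True)
class RuleResult:
    rule_id: str
    verdict: str                  # "allow" or "deny"
    reason_codes: Tuple[str, ...]


def r001_closeout_evidence(closeout: dict, site: Tuple[float, float]) -> RuleResult:
    # Collect every failure instead of stopping at the first, so the audit row
    # records exactly what was missing.
    reasons = []
    if not closeout.get("photos"):
        reasons.append("R001.MISSING_PHOTO")
    checklist = closeout.get("checklist") or {}
    if not checklist or not all(checklist.values()):
        reasons.append("R001.INCOMPLETE_CHECKLIST")
    geo = closeout.get("geo")
    if geo is None:
        reasons.append("R001.MISSING_GEO")
    elif haversine_m(*geo, *site) > MAX_DISTANCE_M:
        reasons.append("R001.OUT_OF_RADIUS")
    return RuleResult("R001", "deny" if reasons else "allow", tuple(reasons))
```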
SLA breach to incident bridge
- 1 SLA watcher (background worker) sees an open job approaching its deadline and emits sla.warning to the Redis stream
- 2 Deadline passes with no closeout; watcher emits sla.breach and the incident bridge opens a HIGH incident in the same transaction
- 3 The on-call dispatcher is seated as the level-1 responder, and a system message lands in the incident chat
- 4 Escalation to level 2 pulls the on-call supervisor in; every responder change goes through the ladder, every message persists
- 5 Supervisor approves a manual override (R003): justification recorded, expiry set, the override row links the denied transition to the new allow_with_override transition
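The watcher loop and escalation ladder can be sketched in memory as follows. The 30-minute warning window, the three-rung `HIGH` ladder, and the job dict shape are hypothetical. The real worker persists the incident in Postgres and publishes to the Redis stream, both of which this sketch leaves out. Each job emits each event type at most once, so repeated ticks do not re-page anyone.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

WARNING_WINDOW = timedelta(minutes=30)  # hypothetical lead time before the deadline
LADDER = {"HIGH": ["dispatcher", "supervisor", "ops_manager"]}  # hypothetical roles per level


@dataclass
class Incident:
    job_id: str
    severity: str
    responders: list = field(default_factory=list)
    timeline: list = field(default_factory=list)

    def escalate(self, level: int, on_call: dict) -> str:
        """Seat the on-call person for ladder `level` (1-based) and log it."""
        role = LADDER[self.severity][level - 1]
        person = on_call[role]
        self.responders.append((level, role, person))
        self.timeline.append(f"system: level {level} {role} {person} paged")
        return person


class SlaWatcher:
    def __init__(self):
        self.emitted = set()   # (job_id, event_type) pairs already emitted
        self.incidents = {}    # job_id -> Incident

    def tick(self, jobs: list, now: datetime, on_call: dict) -> list:
        events = []
        for job in jobs:
            if job["closed"]:
                continue
            if now >= job["deadline"]:
                kind = "sla.breach"
            elif job["deadline"] - now <= WARNING_WINDOW:
                kind = "sla.warning"
            else:
                continue
            key = (job["id"], kind)
            if key in self.emitted:
                continue
            self.emitted.add(key)
            events.append((kind, job["id"]))
            if kind == "sla.breach":
                # Open a HIGH incident and seat the level-1 responder.
                incident = Incident(job["id"], "HIGH")
                incident.escalate(1, on_call)
                self.incidents[job["id"]] = incident
        return events
```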
AI triage and audit chat, deterministic core untouched
- 1 Incident opens; a POST to /ai/triage/incidents/{id} builds a structured context (job, recent events, severity, responders) in a single transaction
- 2 Provider.generate_json runs the triage schema; output (severity tier, recommended action, confidence, rationale) lands in ai_recommendations
- 3 Dashboard triage card polls /ai/recommendations/by-target and surfaces the recommendation with a re-run button
- 4 Operator asks an audit question in the audit chat panel; the agent hands a fixed-window slice (last 50 transitions, last 10 incidents, last 30 stream events) plus the question to the provider
- 5 The answer and its citations are written back; the deterministic core never imports the AI module, the LLM has no SQL access, and every output is auditable
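Once each source is ordered oldest to newest, the fixed-window slice in step 4 reduces to plain list slicing. The sketch below uses the window sizes from that step. The function names and the prompt wording are illustrative.

```python
import json
from typing import Any, Dict, List

# Window sizes from the flow above: last 50 transitions, 10 incidents, 30 stream events.
WINDOWS = {"transitions": 50, "incidents": 10, "events": 30}


def build_audit_context(transitions: List[dict], incidents: List[dict],
                        events: List[dict], question: str) -> Dict[str, Any]:
    """Keep only the newest rows of each source; inputs are assumed oldest-first."""
    return {
        "question": question,
        "transitions": transitions[-WINDOWS["transitions"]:],
        "incidents": incidents[-WINDOWS["incidents"]:],
        "events": events[-WINDOWS["events"]:],
    }


def to_prompt(context: Dict[str, Any]) -> str:
    # The model sees serialized rows only; it never receives a database handle.
    return ("Answer using only these audit records and cite record ids.\n"
            + json.dumps(context, default=str, sort_keys=True))
```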
Key learnings
- Atomic transition plus audit-row commit is the single design decision that buys the whole audit story; trying to log after the fact loses the ordering guarantee
- Returning a structured decision (allow, deny, allow_with_override, escalate) with reason codes turns the enforcement engine into the one place to reason about authorization, which is worth more than the rules themselves
- An Echo provider that produces plausible structured outputs is not a stub. It is the proof that swapping providers is a one-line factory change, and it keeps the test suite hermetic
- Keeping the deterministic core blind to ai_recommendations is what lets the AI layer ship without changing the safety story: an LLM can suggest, a human commits, and the existing rules run on the commit
- Phases as commits (Phase 1, 2, 3, 4 each a single commit on main) make the progression legible to a reviewer who scrolls git log before reading any code
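Here is a sketch of a provider abstraction with a deterministic Echo implementation. It assumes a simplified schema format (each field name mapped to `"string"`, `"number"`, or a list of allowed values) and a hypothetical `RAMPART_AI_PROVIDER` env var. The `generate_json` name comes from the triage flow; its signature here is an assumption. Echo output is derived from a hash of the prompt, so the same input always yields the same structured answer. A Groq-backed class would register in `PROVIDERS` next to it; its API call is not shown.

```python
import hashlib
import os
from abc import ABC, abstractmethod
from typing import Mapping


class Provider(ABC):
    @abstractmethod
    def generate_json(self, schema: dict, prompt: str) -> dict:
        """Return a dict with one value per schema field."""


class EchoProvider(Provider):
    """Deterministic, offline provider: same prompt in, same structured output out."""

    def generate_json(self, schema: dict, prompt: str) -> dict:
        digest = hashlib.sha256(prompt.encode()).hexdigest()
        out = {}
        for name, kind in schema.items():
            if kind == "number":
                out[name] = int(digest[:2], 16) / 255          # stable value in [0, 1]
            elif isinstance(kind, list):                        # enum of allowed values
                out[name] = kind[int(digest[2:4], 16) % len(kind)]
            else:
                out[name] = f"echo:{name}:{digest[:8]}"
        return out


# Swapping providers means registering a class here and changing one env var.
PROVIDERS = {"echo": EchoProvider}


def get_provider(env: Mapping[str, str] = os.environ) -> Provider:
    name = env.get("RAMPART_AI_PROVIDER", "echo")
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown AI provider {name!r}") from None
```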
Want something like Rampart?
I'm open to senior contract work. Let's talk about what you're building.