AI platform · 2026 Live

Meridian

Production-grade multi-agent research and execution pipeline with hybrid RAG, conflict resolution, and an LLM-as-judge rubric.

GitHub

Tech stack

Python FastAPI LangGraph PostgreSQL pgvector tsvector Redis Langfuse Anthropic Claude Docker Compose

The problem

Most multi-agent demos collapse the moment an API breaks, a goal is ambiguous, or two sources contradict each other. They lack typed contracts between agents, retries at the tool layer, conflict resolution as a first-class step, and observability wired in from day one. Without those, you cannot defend an agent's answer to a real stakeholder.

Goals

Accept a high-level business goal and return a structured, cited answer
Plan a dependency-aware task DAG with per-task acceptance criteria
Delegate to specialist sub-agents with typed input and output contracts
Resolve conflicts between sources instead of averaging them away
Score every run against a rubric so quality is measurable, not vibes
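
To make the dependency-aware DAG goal concrete, here is a minimal sketch of a plan whose tasks declare dependencies and acceptance criteria. It uses only the standard library (dataclasses stand in for the typed contracts, and graphlib orders the tasks). The Task fields, the agent names, and the execution_order helper are illustrative assumptions, not Meridian's actual schema.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Task:
    """One node in the plan. Field names are assumptions for illustration."""
    task_id: str
    agent: str                       # which specialist should handle the task
    acceptance_criteria: str         # what "done" means for this task
    depends_on: tuple[str, ...] = ()


def execution_order(tasks: list[Task]) -> list[str]:
    """Return task ids so that every task comes after its dependencies."""
    known = {t.task_id for t in tasks}
    for t in tasks:
        missing = set(t.depends_on) - known
        if missing:
            raise ValueError(f"{t.task_id} depends on unknown tasks: {sorted(missing)}")
    graph = {t.task_id: set(t.depends_on) for t in tasks}
    # static_order() raises graphlib.CycleError if the plan contains a cycle.
    return list(TopologicalSorter(graph).static_order())


plan = [
    Task("synthesize", "synthesis", "Every claim carries a citation", ("retrieve", "search")),
    Task("retrieve", "retrieval", "At least one internal document found"),
    Task("search", "web", "At least one external source found"),
]
order = execution_order(plan)
print(order)  # "synthesize" is always last; the first two may come in either order
```

Validating the plan before execution means a malformed DAG fails at planning time rather than halfway through a run.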

The solution

LangGraph orchestrator with planner, executor, and replanner nodes that walk the task DAG
Retrieval agent combining pgvector and tsvector via Reciprocal Rank Fusion
Web/API agent built on Tavily plus a generic HTTP tool, with retry and circuit breaking
Synthesis agent that aggregates, detects conflicts, and reconciles by weight and confidence
Redis session store for per-run state, Postgres for run logs and evaluation history
Self-hosted Langfuse tracing for every agent call, token count, and tool invocation
LLM-as-judge rubric scoring goal completion, accuracy, coverage, confidence, and hallucination risk
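
The retry and circuit breaking mentioned for the web/API agent can be sketched roughly as below. This is a generic, standard-library illustration of the pattern, not the project's code: the CircuitBreaker class, its thresholds, and the flaky_fetch stand-in tool are all assumptions.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call the tool."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after = reset_after              # cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, retries=2, backoff=0.1, **kwargs):
        """Call fn with up to `retries` extra attempts unless the circuit is open."""
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, skipping call")
            # Cool-down elapsed: allow a trial call; a single failure re-opens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        last_error = None
        for attempt in range(retries + 1):
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                last_error = exc
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = self.clock()
                    break
                if attempt < retries:
                    time.sleep(backoff * 2 ** attempt)  # exponential backoff
            else:
                self.failures = 0
                return result
        raise last_error


attempts = []


def flaky_fetch(url):
    """Hypothetical tool that fails once, then succeeds."""
    attempts.append(url)
    if len(attempts) == 1:
        raise ConnectionError("transient failure")
    return {"url": url, "status": 200}


breaker = CircuitBreaker(failure_threshold=3, reset_after=30.0)
response = breaker.call(flaky_fetch, "https://example.com/api", retries=2, backoff=0.01)
```

Retries absorb transient failures, while the breaker stops the agent from hammering a tool that is clearly down, so the caller sees a fast, explicit error instead of a long stall.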

My role

→ Solo architect and engineer, from system design to deployment
→ Four-layer architecture (orchestrator, agents, memory, observability) and the LangGraph wiring
→ Hybrid RAG pipeline with pgvector + tsvector RRF and per-agent context budgeting
→ Langfuse self-hosted stack, LLM-judge rubric, and run report generator
→ Docker Compose stack for Postgres, Redis, and Langfuse
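
For readers unfamiliar with Reciprocal Rank Fusion, the pgvector + tsvector merge can be sketched in a few lines. Each document scores the sum of 1 / (k + rank) over every ranking it appears in. The function name, the document ids, and k = 60 (a commonly used default) are assumptions here. In the real pipeline the two input lists would come from the vector and full-text queries.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["doc_a", "doc_b", "doc_c"]   # stand-in for a pgvector similarity query
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # stand-in for a tsvector full-text query
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF works on ranks rather than raw scores, it does not need cosine distances and text-search ranks to be on a comparable scale.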

UI direction

Operator-first run report and trace viewer, not an end-user surface. The product is the structured answer plus the trace and rubric backing it.

User flows

Goal-to-answer flow

1 Operator submits a business goal to the FastAPI endpoint
2 Planner decomposes the goal into a task DAG with declared dependencies and acceptance criteria
3 Executor walks the DAG, dispatching tasks to retrieval, web/API, and synthesis agents
4 Replanner re-enters the loop with failure context when a specialist returns low confidence or errors
5 Synthesis agent reconciles conflicting sources and emits a cited answer
6 LLM judge scores the run; the full trace and rubric persist to Langfuse and Postgres
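
Step 5 can be illustrated with a small sketch of reconciling by weight and confidence. The Claim fields, the 0 to 1 scales, and the scoring rule (sum of weight times confidence per candidate value) are assumptions chosen for illustration; the actual synthesis agent may weigh evidence differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    key: str           # what is being asserted, e.g. "headquarters"
    value: str
    source: str
    weight: float      # trust in the source, assumed to be in [0, 1]
    confidence: float  # confidence in this extraction, assumed to be in [0, 1]


def reconcile(claims):
    """Pick one value per key and record whether sources disagreed."""
    by_key = defaultdict(lambda: defaultdict(list))
    for claim in claims:
        by_key[claim.key][claim.value].append(claim)

    resolved = {}
    for key, candidates in by_key.items():
        support = {
            value: sum(c.weight * c.confidence for c in group)
            for value, group in candidates.items()
        }
        winner = max(support, key=support.get)
        resolved[key] = {
            "value": winner,
            "conflict": len(candidates) > 1,  # surfaced, not averaged away
            "sources": [c.source for c in candidates[winner]],
            "rejected": {
                value: [c.source for c in group]
                for value, group in candidates.items()
                if value != winner
            },
        }
    return resolved


claims = [
    Claim("headquarters", "Berlin", "internal_wiki", weight=0.9, confidence=0.8),
    Claim("headquarters", "Munich", "blog_post", weight=0.4, confidence=0.9),
    Claim("headquarters", "Berlin", "press_release", weight=0.6, confidence=0.7),
]
result = reconcile(claims)
```

Keeping the rejected values and their sources in the output lets the run report show the disagreement alongside the chosen answer.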

Screenshots


Run report with the task DAG, KPI cards for duration, tokens, cost, and judge score, plus a synthesis excerpt.

Langfuse-style trace timeline with nested agent spans (orchestrator, planner, executor, retrieval, web, synthesis, judge) and five score cards at the bottom.

Judge rubric panel scoring goal completion, accuracy, coverage, confidence, and hallucination risk with a PASS verdict.

Key learnings

Typed Pydantic contracts between agents catch a class of integration bugs before they reach the executor
Conflict resolution as a first-class step beats hoping the synthesizer averages contradictions correctly
An LLM judge rubric turns agent quality from a vibe into a number you can regress against
Observability wired in from layer one is the difference between a demo and an answer you can defend
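
As a rough illustration of turning the rubric into a number you can regress against, the sketch below averages the five dimensions into one score and compares it with a baseline run. The 0 to 1 scale, the equal weighting, the inversion of hallucination risk, the 0.7 pass threshold, and the 0.05 tolerance are all assumptions, not the project's actual rubric settings.

```python
RUBRIC = ("goal_completion", "accuracy", "coverage", "confidence", "hallucination_risk")


def overall_score(scores):
    """Average the rubric dimensions; hallucination risk counts against the run."""
    missing = [dim for dim in RUBRIC if dim not in scores]
    if missing:
        raise ValueError(f"missing rubric dimensions: {missing}")
    for dim in RUBRIC:
        if not 0.0 <= scores[dim] <= 1.0:
            raise ValueError(f"{dim} must be in [0, 1], got {scores[dim]}")
    adjusted = [
        1.0 - scores[dim] if dim == "hallucination_risk" else scores[dim]
        for dim in RUBRIC
    ]
    return sum(adjusted) / len(adjusted)


def verdict(scores, threshold=0.7):
    return "PASS" if overall_score(scores) >= threshold else "FAIL"


def regressed(current, baseline, tolerance=0.05):
    """True if the current run scores meaningfully below the baseline run."""
    return overall_score(current) < overall_score(baseline) - tolerance


baseline_run = {"goal_completion": 0.9, "accuracy": 0.8, "coverage": 0.7,
                "confidence": 0.8, "hallucination_risk": 0.2}
new_run = {"goal_completion": 0.6, "accuracy": 0.6, "coverage": 0.5,
           "confidence": 0.6, "hallucination_risk": 0.5}

print(verdict(baseline_run))              # PASS
print(regressed(new_run, baseline_run))   # True
```

Storing these scores per run is what makes a regression check like this possible after a prompt or model change.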

Want something like Meridian?

I'm open to senior contract work. Let's talk about what you're building.

Get in touch