AI platform · 2025 · Live

Axon

Multi-tenant AI agentic SaaS with LangGraph, RAG, MCP, and production observability.

Tech stack
Next.js 15 · Fastify 5 · Python · LangGraph · FastAPI · PostgreSQL 16 · pgvector · Redis · BullMQ · Drizzle ORM · Better Auth · Langfuse · Prometheus · Grafana · Loki

The problem

AI agents in production need audit trails, cost control, and tenant isolation that tutorials skip. Most open-source agent frameworks assume a single user and a single wallet. Real multi-tenant AI platforms have to account for per-tenant spend caps, per-tenant data isolation, and full observability on tokens, latency, and errors.
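
A per-tenant spend cap can be as simple as a pre-flight check before any agent work is enqueued. This is an illustrative sketch only — the `TenantBudget` shape and `checkSpendCap` name are made up here, not taken from the Axon codebase:

```typescript
// Hypothetical per-tenant budget record; in a real system this would be
// read from the tenant's row in Postgres, not held in memory.
interface TenantBudget {
  tenantId: string;
  monthlyCapUsd: number; // hard ceiling for the billing period
  spentUsd: number;      // accumulated LLM spend so far this period
}

// Reject agent work before it starts if the estimated cost of the run
// would push the tenant over its cap.
function checkSpendCap(budget: TenantBudget, estimatedCostUsd: number): boolean {
  return budget.spentUsd + estimatedCostUsd <= budget.monthlyCapUsd;
}
```

The key design point is checking *before* the loop runs: an agent that burns tokens mid-loop has already spent the money by the time a post-hoc check fires.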

Goals

  • Queue-first architecture so long-running agent work never blocks a request
  • Runtime row-level security so tenant isolation survives a leaky query
  • Full observability stack covering traces, metrics, logs, and LLM-specific telemetry
  • MCP tool integration so agents can reach into databases and internal APIs cleanly
  • Billing that tracks real token usage, not estimated seat counts
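
The queue-first goal boils down to one rule: the request handler only enqueues and returns. A minimal sketch, using an in-memory array as a stand-in for BullMQ (the `AgentJob` shape and `enqueueAgentJob` name are illustrative):

```typescript
// Tenant-scoped unit of agent work, as the request handler would enqueue it.
interface AgentJob {
  tenantId: string;
  sessionId: string;
  message: string;
}

// Stand-in for a BullMQ queue backed by Redis.
const queue: AgentJob[] = [];

// The HTTP handler calls this and immediately returns the job id to the
// client; the long-running agent loop runs later in a worker process.
function enqueueAgentJob(job: AgentJob): string {
  queue.push(job);
  return `job-${queue.length}`; // stand-in for BullMQ's generated job.id
}
```

With BullMQ the worker would pick the job up off Redis, which is what keeps a multi-minute agent loop from holding an HTTP connection open.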

The solution

  • LangGraph agents with streaming chat end-to-end
  • RAG pipeline with upload, chunk, embed, and hybrid search over pgvector
  • MCP servers (postgres and a custom template) with an agent bridge
  • Runtime RLS via the axon_app role and a withOrg pattern on every query
  • BullMQ workers with a Bull Board admin UI for queue visibility
  • Prometheus, Grafana, Loki, and Langfuse wired end-to-end for observability
  • Stripe billing plus CI/CD and production deploy scaffolding
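
The shape of the withOrg pattern can be sketched in a few lines. This is a hedged approximation, not the actual Drizzle API: the real version would open a transaction under the restricted `axon_app` role and `SET LOCAL` the tenant id so Postgres RLS policies enforce isolation even if a query forgets its `WHERE` clause:

```typescript
// Stand-in for a tenant-scoped database handle; in Axon this would be a
// Drizzle transaction with the org id pinned as a Postgres setting.
type ScopedDb = { orgId: string };

// Every data-access call goes through withOrg, so there is no code path
// that can run a query without a tenant context attached.
function withOrg<T>(orgId: string, query: (db: ScopedDb) => T): T {
  const scopedDb: ScopedDb = { orgId };
  return query(scopedDb);
}
```

The value is defense in depth: application-level scoping can be forgotten in one query, but the database-level policy bound to the role cannot.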

My role

  • Architecture across Next.js 15, Fastify, and Python FastAPI services
  • Drizzle schema design with runtime RLS enforcement
  • LangGraph agent graphs and streaming chat pipeline
  • RAG ingestion pipeline and hybrid search
  • Observability stack integration and dashboards
  • Deployment on Oracle Cloud with Caddy and Cloudflare Tunnel

UI direction

Admin-first aesthetic that prioritizes queue and telemetry visibility over visual chrome. The streaming chat UI is built for long-running agent loops, not casual chat.

User flows

Agentic chat flow

  1. User sends a message in a streaming chat session
  2. Request enters the queue with tenant scoping applied
  3. Worker runs the LangGraph agent with tools bound via MCP
  4. Retrieval hits pgvector and full-text search in parallel
  5. Response streams back with token usage logged to Langfuse
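
The parallel-retrieval step has to merge two differently-scored rankings. One common way to do that is reciprocal rank fusion (RRF); whether Axon uses RRF specifically is an assumption here, but it shows the shape of hybrid merging:

```typescript
// Merge several rankings (e.g. pgvector similarity and full-text search)
// into one list. Each inner array holds document ids, best match first.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60 // damping constant; 60 is the value from the original RRF paper
): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // A document scores higher the nearer the top it appears in each list.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

RRF needs no score normalization between the two retrievers, which is exactly the problem cosine distances and `ts_rank` scores create when combined naively.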

Document ingestion

  1. User uploads a document through the app
  2. File is chunked and embedded in a worker
  3. Chunks land in pgvector with tenant scoping
  4. Hybrid retrieval becomes available to the agent
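
The chunking step in the worker can be sketched as a fixed-size splitter with overlap. Axon's actual strategy (chunk sizes, token-aware boundaries) is not specified here, so treat this as a minimal illustration:

```typescript
// Split text into fixed-size chunks with a small overlap, so that a
// sentence cut at a chunk boundary still appears whole in the next chunk.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

Each chunk is then embedded and inserted into pgvector with the tenant id attached, so retrieval stays inside the uploading tenant's data.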

Key learnings

  • Runtime RLS via a dedicated Postgres role is the most defensible tenant isolation pattern for AI workloads
  • Queue-first design is not optional once you run agent loops with tool calls
  • Langfuse pays for itself in one debugging session when an agent burns tokens on the wrong path
  • MCP is the right abstraction for giving agents access to databases and internal APIs without leaking credentials

Want something like Axon?

I'm open to senior contract work. Let's talk about what you're building.

Get in touch