ContextFlow is an open-source conversational AI project with real-time voice interaction and long-term memory. It uses a hybrid graph-vector RAG (Retrieval-Augmented Generation) architecture that combines the structured relationship tracking of a knowledge graph with the fast semantic retrieval of a vector database, aiming for strong contextual awareness and low-latency conversation.
- Hybrid Graph-Vector RAG 🧠: Combines PostgreSQL (`pgvector`) for fast semantic retrieval with Neo4j for traceable, multi-hop reasoning, merging the two result sets with Reciprocal Rank Fusion (RRF).
- Deep Personalization & Memory 🗂️: Mem0 maintains an evolving long-term memory profile per user, automatically extracting entities and preferences from conversations.
- Multi-Agent Orchestration 🤖: CrewAI powers modular, role-based AI crews (Support Crew for RAG, Memory Crew for graph updates) decoupled from the voice transport layer.
- Real-Time Voice & Turn Detection 🔊: LiveKit handles low-latency WebRTC voice streaming, native turn detection, and graceful barge-in interruption.
- High-Performance Async Backend ⚡: FastAPI with async SQLAlchemy 2.0, FastCRUD for zero-boilerplate data access, and Pydantic V2 for strict schema validation.
- Evaluation & Traceability 📊: LangSmith / OpenTelemetry for execution tracing, RAGAS for RAG quality evaluation (Faithfulness, Context Precision).
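To make the fusion step in the first bullet concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not taken from the project's code: the function name, the sample document ids, and the constant `k = 60` (a commonly used default) are assumptions for illustration. Each document scores `1 / (k + rank)` in every ranking it appears in, and the scores are summed.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Iterable[Sequence[Hashable]],
    k: int = 60,  # assumption: common default, not a value from this project
) -> List[Tuple[Hashable, float]]:
    """Fuse several best-first rankings into one list of (doc_id, score)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each ranking contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


# Hypothetical result lists, best match first.
vector_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from a pgvector similarity query
graph_hits = ["doc_c", "doc_a", "doc_d"]   # e.g. from a Neo4j traversal

fused = reciprocal_rank_fusion([vector_hits, graph_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because RRF only looks at rank positions, it does not need the vector distances and graph scores to be on a comparable scale, which is the usual reason for choosing it in hybrid retrieval.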
| Component | Technology | Description |
|---|---|---|
| Backend Framework | FastAPI | Async-first framework for real-time voice and streaming LLMs. |
| Database (Relational) | PostgreSQL + SQLAlchemy 2.0 | Async ORM with connection pooling via asyncpg. |
| Database (Vector) | PostgreSQL + pgvector | Semantic similarity search — ACID-compliant, co-located with relational data. |
| Graph Database | Neo4j | Multi-hop relationship mapping for deep memory and reasoning. |
| CRUD Layer | FastCRUD + Pydantic V2 | Schema-driven, zero-boilerplate CRUD with Alembic migrations. |
| AI Orchestration | CrewAI | Role-based multi-agent crews (Support, Memory) with YAML config. |
| Memory Engine | Mem0 | Automated long-term user memory backed by pgvector + Neo4j. |
| Real-Time Voice | LiveKit | WebRTC audio, STT/TTS pipeline, turn detection, barge-in handling. |
| Frontend Testing Harness | Gradio | Rapid Python UI for chat demos, workflow validation, and RAG / memory debugging before a production frontend exists. |
| Cache & Session | Redis | Sub-millisecond session state and short-term context. |
| Evaluation | RAGAS + LangSmith | RAG quality metrics and full execution trace observability. |
ContextFlow/
├── app/ # Main FastAPI application (CrewAI, DBs, Auth)
├── agents/
│ ├── voice/ # 🎙️ LiveKit Voice Worker (Standalone uv project)
│ │ ├── src/agent.py # Voice entrypoint
│ │ └── pyproject.toml # Isolated dependencies (livekit-agents, httpx)
│ └── crews/ # CrewAI orchestration (Support, Memory)
├── tests/ # pytest (API, agents, RAGAS eval)
├── alembic/ # Database migrations
├── docker-compose.yml # Full stack deployment (FastAPI, Voice Worker, DBs)
└── pyproject.toml # FastAPI Main Dependencies
To avoid dependency conflicts (specifically opentelemetry-sdk mismatches between crewai and livekit-agents), the Voice Agent runs as a completely isolated process.
- Voice Worker (`agents/voice`): Connects to LiveKit Cloud, streams STT/TTS, handles turn detection.
- FastAPI RAG Endpoint: Runs CrewAI, connects to Postgres + Neo4j.
- Integration: The Voice Worker calls `POST /api/v1/rag/query` over HTTP to fetch knowledge, keeping environments clean.
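The integration point above could look roughly like the following sketch. Only the path `/api/v1/rag/query` comes from this README. The base URL (uvicorn's default port), the JSON field names `query` and `user_id`, and both function names are assumptions. The sketch uses the standard library so it runs without extra packages; the voice worker itself lists `httpx` as a dependency and may well use that instead.

```python
import json
import urllib.request

# Assumption: the backend is reachable locally on uvicorn's default port.
BACKEND_URL = "http://localhost:8000"


def build_rag_request(
    question: str, user_id: str, base_url: str = BACKEND_URL
) -> urllib.request.Request:
    """Build the POST request. The JSON field names are hypothetical."""
    payload = json.dumps({"query": question, "user_id": user_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/rag/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_knowledge(question: str, user_id: str, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_rag_request(question, user_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the contract at the HTTP level means the voice worker never imports CrewAI or database drivers, which is what lets the two dependency sets stay isolated.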
Ensure you have the following installed before proceeding:
- Python: 3.11+ (for FastAPI Backend) & 3.13 (for Voice Worker)
- Databases: PostgreSQL 15+ (with `pgvector`), Neo4j 5+, and a Redis Server (unless using Docker)
- Package Manager: `uv` (`pip install uv`)
Before starting the application, you must configure the environment variables:
# Copy the example environment file
cp .env.example .env

Note: Open the newly created `.env` file and populate it with your specific API keys (OpenAI, LiveKit) and database credentials.
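A small pre-flight check can catch blank or missing values before the services start. The sketch below is an illustration only. `OPENAI_API_KEY` is mentioned elsewhere in this README, but the LiveKit variable names and the helper name are assumptions, so use the keys actually listed in `.env.example`. It also assumes the variables have already been loaded into the process environment (for example by Docker Compose or a dotenv loader).

```python
import os
from typing import List, Mapping, Optional, Sequence

# Assumption: illustrative names only; the real list lives in .env.example.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def missing_env_vars(
    required: Sequence[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or contain only whitespace."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
sample_env = {"OPENAI_API_KEY": "sk-example", "LIVEKIT_URL": " "}
print(missing_env_vars(env=sample_env))
```

Calling `missing_env_vars()` with no arguments checks the real process environment.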
You can start the project using one of the following methods:
With Docker Compose, you can spin up the entire environment (the FastAPI backend, the Voice Worker, and all databases: PostgreSQL, Redis, Neo4j) with hot-reloading enabled.
# Run the complete stack in the background
docker compose --profile dev --profile voice-dev up -d --build
# To view logs: docker compose logs -f
# To stop the stack: docker compose down

If you prefer not to use Docker for the application services, you can run the backend and the voice worker separately using two terminals. Ensure your databases are already running locally.
Terminal 1: Start the FastAPI Backend
uv sync # Installs the main dependencies
uv run uvicorn app.main:app --reload

Terminal 2: Start the LiveKit Voice Worker
cd agents/voice
uv sync # Creates an isolated environment for the Voice Agent
uv run python src/agent.py dev

The repo includes an optional demo dependency group for a lightweight frontend testing surface with Gradio:
uv sync --extra demo
uv run python demo/gradio_app.py

Recommended design direction for the Gradio layer:
- Start with `gr.ChatInterface` for the fastest text-chat test harness around the FastAPI endpoints.
- Move to `gr.Blocks` once you need custom layout, feedback controls, session inspection, or side panels for memory / RAG debugging.
- Use tabs or grouped panels to separate core chat, memory inspection, and retrieval diagnostics during development.
- Treat Gradio as the frontend testing platform, not the long-term production UI.
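As a rough sketch of the first recommendation, the snippet below wraps a backend call in the `(message, history)` function signature that `gr.ChatInterface` expects. The names `make_chat_fn`, `build_demo`, and `ask_backend` are hypothetical, and the real `demo/gradio_app.py` may be structured differently. Gradio is imported lazily so the chat function can be unit-tested without the demo extra installed.

```python
from typing import Callable


def make_chat_fn(ask_backend: Callable[[str], str]) -> Callable[[str, list], str]:
    """Adapt a backend call to the (message, history) signature."""

    def chat(message: str, history: list) -> str:
        # history is supplied by Gradio; this sketch does not use it.
        if not message.strip():
            return "Please enter a question."
        try:
            return ask_backend(message)
        except Exception as exc:
            # Show backend failures in the chat window instead of crashing the UI.
            return f"Backend error: {exc}"

    return chat


def build_demo(ask_backend: Callable[[str], str]):
    """Create the chat UI; requires Gradio to be installed."""
    import gradio as gr  # lazy import keeps make_chat_fn usable without Gradio

    return gr.ChatInterface(fn=make_chat_fn(ask_backend), title="ContextFlow test harness")
```

With a real backend call in hand, `build_demo(ask_backend).launch()` starts the local UI. Separating the chat function from the UI construction also makes the later move to `gr.Blocks` easier, since the same function can be reused.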
Useful references:
- ChatInterface docs: https://www.gradio.app/docs/gradio/chatinterface
- Blocks guide: https://www.gradio.app/guides/creating-a-custom-chatbot-with-blocks
- API reference: https://www.gradio.app/docs
# FastAPI unit & integration tests
uv run pytest
# RAG quality evaluation (requires OpenAI API key)
uv run pytest tests/test_eval/
# Voice worker tests (isolated env)
cd agents/voice
uv run pytest

Initial RAGAS support is scaffolded for answer-level evaluation now, with a clean path to add retrieval metrics once the backend exposes retrieved contexts in an eval-friendly response.
# Requires OPENAI_API_KEY and a running FastAPI backend
uv run python scripts/run_ragas_eval.py

Current fixture dataset:
- `tests/test_eval/fixtures/ragas_cases.json`

Current framework entrypoints:
- `app/evals/ragas_framework.py`
- `scripts/run_ragas_eval.py`
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.