Production-grade multi-agent AI system for luxury chauffeur booking, powered by Semantic Kernel, Azure AI Search (RAG), and real-time SQL-based availability.
Bravado Solutions: an enterprise software development company building scalable AI systems, SaaS platforms, and cloud-native applications.
Most AI systems fail because they stop at conversation. This project demonstrates a real-world agentic AI system that goes beyond chat to:
- Understand user intent
- Retrieve enterprise knowledge (RAG)
- Check real-time availability
- Execute bookings with transactional safety
- Maintain conversational memory
- Operate as a scalable API service
An AI Concierge capable of:
- Answering fleet, pricing, and service queries (RAG).
- Checking real-time vehicle availability.
- Booking chauffeur rides with atomic transactions.
- Maintaining session-based conversations.
- Persisting memory for context-aware responses.
This system follows an enterprise agentic architecture:
```mermaid
graph TD
    %% User Layer
    User((User)) -->|Booking / Info Request| API[FastAPI Orchestrator]

    subgraph "The Brain: Agentic Core"
        API --> SK[Semantic Kernel]
        SK <-->|Reasoning Loop| GPT[Azure OpenAI GPT-4o]
        SK <-->|Context Retrieval| Mem[Persistent Memory Store]
    end

    subgraph "The Hands: Plugin Layer"
        SK --> Plugins{Function Dispatcher}
        Plugins --> KP[Knowledge Plugin]
        Plugins --> BP[Booking Plugin]
        Plugins --> AP[Availability Plugin]
    end

    subgraph "Data & Knowledge"
        KP -->|Semantic Search| AIS[Azure AI Search]
        BP -->|SQL Transactions| DB[(SQLite Fleet DB)]
        AP -->|Availability Queries| DB
        AIS ---|1M+ Docs| Docs[Fleet, Pricing, Events]
    end

    %% Response Flow
    Plugins -->|Executed Action| SK
    SK -->|Final Answer| API
    API -->|Confirmation| User

    %% Styling
    style SK fill:#0078d4,stroke:#005a9e,color:#fff
    style GPT fill:#107c10,stroke:#094a09,color:#fff
    style Mem fill:#5c2d91,stroke:#3a1c5c,color:#fff
    style DB fill:#f29111,stroke:#b36b08,color:#fff
```
- User Request → Initiated via FastAPI or Local CLI.
- Orchestrator (Semantic Kernel) → Plans response, analyzes intent, and invokes tools.
- Plugins (Tool Layer) → Knowledge (Azure AI Search), Availability (Fleet DB), and Booking logic.
- Memory Layer → Stores and retrieves past interactions for long-term context.
- Execution → Final response generated and action performed.
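Stripped to its essentials, that request loop can be sketched in plain Python. Everything below (`plan_intent`, `TOOLS`, `MEMORY`, the canned availability answer) is an illustrative stand-in, not this project's actual API; in the real system Semantic Kernel plans the call and GPT-4o performs the intent analysis:

```python
MEMORY: list[dict] = []  # stand-in for the persistent memory store

def check_availability(vehicle_class: str) -> str:
    # Stand-in for the Availability Plugin's Fleet DB query.
    return f"2 {vehicle_class} vehicles available"

TOOLS = {"availability": check_availability}

def plan_intent(request: str) -> tuple[str, str]:
    # Stand-in for LLM intent recognition: a real planner lets the
    # model pick the tool and its arguments.
    if "available" in request.lower():
        return "availability", "sedan"
    return "knowledge", request

def handle_request(request: str) -> str:        # 1. request arrives (API/CLI)
    tool_name, arg = plan_intent(request)       # 2. plan & analyze intent
    tool = TOOLS.get(tool_name)
    result = tool(arg) if tool else "Let me check our knowledge base."  # 3. invoke tool
    MEMORY.append({"user": request, "agent": result})  # 4. store context
    return result                               # 5. final response

print(handle_request("Is a sedan available tonight?"))
# → 2 sedan vehicles available
```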
- Central Reasoning Engine: Powered by Semantic Kernel to coordinate the model and tools.
- Task Planning: Handles intent recognition and dynamic tool selection.
- Execution Loop: Manages the flow between the LLM and plugin results.
- Knowledge Plugin: High-speed RAG via Azure AI Search for fleet and policy queries.
- Availability Plugin: Real-time queries to the Fleet Database for vehicle stock.
- Booking Plugin: Transactional logic to secure rides and update SQL state.
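The "atomic transactions" claim for the Booking Plugin comes down to one pattern: the availability check and the booking insert commit together or not at all. A hedged sketch using the stdlib `sqlite3` module (the `vehicles`/`bookings` schema and function name are assumptions for illustration, not the actual contents of `plugins/booking_plugin.py`):

```python
import sqlite3

def book_ride(conn: sqlite3.Connection, vehicle_id: int, customer: str) -> bool:
    """Atomically claim a vehicle: the status flip and the booking row
    either both commit or both roll back."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE vehicles SET status = 'booked' "
                "WHERE id = ? AND status = 'available'",
                (vehicle_id,),
            )
            if cur.rowcount == 0:  # another session claimed it first
                raise ValueError("vehicle not available")
            conn.execute(
                "INSERT INTO bookings (vehicle_id, customer) VALUES (?, ?)",
                (vehicle_id, customer),
            )
        return True
    except ValueError:
        return False

# Minimal in-memory fixture standing in for the seeded Fleet DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicles (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE bookings (vehicle_id INTEGER, customer TEXT)")
conn.execute("INSERT INTO vehicles (id, status) VALUES (1, 'available')")

print(book_ride(conn, 1, "Ava"))  # → True: ride secured
print(book_ride(conn, 1, "Ben"))  # → False: already booked, transaction rolled back
```

The guarded `UPDATE ... WHERE status = 'available'` doubles as the availability check, so two concurrent sessions can never book the same vehicle.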
- Persistent Storage: Interaction history stored via Azure AI Search Vector Store.
- Context Retention: Maintains user preferences across multiple sessions.
- Extensible: Supports Redis, Pinecone, or other vector providers.
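The retrieval idea behind the memory layer fits in a few lines: embed each past turn, then recall the stored turn closest to the query. The real system embeds with Azure OpenAI and stores vectors in Azure AI Search; the toy bag-of-words `embed()` below is an illustrative stand-in for that pipeline, not its API:

```python
import math

VOCAB: dict[str, int] = {}  # shared token -> dimension index

def embed(text: str) -> dict[int, float]:
    """Toy sparse bag-of-words vector; a real system calls an embedding model."""
    vec: dict[int, float] = {}
    for raw in text.lower().split():
        token = raw.strip("?.!,")
        idx = VOCAB.setdefault(token, len(VOCAB))
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec

def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class MemoryStore:
    def __init__(self) -> None:
        self.records: list[tuple[dict[int, float], str]] = []

    def save(self, text: str) -> None:
        self.records.append((embed(text), text))

    def recall(self, query: str) -> str:
        # Return the stored turn most similar to the query.
        qv = embed(query)
        return max(self.records, key=lambda rec: cosine(rec[0], qv))[1]

mem = MemoryStore()
mem.save("User prefers a black Escalade for airport runs")
mem.save("User asked about hourly pricing for weddings")
print(mem.recall("Which vehicle does the user prefer for the airport?"))
# → User prefers a black Escalade for airport runs
```

Swapping the backing store (Redis, Pinecone, Azure AI Search) changes only where the vectors live, not this recall logic.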
- Framework: Built with FastAPI for high-concurrency async performance.
- Infrastructure: Includes session management, Redis rate limiting, and structured logging.
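The deployed API uses Redis so rate limits are shared across workers; the policy itself can be shown with an in-process token bucket. A sketch under that assumption (capacity and refill rate are illustrative, not the service's configured limits):

```python
import time

class TokenBucket:
    """Single-process token bucket; the production limiter keeps this
    state in Redis so every API worker enforces the same budget."""

    def __init__(self, capacity: int, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(capacity=3, refill_per_sec=1.0)
print([bucket.allow() for _ in range(5)])  # first 3 requests pass; the rest are throttled
```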
```
chauffeur-agentic-rag/
│
├── main.py                    # CLI entry point for local testing
├── app.py                     # FastAPI application entry point
├── .env.example               # Template for environment variables
├── requirements.txt           # Python dependencies
├── Dockerfile                 # API container configuration
├── docker-compose.yml         # Multi-container orchestration (API + Redis)
│
├── scripts/
│   └── init_fleet_db.py       # Database schema and seed data setup
│
├── kernel/
│   └── builder.py             # Semantic Kernel initialization & configuration
│
├── plugins/
│   ├── knowledge_plugin.py    # RAG & Azure AI Search logic
│   ├── booking_plugin.py      # Transactional ride booking operations
│   └── availability_plugin.py # Real-time fleet SQL queries
│
├── services/
│   └── orchestrator.py        # Core agentic reasoning & planning logic
│
├── memory/
│   └── vector_store.py        # Persistent context & vector search implementation
│
└── utils/
    └── pii_utils.py           # Data privacy and PII masking utilities
```
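The masking step matters because conversation turns are persisted to memory. A sketch of the kind of utility `utils/pii_utils.py` provides; the patterns and placeholder names below are simplified illustrations, not the module's actual rules:

```python
import re

# Deliberately simple patterns: email-like tokens and phone-like digit runs.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def mask_pii(text: str) -> str:
    """Replace obvious PII with placeholders before the text is stored."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(mask_pii("Reach me at ava@example.com or +1 212-555-0182."))
# → Reach me at [EMAIL] or [PHONE].
```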
---
- Production-Ready: Focused on real-world workflows, not just chat-based demos.
- Modular Architecture: Pluggable tools and agents using Semantic Kernel.
- Scalable: API-first design containerized with Docker and Redis.
- Secure by Design: Strict environment isolation and PII masking for memory.
- LLM: Azure OpenAI (GPT-4o)
- RAG: Azure AI Search
- Orchestration: Semantic Kernel v1.x
- API: FastAPI / Uvicorn
- Throttling: Redis
- Database: SQLite (Persistent via Docker Volumes)
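The stack above maps onto the repo's `docker-compose.yml`: the API container plus Redis, with a volume keeping the SQLite database persistent. A minimal sketch; service names, the port, and the volume path are illustrative assumptions, not the file's actual contents:

```yaml
services:
  api:
    build: .
    ports:
      - "8000:8000"
    env_file: .env              # Azure OpenAI / AI Search credentials
    volumes:
      - fleet-data:/app/data    # keeps the SQLite fleet DB across restarts
    depends_on:
      - redis
  redis:
    image: redis:7-alpine       # rate limiting / throttling backend

volumes:
  fleet-data:
```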
We help enterprises move from AI experimentation to production-grade intelligent systems. Our team specializes in Agentic RAG, Cloud-Native SaaS, and Enterprise AI Orchestration.
- Portfolio Case Study: Detailed Agentic AI System - Empire Limousine
🌐 bravadosolutions.com
📧 contact@bravadosolutions.com
Built with ❤️ by Bravado Solutions.