AI/ML Systems Engineer
High-Performance Inference • Distributed Agentic Pipelines • Systems Programming
Bengaluru, IN
- Inference Optimization: Architecting high-throughput, low-latency LLM serving layers using vLLM, PagedAttention, and custom quantization strategies.
- Agentic Workflows & RAG: Building production-grade Model Context Protocol (MCP) servers, multi-agent orchestrations, and context-aware local retrieval systems.
- Performance Systems: Developing low-level state representations (Bitboards) and parallelized search algorithms in Java and Rust.
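
As an illustration of the bitboard representation named in the last bullet, here is a minimal sketch of how a 64-bit mask can encode piece placement and generate pawn attacks with shifts. It is written in Python for brevity rather than the Java or Rust mentioned above, and it assumes a little-endian rank-file square mapping (a1 = 0, h8 = 63). The function names are hypothetical and are not taken from any of the projects listed here.

```python
# Bitboard sketch: one bit per square, a1 = bit 0, h8 = bit 63 (assumed mapping).
MASK64 = 0xFFFFFFFFFFFFFFFF
FILE_A = 0x0101010101010101  # bits 0, 8, 16, ... (the a-file)
FILE_H = 0x8080808080808080  # bits 7, 15, 23, ... (the h-file)


def square_bit(square: int) -> int:
    """Return a bitboard with only the given square set."""
    return 1 << square


def white_pawn_attacks(pawns: int) -> int:
    """Squares attacked by white pawns, computed for all pawns at once."""
    # Mask out edge files before shifting so attacks do not wrap around the board.
    north_east = (pawns & ~FILE_H) << 9
    north_west = (pawns & ~FILE_A) << 7
    return (north_east | north_west) & MASK64


def squares(bitboard: int):
    """Yield the indices of set bits, lowest first."""
    while bitboard:
        lsb = bitboard & -bitboard      # isolate the lowest set bit
        yield lsb.bit_length() - 1      # convert the bit to a square index
        bitboard &= bitboard - 1        # clear the lowest set bit


if __name__ == "__main__":
    e2 = square_bit(12)
    print(sorted(squares(white_pawn_attacks(e2))))
```

The appeal of this layout is that a whole set of pieces is processed with a handful of integer operations instead of a loop over squares.
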
| Project | Core Architecture & Capabilities | Engineering Decisions & Impact |
|---|---|---|
| Vex (Chess Engine) | Modular bitboard engine using Alpha-Beta search, Lazy SMP parallelization, and Syzygy tablebase integration (~2000 Elo). | Optimized evaluation via Adam-based Texel tuning over 820 parameters across 725k positions. Achieved a statistically significant |
| inference-x | Self-hosted, vLLM-backed inference server with an OpenAI-compatible API, model registry, SSE streaming, and a Textual TUI. | Selected vLLM over native Hugging Face Transformers to leverage PagedAttention, significantly improving memory efficiency and generation throughput during concurrent streaming. |
| personal-notes-assistant | Local-first, MCP-compliant RAG server indexing markdown knowledge bases into Milvus with pluggable LLM backends and live FS watching. | Implemented the Model Context Protocol (MCP) to expose a local vector database as a tool that Claude Desktop can call directly. |
| OpenCast | High-throughput data pipeline orchestrating ARIMA/Holt-Winters time-series forecasting over 498 ECO opening codes. | Built a hybrid stack: a multi-threaded Rust fetcher for maximum throughput against the Lichess API paired with a Python/statsmodels engine for statistical forecasting. |
| chronicle-n8n | Private, automated RSS ingestion and summarization engine streaming to a self-cleaning Notion data warehouse. | Orchestrated a zero-ingress-cost, fully local pipeline by embedding Ollama (llama3) in an asynchronous workflow to preserve data privacy. |
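
To make the Texel tuning entry in the Vex row concrete, the sketch below shows the general idea under stated assumptions: a linear evaluation (weights times features, in centipawns) is mapped to an expected game result through a logistic curve, and Adam minimizes the mean squared error against actual results. The toy dataset, the single feature, the scaling constant `k`, and all function names are illustrative assumptions, not the engine's actual code or its 820 parameters.

```python
import math


def expected_score(eval_cp: float, k: float = 1.0) -> float:
    """Map a centipawn evaluation to an expected result in [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** (-k * eval_cp / 400.0))


def evaluate(weights, features) -> float:
    """Linear evaluation: dot product of weights and position features."""
    return sum(w * f for w, f in zip(weights, features))


def mse(weights, dataset, k: float = 1.0) -> float:
    """Mean squared error between game results and predicted scores."""
    total = sum((r - expected_score(evaluate(weights, f), k)) ** 2 for f, r in dataset)
    return total / len(dataset)


def gradient(weights, dataset, k: float = 1.0):
    """Analytic gradient of the MSE with respect to each weight."""
    grad = [0.0] * len(weights)
    scale = math.log(10.0) * k / 400.0  # derivative factor of the logistic curve
    for features, result in dataset:
        s = expected_score(evaluate(weights, features), k)
        common = -2.0 * (result - s) * s * (1.0 - s) * scale
        for i, f in enumerate(features):
            grad[i] += common * f
    return [g / len(dataset) for g in grad]


def adam_tune(weights, dataset, steps=200, lr=1.0, b1=0.9, b2=0.999, eps=1e-8):
    """Run Adam over the full dataset for a fixed number of steps."""
    w = list(weights)
    m = [0.0] * len(w)  # first-moment estimates
    v = [0.0] * len(w)  # second-moment estimates
    for t in range(1, steps + 1):
        g = gradient(w, dataset)
        for i in range(len(w)):
            m[i] = b1 * m[i] + (1 - b1) * g[i]
            v[i] = b2 * v[i] + (1 - b2) * g[i] * g[i]
            m_hat = m[i] / (1 - b1 ** t)  # bias correction
            v_hat = v[i] / (1 - b2 ** t)
            w[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Toy data (assumed): one feature, material difference in pawns; result is 1, 0.5, or 0.
toy_dataset = [([1.0], 1.0), ([-1.0], 0.0), ([0.0], 0.5)]

if __name__ == "__main__":
    start = [0.0]
    tuned = adam_tune(start, toy_dataset)
    print(mse(start, toy_dataset), mse(tuned, toy_dataset), tuned)
```

A real tuner would typically work on mini-batches of labeled positions and fit `k` to the engine's evaluation scale first; this version keeps everything in one pass for readability.
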
Python • Java • Rust • TypeScript • JavaScript • SQL • HTML/CSS
vLLM • LangGraph • CrewAI • Model Context Protocol (MCP) • LlamaIndex • OpenAI Agents SDK • LangChain • Ollama
PyTorch • TensorFlow • scikit-learn • pandas • NumPy • statsmodels • Milvus • ChromaDB • MongoDB • MySQL
Docker • GitHub Actions • n8n • Supabase • Git • Postman • Figma



