coeusyk/README.md

Yash Karecha

AI/ML Systems Engineer
High-Performance Inference • Distributed Agentic Pipelines • Systems Programming
Bengaluru, IN


⚡ Engineering Focus

  • Inference Optimization: Architecting high-throughput, low-latency LLM serving layers utilizing vLLM, PagedAttention, and custom quantization strategies.
  • Agentic Workflows & RAG: Building production-grade Model Context Protocol (MCP) servers, multi-agent orchestrations, and context-aware local retrieval systems.
  • Performance Systems: Developing low-level state representations (Bitboards) and parallelized search algorithms in Java and Rust.
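As a rough illustration of the bitboard idea mentioned above, the sketch below packs the board into a single 64-bit integer and computes knight attacks with shifts and masks. It is a generic textbook technique, not code taken from Vex; the square numbering (a1 = bit 0, h8 = bit 63), the constant names, and the `knight_attacks` function are all assumptions made for this example.

```python
# Assumed layout: bit index = rank * 8 + file, so a1 = 0, h1 = 7, a8 = 56.
FULL_BOARD = 0xFFFFFFFFFFFFFFFF
NOT_A_FILE = 0xFEFEFEFEFEFEFEFE   # every square except the a-file
NOT_H_FILE = 0x7F7F7F7F7F7F7F7F   # every square except the h-file
NOT_AB_FILE = 0xFCFCFCFCFCFCFCFC  # every square except the a- and b-files
NOT_GH_FILE = 0x3F3F3F3F3F3F3F3F  # every square except the g- and h-files


def knight_attacks(knights: int) -> int:
    """Return the bitboard of squares attacked by all knights in `knights`.

    Each shift moves every knight at once; the file masks discard moves
    that would otherwise wrap around the left or right board edge.
    """
    attacks = 0
    attacks |= (knights << 17) & NOT_A_FILE   # up 2, right 1
    attacks |= (knights << 15) & NOT_H_FILE   # up 2, left 1
    attacks |= (knights << 10) & NOT_AB_FILE  # up 1, right 2
    attacks |= (knights << 6) & NOT_GH_FILE   # up 1, left 2
    attacks |= (knights >> 17) & NOT_H_FILE   # down 2, left 1
    attacks |= (knights >> 15) & NOT_A_FILE   # down 2, right 1
    attacks |= (knights >> 10) & NOT_GH_FILE  # down 1, left 2
    attacks |= (knights >> 6) & NOT_AB_FILE   # down 1, right 2
    return attacks & FULL_BOARD


def popcount(bitboard: int) -> int:
    """Count the set bits (occupied or attacked squares) in a bitboard."""
    return bin(bitboard).count("1")
```

A knight in the corner reaches two squares and one in the centre reaches eight, so `popcount(knight_attacks(1 << 0))` and `popcount(knight_attacks(1 << 27))` make quick sanity checks. In Java or Rust the same logic maps onto a native 64-bit integer, which is where the speed advantage comes from.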

🔨 Featured Systems & Architecture

| Project | Core Architecture & Capabilities | Engineering Decisions & Impact |
| --- | --- | --- |
| Vex (Chess Engine) | Modular bitboard engine using Alpha-Beta search, Lazy SMP parallelization, and Syzygy tablebase integration (~2000 Elo). | Optimized evaluation via Adam-based Texel tuning over 820 parameters across 725k positions. Achieved a statistically significant $+185.7 \pm 54.2$ Elo gain validated via Sequential Probability Ratio Testing (SPRT). |
| inference-x | Self-hosted, vLLM-backed inference server with an OpenAI-compatible API, model registry, SSE streaming, and a Textual TUI. | Selected vLLM over native Hugging Face Transformers to leverage PagedAttention, significantly improving memory efficiency and generation throughput during concurrent streaming. |
| personal-notes-assistant | Local-first, MCP-compliant RAG server indexing markdown knowledge bases into Milvus with pluggable LLM backends and live FS watching. | Implemented the Model Context Protocol (MCP) architecture to turn a local vector database into an autonomous tool accessible by Claude Desktop. |
| OpenCast | High-throughput data pipeline orchestrating ARIMA/Holt-Winters time-series forecasting over 498 ECO opening codes. | Built a hybrid stack: a multi-threaded Rust fetcher for maximum throughput against the Lichess API paired with a Python/statsmodels engine for statistical forecasting. |
| chronicle-n8n | Private, automated RSS ingestion and summarization engine streaming to a self-cleaning Notion data warehouse. | Orchestrated a zero-ingress cost, fully local pipeline by embedding Ollama (llama3) inside an asynchronous workflow to maintain total data privacy. |
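For readers unfamiliar with how an Elo gain like the one quoted for Vex is derived from match results, here is a minimal sketch of the standard logistic Elo conversion from a win/draw/loss record. It does not reproduce the SPRT procedure or the error bars, and the function name `elo_difference` and the game counts in the usage note are hypothetical, not Vex's actual test data.

```python
import math


def elo_difference(wins: int, draws: int, losses: int) -> float:
    """Estimate the Elo difference implied by a match result.

    Uses the standard logistic model: expected score s = 1 / (1 + 10^(-d/400)),
    solved for d. A draw counts as half a point.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("at least one game is required")
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        # A 0% or 100% score implies an unbounded Elo difference.
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)
```

With an illustrative record of 75 wins, 0 draws, and 25 losses (a 75% score), `elo_difference(75, 0, 25)` comes out at roughly +190.8 Elo; an even match gives 0.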

🧰 Tech Stack

💻 Systems & Languages

Python • Java • Rust • TypeScript • JavaScript • SQL • HTML/CSS

🧠 LLM Infrastructure & Agentic Frameworks

vLLM • LangGraph • CrewAI • Model Context Protocol (MCP) • LlamaIndex • OpenAI Agents SDK • LangChain • Ollama

📊 Machine Learning & Data Engineering

PyTorch • TensorFlow • scikit-learn • pandas • NumPy • statsmodels • Milvus • ChromaDB • MongoDB • MySQL

⚙️ DevOps & Tooling

Docker • GitHub Actions • n8n • Supabase • Git • Postman • Figma


📊 Core Performance Metrics

GitHub Streak

Pinned

  1. chess-engine (Public)

    A competitive Java chess engine with bitboard representation, alpha-beta search, classical evaluation, and full UCI protocol support.

    Java 1

  2. opencast (Public)

    Chess opening analytics pipeline — monthly win-rate forecasting, engine-human delta scoring, and AI-generated insights across 500+ ECO openings from Lichess data.

    Python 2

  3. personal-notes-assistant (Public)

    A RAG server for your Obsidian vault.

    Python 3

  4. sudokuverse (Public)

    Modern Flask-based Sudoku game with user authentication, statistics tracking, and Docker deployment. Interactive UI with multiple difficulty levels and performance analytics.

    Python 2

  5. drft (Public)

    A Python-powered AI playlist generator that turns your mood or scene prompt into a custom Spotify playlist. Uses Google Gemini for creative curation, n8n for workflow automation, and ngrok for secu…

    PowerShell 2

  6. chronicle-n8n (Public)

    An automated RSS feed summarization pipeline using n8n and a local Ollama LLM to create a private, self-cleaning content feed.

    2