    Repositories list

    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      16k78k1.9k2.6k Updated Apr 24, 2026
    • vllm-omni

      Public
      A framework for efficient model inference with omni-modality models
      Python
      Apache License 2.0
      8254.5k405334 Updated Apr 24, 2026
    • tpu-inference

      Public
      TPU inference for vLLM, with unified JAX and PyTorch support.
      Python
      Apache License 2.0
      16930053179 Updated Apr 24, 2026
    • speculators

      Public
      A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
      Python
      Apache License 2.0
      763641621 Updated Apr 24, 2026
    • vllm-gaudi

      Public
      Community maintained hardware plugin for vLLM on Intel Gaudi
      Python
      Apache License 2.0
      12838271 Updated Apr 24, 2026
    • recipes

      Public
      Common recipes to run vLLM
      JavaScript
      Apache License 2.0
      2397381848 Updated Apr 24, 2026
    • vllm-ascend

      Public
      Community maintained hardware plugin for vLLM on Ascend
      C++
      Apache License 2.0
      1.1k2k1.4k418 Updated Apr 24, 2026
    • vllm-metal

      Public
      Community maintained hardware plugin for vLLM on Apple Silicon
      Python
      Apache License 2.0
      9796441 Updated Apr 24, 2026
    • vllm-xpu-kernels

      Public
      The vLLM XPU kernels for Intel GPU
      C++
      Apache License 2.0
      51361430 Updated Apr 24, 2026
    • semantic-router

      Public
      System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge
      Go
      Apache License 2.0
      6393.8k10272 Updated Apr 24, 2026
    • vllm-project.github.io

      Public
      HTML
      873614 Updated Apr 24, 2026
    • flash-attention

      Public
      Fast and memory-efficient exact attention
      Python
      BSD 3-Clause "New" or "Revised" License
      2.6k121023 Updated Apr 24, 2026
    • router

      Public
      A high-performance and light-weight router for vLLM large scale deployment
      Rust
      Apache License 2.0
      722001319 Updated Apr 24, 2026
    • production-stack

      Public
      vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
      Python
      Apache License 2.0
      3922.3k9765 Updated Apr 23, 2026
    • aibrix

      Public
      Cost-efficient and pluggable Infrastructure components for GenAI inference
      Go
      Apache License 2.0
      5624.8k28540 Updated Apr 23, 2026
    • llm-compressor

      Public
      Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
      Python
      Apache License 2.0
      4903.2k6359 Updated Apr 23, 2026
    • ci-infra

      Public
      This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
      HCL
      Apache License 2.0
      6835042 Updated Apr 23, 2026
    • A safetensors extension to efficiently store sparse quantized tensors on disk
      Python
      Apache License 2.0
      832741214 Updated Apr 22, 2026
    • guidellm

      Public
      Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
      Python
      Apache License 2.0
      1461.1k6324 Updated Apr 22, 2026
    • vllm-daily

      Public
      vLLM Daily Summarization of Merged PRs
      45000 Updated Apr 20, 2026
    • FlashMLA

      Public
      C++
      MIT License
      1k1103 Updated Apr 20, 2026
    • dllm-plugin

      Public
      vLLM plugin for block-based diffusion language model (dLLM) support
      Python
      Apache License 2.0
      512110 Updated Apr 16, 2026
    • Stateful API logic for agentic applications using vLLM
      Python
      Apache License 2.0
      92313 Updated Apr 16, 2026
    • vLLM Model plugin for the encoder-decoder BART model
      Python
      Apache License 2.0
      71126 Updated Apr 10, 2026
    • vllm-skills

      Public
      Agent skills for vLLM
      Shell
      Apache License 2.0
      196632 Updated Apr 3, 2026
    • vllm-neuron

      Public
      Community maintained hardware plugin for vLLM on AWS Neuron
      Python
      Apache License 2.0
      112931 Updated Mar 20, 2026
    • perf-dashboard

      Public
      Performance dashboard for vLLM
      Python
      2101 Updated Mar 10, 2026
    • media-kit

      Public
      vLLM Logo Assets
      5820 Updated Jan 15, 2026
    • vLLM-in-PyTorch-Conference-2025

      Public
      11200 Updated Dec 14, 2025
    • Python
      Apache License 2.0
      124531 Updated Dec 4, 2025