Higress Proxy-Wasm plugins for GPUStack, providing AI API traffic processing, observability, and enhanced gateway features.
This repository contains custom Higress Proxy-Wasm plugins designed for GPUStack, distributed as a Python package that includes pre-compiled Wasm plugins and a built-in HTTP file server for serving them.
pip install gpustack-higress-plugins

Requirements: Python >= 3.10
- gpustack-token-usage - Collects token usage statistics and injects them into AI API responses. For streaming responses it reports time to first token, time per output token, and tokens per second; for non-streaming responses it reports tokens per second only. It also supports real client IP injection and path-based filtering.
- gpustack-set-header-pre-route - Injects the route name and model name into HTTP request headers before routing, based on configurable path suffixes or prefixes.
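The three streaming metrics named above can be illustrated with a small sketch. This is not the plugin's code (the plugins are written in Go); the formulas are the commonly used definitions and are an assumption here, as are all names such as `StreamTimings` and its fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamTimings:
    """Timestamps in seconds. All field names are hypothetical."""
    request_start: float
    first_token_at: float
    last_token_at: float
    output_tokens: int


def time_to_first_token(t: StreamTimings) -> float:
    # Delay between the start of the request and the first token.
    return t.first_token_at - t.request_start


def time_per_output_token(t: StreamTimings) -> Optional[float]:
    # Average gap between tokens after the first one.
    # Undefined when fewer than two tokens were produced.
    if t.output_tokens < 2:
        return None
    return (t.last_token_at - t.first_token_at) / (t.output_tokens - 1)


def tokens_per_second(output_tokens: int, elapsed_seconds: float) -> Optional[float]:
    # Throughput over the whole response; undefined for non-positive durations.
    if elapsed_seconds <= 0:
        return None
    return output_tokens / elapsed_seconds


# Example: 101 tokens, first after 0.5 s, last after 2.5 s.
timings = StreamTimings(
    request_start=0.0, first_token_at=0.5, last_token_at=2.5, output_tokens=101
)
ttft = time_to_first_token(timings)
tpot = time_per_output_token(timings)
tps = tokens_per_second(
    timings.output_tokens, timings.last_token_at - timings.request_start
)
```

A non-streaming response has no per-token timestamps, which is consistent with only tokens per second being reported in that case.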
# Start the built-in HTTP file server
gpustack-plugins start --port 8080
# Or with custom host
gpustack-plugins start --port 8080 --host 0.0.0.0

The server will be available at http://localhost:8080.
# Health check
curl http://localhost:8080/
# Download a plugin
curl http://localhost:8080/wasm-plugins/gpustack-token-usage/1.0.0/plugin.wasm -o plugin.wasm
# Get metadata
curl http://localhost:8080/wasm-plugins/gpustack-token-usage/1.0.0/metadata.txt

from gpustack_higress_plugins import create_app, router
# Embed in an existing FastAPI app
app.include_router(router)
# Or create a standalone app
app = create_app()

apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: gpustack-token-usage
  namespace: higress-system
spec:
  url: http://plugin-server:8080/wasm-plugins/gpustack-token-usage/1.0.0/plugin.wasm
  defaultConfig:
    realIPToHeader: x-gpustack-real-ip

- Go 1.24+
- Python 3.10+
- oras (brew install oras), required for fetching remote plugins
# Install Python dependencies
make dev
# Build all plugins (local + remote, requires oras)
make build
# Build only local Go plugins (no oras required)
make -C extensions build-all
# Build specific plugin
make -C extensions build PLUGIN_NAME=gpustack-token-usage

If oras is not installed, make build builds local plugins only and prints a warning.
# Test Go plugins
make test
# Test single plugin
make -C extensions test PLUGIN_NAME=gpustack-token-usage

make verify-whl

Reports each expected plugin (from extensions/*/VERSION and remote_plugins.yaml) as ✓ present, ✗ missing, or version mismatch, and checks that manifest.json is included.
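The present / missing / version-mismatch classification can be sketched as a pure function. This is an illustration of the idea only, not the actual verify-whl implementation; `verify_plugins` and both dictionaries are hypothetical, and the version `0.1.0` in the example is made up.

```python
def verify_plugins(expected: dict[str, str], packaged: dict[str, str]) -> dict[str, str]:
    """Compare expected plugin versions with those found in a built wheel.

    Both arguments map plugin name -> version string. How the real check
    collects them (VERSION files, remote_plugins.yaml, wheel contents)
    is not modeled here.
    """
    report = {}
    for name, version in sorted(expected.items()):
        found = packaged.get(name)
        if found is None:
            report[name] = "missing"
        elif found != version:
            report[name] = f"version mismatch (expected {version}, found {found})"
        else:
            report[name] = "present"
    return report


# Example with one matching plugin and one absent from the wheel.
expected = {
    "gpustack-token-usage": "1.0.0",
    "gpustack-set-header-pre-route": "0.1.0",  # hypothetical version
}
packaged = {"gpustack-token-usage": "1.0.0"}
report = verify_plugins(expected, packaged)
```

Keeping the comparison separate from file discovery makes it easy to unit-test without building a wheel.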
Deploy the plugin server as a separate service and reference it from WasmPlugin resources:
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpustack-higress-plugins
spec:
  template:
    spec:
      containers:
        - name: plugins
          image: gpustack/higress-plugins:latest
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /
              port: 8080
          readinessProbe:
            httpGet:
              path: /
              port: 8080

# Build Docker image
make image
# Build with custom Go proxy
GOPROXY=https://goproxy.cn,direct make image
# Run standalone
docker run -p 8080:8080 gpustack/higress-plugins:latest

gpustack-higress-plugins/
├── extensions/ # Go plugin source code
│ ├── gpustack-token-usage/
│ │ ├── main.go
│ │ ├── go.mod
│ │ └── VERSION
│ ├── gpustack-set-header-pre-route/
│ ├── remote_plugins.yaml # Remote OCI plugin config
│ └── Makefile
├── gpustack_higress_plugins/ # Python package
│ ├── __init__.py
│ ├── main.py # CLI + FastAPI app factory
│ ├── server.py # /wasm-plugins router
│ ├── plugins/ # Compiled .wasm files (generated)
│ └── manifest.json # Plugin index (generated)
├── scripts/ # Build scripts
│ ├── generate_manifest.py
│ ├── generate_metadata.py
│ └── fetch_remote_plugins.py
├── Dockerfile
├── pyproject.toml
└── Makefile
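The URLs in the curl examples and in the WasmPlugin resource all follow the pattern /wasm-plugins/<name>/<version>/<file>, served by the router in server.py. A minimal helper for building such URLs might look like the sketch below; it assumes this layout holds for every plugin, and `plugin_url` is a hypothetical name, not part of the package.

```python
from urllib.parse import quote


def plugin_url(base_url: str, name: str, version: str, filename: str = "plugin.wasm") -> str:
    """Build the download URL for a plugin file.

    Assumes the /wasm-plugins/<name>/<version>/<file> layout seen in
    the examples in this README.
    """
    # Percent-encode each path segment so unusual names cannot break the path.
    segments = [quote(part, safe="") for part in (name, version, filename)]
    return f"{base_url.rstrip('/')}/wasm-plugins/" + "/".join(segments)


wasm = plugin_url("http://plugin-server:8080", "gpustack-token-usage", "1.0.0")
meta = plugin_url("http://localhost:8080/", "gpustack-token-usage", "1.0.0", "metadata.txt")
```

The first result matches the url field of the WasmPlugin example, and the second matches the metadata curl command.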
- Package version follows Semantic Versioning (MAJOR.MINOR.PATCH)
- Each plugin has its own version in extensions/<name>/VERSION
- Package version is set from the git tag at release time (placeholder 0.0.0 in development)
- RC releases (e.g. 0.2.0rc1) are published to TestPyPI; stable releases go to PyPI
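The release routing described above can be expressed as a short sketch. It is illustrative only and not the project's release tooling: `target_index` is a hypothetical name, the regular expressions cover only the MAJOR.MINOR.PATCH and MAJOR.MINOR.PATCHrcN forms shown here, and refusing the 0.0.0 placeholder is an assumption.

```python
import re

# Only the two version shapes mentioned in this README are recognized.
_STABLE = re.compile(r"^\d+\.\d+\.\d+$")
_RC = re.compile(r"^\d+\.\d+\.\d+rc\d+$")


def target_index(version: str) -> str:
    """Return "testpypi" for RC versions and "pypi" for stable versions."""
    if version == "0.0.0":
        # Assumption: the development placeholder is never published.
        raise ValueError("0.0.0 is the development placeholder")
    if _RC.match(version):
        return "testpypi"
    if _STABLE.match(version):
        return "pypi"
    raise ValueError(f"unrecognized version: {version!r}")


rc_target = target_index("0.2.0rc1")
stable_target = target_index("0.2.0")
```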
Apache License 2.0