Add Python bindings, Docker, CONTRIBUTING.md

unamedkr · claude · unamedkr · commit 417fa3bd1190 · 2026-04-03T02:32:49.000+09:00
Python (bindings/python/turboquant_cli.py):
  subprocess wrapper — zero deps, works with any tq_run binary.
  generate(), perplexity(), memory_stats(), info() methods.
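
A minimal sketch of the subprocess approach described above, not the contents of turboquant_cli.py. The helper names `build_command` and `generate`, the binary lookup order, and the timeout are assumptions for illustration. Only the positional model path, `-p`, `-k`, and the `TURBOQUANT_BIN` variable appear in this commit; flags for token limits or temperature are not shown here, so the sketch omits them.

```python
import os
import subprocess


def build_command(model_path, prompt, kv_type="turbo_kv_1b", binary=None):
    """Assemble the argv list for one tq_run invocation (hypothetical helper)."""
    # Lookup order is an assumption: explicit argument, then the
    # TURBOQUANT_BIN environment variable, then ./build/tq_run.
    exe = binary or os.environ.get("TURBOQUANT_BIN", "./build/tq_run")
    # Only the positional model path, -p, and -k are taken from this commit.
    return [exe, model_path, "-p", prompt, "-k", kv_type]


def generate(model_path, prompt, kv_type="turbo_kv_1b", binary=None, timeout=300):
    """Run tq_run once and return its stdout as text."""
    cmd = build_command(model_path, prompt, kv_type, binary)
    # Passing a list (no shell=True) means the prompt needs no shell quoting.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(
            f"tq_run exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

Building the argv list in a separate function keeps the command construction testable without a model file or a compiled binary.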

Docker:
  Ubuntu 22.04, builds + tests in container.
  docker-compose.yml with model volume mount.

CONTRIBUTING.md:
  Build guide, how to add architectures/KV types, cross-platform checklist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/.dockerignore b/.dockerignore
@@ -0,0 +1,12 @@
+build/
+build-*/
+.git/
+.claude/
+refs/
+models/
+*.gguf
+*.tqm
+*.safetensors
+__pycache__/
+*.pyc
+.venv/
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -12,6 +12,30 @@ cmake --build build -j$(nproc 2>/dev/null || sysctl -n hw.ncpu)
 ctest --test-dir build --output-on-failure
 ```
 
+Or with Docker:
+
+```bash
+docker build -t turboquant .
+docker run -v "$(pwd)/models:/turboquant/models" turboquant models/model.tqm -p "Hello" -k turbo_kv_1b
+```
+
+## Running Tests
+
+```bash
+# All tests
+ctest --test-dir build --output-on-failure
+
+# Specific test
+./build/test_polar
+./build/test_qjl
+
+# With scoring harness (5-dimension evaluation)
+bash score.sh              # Full evaluation
+bash score.sh --quick      # Build + correctness only
+bash score.sh --bench      # Performance benchmarks
+bash score.sh --quality    # Quantization quality metrics
+```
+
 ## What to Work On
 
 Check [Issues](https://github.com/quantumaikr/TurboQuant.cpp/issues) for tasks labeled `good first issue` or `help wanted`.
@@ -22,12 +46,31 @@ Check [Issues](https://github.com/quantumaikr/TurboQuant.cpp/issues) for tasks l
 - Metal GPU compute shaders
 - Long context benchmarks (8K, 32K, 128K tokens)
 
+## Adding a New Model Architecture
+
+1. Add the model config struct to `include/turboquant/tq_engine.h`
+2. Implement the forward pass in `src/engine/` (one file per architecture)
+3. Register the architecture in `tq_load_model()` in `src/engine/tq_model_loader.c`
+4. Add a test in `tests/` and an example in `examples/`
+5. Verify with `bash score.sh --quick`
+
+## Adding a New KV Cache Type
+
+1. Define the type enum in `include/turboquant/tq_types.h` (append to `tq_type` enum)
+2. Add block struct + `static_assert` size check in `include/turboquant/tq_spec.h`
+3. Implement `quantize`/`dequantize`/`attention` in `src/core/tq_<name>.c`
+4. Register in the dispatch table in `src/core/tq_traits.c`
+5. Add unit tests in `tests/test_<name>.cpp`
+6. Update `tools/tq_run.c` to accept the new type name in `parse_kv_type()`
+
 ## Code Standards
 
-- **C11** for core library (`src/`), **C++17** for tests
+- **C11** for core library (`src/`), **C++17** for tests and CUDA/Metal wrappers
 - No external dependencies in core (libc/libm/pthread only)
-- Every public function needs a test
-- Run tests before submitting: `ctest --test-dir build`
+- Every block struct must have `static_assert` size verification
+- Every public function needs a unit test
+- ONNX LSB-first bit-packing convention for all quantized formats
+- Use `refs/` code as algorithm reference -- port to C, don't wrap Python
 
 ## Module Ownership
 
@@ -38,23 +81,25 @@ Each module has exclusive files to prevent merge conflicts:
 | `polar` | `src/core/tq_polar.*`, `tests/test_polar.*` |
 | `qjl` | `src/core/tq_qjl.*`, `tests/test_qjl.*` |
 | `turbo` | `src/core/tq_turbo.*`, `tests/test_turbo.*` |
+| `uniform` | `src/core/tq_uniform.*`, `src/core/tq_value_quant.*` |
 | `engine` | `src/engine/*` |
 | `cache` | `src/cache/*` |
 | `simd` | `src/backend/cpu/*` |
 
-## Pull Request Process
-
-1. Fork and create a feature branch
-2. Make your changes
-3. Ensure all tests pass and no new warnings
-4. Submit a PR with a clear description
-
 ## Cross-Platform Checklist
 
 Before submitting, verify:
 - [ ] NEON intrinsics are inside `#ifdef __ARM_NEON` guards
 - [ ] No GCC warnings (`-Wall -Wextra -Wpedantic`)
 - [ ] Scalar fallback exists for all SIMD code paths
+- [ ] Function pointer types match their typedefs
+
+## Pull Request Process
+
+1. Fork and create a feature branch
+2. Make your changes
+3. Ensure all tests pass and no new warnings
+4. Submit a PR with a clear description
 
 ## License
 
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,31 @@
+FROM ubuntu:22.04
+
+# Avoid interactive prompts during package installation
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Install build dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        cmake \
+        g++ \
+        make \
+        python3 \
+        python3-pip \
+    && rm -rf /var/lib/apt/lists/*
+
+# Copy project source (see .dockerignore for exclusions)
+COPY . /turboquant
+WORKDIR /turboquant
+
+# Build the library, tools, and tests
+RUN cmake -B build \
+        -DCMAKE_BUILD_TYPE=Release \
+        -DTQ_BUILD_TESTS=ON \
+        -DTQ_BUILD_BENCH=ON \
+    && cmake --build build -j$(nproc)
+
+# Run the test suite
+RUN ctest --test-dir build --output-on-failure
+
+# Default entrypoint: the tq_run inference CLI
+# Usage: docker run -v "$(pwd)/models:/turboquant/models" turboquant models/model.tqm -p "Hello" -k turbo_kv_1b
+ENTRYPOINT ["./build/tq_run"]
diff --git a/bindings/python/example.py b/bindings/python/example.py
@@ -0,0 +1,91 @@
+#!/usr/bin/env python3
+"""
+TurboQuant.cpp -- CLI Wrapper Example
+
+Demonstrates the subprocess-based Python bindings that call the tq_run binary.
+No C FFI, no NumPy, no shared library -- just a model file and the tq_run binary.
+
+Prerequisites:
+    cmake -B build -DCMAKE_BUILD_TYPE=Release
+    cmake --build build -j$(nproc)
+
+Usage:
+    python3 bindings/python/example.py models/qwen3.5-0.8b.tqm
+    TURBOQUANT_BIN=./build/tq_run python3 bindings/python/example.py model.gguf
+"""
+
+import sys
+import os
+
+# Allow running from project root or bindings/python/
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+
+from turboquant_cli import TurboQuant
+
+
+def main():
+    if len(sys.argv) < 2:
+        print("Usage: python example.py <model_path> [kv_type]")
+        print()
+        print("  model_path  Path to .tqm, .safetensors, or .gguf model file")
+        print("  kv_type     KV cache type (default: turbo_kv_1b)")
+        print()
+        print("Examples:")
+        print("  python example.py models/qwen3.5-0.8b.tqm")
+        print("  python example.py model.gguf turbo_kv_3b")
+        sys.exit(1)
+
+    model_path = sys.argv[1]
+    kv_type = sys.argv[2] if len(sys.argv) > 2 else "turbo_kv_1b"
+
+    # Initialize
+    print(f"Loading model: {model_path}")
+    print(f"KV cache type: {kv_type}")
+    tq = TurboQuant(model_path, kv_type=kv_type)
+    print(tq)
+    print()
+
+    # Generate text
+    print("--- Generation ---")
+    text = tq.generate("The capital of France is", max_tokens=64, temperature=0.7)
+    print(text)
+    print()
+
+    # Memory stats
+    print("--- Memory Stats ---")
+    try:
+        stats = tq.memory_stats()
+        print(f"  Tokens in cache:    {stats['tokens']}")
+        print(f"  Compressed size:    {stats['compressed_mb']:.2f} MB")
+        print(f"  FP16 baseline:      {stats['fp16_mb']:.2f} MB")
+        print(f"  Compression ratio:  {stats['ratio']:.2f}x")
+        print(f"  Memory saved:       {stats['saved_mb']:.2f} MB")
+    except Exception as e:
+        print(f"  (Could not get memory stats: {e})")
+    print()
+
+    # Perplexity (if a test file exists)
+    test_file = os.path.join(os.path.dirname(model_path), "test.txt")
+    if os.path.isfile(test_file):
+        print("--- Perplexity ---")
+        try:
+            ppl = tq.perplexity(test_file)
+            print(f"  PPL: {ppl:.4f}")
+        except Exception as e:
+            print(f"  (Could not compute PPL: {e})")
+    else:
+        print(f"--- Perplexity (skipped: no {test_file}) ---")
+
+    # Model info
+    print()
+    print("--- Model Info ---")
+    try:
+        info = tq.info()
+        for line in info.split("\n")[:10]:
+            print(f"  {line}")
+    except Exception as e:
+        print(f"  (Could not get model info: {e})")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/bindings/python/turboquant_cli.py b/bindings/python/turboquant_cli.py
diff --git a/docker-compose.yml b/docker-compose.yml