Commit 417fa3b

unamedkr and claude committed
Add Python bindings, Docker, CONTRIBUTING.md
Python (bindings/python/turboquant_cli.py): subprocess wrapper -- zero deps, works with any tq_run binary. generate(), perplexity(), memory_stats(), info() methods.

Docker: Ubuntu 22.04, builds + tests in container. docker-compose.yml with model volume mount.

CONTRIBUTING.md: Build guide, how to add architectures/KV types, cross-platform checklist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 5b20478 commit 417fa3b

6 files changed

Lines changed: 492 additions & 10 deletions

.dockerignore

Lines changed: 12 additions & 0 deletions
    @@ -0,0 +1,12 @@
    +build/
    +build-*/
    +.git/
    +.claude/
    +refs/
    +models/
    +*.gguf
    +*.tqm
    +*.safetensors
    +__pycache__/
    +*.pyc
    +.venv/

CONTRIBUTING.md

Lines changed: 55 additions & 10 deletions
    @@ -12,6 +12,30 @@ cmake --build build -j$(nproc 2>/dev/null || sysctl -n hw.ncpu)
     ctest --test-dir build --output-on-failure
     ```

    +Or with Docker:
    +
    +```bash
    +docker build -t turboquant .
    +docker run turboquant models/model.tqm -p "Hello" -k turbo_kv_1b
    +```
    +
    +## Running Tests
    +
    +```bash
    +# All tests
    +ctest --test-dir build --output-on-failure
    +
    +# Specific test
    +./build/test_polar
    +./build/test_qjl
    +
    +# With scoring harness (5-dimension evaluation)
    +bash score.sh # Full evaluation
    +bash score.sh --quick # Build + correctness only
    +bash score.sh --bench # Performance benchmarks
    +bash score.sh --quality # Quantization quality metrics
    +```
    +
     ## What to Work On

     Check [Issues](https://github.com/quantumaikr/TurboQuant.cpp/issues) for tasks labeled `good first issue` or `help wanted`.
    @@ -22,12 +46,31 @@ Check [Issues](https://github.com/quantumaikr/TurboQuant.cpp/issues) for tasks l
     - Metal GPU compute shaders
     - Long context benchmarks (8K, 32K, 128K tokens)

    +## Adding a New Model Architecture
    +
    +1. Add the model config struct to `include/turboquant/tq_engine.h`
    +2. Implement the forward pass in `src/engine/` (one file per architecture)
    +3. Register the architecture in `tq_load_model()` in `src/engine/tq_model_loader.c`
    +4. Add a test in `tests/` and an example in `examples/`
    +5. Verify with `bash score.sh --quick`
    +
    +## Adding a New KV Cache Type
    +
    +1. Define the type enum in `include/turboquant/tq_types.h` (append to `tq_type` enum)
    +2. Add block struct + `static_assert` size check in `include/turboquant/tq_spec.h`
    +3. Implement `quantize`/`dequantize`/`attention` in `src/core/tq_<name>.c`
    +4. Register in the dispatch table in `src/core/tq_traits.c`
    +5. Add unit tests in `tests/test_<name>.cpp`
    +6. Update `tools/tq_run.c` to accept the new type name in `parse_kv_type()`
    +
     ## Code Standards

    -- **C11** for core library (`src/`), **C++17** for tests
    +- **C11** for core library (`src/`), **C++17** for tests and CUDA/Metal wrappers
     - No external dependencies in core (libc/libm/pthread only)
    -- Every public function needs a test
    -- Run tests before submitting: `ctest --test-dir build`
    +- Every block struct must have `static_assert` size verification
    +- Every public function needs a unit test
    +- ONNX LSB-first bit-packing convention for all quantized formats
    +- Use `refs/` code as algorithm reference -- port to C, don't wrap Python

     ## Module Ownership

    @@ -38,23 +81,25 @@ Each module has exclusive files to prevent merge conflicts:
     | `polar` | `src/core/tq_polar.*`, `tests/test_polar.*` |
     | `qjl` | `src/core/tq_qjl.*`, `tests/test_qjl.*` |
     | `turbo` | `src/core/tq_turbo.*`, `tests/test_turbo.*` |
    +| `uniform` | `src/core/tq_uniform.*`, `src/core/tq_value_quant.*` |
     | `engine` | `src/engine/*` |
     | `cache` | `src/cache/*` |
     | `simd` | `src/backend/cpu/*` |

    -## Pull Request Process
    -
    -1. Fork and create a feature branch
    -2. Make your changes
    -3. Ensure all tests pass and no new warnings
    -4. Submit a PR with a clear description
    -
     ## Cross-Platform Checklist

     Before submitting, verify:
     - [ ] NEON intrinsics are inside `#ifdef __ARM_NEON` guards
     - [ ] No GCC warnings (`-Wall -Wextra -Wpedantic`)
     - [ ] Scalar fallback exists for all SIMD code paths
    +- [ ] Function pointer types match their typedefs
    +
    +## Pull Request Process
    +
    +1. Fork and create a feature branch
    +2. Make your changes
    +3. Ensure all tests pass and no new warnings
    +4. Submit a PR with a clear description

     ## License
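
The Code Standards hunk above requires an LSB-first bit-packing convention for all quantized formats but does not show what that looks like. The sketch below is an illustration only, written in Python as a reference model rather than as project code (the core library is C11). It assumes "LSB-first" means that element `i` occupies lower-order bits than element `i + 1`, so for 4-bit values the first element lands in the low nibble. The function names `pack_lsb_first` and `unpack_lsb_first` are hypothetical and do not come from the repository.

```python
def pack_lsb_first(values, bits):
    """Pack unsigned integers of width `bits` into bytes, lowest-order bits first."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    out = bytearray((len(values) * bits + 7) // 8)
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bits):
            raise ValueError(f"value {v} does not fit in {bits} bits")
        byte_idx, shift = divmod(i * bits, 8)
        out[byte_idx] |= (v << shift) & 0xFF
        if shift + bits > 8:
            # The value straddles a byte boundary: spill the high bits
            # into the low bits of the next byte.
            out[byte_idx + 1] |= v >> (8 - shift)
    return bytes(out)


def unpack_lsb_first(data, bits, count):
    """Inverse of pack_lsb_first: recover `count` values of width `bits`."""
    mask = (1 << bits) - 1
    values = []
    for i in range(count):
        byte_idx, shift = divmod(i * bits, 8)
        word = data[byte_idx]
        if shift + bits > 8:
            word |= data[byte_idx + 1] << 8
        values.append((word >> shift) & mask)
    return values


if __name__ == "__main__":
    packed = pack_lsb_first([0x1, 0x2], 4)
    print(packed.hex())  # prints "21": first element in the low nibble
    print(unpack_lsb_first(packed, 4, 2))  # prints [1, 2]
```

A round-trip check like this is one way to validate a C port against a reference before adding the `static_assert` size check on the block struct.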

Dockerfile

Lines changed: 31 additions & 0 deletions
    @@ -0,0 +1,31 @@
    +FROM ubuntu:22.04
    +
    +# Avoid interactive prompts during package installation
    +ENV DEBIAN_FRONTEND=noninteractive
    +
    +# Install build dependencies
    +RUN apt-get update && apt-get install -y --no-install-recommends \
    +    cmake \
    +    g++ \
    +    make \
    +    python3 \
    +    python3-pip \
    +    && rm -rf /var/lib/apt/lists/*
    +
    +# Copy project source (see .dockerignore for exclusions)
    +COPY . /turboquant
    +WORKDIR /turboquant
    +
    +# Build the library, tools, and tests
    +RUN cmake -B build \
    +    -DCMAKE_BUILD_TYPE=Release \
    +    -DTQ_BUILD_TESTS=ON \
    +    -DTQ_BUILD_BENCH=ON \
    +    && cmake --build build -j$(nproc)
    +
    +# Run the test suite
    +RUN ctest --test-dir build --output-on-failure
    +
    +# Default entrypoint: the tq_run inference CLI
    +# Usage: docker run turboquant models/model.tqm -p "Hello" -k turbo_kv_1b
    +ENTRYPOINT ["./build/tq_run"]
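
Because `.dockerignore` excludes `models/` and the model file patterns, an image built from this Dockerfile will not contain any model files, so the container needs them mounted at run time. The commit message mentions a `docker-compose.yml` with a model volume mount, which is not shown here. As a rough sketch of the equivalent `docker run` invocation, the helper below builds the argument list with a read-only bind mount. The helper name `docker_run_argv` is hypothetical, and the mount target `/turboquant/models` is an assumption derived from the `WORKDIR` above; only the `-p` and `-k` flags of `tq_run` are taken from the source.

```python
import os
import shlex


def docker_run_argv(model_dir, model_file, prompt,
                    kv_type="turbo_kv_1b", image="turboquant"):
    """Build a `docker run` argv that bind-mounts a host model directory.

    Assumption: the container resolves `models/<file>` relative to
    /turboquant, the WORKDIR set in the Dockerfile.
    """
    host_dir = os.path.abspath(model_dir)
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/turboquant/models:ro",  # read-only bind mount
        image,
        f"models/{model_file}",  # first positional argument to tq_run
        "-p", prompt,
        "-k", kv_type,
    ]


if __name__ == "__main__":
    argv = docker_run_argv("models", "model.tqm", "Hello")
    # Print a shell-quoted command line for inspection; nothing is executed.
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` would execute it; printing it first makes it easy to confirm the mount path before running anything.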

bindings/python/example.py

Lines changed: 91 additions & 0 deletions
    @@ -0,0 +1,91 @@
    +#!/usr/bin/env python3
    +"""
    +TurboQuant.cpp -- CLI Wrapper Example
    +
    +Demonstrates the subprocess-based Python bindings that call the tq_run binary.
    +No C FFI, no NumPy, no shared library -- just a model file and the tq_run binary.
    +
    +Prerequisites:
    +    cmake -B build -DCMAKE_BUILD_TYPE=Release
    +    cmake --build build -j$(nproc)
    +
    +Usage:
    +    python3 bindings/python/example.py models/qwen3.5-0.8b.tqm
    +    TURBOQUANT_BIN=./build/tq_run python3 bindings/python/example.py model.gguf
    +"""
    +
    +import sys
    +import os
    +
    +# Allow running from project root or bindings/python/
    +sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
    +
    +from turboquant_cli import TurboQuant
    +
    +
    +def main():
    +    if len(sys.argv) < 2:
    +        print("Usage: python example.py <model_path> [kv_type]")
    +        print()
    +        print(" model_path Path to .tqm, .safetensors, or .gguf model file")
    +        print(" kv_type KV cache type (default: turbo_kv_1b)")
    +        print()
    +        print("Examples:")
    +        print(" python example.py models/qwen3.5-0.8b.tqm")
    +        print(" python example.py model.gguf turbo_kv_3b")
    +        sys.exit(1)
    +
    +    model_path = sys.argv[1]
    +    kv_type = sys.argv[2] if len(sys.argv) > 2 else "turbo_kv_1b"
    +
    +    # Initialize
    +    print(f"Loading model: {model_path}")
    +    print(f"KV cache type: {kv_type}")
    +    tq = TurboQuant(model_path, kv_type=kv_type)
    +    print(tq)
    +    print()
    +
    +    # Generate text
    +    print("--- Generation ---")
    +    text = tq.generate("The capital of France is", max_tokens=64, temperature=0.7)
    +    print(text)
    +    print()
    +
    +    # Memory stats
    +    print("--- Memory Stats ---")
    +    try:
    +        stats = tq.memory_stats()
    +        print(f" Tokens in cache: {stats['tokens']}")
    +        print(f" Compressed size: {stats['compressed_mb']:.2f} MB")
    +        print(f" FP16 baseline: {stats['fp16_mb']:.2f} MB")
    +        print(f" Compression ratio: {stats['ratio']:.2f}x")
    +        print(f" Memory saved: {stats['saved_mb']:.2f} MB")
    +    except Exception as e:
    +        print(f" (Could not get memory stats: {e})")
    +    print()
    +
    +    # Perplexity (if a test file exists)
    +    test_file = os.path.join(os.path.dirname(model_path), "test.txt")
    +    if os.path.isfile(test_file):
    +        print("--- Perplexity ---")
    +        try:
    +            ppl = tq.perplexity(test_file)
    +            print(f" PPL: {ppl:.4f}")
    +        except Exception as e:
    +            print(f" (Could not compute PPL: {e})")
    +    else:
    +        print(f"--- Perplexity (skipped: no {test_file}) ---")
    +
    +    # Model info
    +    print()
    +    print("--- Model Info ---")
    +    try:
    +        info = tq.info()
    +        for line in info.split("\n")[:10]:
    +            print(f" {line}")
    +    except Exception as e:
    +        print(f" (Could not get model info: {e})")
    +
    +
    +if __name__ == "__main__":
    +    main()
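
The example above imports `TurboQuant` from `turboquant_cli.py`, which is part of this commit but whose diff is not shown here. To make the "subprocess wrapper" idea concrete, here is a minimal sketch of how such a wrapper could be structured. It is not the actual implementation: the class name `TinyTurboQuant` and the method `build_argv` are hypothetical, and only the `-p` and `-k` flags, the `TURBOQUANT_BIN` environment variable, and the `./build/tq_run` path are taken from the source. The flags behind `max_tokens` and `temperature` are not shown in this commit, so the sketch leaves them out.

```python
import os
import subprocess


class TinyTurboQuant:
    """Illustrative stand-in, NOT the TurboQuant class from turboquant_cli.py."""

    def __init__(self, model_path, kv_type="turbo_kv_1b", binary=None):
        self.model_path = model_path
        self.kv_type = kv_type
        # Resolution order: explicit argument, TURBOQUANT_BIN, default build path.
        self.binary = binary or os.environ.get("TURBOQUANT_BIN", "./build/tq_run")

    def build_argv(self, prompt):
        """Return the command line as a list, so no shell quoting is needed."""
        return [self.binary, self.model_path, "-p", prompt, "-k", self.kv_type]

    def generate(self, prompt, timeout=300):
        """Run the binary once and return its standard output as text."""
        result = subprocess.run(
            self.build_argv(prompt),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # raise CalledProcessError on a non-zero exit status
        )
        return result.stdout


if __name__ == "__main__":
    tq = TinyTurboQuant("models/model.tqm")
    # Show the command that would be executed, without running it.
    print(tq.build_argv("Hello"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with prompts that contain spaces or quotes, which is the main design reason to prefer this form for a CLI wrapper.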
