Commit 6ff7c1a

unamedkr and claude committed
README: fix overstated claims found by audit
- "Zero dependencies" → "No external libraries" (pthreads is a system dep)
- "5 architectures" → accurate description (3 code paths: Llama/Qwen3.5 share model_type=0, Gemma 3/4 share model_type=1, Qwen2-MoE)
- "4x longer context" → "~4x", with a footnote that the numbers are estimates based on KV memory reduction, not actual measurements
- Dependencies table: "Zero (libc only)" → "libc + pthreads only"
- Apply the same fixes to README.ko.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent: 6342379

2 files changed: 12 additions & 10 deletions


README.ko.md

2 additions & 2 deletions

```diff
@@ -2,7 +2,7 @@
 
 ![quant.cpp Hero](docs/assets/hero.png)
 
-로컬 LLM을 위한 미니멀 C 추론 엔진. 33K LOC. 외부 의존성 없음.
+로컬 LLM을 위한 미니멀 C 추론 엔진. 33K LOC. 외부 라이브러리 없음.
 
 [![License](https://img.shields.io/badge/license-Apache%202.0-blue)]()
 [![CI](https://img.shields.io/github/actions/workflow/status/quantumaikr/quant.cpp/ci.yml?label=CI)]()
@@ -111,7 +111,7 @@ cmake --build build -j$(nproc)
 | Gemma 3 270M | Gemma 3 | 270M | 4-bit K verified |
 | Gemma 4 E2B | Gemma 4 | 2B | WIP |
 
-5개 아키텍처: Llama, Gemma 3, Gemma 4, Qwen3.5 (DeltaNet), Qwen2-MoE.
+아키텍처: Llama/Qwen3.5 (공유 경로), Gemma 3/4 (sliding + full attention), Qwen2-MoE.
 
 ---
 
```

In English, the Korean tagline changes from "Minimal C inference engine for local LLMs. 33K LOC. No external dependencies." to "… No external libraries.", and "5 architectures: Llama, Gemma 3, Gemma 4, Qwen3.5 (DeltaNet), Qwen2-MoE." becomes "Architectures: Llama/Qwen3.5 (shared path), Gemma 3/4 (sliding + full attention), Qwen2-MoE."

README.md

10 additions & 8 deletions

```diff
@@ -4,7 +4,7 @@
 
 Embeddable LLM inference in pure C.
 
-33K LOC. Zero dependencies. Read it in an afternoon.
+33K LOC. No external libraries. Read it in an afternoon.
 
 [![License](https://img.shields.io/badge/license-Apache%202.0-blue)]()
 [![CI](https://img.shields.io/github/actions/workflow/status/quantumaikr/quant.cpp/ci.yml?label=CI)]()
```
````diff
@@ -14,13 +14,15 @@ Embeddable LLM inference in pure C.
 
 ## What quant.cpp does
 
-**4x longer context on the same hardware.** Delta KV compression fits more tokens into your available memory with no quality loss.
+**~4x longer context on the same hardware.** KV cache compression reduces per-token memory by 3.8x, extending context proportionally.
 
-| Hardware | Model | Without | With quant.cpp | Gain |
+| Hardware | Model | FP16 KV | 4-bit K + Q4 V | Gain |
 |----------|-------|---------|----------------|------|
-| 8GB Laptop | Llama 8B (Q4) | 16K tokens | 61K tokens | 3.8x |
-| 16GB Mac Air | SmolLM2 1.7B | 78K tokens | 298K tokens | 3.8x |
-| 24GB RTX 3090 | Llama 8B (Q4) | 147K tokens | 559K tokens | 3.8x |
+| 8GB Laptop | Llama 8B (Q4) | ~16K tokens | ~61K tokens | 3.8x |
+| 16GB Mac Air | SmolLM2 1.7B | ~78K tokens | ~298K tokens | 3.8x |
+| 24GB RTX 3090 | Llama 8B (Q4) | ~147K tokens | ~559K tokens | 3.8x |
+
+*Estimates based on KV memory reduction. Actual context depends on available memory after model weights.*
 
 ```bash
 ./quant model.gguf -p "hello"
````
```diff
@@ -34,7 +36,7 @@ Embeddable LLM inference in pure C.
 |--|-----------|-----------|
 | Code | **33K LOC**, pure C | 250K+ LOC, C++ |
 | Design | Read, modify, embed | Feature-complete |
-| Dependencies | **Zero** (libc only) | ggml framework |
+| Dependencies | libc + pthreads only | ggml framework |
 | KV compression | PPL **-3.2%** (better than FP32) | PPL +10.6% |
 
 quant.cpp is not a fork. It's a standalone engine built from scratch for one goal: **LLM inference you can understand, customize, and ship inside your own product.**
```
```diff
@@ -111,7 +113,7 @@ Cross-model: SmolLM2 1.7B (-1.6%), Qwen3.5 0.8B (+0.9%), Qwen3.5 4B (+0.6%).
 | Gemma 3 270M | Gemma 3 | 270M | Working |
 | Gemma 4 E2B | Gemma 4 | 2B | WIP |
 
-5 architectures: Llama, Gemma 3/4, Qwen3.5 (DeltaNet hybrid), Qwen2-MoE.
+Architectures: Llama/Qwen3.5 (shared path), Gemma 3/4 (sliding + full attention), Qwen2-MoE.
 
 GGUF format. Load any llama.cpp-compatible model file.
 
```
