Commit dac884f by unamedkr and claude

docs: update README + guide with v0.12.0 ollama-style CLI examples (#46)

PR #43 added pull/list/run/serve subcommands to the quantcpp CLI (now on PyPI as v0.12.0). Update the user-facing documentation to lead with the new commands instead of the old single-shot Python pattern.

Changes:

- README.md Quick Start: lead with `quantcpp pull/run/serve/list`, show the short aliases (smollm2:135m, qwen3.5:0.8b, llama3.2:1b), and keep the Python API as the secondary path
- README.ko.md: the same restructure in Korean ("빠른 시작", "Quick Start")
- site/index.html (guide): the CTA section now shows the CLI commands and the Python API side by side; new i18n keys cta.label.cli/python in both the EN and KO dictionaries

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent: 449080f

3 files changed: 54 additions, 20 deletions

README.ko.md (17 additions, 4 deletions)

````diff
@@ -22,20 +22,33 @@
 
 ---
 
-## 3줄로 시작하기
+## 빠른 시작
 
+**Ollama 스타일 CLI (v0.12.0+):**
 ```bash
 pip install quantcpp
+
+quantcpp pull llama3.2:1b            # HuggingFace에서 다운로드
+quantcpp run llama3.2:1b             # 대화형 채팅
+quantcpp serve llama3.2:1b -p 8080   # OpenAI 호환 HTTP 서버
+quantcpp list                        # 캐시된 모델 목록
+```
+
+짧은 별칭: `smollm2:135m`, `qwen3.5:0.8b`, `llama3.2:1b`. `run`/`serve` 첫 실행 시 자동 다운로드. `serve`는 OpenAI 호환 `POST /v1/chat/completions` 엔드포인트를 8080 포트에 제공합니다.
+
+**한 줄 질문:**
+```bash
+quantcpp run llama3.2:1b "중력이란 무엇인가요?"
 ```
 
+**Python API (3줄):**
 ```python
 from quantcpp import Model
-
-m = Model.from_pretrained("Llama-3.2-1B")  # 모델 자동 다운로드 (~750 MB)
+m = Model.from_pretrained("Llama-3.2-1B")
 print(m.ask("중력이란 무엇인가요?"))
 ```
 
-API 키 없음. GPU 없음. 설정 없음. [브라우저에서 바로 체험 →](https://quantumaikr.github.io/quant.cpp/) · [**작동 원리 가이드 →**](https://quantumaikr.github.io/quant.cpp/guide/)
+API 키 없음. GPU 없음. 설정 없음. 모델은 `~/.cache/quantcpp/`에 캐시됩니다. [브라우저에서 바로 체험 →](https://quantumaikr.github.io/quant.cpp/) · [**작동 원리 가이드 →**](https://quantumaikr.github.io/quant.cpp/guide/)
 
 ---
 
````

README.md (15 additions, 11 deletions)

````diff
@@ -37,27 +37,31 @@
 
 ## Quick Start
 
-**Terminal (one command):**
+**Ollama-style CLI (v0.12.0+):**
 ```bash
 pip install quantcpp
-quantcpp "What is gravity?"
+
+quantcpp pull llama3.2:1b            # download from HuggingFace
+quantcpp run llama3.2:1b             # interactive chat
+quantcpp serve llama3.2:1b -p 8080   # OpenAI-compatible HTTP server
+quantcpp list                        # show cached models
+```
+
+Short aliases: `smollm2:135m`, `qwen3.5:0.8b`, `llama3.2:1b`. Auto-pulls on first `run`/`serve`. The `serve` subcommand exposes `POST /v1/chat/completions` (OpenAI-compatible) on port 8080.
+
+**One-shot question:**
+```bash
+quantcpp run llama3.2:1b "What is gravity?"
 ```
 
-**Python (3 lines):**
+**Python API (3 lines):**
 ```python
 from quantcpp import Model
 m = Model.from_pretrained("Llama-3.2-1B")
 print(m.ask("What is gravity?"))
 ```
 
-**Interactive chat:**
-```bash
-quantcpp
-# You: What is gravity?
-# AI: Gravity is a fundamental force...
-```
-
-Downloads Llama-3.2-1B (~750 MB) on first use, cached locally. No API key, no GPU. [Try in browser →](https://quantumaikr.github.io/quant.cpp/) · [**How it works — Interactive Guide →**](https://quantumaikr.github.io/quant.cpp/guide/)
+Downloads on first use, cached at `~/.cache/quantcpp/`. No API key, no GPU. [Try in browser →](https://quantumaikr.github.io/quant.cpp/) · [**Interactive Guide →**](https://quantumaikr.github.io/quant.cpp/guide/)
 
 ---
 
````
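The README text above says `quantcpp serve` speaks the OpenAI-compatible chat-completions wire format on port 8080, so a plain stdlib HTTP client should be able to talk to it. A minimal sketch, assuming a server started with `quantcpp serve llama3.2:1b -p 8080` on localhost (the URL and model alias mirror the README example; the network call itself is left commented out):

```python
import json
import urllib.request

# Assumed local endpoint, matching `quantcpp serve llama3.2:1b -p 8080`.
BASE_URL = "http://localhost:8080"

# OpenAI-style chat completions payload; "model" is the same short
# alias passed to `quantcpp serve`.
payload = {
    "model": "llama3.2:1b",
    "messages": [{"role": "user", "content": "What is gravity?"}],
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the server is actually running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, existing OpenAI client libraries pointed at `http://localhost:8080/v1` should also work unchanged.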

site/index.html (22 additions, 5 deletions)

````diff
@@ -727,12 +727,25 @@ <h2 class="reveal" data-i18n="glossary.title">Glossary</h2>
 <section class="cta" style="background:var(--bg2)">
 <div class="container reveal">
 <h2 style="margin-bottom:1rem" data-i18n="cta.title">Try It Yourself</h2>
-<p style="color:var(--text2);margin-bottom:2rem;max-width:500px;margin-left:auto;margin-right:auto" data-i18n="cta.desc">Three lines of Python. No GPU, no API key, no setup.</p>
-<pre style="text-align:left;display:inline-block;margin-bottom:2rem"><code>pip install quantcpp
+<p style="color:var(--text2);margin-bottom:2rem;max-width:560px;margin-left:auto;margin-right:auto" data-i18n="cta.desc">Ollama-style CLI. No GPU, no API key, no setup.</p>
+<div style="display:flex;gap:1.5rem;flex-wrap:wrap;justify-content:center;margin-bottom:2rem;text-align:left">
+<div>
+<div style="font-size:.75rem;color:var(--text2);margin-bottom:.3rem;font-weight:600" data-i18n="cta.label.cli">CLI (v0.12.0+)</div>
+<pre style="margin:0"><code>pip install quantcpp
+
+quantcpp pull llama3.2:1b
+quantcpp run llama3.2:1b
+quantcpp serve llama3.2:1b -p 8080
+quantcpp list</code></pre>
+</div>
+<div>
+<div style="font-size:.75rem;color:var(--text2);margin-bottom:.3rem;font-weight:600" data-i18n="cta.label.python">Python API</div>
+<pre style="margin:0"><code>from quantcpp import Model
 
-from quantcpp import Model
 m = Model.from_pretrained("Llama-3.2-1B")
 print(m.ask("What is gravity?"))</code></pre>
+</div>
+</div>
 <br>
 <a href="https://github.com/quantumaikr/quant.cpp" class="cta-btn cta-primary">GitHub</a>
 <a href="https://pypi.org/project/quantcpp/" class="cta-btn cta-secondary">PyPI</a>
@@ -896,7 +909,9 @@ <h2 style="margin-bottom:1rem" data-i18n="cta.title">Try It Yourself</h2>
 "glossary.gguf.term": "GGUF",
 "glossary.gguf.def": "The standard file format for quantized LLM model weights, created by the llama.cpp project. quant.cpp loads GGUF models directly.",
 "cta.title": "Try It Yourself",
-"cta.desc": "Three lines of Python. No GPU, no API key, no setup.",
+"cta.desc": "Ollama-style CLI. No GPU, no API key, no setup.",
+"cta.label.cli": "CLI (v0.12.0+)",
+"cta.label.python": "Python API",
 "rag.label": "Movement",
 "rag.title": "Beyond RAG",
 "rag.intro": "Traditional RAG splits documents into 512-token chunks, embeds them in a vector database, and retrieves fragments. This was a reasonable engineering compromise when LLMs had 2K context windows. <strong>Now they have 128K. The compromise should have started disappearing.</strong>",
@@ -1083,7 +1098,9 @@ <h2 style="margin-bottom:1rem" data-i18n="cta.title">Try It Yourself</h2>
 "glossary.gguf.term": "GGUF",
 "glossary.gguf.def": "양자화된 LLM 모델 가중치의 표준 파일 형식. llama.cpp 프로젝트에서 만들었습니다. quant.cpp는 GGUF 모델을 직접 로드합니다.",
 "cta.title": "직접 해보기",
-"cta.desc": "Python 3줄. GPU도, API 키도, 설치도 필요 없습니다.",
+"cta.desc": "Ollama 스타일 CLI. GPU도, API 키도, 설치도 필요 없습니다.",
+"cta.label.cli": "CLI (v0.12.0+)",
+"cta.label.python": "Python API",
 "rag.label": "운동",
 "rag.title": "Beyond RAG",
 "rag.intro": "전통적인 RAG는 문서를 512토큰 청크로 나누고, 벡터 DB에 임베딩하고, 조각을 검색합니다. 이것은 LLM이 2K 컨텍스트만 가졌을 때 합리적인 엔지니어링 타협이었습니다. <strong>지금은 128K입니다. 그 타협은 사라지기 시작했어야 합니다.</strong>",
````
