Commit 6ca52d8

unamedkr and claude committed
README: add Docker & OpenAI-compatible server sections (EN + KO)
New features were missing from README: Docker image, quant-server with /v1/chat/completions endpoint, and build instructions for server mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 84a6385 commit 6ca52d8

2 files changed

Lines changed: 44 additions & 0 deletions

README.ko.md

Lines changed: 22 additions & 0 deletions
@@ -246,6 +246,28 @@ python3 -m http.server 8080 # 로컬 서버 시작
 
 ---
 
+## Docker & 서버
+
+**Docker** (의존성 제로, ~10MB 이미지):
+```bash
+docker build -t quant.cpp .
+docker run -v ./models:/models quant.cpp /models/model.gguf -p "hello" -k uniform_4b -v q4
+```
+
+**OpenAI 호환 서버** (`/v1/chat/completions`):
+```bash
+cmake -B build -DTQ_BUILD_SERVER=ON && cmake --build build
+./build/quant-server model.gguf -p 8080 -k uniform_4b
+
+# OpenAI Python SDK와 호환
+curl http://localhost:8080/v1/chat/completions \
+  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'
+```
+
+`-DTQ_BUILD_SERVER=ON`으로 빌드. SSE 스트리밍 지원. 요청별 KV 압축 설정 가능.
+
+---
+
 ## 백엔드 & 성능
 
 | 백엔드 | 플랫폼 | 상태 | 비고 |

README.md

Lines changed: 22 additions & 0 deletions
@@ -246,6 +246,28 @@ Everything runs client-side. Nothing is uploaded. KV compression active by defau
 
 ---
 
+## Docker & Server
+
+**Docker** (zero-dependency, ~10MB image):
+```bash
+docker build -t quant.cpp .
+docker run -v ./models:/models quant.cpp /models/model.gguf -p "hello" -k uniform_4b -v q4
+```
+
+**OpenAI-compatible server** (`/v1/chat/completions`):
+```bash
+cmake -B build -DTQ_BUILD_SERVER=ON && cmake --build build
+./build/quant-server model.gguf -p 8080 -k uniform_4b
+
+# Works with the OpenAI Python SDK
+curl http://localhost:8080/v1/chat/completions \
+  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'
+```
+
+Build with `-DTQ_BUILD_SERVER=ON`. Streaming SSE supported. KV compression configurable per request.
+
+---
+
 ## Backends & Performance
 
 | Backend | Platform | Status | Notes |
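
Review note: the added section documents the endpoint only through `curl`. A minimal Python sketch of the same call, using only the standard library, may help readers who want to script against the server. It assumes the server from the README is running on `localhost:8080`. The helper names (`build_chat_request`, `parse_sse_line`, `chat`) are hypothetical and not part of quant.cpp. The `stream` field and the `data:` / `[DONE]` framing follow the OpenAI API convention; whether `quant-server` matches them exactly is an assumption, since the commit only states that SSE streaming is supported.

```python
import json
import urllib.request

# Assumption: the server from the README is listening locally on port 8080.
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt, max_tokens=64, stream=False):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if stream:
        # "stream" is how the OpenAI API asks for SSE output; that
        # quant-server reads the field the same way is an assumption.
        payload["stream"] = True
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_sse_line(line):
    """Decode one SSE line; return None for non-data lines and for [DONE]."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if body == "[DONE]":
        return None
    return json.loads(body)


def chat(prompt, max_tokens=64):
    """Send a non-streaming request and return the decoded JSON response."""
    with urllib.request.urlopen(build_chat_request(prompt, max_tokens)) as resp:
        return json.load(resp)
```

With the server running, `chat("Hello")` sends the same body as the `curl` example and returns the decoded JSON. The response schema is not shown in this commit, so the sketch returns the whole object and leaves field access to the caller.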
