
Commit e8f9087

unamedkr and claude committed
docs: add 3B-4B model selection guide (Phi-4 vs Qwen3.5 vs Gemma 4)
Adds a "Choosing a 3B-4B model" section to docs/supported_models.md with a comparison table, pick-by-priority guide, and vocab trade-off analysis based on real benchmarks.

Fixes #68

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 91814d4 commit e8f9087

1 file changed

Lines changed: 39 additions & 0 deletions

File tree

docs/supported_models.md

@@ -112,6 +112,45 @@ Phi-3.5-mini at the recommended Q4_K_M quantization clocks in at
fastest of any model in the registry — the best speed/quality combo
quant.cpp ships.

## Choosing a 3B-4B model: Phi-4-mini vs Qwen3.5-4B vs Gemma-4-E2B

For users who want more quality than SmolLM2-1.7B, here are the three
main contenders in the 3B-4B class and when to pick each
(Phi-3.5-mini is included as the baseline for comparison):

| | Phi-3.5-mini | Phi-4-mini | Qwen3.5-4B | Gemma 4 E2B |
|---|---|---|---|---|
| **Params** | 3.8B | 3.8B | 4B | ~2.3B eff |
| **Vocab** | **32K** | 200K | 248K | 262K |
| **Q4 size** | 2.4 GB | 2.5 GB | 2.6 GB | 3.2 GB |
| **Speed** | **Fastest** | Moderate | Moderate | Moderate |
| **Quality** | Good | Better (math +14) | **Best overall** | Good |
| **Korean/CJK** | Basic | Improved | **Excellent** | Good |
| **Context** | 128K | 128K | **262K** | 128K |
| **Multimodal** | No | No | No | **Yes** |
| **quant.cpp** | **Supported** | Likely works | Partial | Partial |

### Pick by priority

- **"I want fastest response"** → **Phi-3.5-mini** — 32K vocab = smallest lm_head, ~8 tok/s on M3
- **"I want best text quality"** → **Qwen3.5-4B** — highest benchmarks, 262K context, DeltaNet hybrid saves 75% KV memory (partial support, improving)
- **"I want strong math and code"** → **Phi-4-mini** — HumanEval 74.4, MATH 64.0 (needs testing in quant.cpp)
- **"I need images/audio/video"** → **Gemma 4 E2B** — only multimodal option at this size (partial support)
- **"I need Korean/Chinese/Japanese"** → **Qwen3.5-4B** — purpose-built CJK tokenizer

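The picks above can be mirrored as a small lookup. This is purely illustrative; the priority keys and the `pick_model` helper are hypothetical, not part of any quant.cpp API:

```python
# Illustrative lookup mirroring the pick-by-priority list above.
# Keys and function name are hypothetical, not a quant.cpp API.
PICK_BY_PRIORITY = {
    "fastest response": "Phi-3.5-mini",       # 32K vocab = smallest lm_head
    "best text quality": "Qwen3.5-4B",        # highest benchmarks, 262K context
    "math and code": "Phi-4-mini",            # HumanEval 74.4, MATH 64.0
    "images/audio/video": "Gemma 4 E2B",      # only multimodal pick at this size
    "Korean/Chinese/Japanese": "Qwen3.5-4B",  # purpose-built CJK tokenizer
}

def pick_model(priority: str) -> str:
    """Return the suggested model for a given priority."""
    return PICK_BY_PRIORITY[priority]
```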
### The vocab trade-off

| Vocab | Relative lm_head cost | Example |
|------:|:---------------------:|---------|
| 32K | 1x (baseline) | Phi-3.5-mini |
| 49K | 1.5x | SmolLM2-1.7B |
| 200K | 6x | Phi-4-mini |
| 248K | 7.7x | Qwen3.5-4B |
| 262K | 8.2x | Gemma 4 E2B |

Smaller vocab = faster generation. A 3.8B model with 32K vocab can be
faster than a 1B model with 128K vocab — tested and confirmed on
Apple M3.
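The ratios follow from the lm_head being a `hidden_size × vocab` matrix, so at a fixed hidden size its cost grows linearly with vocab. A minimal sketch using the rounded vocab sizes from the table (exact model vocabs differ slightly, which is why the table shows 6x where this computes 6.25x):

```python
# Relative lm_head cost: the output projection is a (hidden_size x vocab)
# matrix, so at fixed hidden size its cost scales linearly with vocab.
# Vocab sizes below are the rounded figures from the table above.
VOCABS = {
    "Phi-3.5-mini": 32_000,   # baseline
    "SmolLM2-1.7B": 49_000,
    "Phi-4-mini": 200_000,
    "Qwen3.5-4B": 248_000,
    "Gemma 4 E2B": 262_000,
}

def relative_lm_head_cost(vocab: int, baseline: int = 32_000) -> float:
    """Cost of the lm_head matmul relative to a 32K-vocab model."""
    return round(vocab / baseline, 2)

for name, vocab in VOCABS.items():
    print(f"{name}: {relative_lm_head_cost(vocab)}x")
```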
## Reporting an unsupported model

If you tried a model that's not in the matrix above, please open an
