Commit 8860f64

fix: sofai graph coloring example — broken model and incorrect problem (#806) (#807)
- Replace `phi:2.7b` (crashes with a GGUF sampler assertion) with `granite4:micro`.
- Replace `qwen3-4b-thinking` with `granite4:latest` (already used elsewhere).
- Fix the graph/description mismatch: the graph dict was a path but the description claimed a triangle, making the problem unsolvable with 2 colors.
- Use an odd 5-cycle (A-B-C-D-E-A) with 3 colors, which is non-trivial enough that `granite4:micro` consistently fails, properly exercising the SOFAI retry loop and S1→S2 escalation.

Closes #806
1 parent 417b7c8 commit 8860f64
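The rationale for the new graph can be checked directly. The following brute-force sketch (plain Python, independent of the example code; `colorable` and `is_valid` are illustrative helper names) confirms that the old path graph was trivially 2-colorable, while the new odd 5-cycle genuinely needs a third color:

```python
from itertools import product

def is_valid(graph, assignment):
    # A coloring is valid when no edge joins two same-colored nodes.
    return all(assignment[u] != assignment[v] for u in graph for v in graph[u])

def colorable(graph, k):
    # Brute force: try every assignment of k colors to the nodes.
    nodes = list(graph)
    return any(
        is_valid(graph, dict(zip(nodes, combo)))
        for combo in product(range(k), repeat=len(nodes))
    )

# Old graph: a path A-B-C (the description wrongly claimed a triangle).
path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
# New graph: an odd 5-cycle, which is not bipartite.
cycle5 = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"],
          "D": ["C", "E"], "E": ["D", "A"]}

print(colorable(path, 2))    # True: the old problem was trivially solvable
print(colorable(cycle5, 2))  # False: an odd cycle needs a third color
print(colorable(cycle5, 3))  # True
```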

3 files changed

Lines changed: 26 additions & 24 deletions

CONTRIBUTING.md

Lines changed: 1 addition & 3 deletions
```
@@ -372,8 +372,6 @@ models must be pulled locally before running the tests that need them.
 - `granite4:latest` — melp examples
 - `llama3.2` — repair-with-guardian example
 - `llama3.2:3b` — tutorial / mify examples (via `META_LLAMA_3_2_3B`)
-- `phi:2.7b` — SOFAI graph-colouring example
-- `pielee/qwen3-4b-thinking-2507_q8:latest` — SOFAI S2 solver
 - `qwen2.5vl:7b` — vision (OpenAI-via-Ollama) example

 **Additional test models (`test/`):**
@@ -390,7 +388,7 @@ Pull everything:
 for m in granite4:micro granite4:micro-h deepseek-r1:8b \
   granite3-guardian:2b granite3.2-vision granite3.3:8b granite4:latest \
-  llama3.2 llama3.2:3b phi:2.7b pielee/qwen3-4b-thinking-2507_q8:latest \
+  llama3.2 llama3.2:3b \
   qwen2.5vl:7b granite4:small-h llama3.2:1b llama3:8b llava mistral:7b \
   smollm2:1.7b; do ollama pull "$m"; done
```

docs/examples/sofai/README.md

Lines changed: 7 additions & 7 deletions
```
@@ -51,8 +51,8 @@ from mellea.stdlib.sampling import SOFAISamplingStrategy
 from mellea.stdlib.requirements import req

 # Create fast and slow backends
-s1_backend = OllamaModelBackend(model_id="phi:2.7b")
-s2_backend = OllamaModelBackend(model_id="qwen3-4b-thinking")
+s1_backend = OllamaModelBackend(model_id="granite4:micro")
+s2_backend = OllamaModelBackend(model_id="granite4:latest")

 # Create SOFAI strategy
 strategy = SOFAISamplingStrategy(
@@ -99,16 +99,16 @@ SOFAISamplingStrategy(
 ## Model Selection

 ### Fast Models (S1)
-- phi:2.7b
-- llama2:7b
+- granite4:micro
+- llama3.2:3b
 - mistral:7b
-- granite-3.2-8b-instruct

 ### Slow Models (S2)
-- qwen3-4b-thinking
+- granite4:latest
 - llama3:70b
 - mixtral:8x7b
-- granite-3.3-8b-instruct

 ## Performance Tips
```

docs/examples/sofai/sofai_graph_coloring.py

Lines changed: 18 additions & 14 deletions
```
@@ -7,8 +7,8 @@

 In this example, we use the SOFAI sampling strategy. Because we wrote this
 example to run on consumer grade hardware, each model is still relatively small:
-1. S1 Solver (phi:2.7b) - Fast model with iterative feedback loop
-2. S2 Solver (qwen3-4b-thinking) - Slow model, called once on escalation
+1. S1 Solver (granite4:micro) - Fast model with iterative feedback loop
+2. S2 Solver (granite4:latest) - Slow model, called once on escalation
 3. Custom validator - Provides detailed feedback for constraint violations

 Note: This example uses a custom validator (check_graph_coloring). To use the
```
```
@@ -29,15 +29,21 @@
 from mellea.stdlib.requirements import ValidationResult, req
 from mellea.stdlib.sampling import SOFAISamplingStrategy

-# Define the graph coloring problem
-graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
-colors = ["Red", "Blue"]
+# Define the graph coloring problem — an odd 5-cycle (needs 3 colors;
+# small models often fail on the first attempt, exercising the SOFAI loop).
+graph = {
+    "A": ["B", "E"],
+    "B": ["A", "C"],
+    "C": ["B", "D"],
+    "D": ["C", "E"],
+    "E": ["D", "A"],
+}
+colors = ["Red", "Blue", "Green"]

 graph_description = (
-    f"Color the nodes of the graph (A, B, C) using at most {len(colors)} colors "
+    f"Color the nodes of the graph (A, B, C, D, E) using at most {len(colors)} colors "
     f"({', '.join(colors)}). Adjacent nodes must have different colors. "
-    f"The adjacencies are: A is adjacent to B and C; B is adjacent to A and C; "
-    f"C is adjacent to A and B."
+    f"The adjacencies are: A-B, B-C, C-D, D-E, E-A."
 )

 output_format_instruction = (
```
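The custom validator the example relies on (`check_graph_coloring`) does not appear in this diff. A minimal standalone sketch of the same constraint check, assuming the model's answer has already been parsed into a node-to-color dict (`check_coloring` and its feedback strings are illustrative, not the example's actual code), might look like:

```python
def check_coloring(graph, colors, assignment):
    """Return (ok, problems) for a proposed node -> color assignment."""
    problems = []
    for node in graph:
        if node not in assignment:
            problems.append(f"Node {node} has no color.")
        elif assignment[node] not in colors:
            problems.append(f"Node {node} uses unknown color {assignment[node]!r}.")
    for u, neighbors in graph.items():
        for v in neighbors:
            # Each undirected edge appears twice in the adjacency dict;
            # order the endpoints so it is checked only once.
            cu, cv = assignment.get(u), assignment.get(v)
            if u < v and cu is not None and cu == cv:
                problems.append(f"Adjacent nodes {u} and {v} are both {cu}.")
    return not problems, problems

# The odd 5-cycle and palette from the example.
graph = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"],
         "D": ["C", "E"], "E": ["D", "A"]}
colors = ["Red", "Blue", "Green"]

ok, problems = check_coloring(
    graph, colors,
    {"A": "Red", "B": "Blue", "C": "Red", "D": "Blue", "E": "Green"},
)
print(ok, problems)  # True []
```

Returning the list of violations, not just a boolean, is what lets the S1 retry loop feed concrete corrections back to the model.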
```
@@ -136,10 +142,8 @@ def check_graph_coloring(ctx) -> ValidationResult:
 def main():
     """Run the graph coloring example with SOFAI strategy."""
     # Initialize backends
-    s1_solver_backend = OllamaModelBackend(model_id="phi:2.7b")
-    s2_solver_backend = OllamaModelBackend(
-        model_id="pielee/qwen3-4b-thinking-2507_q8:latest"
-    )
+    s1_solver_backend = OllamaModelBackend(model_id="granite4:micro")
+    s2_solver_backend = OllamaModelBackend(model_id="granite4:latest")

     # Optional: Initialize judge backend for LLM-as-Judge validation
     # Uncomment to use a third model for validation instead of custom validator
```
```
@@ -191,9 +195,9 @@ def main():

     # Determine which solver was used
     if i < solver_1_attempts:
-        solver_name = "S1 Solver (phi:2.7b)"
+        solver_name = "S1 Solver (granite4:micro)"
     else:
-        solver_name = "S2 Solver (qwen3-4b-thinking)"
+        solver_name = "S2 Solver (granite4:latest)"

     print(f"Solver: {solver_name}")
```