Commit 951145d

docs: remove pre-IVR validation and update readme with v2 benchmark results (#769)
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
1 parent d3d6040 commit 951145d

2 files changed

Lines changed: 12 additions & 26 deletions

docs/examples/instruct_validate_repair/qiskit_code_validation/README.md

Lines changed: 12 additions & 14 deletions
@@ -5,10 +5,9 @@ This example demonstrates using Mellea's Instruct-Validate-Repair (IVR) pattern
 ## What This Example Does

 Takes a prompt containing deprecated Qiskit code and:
-1. Detects QKT violations in the input code
-2. Passes those violations to the LLM as context
-3. Generates corrected code that passes QKT validation
-4. Automatically repairs the code if validation fails (up to 10 attempts)
+1. Generates corrected code using the LLM
+2. Validates the output against QKT rules
+3. Automatically repairs the code if validation fails (up to 10 attempts)

 ## Quick Start

@@ -29,10 +28,9 @@ Dependencies (`mellea`, `flake8-qiskit-migration`) are automatically installed.

 ### The IVR Pipeline

-1. **Pre-condition validation**: Validates the input prompt and any code it contains
-2. **Instruction**: LLM generates code following structured requirements
-3. **Post-condition validation**: Validates generated code against QKT rules (see [Qiskit Migration Guide](https://docs.quantum.ibm.com/api/migration-guides))
-4. **Repair loop**: Automatically repairs code that fails validation (up to 10 attempts)
+1. **Instruction**: LLM generates code following structured requirements
+2. **Post-condition validation**: Validates generated code against QKT rules (see [Qiskit Migration Guide](https://docs.quantum.ibm.com/api/migration-guides))
+3. **Repair loop**: Automatically repairs code that fails validation (up to 10 attempts)

 ### Sampling Strategies

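The three-step pipeline described in the updated README can be sketched as a plain loop. This is a minimal illustration only, not Mellea's implementation: `instruct_validate_repair`, `toy_generate`, and `toy_validate` are hypothetical names, and the toy validator is a simple string check rather than the real QKT rules. The return shape follows the `(generated_code, success, attempts_used)` tuple documented in `qiskit_code_validation.py`.

```python
from typing import Callable, Optional, Tuple

# Hypothetical callback signatures (assumptions, not Mellea's API):
#   generate(prompt, feedback) -> candidate code
#   validate(code) -> (is_valid, error_message)
Generate = Callable[[str, Optional[str]], str]
Validate = Callable[[str], Tuple[bool, str]]


def instruct_validate_repair(
    prompt: str,
    generate: Generate,
    validate: Validate,
    max_attempts: int = 10,
) -> Tuple[str, bool, int]:
    """Generate code, validate it, and retry with feedback on failure."""
    feedback: Optional[str] = None
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt, feedback)   # instruct (or repair, if feedback is set)
        is_valid, error = validate(code)    # post-condition validation
        if is_valid:
            return code, True, attempt
        feedback = error                    # feed the failure into the next attempt
    return code, False, max_attempts


# Toy stand-ins so the sketch runs without an LLM or a linter.
def toy_generate(prompt: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "from qiskit import execute"    # first attempt uses a removed API
    return "from qiskit import transpile"      # "repaired" attempt


def toy_validate(code: str) -> Tuple[bool, str]:
    if "execute" in code:
        return False, "deprecated import: execute"
    return True, ""


result = instruct_validate_repair("Fix this code", toy_generate, toy_validate)
print(result)  # ('from qiskit import transpile', True, 2)
```

Note that the loop never inspects the input prompt before the first generation, which mirrors the removal of the pre-condition step in this commit.
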
@@ -47,20 +45,20 @@ To switch strategies, edit the `use_multiturn_strategy` variable in `test_qiskit

 #### Strategy Performance Comparison

-Benchmarks on `mistral-small-3.2-24b-qiskit` model, no system prompt:
+Benchmarks on `mistral-small-3.2-24b-qiskit` model:

 | Dataset | Strategy | First Pass (QKT) | Post-Repair (QKT) |
 |---------|----------|------------|-------------|
-| **QHE** | RepairTemplate | 98.0% | **100%** |
-| | MultiTurn | **100%** | **100%** |
-| **QKT** | RepairTemplate | 98.0% | **100%** |
-| | MultiTurn | 93.3% | **100%** |
+| **QHE** | RepairTemplate | 97.4% | **100%** |
+| | MultiTurn | 95.4% | **100%** |
+| **QKT** | RepairTemplate | 88.9% | **100%** |
+| | MultiTurn | **97.8%** | **100%** |

 **Datasets:**
 - **QHE** (QiskitHumanEval): 151 general Qiskit code generation tasks
 - **QKT**: 45 Qiskit version migration tasks requiring fixes to deprecated APIs

-**Note:** Pass rates measure whether generated code passes QKT validation rules, not whether the code correctly solves the prompt. On QHE, the model achieves ~32.5% correctness when running the QHE check() test suite against the generated code. Full benchmark data and analysis are available in @ajbozarth's [toolbox repo](https://github.com/ajbozarth/toolbox/tree/main/mellea/qiskit_code_validation/benchmarking).
+**Note:** Pass rates measure whether generated code passes QKT validation rules, not whether the code correctly solves the prompt. On QHE, the model achieves ~27.8% correctness when running the QHE check() test suite against the generated code. Full benchmark data and analysis are available in @ajbozarth's [toolbox repo](https://github.com/ajbozarth/toolbox/tree/main/mellea/qiskit_code_validation/benchmarking).

 ### Code Structure

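As a rough sanity check on the updated table, the first-pass percentages can be mapped back to task counts. The sketch below assumes each rate is simply passing tasks divided by dataset size (151 for QHE, 45 for QKT); the resulting counts are inferred from the rounded percentages and are not taken from the benchmark data itself. `implied_count` is a hypothetical helper.

```python
# Dataset sizes and first-pass rates as stated in the README diff.
DATASET_SIZES = {"QHE": 151, "QKT": 45}
FIRST_PASS = {
    ("QHE", "RepairTemplate"): 97.4,
    ("QHE", "MultiTurn"): 95.4,
    ("QKT", "RepairTemplate"): 88.9,
    ("QKT", "MultiTurn"): 97.8,
}


def implied_count(rate_percent: float, total: int) -> int:
    """Nearest whole number of tasks implied by a rounded percentage (an inference)."""
    return round(rate_percent / 100 * total)


for (dataset, strategy), rate in FIRST_PASS.items():
    total = DATASET_SIZES[dataset]
    passed = implied_count(rate, total)
    # Recompute the percentage from the inferred count to confirm it round-trips.
    print(f"{dataset} {strategy}: ~{passed}/{total} ({100 * passed / total:.1f}%)")
```

Under this assumption, each inferred count rounds back to the published percentage, so a one-task difference on the 45-task QKT set moves the rate by about 2.2 points.
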
docs/examples/instruct_validate_repair/qiskit_code_validation/qiskit_code_validation.py

Lines changed: 0 additions & 12 deletions
@@ -93,18 +93,6 @@ def generate_validated_qiskit_code(
     Returns:
         Tuple of (generated_code, success, attempts_used)
     """
-    # Pre-validate input code if present — include violations as context rather than failing
-    is_valid, error_msg = validate_input_code(prompt)
-    if not is_valid:
-        print(
-            f"Input code has QKT violations, including as context for LLM: {error_msg}"
-        )
-        prompt = (
-            f"{prompt}\n\n"
-            f"Note: the code above has the following Qiskit migration issues that must be fixed:\n"
-            f"{error_msg}"
-        )
-
     # Only pass optional kwargs if they have values — avoids passing None to m.instruct()
     extra: dict = {}
     if grounding_context:

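The retained context lines build an `extra` dict so that unset options are omitted from the call rather than passed as `None`. A minimal sketch of that pattern follows. `fake_instruct` is a hypothetical stand-in for `m.instruct()`, whose real signature is not shown in this diff, and `system_prompt` is an illustrative parameter name; only `grounding_context` appears in the source.

```python
from typing import Any, Dict, Optional


def fake_instruct(description: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for m.instruct(): echoes the arguments it received."""
    return {"description": description, **kwargs}


def build_call(
    prompt: str,
    grounding_context: Optional[Dict[str, str]] = None,
    system_prompt: Optional[str] = None,  # illustrative only, not from the source
) -> Dict[str, Any]:
    # Collect only the options that actually have values.
    extra: Dict[str, Any] = {}
    if grounding_context:
        extra["grounding_context"] = grounding_context
    if system_prompt:
        extra["system_prompt"] = system_prompt
    # Unset options never reach the callee, so its own defaults apply.
    return fake_instruct(prompt, **extra)


call = build_call("Fix this code", grounding_context={"docs": "migration notes"})
print(sorted(call))  # ['description', 'grounding_context']
```

Unpacking a filtered dict keeps the call site short and lets the callee distinguish "not provided" from an explicit `None`.
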