|
31 | 31 | <a href="./docs/README.ru.md"><strong>Русский</strong></a> |
32 | 32 | </p> |
33 | 33 |
|
34 | | - <p><sub>Localized README files are maintained translations of this document. For normative wording and latest edits, use the English README as the canonical reference.</sub></p> |
| 34 | + <p><sub>Localized README files are maintained translations of this document. The English README is updated first.</sub></p> |
35 | 35 |
|
36 | 36 | <p> |
37 | 37 | <a href="https://github.com/lhy0718/AutoLabOS/actions/workflows/ci.yml"> |
@@ -99,7 +99,7 @@ In practice: |
99 | 99 | 4. Weak evidence triggers backtracking or downgrade instead of automatic polishing. |
100 | 100 | 5. If the review gate passes, `write_paper` drafts a manuscript from bounded evidence. |
101 | 101 |
|
102 | | -The historical 9-node contract remains the architectural baseline. In the current runtime, `figure_audit` is the one approved post-analysis checkpoint inserted between `analyze_results` and `review` so figure-quality critique can checkpoint and resume independently. |
| 102 | +In the current runtime, `figure_audit` sits between `analyze_results` and `review` so figure-quality critique can checkpoint and resume independently. |
103 | 103 |
|
104 | 104 | ```mermaid |
105 | 105 | stateDiagram-v2 |
@@ -196,7 +196,7 @@ The brief is not just a startup note. It is the governed contract for a run. |
196 | 196 |
|
197 | 197 | That makes the brief part of the audit trail, not just part of the prompt. |
198 | 198 |
|
199 | | -In the current contract, `.autolabos/config.yaml` is primarily for provider/runtime defaults and workspace policy. Run-specific research intent, evidence bars, baseline expectations, manuscript-format targets, and manuscript template path belong in the brief. Persisted config may therefore omit brief-owned sections such as research defaults and some manuscript-profile or paper-template fields. |
| 199 | +In practice, `.autolabos/config.yaml` holds provider and workspace defaults, while the brief carries run-specific research intent, evidence bars, baseline expectations, manuscript-format targets, and manuscript template path. |
200 | 200 |
|
201 | 201 | ```bash |
202 | 202 | /new |
@@ -288,12 +288,14 @@ Failure fingerprints are persisted so structural errors and repeated equivalent |
288 | 288 |
|
289 | 289 | ### Reproducibility Through Artifacts |
290 | 290 |
|
| 291 | +Runs stay inspectable because the system persists artifacts, checkpoints, and transitions instead of relying on hidden state. |
| 292 | + |
291 | 293 |
|
292 | 294 | --- |
293 | 295 |
|
294 | | -## Validation And Harness-Oriented Quality Model |
| 296 | +## Quality Model |
295 | 297 |
|
296 | | -AutoLabOS treats validation surfaces as first-class. |
| 298 | +AutoLabOS makes quality checks visible during a run. |
297 | 299 |
|
298 | 300 | - `/doctor` checks environment and workspace readiness before a run starts |
299 | 301 |
|
@@ -415,131 +417,6 @@ AutoLabOS also has built-in harness presets such as `base`, `compact`, `failure- |
415 | 417 |
|
416 | 418 | --- |
417 | 419 |
|
418 | | -## Advanced Details |
419 | | - |
420 | | -<details> |
421 | | -<summary><strong>Execution modes</strong></summary> |
422 | | - |
423 | | -AutoLabOS preserves the governed workflow and safety gates across every mode. |
424 | | - |
425 | | -| Mode | Command | Behavior | |
426 | | -|---|---|---| |
427 | | -| **Interactive** | `autolabos` | Slash-command TUI with explicit approval gates | |
428 | | -| **Minimal approval** | Config: `approval_mode: minimal` | Auto-approves safe transitions | |
429 | | -| **Hybrid approval** | Config: `approval_mode: hybrid` | Auto-advances strong low-risk transitions, pauses risky or low-confidence ones | |
430 | | -| **Overnight** | `/agent overnight [run]` | Unattended single pass, 24-hour limit, conservative backtracking | |
431 | | -| **Autonomous** | `/agent autonomous [run]` | Open-ended bounded research exploration | |
432 | | - |
433 | | -</details> |
434 | | - |
435 | | -<details> |
436 | | -<summary><strong>Governance artifact flow</strong></summary> |
437 | | - |
438 | | -```mermaid |
439 | | -flowchart LR |
440 | | - Brief["Research Brief<br/>completeness artifact"] --> Design["design_experiments"] |
441 | | - Design --> Contract["Experiment Contract<br/>hypothesis, single change,<br/>confound check"] |
442 | | - Design --> Consistency["Brief-Design Consistency<br/>warnings artifact"] |
443 | | - Contract --> Run["run_experiments"] |
444 | | - Run --> Failures["Failure Memory<br/>fingerprinted JSONL"] |
445 | | - Run --> Analyze["analyze_results"] |
446 | | - Analyze --> Decision["Attempt Decision<br/>keep/discard/replicate"] |
447 | | - Decision --> FigureAudit["figure_audit"] |
448 | | - FigureAudit --> Review["review"] |
449 | | - Failures --> Review |
450 | | - Contract --> Review |
451 | | - Review --> Ceiling["Pre-Review Summary<br/>claim ceiling detail"] |
452 | | - Ceiling --> Paper["write_paper"] |
453 | | -``` |
454 | | - |
455 | | -</details> |
456 | | - |
457 | | -<details> |
458 | | -<summary><strong>Artifact flow</strong></summary> |
459 | | - |
460 | | -```mermaid |
461 | | -flowchart TB |
462 | | - A["collect_papers"] --> A1["corpus.jsonl, bibtex.bib"] |
463 | | - A1 --> B["analyze_papers"] |
464 | | - B --> B1["paper_summaries.jsonl, evidence_store.jsonl"] |
465 | | - B1 --> C["generate_hypotheses"] |
466 | | - C --> C1["hypotheses.jsonl"] |
467 | | - C1 --> D["design_experiments"] |
468 | | - D --> D1["experiment_plan.yaml, experiment_contract.json,<br/>brief_design_consistency.json"] |
469 | | - D1 --> E["implement_experiments"] |
470 | | - E --> F["run_experiments"] |
471 | | - F --> F1["metrics.json, failure_memory.jsonl,<br/>objective_evaluation.json"] |
472 | | - F1 --> G["analyze_results"] |
473 | | - G --> G1["result_analysis.json, attempt_decisions.jsonl,<br/>transition_recommendation.json"] |
474 | | - G1 --> H["figure_audit"] |
475 | | - H --> H1["gate1_gate2_issues.json,<br/>figure_audit_summary.json"] |
476 | | - H1 --> I["review"] |
477 | | - I --> I1["pre_review_summary.json, review_packet.json,<br/>minimum_gate.json, paper_critique.json"] |
478 | | - I1 --> J["write_paper"] |
479 | | - J --> J1["main.tex, references.bib,<br/>scientific_validation.json, main.pdf"] |
480 | | -``` |
481 | | - |
482 | | -</details> |
483 | | - |
484 | | -<details> |
485 | | -<summary><strong>Node architecture</strong></summary> |
486 | | - |
487 | | -| Node | Role(s) | What it does | |
488 | | -|---|---|---| |
489 | | -| `collect_papers` | collector, curator | Discovers and curates candidate paper set via Semantic Scholar | |
490 | | -| `analyze_papers` | reader, evidence extractor | Extracts summaries and evidence from selected papers | |
491 | | -| `generate_hypotheses` | hypothesis agent + skeptical reviewer | Synthesizes ideas from literature, then pressure-tests them | |
492 | | -| `design_experiments` | designer + feasibility/statistical/ops panel | Filters plans for practicality, writes experiment contract | |
493 | | -| `implement_experiments` | implementer | Produces code and workspace changes through ACI actions | |
494 | | -| `run_experiments` | runner + failure triager + rerun planner | Drives execution, records failures, decides reruns | |
495 | | -| `analyze_results` | analyst + metric auditor + confounder detector | Checks result reliability, writes attempt decisions | |
496 | | -| `figure_audit` | figure auditor + optional vision critique | Checks evidence alignment, captions/references, and publication readiness before review | |
497 | | -| `review` | 5-specialist panel + claim ceiling + two-layer gate | Structural review - blocks writing if evidence is insufficient | |
498 | | -| `write_paper` | paper writer + reviewer critique | Drafts manuscript, runs post-draft critique, builds PDF | |
499 | | - |
500 | | -</details> |
501 | | - |
502 | | -<details> |
503 | | -<summary><strong>Bounded automation</strong></summary> |
504 | | - |
505 | | -| Node | Internal automation | Bound | |
506 | | -|---|---|---| |
507 | | -| `analyze_papers` | Auto-expands evidence window when too sparse | <= 2 expansions | |
508 | | -| `design_experiments` | Deterministic panel scoring + experiment contract | Runs once per design | |
509 | | -| `run_experiments` | Failure triage + one-shot transient rerun | Never retries structural failures | |
510 | | -| `run_experiments` | Failure memory fingerprinting | >= 3 identical exhausts retries | |
511 | | -| `analyze_results` | Objective rematching + result panel calibration | One rematch before human pause | |
512 | | -| `figure_audit` | Gate 3 figure critique + summary aggregation | Vision critique remains independently resumable | |
513 | | -| `write_paper` | Related-work scout + validation-aware repair | 1 repair pass max | |
514 | | - |
515 | | -</details> |
516 | | - |
517 | | -<details> |
518 | | -<summary><strong>Public output bundle</strong></summary> |
519 | | - |
520 | | -``` |
521 | | -outputs/<title-slug>-<run_id_prefix>/ |
522 | | - ├── paper/ |
523 | | - ├── experiment/ |
524 | | - ├── analysis/ |
525 | | - ├── review/ |
526 | | - ├── results/ |
527 | | - ├── reproduce/ |
528 | | - ├── manifest.json |
529 | | - └── README.md |
530 | | -``` |
531 | | - |
532 | | -</details> |
533 | | - |
534 | | ---- |
535 | | - |
536 | 420 | ## Status |
537 | 421 |
|
538 | | -AutoLabOS is an active OSS research-engineering project. The canonical references for behavior and contracts are the repository docs under `docs/`, especially: |
539 | | - |
540 | | -- `docs/architecture.md` |
541 | | -- `docs/experiment-quality-bar.md` |
542 | | -- `docs/paper-quality-bar.md` |
543 | | -- `docs/reproducibility.md` |
544 | | -- `docs/research-brief-template.md` |
545 | | - |
| 422 | +AutoLabOS is an active OSS research-engineering project. For deeper details beyond this overview, see the documents under docs. |