Create your own AI-powered academic meeting.
Cyber Colloquium is a desktop research workspace that turns multiple LLMs into a coordinated academic team.
Instead of a single chat loop, the app organizes literature discovery, PDF reading, task decomposition, expert discussion, reviewer cross-checking, structured state tracking, experiment drafting, authorized local execution, and report writing inside one workflow-driven environment.
It is designed for research-heavy work where a single model is often not enough: reading papers, revisiting figures and formulas, debating methods, drafting code, validating execution, and exporting reusable research artifacts.
Cyber Colloquium already supports a real vertical research workflow:
- search and download papers from arXiv
- attach local PDFs, text files, images, JSON, and CSV files
- build a local PDF reader cache for sections, figures, and formula candidates
- optionally generate a literature review before the main discussion starts
- let a Lead decompose the research task by specialty
- let a Host coordinate sequencing and follow-up rules
- let Experts handle concrete subproblems
- let a Literature Reviewer digest references and source grounding
- let a Reporter maintain structured meeting state and synthesize outputs
- generate Python experiment drafts
- run authorized local Python smoke tests and full runs inside isolated run workspaces
- generate BibTeX libraries and LaTeX drafts
- optionally compile LaTeX locally with Tectonic when the user authorizes local execution
- export meeting minutes, research reports, and failure snapshots
The default workflow is the Research Discussion Review Workflow.
In the desktop UI you can configure:
- role display names
- duty assignment: Lead, Host, Expert, Literature Reviewer, Reporter
- specialty notes
- model name
- base URL
- API key
- vision capability
Workflow settings are editable from the UI and persisted to workflow_config.json.
Key switches now include:
- arXiv discovery on or off
- whether arXiv PDFs should be downloaded automatically
- maximum arXiv search results
- reviewer pass on or off
- Python artifact generation
- authorized local Python smoke test
- authorized local Python full run in the current interpreter
- Python timeout and mapped-input size limits
- BibTeX artifact generation
- LaTeX artifact generation
- authorized Tectonic compile
- enabled roles
- structured summary slots
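As a concrete illustration, these switches round-trip through JSON. The field names below are hypothetical stand-ins, not the app's actual schema (which is defined in workflow_config.py):

```python
import json
from pathlib import Path

# Hypothetical switch names for illustration; the real schema lives in workflow_config.py.
default_config = {
    "arxiv_discovery": True,
    "arxiv_auto_download": False,
    "arxiv_max_results": 10,
    "reviewer_pass": True,
    "python_artifacts": True,
    "python_timeout_seconds": 120,
}

path = Path("workflow_config.json")
path.write_text(json.dumps(default_config, indent=2), encoding="utf-8")

# Flip one switch and persist the change, as the UI does on save.
config = json.loads(path.read_text(encoding="utf-8"))
config["reviewer_pass"] = False
path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```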
You can attach:
- PDF files
- text and markdown files
- JSON and CSV files
- images
The Build PDF reader action parses attached PDFs and writes cache files under pdf_reader/.
The cache includes:
- section indexing
- section digests
- figure extraction when possible
- figure summaries for vision-capable reviewer models
- formula candidate extraction
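A toy sketch of the section-indexing idea (not the app's actual pdf_reader.py logic), assuming page text has already been extracted:

```python
import re

def index_sections(pages: list[str]) -> list[dict]:
    """Build a crude section index from per-page text.

    A simplified stand-in for the app's cache build: it records each
    numbered heading together with the page it starts on.
    """
    heading = re.compile(r"^(\d+(?:\.\d+)*)\s+([A-Z][^\n]{0,80})$", re.MULTILINE)
    index = []
    for page_no, text in enumerate(pages, start=1):
        for match in heading.finditer(text):
            index.append(
                {"number": match.group(1), "title": match.group(2).strip(), "page": page_no}
            )
    return index

pages = ["1 Introduction\nWe study ...", "2 Method\n2.1 Setup\nDetails ..."]
print(index_sections(pages))
```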
If arXiv discovery is enabled, the workflow searches arXiv from the user request, stores paper metadata, optionally downloads PDFs, and registers those papers into the structured research state before source ingest continues.
Downloaded papers are saved under arxiv_library/.
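For reference, the public arXiv Atom API can be queried with the standard library alone. This sketch is illustrative and is not the app's arxiv_client.py:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

def parse_arxiv_feed(atom_xml: str) -> list[dict]:
    """Extract id and title metadata from an arXiv Atom feed."""
    root = ET.fromstring(atom_xml)
    return [
        {
            "id": entry.findtext("atom:id", namespaces=ATOM_NS),
            # Collapse the whitespace arXiv inserts into wrapped titles.
            "title": " ".join((entry.findtext("atom:title", namespaces=ATOM_NS) or "").split()),
        }
        for entry in root.findall("atom:entry", ATOM_NS)
    ]

def search_arxiv(query: str, max_results: int = 5) -> list[dict]:
    """Query the public arXiv Atom API (network call)."""
    params = urllib.parse.urlencode(
        {"search_query": f"all:{query}", "max_results": max_results}
    )
    with urllib.request.urlopen(f"http://export.arxiv.org/api/query?{params}", timeout=30) as resp:
        return parse_arxiv_feed(resp.read().decode("utf-8"))
```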
The ingest stage prepares:
- user question
- local attachments
- downloaded arXiv papers
- PDF reader cache context
- optional literature review input
If Enable literature review is on and a literature-review provider is configured, the literature reviewer digests references before the main team proceeds.
The Lead reads the user request, team specialties, source snippets, and literature context, then outputs a structured delegation plan.
The Host turns the plan into an execution schedule and keeps the team aligned on cross-checking and follow-up rules.
Experts work through assigned subproblems using:
- attachment snippets
- PDF reader retrieval
- literature review context
- structured meeting state
If reviewer participation is enabled, the reviewer cross-checks work packages and challenges unsupported claims or weak reasoning.
The discussion is consolidated into explicit structured state instead of relying on transcript memory alone.
The app tracks:
- consensus points
- conflicts or risks
- open questions
- action items
- evidence cards
- workflow tasks
- workflow stage records
- checkpoints
- literature library records
- experiment runs
- approval records
- generated artifacts
If open issues remain, the workflow can continue with follow-up passes instead of ending too early.
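A toy model of that structured state (not state.py's real types) shows how follow-up passes can be gated on open issues:

```python
from dataclasses import dataclass, field

@dataclass
class MeetingState:
    """Illustrative subset of the tracked slots; the real state type
    also covers evidence cards, checkpoints, approvals, and runs."""
    consensus_points: list[str] = field(default_factory=list)
    conflicts: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)

    def needs_follow_up(self) -> bool:
        # Another pass is warranted while any issue remains unresolved.
        return bool(self.open_questions or self.conflicts)
```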
If Python artifact generation is enabled, the workflow generates a Python experiment draft from the structured state.
If local execution is also enabled in workflow settings and explicitly authorized for the current run, the app can perform:
- an authorized python -m py_compile check
- an authorized smoke run
- an authorized full run
Both execution modes run inside a per-run isolated workspace under generated_artifacts/execution_runs/, so generated scripts do not execute directly inside the shared artifact directory.
The workspace also contains an inputs/ directory and an input_manifest.json file so generated code can discover mapped attachments without relying on original absolute file paths.
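Generated code can then resolve its inputs from the manifest rather than from absolute paths. The manifest shape below is illustrative, not the app's exact schema:

```python
import json
from pathlib import Path

def load_mapped_inputs(workspace: Path) -> dict[str, Path]:
    """Map logical input names to files inside the run workspace.

    The manifest schema shown here is illustrative; the real file is
    written by the app when it prepares the execution workspace.
    """
    manifest = json.loads((workspace / "input_manifest.json").read_text(encoding="utf-8"))
    return {
        entry["name"]: workspace / "inputs" / entry["filename"]
        for entry in manifest["inputs"]
    }
```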
The Python interpreter used for these runs is the same interpreter that launched the app, so if you start Cyber Colloquium from myenv, the generated code runs in myenv.
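Because the runs reuse the launching interpreter, the compile-check step can be reproduced with the standard library alone; the timeout value here is illustrative, not the app's default:

```python
import subprocess
import sys

def compile_check(script_path: str, timeout_s: float = 120.0) -> bool:
    """Run python -m py_compile in the same interpreter that launched us.

    Mirrors the idea of the app's authorized compile step; sys.executable
    guarantees the active environment (e.g. myenv) is used.
    """
    result = subprocess.run(
        [sys.executable, "-m", "py_compile", script_path],
        capture_output=True,
        timeout=timeout_s,
    )
    return result.returncode == 0
```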
The outcome is written back into:
- structured experiment run records
- approval records
- generated run logs
- later meeting notes and final report context
At the end of the main workflow, the app can generate:
- meeting minutes
- research report
- BibTeX library
- LaTeX document draft
If Tectonic compile is enabled in workflow settings and the current run has local execution authorization, the app also attempts a local TeX build and stores the build log and compiled PDF when available.
The current compiler backend is Tectonic, with build outputs written to isolated folders under generated_artifacts/latex_builds/.
These pieces now form one connected research loop:
- arXiv discovery and metadata capture
- optional arXiv PDF download
- local attachment ingest
- PDF reader indexing and retrieval
- optional literature review
- lead delegation
- host coordination
- expert discussion
- reviewer cross-check
- structured state tracking
- experiment draft generation
- authorized local Python validation
- isolated per-run Python execution workspaces
- mapped execution inputs with manifest export
- meeting minutes export
- research report export
- BibTeX export
- LaTeX draft generation
- optional Tectonic compilation
- failure snapshot export
- dark and light themes
These parts are connected, but they should not be mistaken for full infrastructure:
- literature discovery is currently limited to arXiv
- Python execution is an authorized local validation step, not a sandboxed cluster runner
- experiment dependency installation is still your responsibility
- Python execution limits currently cover timeout and mapped-input size, not full OS-level sandboxing
- LaTeX compilation depends on tectonic being installed and available on PATH
- PDF extraction quality still depends on the original PDF structure
- figure and formula extraction are useful, but still imperfect
- there is no persistent database-backed project store yet
So the current version is already a serious vertical research workspace.
If your target is:
- discover papers -> read -> discuss -> review -> write: the workflow is already smooth
- discover papers -> draft code -> run local validation -> compile paper assets: the workflow is now connected but still environment-dependent
- fully autonomous large-scale experimentation: that remains future work
Cyber Colloquium is not just "several models talking at once".
The current app is built around a few stronger ideas:
- graph-structured workflow execution instead of a hidden linear script
- workflow-stage execution instead of free-form multi-chat
- typed role mapping between duties and internal role archetypes
- structured meeting state instead of transcript-only memory
- retrieval-aware PDF support for sections, figures, and formulas
- explicit approval records for local execution
- artifact generation that feeds back into later stages
- experiment results written back into state before final reporting
- benchmark-driven policy scoring with multi-objective loss over quality, cost, latency, intervention, failure, and stability
That is what makes the app feel more like an AI research team than a stack of parallel model tabs.
General-purpose autonomous agents such as OpenClaw-style systems usually optimize for broad tool use: browsing, coding, terminal operations, task automation, and open-ended step planning across many domains.
Cyber Colloquium is narrower by design and stronger inside that niche. It is built for research workflows rather than generic agent automation.
- a general-purpose agent is usually trying to solve many different kinds of tasks with one flexible loop
- Cyber Colloquium is trying to run a research project with a role-structured workflow
| Dimension | General-purpose agent | Cyber Colloquium |
|---|---|---|
| Primary goal | Broad autonomous task execution | End-to-end research collaboration |
| Core unit of work | A single agent loop or planner/executor loop | A graph-structured multi-role research workflow |
| Memory model | Tool traces, messages, step history | Structured research state with consensus, conflicts, open questions, evidence, checkpoints, approvals, and experiment records |
| Literature handling | Usually optional or tool-dependent | Built-in arXiv discovery, PDF ingestion, PDF reader caching, and retrieval over sections, figures, and formulas |
| Collaboration style | One agent with tools, or loosely coordinated agents | Explicit Lead, Host, Expert, Literature Reviewer, and Reporter duties |
| Validation model | Often focuses on task completion | Focuses on reviewer passes, evidence grounding, checkpointing, and follow-up on unresolved issues |
| Execution safety | Varies by agent runtime | Local Python and Tectonic execution are explicitly gated by per-run user authorization |
| Output shape | Actions, patches, traces, or generic answers | Meeting minutes, research reports, BibTeX, LaTeX drafts, run logs, workflow graphs, and policy artifacts |
| Optimization path | Prompting and tool-use heuristics | Benchmarkable workflow graphs, multi-objective scoring, and policy search for research-team coordination |
If you want a system that can operate like a general digital worker across arbitrary software tasks, a general-purpose agent is usually the better fit.
If you want a system that can:
- search and read papers
- decompose a research problem
- coordinate multiple specialist roles
- review claims against evidence
- draft code and papers
- run authorized local validation
- preserve the whole process as structured research artifacts
then Cyber Colloquium is solving a more specific problem.
So the difference is not "more models" versus "fewer models". The real difference is that Cyber Colloquium treats research itself as the product surface: literature, evidence, workflow state, experiments, review, and writing all live inside the same coordinated loop.
flowchart LR
UI["Desktop UI<br/>ui.py"] --> WF["Workflow Executor<br/>workflow.py"]
UI --> CFG["Workflow / UI Config<br/>workflow_config.py<br/>workflow_settings.py<br/>ui_settings.py"]
WF --> ORCH["Orchestrator<br/>orchestrator.py"]
ORCH --> STATE["Structured Research State<br/>state.py"]
ORCH --> TOOLS["Tool Runtime<br/>tool_runtime.py"]
ORCH --> TEAM["Role + Team Assembly<br/>roles.py / team.py"]
ORCH --> PDF["PDF Reader + Retrieval<br/>pdf_reader.py"]
ORCH --> ARXIV["arXiv Discovery<br/>arxiv_client.py"]
TOOLS --> PY["Python Artifact + Local Run"]
TOOLS --> TEX["BibTeX / LaTeX / Tectonic"]
STATE --> EXPORT["Minutes / Report / Failure / Graph Artifacts<br/>meeting_minutes.py"]
CFG --> GRAPH["Workflow Graph Model<br/>workflow_graph.py"]
GRAPH --> WF
GRAPH --> EVAL["Benchmark Evaluation<br/>evaluation.py"]
CFG --> EVAL
STATE --> EVAL
EVAL --> OPT["Policy Optimizer<br/>policy_optimizer.py"]
OPT --> CORPUS["Policy Training Corpus<br/>JSONL + benchmark artifacts"]
In practice:
- the UI controls providers, workflow settings, materials, and run authorization
- the workflow executor follows the configured workflow graph
- the orchestrator runs the AI team, tool calls, retrieval, and artifact generation
- the structured state is the shared memory layer across discussion, review, execution, and export
- the benchmark harness scores workflow quality
- the policy optimizer searches for stronger workflow configurations and exports supervision data for downstream training
The current codebase is split into extension-friendly layers:
- src/discussion_app/ui.py: desktop UI, workflow controls, per-run authorization, live discussion feed
- src/discussion_app/orchestrator.py: main orchestration layer and workflow stage handlers
- src/discussion_app/workflow.py: workflow executor and runtime context
- src/discussion_app/workflow_config.py: typed workflow, team, report, note, and tooling configuration
- src/discussion_app/workflow_graph.py: attribute-annotated workflow graph builder, Mermaid export, and grouped policy snapshot
- src/discussion_app/workflow_settings.py: UI-safe workflow settings adapter
- src/discussion_app/ui_settings.py: UI theme persistence
- src/discussion_app/roles.py: role registry and typed role definitions
- src/discussion_app/team.py: provider-to-role team assembly
- src/discussion_app/state.py: structured discussion state, checkpoints, papers, approvals, experiment runs, and artifact records
- src/discussion_app/tool_runtime.py: tool protocol, permission policy, registry, and artifact-generation tools
- src/discussion_app/arxiv_client.py: arXiv search, metadata parsing, PDF download, and BibTeX entry generation
- src/discussion_app/pdf_reader.py: PDF indexing, digest generation, figure extraction, and retrieval cache
- src/discussion_app/meeting_minutes.py: export rendering for literature review, minutes, report, and failure snapshots
- src/discussion_app/evaluation.py: benchmark harness and scoring utilities
- src/discussion_app/policy_optimizer.py: random-search workflow policy optimizer over the benchmark suite
Each provider can be configured independently:
- role display name
- duty
- specialty
- model
- base URL
- API key
- vision support
The current provider path expects OpenAI-compatible chat endpoints.
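A minimal sketch of what "OpenAI-compatible" means in practice (helper name hypothetical): the request targets the /chat/completions path with a bearer token, and the reply is read from choices[0].message.content:

```python
def build_chat_request(
    base_url: str, api_key: str, model: str, messages: list[dict]
) -> tuple[str, dict, dict]:
    """Assemble an OpenAI-compatible chat completion request.

    Works for any provider exposing /chat/completions; endpoint path and
    header shape follow the OpenAI convention. Send the payload with any
    HTTP client and read choices[0].message.content from the response.
    """
    return (
        f"{base_url.rstrip('/')}/chat/completions",
        {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        {"model": model, "messages": messages},
    )
```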
Workflow settings are saved to workflow_config.json.
Important toggles now include:
- arXiv discovery
- automatic arXiv PDF download
- maximum arXiv results
- max discussion rounds
- checkpoint frequency
- reviewer pass
- enabled roles
- structured summary slots
- Python artifact generation
- local Python execution
- Python timeout and mapped-input limit
- BibTeX artifact generation
- LaTeX artifact generation
- local Tectonic compile
- workflow stage enablement
The selected theme is saved to ui_settings.json.
Generated under meeting_minutes/:
- literature_review_*.md
- meeting_minutes_*.md
- research_report_*.md
- discussion_failure_*.md
- workflow_policy_*.json
- workflow_graph_*.json
- workflow_graph_*.mmd
Generated on demand under generated_artifacts/:
- *.py
- execution_runs/<project>/<run>/input_manifest.json
- *_run_log.txt
- *.bib
- *.tex
- *_tectonic_build_log.txt
- *.pdf when the local Tectonic build succeeds
Generated under arxiv_library/:
- downloaded arXiv PDFs
- arxiv_metadata.json
Generated under pdf_reader/:
- section index JSON
- section digest JSON
- digest markdown
- extracted figure assets when available
Benchmark run artifacts are generated under benchmarks/runs/.
- Python 3.10+
- a Conda environment such as myenv
- one or more valid model API keys for real runs
- tectonic on PATH if you want automatic LaTeX compilation

Python dependencies:
- PySide6
- requests
- pypdf
- pillow
Known-good GUI runtime for this project at the moment:
PySide6==6.8.3
conda activate myenv
pip install -r requirements.txt
python app.py

Cyber Colloquium now has two practical operating modes:
- Research discussion mode - used for normal end-to-end research work inside the desktop app
- input a topic or question
- optionally discover arXiv papers
- read references and discuss with the AI team
- optionally generate Python / BibTeX / LaTeX artifacts
- optionally run local Python smoke/full execution and local Tectonic compilation with explicit authorization
- Benchmark / policy tuning mode - used to compare workflow policies and optimize the collaboration strategy itself
- run benchmark tasks from the command line
- score workflow quality with the multi-objective loss
- export workflow graph artifacts, policy snapshots, traces, and a training corpus
- use the exported corpus as downstream training data for later fine-tuning or distillation
Important boundary:
- the desktop app already supports benchmark-driven workflow optimization and training-corpus export
- it does not directly fine-tune the underlying model weights inside the UI
- if you want true model fine-tuning later, the current app is the data-generation and policy-search front end for that pipeline
This is the normal way to use the app as an AI research team.
flowchart TD
A["Launch app"] --> B["Configure providers, roles, and specialties"]
B --> C["Attach local files"]
C --> D["Optional: Build PDF reader"]
D --> E["Optional: Enable arXiv discovery and literature review"]
E --> F["Enter research question or topic"]
F --> G["Start discussion"]
G --> H["Lead delegates + Host coordinates"]
H --> I["Experts analyze + Reviewer cross-checks"]
I --> J["Structured state, checkpoints, and follow-up"]
J --> K["Optional: Generate Python / BibTeX / LaTeX artifacts"]
K --> L["Optional: Local Python smoke/full run and Tectonic compile with explicit authorization"]
L --> M["Export literature review, meeting minutes, and research report"]
cd E:\大模型讨论\Cyber-Colloquium-main
conda activate myenv
python app.py

- Configure providers, roles, specialties, models, and API keys.
- Open Workflow Policy and decide whether to enable:
  - arXiv discovery
  - literature review
  - reviewer pass
  - Python artifact generation
  - BibTeX / LaTeX artifact generation
  - local execution for this run
- Attach local materials such as PDF, text, JSON, CSV, or images.
- If PDFs are attached, optionally click Build PDF reader first.
- Enter the research topic, problem, or discussion goal.
- Click Start discussion.
- The app then runs the default workflow:
- discover arXiv literature if enabled
- ingest source material
- run lead / host / expert / reviewer collaboration
- update structured state and checkpoints
- optionally generate Python artifacts, smoke-test them, and fully run them
- generate meeting notes and research report
- optionally generate BibTeX / LaTeX and compile with Tectonic
- Review outputs under:
  - meeting_minutes/
  - generated_artifacts/
  - arxiv_library/
  - pdf_reader/
Use Research discussion mode when your main goal is:
- reading papers
- discussing a research direction
- reviewing a method
- planning an experiment
- generating a report or paper draft
- validating generated code with explicit local authorization
This mode is for improving the multi-AI collaboration workflow itself.
flowchart TD
A["Prepare benchmark tasks"] --> B["Run evaluation on one workflow policy"]
B --> C["Collect notes, reports, traces, graph snapshots, and scores"]
C --> D["Run random-search policy optimizer"]
D --> E["Compare objective loss across candidates"]
E --> F["Pick best workflow config"]
F --> G["Export policy training corpus"]
G --> H["Use external training pipeline for later SFT / distillation if needed"]
Benchmark tasks live under:
- benchmarks/tasks/train/
- benchmarks/tasks/dev/
- benchmarks/tasks/holdout/
Each task defines:
- the input topic / PDF / summary seed
- the required outputs
- the scoring constraints
- the benchmark split and metadata
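A task record might be sketched like this; every field name here is hypothetical, since the real schema is defined by the benchmark harness:

```python
# Hypothetical benchmark task record; the actual schema lives in the
# benchmark harness, not in this sketch.
task = {
    "task_id": "survey_dp_sgd",
    "split": "train",
    "input": {"topic": "differentially private SGD", "seed_summary": "short seed text"},
    "required_outputs": ["meeting_minutes", "research_report"],
    "constraints": {"max_rounds": 4},
}
```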
Run one policy version against a benchmark split:
cd E:\大模型讨论\Cyber-Colloquium-main
conda activate myenv
python -m src.discussion_app.evaluation --tasks-root benchmarks/tasks --split train --policy-version local_smoke

Useful flags:
- --workflow-config path/to/workflow_config.json
- --output-root benchmarks/runs
- --limit 1
- --quality-weight 1.0
- --cost-weight 0.2
- --latency-weight 0.15
- --human-weight 0.1
- --failure-weight 0.8
- --stability-weight 0.2
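Conceptually, these weights fold into a single scalar loss where lower is better. A sketch (the exact formula is evaluation.py's; the metric names here are illustrative):

```python
def objective_loss(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted multi-objective loss: lower is better.

    Quality is rewarded (subtracted); cost, latency, human intervention,
    failure, and instability are penalized. Metric names are illustrative.
    """
    return (
        -weights["quality"] * metrics["quality"]
        + weights["cost"] * metrics["cost"]
        + weights["latency"] * metrics["latency"]
        + weights["human"] * metrics["intervention"]
        + weights["failure"] * metrics["failure"]
        + weights["stability"] * metrics["instability"]
    )
```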
This produces:
- benchmark result JSON
- suite summary JSON
- workflow graph JSON
- workflow graph Mermaid file
- grouped workflow policy snapshot
- execution trace
- meeting notes and research report copies for that run
Run random-search workflow optimization:
cd E:\大模型讨论\Cyber-Colloquium-main
conda activate myenv
python -m src.discussion_app.policy_optimizer --tasks-root benchmarks/tasks --split train --samples 6

This compares multiple workflow policy candidates by changing parameters such as:
- max discussion rounds
- checkpoint frequency
- reviewer on/off
- structured summary slots
- context limits
- evidence and log budgets
- follow-up depth
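Random search over such parameters can be sketched in a few lines; the search space below is illustrative, not the optimizer's actual one:

```python
import random

# Illustrative search space over a few of the tunable workflow parameters.
SEARCH_SPACE = {
    "max_discussion_rounds": [2, 3, 4, 6],
    "reviewer_pass": [True, False],
    "summary_slots": [4, 6, 8],
}

def sample_candidate(rng: random.Random) -> dict:
    # Draw one workflow-policy candidate uniformly from each dimension.
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}

rng = random.Random(0)  # seeded for reproducible candidate sampling
candidates = [sample_candidate(rng) for _ in range(6)]
```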
The optimizer writes:
- per-candidate workflow configs
- benchmark run artifacts per candidate
- policy_search_summary.json
- policy_training_corpus.jsonl
The current app already exports a policy-oriented training corpus that includes:
- benchmark input/task metadata
- config snapshot
- grouped policy snapshot
- workflow graph
- objective metrics and scores
This is the recommended bridge to later model adaptation work.
In other words:
- use Cyber Colloquium to generate structured benchmark traces and workflow-policy supervision
- use external training code to perform actual fine-tuning, SFT, preference optimization, or distillation on top of that corpus
Cyber Colloquium uses tectonic as the local TeX compiler backend. The app does not bundle Tectonic by itself; it expects the tectonic executable to be available on your system PATH.
If you already use the myenv Conda environment, the simplest setup is:
conda activate myenv
conda install -c conda-forge tectonic

After installation, verify that the compiler is available:
tectonic --help
tectonic --version
where.exe tectonic

If these commands succeed, Cyber Colloquium should detect Tectonic automatically at startup.
The official Tectonic documentation also provides a PowerShell installer for Windows. If you use that method, make sure that the unpacked tectonic.exe is moved into a directory that is included in your system PATH; otherwise the app will not be able to find it.
At startup, the app runs an environment check:
- if tectonic is found on PATH, the UI will show Tectonic detected
- if tectonic is not found, the UI will warn that local LaTeX compilation will be skipped
Tectonic compilation requires two conditions:
- In Workflow Policy -> Edit workflow settings, enable:
  - Generate LaTeX document draft after the report stage
  - Allow local Tectonic compile after draft generation
- For the current run, check Authorize local execution for this run
Only when both are enabled will the app try to build the generated .tex file locally.
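The gating can be summarized as one predicate; a sketch with illustrative parameter names:

```python
import shutil

def can_compile_latex(
    generate_latex_draft: bool, allow_tectonic_compile: bool, run_authorized: bool
) -> bool:
    """All gates must be open: both workflow toggles, the per-run
    authorization checkbox, and a tectonic executable on PATH.
    Parameter names are illustrative, not the app's actual fields.
    """
    return (
        generate_latex_draft
        and allow_tectonic_compile
        and run_authorized
        and shutil.which("tectonic") is not None
    )
```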
Successful or attempted Tectonic builds are written under:
generated_artifacts/latex_builds/
Typical outputs include:
- compiled PDF
- Tectonic build log
- intermediate build folder for that run
Use Benchmark / policy tuning mode when your main goal is:
- improving multi-AI collaboration quality
- comparing workflow policies systematically
- reducing cost / latency / intervention while preserving output quality
- building a small but reusable benchmark set
- exporting structured supervision data for later fine-tuning
Recommended first run:
- open the app
- configure providers and API keys
- attach one or more source files
- optionally enable arXiv discovery in workflow settings
- optionally click Build PDF reader
- optionally enable literature review
- review the startup environment check for tectonic
- if you want local execution, check the per-run authorization box
- start the discussion
Run the evaluation harness from the command line:
conda run -n myenv python -m src.discussion_app.evaluation --tasks-root benchmarks/tasks --split train --policy-version local_smoke

Useful flags:
- --workflow-config path/to/workflow_config.json
- --output-root benchmarks/runs
- --limit 1
- --quality-weight 1.0 --cost-weight 0.2 --latency-weight 0.15
Run random-search workflow policy tuning against the benchmark suite:
conda run -n myenv python -m src.discussion_app.policy_optimizer --tasks-root benchmarks/tasks --split train --samples 6

This writes:
- per-candidate workflow configs
- benchmark run artifacts
- workflow graph / policy snapshots
- policy_training_corpus.jsonl
- policy_search_summary.json
- arXiv is the only built-in remote literature source right now
- local execution remains intentionally explicit and user-authorized
- generated Python scripts are scaffolds and validation targets, not guaranteed full experiments
- automatic dependency installation is out of scope
- local LaTeX compilation depends on tectonic being installed
- Python workspace limits currently do not enforce full memory or CPU quotas at the OS level
- provider behavior still varies across vendors even with OpenAI-compatible APIs
- PDF extraction quality depends on the original PDF structure
- figure and formula extraction remain imperfect
- there is no persistent database-backed project memory yet
This project is released under the MIT License.


