Run multi-agent vulnerability research against WordPress plugins.
Source-grounded triage, sandbox verification, and private disclosure drafts.
Most vulnerability scanners stop at patterns. Squadrone runs a full research pipeline over WordPress plugins: it pulls plugin source, maps reachable entry points, asks specialist agents to form hypotheses, verifies survivors in a Docker WordPress sandbox, deduplicates against known vulnerability databases, and writes disclosure-ready report drafts.
What you get:
- One-command plugin scans from a WordPress.org plugin slug
- Specialist agent coverage across auth, auth-flow, object authorization, state changes, payment logic, injection, file ops, SSRF/deserialization, stored-to-admin paths, XSS, and logic flaws
- Source-grounded triage against exploitability and Wordfence/Patchstack scope rules
- Balanced quality gates that reject clear false positives while preserving borderline evidence or impact for manual review
- Sandbox verification with an isolated WordPress install and iterative PoC attempts
- Known-vulnerability deduplication against Wordfence Intelligence and WPScan when API keys are available
- Private report drafts for novel findings, with no auto-submit path
Install the prerequisites below first, and make sure Docker Desktop is running before starting a scan.
git clone https://github.com/pr0f94/Squadrone.git squadrone
cd squadrone
python3.12 -m venv .venv
.venv/bin/python -m pip install -e ".[dev]"
cp .env.example .env
$EDITOR .env
set -a; . ./.env; set +a
.venv/bin/squadrone --help
.venv/bin/squadrone scan hello-dollyThe default pipeline uses Anthropic models, so set ANTHROPIC_API_KEY in .env
before running the scan. To use ChatGPT subscription OAuth instead, run:
.venv/bin/squadrone scan hello-dolly --config pipelines/openai.yamlEach scan writes artifacts under plugins/<slug>/runs/<run_id>/.
Intake version 1.7.2 · files 42 · lines 4,812
Recon entry points 8 · sinks 3
Hypothesis count 2
Triage accepted 1 · rejected 1 · merged 0 · manual review 0
Verify findings 1
Report reports 2
The SQLite index is created automatically at db/squadrone.sqlite on first use.
SQLite connections use WAL mode and a busy timeout so batch scans can share the
run index and LLM cache without most transient lock failures.
- Pulls plugin source from
plugins.svn.wordpress.org, falling back to the WordPress.org plugin ZIP ifsvnis not installed. - Maps attack surface: reachable entry points, nonce/capability checks, risky sinks, plugin type, sensitive objects, custom roles/capabilities, and high-risk workflows.
- Runs role-aware and workflow-aware specialist LLM agents with on-demand
grep_plugin,glob_plugin, andread_plugin_filetools instead of dumping the full plugin into context. - Self-verifies hypotheses to drop only definitely ungrounded claims such as fabricated sinks, impossible bug classes, or explicit missed guards.
- Builds a focus-area map for AJAX/REST, forms, files, auth, SQL, payment logic, and rendering paths.
- Triages survivors against exploitability and bounty-scope rules, including an adversarial rejection pass when multiple triage votes are used; split votes are routed to manual review.
- Applies balanced quality gates: evidence completeness, WordPress false-positive rules, derived severity, and manual-review routing for borderline cases.
- Builds a one-shot Docker WordPress sandbox for accepted hypotheses.
- Iteratively runs LLM-authored Python PoCs against the sandbox.
- Deduplicates confirmed findings against Wordfence Intelligence and WPScan when keys are configured.
- Runs a report-quality gate before writing disclosure drafts.
- Writes private report drafts per finding and program.
- Records run metadata and findings in SQLite for later review.
The system never auto-submits anything. It produces files. You decide what to disclose, where, and when.
You should not pick only one approach. Static scanning is fast and broad; agentic review is slower but can reason across WordPress idioms, exploit preconditions, scope rules, and PoC feedback.
| Squadrone | Static grep / rules | |
|---|---|---|
| WordPress-specific authorization reasoning | ✓ | partial |
| Cross-file hypothesis formation | ✓ | partial |
| Scope-aware bounty triage | ✓ | ✗ |
| Sandbox PoC verification | ✓ | ✗ |
| Report draft generation | ✓ | ✗ |
| Known-vuln deduplication | ✓ | partial |
| Cheap broad pre-filtering | partial | ✓ |
| Deterministic repeated output | partial | ✓ |
The intended workflow is conservative: use static tools and human review alongside Squadrone, then manually validate anything you plan to report.
- Python 3.12+
- Docker Desktop running before verification
ripgrepsubversionoptional, but recommended for WordPress.org source checkout; Squadrone falls back to plugin ZIP downloads whensvnis unavailable- LLM access through LiteLLM-compatible providers
On macOS:
brew install ripgrep subversionIf you do not install subversion, scans can still run through the ZIP
fallback, but historical source layouts may be less precise for some plugins.
python3.12 -m venv .venv
.venv/bin/pip install -e ".[dev]"The editable install exposes:
.venv/bin/squadrone --helpCopy the example environment file:
cp .env.example .envSet whichever keys match your pipeline and dedup needs:
| Variable | Required for | Notes |
|---|---|---|
ANTHROPIC_API_KEY |
Claude models | Used by pipelines/default.yaml |
OPENAI_API_KEY |
OpenAI API models | Not needed for chatgpt/ subscription models |
WORDFENCE_API_KEY |
dedup stage | Wordfence Intelligence v3 production feed |
WPSCAN_API_KEY |
dedup stage | wpscan.com plugin DB |
LITELLM_LOG |
optional logging | Example: WARNING |
If a vuln-DB key is missing, dedup logs a warning and skips that source. The scan does not fail solely because a dedup key is absent.
Pipeline YAML files in pipelines/ control models, budget ceiling, sandbox shape, hypothesis limits, and developer-consult caps.
pipelines/openai.yaml uses LiteLLM's chatgpt/ provider for ChatGPT subscription access. On first use, LiteLLM starts an OAuth device-code flow; complete the browser login prompt and Squadrone will use the authenticated ChatGPT session. This path does not require OPENAI_API_KEY.
# Scan one plugin with the default pipeline
squadrone scan hello-dolly
# Scan with a higher budget
squadrone scan contact-form-7 --budget 5.00
# Scan a specific historical plugin release
squadrone scan contact-form-7 --version 5.3.1
# Use the OpenAI/ChatGPT pipeline
squadrone scan contact-form-7 --budget 5.00 --config pipelines/openai.yaml
# Show detailed stage, agent, sandbox, and LLM logs
squadrone scan contact-form-7 --verbose
# Skip sandbox verification and queue accepted hypotheses for manual review
squadrone scan contact-form-7 --no-verify
# Disable bounty-scope filtering when you only care whether a bug is technically valid
squadrone scan contact-form-7 --ignore-scope
# Disable strict quality gates for exploratory research
squadrone scan contact-form-7 --no-strict-quality
# Require a majority vote from three independent Critic passes
squadrone scan contact-form-7 --triage-votes 3
# Run exploit-chain synthesis between hypotheses before triage
squadrone scan contact-form-7 --chain
# Enable the larger cross-file stored-XSS specialist
squadrone scan contact-form-7 --cross-file-taint
# Compare the scanned version against a prior WordPress.org release
squadrone scan contact-form-7 --version 5.3.2 --diff 5.3.1
# Scan multiple plugins from a file, one slug per line
squadrone scan-batch plugins.txt
# Scan multiple plugins in parallel
squadrone scan-batch plugins.txt --concurrency 3
# Batch flags mirror the scan quality/scope controls
squadrone scan-batch plugins.txt --concurrency 3 --triage-votes 3 --no-verify
# Run a higher-budget ChatGPT research batch with verbose logs, no sandbox verification,
# exploit-chain synthesis, cross-file taint review, and three independent triage votes
squadrone scan-batch plugins.txt --budget 100 --config pipelines/openai-research.yaml --verbose --no-verify --chain --cross-file-taint --triage-votes 3
# Resume an existing run
squadrone scan contact-form-7 --resume <run_id>
# Force re-run from a specific stage
squadrone scan contact-form-7 --resume <run_id> --from verify
# Inspect run history and findings
squadrone runs list
squadrone findings show <finding-id>
# Interactively review findings
squadrone review <run-id>
# Record a disclosure you submitted manually
squadrone disclose <finding-id> --to wordfence --notes "Sent via Wordfence portal"
# Run the benchmark harness
squadrone benchmark benchmarks/corpus.json --split train --budget 5.00Output by default:
plugins/<slug>/runs/<run_id>/intake.jsonplugins/<slug>/runs/<run_id>/recon.jsonplugins/<slug>/runs/<run_id>/hypotheses.jsonlplugins/<slug>/runs/<run_id>/focus_areas.jsonplugins/<slug>/runs/<run_id>/chains.jsonwhen--chainis usedplugins/<slug>/runs/<run_id>/chain_diagnostics.jsonwhen--chainis usedplugins/<slug>/runs/<run_id>/triaged.jsonlplugins/<slug>/runs/<run_id>/quality_gate_triage.jsonplugins/<slug>/runs/<run_id>/manual_review_queued.jsonwhen triage or quality gates queue manual reviewplugins/<slug>/runs/<run_id>/findings.jsonlplugins/<slug>/runs/<run_id>/findings_corrupt.jsonlif malformed finding rows are quarantined during resumeplugins/<slug>/runs/<run_id>/schema_invalid_<agent>.jsonif an agent returns invalid structured output after repairplugins/<slug>/runs/<run_id>/decision_ledger.jsonlplugins/<slug>/runs/<run_id>/trace.jsonlplugins/<slug>/runs/<run_id>/report_<finding_id>_<program>.md
When --no-verify is used, triage-accepted hypotheses are first graded by the quality gate, then written to the manual review queue instead of findings.jsonl; no submission reports are generated.
When triage voting or the quality gate cannot make a clean automatic decision, the hypothesis is preserved in the manual review queue. The per-run decision_ledger.jsonl records the exact stage, action, result, reason, and artifact path for each keep, reject, manual-review, verification, dedup, and report decision.
Run artifacts that affect resume are written atomically where possible. If a
crash leaves malformed rows in findings.jsonl, Squadrone preserves readable
findings, writes the bad rows to findings_corrupt.jsonl, and records the
recovery in decision_ledger.jsonl.
Squadrone's default pipelines enable strict quality controls. These are deterministic checks inspired by verifier/grader harnesses:
- Finding grader before verification accepts strong candidates, hard-rejects clear false positives, and routes borderline evidence or impact to manual review.
- WordPress false-positive rules reject admin-only, self-XSS, own-resource-only, cosmetic, open redirect, and low-impact CSRF cases; borderline submit-worthiness can be preserved for manual review.
- Evidence-first schema annotates each survivor with attacker role, source, sink, affected file/function, guard discussion, impact statement, and bounty routing.
- Severity recomputation derives an internal CVSS-style score and OWASP 2021 category instead of trusting model-written severity.
- Report grader blocks confirmed findings from becoming polished reports if the evidence or impact does not meet the submission bar.
- Focused review fanout writes
focus_areas.jsonand feeds the attack-surface map into specialist review. - V2 methodology is the default: the surveyor maps plugin type, sensitive objects, roles, and workflows; specialists review object authorization, state changes, payment logic, and stored-to-admin paths alongside classic vulnerability classes.
- Verifier voting is available with
--triage-votes N; use3or5when quality matters more than runtime. In multi-vote mode, majority-accepted findings continue, zero-accept findings reject, and split votes go to manual review. - Exploit-chain synthesis is available with
--chain; it enriches hypotheses withchains_with,chain_impact, andchain_severity_bump, and writeschain_diagnostics.jsonso skipped, failed, and empty chain passes are distinguishable.
Use --no-strict-quality for exploratory scans where you want more raw hypotheses.
# List runs
squadrone runs list
# Show one finding
squadrone findings show <finding-id>
# Review a completed run interactively
squadrone review <run-id>A confirmed finding typically has:
{
"id": "f-...",
"poc_status": "confirmed",
"dedup_status": "novel",
"report_paths": [
"plugins/<slug>/runs/<run_id>/report_f-..._wordfence.md",
"plugins/<slug>/runs/<run_id>/report_f-..._patchstack.md"
]
}Treat generated reports as drafts. Confirm the bug manually, reproduce the PoC, check source-side sanitization and authorization carefully, then disclose privately through the appropriate channel.
.venv/bin/pytest tests/unit/ -q
.venv/bin/ruff check src/
.venv/bin/mypy src/- stages/ — async pipeline functions: intake, recon, hypothesis, triage, verify, dedup, report
- agents/ — LLM-backed agents and the runtime tool-call loop
- services/ — LiteLLM gateway, budget tracker, SVN client, vuln-DB clients, Docker sandbox manager, WP-CLI wrapper
- schemas/ — Pydantic artifact models
- prompts/ — markdown system prompts for every agent
- poc_templates/ — Jinja2 PoC skeletons and Python helpers
- orchestrator.py — stage orchestration, SQLite persistence, budget handling
For a deeper design walkthrough, see DESIGN.md.
This tool is for authorised security research and responsible disclosure only.
Expected workflow:
- Scan a plugin you are allowed to test.
- Review confirmed findings manually.
- Reproduce and validate impact outside the generated draft.
- Disclose privately via Patchstack mVDP, Wordfence Vulnerability Disclosure, WPScan, or the plugin author.
- Wait for a fix before any public write-up.
- Use
squadrone discloseonly to record your own disclosure status.
Do not point Squadrone at infrastructure you do not own or have explicit written permission to test.
