dreadnode · monoxgas · Jun 12, 2026
diff --git a/capabilities/ai-red-teaming/README.md b/capabilities/ai-red-teaming/README.md
@@ -0,0 +1,48 @@
+# ai-red-teaming
+
+An agent-driven harness for adversarially probing AI systems. The `ai-red-teaming-agent` extracts attack parameters from a request, generates a Python workflow against the target, executes it, and reports platform-tracked metrics. Each workflow assembles three pieces — an **attack algorithm** (iterative jailbreaks like TAP/PAIR/Crescendo, ML adversarial samplers like HopSkipJump), optional **transforms** that mutate the adversarial prompt (encoding, cipher, persuasion, MCP/multi-agent poisoning), and **scorers** that judge whether the target broke — and runs them as trials on the Dreadnode platform. Targets can be plain LLMs, agentic HTTP endpoints (tools/MCP/multi-agent), RAG pipelines, or traditional ML classifiers. The agent is a parameter extractor: it does not write attack code or interpret results, it drives the generator tools and relays raw platform numbers.
+
+**Shape:** one agent (`ai-red-teaming-agent`, pinned to `claude-opus-4`), a Python tool surface (attack generation, workflow execution, assessment tracking, session context, platform analytics), and eight lazily-loaded skills (attack selection, transform/scorer reference, workflow patterns, compliance mapping, trace/analytics interpretation, troubleshooting). The attack-runner code generator and the catalogs of algorithms, transforms, and scorers live in `scripts/` and the skills — not here.
+
+The attack catalog (45 algorithms, 500+ transforms, the scorer set, and 260 bundled harm goals across 25 sub-categories) is methodology, not setup — the agent enumerates it on request (`"show me all available attacks"`) and the skills document selection. This README is for standing the harness up.
+
+## Setup
+
+Configuration is entirely through the environment — the tools self-bootstrap their dependencies via `uv run`. No `.env` autoload; set these on the deployer (secrets screen or web app).
+
+**Platform connection** (where assessments and trials are tracked):
+
+| Var | Notes |
+|---|---|
+| `DREADNODE_API_KEY` | Required with `DREADNODE_SERVER` for sandbox mode. |
+| `DREADNODE_SERVER` | Platform URL. |
+| `DREADNODE_ORGANIZATION` / `DREADNODE_WORKSPACE` / `DREADNODE_PROJECT` | Scope the run; optional. |
+
+If `DREADNODE_SERVER` and `DREADNODE_API_KEY` are unset, the runner falls back to a saved profile (`dreadnode login`). If no saved profile exists either, workflow execution aborts.
+
+**Model provider keys**: the target, attacker, and judge models can use any litellm-routable provider. Supply the matching key (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GROQ_API_KEY`, …) for whichever providers your target/attacker/judge models use; the runner warns at execution time when a model's key is missing. Alternatively, set `OPENAI_API_KEY` + `OPENAI_BASE_URL` to route prefix-less models through a LiteLLM proxy.
+
+**Target endpoints** are not env config — they are passed as tool parameters at attack time: a model alias or full litellm path for LLMs, an `agent_url` (plus `agent_auth_type` and an `agent_auth_env_var` naming a platform secret) for agentic targets, or an `api_url` for ML classifiers. The skills cover the parameter shapes.
+
+Outputs and session state land under `~/workspace/airt/` and `~/.dreadnode/airt/<org>/<workspace>/`; override with `AIRT_WORKFLOWS_DIR`, `AIRT_SESSION_PATH`, `AIRT_ASSESSMENT_PATH`.
+
+## Usage
+
+Drive it through the agent:
+
+```
+>>> @ai-red-teaming-agent Run TAP on gpt-4o, goal: extract the system prompt
+>>> @ai-red-teaming-agent Full safety sweep on claude-sonnet
+>>> @ai-red-teaming-agent Red team my agent at https://api.example.com/chat, make it execute shell commands
+```
+
+The agent picks a generator (`generate_attack`, `generate_category_attack`, `generate_agentic_attack`, `generate_image_attack`), executes the workflow, registers the assessment, validates the results, and reports the platform metrics. Session context carries target/goal/config across follow-ups so "now try Crescendo on the same target" reuses prior parameters.
+
+## Before you trust it
+
+- **This is offensive tooling against AI systems.** Attacks generate adversarial prompts, contact target endpoints, and attempt to elicit unsafe behavior. Only point it at models, agents, and endpoints you are authorized to test — the harm goals and prompts are test data, but the traffic to a target is real.
+- **Agentic and ML attacks hit live endpoints.** `agent_url` / `api_url` attacks send real requests; agentic runs can invoke the target's tools. Scope auth and dangerous-tool lists deliberately.
+- **Cost is query budget.** Iterative algorithms run hundreds to thousands of model queries per goal (see the per-attack budgets in the agent's table); a full category sweep multiplies that across 260 goals. Bound runs with `goals_per_category` and `n_iterations` before a kitchen-sink sweep.
+- **The agent reports platform data only.** It never interprets attack success rate (ASR) or risk scores, and it never invents numbers. Deeper analysis lives in the platform web interface and the trace/analytics skills.
+- **Compliance mappings are provenance, not a tour.** Goals and categories map to OWASP LLM Top 10, OWASP ASI01–ASI10, MITRE ATLAS, and NIST AI RMF; the `compliance-mapping` skill carries the crosswalk.
+- Unit tests ship under `tests/` for the script layer (attack runner, goal loader, assessment tracker, results inspector, workflow helper); there is no live end-to-end target test.
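The platform-connection fallback described in the ai-red-teaming Setup section (environment pair first, then a saved profile, otherwise abort) can be sketched roughly as below. This is an illustrative sketch, not the runner's code: `AuthSource`, `resolve_platform_auth`, and the `has_saved_profile` flag are hypothetical names, and it assumes that the environment path needs both variables to be non-empty.

```python
from enum import Enum
from typing import Mapping


class AuthSource(Enum):
    """Hypothetical labels for where platform credentials come from."""

    ENVIRONMENT = "environment"
    SAVED_PROFILE = "saved_profile"


def resolve_platform_auth(env: Mapping[str, str], has_saved_profile: bool) -> AuthSource:
    """Pick a credential source, or abort when none is available."""
    # Assumption: environment auth needs both variables to be present and non-empty.
    if env.get("DREADNODE_SERVER") and env.get("DREADNODE_API_KEY"):
        return AuthSource.ENVIRONMENT
    # Otherwise fall back to a profile saved by `dreadnode login`.
    if has_saved_profile:
        return AuthSource.SAVED_PROFILE
    # Neither source is available, so execution cannot proceed.
    raise RuntimeError(
        "no platform credentials: set DREADNODE_SERVER and DREADNODE_API_KEY, "
        "or run `dreadnode login`"
    )
```

Passing `os.environ` as `env` would mirror the deployer-environment setup; passing a plain dict keeps the function easy to test.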
diff --git a/capabilities/android-apk-research/README.md b/capabilities/android-apk-research/README.md
@@ -0,0 +1,50 @@
+# android-apk-research
+
+Static semantic-bug research on Android APKs — deep-link routers, intent redirection, WebView trust boundaries, auth/session/client-state bypass, Dirty Stream share targets, and APK-derived backend API chains. A 10-tool orchestration MCP (`android-research`) handles the wide, parallel work — corpus inventory, component ranking, runtime classification, protector detection/unpack, API-map extraction, finding normalization — while the heavyweight decompile-and-hunt methodology (JADX heap tiers, ripgrep pattern packs, Semgrep rule ensembles, Joern/CodeQL recipes) lives in the skills as bash, where it belongs. Everything here is **static**: no device, no emulator, no live backend.
+
+**Shape:** one MCP server (`android-research`, self-bootstrapping via `uv run`), four skills — `android-corpus-prep` (AndroZoo/Play target selection), `android-semantic-vuln-hunting` (canonical at-scale methodology), `android-targeted-assessment` (one-APK depth mode), `android-protector-triage` (DexProtector/Promon handling). Findings ground in OWASP MASVS / MASTG, MASWE, and CWE.
+
+## What the MCP exposes
+
+Ten tools, grouped by pipeline stage. They orchestrate scripts under `scripts/`; the actual decompilation and scanning stays in the skills.
+
+| Stage | Tools |
+|---|---|
+| Probe | `inventory_status` |
+| Corpus inventory | `run_corpus_inventory` |
+| Attack-surface ranking | `extract_components`, `rank_components`, `detect_runtime_kind` |
+| Protector triage | `detect_protector`, `dexprotector_unpack` |
+| Backend mapping | `extract_api_map`, `rank_backend_richness` |
+| Reporting | `normalize_semantic_findings` |
+
+## Setup
+
+The MCP self-bootstraps (PEP 723 / `uv run`) — no Python install step. The work it orchestrates depends on external CLIs that are **not** bundled; the manifest `checks:` block surfaces missing hard prerequisites in the TUI capability manager.
+
+**Hard prerequisites** (manifest `checks:` — capability is degraded without them):
+
+| Tool | Why |
+|---|---|
+| `uv` | Runs the MCP and its PEP 723 scripts |
+| `jadx` | DEX → Java decompilation (the core hunting surface) |
+| `apktool` | Resource / manifest decoding |
+| `aapt` or `aapt2` | Manifest fallback when Androguard errors on multi-dex APKs |
+| `semgrep` | Rule-pack triage of decompiled source |
+| `apkid` | Packer / protector signal during inventory |
+
+**Skill-step tools** (not checked at install, but `inventory_status` reports them — needed for specific methodology steps): `joern`, `codeql`, `adb`, plus hybrid-runtime follow-ups `hbctool` (Hermes), `blutter` (Flutter/Dart AOT), and `prettier` / `npx` (JS bundle work). `android-corpus-prep` additionally uses DuckDB for AndroZoo Parquet selection. Call `inventory_status` once at session start to see which steps will run end-to-end on the host.
+
+**Tunables** (set via the deployer environment — secrets screen or web app; no `.env` autoload):
+
+| Var | Default | Change when |
+|---|---|---|
+| `ANDROID_RESEARCH_MAX_OUTPUT_CHARS` | `20000` | Tool output is being truncated and you need more inline context |
+| `ANDROID_RESEARCH_TIMEOUT` | `300` | Reference default only, in seconds; each tool takes its own `timeout` arg (per-APK inventory 180s, unpack 600s, etc.) |
+
+## Before you trust it
+
+- **Static only.** `extract_api_map` output is a *target map* for backend hypotheses, not proof — findings default to `needs_backend_validation` until tested against authorized accounts. No exploitation, no live-backend probing, no APK patching ships here.
+- **DexProtector unpack is arm64-v8a only.** `dexprotector_unpack` statically recovers `libdp.so` via Unicorn emulation (it never *executes* the blob); other ABIs and other protectors fall back to adjacency analysis only. Always run `detect_protector` first and gate on `dexprotector_unpack_supported`.
+- **Authorization is the operator's job.** The skills default to static + authorized read-only validation; pointing the pipeline at APKs or backends you're not cleared to test is out of scope by design.
+
+Agent-facing usage — the JADX heap tiers, ripgrep/Semgrep/Joern/CodeQL recipes, bug-class catalog, and finding schema — lives in `skills/`, not here. The MCP carries a header note on the "why bash, not MCP" split; the long-form rationale is in `skills/android-semantic-vuln-hunting/references/workflow.md`.
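The android-apk-research README advises running `detect_protector` first and gating on `dexprotector_unpack_supported`. A minimal sketch of that gate follows. It assumes the detection result is a mapping with a boolean field of that name; the function name and the `"adjacency_analysis"` label are hypothetical and do not name real tools.

```python
from typing import Any, Mapping


def choose_protector_step(detection: Mapping[str, Any]) -> str:
    """Return the next step for an APK given a `detect_protector` result."""
    # Attempt the unpack only when detection explicitly reports support.
    # A missing or non-boolean field is treated as "not supported".
    if detection.get("dexprotector_unpack_supported") is True:
        return "dexprotector_unpack"
    # Other ABIs and other protectors: adjacency analysis only.
    return "adjacency_analysis"
```

Treating an absent field as unsupported keeps the gate conservative: the unpack is never attempted on a result that did not explicitly opt in.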
diff --git a/capabilities/bloodhound/README.md b/capabilities/bloodhound/README.md
@@ -0,0 +1,29 @@
+# bloodhound
+
+Wires a local [BloodHound Community Edition](https://bloodhound.specterops.io) deployment into chat and agents. The MCP authenticates to the CE REST API to verify the session, then runs **Cypher against the underlying Neo4j graph over Bolt** for AD/Entra attack-path analysis — domain enumeration, Tier Zero, Kerberos, delegation, ADCS, NTLM relay, and hygiene. It ships ~25 named "standard queries" alongside an arbitrary-Cypher tool.
+
+Twin of `bloodhound-enterprise/`: **this** talks Bolt to a local CE Neo4j; **that** talks HMAC-signed REST to a hosted BloodHound Enterprise deployment. Use this one when you run your own CE/Neo4j stack.
+
+## Setup
+
+The server connects to two endpoints — the CE web API (to authenticate) and the Neo4j graph (where queries run). Both default to a standard local CE Docker stack; the only value you must supply is the BloodHound password:
+
+| Var | Default | Reason to change |
+|-----|---------|------------------|
+| `BLOODHOUND_URL` | `http://localhost:8080` | CE running on another host/port |
+| `BLOODHOUND_USERNAME` | `admin` | non-default CE account |
+| `BLOODHOUND_PASSWORD` | (required) | the CE login secret — no default |
+| `NEO4J_URL` | `bolt://localhost:7687` | Neo4j not co-located with CE |
+| `NEO4J_USERNAME` | `neo4j` | non-default Neo4j account |
+| `NEO4J_PASSWORD` | `bloodhoundcommunityedition` | you changed the CE Neo4j password |
+| `NEO4J_DATABASE` | `neo4j` | multi-database Neo4j |
+
+Set these as capability secrets, or pass them at runtime via the `connect` tool (overrides env for the session). Until a password is present, the server raises `Not connected` on the first query.
+
+## Before you trust it
+
+- **Read/query only.** The four tools (`connect`, `query`, `standard_query`, `list_queries`) execute Cypher against the graph — there is no SharpHound/AzureHound ingest path here. Collect and import data with the normal CE tooling first; this capability analyzes what's already loaded.
+- **`query` runs arbitrary Cypher** with the configured Neo4j credentials — scope those credentials to the read posture you want.
+- **`docs/`** is imported SpecterOps BloodHound reference material (node/edge/glossary docs), bundled under its own Apache-2.0 `LICENSE` for offline schema lookup.
+
+Agent-facing usage — Cypher idioms, the standard-query catalog, and attack-path tradecraft — lives in `skills/bloodhound/`, not here.
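The bloodhound Setup table implies a simple precedence: built-in defaults, then environment values, then anything passed to `connect` for the session, with `BLOODHOUND_PASSWORD` as the only value lacking a default. The sketch below illustrates that merge under stated assumptions: `resolve_settings` is a hypothetical helper, and keying the overrides by the environment variable names is an illustration choice, since the actual `connect` parameter names are not given here.

```python
from typing import Dict, Mapping, Optional

# Defaults copied from the Setup table; BLOODHOUND_PASSWORD has no default.
DEFAULTS: Dict[str, str] = {
    "BLOODHOUND_URL": "http://localhost:8080",
    "BLOODHOUND_USERNAME": "admin",
    "NEO4J_URL": "bolt://localhost:7687",
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "bloodhoundcommunityedition",
    "NEO4J_DATABASE": "neo4j",
}
KNOWN_KEYS = set(DEFAULTS) | {"BLOODHOUND_PASSWORD"}


def resolve_settings(
    env: Mapping[str, str],
    overrides: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Merge defaults, environment, and session overrides (later wins)."""
    settings = dict(DEFAULTS)
    # Environment values replace defaults; unrelated variables are ignored.
    settings.update({k: v for k, v in env.items() if k in KNOWN_KEYS})
    # Session overrides (as passed to `connect`) replace environment values.
    settings.update({k: v for k, v in (overrides or {}).items() if k in KNOWN_KEYS})
    if not settings.get("BLOODHOUND_PASSWORD"):
        # Mirrors the documented behaviour when no password is present.
        raise RuntimeError("Not connected")
    return settings
```

With a stock local CE Docker stack, `resolve_settings({"BLOODHOUND_PASSWORD": "..."})` would be enough; every other value falls through to the table defaults.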
diff --git a/capabilities/dotnet-reversing/README.md b/capabilities/dotnet-reversing/README.md
@@ -0,0 +1,29 @@
+# dotnet-reversing
+
+ILSpy-backed decompilation and static analysis for .NET assemblies (`.dll` / `.exe`). One agent (`dotnet-reversing-agent`) over a Python toolset that drives [ILSpy](https://github.com/icsharpcode/ILSpy) through `pythonnet`/CoreCLR: scan a directory for binaries, walk namespaces and types, decompile a type or specific methods to C#, search IL operands for API usage, and trace call flows across assemblies to a target method. Targets don't have to be on disk — it can pull a NuGet package or extract .NET assemblies straight out of a Microsoft Container Registry image (HTTP-only, the container is never run).
+
+**Shape:** one agent, two skills (`dotnet-reversing` for the decompilation workflow, `mcr-analysis` for MCR image extraction), and a Python `@tool` surface — no MCP server. The reversing tools run in a persistent subprocess pinned to **Python 3.12** (a `pythonnet` requirement); the parent process proxies calls to it over a local HTTP port.
+
+## Setup
+
+There is no manifest config to fill in — the toolset **bootstraps its own backend on first use**. The first `dotnet_*` tool call spawns the subprocess (via `uv run --python 3.12 --with pythonnet`), and that subprocess downloads, if not already present:
+
+| Component | Version | Source |
+|---|---|---|
+| .NET runtime (runtime-only, no SDK) | channel `8.0` | `dot.net/v1/dotnet-install.sh` (~100 MB) |
+| ILSpy decompiler DLLs (`ICSharpCode.Decompiler.dll`, `Mono.Cecil.dll`) | `8.2.0.7535` | ILSpy GitHub releases |
+| `pythonnet` | `>=3.0.5` | pip / uv |
+
+The download is **one-time and idempotent** — subsequent runs detect the installed DLLs and skip it. Dependencies land in a persistent deps directory so they survive sandbox restarts: `/home/user/workspace/.dreadnode/deps` in the Dreadnode sandbox (when `DREADNODE_SANDBOX` is set or the workspace is an S3 mount), `~/.dreadnode/deps` locally. The bootstrap sets `DOTNET_ROOT` and the ILSpy lib path itself; you don't configure them.
+
+Prerequisites the bootstrap does **not** install for you: `uv` (used to launch the 3.12 subprocess; a `python3.12` with `pythonnet` already present is the fallback), plus `curl` and `unzip` for the downloads. First call needs outbound network to Microsoft and GitHub.
+
+`CAPABILITY_PORT` (default `9797`) overrides the subprocess HTTP port if it collides; a free port is auto-selected otherwise.
+
+## Scope
+
+- **Targets:** managed .NET assemblies — `.dll` and `.exe`. Decompilation is ILSpy's; obfuscated or AOT/native-compiled binaries decompile poorly or not at all.
+- **Read-only.** Tools decompile and inspect; nothing patches or writes to the target. Reporting tools persist findings to the Dreadnode platform.
+- **NuGet & MCR** are convenience fetchers: `dotnet_download_nuget` pulls from nuget.org, and the `mcr_*` tools extract layers from `mcr.microsoft.com` over HTTP without Docker and without executing the image.
+
+`secure-software` hands off to this capability for .NET assemblies found inside packages (its tools surface under the `dotnet_*` namespace); agent-facing usage — the decompilation and vuln-hunting workflow, tool-by-tool — lives in `skills/`, not here.
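The dotnet-reversing bootstrap is described as one-time and idempotent, with a deps directory that depends on whether the code runs in the Dreadnode sandbox. A rough sketch of those two checks is below. The helper names are hypothetical, the S3-mount detection mentioned in the README is omitted, and where the DLLs sit inside the deps directory is an assumption (the caller passes that directory in).

```python
from pathlib import Path
from typing import Mapping

SANDBOX_DEPS = Path("/home/user/workspace/.dreadnode/deps")
# DLL names taken from the component table in Setup.
REQUIRED_DLLS = ("ICSharpCode.Decompiler.dll", "Mono.Cecil.dll")


def deps_dir(env: Mapping[str, str], home: Path) -> Path:
    """Choose the persistent deps directory (S3-mount detection not modelled)."""
    if env.get("DREADNODE_SANDBOX"):
        return SANDBOX_DEPS
    return home / ".dreadnode" / "deps"


def needs_download(lib_dir: Path) -> bool:
    """True when any required decompiler DLL is missing from lib_dir."""
    # Idempotence: if every DLL is already present, the download is skipped.
    return not all((lib_dir / name).is_file() for name in REQUIRED_DLLS)
```

Checking for the files rather than recording a "downloaded" flag means a partially populated directory still triggers a fresh download.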
diff --git a/capabilities/ios-forensics/README.md b/capabilities/ios-forensics/README.md
@@ -0,0 +1,55 @@
+# ios-forensics
+
+A curated [MVT](https://github.com/mvt-project/mvt) (Mobile Verification Toolkit) surface for triaging iOS acquisitions for mercenary-spyware compromise — Pegasus, Predator, QuaDream, RCS, Hermit. The `mvt` MCP wraps the `mvt-ios` CLI behind verb-named tools (device info, installed apps, configuration profiles, TCC grants, data usage, SMS, calls, Safari, shutdown log) plus a STIX-IoC sweep that correlates every module against published indicator feeds. Backup-native helpers (`Manifest.db` resolution, read-only SQLite, plist parsing) let the agent pivot a flagged record into the underlying artifact. The `ios-forensics-analyst` agent drives the whole loop: identify → triage → focused hunt → extract → report.
+
+This is triage, not chain-of-custody forensics. It tells you whether a device looks compromised and pins findings to specific artifacts; it does not produce court-grade evidence packages or perform acquisition.
+
+**Shape:** one agent (`ios-forensics-analyst`), one MCP server (`mvt`, ~19 tools), five playbook skills (image triage, spyware hunt, communications analysis, activity reconstruction, config/persistence review). Sibling capability `memory-forensics` mirrors this shape for memory images.
+
+## Setup
+
+**1. Install MVT.** The MCP does not vendor it. It resolves the command as `MVT_COMMAND` → `mvt-ios` on `PATH` → a PEP 723 fallback that runs the `mvt` package installed into the `uv` venv. The fallback works, but install MVT explicitly so the version is yours to control:
+
+```
+pipx install mvt        # or: uv tool install mvt
+```
+
+**2. Produce an input.** MVT reads one of two source kinds — every tool takes a `source_kind`:
+
+| `source_kind` | What it is | How to get it |
+|---|---|---|
+| `backup` | iTunes/Finder backup directory | Finder (or `idevicebackup2 backup` from libimobiledevice). **Enable encryption** before backing up: an encrypted backup includes Health data, keychain metadata, and other records that an unencrypted backup omits. |
+| `fs` | Full-filesystem extraction | A jailbreak / `checkm8`-class acquisition (commercial tooling or `palera1n`-style). |
+
+Most modules run on a backup. A handful need a full-filesystem (FFS) extraction, and they cover the highest-signal spyware artifacts: `shutdown_log`, PowerLog, WebKit DataStore/resource logs, crash `.ips` files. If you only have a backup, the agent will say so rather than fabricate a verdict.
+
+Encrypted backups: supply the password to `mvt_decrypt_backup`, which writes a decrypted working copy. The password is passed to `mvt-ios` as a CLI argument, so it is briefly visible to local process listings while the subprocess runs.
+
+**3. Bring STIX IoCs.** Spyware detection is only as current as the indicators you feed it. MVT correlates modules against STIX2 IoC files supplied via the `iocs=` parameter — these are **not** bundled. Pull the latest from [Amnesty's Security Lab](https://github.com/AmnestyTech/investigations), Citizen Lab, Kaspersky GReAT, or Volexity, or hand-write a minimal STIX2 file for custom indicators. Absence of STIX hits is not a clean verdict; feeds lag live campaigns by weeks to months.
+
+### Tuning
+
+No credentials or secrets. Optional environment variables:
+
+| Var | Default | Change when |
+|---|---|---|
+| `MVT_COMMAND` | unset | You want a specific `mvt-ios` binary (e.g. a pinned venv) instead of `PATH` resolution. |
+| `MVT_TIMEOUT` | `900` (s) | A module on a large FFS times out; raise it. Per-subprocess. |
+| `MVT_MAX_OUTPUT_CHARS` | `200000` | Tool output is truncating mid-analysis and your context budget can absorb more. |
+
+## Usage
+
+Drive it through the agent:
+
+```
+>>> @ios-forensics-analyst triage the backup at ~/cases/device-01 with the Amnesty STIX feed at ~/iocs/amnesty.stix2.json
+```
+
+It runs `mvt_status` → `mvt_info` to fix device context, sweeps the high-signal modules, runs the STIX correlation, then pivots any hit into the underlying SQLite/plist record. The five skills carry the per-phase methodology; the agent loads them as evidence demands.
+
+## Before you trust it
+
+- **Triage scope, not custody.** Findings are evidence-pinned but this is not an acquisition or chain-of-custody tool. Treat suspected mercenary-spyware findings as sensitive — default to minimum distribution until victims and legal/human-rights stakeholders are briefed.
+- **Backups underperform FFS.** The most diagnostic spyware artifacts (`shutdown_log`, WebKit DataStore, crash logs) only exist in a full-filesystem extraction. A clean backup sweep is not an all-clear.
+- **IoC currency is on you.** Detection depends entirely on the STIX feed you supply and when it was last updated.
+- **No tests ship** for the MCP server. The SQLite helpers open read-only with `ATTACH`/`DETACH` denied at the authorizer, so query tools can't write or reach beyond the named database.
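The ios-forensics note about read-only SQLite access with `ATTACH`/`DETACH` denied at the authorizer can be illustrated with Python's standard `sqlite3` module. This is a sketch of the general technique, not the MCP server's code; `open_read_only` and `_authorizer` are hypothetical names.

```python
import sqlite3
from pathlib import Path

# Authorizer action codes that would let a query reach another database file.
DENIED_ACTIONS = {sqlite3.SQLITE_ATTACH, sqlite3.SQLITE_DETACH}


def _authorizer(action, arg1, arg2, db_name, source):
    """Deny ATTACH/DETACH; allow every other action."""
    return sqlite3.SQLITE_DENY if action in DENIED_ACTIONS else sqlite3.SQLITE_OK


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open db_path read-only and block ATTACH/DETACH on the connection."""
    # `mode=ro` in a SQLite URI opens the file without write access.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    # The authorizer runs while statements are compiled, so a denied
    # statement fails before it executes.
    conn.set_authorizer(_authorizer)
    return conn
```

The two layers cover different risks: `mode=ro` stops writes to the named file, while the authorizer stops a query from attaching some other database and reading it.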