Vespasian: API Discovery and Specification Generation Tool

Discover API endpoints from real HTTP traffic. Generate OpenAPI, GraphQL SDL, and WSDL specs automatically.

Vespasian discovers API endpoints by observing real HTTP traffic and generates API specification files from those observations. It captures traffic through headless browser crawling or imports it from existing sources (Burp Suite XML exports, HAR files, and mitmproxy dumps), then classifies requests, probes discovered endpoints, and outputs specifications in the native format for each API type: OpenAPI 3.0 for REST, GraphQL SDL for GraphQL, and WSDL for SOAP services.

Built for penetration testers and security engineers who need to map the API attack surface of web applications, single-page apps, and microservices when API documentation is unavailable.

Why Vespasian?

Modern applications make API calls dynamically. Single-page applications construct requests at runtime via JavaScript. Mobile apps call APIs through native HTTP clients. Real-time features communicate over WebSocket connections. Static analysis and source code review miss these runtime behaviors entirely.

Existing approaches to API discovery have limitations:

Checking known paths (/swagger.json, /openapi.yaml) only finds APIs that are explicitly documented
Static analysis cannot observe requests that are constructed dynamically at runtime
Manual proxy capture is time-consuming and produces raw traffic without structured specifications

Vespasian takes a different approach: it observes actual network traffic at the wire level, then uses classification heuristics and active probing to produce structured API specifications automatically. Coverage is bounded by the capture: Vespasian discovers only the endpoints present in the captured traffic, and classification is heuristic rather than exact. Within those limits, it maps the API surface that an application actually exposes during use.

Key Features

Feature	Description
REST API Discovery	Classifies REST endpoints via content-type, path patterns, and response structure; outputs OpenAPI 3.0
GraphQL API Discovery	Detects GraphQL endpoints, runs tiered introspection queries, and generates GraphQL SDL schemas
WSDL/SOAP Discovery	Identifies SOAP services via SOAPAction headers and envelope detection; fetches and parses WSDL documents
API Type Auto-Detection	Automatically determines API type (REST, GraphQL, WSDL) from captured traffic without manual selection
Headless Browser Crawling	Drives a headless Chrome browser with full JavaScript execution for SPA support, powered by Katana
Traffic Import	Import existing captures from Burp Suite XML, HAR 1.2 files, and mitmproxy dumps
Active Probing	OPTIONS discovery, JSON schema inference, WSDL document fetching, and GraphQL introspection
Path Normalization	`/users/42` and `/users/87` become `/users/{id}` with known literal preservation (`/me`, `/self`)
SSRF Protection	Blocks crawling and probing of private and loopback addresses by default. Pass `--dangerous-allow-private` to test internal targets (localhost, 127.0.0.1, RFC1918, link-local); the flag is required when the seed URL is itself a private host.
Proxy Support	Route headless browser traffic through Burp Suite or other intercepting proxies
Two-Stage Pipeline	Capture once, generate many: separate capture and generation steps for maximum flexibility
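
The Path Normalization row above can be illustrated with a minimal Go sketch. It is not Vespasian's implementation: the digits-only rule, the `normalizePath` name, and the fixed `knownLiterals` set are assumptions made for illustration; only the `/users/{id}` result and the `/me` and `/self` literals come from the table.

```go
package main

import (
	"fmt"
	"strings"
)

// knownLiterals are segments kept verbatim. The README names /me and /self;
// treating them as a fixed set is an assumption of this sketch. With the
// digits-only rule below the check is redundant, but it matters as soon as
// the placeholder rule becomes broader than "all digits".
var knownLiterals = map[string]bool{"me": true, "self": true}

// isNumeric reports whether s is non-empty and consists only of ASCII digits.
func isNumeric(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// normalizePath replaces purely numeric path segments with {id}.
func normalizePath(p string) string {
	segs := strings.Split(p, "/")
	for i, s := range segs {
		if knownLiterals[s] {
			continue
		}
		if isNumeric(s) {
			segs[i] = "{id}"
		}
	}
	return strings.Join(segs, "/")
}

func main() {
	for _, p := range []string{"/users/42", "/users/87", "/users/me"} {
		fmt.Println(p, "->", normalizePath(p))
	}
}
```

Segments such as `v1` are left alone because they are not purely numeric.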

How It Works

Vespasian uses a two-stage pipeline that separates traffic capture from specification generation:

flowchart LR
    subgraph Capture
        A["Headless Browser Crawler<br/>JS execution, auth injection"] --> C["capture.json<br/>ObservedRequest array"]
        B["Traffic Importers<br/>Burp Suite XML, HAR, mitmproxy"] --> C
    end
    subgraph Generate
        C --> D["Classifier<br/>REST, GraphQL, WSDL"]
        D --> E["Prober<br/>OPTIONS, schema, WSDL, introspection"]
        E --> F["Spec Generator<br/>OpenAPI 3.0, GraphQL SDL, WSDL"]
    end

Why two stages:

Capture once, generate many. Run different generators against the same capture without re-scanning.
Debuggable. The capture file is inspectable JSON, isolating capture bugs from generation bugs.
Composable. Import traffic from any source (browser crawls, proxy captures, mobile testing).
Offline analysis. Generate specifications without network access, useful during limited engagement windows.
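
Because the capture file is plain JSON, it can be inspected with a few lines of Go. The sketch below is only an illustration: the `observedRequest` struct and its `method` and `url` field names are hypothetical, since the actual ObservedRequest schema is not described in this README.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// observedRequest is a hypothetical, reduced record shape. The real
// ObservedRequest fields in capture.json may differ.
type observedRequest struct {
	Method string `json:"method"`
	URL    string `json:"url"`
}

// countByMethod decodes a JSON array of records and tallies them per
// upper-cased HTTP method.
func countByMethod(data string) (map[string]int, error) {
	var reqs []observedRequest
	if err := json.NewDecoder(strings.NewReader(data)).Decode(&reqs); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, r := range reqs {
		counts[strings.ToUpper(r.Method)]++
	}
	return counts, nil
}

func main() {
	sample := `[{"method":"GET","url":"https://app.example.com/api/users"},
	            {"method":"post","url":"https://app.example.com/graphql"}]`
	counts, err := countByMethod(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(counts)
}
```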

How to Install Vespasian

Install from Source (Go)

go install github.com/praetorian-inc/vespasian/cmd/vespasian@latest

Download Pre-Built Binary

Download the latest binary for your platform from the Releases page.

Build from Source

git clone https://github.com/praetorian-inc/vespasian.git
cd vespasian
make build

How to Discover APIs with Vespasian

Quick Start: Scan a Web Application

# Crawl and generate an API spec in one step (auto-detects API type)
vespasian scan https://app.example.com -o api.yaml

# With authentication
vespasian scan https://app.example.com -H "Authorization: Bearer <token>" -o api.yaml

# Specify the API type explicitly
vespasian scan https://app.example.com --api-type graphql -o schema.graphql

Two-Stage Workflow

# Stage 1: Capture traffic via headless browser
vespasian crawl https://app.example.com -o capture.json

# Stage 1 (alternative): Import traffic from Burp Suite
vespasian import burp traffic.xml -o capture.json

# Stage 1 (alternative): Import traffic from HAR archive
vespasian import har recording.har -o capture.json

# Stage 1 (alternative): Import traffic from mitmproxy
vespasian import mitmproxy flows -o capture.json

# Stage 2: Generate OpenAPI spec for REST
vespasian generate rest capture.json -o api.yaml

# Stage 2: Generate GraphQL SDL schema
vespasian generate graphql capture.json -o schema.graphql

# Stage 2: Generate WSDL from SOAP traffic
vespasian generate wsdl capture.json -o service.wsdl

Common Options

# Route crawl traffic through Burp Suite
vespasian scan https://app.example.com --proxy http://127.0.0.1:8080 -o api.yaml

# Scan a local/private target (bypasses SSRF protection)
vespasian scan http://localhost:3000 --dangerous-allow-private -o api.yaml

# Verbose output to see discovered requests in real-time
vespasian scan https://app.example.com -v -o api.yaml

# Suppress the startup banner
vespasian --no-banner scan https://app.example.com -o api.yaml
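
The address ranges that the default SSRF protection is described as blocking (localhost, loopback, RFC1918, link-local) can be checked in Go with the standard `net/netip` package. This is a sketch under stated assumptions, not Vespasian's own guard: the `isBlockedHost` name is hypothetical, and hostnames other than `localhost` are not resolved here.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// isBlockedHost reports whether host is "localhost" or an IP literal in a
// loopback, private (RFC 1918 / fc00::/7), or link-local unicast range.
// Other hostnames return false; a real guard would resolve them and check
// every resulting address.
func isBlockedHost(host string) bool {
	if strings.EqualFold(host, "localhost") {
		return true
	}
	addr, err := netip.ParseAddr(host)
	if err != nil {
		return false
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
	return addr.IsLoopback() || addr.IsPrivate() || addr.IsLinkLocalUnicast()
}

func main() {
	for _, h := range []string{"localhost", "127.0.0.1", "10.0.0.5", "169.254.169.254", "93.184.216.34"} {
		fmt.Printf("%-16s blocked=%t\n", h, isBlockedHost(h))
	}
}
```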

Use Cases

Penetration Testing without API Documentation

During authorized security assessments, clients often cannot provide API documentation. Vespasian crawls the target application with a headless browser, captures the API calls the frontend makes during the crawl, and produces specifications that describe the discovered endpoints, parameters, and response schemas.

Generating API Specs from Existing Proxy Captures

Pentesters already capture traffic in Burp Suite and mitmproxy during manual testing. Rather than re-crawling, Vespasian can import that traffic and generate specifications from work already done. This is especially useful for mobile application testing, where no browser crawl can observe the API calls.
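
To give a sense of what importing a HAR file involves, the following Go sketch reads the request method, URL, response status, and MIME type from each entry. The field paths follow the HAR 1.2 layout (`log.entries[].request` and `log.entries[].response`); the `harFile` type and the `summarize` function are hypothetical names, and this is not the code behind `vespasian import har`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// harFile mirrors only the subset of HAR 1.2 used in this sketch.
type harFile struct {
	Log struct {
		Entries []struct {
			Request struct {
				Method string `json:"method"`
				URL    string `json:"url"`
			} `json:"request"`
			Response struct {
				Status  int `json:"status"`
				Content struct {
					MimeType string `json:"mimeType"`
				} `json:"content"`
			} `json:"response"`
		} `json:"entries"`
	} `json:"log"`
}

// summarize returns one "METHOD URL STATUS MIME" line per HAR entry.
func summarize(data []byte) ([]string, error) {
	var h harFile
	if err := json.Unmarshal(data, &h); err != nil {
		return nil, err
	}
	lines := make([]string, 0, len(h.Log.Entries))
	for _, e := range h.Log.Entries {
		lines = append(lines, fmt.Sprintf("%s %s %d %s",
			e.Request.Method, e.Request.URL, e.Response.Status, e.Response.Content.MimeType))
	}
	return lines, nil
}

func main() {
	sample := []byte(`{"log":{"version":"1.2","entries":[
	  {"request":{"method":"GET","url":"https://app.example.com/api/users"},
	   "response":{"status":200,"content":{"mimeType":"application/json"}}}]}}`)
	lines, err := summarize(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```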

Mapping API Attack Surface for Web Applications

For attack surface management, Vespasian identifies which API endpoints a web application exposes by executing its JavaScript and intercepting all outbound requests. The resulting specification can feed into further security testing tools that accept OpenAPI, GraphQL SDL, or WSDL input.

Feeding into Hadrian for Authorization Testing

Generate an API specification with Vespasian, then pass it directly to Hadrian for automated OWASP API Security Top 10 authorization testing. This creates a complete discover-then-test workflow.

API Type Support

Vespasian classifies and generates specifications for three API types:

API Type	Classification Signals	Output Format	Probing
REST	JSON/XML content-type, `/api/` `/v1/` path patterns, HTTP methods	OpenAPI 3.0 (YAML/JSON)	OPTIONS discovery, JSON schema inference
GraphQL	`/graphql` path, query structure in POST body, `data`/`errors` response keys	GraphQL SDL	Tiered introspection queries (3 tiers for WAF bypass)
WSDL/SOAP	SOAPAction header, SOAP envelope in body, `?wsdl` URL parameter	WSDL XML	Active `?wsdl` document fetching
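
The three WSDL/SOAP signals in the table (SOAPAction header, SOAP envelope in the body, `?wsdl` URL parameter) can be sketched in Go as follows. The `looksLikeSOAP` function is hypothetical and does not reflect how Vespasian weights these signals; the two namespace URIs are the standard SOAP 1.1 and SOAP 1.2 envelope namespaces.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Standard envelope namespaces for SOAP 1.1 and SOAP 1.2.
const (
	soap11NS = "http://schemas.xmlsoap.org/soap/envelope/"
	soap12NS = "http://www.w3.org/2003/05/soap-envelope"
)

// looksLikeSOAP returns true if any of the three signals is present.
// soapAction is the raw SOAPAction header value; an empty string is
// treated here as "header absent", which is a simplification.
func looksLikeSOAP(rawURL, soapAction, body string) bool {
	if soapAction != "" {
		return true
	}
	if u, err := url.Parse(rawURL); err == nil {
		for key := range u.Query() {
			if strings.EqualFold(key, "wsdl") {
				return true
			}
		}
	}
	return strings.Contains(body, soap11NS) || strings.Contains(body, soap12NS)
}

func main() {
	fmt.Println(looksLikeSOAP("https://app.example.com/svc?wsdl", "", ""))
	fmt.Println(looksLikeSOAP("https://app.example.com/api/users", "", `{"id":1}`))
}
```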

REST Classification Heuristics

Content-type: responses with application/json or application/xml
Static asset exclusion: drops .js, .css, .png, .woff, /static/, /assets/
Path heuristics: /api/, /v1/, /v2/, /v3/, /rest/, /rpc/ paths boost confidence
HTTP method: POST/PUT/PATCH/DELETE to non-page URLs
Response structure: JSON object or array bodies (not HTML)
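
Two of the heuristics above, static asset exclusion and the content-type check, can be sketched in Go as shown below. The function names are hypothetical, the result is a plain boolean rather than a confidence score (the README does not state REST weights), and the path, method, and response-structure signals are left out for brevity.

```go
package main

import (
	"fmt"
	"mime"
	"path"
	"strings"
)

// Extensions and directories listed in the README as static assets.
var staticExts = map[string]bool{".js": true, ".css": true, ".png": true, ".woff": true}
var staticDirs = []string{"/static/", "/assets/"}

// isStaticAsset reports whether the URL path ends in a listed extension
// or contains a listed directory.
func isStaticAsset(urlPath string) bool {
	if staticExts[strings.ToLower(path.Ext(urlPath))] {
		return true
	}
	for _, d := range staticDirs {
		if strings.Contains(urlPath, d) {
			return true
		}
	}
	return false
}

// isAPIContentType accepts application/json and application/xml,
// ignoring parameters such as charset.
func isAPIContentType(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return mediaType == "application/json" || mediaType == "application/xml"
}

// looksLikeREST combines the two checks.
func looksLikeREST(urlPath, contentType string) bool {
	return !isStaticAsset(urlPath) && isAPIContentType(contentType)
}

func main() {
	fmt.Println(looksLikeREST("/api/v1/users", "application/json; charset=utf-8"))
	fmt.Println(looksLikeREST("/static/config.json", "application/json"))
}
```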

GraphQL Classification Heuristics

Path matching: /graphql path (0.70 confidence)
Query structure: GraphQL query syntax in POST body (0.85 confidence)
Response structure: data/errors keys in response (0.80 confidence)
Combined signals: path + body together (0.95 confidence)
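
The confidence values above can be expressed as a small Go function. The four numbers come from the list; how Vespasian scores combinations that the list does not mention is not stated, so taking the strongest single signal in those cases is an assumption of this sketch, as is the `graphQLConfidence` name.

```go
package main

import "fmt"

// graphQLConfidence maps three boolean signals to a confidence score.
// path + body yields 0.95 as listed; any other combination falls back to
// the strongest individual signal, which is an assumption.
func graphQLConfidence(pathMatch, queryInBody, responseKeys bool) float64 {
	switch {
	case pathMatch && queryInBody:
		return 0.95
	case queryInBody:
		return 0.85
	case responseKeys:
		return 0.80
	case pathMatch:
		return 0.70
	default:
		return 0
	}
}

func main() {
	const threshold = 0.5 // default value of the --confidence flag
	c := graphQLConfidence(true, false, false)
	fmt.Printf("confidence=%.2f keep=%t\n", c, c >= threshold)
}
```

With the default `--confidence` threshold of 0.5, any single signal is enough for a request to be kept in this sketch.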

GraphQL Introspection

Vespasian uses a tiered introspection strategy to handle WAF-protected GraphQL servers:

Tier 1: Full introspection with descriptions, deprecation, and directives
Tier 2: Minimal-complete query without descriptions, deprecation info, or directives
Tier 3: Minimal last-resort query with the smallest payload
Fallback: Traffic-based inference from observed queries and mutations when introspection is disabled
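
The tiered strategy amounts to trying each option in order and stopping at the first success. The Go sketch below shows that control flow only: the `tier` type and the `firstSuccessful`, `failing`, and `succeeding` helpers are hypothetical, and the stub functions stand in for real introspection requests and traffic inference.

```go
package main

import (
	"errors"
	"fmt"
)

// tier pairs a label with a function that attempts to obtain a schema.
type tier struct {
	name string
	run  func() (string, error)
}

// firstSuccessful tries tiers in order and returns the name and schema of
// the first one that succeeds, or an error wrapping the last failure.
func firstSuccessful(tiers []tier) (string, string, error) {
	lastErr := errors.New("no tiers configured")
	for _, t := range tiers {
		schema, err := t.run()
		if err == nil {
			return t.name, schema, nil
		}
		lastErr = fmt.Errorf("%s: %w", t.name, err)
	}
	return "", "", fmt.Errorf("introspection failed: %w", lastErr)
}

// failing and succeeding build stub tiers for the demonstration.
func failing(msg string) func() (string, error) {
	return func() (string, error) { return "", errors.New(msg) }
}

func succeeding(schema string) func() (string, error) {
	return func() (string, error) { return schema, nil }
}

func main() {
	tiers := []tier{
		{"tier 1 (full)", failing("blocked by WAF")},
		{"tier 2 (minimal-complete)", failing("blocked by WAF")},
		{"tier 3 (smallest payload)", succeeding("type Query { me: User }")},
		{"fallback (traffic inference)", succeeding("type Query { inferred: String }")},
	}
	name, schema, err := firstSuccessful(tiers)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(name, "->", schema)
}
```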

CLI Reference

`vespasian scan`

Convenience command that crawls a target and generates a specification in one step.

vespasian scan <url> [flags]
  --api-type         API type: auto, rest, graphql, wsdl (default: auto)
  -H, --header       Auth headers to inject (repeatable)
  -o, --output       Output spec file (default: stdout)
  --depth            Max crawl depth (default: 3)
  --max-pages        Max pages to visit (default: 100)
  --timeout          Maximum duration for the entire scan (default: 10m)
  --scope            same-origin or same-domain (default: same-origin)
  --headless         Browser mode (default: true)
  --proxy            Proxy URL for headless browser (e.g., http://127.0.0.1:8080)
  --confidence       Min classification confidence (default: 0.5)
  --probe            Enable active probing (default: true)
  --deduplicate      Deduplicate endpoints before probing (default: true)
  --dangerous-allow-private  Disable SSRF protection for crawling and probes,
                     allowing private/localhost targets (localhost, 127.0.0.1,
                     RFC1918, link-local). Required when the seed URL is a
                     private host, otherwise the crawl exits with an error and
                     captures nothing. WARNING: Do not use on production
                     systems.
  --no-request-id    Disable auto X-Vespasian-Request-Id header
  -v, --verbose      Show requests in real-time

`vespasian crawl`

Captures HTTP traffic by driving a headless browser through the target application.

vespasian crawl <url> [flags]
  -H, --header       Auth headers to inject (repeatable)
  -o, --output       Capture output file (default: stdout)
  --depth            Max crawl depth (default: 3)
  --max-pages        Max pages to visit (default: 100)
  --timeout          Maximum duration for the entire crawl (default: 10m)
  --scope            same-origin or same-domain (default: same-origin)
  --headless         Browser mode (default: true)
  --proxy            Proxy URL for headless browser (e.g., http://127.0.0.1:8080)
  --dangerous-allow-private  Disable SSRF protection for crawling, allowing
                     private/localhost targets (localhost, 127.0.0.1, RFC1918,
                     link-local). Required when the seed URL is a private
                     host, otherwise the crawl exits with an error and
                     captures nothing. WARNING: Do not use on production
                     systems.
  --no-request-id    Disable auto X-Vespasian-Request-Id header
  -v, --verbose      Show requests in real-time

`vespasian import`

Converts traffic captures from external tools and formats into the Vespasian capture format.

vespasian import <format> <file> [flags]
  Formats: burp, har, mitmproxy
  -o, --output       Capture output file (default: stdout)
  -v, --verbose      Show imported requests

`vespasian generate`

Produces an API specification from a capture file.

vespasian generate <api-type> <capture-file> [flags]
  API types: rest, graphql, wsdl
  -o, --output       Output file (default: stdout)
  --confidence       Min classification confidence (default: 0.5)
  --probe            Enable active probing (default: true)
  --deduplicate      Deduplicate endpoints before probing (default: true)
  --dangerous-allow-private  Disable SSRF protection on the probe path
                     (OPTIONS/schema/WSDL-fetch/GraphQL introspection) for
                     private/localhost targets. WARNING: Do not use on
                     production systems.
  -v, --verbose      Show discovered endpoints

Architecture

Pipeline Components

Component	Purpose	Supported Types
Crawler	Drives a headless browser to capture HTTP traffic, powered by Katana	Protocol-agnostic
Importers	Convert Burp Suite XML, HAR, and mitmproxy traffic to capture format	All three formats
Classifier	Separates API calls from static assets using heuristics	REST, GraphQL, WSDL
Prober	Enriches endpoints via active requests	OPTIONS, JSON schema, WSDL fetch, GraphQL introspection
Generator	Produces specification files from classified and probed traffic	OpenAPI 3.0, GraphQL SDL, WSDL
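
As an illustration of the OPTIONS probing listed for the Prober, the Go sketch below sends an OPTIONS request and reads the methods advertised in the `Allow` response header. The `probeOptions` and `parseAllow` names are hypothetical, and whether Vespasian inspects `Allow` or other headers is not stated in this README.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// parseAllow splits an Allow header value into upper-cased method names,
// skipping empty items.
func parseAllow(value string) []string {
	var methods []string
	for _, part := range strings.Split(value, ",") {
		if m := strings.ToUpper(strings.TrimSpace(part)); m != "" {
			methods = append(methods, m)
		}
	}
	return methods
}

// probeOptions sends an OPTIONS request to target and returns the methods
// listed in the Allow response header, if any.
func probeOptions(client *http.Client, target string) ([]string, error) {
	req, err := http.NewRequest(http.MethodOptions, target, nil)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return parseAllow(resp.Header.Get("Allow")), nil
}

func main() {
	// Parsing only; call probeOptions(http.DefaultClient, url) against a
	// target you are authorized to test.
	fmt.Println(parseAllow("GET, POST, OPTIONS"))
}
```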

Package Layout

cmd/vespasian/          CLI entry point
pkg/crawl/              Headless browser crawler + capture format
pkg/importer/           Traffic importers (Burp, HAR, mitmproxy)
pkg/classify/           API classification (REST, GraphQL, WSDL)
pkg/probe/              Endpoint probing (OPTIONS, schema, WSDL, GraphQL introspection)
pkg/generate/
  ├── rest/             OpenAPI 3.0 generation, path normalization, schema inference
  ├── graphql/          GraphQL SDL generation, introspection, traffic inference
  └── wsdl/             WSDL generation, SOAP operation extraction
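
The schema inference mentioned under `pkg/generate/rest/` can be pictured as mapping observed JSON values to schema types. The following Go sketch does this for the top-level fields of one JSON object; the `inferType` and `inferProperties` names are hypothetical, and the real package may merge multiple samples, recurse into nested values, or choose types differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// inferType maps a value decoded by encoding/json to a JSON Schema type
// name. Numbers without a fractional part are reported as "integer".
func inferType(v any) string {
	switch x := v.(type) {
	case nil:
		return "null"
	case bool:
		return "boolean"
	case float64:
		if x == math.Trunc(x) {
			return "integer"
		}
		return "number"
	case string:
		return "string"
	case []any:
		return "array"
	case map[string]any:
		return "object"
	}
	return "unknown"
}

// inferProperties returns the inferred type of each top-level field of a
// JSON object.
func inferProperties(body string) (map[string]string, error) {
	var obj map[string]any
	if err := json.Unmarshal([]byte(body), &obj); err != nil {
		return nil, err
	}
	props := make(map[string]string, len(obj))
	for key, value := range obj {
		props[key] = inferType(value)
	}
	return props, nil
}

func main() {
	props, err := inferProperties(`{"id":42,"name":"alice","tags":["admin"],"active":true}`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(props)
}
```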

Frequently Asked Questions

What types of APIs can Vespasian discover?

Vespasian discovers REST APIs (generating OpenAPI 3.0 specs), GraphQL APIs (generating SDL schemas via introspection or traffic inference), and SOAP/WSDL services (generating WSDL documents). It automatically detects the API type from captured traffic, or you can specify it explicitly with --api-type.

How is Vespasian different from running a web crawler?

Standard web crawlers follow HTML links and index pages. Vespasian intercepts all HTTP traffic from a headless browser, including XHR/fetch API calls, WebSocket upgrades, and dynamically constructed requests that don't appear in HTML. It then classifies those requests by API type and generates structured specifications, not just URL lists.

Does Vespasian find undocumented APIs?

Vespasian discovers any API endpoint that the application calls during the crawl. If the frontend calls /api/internal/debug at runtime, Vespasian will capture and document it, even if it doesn't appear in any published API documentation.

Can I use Vespasian with traffic I've already captured?

Yes. If you've already captured traffic using Burp Suite, browser dev tools (HAR), or mitmproxy, use vespasian import to convert it to the capture format, then vespasian generate to produce specifications. No re-crawling needed.

Does Vespasian handle GraphQL servers that disable introspection?

Yes. Vespasian uses a tiered introspection strategy. If the full introspection query is blocked, it tries progressively simpler queries. If all introspection is disabled, it falls back to inferring the schema from observed queries and mutations in the captured traffic.

Is it safe to run against production?

Vespasian's crawl stage drives a browser and follows links, which is normally read-only, although a followed link can still change server-side state if the application performs actions on GET requests. The probing stage sends OPTIONS requests, fetches ?wsdl documents, and runs GraphQL introspection queries, all of which are read-only operations. Even so, always coordinate with the target owner and prefer staging environments during security assessments.

Development

Prerequisites

Build and Test

git clone https://github.com/praetorian-inc/vespasian.git
cd vespasian
make build       # Build the binary to bin/vespasian
make test        # Run tests with race detection
make lint        # Run golangci-lint (gocritic, misspell, revive)
make check       # Run all checks (fmt, vet, lint, test)

make coverage    # Generate coverage report
make deps        # Download and tidy modules
make clean       # Remove build artifacts

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/my-feature)
Commit your changes (git commit -am 'Add my feature')
Push to the branch (git push origin feature/my-feature)
Open a Pull Request

Please ensure all CI checks pass before requesting review.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

About Praetorian

Praetorian is a cybersecurity company that helps organizations secure their most critical assets through offensive security services and the Praetorian Guard attack surface management platform.
