A Claude Code skill that converts any content into stunning visual explanations — whiteboard sketches, professional infographics, presentation slides, technical diagrams, mind maps, and UI wireframe mockups — powered by OpenAI's gpt-image-1.5 or Google Gemini's Nano Banana 2.
AI-generated visual explanations have exploded in popularity — tools like NotebookLM and Gemini can turn documents into polished infographics and whiteboard sketches. But these tools are closed ecosystems. You can't customize the output style, integrate them into your dev workflow, or control the prompts that drive the generation.
Visual Explainer brings this capability directly into Claude Code as a slash command. It takes any content (a topic, a document, meeting notes, a codebase) and transforms it into a rich visual explanation using OpenAI's gpt-image-1.5 or Google Gemini's Nano Banana 2.
The core insight is that image generation quality depends almost entirely on prompt quality. Visual Explainer uses deeply structured, 400-800 word prompts with explicit spatial layout, icon descriptions, color palettes, typography, and connections — producing results that rival or exceed what dedicated visual AI tools generate.
- Style Spectrum: From rough whiteboard sketches to polished infographics, with a `--draw-level` parameter to control exactly where on the hand-drawn-to-professional spectrum the output lands
- Deep Content Analysis: Every generation starts with structured extraction of concepts, relationships, visual metaphors, and layout strategy before any prompt is written
- Prompt Engineering as the Product: The skill's value is in its style-specific prompt templates, not just API wrappers. Each style (whiteboard, infographic, presentation, diagram, mindmap, mindmap-structured, mockup) has a comprehensive template tuned for that visual language
- Composable with Documents: Works naturally with Claude Code's ability to read files, so you can point it at any existing doc, spec, or codebase and generate visuals from it
Hand-drawn, colorful, educator-style — like walking into a classroom with an amazing whiteboard illustration.
Clean, structured, publication-quality — numbered sections, flat-design icons, cohesive color palettes.
Rougher hand-drawn feel with --draw-level sketch — casual, playful, like a developer sketching during standup.
Precise, technical, well-labeled architecture diagram with --complexity detailed — layered layout with color-coded legend.
Progressive build-up with --mode multi-frame — 3 frames that introduce actors, show the flow, then present the complete picture.
Bold, minimal, conference-keynote quality — dark background with strong visual hierarchy and layered architecture.
Vibrant, colorful, radial mind map — organic branches, bold colors, visual icons for each concept.
Clean, data-oriented, XMind-style — muted colors, category tags, metadata badges, professional layout.
Convert a Mermaid flowchart into a polished infographic with --from mermaid. All nodes, edges, and labels are extracted and transformed.
Source Mermaid
flowchart TD
A[User Request] --> B{Authentication}
B -->|Valid Token| C[API Gateway]
B -->|Invalid| D[401 Unauthorized]
C --> E{Rate Limit Check}
E -->|Under Limit| F[Route to Service]
E -->|Over Limit| G[429 Too Many Requests]
F --> H[User Service]
F --> I[Order Service]
F --> J[Payment Service]
H --> K[(Users DB)]
I --> L[(Orders DB)]
J --> M[(Payments DB)]
H --> N[Response Builder]
I --> N
J --> N
N --> O[JSON Response]
O --> P[Client]
Convert a Mermaid sequence diagram into a vibrant whiteboard sketch with --from mermaid. Actors become illustrated characters, messages become hand-drawn arrows.
Source Mermaid
sequenceDiagram
participant U as User
participant B as Browser
participant S as Server
participant DB as Database
participant C as Cache
U->>B: Fill login form
B->>S: POST /api/login {email, password}
S->>DB: SELECT user WHERE email=?
DB-->>S: User record
S->>S: Verify bcrypt hash
alt Password valid
S->>S: Generate JWT token
S->>C: Store session {userId, token}
C-->>S: OK
S-->>B: 200 {token, user}
B->>B: Store token in localStorage
B-->>U: Redirect to dashboard
else Password invalid
S-->>B: 401 Invalid credentials
B-->>U: Show error message
end
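As a rough illustration of what extracting actors from a sequence diagram like the one above involves, here is a minimal shell sketch. It is not the skill's own parser: the function name `list_participants` is hypothetical, and it only handles lines of the form `participant ID as Label`.

```shell
# Hypothetical helper: read Mermaid text on stdin and print "ID=Label"
# for every "participant ID as Label" line.
list_participants() {
  awk '$1 == "participant" && $3 == "as" {
    id = $2
    $1 = $2 = $3 = ""   # blank out the keyword, the id, and "as"
    sub(/^ +/, "")      # trim the spaces left behind by the blanked fields
    print id "=" $0
  }'
}

# Example input: a tiny diagram with two participants.
printf '%s\n' \
  'sequenceDiagram' \
  'participant U as User' \
  'participant S as Server' \
  'U->>S: Fill login form' | list_participants
```

Each printed pair could then be fed into a prompt as a named actor. Messages, `alt` blocks, and other diagram types would need their own handling.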
Same topic, same style, same prompt — rendered by both backends for comparison.
| | OpenAI (gpt-image-1.5) | Gemini (Nano Banana 2) |
|---|---|---|
| Dimensions | 1536x1024 (exact) | 1024x1024 (ignores size request) |
| Text clarity | Clean, all legible | Clean, all legible |
| Style fidelity | Polished whiteboard texture, subtle details | Bolder colors, stronger section borders |
| Size control | Honors exact dimensions | Always produces square output |
| Cost | ~$0.29/image | Free tier available |
Both backends produce quality results from the same prompt. OpenAI gives more control over dimensions and a more refined aesthetic. Gemini is solid and has a free tier but doesn't respect size parameters.
Polished, Figma-quality UI wireframe with --style mockup --device desktop — browser chrome, sidebar navigation, stats cards, charts, and data table.
The mockup style supports three device frames (--device mobile|desktop|tablet) and three fidelity levels via --draw-level:
- sketch — hand-drawn wireframe, great for brainstorming and design sprints
- normal — mid-fidelity, clean enough to share with stakeholders
- polished — Figma/Sketch-quality, pixel-perfect precision for design reviews
Use cases: rapid wireframing from PRDs, brainstorming UI layouts, visualizing modernized interfaces for existing code, stakeholder alignment before opening Figma.
Install Claude Code if you haven't already:
npm install -g @anthropic-ai/claude-code

You need at least one of the following. If both are set, OpenAI is used by default (override with --backend gemini).

- Go to platform.openai.com/api-keys
- Create a new secret key and copy it

export OPENAI_API_KEY="sk-..."

- Go to aistudio.google.com/apikey
- Create an API key and copy it

export GEMINI_API_KEY="AIza..."

Add your key(s) to your shell profile:
# For zsh (~/.zshrc)
echo 'export OPENAI_API_KEY="sk-..."' >> ~/.zshrc
echo 'export GEMINI_API_KEY="AIza..."' >> ~/.zshrc
source ~/.zshrc
# For bash (~/.bashrc or ~/.bash_profile)
echo 'export OPENAI_API_KEY="sk-..."' >> ~/.bashrc
echo 'export GEMINI_API_KEY="AIza..."' >> ~/.bashrc
source ~/.bashrc

The skill uses jq to parse JSON responses from the API:
# macOS
brew install jq
# Ubuntu/Debian
sudo apt-get install jq

This skill was primarily developed and tested with Claude Code, but it should work with any Skills-compatible agent or CLI tool that supports markdown skill definitions, including:
- Claude Code (primary target)
- OpenClaw (tested)
- Any agent that reads `.md` skill files with YAML frontmatter
The skill is a self-contained markdown file with structured instructions. Any agent that can parse the frontmatter, read the step-by-step instructions, and execute shell commands (curl, jq, base64) can run it.
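The tool requirements named above can be checked before running the skill. The following is a minimal sketch and not the project's actual `make check` logic; the function name `missing_tools` is hypothetical.

```shell
# Hypothetical helper: print each required command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    # command -v exits non-zero when the command cannot be found.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the three tools the skill shells out to.
missing=$(missing_tools curl jq base64 | tr '\n' ' ')
if [ -n "$missing" ]; then
  echo "Missing tools: $missing"
else
  echo "All required tools found"
fi
```

Running this on a new machine shows whether `jq` still needs to be installed before the first generation.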
git clone <repo-url> && cd visual-explainer-skill
make install

Or manually:

cp skill/visual-explainer.md ~/.claude/commands/visual-explainer.md

The skill will be available immediately as /visual-explainer in any Claude Code session.

make openclaw-install

Or manually:

mkdir -p ~/clawd/skills/visual-explainer
cp skill/visual-explainer.md ~/clawd/skills/visual-explainer/SKILL.md

| Target | Description |
|---|---|
| Claude Code | |
| `make install` | Install to ~/.claude/commands/ |
| `make uninstall` | Remove from ~/.claude/commands/ |
| OpenClaw | |
| `make openclaw-install` | Install to ~/clawd/skills/ |
| `make openclaw-uninstall` | Remove from ~/clawd/skills/ |
| `make openclaw-check` | Check install status |
| General | |
| `make info` | Show skill name, version, author, and available styles |
| `make version` | Print the current version |
| `make check` | Verify prerequisites (jq, skill files, OPENAI_API_KEY) |

/visual-explainer [--style S] [--device D] [--draw-level L] [--complexity C] [--size WxH] [--mode M] [--from SRC] [--backend B] [--output DIR] [--prefix NAME] <content>
# Default whiteboard style
/visual-explainer How DNS resolution works
# Professional infographic
/visual-explainer --style infographic The foundations of machine learning
# Rough sketch feel
/visual-explainer --draw-level sketch How Git branching works
# Detailed technical diagram
/visual-explainer --style diagram --complexity detailed Kubernetes pod networking
# Multi-frame progressive build-up
/visual-explainer --mode multi-frame The OAuth2 authorization code flow
# Custom output location
/visual-explainer --output ./docs/images --prefix arch-overview System architecture of a microservices app
# Colorful radial mind map
/visual-explainer --style mindmap The principles of object-oriented programming
# Clean, data-oriented XMind-style mind map
/visual-explainer --style mindmap-structured Project management methodologies
# UI wireframe mockup (mobile, polished by default)
/visual-explainer --style mockup A mobile app login screen with email, password, social login, and forgot password
# Desktop web app wireframe
/visual-explainer --style mockup --device desktop An admin dashboard with sidebar nav, stats cards, charts, and data table
# Hand-drawn wireframe for brainstorming
/visual-explainer --style mockup --draw-level sketch A settings page with profile photo, name fields, toggles, and save button
# Use Gemini instead of OpenAI
/visual-explainer --backend gemini How the water cycle works

Any Mermaid diagram can be transformed into any visual style. The skill parses nodes, edges, subgraphs, and labels to build a detailed visual prompt.

# Inline Mermaid — paste or type the diagram as the content
/visual-explainer --style infographic --from mermaid flowchart TD; A[Start] --> B{Decision}; B -->|Yes| C[Do Thing]; B -->|No| D[Other Thing]
# From a .mmd file
/visual-explainer --style whiteboard --from mermaid-file docs/architecture.mmd
# From a markdown file containing a mermaid code block
/visual-explainer --style presentation --from mermaid-file docs/sequence-diagram.md
# Auto-detect — if the content looks like Mermaid, it's parsed automatically
/visual-explainer --style diagram sequenceDiagram; participant A as Client; participant B as Server; A->>B: Request; B-->>A: Response

The skill works great when pointed at existing files. You can ask it to read a document, summarize the key concepts, and generate a visual from it.

Generate directly from a file:
Read docs/architecture.md and then /visual-explainer --style diagram the system architecture described in that document
Summarize first, then visualize:
Read docs/api-spec.md, summarize the key endpoints, request/response flows, and auth
mechanisms, then /visual-explainer --style infographic the summary
Visualize a README or spec:
Review the PRD at docs/product-requirements.md and /visual-explainer --style presentation
a one-slide executive summary of the product vision, key features, and target users
Turn meeting notes into a whiteboard:
Read notes/2024-03-15-retro.md and /visual-explainer --draw-level sketch
a whiteboard summary of the key takeaways, action items, and themes
Compare concepts from a doc:
Read docs/database-comparison.md and /visual-explainer --style infographic --complexity detailed
a comparison of the database options with pros, cons, and recommendations
Multi-frame walkthrough of a complex doc:
Read docs/deployment-guide.md and /visual-explainer --mode multi-frame --style whiteboard
the deployment process as a step-by-step walkthrough
Visualize code architecture:
Review the src/ directory structure and key modules, then /visual-explainer --style diagram
--complexity detailed the codebase architecture showing module dependencies and data flow
| Option | Values | Default | Description |
|---|---|---|---|
| `--style` | whiteboard, infographic, presentation, diagram, mindmap, mindmap-structured, mockup | whiteboard | Visual style |
| `--device` | mobile, desktop, tablet | mobile | Device frame for mockup style |
| `--draw-level` | sketch, normal, polished | normal | Hand-drawn roughness vs clean precision |
| `--complexity` | simple, moderate, detailed | moderate | Number of concepts (3-4, 5-7, or 8-12) |
| `--size` | 1024x1024, 1536x1024, 1024x1536 | Style-dependent | Image dimensions |
| `--mode` | single, multi-frame | single | One image or a progressive series |
| `--from` | mermaid, mermaid-file PATH | (none) | Parse Mermaid input (inline or from a file) |
| `--backend` | openai, gemini | Auto-detected | Image generation backend. Auto-detects based on available API keys. |
| `--output` | Directory path | ./ | Where to save generated images |
| `--prefix` | String | visual-explainer | Filename prefix |

| Style | Default Size | Orientation |
|---|---|---|
| Whiteboard | 1536x1024 | Landscape |
| Infographic | 1024x1536 | Portrait |
| Presentation | 1536x1024 | Landscape |
| Diagram | 1024x1024 | Square |
| Mind Map | 1536x1024 | Landscape |
| Mind Map (Structured) | 1536x1024 | Landscape |
| Mockup (mobile/tablet) | 1024x1536 | Portrait |
| Mockup (desktop) | 1536x1024 | Landscape |
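
The table above amounts to a simple lookup from style (and, for mockups, device) to a default size. A minimal sketch follows, assuming a hypothetical helper named `default_size`. The skill itself is a markdown instruction file, so this only illustrates the mapping.

```shell
# Hypothetical lookup mirroring the default-size table.
# Usage: default_size STYLE [DEVICE]
default_size() {
  style=$1
  device=${2:-mobile}   # mobile is the documented default device
  case "$style" in
    whiteboard|presentation|mindmap|mindmap-structured) echo "1536x1024" ;;
    infographic) echo "1024x1536" ;;
    diagram) echo "1024x1024" ;;
    mockup)
      # Only desktop mockups are landscape; mobile and tablet are portrait.
      if [ "$device" = "desktop" ]; then
        echo "1536x1024"
      else
        echo "1024x1536"
      fi
      ;;
    *) echo "unknown style: $style" >&2; return 1 ;;
  esac
}

default_size infographic      # portrait
default_size mockup desktop   # landscape
```

An explicit `--size` value would take precedence over whatever this lookup returns.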
- Backend detection — Auto-detects available API keys (OpenAI or Gemini) and reports which backend will be used
- Content analysis — The skill deeply analyzes your input to extract core concepts, sub-topics, relationships, visual metaphors, and an optimal layout strategy
- Prompt construction — A detailed 400-800 word prompt is built using style-specific templates that specify exact spatial positions, icons, colors, typography, connections, and decorative elements
- Image generation — The prompt is sent to OpenAI gpt-image-1.5 or Gemini Nano Banana 2
- Structured output — A text summary of sections, relationships, and backend used is provided alongside the image
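
The backend detection step can be sketched in a few lines of shell. This is an illustration of the documented behavior (an explicit `--backend` wins, otherwise OpenAI is preferred when both keys are set), not the skill's own code. The function name `detect_backend` is hypothetical, and the sketch does not verify that the key for an explicitly chosen backend exists.

```shell
# Hypothetical sketch of backend selection.
# Usage: detect_backend [OVERRIDE]   (OVERRIDE stands in for --backend)
detect_backend() {
  override=${1:-}
  case "$override" in
    openai|gemini) echo "$override"; return 0 ;;
    "") ;;   # no override given: fall through to auto-detection
    *) echo "unknown backend: $override" >&2; return 1 ;;
  esac
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "openai"    # OpenAI is the default when both keys are set
  elif [ -n "${GEMINI_API_KEY:-}" ]; then
    echo "gemini"
  else
    echo "no API key found: set OPENAI_API_KEY or GEMINI_API_KEY" >&2
    return 1
  fi
}
```

Calling `detect_backend gemini` corresponds to passing `--backend gemini`; calling it with no argument corresponds to auto-detection.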
| Size | Estimated Cost (OpenAI) |
|---|---|
| 1024x1024 | ~$0.19 |
| 1536x1024 / 1024x1536 | ~$0.29 |
Gemini: free tier available. Check current pricing at aistudio.google.com.
Multi-frame mode generates multiple images (3-5), so costs multiply accordingly.
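
To see how multi-frame costs add up, here is a small estimator built on the approximate per-image prices in the table above. The function name `estimate_cost` is hypothetical, the prices are the README's estimates and not live pricing, and Gemini is not covered.

```shell
# Hypothetical estimator: approximate cost in dollars for N frames.
# Usage: estimate_cost SIZE [FRAMES]
estimate_cost() {
  size=$1
  frames=${2:-1}
  case "$size" in
    1024x1024) cents=19 ;;
    1536x1024|1024x1536) cents=29 ;;
    *) echo "unsupported size: $size" >&2; return 1 ;;
  esac
  total=$((cents * frames))   # integer cents avoid floating-point rounding
  printf '%d.%02d\n' $((total / 100)) $((total % 100))
}

estimate_cost 1536x1024 3   # prints 0.87 (three landscape frames)
estimate_cost 1024x1024 5   # prints 0.95 (five square frames)
```

A three-frame whiteboard walkthrough at the default 1536x1024 size therefore comes to roughly $0.87.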
- Text-heavy content works best with `infographic` style
- Process/flow content works best with `diagram` style
- Engaging/fun explanations work best with `whiteboard` style
- Hierarchical/categorical content works best with `mindmap` (colorful) or `mindmap-structured` (data-oriented)
- Use `mindmap` when the audience values visual appeal and creativity
- Use `mindmap-structured` for board presentations, strategy docs, or data-heavy taxonomies
- UI wireframes and screen layouts work best with `mockup` style; use `--device` to match the target platform
- Use `mockup --draw-level sketch` for early brainstorming, `--draw-level polished` for stakeholder-ready wireframes
- Use `--draw-level sketch` for a casual, brainstormy feel
- Use `--draw-level polished` for clean hand-lettering on whiteboard style
- Use `--complexity detailed` when you need comprehensive coverage
- If results feel too sparse, try increasing complexity; if too cluttered, decrease it
| Version | Date | Description |
|---|---|---|
| 1.3.0 | 2026-04-02 | Mockup/wireframe style with device frames |
| 1.2.0 | 2026-04-02 | Gemini/Nano Banana 2 backend support |
| 1.1.0 | 2026-04-01 | Mermaid diagram conversion support |
| 1.0.0 | 2026-04-01 | Initial release |
1.3.0
- New `mockup` style for generating UI wireframes and screen mockups
- `--device` flag to select device frame: `mobile` (phone), `desktop` (browser window), `tablet` (iPad-style)
- Three fidelity levels via `--draw-level`: sketch (hand-drawn), normal (mid-fi), polished (Figma-quality)
- Comprehensive prompt template with support for navigation, input fields, buttons, cards, tables, charts, and all standard UI components
- Annotation support for wireframe callouts and specifications
- Ideal for rapid wireframing from PRDs, brainstorming UI layouts, and visualizing modernized interfaces

1.2.0
- `--backend` flag to choose between `openai` (gpt-image-1.5) and `gemini` (Nano Banana 2)
- Auto-detection: uses whichever API key is available; defaults to OpenAI if both are set
- Backend reported before generation and in structured output summary
- Gemini API integration via `generativelanguage.googleapis.com`
- Size handling adapted for Gemini (dimensions included in prompt text)
- Updated prerequisites to support either `OPENAI_API_KEY` or `GEMINI_API_KEY`

1.1.0
- `--from mermaid` flag for inline Mermaid input
- `--from mermaid-file PATH` for reading `.mmd` or `.md` files
- Auto-detection of Mermaid syntax in content
- Full parsing of all Mermaid diagram types: flowchart, sequence, class, state, ER, gantt, pie, mindmap, timeline
- Extracts nodes, edges, subgraphs, participants, attributes, and labels for precise prompt construction
- Any Mermaid diagram type can be rendered in any visual style

1.0.0
- 6 visual styles at launch: whiteboard, infographic, presentation, diagram, mindmap, mindmap-structured
- `--draw-level` parameter (sketch, normal, polished) for hand-drawn vs professional spectrum
- `--complexity` parameter (simple, moderate, detailed) for content density control
- `--mode multi-frame` for progressive build-up explanations
- Deep content analysis pipeline with concept extraction, visual metaphors, and layout strategy
- Style-specific prompt templates (400-800 words) for each visual style
- Integration with OpenAI gpt-image-1.5 via generate-images skill
- YAML frontmatter with official Claude Code skill metadata
- Makefile with install, uninstall, version management, and release targets
- 8 example images across all styles

MIT — see LICENSE for details.














