Skip to content

Latest commit

 

History

History
406 lines (299 loc) · 12.5 KB

File metadata and controls

406 lines (299 loc) · 12.5 KB

Binary Ninja Integration for smartdiff

Overview

smartdiff now includes comprehensive Binary Ninja integration, enabling AI agents to analyze and compare binary executables through the MCP (Model Context Protocol).

Key Features:

  • ✅ Discover binaries loaded in Binary Ninja
  • ✅ List functions in binaries
  • ✅ Decompile functions to high-level C code
  • ✅ Compare two binaries and identify matching functions
  • ✅ Analyze changes between binary versions
  • ✅ Works with Binary Ninja Personal License

Architecture

┌─────────────────────────────────────────────────────────┐
│  AI Agent (Claude Desktop)                              │
└──────────────┬──────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────┐
│  smartdiff MCP Server                                   │
│  - Source code comparison tools                         │
│  - Binary comparison tools (NEW)                        │
└──────────────┬──────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────┐
│  Binary Ninja Client Library                           │
│  - HTTP client (no Binary Ninja dependencies)          │
│  - Server discovery                                     │
│  - Function listing and decompilation                   │
└──────────────┬──────────────────────────────────────────┘
               │ (HTTP)
               ▼
┌─────────────────────────────────────────────────────────┐
│  Binary Ninja MCP Server                                │
│  - Multi-binary support (ports 9009+)                   │
│  - Decompilation API                                    │
│  - Function analysis                                    │
└──────────────┬──────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────┐
│  Binary Ninja (GUI with Personal License)              │
└─────────────────────────────────────────────────────────┘

Components

1. Binary Ninja Client Library

Location: crates/binary-ninja-client/

Purpose: HTTP client for communicating with Binary Ninja MCP servers

Features:

  • No Binary Ninja dependencies (just HTTP/JSON)
  • Server discovery (scans ports 9009-9018)
  • Function listing and searching
  • Decompilation fetching
  • Async/await with tokio

Example:

use smart_diff_binary_ninja_client::BinaryNinjaClient;

let client = BinaryNinjaClient::new();
let servers = client.discover_servers().await?;
let functions = client.list_functions("port_9009").await?;
let code = client.decompile_function("port_9009", "main").await?;

2. Binary Function Matcher

Location: crates/diff-engine/src/binary_matcher.rs

Purpose: Match functions between two binaries using multiple strategies

Matching Strategies:

  1. Exact Name Matching - O(n) HashMap lookup for identical names
  2. Fuzzy Name Matching - Levenshtein distance for renamed functions
  3. Code Similarity - Framework ready for tree-sitter C parser integration

Example:

use smart_diff_engine::{BinaryFunctionMatcher, BinaryFunctionInfo};

let matcher = BinaryFunctionMatcher::new();
let matches = matcher.match_functions(&functions_a, &functions_b)?;

3. Binary Comparison Manager

Location: crates/mcp-server/src/comparison/binary_comparison.rs

Purpose: Store and manage binary comparison results

Features:

  • UUID-based comparison IDs
  • Match statistics and summaries
  • Added/deleted function tracking
  • Sorted match retrieval

4. MCP Tools

Location: crates/mcp-server/src/tools/binary_tools.rs

Purpose: Expose binary analysis capabilities to AI agents

Available Tools:

  1. list_binja_servers - Discover available binaries
  2. list_binary_functions - List functions in a binary
  3. decompile_binary_function - Get decompiled code
  4. compare_binaries - Compare two binaries
  5. list_binary_matches - List matched functions
  6. get_binary_function_diff - Get detailed diff

Quick Start

Prerequisites

  1. Binary Ninja (Personal License or higher)
  2. Binary Ninja MCP Plugin (from /home/matteius/codediff/binary_ninja_mcp/)
  3. smartdiff (this repository)

Setup

1. Install Binary Ninja MCP Plugin

cd /home/matteius/codediff/binary_ninja_mcp
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Copy plugin to Binary Ninja plugins folder
# Restart Binary Ninja

2. Build smartdiff

cd /home/matteius/codediff
cargo build --release -p smart-diff-mcp-server

3. Load Binaries in Binary Ninja

  1. Open Binary Ninja
  2. Load first binary (e.g., malware_v1.exe)
  3. Plugins > MCP Server > Start Server for This Binary (port 9009)
  4. Open another Binary Ninja window
  5. Load second binary (e.g., malware_v2.exe)
  6. Plugins > MCP Server > Start Server for This Binary (port 9010)

4. Configure Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "smartdiff": {
      "command": "/path/to/codediff/target/release/smart-diff-mcp",
      "args": []
    }
  }
}

5. Restart Claude Desktop

Usage Examples

Example 1: Explore a Binary

User: "I have a binary loaded in Binary Ninja. What functions does it have?"

Agent:
1. Calls list_binja_servers() → Finds binary on port_9009
2. Calls list_binary_functions(binary_id="port_9009")
3. Shows list of 150 functions
4. User can ask to decompile specific functions

Example 2: Compare Two Binaries

User: "I have two versions of a binary. What changed between them?"

Agent:
1. Calls list_binja_servers() → Finds binaries on port_9009 and port_9010
2. Calls compare_binaries(binary_a_id="port_9009", binary_b_id="port_9010")
3. Returns comparison ID and summary:
   - 142 matched functions
   - 4 added functions
   - 6 deleted functions
   - 94.5% average similarity
4. Calls list_binary_matches(comparison_id) to see most changed functions
5. Calls get_binary_function_diff(comparison_id, "process_data") for details

Example 3: Analyze Specific Function

User: "Show me the decompiled code for the main function"

Agent:
1. Calls decompile_binary_function(binary_id="port_9009", function_name="main")
2. Returns decompiled C code:
   ```c
   int64_t main(int64_t argc, char** argv)
   {
       initialize();
       process_data();
       cleanup();
       return 0;
   }

## MCP Tool Reference

### list_binja_servers

**Description**: List available Binary Ninja MCP servers

**Input**: None

**Output**: List of servers with binary IDs, filenames, ports

### list_binary_functions

**Description**: List all functions in a binary

**Input**:
- `binary_id` (required): Binary server ID
- `search` (optional): Search term to filter functions

**Output**: List of function names

### decompile_binary_function

**Description**: Decompile a specific function

**Input**:
- `binary_id` (required): Binary server ID
- `function_name` (required): Function name

**Output**: Decompiled C code

### compare_binaries

**Description**: Compare two binaries and identify matching functions

**Input**:
- `binary_a_id` (required): First binary server ID
- `binary_b_id` (required): Second binary server ID
- `use_decompiled_code` (optional): Use code similarity (default: false)
- `similarity_threshold` (optional): Minimum similarity (default: 0.7)

**Output**: Comparison ID and summary statistics

### list_binary_matches

**Description**: List matched functions from a comparison

**Input**:
- `comparison_id` (required): Comparison ID from compare_binaries
- `limit` (optional): Maximum matches to return (default: 100)
- `min_similarity` (optional): Minimum similarity filter

**Output**: List of matches sorted by similarity (most changed first)

### get_binary_function_diff

**Description**: Get detailed diff for a specific function match

**Input**:
- `comparison_id` (required): Comparison ID
- `function_name` (required): Function name

**Output**: Decompiled code from both binaries with similarity metrics

## Implementation Status

### ✅ Phase 1: Binary Ninja MCP Client Library (Complete)
- HTTP client with server discovery
- Function listing and searching
- Decompilation fetching
- No Binary Ninja dependencies
- All unit tests passing

### ✅ Phase 2: Binary Comparison MCP Tools (Complete)
- Binary tool handler
- Integration with MCP server
- Three basic tools (list servers, list functions, decompile)
- All integration tests passing

### ✅ Phase 3: Binary Function Matching Engine (Complete)
- Exact name matching
- Fuzzy name matching (Levenshtein distance)
- Code similarity framework (ready for tree-sitter)
- Comparison storage and management
- Three comparison tools (compare, list matches, get diff)
- All unit tests passing

### 🚧 Phase 4: Testing & Documentation (In Progress)
- End-to-end testing with real binaries
- Performance benchmarking
- User documentation
- API documentation

## Testing

### Unit Tests

Run all tests:
```bash
cargo test

Run specific component tests:

cargo test -p smart-diff-binary-ninja-client
cargo test -p smart-diff-engine binary_matcher
cargo test -p smart-diff-mcp-server

Integration Testing

Requires running Binary Ninja with MCP server:

  1. Load test binaries in Binary Ninja
  2. Start MCP servers
  3. Run smartdiff MCP server
  4. Test with Claude Desktop

Performance

  • Server Discovery: < 1 second (scans 10 ports)
  • Function Listing: < 1 second (typical binary with 100-200 functions)
  • Decompilation: 1-2 seconds per function
  • Binary Comparison: 2-5 seconds (100-200 functions, name matching only)

Known Limitations

  1. Code Similarity: Framework is ready but tree-sitter C parser integration is pending
  2. Parallel Processing: Not yet implemented (can be added for large binaries)
  3. Advanced Metrics: Basic similarity only (no CFG or basic block analysis)

Future Enhancements

  1. Code Similarity Matching

    • Integrate tree-sitter C parser
    • Parse decompiled code as AST
    • Apply tree edit distance algorithms
    • Reuse existing smartdiff infrastructure
  2. Advanced Binary Analysis

    • CFG (Control Flow Graph) similarity
    • Basic block analysis
    • Instruction-level comparison
    • Call graph analysis
  3. Performance Optimization

    • Parallel processing with rayon
    • Caching of decompiled code
    • Incremental comparison
  4. Visualization

    • Function match visualization
    • Diff highlighting
    • Call graph visualization

Documentation

License

AGPL-3.0-only

Contributing

Contributions are welcome! Please see the main smartdiff README for contribution guidelines.

Support

For issues or questions:

  1. Check the documentation
  2. Review existing issues
  3. Create a new issue with details

Acknowledgments

  • Binary Ninja team for the excellent reverse engineering platform
  • MCP (Model Context Protocol) for enabling AI agent integration
  • rust_diff project for inspiration on binary diffing algorithms