WindowsOSMCP

MCP server for Windows desktop automation: screenshots, window management, mouse, and keyboard. Built for Cursor and other MCP clients.

MCP Desktop Vision Server — AI can see your screen

Platform: Windows 10/11 only.

Tools

Screen capture

Tool                     Description
screenshot               Primary monitor, any specific monitor (by index), or a window (by partial title)
screenshot_monitor       Screenshot a specific monitor by index, pre-scaled to logical resolution
screenshot_all_monitors  Screenshot every connected monitor at once
screenshot_region        Capture a sub-rect at full physical resolution; preserves fine detail that full-screen captures lose

All monitor screenshots are automatically downscaled to logical resolution when DPI > 100%, so image pixel coordinates map directly to mouse_click coordinates with no math required. Every response includes an explicit IMAGE->CLICK formula.

screenshot, screenshot_monitor, and screenshot_region all accept a grid=True parameter that overlays gridlines every 100 logical pixels with labeled screen coordinates baked into the image — so click targets can be read directly off the labels regardless of how much the client shrinks the preview.
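The downscaling described above amounts to dividing the physical dimensions by the monitor's scale factor. The following is a minimal sketch of that arithmetic, not the server's actual code; the function name logical_size and the rounding behavior are assumptions made for illustration.

```python
def logical_size(physical_width: int, physical_height: int, scale: float) -> tuple[int, int]:
    """Return the logical size of a monitor given its scale factor (1.5 means 150%).

    Hypothetical helper: the real server may round differently.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return round(physical_width / scale), round(physical_height / scale)


# A 4K panel at 150% scaling has a 2560x1440 logical desktop.
print(logical_size(3840, 2160, 1.5))  # (2560, 1440)
```

Because the image is delivered at logical size, a pixel picked from it needs only the monitor's offset added, not a scale factor.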

UI Automation (accessibility)

Tool           Description
find_element   Find controls by automation_id, name, or control_type; returns screen rects ready for mouse_click
click_element  Click the first matching control by automation_id, name, or control_type

These tools query the accessibility tree rather than pixel coordinates, making them immune to DPI scaling and preview shrinkage. Use them instead of eyeballing screenshots whenever the app exposes UIA (WinForms, WPF, and most native Windows apps do).

# Example: click the OK button in any dialog
click_element("My App", name="OK", control_type="Button")

# Find all text fields in a window
find_element("Settings", control_type="Edit")

Common control_type values: Button, Edit, Text, CheckBox, ComboBox, List, ListItem, Tree, TreeItem, Menu, MenuItem, ToolBar, TabControl, TabItem, RadioButton, Slider, Window, Pane.
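If you want to pass a rect from find_element to mouse_click yourself, the usual target is the rect's center. The sketch below assumes a rect shaped as left/top/right/bottom in screen coordinates; the exact field names returned by find_element are not documented here, so treat Rect and rect_center as hypothetical.

```python
from typing import TypedDict


class Rect(TypedDict):
    # Assumed shape of a returned rect; the server's real field names may differ.
    left: int
    top: int
    right: int
    bottom: int


def rect_center(rect: Rect) -> tuple[int, int]:
    """Return the integer center point of a screen rect."""
    return (rect["left"] + rect["right"]) // 2, (rect["top"] + rect["bottom"]) // 2


# A button on a monitor to the left of the primary (negative x is valid).
ok_button: Rect = {"left": -3700, "top": 500, "right": -3600, "bottom": 530}
print(rect_center(ok_button))  # (-3650, 515)
```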

Multi-monitor layout

Tool                    Description
list_monitors           All monitors: position, logical/physical size, DPI, scale factor, primary flag
image_to_screen_coords  Convert image pixel (x, y) from a monitor screenshot to screen click coordinates

Use list_monitors to understand the virtual desktop layout. Monitors positioned to the left of the primary have negative x coordinates; mouse_click accepts them just fine.

Window management

Tool             Description
list_windows     Visible windows with title, HWND, rect, and monitor_index
focus_window     Bring a window to the foreground
get_window_rect  Window position/size, center point, and monitor_index

Mouse

Tool                 Description
get_cursor_position  Current mouse coordinates
mouse_move           Move cursor
mouse_click          Click (single/double, any button)
mouse_drag           Click-drag between points
mouse_scroll         Scroll wheel at a position: vertical (default) or horizontal

Keyboard

Tool                 Description
keyboard_type        Type text (clipboard paste; supports Unicode)
keyboard_type_ascii  Type ASCII text via individual keystrokes
key_press            Keys and shortcuts (enter, ctrl+c, win+d, alt+f4, etc.)

Coordinate system

All coordinates are in the virtual desktop space — the same space mouse_click and mouse_move use.

  • Primary monitor typically starts at (0, 0).
  • Monitors to the left or above the primary have negative offsets (e.g. left=-3840).
  • list_monitors shows each monitor's left and top offset.
  • screenshot_monitor(n) returns a pre-scaled image where (image_x, image_y) maps to screen (monitor.left + image_x, monitor.top + image_y).
  • image_to_screen_coords(image_x, image_y, monitor_index) does the conversion explicitly.

DPI scaling: On displays running at 125%, 150%, or 200% DPI, screenshots are downscaled to the logical pixel resolution so there is no scale factor to apply manually.
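The mapping in the list above is a plain offset. As a minimal sketch (the function names here are illustrative and are not the server's image_to_screen_coords implementation), the conversion and its inverse look like this:

```python
def image_to_screen(image_x: int, image_y: int, monitor_left: int, monitor_top: int) -> tuple[int, int]:
    """Map a pixel in a pre-scaled monitor screenshot to virtual desktop coordinates."""
    return monitor_left + image_x, monitor_top + image_y


def screen_to_image(screen_x: int, screen_y: int, monitor_left: int, monitor_top: int) -> tuple[int, int]:
    """Inverse mapping: where a screen point appears in that monitor's screenshot."""
    return screen_x - monitor_left, screen_y - monitor_top


# Monitor positioned to the left of the primary, as in the left=-3840 example.
print(image_to_screen(640, 360, -3840, 0))  # (-3200, 360)
```

This only holds for the pre-scaled monitor screenshots; screenshot_region returns physical pixels, so rely on the formula included in its response instead.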

Requirements

  • Python 3.10+
  • Windows 10/11

Install

git clone https://github.com/felenko/WindowsOSMCP.git
cd WindowsOSMCP
python -m pip install -r requirements.txt

Cursor setup

Add to your user MCP config (%USERPROFILE%\.cursor\mcp.json) or project .cursor/mcp.json:

{
  "mcpServers": {
    "windows-os": {
      "command": "C:\\Python313\\python.exe",
      "args": ["C:\\Source\\Claude\\WindowsOSMCP\\server.py"]
    }
  }
}

Adjust command and the path in args to match your machine. See mcp.json.example.

Restart Cursor (or reload MCP servers) after changing the config.
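If you would rather not hand-escape Windows backslashes, a short script can emit the config entry for you. This is an optional sketch, not part of the repository: build_mcp_config is a hypothetical helper, and it assumes you run it from the cloned WindowsOSMCP folder with the interpreter you want Cursor to use.

```python
import json
import sys
from pathlib import Path


def build_mcp_config(server_path: str, python_exe: str = sys.executable) -> dict:
    """Build the mcpServers entry shown above for the given interpreter and script."""
    return {
        "mcpServers": {
            "windows-os": {
                "command": python_exe,
                "args": [server_path],
            }
        }
    }


if __name__ == "__main__":
    # Resolve server.py relative to the current directory (assumed to be the repo root).
    server = str(Path("server.py").resolve())
    # json.dumps escapes backslashes, so the output can be pasted into mcp.json.
    print(json.dumps(build_mcp_config(server), indent=2))
```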

Usage tips

  1. Prefer UIA over pixel clicks. Use find_element / click_element first. Targeting by automation_id or name is reliable at any DPI and survives window moves. Fall back to pixel clicks only when a control isn't in the accessibility tree.
  2. Can't tell where to click? Pass grid=True to screenshot, screenshot_monitor, or screenshot_region. Gridlines every 100 logical pixels are labeled with actual screen coordinates, so you can read the target directly off the image.
  3. Fine detail lost in preview? Use screenshot_region(left, top, width, height) to zoom into a small area at full physical resolution instead of squinting at a shrunken full-screen capture.
  4. Finding things across monitors. Call screenshot_all_monitors to see every screen at once, or list_windows to see which monitor each window is on before deciding where to look.
  5. Clicking on a secondary monitor. Take the screenshot with screenshot_monitor(n), find the pixel coordinates of your target, then either call image_to_screen_coords or compute screen_x = monitor.left + image_x and screen_y = monitor.top + image_y directly.
  6. Modal dialogs. After clicks, run list_windows and look for small windows (Feedback, Validation, About, etc.). Screenshot the dialog by its title, not only the main app, because a modal disables its parent window.
  7. Failsafe. Moving the mouse to the top-left corner aborts PyAutoGUI (FAILSAFE).
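The modal-dialog tip can be approximated by filtering the window list for small windows. The sketch below is only a heuristic under stated assumptions: the window record shape (a title plus a rect given as left, top, right, bottom) and the size thresholds are hypothetical, so adapt them to whatever list_windows actually returns.

```python
def likely_dialogs(windows: list[dict], max_width: int = 800, max_height: int = 600) -> list[dict]:
    """Return windows small enough to plausibly be modal dialogs.

    Assumes each window dict has a "rect" of (left, top, right, bottom).
    The default thresholds are arbitrary illustration values.
    """
    candidates = []
    for window in windows:
        left, top, right, bottom = window["rect"]
        if right - left <= max_width and bottom - top <= max_height:
            candidates.append(window)
    return candidates


windows = [
    {"title": "My App", "rect": (0, 0, 1920, 1040)},
    {"title": "Validation", "rect": (700, 400, 1100, 580)},
]
print([w["title"] for w in likely_dialogs(windows)])  # ['Validation']
```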

Run manually

python server.py

The server speaks MCP over stdio (used automatically by Cursor).
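For orientation, MCP messages are JSON-RPC 2.0 objects, and on the stdio transport each one is sent as a single line of JSON. The sketch below builds a tools/call request for the screenshot tool with grid=True; the helper name tool_call_request is hypothetical, and a real client (Cursor does this for you) must complete the MCP initialize handshake before sending it.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps emits no raw newlines by default, so the message stays on one line.
    return json.dumps(message) + "\n"


line = tool_call_request(1, "screenshot", {"grid": True})
print(line, end="")
```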

License

MIT — see LICENSE.
