MCP server for Windows desktop automation: screenshots, window management, mouse, and keyboard. Built for Cursor and other MCP clients.
Platform: Windows 10/11 only.
| Tool | Description |
|---|---|
| `screenshot` | Primary monitor, any specific monitor (by index), or a window (by partial title) |
| `screenshot_monitor` | Screenshot a specific monitor by index, pre-scaled to logical resolution |
| `screenshot_all_monitors` | Screenshot every connected monitor at once |
| `screenshot_region` | Capture a sub-rectangle at full physical resolution, preserving fine detail that full-screen captures lose |
All monitor screenshots are automatically downscaled to logical resolution when DPI scaling is above 100%, so image pixel coordinates map to `mouse_click` coordinates with no scale factor to apply; on a non-primary monitor you only add that monitor's `left`/`top` offset. Every response includes an explicit `IMAGE->CLICK` formula.
screenshot, screenshot_monitor, and screenshot_region all accept a grid=True parameter that overlays gridlines every 100 logical pixels with labeled screen coordinates baked into the image — so click targets can be read directly off the labels regardless of how much the client shrinks the preview.
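The grid labels come from the same offset arithmetic. Below is a minimal sketch, not the server's code, of how label values could be computed. It assumes the gridlines start at the image's top-left corner and that the image is already at logical resolution; `GridLine` and `grid_lines` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridLine:
    axis: str        # "x" for vertical lines, "y" for horizontal lines
    image_pos: int   # pixel offset inside the screenshot image
    screen_pos: int  # virtual-desktop coordinate printed as the label


def grid_lines(left: int, top: int, width: int, height: int, step: int = 100) -> list[GridLine]:
    """Gridlines for a logical-resolution capture whose top-left pixel is at
    screen (left, top). Assumption: one image pixel == one logical pixel."""
    lines = [GridLine("x", x, left + x) for x in range(0, width, step)]
    lines += [GridLine("y", y, top + y) for y in range(0, height, step)]
    return lines


# A 1920x1080 monitor placed to the left of the primary (left = -1920).
lines = grid_lines(-1920, 0, 1920, 1080)
```

With a negative `left`, the label on the vertical line 100 pixels into the image reads `-1820`. That is the value you would pass to `mouse_click` as-is.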
| Tool | Description |
|---|---|
| `find_element` | Find controls by `automation_id`, `name`, or `control_type`; returns screen rects ready for `mouse_click` |
| `click_element` | Click the first matching control by `automation_id`, `name`, or `control_type` |
These tools query the Windows UI Automation (UIA) accessibility tree instead of relying on pixel coordinates, so DPI scaling and preview shrinkage do not affect them. Use them instead of eyeballing screenshots whenever the app exposes UIA (WinForms, WPF, and most native Windows apps do).
```python
# Example: click the OK button in the "My App" window
click_element("My App", name="OK", control_type="Button")

# Find all text fields in the "Settings" window
find_element("Settings", control_type="Edit")
```
Common control_type values: Button, Edit, Text, CheckBox, ComboBox, List, ListItem, Tree, TreeItem, Menu, MenuItem, ToolBar, TabControl, TabItem, RadioButton, Slider, Window, Pane.
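To make the matching rules concrete, here is a toy model of a lookup by `automation_id`, `name`, or `control_type`. `Control`, `walk`, `find`, and `click_point` are hypothetical stand-ins, because the server queries the real UIA tree. The sketch assumes matches are exact and case-sensitive, that every supplied criterion must match, and that the click target is the center of the control's rect.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Control:
    """Hypothetical stand-in for a UIA element (not the server's real type)."""
    control_type: str
    name: str = ""
    automation_id: str = ""
    rect: tuple[int, int, int, int] = (0, 0, 0, 0)  # left, top, right, bottom (screen coords)
    children: list[Control] = field(default_factory=list)


def walk(root: Control) -> Iterator[Control]:
    """Depth-first, pre-order traversal of the control tree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))  # keep children in document order


def find(root: Control, automation_id: str | None = None, name: str | None = None,
         control_type: str | None = None) -> list[Control]:
    """Every control that matches all of the criteria that were supplied."""
    def matches(c: Control) -> bool:
        return ((automation_id is None or c.automation_id == automation_id)
                and (name is None or c.name == name)
                and (control_type is None or c.control_type == control_type))
    return [c for c in walk(root) if matches(c)]


def click_point(c: Control) -> tuple[int, int]:
    """Center of the control's bounding rect, usable as a mouse_click target."""
    left, top, right, bottom = c.rect
    return (left + right) // 2, (top + bottom) // 2


dialog = Control("Window", name="My App", children=[
    Control("Edit", automation_id="txtName", rect=(10, 10, 210, 40)),
    Control("Button", name="OK", rect=(100, 200, 180, 230)),
    Control("Button", name="Cancel", rect=(200, 200, 280, 230)),
])
ok = find(dialog, name="OK", control_type="Button")[0]  # "first match" semantics
print(click_point(ok))  # (140, 215)
```

Targeting by `name` or `automation_id` never reads a pixel, so the match survives window moves and DPI changes. Only the final rect depends on screen position.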
| Tool | Description |
|---|---|
| `list_monitors` | All monitors: position, logical/physical size, DPI, scale factor, primary flag |
| `image_to_screen_coords` | Convert an image pixel `(x, y)` from a monitor screenshot to screen click coordinates |
Use `list_monitors` to understand the virtual desktop layout. Monitors positioned to the left of the primary have negative x coordinates, and monitors above it have negative y coordinates. `mouse_click` accepts negative coordinates directly.
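Because offsets can be negative, a plain point-in-rectangle test is enough to tell which monitor a virtual-desktop coordinate falls on. Here is a minimal sketch. It assumes each monitor is described by `left`, `top`, and a logical `width`/`height`. The `Monitor` class and `monitor_at` are hypothetical, and the `width`/`height` field names are assumptions, not necessarily what `list_monitors` returns.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Monitor:
    index: int
    left: int    # virtual-desktop offset; negative left of the primary
    top: int     # virtual-desktop offset; negative above the primary
    width: int   # logical pixels (assumed field name)
    height: int  # logical pixels (assumed field name)


def monitor_at(monitors: list[Monitor], x: int, y: int) -> Monitor | None:
    """Return the monitor whose logical rect contains the point (x, y)."""
    for m in monitors:
        if m.left <= x < m.left + m.width and m.top <= y < m.top + m.height:
            return m
    return None  # the point falls in a gap between monitors


monitors = [
    Monitor(index=0, left=0, top=0, width=2560, height=1440),      # primary
    Monitor(index=1, left=-1920, top=0, width=1920, height=1080),  # left of primary
]
print(monitor_at(monitors, -500, 300).index)  # 1
```

A point such as `(-500, 1200)` returns `None` because the left monitor is only 1080 logical pixels tall. Mismatched monitor heights can leave dead zones like this in the virtual desktop.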
| Tool | Description |
|---|---|
| `list_windows` | Visible windows with title, HWND, rect, and `monitor_index` |
| `focus_window` | Bring a window to the foreground |
| `get_window_rect` | Window position/size, center point, and `monitor_index` |
| Tool | Description |
|---|---|
| `get_cursor_position` | Current mouse coordinates |
| `mouse_move` | Move the cursor |
| `mouse_click` | Click (single/double, any button) |
| `mouse_drag` | Click-drag between two points |
| `mouse_scroll` | Scroll the wheel at a position, vertically (default) or horizontally |
| Tool | Description |
|---|---|
| `keyboard_type` | Type text via clipboard paste (supports Unicode) |
| `keyboard_type_ascii` | Type ASCII text via individual keystrokes |
| `key_press` | Keys and shortcuts (`enter`, `ctrl+c`, `win+d`, `alt+f4`, etc.) |
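Shortcut strings like `ctrl+c` are commonly handled by pressing each key down in order and then releasing the keys in reverse order. PyAutoGUI's `hotkey()` works this way. The sketch below models only that parsing and ordering as plain data. `parse_shortcut` and `press_sequence` are hypothetical helpers, not part of this server, and this simple `+` split cannot express the plus key itself.

```python
def parse_shortcut(combo: str) -> list[str]:
    """Split a shortcut such as 'Ctrl+Shift+S' into normalized key names."""
    keys = [k.strip().lower() for k in combo.split("+")]
    if not all(keys):
        # e.g. "ctrl++" yields empty parts; this sketch does not support it
        raise ValueError(f"malformed shortcut: {combo!r}")
    return keys


def press_sequence(combo: str) -> list[tuple[str, str]]:
    """Key-down events in order, then key-up events in reverse order, so
    modifiers stay held while the final key is pressed."""
    keys = parse_shortcut(combo)
    return [("down", k) for k in keys] + [("up", k) for k in reversed(keys)]


print(press_sequence("ctrl+c"))
# [('down', 'ctrl'), ('down', 'c'), ('up', 'c'), ('up', 'ctrl')]
```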
All coordinates are in the virtual desktop space, the same space `mouse_click` and `mouse_move` use.

- The primary monitor typically starts at `(0, 0)`.
- Monitors to the left of or above the primary have negative offsets (e.g. `left=-3840`).
- `list_monitors` shows each monitor's `left` and `top` offset.
- `screenshot_monitor(n)` returns a pre-scaled image where `(image_x, image_y)` maps to screen `(monitor.left + image_x, monitor.top + image_y)`.
- `image_to_screen_coords(image_x, image_y, monitor_index)` does the conversion explicitly.
**DPI scaling:** On displays running at 125%, 150%, or 200% scaling, screenshots are downscaled to logical pixel resolution, so there is no scale factor to apply manually.
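The arithmetic behind the pre-scaling is small enough to spell out. This sketch assumes `scale` is the monitor's scale factor as a fraction (1.5 for 150%) and uses hypothetical helper names. It contrasts the pre-scaled mapping with what a raw physical-resolution capture would require.

```python
def logical_size(physical_w: int, physical_h: int, scale: float) -> tuple[int, int]:
    """Size a monitor screenshot is downscaled to (scale=1.5 means 150%)."""
    return round(physical_w / scale), round(physical_h / scale)


def image_to_screen(image_x: int, image_y: int, left: int, top: int) -> tuple[int, int]:
    """Pre-scaled image: one image pixel is one logical pixel, so only the
    monitor's virtual-desktop offset is added."""
    return left + image_x, top + image_y


def physical_image_to_screen(px: int, py: int, left: int, top: int,
                             scale: float) -> tuple[int, int]:
    """What a caller would need if the image were NOT pre-scaled:
    divide by the DPI scale before adding the offset."""
    return left + round(px / scale), top + round(py / scale)


# A 3840x2160 panel at 150%, placed to the left of the primary monitor.
left, top, scale = -2560, 0, 1.5
print(logical_size(3840, 2160, scale))                      # (2560, 1440)
print(physical_image_to_screen(1500, 900, left, top, scale))  # (-1560, 600)
print(image_to_screen(1000, 600, left, top))                # (-1560, 600)
```

Both paths land on the same screen point. The pre-scaled path simply moves the division into the capture step, so callers never handle `scale`.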
- Python 3.10+
- Windows 10/11
```
git clone https://github.com/felenko/WindowsOSMCP.git
cd WindowsOSMCP
python -m pip install -r requirements.txt
```

Add the server to your user MCP config (`%USERPROFILE%\.cursor\mcp.json`) or to the project's `.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "windows-os": {
      "command": "C:\\Python313\\python.exe",
      "args": ["C:\\Source\\Claude\\WindowsOSMCP\\server.py"]
    }
  }
}
```

Adjust `command` and the path in `args` to match your machine. See `mcp.json.example`.
Restart Cursor (or reload MCP servers) after changing the config.
- **Prefer UIA over pixel clicks.** Use `find_element`/`click_element` first. Targeting by `automation_id` or `name` is reliable at any DPI and survives window moves. Fall back to pixel clicks only when a control isn't in the accessibility tree.
- **Can't tell where to click?** Pass `grid=True` to `screenshot` or `screenshot_monitor`. Gridlines every 100 logical pixels are labeled with actual screen coordinates, so you can read the target directly off the image.
- **Fine detail lost in preview?** Use `screenshot_region(left, top, width, height)` to zoom into a small area at full physical resolution instead of squinting at a shrunken full-screen capture.
- **Finding things across monitors.** Call `screenshot_all_monitors` to see every screen at once, or `list_windows` to see which monitor each window is on before deciding where to look.
- **Clicking on a secondary monitor.** Take the screenshot with `screenshot_monitor(n)` and find the pixel coordinates of your target. Then either call `image_to_screen_coords` or compute `screen_x = monitor.left + image_x` and `screen_y = monitor.top + image_y` directly.
- **Modal dialogs.** After clicks, run `list_windows` and look for small windows (`Feedback`, `Validation`, `About`, etc.). Take a screenshot of the dialog itself, targeted by its title, rather than only the main app window: a modal disables its parent window until it is dismissed.
- **Failsafe.** Moving the mouse to the top-left corner of the screen aborts PyAutoGUI (`FAILSAFE`).
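The modal-dialog check can be approximated with a size filter over `list_windows` results. Here is a rough heuristic sketch. It assumes each window is a dict with a `title` and a `rect` of `(left, top, right, bottom)`, which is an assumed shape. The 900x700 thresholds are arbitrary placeholders, not values from this project, and `likely_dialogs` is a hypothetical helper.

```python
def likely_dialogs(windows: list[dict], max_width: int = 900,
                   max_height: int = 700) -> list[dict]:
    """Windows small enough to be dialogs, smallest first.

    Assumes each rect is (left, top, right, bottom) in screen coordinates.
    """
    def size(w: dict) -> tuple[int, int]:
        left, top, right, bottom = w["rect"]
        return right - left, bottom - top

    small = [w for w in windows
             if size(w)[0] <= max_width and size(w)[1] <= max_height]
    return sorted(small, key=lambda w: size(w)[0] * size(w)[1])


windows = [
    {"title": "My App", "hwnd": 0x1001, "rect": (0, 0, 1920, 1040)},
    {"title": "Validation", "hwnd": 0x2002, "rect": (760, 400, 1160, 600)},
    {"title": "About My App", "hwnd": 0x3003, "rect": (700, 300, 1220, 700)},
]
print([w["title"] for w in likely_dialogs(windows)])  # ['Validation', 'About My App']
```

Sorting smallest-first puts alert-style dialogs ahead of larger secondary windows. The titles still decide which candidate to screenshot.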
```
python server.py
```

The server speaks MCP over stdio. Cursor launches it this way automatically.
MIT — see LICENSE.
