Chapter 23 - Controlling the Keyboard and Mouse (JavaScript)
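Here's a JavaScript-flavoured version of the same concepts, with small JS/Node examples for each idea. The snippets use await at the top level for brevity; in a CommonJS script (one that uses require), wrap them in an async function.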
GUI automation in JavaScript
GUI automation from Node is less mature than Python's PyAutoGUI ecosystem. The main options:
- robotjs – native addon for mouse/keyboard control and screen reading. Simple API but limited maintenance.
- nut.js (@nut-tree/nut-js) – actively maintained, cross-platform, supports image recognition. Recommended.
- Playwright / Puppeteer – browser-specific automation (not general desktop GUI).
npm install @nut-tree/nut-js
On macOS, you must grant accessibility permission to your terminal app (System Settings → Privacy & Security → Accessibility; System Preferences → Security & Privacy on older macOS versions), same as with PyAutoGUI.
Safety and stopping runaway scripts
nut.js doesn't have a built-in corner fail-safe like PyAutoGUI. You can implement one manually:
const { mouse, screen } = require("@nut-tree/nut-js");
// Throw if the mouse pointer sits in any of the four screen corners
async function checkFailSafe() {
  const pos = await mouse.getPosition();
  const width = await screen.width();
  const height = await screen.height();
  if (
    (pos.x <= 0 && pos.y <= 0) ||
    (pos.x >= width - 1 && pos.y <= 0) ||
    (pos.x <= 0 && pos.y >= height - 1) ||
    (pos.x >= width - 1 && pos.y >= height - 1)
  ) {
    throw new Error("Fail-safe triggered: mouse in corner");
  }
}
Or simply use CTRL-C in the terminal to kill the script.
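As a rough sketch of how the corner check might be wired into a longer script, the logic can be split into a pure corner test and a runner that calls a check before every step. The names isInCorner and runWithFailSafe, and the margin parameter, are illustrative assumptions and not part of nut.js; in a real script the check callback would be something like checkFailSafe above.

```javascript
// Hypothetical helper: true when pos lies within `margin` pixels of a screen corner.
function isInCorner(pos, width, height, margin = 0) {
  const nearLeftOrRight = pos.x <= margin || pos.x >= width - 1 - margin;
  const nearTopOrBottom = pos.y <= margin || pos.y >= height - 1 - margin;
  return nearLeftOrRight && nearTopOrBottom;
}

// Hypothetical runner: awaits check() before each async step.
// If check() throws, the remaining steps are skipped and the error propagates.
async function runWithFailSafe(steps, check) {
  let completed = 0;
  for (const step of steps) {
    await check();
    await step();
    completed += 1;
  }
  return completed;
}
```

With nut.js this might be called as runWithFailSafe([stepOne, stepTwo], checkFailSafe), where stepOne and stepTwo stand for your own async functions.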
Mouse control
Coordinates and screen size
Same coordinate system: (0, 0) is top-left, x grows right, y grows down.
const { mouse, screen } = require("@nut-tree/nut-js");
const width = await screen.width();
const height = await screen.height();
console.log(width, height); // e.g. 1920, 1080
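Because valid coordinates run from 0 to width - 1 and from 0 to height - 1, a small helper can clamp computed targets before they are handed to the mouse. This is a minimal sketch: clampToScreen is a hypothetical name, and the width and height arguments are assumed to come from screen.width() and screen.height().

```javascript
// Hypothetical helper: force a point into the visible screen area.
// Valid x values are 0..width-1 and valid y values are 0..height-1.
function clampToScreen(x, y, width, height) {
  const clamp = (value, max) => Math.min(Math.max(Math.round(value), 0), max);
  return { x: clamp(x, width - 1), y: clamp(y, height - 1) };
}

console.log(clampToScreen(2500, -40, 1920, 1080)); // { x: 1919, y: 0 }
```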
Moving the mouse
const { mouse, straightTo, Point } = require("@nut-tree/nut-js");
// Jump straight to an absolute position
await mouse.setPosition(new Point(100, 200));
// Move smoothly in a straight line; mouseSpeed is in pixels per second
mouse.config.mouseSpeed = 500;
await mouse.move(straightTo(new Point(200, 200)));
Example square path:
const { mouse, straightTo, Point } = require("@nut-tree/nut-js");
// At 400 pixels per second, each 100-pixel side takes roughly a quarter of a second
mouse.config.mouseSpeed = 400;
for (let i = 0; i < 10; i++) {
  await mouse.move(straightTo(new Point(100, 100)));
  await mouse.move(straightTo(new Point(200, 100)));
  await mouse.move(straightTo(new Point(200, 200)));
  await mouse.move(straightTo(new Point(100, 200)));
}
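The four hard-coded corners can also be generated from a starting point and a side length, which makes the path easier to resize. This is a sketch only: squareCorners is a hypothetical helper that returns plain { x, y } objects, which would still need to be wrapped in new Point(x, y) before being passed to straightTo.

```javascript
// Hypothetical helper: corners of a square, clockwise from the top-left.
// y grows downward, so the "bottom" corners have the larger y value.
function squareCorners(left, top, size) {
  return [
    { x: left, y: top },               // top-left
    { x: left + size, y: top },        // top-right
    { x: left + size, y: top + size }, // bottom-right
    { x: left, y: top + size },        // bottom-left
  ];
}

console.log(squareCorners(100, 100, 100));
```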
Getting current position
const pos = await mouse.getPosition();
console.log(pos.x, pos.y);
Clicking, dragging, scrolling
const { mouse, Button, Point, straightTo } = require("@nut-tree/nut-js");
// Click at current position
await mouse.click(Button.LEFT);
// Move then click
await mouse.setPosition(new Point(500, 300));
await mouse.click(Button.LEFT);
// Right click, double click
await mouse.click(Button.RIGHT);
await mouse.doubleClick(Button.LEFT);
// Press/release (mouse down/up)
await mouse.pressButton(Button.LEFT);
await mouse.releaseButton(Button.LEFT);
// Drag (press, move, release)
await mouse.pressButton(Button.LEFT);
await mouse.move(straightTo(new Point(600, 400)));
await mouse.releaseButton(Button.LEFT);
// Scroll
await mouse.scrollDown(5);
await mouse.scrollUp(5);
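Some applications only register a drag if the pointer passes through intermediate positions while the button is held. One possible approach, sketched here, is to compute evenly spaced points between the start and the end and visit them in turn. interpolatePoints is a hypothetical helper and returns plain { x, y } objects, not nut.js Point instances.

```javascript
// Hypothetical helper: `steps` evenly spaced points from `from` to `to`,
// excluding the start and including the end (linear interpolation).
function interpolatePoints(from, to, steps) {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new RangeError("steps must be a positive integer");
  }
  const points = [];
  for (let i = 1; i <= steps; i++) {
    const t = i / steps;
    points.push({
      x: Math.round(from.x + (to.x - from.x) * t),
      y: Math.round(from.y + (to.y - from.y) * t),
    });
  }
  return points;
}

console.log(interpolatePoints({ x: 500, y: 300 }, { x: 600, y: 400 }, 4));
```

Between pressButton and releaseButton, each returned point could then be passed to mouse.setPosition(new Point(p.x, p.y)).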
Planning mouse movements
nut.js doesn't have a built-in MouseInfo GUI. Alternatives:
- Use the Digital Color Meter app on macOS (built-in) for pixel colors.
- Write a quick helper script that logs the position once per second:
const { mouse } = require("@nut-tree/nut-js");
setInterval(async () => {
  const pos = await mouse.getPosition();
  console.log(`x: ${pos.x}, y: ${pos.y}`);
}, 1000);
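Logging on a timer produces many duplicate lines while the pointer is idle. A possible refinement, sketched below, reports a position only when it differs from the previous one. createPositionReporter is a hypothetical helper; it contains no nut.js calls, so the position can come from any source.

```javascript
// Hypothetical helper: returns a function that formats a position,
// or returns null when the position has not changed since the last call.
function createPositionReporter() {
  let last = null;
  return function report(pos) {
    if (last && last.x === pos.x && last.y === pos.y) {
      return null;
    }
    last = { x: pos.x, y: pos.y };
    return `x: ${pos.x}, y: ${pos.y}`;
  };
}

const report = createPositionReporter();
console.log(report({ x: 10, y: 20 })); // x: 10, y: 20
console.log(report({ x: 10, y: 20 })); // null
```

Inside the timer callback, the result would be logged only when it is not null.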
Or use Python's pyautogui.mouseInfo() as a standalone tool; it works regardless of the language your main script uses.
Screenshots and pixel checks
const { screen, Region, Point } = require("@nut-tree/nut-js");
// Take a screenshot (saves to file)
await screen.capture("screenshot.png");
// Get pixel color at a point
const color = await screen.colorAt(new Point(500, 300));
console.log(color); // RGBA object
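A pixel check usually compares the sampled color against an expected value, often with a small tolerance for anti-aliasing or display differences. The sketch below assumes the color object exposes numeric R, G and B properties (this is how the RGBA object is understood to be shaped; verify against your installed version). colorsMatch and its tolerance parameter are illustrative and not library API.

```javascript
// Hypothetical helper: compare two colors channel by channel.
// Assumes objects shaped like { R, G, B } with values 0-255.
function colorsMatch(actual, expected, tolerance = 0) {
  return ["R", "G", "B"].every(
    (channel) => Math.abs(actual[channel] - expected[channel]) <= tolerance
  );
}

const sampled = { R: 130, G: 135, B: 144, A: 255 }; // stand-in for a colorAt() result
console.log(colorsMatch(sampled, { R: 130, G: 135, B: 144 }));    // true
console.log(colorsMatch(sampled, { R: 128, G: 135, B: 144 }));    // false
console.log(colorsMatch(sampled, { R: 128, G: 135, B: 144 }, 5)); // true
```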
For more advanced screenshot manipulation, combine with sharp:
const sharp = require("sharp");
const { screen } = require("@nut-tree/nut-js");
// Capture and process
const imagePath = await screen.capture("temp_screenshot.png");
const meta = await sharp(imagePath).metadata();
console.log(meta.width, meta.height);
Image recognition
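nut.js supports on-screen image search. Recent nut.js releases need a separately installed image-finder plugin, such as @nut-tree/template-matcher, for this to work: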
const { screen, imageResource, mouse, straightTo, centerOf, Button } = require("@nut-tree/nut-js");
// Find an image on screen
try {
  const region = await screen.find(imageResource("submit.png"));
  console.log(region); // { left, top, width, height }
  // Click the center of the found region
  await mouse.move(straightTo(centerOf(region)));
  await mouse.click(Button.LEFT);
} catch (e) {
  // screen.find rejects when no match is found
  console.log("Image not found on screen");
}
Find all matches:
const regions = await screen.findAll(imageResource("submit.png"));
for (const region of regions) {
  console.log(region);
}
nut.js uses template matching and supports a confidence threshold:
screen.config.confidence = 0.9; // 0-1, default ~0.99
Lowering the confidence allows fuzzy matching. PyAutoGUI, by contrast, requires a pixel-perfect match unless OpenCV is installed and its confidence argument is used.
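A target image may not be on screen at the moment a script looks for it, for example while a window is still opening. One hedged approach is to retry the search until a timeout expires. retryUntilFound below is a hypothetical helper that takes the search as a callback, so it makes no assumptions about the nut.js API; the timeout and interval defaults are arbitrary.

```javascript
// Hypothetical helper: call an async `find` function repeatedly until it
// resolves, or rethrow its last error once `timeoutMs` would be exceeded.
async function retryUntilFound(find, { timeoutMs = 5000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      return await find();
    } catch (err) {
      if (Date.now() + intervalMs > deadline) {
        throw err; // another attempt would overrun the timeout
      }
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}
```

With nut.js, the callback might be () => screen.find(imageResource("submit.png")).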
Window control
nut.js provides cross-platform window management (not Windows-only like PyAutoGUI):
const { getWindows, getActiveWindow } = require("@nut-tree/nut-js");
// Get active window
const win = await getActiveWindow();
console.log(await win.title);
console.log(await win.region); // { left, top, width, height }
Finding and manipulating windows
const { getWindows } = require("@nut-tree/nut-js");
// Get all windows
const windows = await getWindows();
for (const win of windows) {
  console.log(await win.title);
}
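To act on one particular window, the list can be filtered by title. The sketch below assumes only that each window object exposes a title that can be awaited, as in the loop above; findWindowsByTitle is a hypothetical helper, not a nut.js function.

```javascript
// Hypothetical helper: keep the windows whose title matches `pattern`.
// `title` may be a plain string or a promise; await handles both.
async function findWindowsByTitle(windows, pattern) {
  const matches = [];
  for (const win of windows) {
    const title = await win.title;
    // String.prototype.search ignores a regex's lastIndex, so a pattern
    // with the g flag behaves the same on every call.
    if (title.search(pattern) !== -1) {
      matches.push(win);
    }
  }
  return matches;
}
```

For example, findWindowsByTitle(await getWindows(), /notepad/i) would return every window with "notepad" in its title, ignoring case.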
For more advanced window control on specific platforms, use platform-specific tools:
- macOS: osascript (AppleScript) via child_process
- Windows: PowerShell commands via child_process
- Linux: wmctrl or xdotool via child_process
Example with AppleScript on macOS:
const { execSync } = require("child_process");
// Get frontmost app name
const app = execSync(
'osascript -e \'tell application "System Events" to get name of first application process whose frontmost is true\''
).toString().trim();
console.log(app);
// Resize a window
execSync(`osascript -e 'tell application "System Events"
tell process "${app}"
set size of window 1 to {1000, 600}
set position of window 1 to {100, 100}
end tell
end tell'`);
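Interpolating an application name straight into a shell command breaks if the name contains a quote character. A more defensive sketch escapes the name for AppleScript and passes the script to osascript as a single argument via execFileSync, which bypasses the shell. escapeAppleScriptString, buildResizeScript and resizeFrontWindow are hypothetical names; the AppleScript itself mirrors the example above.

```javascript
const { execFileSync } = require("child_process");

// Escape backslashes and double quotes for use inside an AppleScript string literal.
function escapeAppleScriptString(value) {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Hypothetical helper: build the resize/move script for a process's first window.
function buildResizeScript(app, { width, height, x, y }) {
  const name = escapeAppleScriptString(app);
  return [
    'tell application "System Events"',
    `  tell process "${name}"`,
    `    set size of window 1 to {${width}, ${height}}`,
    `    set position of window 1 to {${x}, ${y}}`,
    "  end tell",
    "end tell",
  ].join("\n");
}

// Hypothetical wrapper (macOS only): execFileSync passes the script as one
// argument, so no shell quoting is involved.
function resizeFrontWindow(app, bounds) {
  execFileSync("osascript", ["-e", buildResizeScript(app, bounds)]);
}
```

A call such as resizeFrontWindow(app, { width: 1000, height: 600, x: 100, y: 100 }) would then reproduce the resize shown earlier.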
Keyboard control
Typing text
const { keyboard, Key } = require("@nut-tree/nut-js");
// Type a string
await keyboard.type("Hello, world!");
// With delay between keystrokes
keyboard.config.autoDelayMs = 100;
await keyboard.type("Hello, world!");
Individual keys and sequences
const { keyboard, Key } = require("@nut-tree/nut-js");
// Press a single key
await keyboard.pressKey(Key.Enter);
await keyboard.releaseKey(Key.Enter);
// Shorthand: type individual keys
await keyboard.type(Key.Tab);
await keyboard.type(Key.Escape);
Hotkeys (key combinations)
const { keyboard, Key } = require("@nut-tree/nut-js");
// CTRL+C (copy)
await keyboard.pressKey(Key.LeftControl, Key.C);
await keyboard.releaseKey(Key.LeftControl, Key.C);
// CTRL+S (save)
await keyboard.pressKey(Key.LeftControl, Key.S);
await keyboard.releaseKey(Key.LeftControl, Key.S);
// macOS: Command+C
await keyboard.pressKey(Key.LeftSuper, Key.C);
await keyboard.releaseKey(Key.LeftSuper, Key.C);
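Each hotkey needs a matching release, and a key left held down after an error can leave the keyboard in a confusing state. A small wrapper can pair the two calls and pick the platform's usual shortcut modifier. This is a sketch: hotkey and primaryModifierName are hypothetical helpers, and the keyboard object is passed in as a parameter so the wrapper depends only on the pressKey and releaseKey methods used above.

```javascript
// Hypothetical helper: press all keys, then always release them,
// even if pressKey throws part-way through.
async function hotkey(keyboard, ...keys) {
  try {
    await keyboard.pressKey(...keys);
  } finally {
    await keyboard.releaseKey(...keys);
  }
}

// Hypothetical helper: name of the Key member conventionally used for
// shortcuts such as copy and save (Command on macOS, Control elsewhere).
function primaryModifierName(platform = process.platform) {
  return platform === "darwin" ? "LeftSuper" : "LeftControl";
}
```

With nut.js this could be called as hotkey(keyboard, Key[primaryModifierName()], Key.C).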
Key reference
Common nut.js key names: Key.Enter, Key.Escape, Key.Tab, Key.Backspace, Key.Delete, Key.Space, Key.Up, Key.Down, Key.Left, Key.Right, Key.Home, Key.End, Key.PageUp, Key.PageDown, Key.F1–Key.F12, Key.LeftShift, Key.LeftControl, Key.LeftAlt, Key.LeftSuper (Win/Cmd).
Alternative: Playwright for browser automation
If your automation target is a web browser (not a desktop app), Playwright is far more reliable:
npm install playwright
const { chromium } = require("playwright");
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
await page.goto("https://example.com");
// Click a button
await page.click('button#submit');
// Type in an input
await page.fill('input[name="email"]', 'user@example.com');
// Take a screenshot
await page.screenshot({ path: "screenshot.png" });
await browser.close();
Playwright uses the browser's internal APIs (no pixel matching needed), making it much more reliable than screen-based GUI automation for web tasks.
Ethics and CAPTCHAs
Same ethical considerations apply: CAPTCHAs exist to prevent automated abuse. Use automation tools responsibly and only on systems you have permission to control.
Overall idea of the chapter
Chapter 23 in JavaScript: desktop GUI automation with @nut-tree/nut-js (mouse move/click/drag/scroll, keyboard type/press/hotkey, screenshot capture, image recognition with confidence threshold, cross-platform window info). For browser-specific automation, Playwright/Puppeteer are far more reliable than screen-based approaches. Always implement fail-safes and use automation ethically.