Chapter 23 - Controlling the Keyboard and Mouse (JavaScript)

Here's a JavaScript-flavoured version of the same concepts, with small JS/Node examples for each idea.

GUI automation in JavaScript

GUI automation from Node is less mature than Python's PyAutoGUI ecosystem. The main options:

robotjs – native addon for mouse/keyboard control and screen reading. Simple API but limited maintenance.
nut.js (@nut-tree/nut-js) – actively maintained, cross-platform, supports image recognition. Recommended.
Playwright / Puppeteer – browser-specific automation (not general desktop GUI).

npm install @nut-tree/nut-js

On macOS, you must grant accessibility permission to your terminal app (System Settings → Privacy & Security → Accessibility; on older macOS versions, System Preferences → Security & Privacy → Privacy → Accessibility), the same as with PyAutoGUI.
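
The snippets in this chapter use await at the top level for brevity. In a CommonJS script (one that loads modules with require), await is only valid inside an async function, so a small wrapper is needed. The sketch below shows one way to do that; runScript and main are hypothetical names, not part of nut.js.

```javascript
// Hypothetical helper (not part of nut.js): runs an async entry point
// and turns a thrown error into a non-zero exit code.
async function runScript(main) {
  try {
    await main();
    return 0;
  } catch (err) {
    console.error("Automation failed:", err.message);
    return 1;
  }
}

async function main() {
  // Awaited calls from the later snippets go here, for example:
  // const { mouse, Button } = require("@nut-tree/nut-js");
  // await mouse.click(Button.LEFT);
}

runScript(main).then((code) => {
  process.exitCode = code;
});
```

Catching the error in one place keeps a failed automation step from surfacing as an unhandled promise rejection.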

Safety and stopping runaway scripts

nut.js doesn't have a built-in corner fail-safe like PyAutoGUI. You can implement one manually:

const { mouse, screen } = require("@nut-tree/nut-js");

async function checkFailSafe() {
  const pos = await mouse.getPosition();
  const width = await screen.width();
  const height = await screen.height();

  if (
    (pos.x <= 0 && pos.y <= 0) ||
    (pos.x >= width - 1 && pos.y <= 0) ||
    (pos.x <= 0 && pos.y >= height - 1) ||
    (pos.x >= width - 1 && pos.y >= height - 1)
  ) {
    throw new Error("Fail-safe triggered: mouse in corner");
  }
}

Or simply use CTRL-C in the terminal to kill the script.
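
A fail-safe check only helps if it runs between actions. One way to arrange that, sketched here under the assumption that you route every action through a wrapper, is to separate the pure corner test from the code that queries the pointer. The names isInCorner and guarded are hypothetical, and the margin parameter is an addition for illustration.

```javascript
// Pure corner test: true when pos lies within `margin` pixels of any screen corner.
function isInCorner(pos, width, height, margin = 0) {
  const nearLeft = pos.x <= margin;
  const nearRight = pos.x >= width - 1 - margin;
  const nearTop = pos.y <= margin;
  const nearBottom = pos.y >= height - 1 - margin;
  return (nearLeft || nearRight) && (nearTop || nearBottom);
}

// Hypothetical wrapper: checks the pointer before running an action.
// `getState` must resolve to { pos, width, height }.
async function guarded(getState, action) {
  const { pos, width, height } = await getState();
  if (isInCorner(pos, width, height)) {
    throw new Error("Fail-safe triggered: mouse in corner");
  }
  return action();
}
```

With nut.js, getState could be an async function that returns { pos: await mouse.getPosition(), width: await screen.width(), height: await screen.height() }, and each click or move would be passed in as action.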

Mouse control

Coordinates and screen size

Same coordinate system: (0, 0) is top-left, x grows right, y grows down.

const { mouse, screen } = require("@nut-tree/nut-js");

const width = await screen.width();
const height = await screen.height();
console.log(width, height); // e.g. 1920, 1080
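
Because (0, 0) is the top-left pixel, the bottom-right pixel is (width - 1, height - 1). A minimal sketch of two helpers built on that fact follows; clampToScreen and screenCenter are hypothetical names and take the width and height as plain numbers, so they do not depend on nut.js.

```javascript
// Keep a target point inside the visible screen area.
function clampToScreen(x, y, width, height) {
  return {
    x: Math.min(Math.max(Math.round(x), 0), width - 1),
    y: Math.min(Math.max(Math.round(y), 0), height - 1),
  };
}

// Integer coordinates of the middle of the screen.
function screenCenter(width, height) {
  return { x: Math.floor(width / 2), y: Math.floor(height / 2) };
}

console.log(clampToScreen(-10, 2000, 1920, 1080)); // { x: 0, y: 1079 }
console.log(screenCenter(1920, 1080)); // { x: 960, y: 540 }
```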

Moving the mouse

const { mouse, straightTo, centerOf, Point } = require("@nut-tree/nut-js");

// Move to absolute position
await mouse.setPosition(new Point(100, 200));

// Move along a straight path (animated)
mouse.config.mouseSpeed = 500; // movement speed in pixels per second
mouse.config.autoDelayMs = 250; // pause between mouse actions
await mouse.move(straightTo(new Point(200, 200)));

Example square path:

const { mouse, straightTo, Point } = require("@nut-tree/nut-js");

mouse.config.autoDelayMs = 250;

for (let i = 0; i < 10; i++) {
  await mouse.move(straightTo(new Point(100, 100)));
  await mouse.move(straightTo(new Point(200, 100)));
  await mouse.move(straightTo(new Point(200, 200)));
  await mouse.move(straightTo(new Point(100, 200)));
}
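
The four corners can also be generated rather than typed out, which makes the size and position of the square easy to change. This is a sketch with hypothetical helper names; moveTo stands for any async function that moves the pointer to a point.

```javascript
// Corners of a square in clockwise order, starting at the top-left.
function squareCorners(left, top, size) {
  return [
    { x: left, y: top },
    { x: left + size, y: top },
    { x: left + size, y: top + size },
    { x: left, y: top + size },
  ];
}

// Visit every corner `laps` times using the supplied move function.
async function traceSquare(moveTo, left, top, size, laps = 1) {
  for (let lap = 0; lap < laps; lap++) {
    for (const corner of squareCorners(left, top, size)) {
      await moveTo(corner);
    }
  }
}
```

With nut.js, the loop above corresponds to traceSquare((p) => mouse.move(straightTo(new Point(p.x, p.y))), 100, 100, 100, 10).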

Getting current position

const pos = await mouse.getPosition();
console.log(pos.x, pos.y);

Clicking, dragging, scrolling

const { mouse, Button, Point, straightTo } = require("@nut-tree/nut-js");

// Click at current position
await mouse.click(Button.LEFT);

// Move then click
await mouse.setPosition(new Point(500, 300));
await mouse.click(Button.LEFT);

// Right click, double click
await mouse.click(Button.RIGHT);
await mouse.doubleClick(Button.LEFT);

// Press/release (mouse down/up)
await mouse.pressButton(Button.LEFT);
await mouse.releaseButton(Button.LEFT);

// Drag (press, move, release)
await mouse.pressButton(Button.LEFT);
await mouse.move(straightTo(new Point(600, 400)));
await mouse.releaseButton(Button.LEFT);

// Scroll
await mouse.scrollDown(5);
await mouse.scrollUp(5);
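
If the move in the middle of a manual drag throws, the button stays pressed. A minimal sketch of a drag helper that always releases the button is shown below. dragTo is a hypothetical name, and mouseLike is any object with the pressButton, move, and releaseButton methods used above (the nut.js mouse object in practice).

```javascript
// Press, move along the path, and release even if the move fails.
async function dragTo(mouseLike, button, path) {
  await mouseLike.pressButton(button);
  try {
    await mouseLike.move(path);
  } finally {
    await mouseLike.releaseButton(button);
  }
}
```

A call would look like dragTo(mouse, Button.LEFT, straightTo(new Point(600, 400))).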

Planning mouse movements

nut.js doesn't have a built-in MouseInfo GUI. Alternatives:

Use the Digital Color Meter app on macOS (built-in) for pixel colors.
Write a quick helper script that logs the position once per second:

const { mouse } = require("@nut-tree/nut-js");

setInterval(async () => {
  const pos = await mouse.getPosition();
  console.log(`x: ${pos.x}, y: ${pos.y}`);
}, 1000);

Or use Python's pyautogui.mouseInfo() as a standalone tool — it works regardless of what language your main script uses.
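
A polling logger prints the same line repeatedly while the pointer is at rest. As a small refinement, the sketch below only reports a position when it differs from the previous one. makeChangeLogger is a hypothetical helper, and the log function is injectable so the output can go somewhere other than the console.

```javascript
// Returns a function that logs a position only when it has changed.
function makeChangeLogger(log = console.log) {
  let last = null;
  return function report(pos) {
    if (last && last.x === pos.x && last.y === pos.y) {
      return false; // unchanged, nothing logged
    }
    last = { x: pos.x, y: pos.y };
    log(`x: ${pos.x}, y: ${pos.y}`);
    return true;
  };
}
```

Inside the setInterval callback, the console.log call would be replaced with report(await mouse.getPosition()), which also makes a shorter polling interval practical.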

Screenshots and pixel checks

const { screen, Region, Point } = require("@nut-tree/nut-js");

// Take a screenshot (saves to file)
await screen.capture("screenshot.png");

// Get pixel color at a point
const color = await screen.colorAt(new Point(500, 300));
console.log(color); // RGBA object
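
A pixel check usually compares the sampled color against an expected value, and a small tolerance helps with anti-aliasing and color profiles. The sketch below assumes the color object exposes numeric R, G, and B fields; log the object once to confirm the field names in your nut.js version. colorsMatch is a hypothetical helper.

```javascript
// True when every RGB channel differs by at most `tolerance`.
// Assumption: colors are objects with numeric R, G, B fields.
function colorsMatch(actual, expected, tolerance = 0) {
  return ["R", "G", "B"].every(
    (channel) => Math.abs(actual[channel] - expected[channel]) <= tolerance
  );
}

const sampled = { R: 250, G: 128, B: 3, A: 255 };
console.log(colorsMatch(sampled, { R: 255, G: 128, B: 0 }, 5)); // true
console.log(colorsMatch(sampled, { R: 255, G: 128, B: 0 })); // false
```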

For more advanced screenshot manipulation, combine with sharp:

const sharp = require("sharp");
const { screen } = require("@nut-tree/nut-js");

// Capture and process
const imagePath = await screen.capture("temp_screenshot.png");
const meta = await sharp(imagePath).metadata();
console.log(meta.width, meta.height);

Image recognition

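nut.js supports on-screen image search. In nut.js 2.x and later this relies on a separately installed image-finder plugin (for example @nut-tree/template-matcher), so check the documentation for your version: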

const { screen, imageResource, mouse, centerOf, straightTo, Button } = require("@nut-tree/nut-js");

// Find an image on screen
try {
  const region = await screen.find(imageResource("submit.png"));
  console.log(region); // { left, top, width, height }

  // Click the center of the found region
  await mouse.move(straightTo(centerOf(region)));
  await mouse.click(Button.LEFT);
} catch (e) {
  console.log("Image not found on screen");
}
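
A single search fails if the target has not been drawn yet, for example while a dialog is still opening. One common approach is to retry until a deadline passes. The sketch below is a generic retry helper, not a nut.js API; retryUntil and its default timings are assumptions chosen for illustration.

```javascript
// Call `attempt` until it resolves or the timeout elapses.
// Rethrows the last error if every attempt failed.
async function retryUntil(attempt, timeoutMs = 5000, intervalMs = 250) {
  const deadline = Date.now() + timeoutMs;
  let lastError;
  do {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  } while (Date.now() < deadline);
  throw lastError;
}
```

The search above would then become retryUntil(() => screen.find(imageResource("submit.png"))), still wrapped in the same try/catch.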

Find all matches:

const regions = await screen.findAll(imageResource("submit.png"));
for (const region of regions) {
  console.log(region);
}
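
When several matches are found and the script needs "the first one", it is safer not to rely on the order in which they are returned. A minimal sketch that sorts regions top to bottom and then left to right is shown below, using the left and top fields printed earlier; sortReadingOrder is a hypothetical helper.

```javascript
// Sort regions by row (top), then by column (left), without mutating the input.
function sortReadingOrder(regions) {
  return [...regions].sort((a, b) => a.top - b.top || a.left - b.left);
}

const found = [
  { left: 300, top: 50, width: 80, height: 30 },
  { left: 10, top: 200, width: 80, height: 30 },
  { left: 20, top: 50, width: 80, height: 30 },
];
console.log(sortReadingOrder(found).map((r) => [r.left, r.top]));
// [ [ 20, 50 ], [ 300, 50 ], [ 10, 200 ] ]
```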

nut.js uses template matching and supports a confidence threshold:

screen.config.confidence = 0.9; // 0-1, default ~0.99

Lowering the confidence allows fuzzy matching. PyAutoGUI, by comparison, requires a pixel-perfect match unless you pass its optional confidence argument, which needs OpenCV.

Window control

nut.js exposes window information on all supported platforms, whereas PyAutoGUI's window functions work only on Windows:

const { getWindows, getActiveWindow } = require("@nut-tree/nut-js");

// Get active window
const win = await getActiveWindow();
console.log(await win.title);
console.log(await win.region); // { left, top, width, height }

Finding and manipulating windows

const { getWindows, getActiveWindow } = require("@nut-tree/nut-js");

// Get all windows
const windows = await getWindows();
for (const win of windows) {
  console.log(await win.title);
}
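
Listing titles is usually a step towards picking one window. The sketch below searches a list of window objects for a title containing a given substring, assuming each object has a title property that can be awaited, as in the loop above. findWindowByTitle is a hypothetical helper.

```javascript
// Return the first window whose title contains `needle` (case-insensitive),
// or null when none matches.
async function findWindowByTitle(windows, needle) {
  const wanted = needle.toLowerCase();
  for (const win of windows) {
    const title = await win.title; // awaited, as in the listing loop
    if (title.toLowerCase().includes(wanted)) {
      return win;
    }
  }
  return null;
}
```

It would be called as findWindowByTitle(await getWindows(), "notepad").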

For more advanced window control on specific platforms, use platform-specific tools:

macOS: osascript (AppleScript) via child_process
Windows: PowerShell commands via child_process
Linux: wmctrl or xdotool via child_process

Example with AppleScript on macOS:

const { execSync } = require("child_process");

// Get frontmost app name
const app = execSync(
  'osascript -e \'tell application "System Events" to get name of first application process whose frontmost is true\''
).toString().trim();
console.log(app);

// Resize a window
execSync(`osascript -e 'tell application "System Events"
  tell process "${app}"
    set size of window 1 to {1000, 600}
    set position of window 1 to {100, 100}
  end tell
end tell'`);
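
Interpolating an application name into a shell command breaks if the name contains quotes. A sketch of an alternative is to skip the shell and pass each script line as its own -e argument to osascript through execFileSync. The helper names here are hypothetical, and the resize script mirrors the one above.

```javascript
const { execFileSync } = require("child_process");

// Quote a JavaScript string as an AppleScript string literal.
function quoteAppleScript(text) {
  return '"' + text.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// osascript accepts one script line per -e option.
function osascriptArgs(lines) {
  return lines.flatMap((line) => ["-e", line]);
}

function resizeWindowArgs(appName, width, height) {
  return osascriptArgs([
    'tell application "System Events"',
    `tell process ${quoteAppleScript(appName)}`,
    `set size of window 1 to {${width}, ${height}}`,
    "end tell",
    "end tell",
  ]);
}

// macOS only: runs the script without going through a shell.
function resizeWindow(appName, width, height) {
  return execFileSync("osascript", resizeWindowArgs(appName, width, height)).toString();
}
```

Because no shell is involved, only AppleScript's own string escaping has to be handled.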

Keyboard control

Typing text

const { keyboard, Key } = require("@nut-tree/nut-js");

// Type a string
await keyboard.type("Hello, world!");

// With delay between keystrokes
keyboard.config.autoDelayMs = 100;
await keyboard.type("Hello, world!");

Individual keys and sequences

const { keyboard, Key } = require("@nut-tree/nut-js");

// Press a single key
await keyboard.pressKey(Key.Enter);
await keyboard.releaseKey(Key.Enter);

// Shorthand: type individual keys
await keyboard.type(Key.Tab);
await keyboard.type(Key.Escape);

Hotkeys (key combinations)

const { keyboard, Key } = require("@nut-tree/nut-js");

// CTRL+C (copy)
await keyboard.pressKey(Key.LeftControl, Key.C);
await keyboard.releaseKey(Key.LeftControl, Key.C);

// CTRL+S (save)
await keyboard.pressKey(Key.LeftControl, Key.S);
await keyboard.releaseKey(Key.LeftControl, Key.S);

// macOS: Command+C
await keyboard.pressKey(Key.LeftSuper, Key.C);
await keyboard.releaseKey(Key.LeftSuper, Key.C);
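
The press/release pairs above can be folded into one helper that releases keys in reverse order and does so even when a press fails, so a modifier is never left held down. This is a sketch with hypothetical names: keyboardLike is any object with pressKey and releaseKey (the nut.js keyboard object in practice), and primaryModifierName picks the Key property name for the platform's main shortcut modifier.

```javascript
// Press keys in order, then release the ones that were pressed in reverse order.
async function hotkey(keyboardLike, ...keys) {
  const pressed = [];
  try {
    for (const key of keys) {
      await keyboardLike.pressKey(key);
      pressed.push(key);
    }
  } finally {
    for (const key of pressed.reverse()) {
      await keyboardLike.releaseKey(key);
    }
  }
}

// Name of the Key entry used for copy/paste style shortcuts.
function primaryModifierName(platform = process.platform) {
  return platform === "darwin" ? "LeftSuper" : "LeftControl";
}
```

A cross-platform copy would then be hotkey(keyboard, Key[primaryModifierName()], Key.C).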

Key reference

Common nut.js key names: Key.Enter, Key.Escape, Key.Tab, Key.Backspace, Key.Delete, Key.Space, Key.Up, Key.Down, Key.Left, Key.Right, Key.Home, Key.End, Key.PageUp, Key.PageDown, Key.F1–Key.F12, Key.LeftShift, Key.LeftControl, Key.LeftAlt, Key.LeftSuper (Win/Cmd).

Alternative: Playwright for browser automation

If your automation target is a web browser (not a desktop app), Playwright is far more reliable:

npm install playwright

const { chromium } = require("playwright");

const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
await page.goto("https://example.com");

// Click a button
await page.click('button#submit');

// Type in an input
await page.fill('input[name="email"]', 'user@example.com');

// Take a screenshot
await page.screenshot({ path: "screenshot.png" });

await browser.close();

Playwright uses the browser's internal APIs (no pixel matching needed), making it much more reliable than screen-based GUI automation for web tasks.

Ethics and CAPTCHAs

Same ethical considerations apply: CAPTCHAs exist to prevent automated abuse. Use automation tools responsibly and only on systems you have permission to control.

Overall idea of the chapter

Chapter 23 in JavaScript: desktop GUI automation with @nut-tree/nut-js (mouse move/click/drag/scroll, keyboard type/press/hotkey, screenshot capture, image recognition with confidence threshold, cross-platform window info). For browser-specific automation, Playwright/Puppeteer are far more reliable than screen-based approaches. Always implement fail-safes and use automation ethically.
