Chapter 23 - Controlling the Keyboard and Mouse (Python)

This is a summary of Chapter 23, covering all major sections and main ideas.


GUI automation with PyAutoGUI

  • GUI automation means controlling the mouse and keyboard programmatically so your script can do anything you could do manually in GUI apps (click, type, drag, scroll, etc.).
  • PyAutoGUI is the main library used; it simulates user input and can also inspect the screen (screenshots, pixel colors, image recognition) and manage windows on Windows.
  • Do not name your script pyautogui.py, or imports will break.

On macOS you must grant "accessibility" permission to Mu/IDLE/Terminal (System Preferences → Accessibility → "Allow the apps below to control your computer") or PyAutoGUI will not actually control the mouse/keyboard.

Safety and stopping runaway scripts

  • PyAutoGUI has a fail-safe: quickly move the mouse to any corner of the screen. After each call PyAutoGUI pauses 0.1 s, then checks whether the cursor is in a corner and raises pyautogui.FailSafeException if it is.
  • You can adjust this delay via pyautogui.PAUSE (seconds).
  • As a last resort, log out (Win/Linux: CTRL-ALT-DEL; macOS: Command-Shift-Q), sacrificing unsaved work but killing the script.
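
As a rough sketch of how PAUSE and the fail-safe fit together, the helper below sets a longer pause and stops cleanly when FailSafeException is raised. The name run_steps is hypothetical and not part of PyAutoGUI. Passing the pyautogui module in as the gui parameter is an assumption made here so the logic can be read and exercised without moving the real mouse.

```python
def run_steps(gui, steps, pause=0.5):
    """Run zero-argument callables in order and return how many completed.

    gui is expected to be the pyautogui module (passed in, not imported here).
    """
    gui.PAUSE = pause  # seconds PyAutoGUI waits after each of its calls
    completed = 0
    try:
        for step in steps:
            step()
            completed += 1
    except gui.FailSafeException:
        # Raised by PyAutoGUI when it finds the cursor in a screen corner.
        print('Fail-safe triggered, stopping early.')
    return completed


# Example use (moves the real mouse, so it is left commented out):
# import pyautogui
# run_steps(pyautogui, [lambda: pyautogui.move(50, 0),
#                       lambda: pyautogui.move(0, 50)])
```

A longer pause gives you more time to reach a corner, at the cost of a slower script.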

Mouse control

Coordinates and screen size

  • Origin (0, 0) is top-left of screen; x increases right, y increases down.
  • pyautogui.size() → Size(width, height) named tuple (e.g. (1920, 1080)), accessible via indices or attributes (.width, .height).
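
To make the coordinate system concrete, here is a small sketch that works on a Size-shaped named tuple. The Size defined below is a stand-in with the same fields as the value pyautogui.size() returns, and screen_center and is_on_screen are hypothetical helper names.

```python
from collections import namedtuple

# Stand-in with the same fields as the value returned by pyautogui.size().
Size = namedtuple('Size', ['width', 'height'])


def screen_center(size):
    """Return the (x, y) pixel at the middle of the screen."""
    return (size.width // 2, size.height // 2)


def is_on_screen(x, y, size):
    """True if (x, y) is a valid pixel; the largest valid x is width - 1."""
    return 0 <= x < size.width and 0 <= y < size.height


full_hd = Size(1920, 1080)
print(screen_center(full_hd))          # (960, 540)
print(is_on_screen(1920, 0, full_hd))  # False: x runs from 0 to 1919
```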

Moving the mouse

  • pyautogui.moveTo(x, y, duration=0) moves cursor to absolute coordinates.
  • pyautogui.move(dx, dy, duration=0) moves relative to current position (negative for left/up).

Example square path:

import pyautogui

# Trace a 100-pixel square ten times, taking 0.25 s per side.
for i in range(10):
    pyautogui.moveTo(100, 100, duration=0.25)
    pyautogui.moveTo(200, 100, duration=0.25)
    pyautogui.moveTo(200, 200, duration=0.25)
    pyautogui.moveTo(100, 200, duration=0.25)
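
The same square can also be traced with relative moves, starting from wherever the cursor currently is. This is a sketch under the assumption that the pyautogui module is passed in as gui; square_offsets and trace_square are hypothetical names.

```python
def square_offsets(side):
    """Return the four relative moves (dx, dy): right, down, left, up."""
    return [(side, 0), (0, side), (-side, 0), (0, -side)]


def trace_square(gui, side=100, laps=10, duration=0.25):
    """Trace a square with gui.move(); gui is expected to be pyautogui."""
    for _ in range(laps):
        for dx, dy in square_offsets(side):
            gui.move(dx, dy, duration=duration)


# Example use (moves the real mouse, so it is left commented out):
# import pyautogui
# trace_square(pyautogui)
```

Because the four offsets sum to zero, the cursor ends each lap where it started.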

Getting current position

  • pyautogui.position() → Point(x, y) named tuple; can use p[0]/p[1] or p.x/p.y.
p = pyautogui.position()
print(p.x, p.y)

Clicking, dragging, scrolling

  • pyautogui.click(x, y, button='left') – move (if coords given) and click.
  • pyautogui.mouseDown(...) / pyautogui.mouseUp(...) – press/release only.
  • Shortcuts: pyautogui.doubleClick, pyautogui.rightClick, pyautogui.middleClick.

Dragging:

  • pyautogui.dragTo(x, y, duration=...) – drag to absolute position.
  • pyautogui.drag(dx, dy, duration=...) – drag relative.

Example spiral drawing (spiralDraw.py):

  • Sleeps 5s so you can focus Paint canvas, clicks to activate, then repeatedly drags right, down, left, up with decreasing distance to draw a square spiral.
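
The spiral described above could be sketched roughly as follows. This is not the book's listing: spiral_drags and draw_spiral are hypothetical names, the 300/20 defaults are illustrative assumptions, and the pyautogui module is assumed to be passed in as gui.

```python
import time


def spiral_drags(distance=300, change=20):
    """Yield relative drags (dx, dy) for a square spiral that shrinks inward."""
    directions = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # right, down, left, up
    side = 0
    while distance > 0:
        dx, dy = directions[side % 4]
        yield (dx * distance, dy * distance)
        if side % 2 == 0:  # shorten after each right and each left stroke
            distance -= change
        side += 1


def draw_spiral(gui, distance=300, change=20, delay=5):
    """Wait, click to focus the canvas, then drag out the spiral."""
    time.sleep(delay)  # time to move the cursor over the drawing canvas
    gui.click()        # activate the window under the cursor
    for dx, dy in spiral_drags(distance, change):
        gui.drag(dx, dy, duration=0.2)


print(list(spiral_drags(60, 20)))
# [(60, 0), (0, 40), (-40, 0), (0, -20), (20, 0)]

# Example use (drags the real mouse, so it is left commented out):
# import pyautogui
# draw_spiral(pyautogui)
```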

Scrolling:

  • pyautogui.scroll(amount) at current cursor position; positive = up, negative = down; actual distance is OS/app dependent.

Planning mouse movements: MouseInfo

  • pyautogui.mouseInfo() launches the MouseInfo helper GUI, which shows current cursor coordinates and pixel color (RGB + hex) and lets you copy/log them.
  • Capture positions with the F-key shortcuts or the Copy/Log buttons. Leave "3 Sec. Button Delay" checked if you need time to move the mouse into place after clicking a button, or uncheck it to capture immediately.
  • Use these recorded coordinates in your automation scripts.

Screenshots and pixel checks

  • pyautogui.screenshot() returns a Pillow Image of the full screen; you can then use standard Pillow operations.
  • pyautogui.pixel(x, y) returns an (R, G, B) tuple for that screen coordinate.
  • pyautogui.pixelMatchesColor(x, y, (r, g, b)) → True if exact match, otherwise False.

This lets scripts verify that the UI is in the expected state (button color, etc.) before clicking, avoiding "blind" actions.
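
A minimal sketch of such a guarded click, assuming the pyautogui module is passed in as gui; click_if_color is a hypothetical helper name and the coordinates and color in the usage comment are made up for illustration.

```python
def click_if_color(gui, x, y, expected_rgb):
    """Click (x, y) only when the pixel there has the expected color.

    Returns True if the click happened, False if the screen looked wrong.
    """
    if gui.pixelMatchesColor(x, y, expected_rgb):
        gui.click(x, y)
        return True
    return False


# Example use (clicks the real mouse, so it is left commented out):
# import pyautogui
# if not click_if_color(pyautogui, 400, 300, (0, 120, 215)):
#     print('Button not in the expected state; not clicking.')
```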


Image recognition

  • pyautogui.locateOnScreen('image.png') searches the current screen for the given image and returns a Box(left, top, width, height) named tuple for the first match; raises ImageNotFoundException if not found.
  • Matching must be pixel-perfect; scaling or one-pixel differences will break it.

Find all matches:

boxes = list(pyautogui.locateAllOnScreen('submit.png'))
# [(x1, y1, w1, h1), (x2, y2, w2, h2), ...]

You can pass a Box to click(), or pass the image filename directly as a shortcut:

pyautogui.click('submit.png')  # shortcut: find, then click center

Always wrap locateOnScreen in try/except to handle not found:

try:
    loc = pyautogui.locateOnScreen('submit.png')
except pyautogui.ImageNotFoundException:
    print('Image could not be found.')
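
Since a button may take a moment to appear, one possible extension of this pattern is to retry until a deadline. The sketch below assumes the pyautogui module is passed in as gui; wait_for_image is a hypothetical name, and the timeout and interval defaults are arbitrary. It also treats a None result as "not found", as a precaution in case the installed version returns None instead of raising.

```python
import time


def wait_for_image(gui, image_path, timeout=10.0, interval=0.5):
    """Poll the screen until image_path is found or timeout seconds pass.

    Returns the matching box, or None if the image never appeared.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            box = gui.locateOnScreen(image_path)
        except gui.ImageNotFoundException:
            box = None
        if box is not None:
            return box
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)  # wait before looking again


# Example use (needs a real screen, so it is left commented out):
# import pyautogui
# box = wait_for_image(pyautogui, 'submit.png')
# if box is not None:
#     pyautogui.click(box)
```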

Window control (Windows only)

Window features use PyGetWindow under the hood and work only on Windows as of PyAutoGUI 1.0.0.

Getting window info

  • pyautogui.getActiveWindow() → Window object with attributes describing position, size, and title.

Key attributes (mostly named tuples):

  • left, right, top, bottom – integer edges.
  • topleft, topright, bottomleft, bottomright → Point(x, y).
  • midleft, midright, midtop, midbottom – midpoints.
  • width, height; size (Size(width, height)); area.
  • center, centerx, centery.
  • box → Box(left, top, width, height).
  • title – window title string.

Example:

import pyautogui

win = pyautogui.getActiveWindow()
print(win.title, win.size, win.topleft)
pyautogui.click(win.left + 10, win.top + 20)

Finding and manipulating windows

Finding:

  • pyautogui.getAllWindows() – all visible windows.
  • pyautogui.getWindowsAt(x, y) – windows under a point.
  • pyautogui.getWindowsWithTitle('Mu') – windows whose title contains string.
  • pyautogui.getAllTitles() – list of titles.

Manipulating via attributes and methods:

win.width = 1000
win.topleft = (800, 400)
win.maximize()
win.minimize()
win.restore()
win.activate() # bring to foreground
win.close() # close window (may bypass "Save?" dialogs)

Boolean properties: isMaximized, isMinimized, isActive.
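
Putting these attributes together, a window could be snapped to the left half of the screen roughly like this. The helper name snap_left is hypothetical, and setting height is assumed to work the same way as the width assignment shown above. It takes a Window object plus the screen dimensions, so it applies only where the window features are available (Windows).

```python
def snap_left(win, screen_width, screen_height):
    """Move and resize win so it fills the left half of the screen."""
    if win.isMaximized or win.isMinimized:
        win.restore()  # return to a normal, resizable state first
    win.topleft = (0, 0)
    win.width = screen_width // 2
    win.height = screen_height


# Example use (Windows only, so it is left commented out):
# import pyautogui
# width, height = pyautogui.size()
# snap_left(pyautogui.getActiveWindow(), width, height)
```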


Keyboard control

Typing text and sequences

  • pyautogui.write('Hello, world!') types that string in the active window; optionally pyautogui.write('Hello, world!', interval=0.25) to add delay between characters.
  • Combined with click to ensure focus:
pyautogui.click(100, 200); pyautogui.write('Hello, world!')
  • For non-character keys (arrows, Enter, etc.), pass a list:
pyautogui.write(['a', 'b', 'left', 'left', 'X', 'Y'])
# results in "XYab"

Key name strings include letters, digits, and punctuation, plus: 'enter'/'return', 'esc', 'shiftleft', 'shiftright', 'altleft', 'ctrlleft', 'tab', 'backspace', 'delete', 'pageup', 'pagedown', 'home', 'end', 'up', 'down', 'left', 'right', 'f1' through 'f12', 'capslock', 'numlock', 'scrolllock', 'winleft', 'command', 'option', etc. All valid names are listed in pyautogui.KEYBOARD_KEYS.
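
Because a misspelled key name in a list is easy to miss, it can help to check the list before typing it. The sketch below is one way to do that: unknown_keys is a hypothetical helper, the comparison is case-insensitive by assumption (so 'X' is checked as 'x'), and the short known list stands in for pyautogui.KEYBOARD_KEYS.

```python
def unknown_keys(keys, known_keys):
    """Return the entries of keys that are not recognized key names."""
    known = set(known_keys)
    return [key for key in keys if key.lower() not in known]


# Small stand-in for pyautogui.KEYBOARD_KEYS, used only for this demonstration.
known = ['a', 'b', 'x', 'y', 'left', 'right', 'enter']

print(unknown_keys(['a', 'b', 'left', 'left', 'X', 'Y'], known))  # []
print(unknown_keys(['a', 'lefft', 'enter'], known))               # ['lefft']

# With the real library (left commented out):
# import pyautogui
# keys = ['a', 'b', 'left', 'left', 'X', 'Y']
# if not unknown_keys(keys, pyautogui.KEYBOARD_KEYS):
#     pyautogui.write(keys)
```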

Key down/up and hotkeys

  • pyautogui.keyDown('shift') / pyautogui.keyUp('shift') – hold and release a key.
  • pyautogui.press('enter') – press and release a single key (wrapper).
  • pyautogui.hotkey('ctrl', 'c') – press a combination (CTRL-C): presses in order, then releases in reverse.

These let you script shortcuts like CTRL-S, CTRL-A, etc.
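
As an illustration of combining click, hotkey, write, and press, the sketch below replaces the contents of a text field. The name replace_field_text is hypothetical, the pyautogui module is assumed to be passed in as gui, and the modifier parameter is an assumption: 'ctrl' fits Windows and Linux, while macOS would typically use 'command'.

```python
def replace_field_text(gui, x, y, text, modifier='ctrl'):
    """Focus the field at (x, y), select its contents, and type text over them."""
    gui.click(x, y)            # give the field keyboard focus
    gui.hotkey(modifier, 'a')  # select all existing text
    gui.write(text)            # typing replaces the selection
    gui.press('enter')         # confirm the new value


# Example use (types into the real active window, so it is left commented out):
# import pyautogui
# replace_field_text(pyautogui, 100, 200, 'Hello, world!')
```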


Ethics and CAPTCHAs

  • CAPTCHAs exist specifically to stop automated scripts from abusing sign-ups, posting spam, or brute-forcing logins.
  • The chapter stresses that the responsibility for using these powerful automation tools ethically lies with you: using unlocked systems to cause harm or gain unfair advantage is not "clever"; it's unethical.

Overall idea of the chapter

Chapter 23 covers GUI automation with PyAutoGUI: mouse control (moveTo/move, click/drag/scroll), screen inspection (screenshots, pixel checks, image recognition with locateOnScreen), window management (Windows only — position, resize, minimize/maximize/activate), keyboard control (write, press, keyDown/keyUp, hotkey), and the MouseInfo helper for planning coordinates. Always use the fail-safe (mouse to corner) and code ethically.