Chapter 23 - Controlling the Keyboard and Mouse (Python)
Here's a concise but thorough summary of Chapter 23, covering all major sections and main ideas.
GUI automation with PyAutoGUI
- GUI automation means controlling the mouse and keyboard programmatically so your script can do anything you could do manually in GUI apps (click, type, drag, scroll, etc.).
- PyAutoGUI is the main library used; it simulates user input and can also inspect the screen (screenshots, pixel colors, image recognition) and manage windows on Windows.
- Do not name your script
pyautogui.py, or imports will break.
On macOS you must grant "accessibility" permission to Mu/IDLE/Terminal (System Preferences → Accessibility → "Allow the apps below to control your computer") or PyAutoGUI will not actually control the mouse/keyboard.
Safety and stopping runaway scripts
- PyAutoGUI has a fail-safe: quickly move mouse to any corner; after each call PyAutoGUI pauses 0.1s and then checks if the cursor is at a corner, raising
pyautogui.FailSafeExceptionif so. - You can adjust this delay via
pyautogui.PAUSE(seconds). - As a last resort, log out (Win/Linux: CTRL-ALT-DEL; macOS: Command-Shift-Q), sacrificing unsaved work but killing the script.
Mouse control
Coordinates and screen size
- Origin
(0, 0)is top-left of screen; x increases right, y increases down. pyautogui.size()→Size(width, height)named tuple (e.g.(1920, 1080)), accessible via indices or attributes (.width,.height).
Moving the mouse
pyautogui.moveTo(x, y, duration=0)moves cursor to absolute coordinates.pyautogui.move(dx, dy, duration=0)moves relative to current position (negative for left/up).
Example square path:
import pyautogui
for i in range(10):
pyautogui.moveTo(100, 100, duration=0.25)
pyautogui.moveTo(200, 100, duration=0.25)
pyautogui.moveTo(200, 200, duration=0.25)
pyautogui.moveTo(100, 200, duration=0.25)
Getting current position
pyautogui.position()→Point(x, y)named tuple; can usep[0]/p[1]orp.x/p.y.
p = pyautogui.position()
print(p.x, p.y)
Clicking, dragging, scrolling
pyautogui.click(x, y, button='left')– move (if coords given) and click.pyautogui.mouseDown(...)/pyautogui.mouseUp(...)– press/release only.- Shortcuts:
pyautogui.doubleClick,pyautogui.rightClick,pyautogui.middleClick.
Dragging:
pyautogui.dragTo(x, y, duration=...)– drag to absolute position.pyautogui.drag(dx, dy, duration=...)– drag relative.
Example spiral drawing (spiralDraw.py):
- Sleeps 5s so you can focus Paint canvas, clicks to activate, then repeatedly drags right, down, left, up with decreasing distance to draw a square spiral.
Scrolling:
pyautogui.scroll(amount)at current cursor position; positive = up, negative = down; actual distance is OS/app dependent.
Planning mouse movements: MouseInfo
pyautogui.mouseInfo()launches the MouseInfo helper GUI, which shows current cursor coordinates and pixel color (RGB + hex) and lets you copy/log them.- Use F-keys or Copy/Log buttons to capture positions; disable "3 Sec. Button Delay" or use it depending on whether you want a short delay.
- Use these recorded coordinates in your automation scripts.
Screenshots and pixel checks
pyautogui.screenshot()returns a PillowImageof the full screen; you can then use standard Pillow operations.pyautogui.pixel(x, y)returns an(R, G, B)tuple for that screen coordinate.pyautogui.pixelMatchesColor(x, y, (r, g, b))→Trueif exact match, otherwiseFalse.
This lets scripts verify that the UI is in the expected state (button color, etc.) before clicking, avoiding "blind" actions.
Image recognition
pyautogui.locateOnScreen('image.png')searches the current screen for the given image and returns aBox(left, top, width, height)named tuple for the first match; raisesImageNotFoundExceptionif not found.- Matching must be pixel-perfect; scaling or one-pixel differences will break it.
Find all matches:
boxes = list(pyautogui.locateAllOnScreen('submit.png'))
# [(x1, y1, w1, h1), (x2, y2, w2, h2), ...]
You can pass the box directly to click:
pyautogui.click('submit.png') # shortcut: find, then click center
Always wrap locateOnScreen in try/except to handle not found:
try:
loc = pyautogui.locateOnScreen('submit.png')
except pyautogui.ImageNotFoundException:
print('Image could not be found.')
Window control (Windows only)
Window features use PyGetWindow under the hood and work only on Windows as of PyAutoGUI 1.0.0.
Getting window info
pyautogui.getActiveWindow()→Windowobject with attributes describing position, size, and title.
Key attributes (mostly named tuples):
left,right,top,bottom– integer edges.topleft,topright,bottomleft,bottomright–Point(x, y).midleft,midright,midtop,midbottom– midpoints.width,height;size(Size(width, height));area.center,centerx,centery.box–Box(left, top, width, height).title– window title string.
Example:
import pyautogui
win = pyautogui.getActiveWindow()
print(win.title, win.size, win.topleft)
pyautogui.click(win.left + 10, win.top + 20)
Finding and manipulating windows
Finding:
pyautogui.getAllWindows()– all visible windows.pyautogui.getWindowsAt(x, y)– windows under a point.pyautogui.getWindowsWithTitle('Mu')– windows whose title contains string.pyautogui.getAllTitles()– list of titles.
Manipulating via attributes and methods:
win.width = 1000
win.topleft = (800, 400)
win.maximize()
win.minimize()
win.restore()
win.activate() # bring to foreground
win.close() # close window (may bypass "Save?" dialogs)
Boolean properties: isMaximized, isMinimized, isActive.
Keyboard control
Typing text and sequences
pyautogui.write('Hello, world!')types that string in the active window; optionallypyautogui.write('Hello, world!', interval=0.25)to add delay between characters.- Combined with
clickto ensure focus:
pyautogui.click(100, 200); pyautogui.write('Hello, world!')
- For non-character keys (arrows, Enter, etc.), pass a list:
pyautogui.write(['a', 'b', 'left', 'left', 'X', 'Y'])
# results in "XYab"
Key name strings include letters, digits, punctuation plus: 'enter'/'return', 'esc', 'shiftleft', 'shiftright', 'altleft', 'ctrlleft', 'tab', 'backspace', 'delete', 'pageup', 'pagedown', 'home', 'end', 'up', 'down', 'left', 'right', 'f1'–'f12', 'capslock', 'numlock', 'scrolllock', 'winleft', 'command', 'option', etc., and are all listed in pyautogui.KEYBOARD_KEYS.
Key down/up and hotkeys
pyautogui.keyDown('shift')/pyautogui.keyUp('shift')– hold and release a key.pyautogui.press('enter')– press and release a single key (wrapper).pyautogui.hotkey('ctrl', 'c')– press a combination (CTRL-C): presses in order, then releases in reverse.
These let you script shortcuts like CTRL-S, CTRL-A, etc.
Ethics and CAPTCHAs
- CAPTCHAs exist specifically to stop automated scripts from abusing sign-ups, posting spam, or brute-forcing logins.
- The chapter stresses that the responsibility for using these powerful automation tools ethically lies with you: using unlocked systems to cause harm or gain unfair advantage is not "clever"; it's unethical.
Overall idea of the chapter
Chapter 23 covers GUI automation with PyAutoGUI: mouse control (moveTo/move, click/drag/scroll), screen inspection (screenshots, pixel checks, image recognition with locateOnScreen), window management (Windows only — position, resize, minimize/maximize/activate), keyboard control (write, press, keyDown/keyUp, hotkey), and the MouseInfo helper for planning coordinates. Always use the fail-safe (mouse to corner) and code ethically.