Chapter 9 - Text Pattern Matching with Regular Expressions (JavaScript)

Here's a JavaScript-flavoured version of the same concepts, with small JS examples for each idea.

Manually finding patterns vs regex

You can detect patterns (like US phone numbers) with plain string logic, but it's verbose and rigid, while regex does the same with a short pattern.

Example (manual phone check):

function isPhoneNumber(text) {
  if (text.length !== 12) return false;
  for (let i = 0; i < 3; i++) {
    if (!/\d/.test(text[i])) return false;
  }
  if (text[3] !== "-") return false;
  for (let i = 4; i < 7; i++) {
    if (!/\d/.test(text[i])) return false;
  }
  if (text[7] !== "-") return false;
  for (let i = 8; i < 12; i++) {
    if (!/\d/.test(text[i])) return false;
  }
  return true;
}

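Regex equivalent pattern: /^\d{3}-\d{3}-\d{4}$/. The ^ and $ anchors make the whole string match, as isPhoneNumber requires; without them, /\d{3}-\d{3}-\d{4}/ finds a phone number anywhere inside a longer string.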
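
As a rough sketch, the manual check above collapses to a one-line function once a regex is used. The name isPhoneNumberRe is hypothetical, and the ^ and $ anchors are included so that the whole string must be a phone number, as in the manual version.

```javascript
// Hypothetical helper: regex version of the manual isPhoneNumber check.
// ^ and $ anchor the pattern so the entire string must be a phone number.
function isPhoneNumberRe(text) {
  return /^\d{3}-\d{3}-\d{4}$/.test(text);
}

console.log(isPhoneNumberRe("415-555-4242"));       // true
console.log(isPhoneNumberRe("415-555-42420"));      // false (13 characters)
console.log(isPhoneNumberRe("Call 415-555-4242"));  // false (extra text)
```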

Basic regex workflow

Use a regex literal (/pattern/) or new RegExp("pattern"). Call .match() on a string to get the match (or null if there is none), or .test() on the regex to get true or false.

Example:

const phoneRe = /\d{3}-\d{3}-\d{4}/;
const mo = "My number is 415-555-4242.".match(phoneRe);
console.log(mo[0]);    // '415-555-4242'
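
Because .match() returns null when nothing matches, indexing the result directly can throw. The sketch below shows one way to guard against that, alongside .test() and the new RegExp() form; the variable names are illustrative only.

```javascript
const phonePattern = /\d{3}-\d{3}-\d{4}/;

// .test() only answers yes or no.
console.log(phonePattern.test("My number is 415-555-4242."));  // true
console.log(phonePattern.test("No digits here."));             // false

// .match() returns null when nothing matches, so guard before indexing.
const found = "No digits here.".match(phonePattern);
console.log(found);                     // null
console.log(found?.[0] ?? "no match");  // 'no match'

// Same pattern built from a string; note the doubled backslashes.
const fromString = new RegExp("\\d{3}-\\d{3}-\\d{4}");
console.log("My number is 415-555-4242.".match(fromString)[0]);  // '415-555-4242'
```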

Groups with parentheses

Parentheses create capture groups so you can pull out parts of a match (index 1, 2, etc.).

Example:

const phoneRe = /(\d{3})-(\d{3}-\d{4})/;
const mo = "My number is 415-555-4242.".match(phoneRe);
console.log(mo[1]);      // '415'
console.log(mo[2]);      // '555-4242'

You can destructure groups into variables:

const [, area, rest] = mo;

Escaping special characters

Characters like ()[]{}.+*?^$|\ have special meanings in regex; prefix with \ to match them literally.

Example (match (415) 555-4242):

const pattern = /(\(\d{3}\)) (\d{3}-\d{4})/;
const mo = "My phone number is (415) 555-4242.".match(pattern);
console.log(mo[1]);   // '(415)'
console.log(mo[2]);   // '555-4242'
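
When the text to match comes from a variable, escaping by hand is not practical. A common approach is a small helper that backslash-escapes every special character before the string is passed to new RegExp(). The helper name escapeRegExp below is hypothetical and not a built-in used by this chapter.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter in a string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const literal = "(415) 555-4242";
const literalRe = new RegExp(escapeRegExp(literal));

console.log(escapeRegExp(literal));  // \(415\) 555-4242
console.log(literalRe.test("My phone number is (415) 555-4242."));  // true
```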

Alternation with `|`

| means "this or that", and you can combine it with groups for shared prefixes.

Example:

const petRe = /Cat(erpillar|astrophe|ch|egory)/;
const mo = "Catch me if you can.".match(petRe);
console.log(mo[0]);    // 'Catch'
console.log(mo[1]);    // 'ch'
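
Two details of alternation are easy to miss: an ungrouped | splits the entire pattern, and alternatives are tried left to right, so a shorter alternative listed first can win over a longer one. A small sketch with made-up patterns:

```javascript
// Ungrouped: the whole pattern is "Cat" OR "Dog"; the leftmost match in the text wins.
const catOrDog = /Cat|Dog/;
console.log("Dog and Cat".match(catOrDog)[0]);  // 'Dog'

// Grouped: alternatives are tried left to right at each position.
const shorterFirst = /Cat(ch|cher)/;
console.log("Catcher".match(shorterFirst)[0]);  // 'Catch'

const longerFirst = /Cat(cher|ch)/;
console.log("Catcher".match(longerFirst)[0]);   // 'Catcher'
```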

`.match()` vs `.matchAll()` (search vs findall)

.match(regex) without the g flag returns the first match with its groups (or null).
.match(regex) with the g flag returns all matches as a flat array of strings (no groups), or null if there are none.
.matchAll(regex) requires the g flag (it throws a TypeError otherwise) and returns an iterator of full match objects (with groups).

Example (no groups, g flag):

const pattern = /\d{3}-\d{3}-\d{4}/g;
console.log("Cell: 415-555-9999 Work: 212-555-0000".match(pattern));
// ['415-555-9999', '212-555-0000']

With groups (use matchAll):

const pattern2 = /(\d{3})-(\d{3})-(\d{4})/g;
for (const mo of "Cell: 415-555-9999 Work: 212-555-0000".matchAll(pattern2)) {
  console.log(mo[1], mo[2], mo[3]);
}
// '415' '555' '9999'
// '212' '555' '0000'
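
Since .matchAll() returns an iterator rather than an array, it is often convenient to collect the results first. The sketch below uses spread syntax and Array.from for that; the variable names are illustrative.

```javascript
const numbersRe = /(\d{3})-(\d{3})-(\d{4})/g;
const contactText = "Cell: 415-555-9999 Work: 212-555-0000";

// Spread the iterator into an array, then keep only the captured groups.
const groupRows = [...contactText.matchAll(numbersRe)].map((m) => m.slice(1));
console.log(groupRows);
// [['415', '555', '9999'], ['212', '555', '0000']]

// Each match object also records where the match starts.
const startIndexes = Array.from(contactText.matchAll(numbersRe), (m) => m.index);
console.log(startIndexes);  // [6, 25]
```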

Character classes `[...]` and negated `[^...]`

[...] matches any one character from the set; [^...] matches any character not in the set.

Example (vowels vs non-vowels):

const vowelRe = /[aeiouAEIOU]/g;
console.log("RoboCop eats BABY FOOD.".match(vowelRe));
// ['o', 'o', 'o', 'e', 'a', 'A', 'O', 'O']

const consonantRe = /[^aeiouAEIOU]/g;
console.log("RoboCop eats BABY FOOD.".match(consonantRe));
// includes consonants, spaces, punctuation

Shorthand character classes `\d \w \s` and opposites

Same shorthand classes as Python, except that JavaScript's \d and \w are ASCII-only ([0-9] and [A-Za-z0-9_]):

\d digits, \D non-digits
\w letters/digits/underscore, \W non-\w
\s whitespace, \S non-whitespace

Example:

const pattern = /\d+\s\w+/g;
console.log("12 drummers, 11 pipers, 10 lords".match(pattern));
// ['12 drummers', '11 pipers', '10 lords']
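
The negated shorthands work the same way. A brief sketch on a made-up sample string:

```javascript
const roomText = "Room 101, Floor 7";

// \D matches every non-digit, so removing those leaves only the digits.
console.log(roomText.replace(/\D/g, ""));  // '1017'

// \S+ matches runs of non-whitespace characters.
console.log(roomText.match(/\S+/g));       // ['Room', '101,', 'Floor', '7']

// \W+ matches runs of characters that are not letters, digits, or underscore.
console.log(roomText.match(/\W+/g));       // [' ', ', ', ' ']
```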

Dot `.` wildcard

. matches any single character except a line terminator such as \n.

Example:

const atRe = /.at/g;
console.log("The cat in the hat sat on the flat mat.".match(atRe));
// ['cat', 'hat', 'sat', 'lat', 'mat']

To match a literal dot, use \..
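
The difference between an escaped and an unescaped dot shows up as soon as the text contains something other than a period in that position. A short sketch with a made-up sample string:

```javascript
const versionText = "Version 3.14 vs 3x14";

// Unescaped dot: any character may sit between the digits.
console.log(versionText.match(/\d+.\d+/g));   // ['3.14', '3x14']

// Escaped dot: only a literal period is accepted.
console.log(versionText.match(/\d+\.\d+/g));  // ['3.14']
```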

Quantifiers: `?`, `*`, `+`, `{m,n}`

Quantifiers say how many of the preceding piece to match.

? – 0 or 1 (optional)
* – 0 or more
+ – 1 or more
{m} – exactly m
{m,n} – between m and n (inclusive); {m,} for m or more. JavaScript has no {,n} form (it is read as literal text), so write {0,n} for "at most n".

Examples:

const optEx = /42!?/;           // '42' or '42!'
const starEx = /Eggs( and spam)*/; // 'Eggs', 'Eggs and spam', ...
const plusEx = /(Ha)+/;         // 'Ha', 'HaHa', ...
const countEx = /(Ha){3,5}/;   // 3 to 5 'Ha'

Use parentheses if you want the quantifier to apply to a whole group, not just one char.
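
To see the quantifiers in action, the sketch below runs a few patterns against sample strings. The last lines illustrate that JavaScript, without the u flag, reads {,3} as literal characters rather than as "up to 3".

```javascript
console.log("42!".match(/42!?/)[0]);  // '42!'
console.log("42".match(/42!?/)[0]);   // '42'

// The group makes * apply to the whole phrase " and spam".
console.log("Eggs and spam and spam".match(/Eggs( and spam)*/)[0]);
// 'Eggs and spam and spam'

console.log("HaHa".match(/(Ha){3,5}/));         // null (only two repeats)
console.log("HaHaHaHa".match(/(Ha){3,}/)[0]);   // 'HaHaHaHa'

// "At most 3 digits" is written {0,3}.
console.log(/^\d{0,3}$/.test("12"));    // true
console.log(/^\d{0,3}$/.test("1234"));  // false

// {,3} is not a quantifier here: it matches the literal text "{,3}".
console.log(/^\d{,3}$/.test("12"));     // false
console.log(/^\d{,3}$/.test("1{,3}"));  // true
```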

Greedy vs non-greedy (`?` after quantifier)

*, +, {m,n} are greedy: they match as much as possible; adding ? makes them lazy (shortest possible).

Example:

const greedy = /(Ha){3,5}/;
console.log("HaHaHaHaHa".match(greedy)[0]);   // 'HaHaHaHaHa'

const lazy = /(Ha){3,5}?/;
console.log("HaHaHaHaHa".match(lazy)[0]);     // 'HaHaHa'

Similarly, .* is greedy, .*? is lazy.

`.*` and `.*?` (match "anything")

.* means "any chars, 0+ times"; use in groups to capture "whatever is here".

Example (First/Last name):

const nameRe = /First Name: (.*) Last Name: (.*)/;
const mo = "First Name: Al Last Name: Sweigart".match(nameRe);
console.log(mo[1]);   // 'Al'
console.log(mo[2]);   // 'Sweigart'

Example of greedy vs lazy tags:

const lazyTag = /<.*?>/;
const greedyTag = /<.*>/;
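
Running both tag patterns on the same text shows the difference. The sample string below is made up for illustration and deliberately contains two closing angle brackets.

```javascript
const tagText = "<To serve man> for dinner.>";

// Lazy: stops at the first '>' it can.
console.log(tagText.match(/<.*?>/)[0]);  // '<To serve man>'

// Greedy: runs on to the last '>' in the string.
console.log(tagText.match(/<.*>/)[0]);   // '<To serve man> for dinner.>'
```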

Matching newlines with the `s` flag

Normally . does not match \n. Add the s (dotAll) flag to let . match newlines too.

Example:

const noNl = /.*/;
console.log("Line1\nLine2".match(noNl)[0]);  // 'Line1'

const withNl = /.*/s;
console.log("Line1\nLine2".match(withNl)[0]);  // whole string

Anchors: `^`, `$`, word boundaries `\b` / `\B`

^ – match at start of string
$ – match at end of string
^...$ – whole string must match
\b – word boundary; \B – not a word boundary

Examples:

const beginsHello = /^Hello/;
const endsDigit = /\d$/;
const allDigits = /^\d+$/;

const wordRe = /\bcat.*?\b/g;
console.log("The cat found a catapult catalog in the catacombs.".match(wordRe));
// ['cat', 'catapult', 'catalog', 'catacombs']

const middleRe = /\Bcat\B/g;
console.log("certificate".match(middleRe));  // ['cat']
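
A few quick checks with .test() make the effect of each anchor visible. The sample strings are illustrative only.

```javascript
const startsWithHello = /^Hello/;
console.log(startsWithHello.test("Hello, world!"));  // true
console.log(startsWithHello.test("He said Hello"));  // false

const endsWithDigit = /\d$/;
console.log(endsWithDigit.test("Your number is 42"));          // true
console.log(endsWithDigit.test("Your number is forty two."));  // false

const onlyDigits = /^\d+$/;
console.log(onlyDigits.test("1234567890"));     // true
console.log(onlyDigits.test("12345xyz67890"));  // false

// \b on both sides matches "cat" only as a standalone word.
console.log("The cat found a catapult.".match(/\bcat\b/g));  // ['cat']
```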

Case-insensitive matching with the `i` flag

const regex = /hello/i;
console.log("HELLO World".match(regex)[0]);  // HELLO

Substitution with `replace()` / `replaceAll()`

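Replace matches with a new string. With a regex, replace() changes only the first match unless the g flag is set; replaceAll() requires the g flag and throws a TypeError without it.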

const result = "Agent Alice gave the documents to Agent Bob."
  .replaceAll(/Agent \w+/g, "REDACTED");
console.log(result);  // REDACTED gave the documents to REDACTED.
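
The replacement can also reuse captured groups, either with $1, $2, ... in the replacement string or with a function that receives the match and its groups. A short sketch, using a made-up sentence:

```javascript
const report = "Agent Alice told Agent Carol that Agent Eve knew.";

// No g flag: only the first match is replaced.
console.log(report.replace(/Agent \w+/, "REDACTED"));
// 'REDACTED told Agent Carol that Agent Eve knew.'

// $1 inserts the first captured group (the agent's initial).
console.log(report.replace(/Agent (\w)\w*/g, "$1****"));
// 'A**** told C**** that E**** knew.'

// A function replacement receives the full match, then each group.
console.log(report.replace(/Agent (\w+)/g, (whole, name) => name.toUpperCase()));
// 'ALICE told CAROL that EVE knew.'
```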

Verbose mode (no built-in, use `new RegExp`)

JavaScript has no verbose flag, but you can build readable patterns by concatenating strings. Because the pieces are ordinary string literals, every regex backslash must be doubled.

const phoneRegex = new RegExp(
  "(" +
    "\\d{3}" +           // area code
    "|\\(\\d{3}\\)" +    // or area code in parens
  ")" +
  "[\\s\\-.]?" +         // separator
  "\\d{3}" +             // first 3 digits
  "[\\s\\-.]" +          // separator
  "\\d{4}"               // last 4 digits
);
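
One possible variation, sketched here as an assumption rather than as the chapter's method, is to write each piece with String.raw so backslashes do not need doubling, and then join the pieces. The names phoneParts and phoneRegexRaw are hypothetical.

```javascript
// Each piece is a raw template string, so \d, \s and \( are written once.
const phoneParts = [
  String.raw`(\d{3}|\(\d{3}\))`,  // area code, with or without parentheses
  String.raw`[\s\-.]?`,           // optional separator
  String.raw`\d{3}`,              // first 3 digits
  String.raw`[\s\-.]`,            // separator
  String.raw`\d{4}`,              // last 4 digits
];
const phoneRegexRaw = new RegExp(phoneParts.join(""));

console.log("Call (415) 555-4242 now".match(phoneRegexRaw)[0]);  // '(415) 555-4242'
console.log("Call 415.555.4242 now".match(phoneRegexRaw)[0]);    // '415.555.4242'
console.log(phoneRegexRaw.test("Call 4155554242 now"));          // false
```

The last call returns false because the second separator is required in this pattern.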

Quick mental model

Regex = mini-language for patterns over text (phone numbers, emails, etc.).
Build a pattern (literal /…/ or new RegExp()), then use .match() / .matchAll() / .test() and group/quantifier tools to extract exactly what you need.

Overall idea of the chapter

JavaScript regex works almost identically to Python's re module. The main differences are the literal syntax (/pattern/flags), flags written as letters (g, i, s), and calling .match() / .matchAll() / .replace() on strings rather than calling methods on compiled re pattern objects.
