Chapter 9 - Text Pattern Matching with Regular Expressions (JavaScript)
Here's a JavaScript-flavoured version of the same concepts, with small JS examples for each idea.
Manually finding patterns vs regex
You can detect patterns (like US phone numbers) with plain string logic, but it's verbose and rigid, while regex does the same with a short pattern.
Example (manual phone check):
function isPhoneNumber(text) {
  if (text.length !== 12) return false;
  for (let i = 0; i < 3; i++) {
    if (!/\d/.test(text[i])) return false;
  }
  if (text[3] !== "-") return false;
  for (let i = 4; i < 7; i++) {
    if (!/\d/.test(text[i])) return false;
  }
  if (text[7] !== "-") return false;
  for (let i = 8; i < 12; i++) {
    if (!/\d/.test(text[i])) return false;
  }
  return true;
}
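Regex equivalent pattern: /^\d{3}-\d{3}-\d{4}$/. Drop the ^ and $ anchors (/\d{3}-\d{3}-\d{4}/) to find a number anywhere inside a longer string instead of checking the whole string.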
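A minimal sketch of the regex replacing the manual check. The function name isPhoneNumberRe is made up for illustration. The ^ and $ anchors make it accept only a string that is exactly a phone number, which mirrors the length check in isPhoneNumber.

```javascript
// Hypothetical regex counterpart to isPhoneNumber(); the name is illustrative.
// ^ and $ pin the pattern to the whole string, mirroring the length check.
function isPhoneNumberRe(text) {
  return /^\d{3}-\d{3}-\d{4}$/.test(text);
}

console.log(isPhoneNumberRe("415-555-4242"));      // true
console.log(isPhoneNumberRe("415-555-42425"));     // false
console.log(isPhoneNumberRe("Call 415-555-4242")); // false
```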
Basic regex workflow
Write the pattern as a regex literal (/pattern/) or build it with new RegExp(). Call .match() on a string to get the match (or null if there is none), or .test() on the regex to get true/false.
Example:
const phoneRe = /\d{3}-\d{3}-\d{4}/;
const mo = "My number is 415-555-4242.".match(phoneRe);
console.log(mo[0]); // '415-555-4242'
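Since .match() returns null when nothing matches, reading mo[0] directly would throw in that case. A small sketch of guarding against that, using a hypothetical helper named findPhone, plus the same pattern built with new RegExp():

```javascript
// Hypothetical helper: returns the first phone number in text, or null.
function findPhone(text) {
  const phoneRe = /\d{3}-\d{3}-\d{4}/;
  const mo = text.match(phoneRe); // null when there is no match
  return mo === null ? null : mo[0];
}

console.log(findPhone("My number is 415-555-4242.")); // '415-555-4242'
console.log(findPhone("No number here."));            // null

// new RegExp() takes the pattern as a string, so each backslash is doubled.
const sameRe = new RegExp("\\d{3}-\\d{3}-\\d{4}");
console.log(sameRe.test("415-555-4242")); // true
```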
Groups with parentheses
Parentheses create capture groups so you can pull out parts of a match (index 1, 2, etc.).
Example:
const phoneRe = /(\d{3})-(\d{3}-\d{4})/;
const mo = "My number is 415-555-4242.".match(phoneRe);
console.log(mo[1]); // '415'
console.log(mo[2]); // '555-4242'
You can destructure groups into variables:
const [, area, rest] = mo;
Escaping special characters
Characters like ()[]{}.+*?^$|\ have special meanings in regex; prefix with \ to match them literally.
Example (match (415) 555-4242):
const pattern = /(\(\d{3}\)) (\d{3}-\d{4})/;
const mo = "My phone number is (415) 555-4242.".match(pattern);
console.log(mo[1]); // '(415)'
console.log(mo[2]); // '555-4242'
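When the text to match comes from a variable, the escaping can be done in code. A sketch using a hypothetical helper named escapeRegExp (not a built-in) that puts a backslash in front of every special character:

```javascript
// Hypothetical helper: backslash-escapes every regex metacharacter in str
// so the result can be passed to new RegExp() and matched literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // $& = the matched char
}

const literal = "(415) 555-4242";
const literalRe = new RegExp(escapeRegExp(literal));
console.log(escapeRegExp(literal));                     // '\(415\) 555-4242'
console.log(literalRe.test("Call (415) 555-4242 now")); // true
```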
Alternation with |
| means "this or that", and you can combine it with groups for shared prefixes.
Example:
const petRe = /Cat(erpillar|astrophe|ch|egory)/;
const mo = "Catch me if you can.".match(petRe);
console.log(mo[0]); // 'Catch'
console.log(mo[1]); // 'ch'
.match() vs .matchAll() (search vs findall)
- .match(regex) without the g flag returns the first match (or null).
- .match(regex) with the g flag returns all matches as a flat array (no groups).
- .matchAll(regex) with g returns an iterator of full match objects (with groups).
Example (no groups, g flag):
const pattern = /\d{3}-\d{3}-\d{4}/g;
console.log("Cell: 415-555-9999 Work: 212-555-0000".match(pattern));
// ['415-555-9999', '212-555-0000']
With groups (use matchAll):
const pattern2 = /(\d{3})-(\d{3})-(\d{4})/g;
for (const mo of "Cell: 415-555-9999 Work: 212-555-0000".matchAll(pattern2)) {
  console.log(mo[1], mo[2], mo[3]);
}
// '415' '555' '9999'
// '212' '555' '0000'
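Because .matchAll() returns an iterator, it is often handy to turn the results into an array. A sketch, with a hypothetical helper name and object shape chosen for illustration:

```javascript
// Hypothetical helper: collects every phone number in text as an object.
function extractPhones(text) {
  const re = /(\d{3})-(\d{3})-(\d{4})/g; // matchAll() requires the g flag
  return Array.from(text.matchAll(re), (mo) => ({
    full: mo[0],     // whole match
    area: mo[1],     // first capture group
    index: mo.index, // position of the match in text
  }));
}

console.log(extractPhones("Cell: 415-555-9999 Work: 212-555-0000"));
// [ { full: '415-555-9999', area: '415', index: 6 },
//   { full: '212-555-0000', area: '212', index: 25 } ]
```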
Character classes [...] and negated [^...]
[...] matches any one character from the set; [^...] matches any character not in the set.
Example (vowels vs non-vowels):
const vowelRe = /[aeiouAEIOU]/g;
console.log("RoboCop eats BABY FOOD.".match(vowelRe));
// ['o', 'o', 'o', 'e', 'a', 'A', 'O', 'O']
const consonantRe = /[^aeiouAEIOU]/g;
console.log("RoboCop eats BABY FOOD.".match(consonantRe));
// includes consonants, spaces, punctuation
Shorthand character classes \d \w \s and opposites
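Mostly the same built-ins as Python, though in JavaScript \d and \w are ASCII-only by default (\w does not match é, for example):
- \d digits, \D non-digits
- \w letters/digits/underscore, \W non-\w
- \s whitespace, \S non-whitespace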
Example:
const pattern = /\d+\s\w+/g;
console.log("12 drummers, 11 pipers, 10 lords".match(pattern));
// ['12 drummers', '11 pipers', '10 lords']
Dot . wildcard
. matches any single character except a line terminator such as \n.
Example:
const atRe = /.at/g;
console.log("The cat in the hat sat on the flat mat.".match(atRe));
// ['cat', 'hat', 'sat', 'lat', 'mat']
To match a literal dot, use \..
Quantifiers: ?, *, +, {m,n}
Quantifiers say how many of the preceding piece to match.
- ?: 0 or 1 (optional)
- *: 0 or more
- +: 1 or more
- {m}: exactly m
- {m,n}: between m and n (inclusive)
- {m,}: m or more. Unlike Python, JavaScript has no {,n} form; write {0,n} instead.
Examples:
const optEx = /42!?/; // '42' or '42!'
const starEx = /Eggs( and spam)*/; // 'Eggs', 'Eggs and spam', ...
const plusEx = /(Ha)+/; // 'Ha', 'HaHa', ...
const countEx = /(Ha){3,5}/; // 3 to 5 'Ha'
Use parentheses if you want the quantifier to apply to a whole group, not just one char.
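A quick sketch that runs each quantifier against sample text. The helper firstMatch is hypothetical and only keeps the calls short. The last two lines show that, without the u flag, JavaScript treats {,3} as literal text rather than as a quantifier.

```javascript
// Hypothetical helper: returns the matched text, or null if nothing matched.
function firstMatch(re, text) {
  const mo = text.match(re);
  return mo ? mo[0] : null;
}

console.log(firstMatch(/42!?/, "42!"));                 // '42!'
console.log(firstMatch(/Eggs( and spam)*/, "Eggs and spam and spam"));
// 'Eggs and spam and spam'
console.log(firstMatch(/(Ha)+/, "HaHaHa!"));            // 'HaHaHa'
console.log(firstMatch(/(Ha){3}/, "HaHa"));             // null
console.log(firstMatch(/\d{2,}/, "7 or 1234"));         // '1234'
console.log(firstMatch(/^\d{0,3}$/, "1234"));           // null

// Without the u flag, {,3} is not a quantifier: it matches itself literally.
console.log(firstMatch(/a{,3}/, "aaa"));                // null
console.log(firstMatch(/a{,3}/, "a{,3}"));              // 'a{,3}'
```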
Greedy vs non-greedy (? after quantifier)
*, +, {m,n} are greedy: they match as much as possible; adding ? makes them lazy (shortest possible).
Example:
const greedy = /(Ha){3,5}/;
console.log("HaHaHaHaHa".match(greedy)[0]); // 'HaHaHaHaHa'
const lazy = /(Ha){3,5}?/;
console.log("HaHaHaHaHa".match(lazy)[0]); // 'HaHaHa'
Similarly, .* is greedy, .*? is lazy.
.* and .*? (match "anything")
.* means "any chars, 0+ times"; use in groups to capture "whatever is here".
Example (First/Last name):
const nameRe = /First Name: (.*) Last Name: (.*)/;
const mo = "First Name: Al Last Name: Sweigart".match(nameRe);
console.log(mo[1]); // 'Al'
console.log(mo[2]); // 'Sweigart'
Example of greedy vs lazy tags:
const lazyTag = /<.*?>/;
const greedyTag = /<.*>/;
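To see the difference between the two tag patterns, here is a small sketch run against a sample string with two closing brackets (the sample text and variable names are chosen for illustration):

```javascript
const sample = "<To serve man> for dinner.>";

// Lazy: stops at the first '>' it can.
const lazyResult = sample.match(/<.*?>/)[0];
console.log(lazyResult);   // '<To serve man>'

// Greedy: runs on to the last '>' in the string.
const greedyResult = sample.match(/<.*>/)[0];
console.log(greedyResult); // '<To serve man> for dinner.>'
```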
Matching newlines with the s flag
Normally . does not match \n. Add the s (dotAll) flag to let . match newlines too.
Example:
const noNl = /.*/;
console.log("Line1\nLine2".match(noNl)[0]); // 'Line1'
const withNl = /.*/s;
console.log("Line1\nLine2".match(withNl)[0]); // whole string
Anchors: ^, $, word boundaries \b / \B
- ^: match at start of string
- $: match at end of string
- ^...$: whole string must match
- \b: word boundary; \B: not a word boundary
Examples:
const beginsHello = /^Hello/;
const endsDigit = /\d$/;
const allDigits = /^\d+$/;
const wordRe = /\bcat.*?\b/g;
console.log("The cat found a catapult catalog in the catacombs.".match(wordRe));
// ['cat', 'catapult', 'catalog', 'catacombs']
const middleRe = /\Bcat\B/g;
console.log("certificate".match(middleRe)); // ['cat']
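The three anchored patterns are most useful with .test(). A brief sketch, wrapping each in a small function whose name is made up for illustration:

```javascript
// Hypothetical helper names; each wraps one anchored pattern in .test().
const startsWithHello = (s) => /^Hello/.test(s);
const endsWithDigit = (s) => /\d$/.test(s);
const isAllDigits = (s) => /^\d+$/.test(s);

console.log(startsWithHello("Hello, world!"));    // true
console.log(startsWithHello("He said Hello."));   // false
console.log(endsWithDigit("Your number is 42"));  // true
console.log(endsWithDigit("Your number is 42.")); // false
console.log(isAllDigits("1234567890"));           // true
console.log(isAllDigits("12345xyz67890"));        // false
console.log(isAllDigits(""));                     // false: + needs one digit
```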
Case-insensitive matching with the i flag
const regex = /hello/i;
console.log("HELLO World".match(regex)[0]); // HELLO
Substitution with replace() / replaceAll()
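Replace matches with a new string. replace() with a regex that lacks the g flag replaces only the first match; replaceAll() requires the g flag when given a regex.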
const result = "Agent Alice gave the documents to Agent Bob."
.replaceAll(/Agent \w+/g, "REDACTED");
console.log(result); // REDACTED gave the documents to REDACTED.
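The replacement can also reuse parts of the match. A sketch showing $1 (the text of group 1) in a replacement string, a function used as the replacement, and what happens without the g flag; the variable names are chosen for illustration:

```javascript
const sentence = "Agent Alice gave the documents to Agent Bob.";

// $1 in the replacement string inserts the text captured by group 1.
const censored = sentence.replace(/Agent (\w)\w*/g, "$1****");
console.log(censored); // 'A**** gave the documents to B****.'

// A function replacement receives the full match, then each group.
const shouted = sentence.replace(/Agent (\w+)/g, (match, name) =>
  name.toUpperCase()
);
console.log(shouted); // 'ALICE gave the documents to BOB.'

// Without the g flag, replace() changes only the first match.
const firstOnly = sentence.replace(/Agent \w+/, "REDACTED");
console.log(firstOnly); // 'REDACTED gave the documents to Agent Bob.'
```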
Verbose mode (no built-in, use new RegExp)
JavaScript has no verbose flag, but you can build readable patterns by concatenating strings.
const phoneRegex = new RegExp(
  "(" +
    "\\d{3}" +          // area code
    "|\\(\\d{3}\\)" +   // or area code in parens
  ")" +
  "[\\s\\-.]?" +        // separator
  "\\d{3}" +            // first 3 digits
  "[\\s\\-.]" +         // separator
  "\\d{4}"              // last 4 digits
);
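One possible variation, sketched here as an assumption rather than the chapter's own approach: String.raw keeps backslashes as written, so the pieces do not need doubled backslashes. The name phoneRegexRaw is made up for illustration.

```javascript
// Same pattern built from String.raw pieces, so each \ is written once.
const phoneRegexRaw = new RegExp(
  [
    String.raw`(\d{3}|\(\d{3}\))`, // area code, with or without parens
    String.raw`[\s\-.]?`,          // optional separator
    String.raw`\d{3}`,             // first 3 digits
    String.raw`[\s\-.]`,           // separator
    String.raw`\d{4}`,             // last 4 digits
  ].join("")
);

console.log("Call (415) 555-4242 today".match(phoneRegexRaw)[0]);
// '(415) 555-4242'
console.log("415.555.4242".match(phoneRegexRaw)[0]); // '415.555.4242'
console.log(phoneRegexRaw.test("5554242"));          // false
```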
Quick mental model
- Regex = mini-language for patterns over text (phone numbers, emails, etc.).
- Build a pattern (literal /…/ or new RegExp()), then use .match() / .matchAll() / .test() and group/quantifier tools to extract exactly what you need.
Overall idea of the chapter
JavaScript regex works much like Python's re module. The main differences are the /pattern/flags literal syntax, flags written as letters (g, i, s), the lack of a verbose mode, and the use of string methods (.match(), .matchAll(), .replace()) rather than re.compile() objects.