Chapter 22 - Recognizing Text in Images (JavaScript)
Here's a JavaScript-flavoured version of the same concepts, with small JS/Node examples for each idea.
What OCR is and what this chapter uses
Same idea: extract text from images so you can work with it as strings. In JavaScript:
- tesseract.js – a pure JavaScript/WebAssembly port of Tesseract that runs in Node and the browser (no native binary needed).
- sharp (from Chapter 21) for image preprocessing.
- For searchable PDFs, you can still call NAPS2 via child_process, or use pdf-lib to embed text layers.
Installing and setup

Unlike Python's PyTesseract, which requires a separate Tesseract binary, tesseract.js is self-contained: it downloads language data automatically on first use.

```shell
npm install tesseract.js
```

```javascript
const Tesseract = require("tesseract.js");
```

No PATH configuration needed. Language .traineddata files are fetched from a CDN by default (or you can bundle them locally).
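If you want to bundle the language data locally (for offline use or reproducible builds), tesseract.js accepts a langPath option pointing at a directory of .traineddata.gz files. A sketch, assuming you have downloaded eng.traineddata.gz into a hypothetical ./lang-data directory yourself:

```javascript
const Tesseract = require("tesseract.js");

async function recognizeOffline(imagePath) {
  // Load language data from ./lang-data instead of the default CDN.
  const worker = await Tesseract.createWorker("eng", 1, {
    langPath: "./lang-data", // directory containing eng.traineddata.gz
    cacheMethod: "none",     // skip re-caching the locally bundled data
  });
  const { data } = await worker.recognize(imagePath);
  await worker.terminate();
  return data.text;
}
```

The second argument (1) selects the LSTM engine mode; the option names follow the tesseract.js worker options.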
OCR fundamentals with tesseract.js
Basic pattern to get text from an image:
```javascript
const Tesseract = require("tesseract.js");

async function recognize(imagePath) {
  const { data } = await Tesseract.recognize(imagePath, "eng");
  console.log(data.text);
}

recognize("ocr-example.png");
```
- Tesseract.recognize(image, lang) returns a promise that resolves to an object whose data.text property contains the recognized string.
- Works with file paths, Buffers, or URLs.
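To illustrate the Buffer case: if the image is already in memory (say, read from disk or received over the network), you can pass the Buffer straight to recognize. A minimal sketch, reusing the example filename from above:

```javascript
const fs = require("fs");
const Tesseract = require("tesseract.js");

async function recognizeFromBuffer() {
  // Read the image into a Buffer first, then OCR the in-memory bytes.
  const imgBuffer = fs.readFileSync("ocr-example.png");
  const { data } = await Tesseract.recognize(imgBuffer, "eng");
  return data.text;
}
```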
Using a worker for multiple images (better performance)
```javascript
const Tesseract = require("tesseract.js");

async function main() {
  const worker = await Tesseract.createWorker("eng");

  const { data: data1 } = await worker.recognize("page1.png");
  console.log(data1.text);

  const { data: data2 } = await worker.recognize("page2.png");
  console.log(data2.text);

  await worker.terminate();
}

main();
Creating a worker once and reusing it avoids reloading the language model each time.
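For larger batches, tesseract.js also provides a scheduler that distributes jobs across several workers running in parallel. A sketch, assuming the page filenames are just examples:

```javascript
const Tesseract = require("tesseract.js");

async function ocrManyPages(paths) {
  const scheduler = Tesseract.createScheduler();

  // Two workers; the scheduler hands each job to whichever worker is free.
  for (let i = 0; i < 2; i++) {
    scheduler.addWorker(await Tesseract.createWorker("eng"));
  }

  const results = await Promise.all(
    paths.map((p) => scheduler.addJob("recognize", p))
  );

  await scheduler.terminate();
  return results.map((r) => r.data.text);
}

ocrManyPages(["page1.png", "page2.png"]).then(console.log);
```

Two workers is an arbitrary choice here; more workers mean more parallelism but also more memory, since each loads its own copy of the language model.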
Typical OCR issues
Same issues as Python:
- End-of-line hyphenation preserved.
- Layout info (fonts, columns) lost.
- Character misreads (especially numbers).
- Multi-column text gets jumbled.
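The first issue is mechanical enough to fix without an LLM. A sketch of a small post-processing helper that joins end-of-line hyphenation and puts each paragraph on one line (the regexes are illustrative, not exhaustive):

```javascript
// Join words split by end-of-line hyphenation and collapse soft line
// breaks inside paragraphs, keeping blank lines between paragraphs.
function cleanOcrLineBreaks(text) {
  return text
    .replace(/-\n(?=\S)/g, "")          // "recog-\nnized" -> "recognized"
    .replace(/([^\n])\n(?!\n)/g, "$1 ") // single newline -> space
    .replace(/[ \t]+/g, " ")            // collapse runs of spaces
    .trim();
}

console.log(cleanOcrLineBreaks("recog-\nnized text\nhere.\n\nNew para."));
```

Note this happily "fixes" words that legitimately end in a hyphen, which is one reason the chapter later reaches for an LLM instead.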
Preprocessing images (to improve accuracy)
Use sharp to preprocess before OCR:
```javascript
const sharp = require("sharp");
const Tesseract = require("tesseract.js");

async function ocrWithPreprocessing(imagePath) {
  // Convert to grayscale, increase contrast, sharpen
  const processed = await sharp(imagePath)
    .greyscale()
    .normalize() // auto contrast
    .sharpen()
    .toBuffer();

  const { data } = await Tesseract.recognize(processed, "eng");
  return data.text;
}
```
Same guidelines apply:
- Avoid multi-column pages.
- Use typewritten, conventional fonts.
- Rotate so text is perfectly upright.
- Dark text on light background.
- Remove borders and noise.
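Two of these guidelines (upright text, dark-on-light contrast) can be partly automated with sharp as well. A sketch, assuming you already know the page's skew angle; the angle parameter and 128 threshold are illustrative values:

```javascript
const sharp = require("sharp");
const Tesseract = require("tesseract.js");

async function ocrSkewedScan(imagePath, skewDegrees) {
  const processed = await sharp(imagePath)
    .rotate(-skewDegrees, { background: "#ffffff" }) // straighten; fill corners white
    .greyscale()
    .threshold(128) // binarize: pure black text on a white background
    .toBuffer();

  const { data } = await Tesseract.recognize(processed, "eng");
  return data.text;
}
```

sharp does not detect the skew angle for you; estimating it automatically requires a separate deskew step.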
Using LLMs to fix OCR mistakes
Same approach works in JavaScript — send OCR output to an LLM API:
```javascript
// Example using fetch with an LLM API
async function cleanOcrText(rawText) {
  const response = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": process.env.ANTHROPIC_API_KEY,
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-5-20250929",
      max_tokens: 4096,
      messages: [
        {
          role: "user",
          content: `Fix OCR errors in this text. Only fix spacing and character recognition errors. Do not correct original spelling/grammar. Join hyphenated line breaks. Put each paragraph on one line.\n\n${rawText}`,
        },
      ],
    }),
  });

  const data = await response.json();
  return data.content[0].text;
}
```
Always review LLM output — it can miss errors or introduce new ones.
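One cheap mechanical check before the human review: compare the word counts of the raw and cleaned text, since a faithful cleanup should not add or drop much content. A sketch; the 10% tolerance is an arbitrary choice:

```javascript
// Flag LLM cleanups whose length changed suspiciously relative to the input.
function looksFaithful(rawText, cleanedText, tolerance = 0.1) {
  const count = (s) => s.split(/\s+/).filter(Boolean).length;
  const rawWords = count(rawText);
  const cleanWords = count(cleanedText);
  if (rawWords === 0) return cleanWords === 0;
  return Math.abs(cleanWords - rawWords) / rawWords <= tolerance;
}
```

A failed check does not prove the cleanup is wrong, it just tells you which outputs to eyeball first.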
Recognizing non-English text
tesseract.js supports the same language codes. Pass the language when creating a worker or calling recognize:
```javascript
const Tesseract = require("tesseract.js");

// Japanese OCR (inside an async function)
const { data } = await Tesseract.recognize("frankenstein_jpn.png", "jpn");
console.log(data.text);
```

Multiple languages:

```javascript
// English + Japanese
const { data } = await Tesseract.recognize("mixed_text.png", "eng+jpn");
console.log(data.text);
```
Language data is downloaded automatically on first use for each language code.
Creating searchable PDFs with OCR
Option 1: Call NAPS2 via child_process
Same approach as Python — NAPS2 works the same regardless of calling language:
```javascript
const { execSync } = require("child_process");

// macOS
execSync([
  "/Applications/NAPS2.app/Contents/MacOS/NAPS2",
  "console",
  "-i", "frankenstein.png",
  "-o", "output.pdf",
  "--install", "ocr-eng",
  "--ocrlang", "eng",
  "-n", "0",
  "-f",
].join(" "));
```
Option 2: tesseract.js + pdf-lib

tesseract.js can generate a searchable PDF directly by overlaying invisible text on the image. Older releases exposed a worker.getPDF() method for this; in tesseract.js v4 and later you instead request PDF output through recognize():

```javascript
const fs = require("fs");
const Tesseract = require("tesseract.js");

async function createSearchablePdf(imagePath, outputPath) {
  const worker = await Tesseract.createWorker("eng");

  // Ask recognize() for PDF output in addition to text.
  const { data } = await worker.recognize(imagePath, {}, { pdf: true });

  // data.pdf is an array of bytes; write it out as a file.
  fs.writeFileSync(outputPath, Buffer.from(data.pdf));

  await worker.terminate();
}

createSearchablePdf("frankenstein.png", "output.pdf");
```

The generated PDF contains the image with an invisible text layer, equivalent to what NAPS2 produces.
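If you need full control over the text layer, pdf-lib can place each recognized word as invisible text on top of the embedded image. A rough sketch, assuming a PNG input and that word-level boxes were requested from tesseract.js; converting from Tesseract's top-left origin to PDF's bottom-left origin is the fiddly part:

```javascript
const fs = require("fs");
const Tesseract = require("tesseract.js");
const { PDFDocument } = require("pdf-lib");

async function manualSearchablePdf(imagePath, outputPath) {
  const worker = await Tesseract.createWorker("eng");
  // Request block-level output so we can walk down to individual words.
  const { data } = await worker.recognize(imagePath, {}, { blocks: true });
  await worker.terminate();

  const pdfDoc = await PDFDocument.create();
  const png = await pdfDoc.embedPng(fs.readFileSync(imagePath));
  const page = pdfDoc.addPage([png.width, png.height]);
  page.drawImage(png, { x: 0, y: 0, width: png.width, height: png.height });

  // Walk blocks -> paragraphs -> lines -> words; draw each word invisibly.
  for (const block of data.blocks ?? []) {
    for (const para of block.paragraphs) {
      for (const line of para.lines) {
        for (const word of line.words) {
          const { x0, y0, y1 } = word.bbox;
          page.drawText(word.text, {
            x: x0,
            y: png.height - y1, // flip to PDF's bottom-left origin
            size: y1 - y0,      // approximate font size from box height
            opacity: 0,         // invisible but selectable and searchable
          });
        }
      }
    }
  }

  fs.writeFileSync(outputPath, await pdfDoc.save());
}
```

This is more work than the built-in PDF output, but it lets you choose fonts, tweak positioning, or merge pages into an existing pdf-lib document.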
Practice program: Browser Text Scraper
JavaScript equivalent using screenshot-desktop and tesseract.js:
```shell
npm install screenshot-desktop tesseract.js sharp
```
```javascript
const screenshot = require("screenshot-desktop");
const sharp = require("sharp");
const Tesseract = require("tesseract.js");
const fs = require("fs");

// Coordinates for the text portion (adjust as needed)
const LEFT = 400;
const TOP = 200;
const WIDTH = 600; // RIGHT - LEFT
const HEIGHT = 600; // BOTTOM - TOP

async function ocrScreen() {
  // Capture screenshot
  const imgBuffer = await screenshot({ format: "png" });

  // Crop to text region
  const cropped = await sharp(imgBuffer)
    .extract({ left: LEFT, top: TOP, width: WIDTH, height: HEIGHT })
    .toBuffer();

  // Run OCR
  const { data } = await Tesseract.recognize(cropped, "eng");

  // Append to output.txt
  fs.appendFileSync("output.txt", data.text + "\n", "utf-8");
  console.log("Text appended to output.txt");
}

ocrScreen();
```
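To scrape text repeatedly (for instance, from a window whose content keeps changing), the same function can run on a timer. A sketch that assumes the ocrScreen() function defined above is in scope; the 10-second interval is an arbitrary choice:

```javascript
// Run ocrScreen() (defined above) every 10 seconds until interrupted.
const INTERVAL_MS = 10_000;

function startScraping() {
  const timer = setInterval(() => {
    ocrScreen().catch((err) => console.error("OCR failed:", err));
  }, INTERVAL_MS);

  // Stop cleanly on Ctrl-C.
  process.on("SIGINT", () => {
    clearInterval(timer);
    process.exit(0);
  });
}

startScraping();
```

Because each pass appends to output.txt, a long-running scrape will contain duplicates whenever the region hasn't changed; deduplicating afterwards is simpler than trying to detect changes per frame.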
Overall idea of the chapter
Chapter 22 in JavaScript: tesseract.js provides a self-contained OCR engine (no native binary to install) that runs in Node and the browser, supports multiple languages with the "eng+jpn" syntax, and can output searchable PDFs directly. Preprocessing with sharp (greyscale, normalize, sharpen) improves accuracy. For batch PDF creation, NAPS2 can still be called via child_process. LLM APIs can clean up OCR output, but their results always need human review.