Extract text from an image, or transcribe an audio or video file, right in your browser. All processing happens locally, so your file never leaves your computer.
Images are read with Tesseract.js 5.1.1, an in-browser port of Google's Tesseract OCR engine.
Audio and video are transcribed with OpenAI's Whisper models, run by Transformers.js 3.5.1 on ONNX Runtime Web, using WebGPU when it is available and the CPU otherwise.
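A minimal sketch of the WebGPU-or-CPU choice, assuming availability is detected with the standard WebGPU call `navigator.gpu.requestAdapter()`. The `pickDevice` helper and `GpuLike` type are hypothetical names and are not part of Transformers.js. The returned strings match the `device` values Transformers.js 3 accepts for the browser (`"webgpu"` and `"wasm"`, the CPU backend).

```typescript
// Hypothetical helper: choose where ONNX Runtime Web should run the Whisper model.
type Device = "webgpu" | "wasm";

// The only part of navigator.gpu this sketch relies on (injected so it can be faked).
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}

async function pickDevice(gpu: GpuLike | undefined): Promise<Device> {
  // No WebGPU API at all (older browsers): use the CPU/WebAssembly backend.
  if (!gpu) return "wasm";
  try {
    // The API can exist yet still return null when no suitable GPU adapter is found.
    const adapter = await gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure during adapter lookup as "no WebGPU".
    return "wasm";
  }
}
```

In a page this could be called as `pickDevice((navigator as any).gpu)`, with the result passed as the `device` option of Transformers.js's `pipeline("automatic-speech-recognition", modelId, { device })`, where `modelId` stands for whichever Whisper checkpoint is loaded. Checking for an adapter, not just for `navigator.gpu`, avoids choosing WebGPU on browsers that expose the API but have no usable GPU.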
For video, CAF, and other containers the browser can't decode, the audio track is extracted by ffmpeg.wasm (@ffmpeg/ffmpeg 0.12.15, @ffmpeg/core 0.12.10; a WebAssembly build of FFmpeg, GPL v2).
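The fallback described in the previous paragraph can be sketched as follows, assuming the page first tries the browser's own decoder and hands the file to ffmpeg.wasm only when that fails or the file is a video. The `loadAudio` and `ffmpegExtractArgs` helpers and the `Decoder` type are hypothetical, and the 16 kHz mono float output is an assumption chosen because Whisper models consume 16 kHz audio.

```typescript
// Hypothetical decoder signature: raw file bytes in, mono 16 kHz float samples out.
type Decoder = (bytes: ArrayBuffer) => Promise<Float32Array>;

// Arguments for an ffmpeg run that keeps only the audio track and writes
// headerless 32-bit float little-endian PCM, mono, resampled to 16 kHz.
function ffmpegExtractArgs(input: string, output: string): string[] {
  return [
    "-i", input,    // source file (video or any audio container)
    "-vn",          // drop any video stream
    "-ac", "1",     // downmix to one channel
    "-ar", "16000", // resample to 16 kHz
    "-f", "f32le",  // raw float32 little-endian output, no header
    output,
  ];
}

// Try the browser's decoder first; use ffmpeg for video or when native decoding fails.
async function loadAudio(
  bytes: ArrayBuffer,
  mimeType: string,
  nativeDecode: Decoder,
  ffmpegExtract: Decoder,
): Promise<{ samples: Float32Array; via: "native" | "ffmpeg" }> {
  if (!mimeType.startsWith("video/")) {
    try {
      return { samples: await nativeDecode(bytes), via: "native" };
    } catch {
      // The browser rejected this container: fall through to ffmpeg.
    }
  }
  return { samples: await ffmpegExtract(bytes), via: "ffmpeg" };
}
```

With ffmpeg.wasm 0.12, an `ffmpegExtract` implementation could write the bytes with `ffmpeg.writeFile()`, run `ffmpeg.exec(ffmpegExtractArgs(name, "out.pcm"))`, and view the `Uint8Array` returned by `ffmpeg.readFile("out.pcm")` as a `Float32Array`. A `nativeDecode` implementation could use an `AudioContext` created with `sampleRate: 16000`, since `decodeAudioData()` resamples to the context's rate.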
Nothing is uploaded and no server is involved. Engines and models are downloaded once from CDNs and then cached by the browser.