Media to Text

Extract text from an image, or transcribe an audio or video file, right in your browser. Everything runs locally, so your file never leaves your computer.

Drop an image, audio, or video file here

Images are read with Tesseract.js 5.1.1, an in-browser port of Google's Tesseract OCR engine.

Audio & video are transcribed with OpenAI's Whisper models, run by Transformers.js 3.5.1 on ONNX Runtime Web — using WebGPU when available, else CPU.
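The "WebGPU when available, else CPU" fallback could be sketched roughly as below. This is an illustration, not this page's actual code: `pickDevice`, `NavigatorLike`, and `GpuLike` are hypothetical names, and the sketch assumes the CPU path corresponds to the `"wasm"` device string that Transformers.js accepts in browsers.

```typescript
// Hypothetical helper: decide which backend to request for transcription.
// "wasm" is assumed here to be the CPU path (ONNX Runtime Web on WebAssembly).
type Device = "webgpu" | "wasm";

// Minimal structural types so the sketch does not depend on DOM typings.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

async function pickDevice(nav: NavigatorLike): Promise<Device> {
  // No navigator.gpu at all: the browser does not expose WebGPU.
  if (!nav.gpu) return "wasm";
  try {
    // The API can exist while no usable adapter is present,
    // in which case requestAdapter() resolves to null.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure while probing as "WebGPU not available".
    return "wasm";
  }
}
```

In a browser this would be called as `pickDevice(navigator)`, and the result could then be passed as the `device` option when creating the speech-recognition pipeline.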

For video, CAF, and other containers the browser can't decode, the audio track is extracted by ffmpeg.wasm (@ffmpeg/ffmpeg 0.12.15, @ffmpeg/core 0.12.10 — a WebAssembly build of FFmpeg, GPL v2).
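To illustrate the extraction step, here is a minimal sketch of the kind of FFmpeg argument list that pulls an audio track out of a container. It is an assumption, not the page's actual command: `buildExtractArgs` is a hypothetical helper, and the 16 kHz mono WAV target is chosen because Whisper models expect 16 kHz mono input.

```typescript
// Hypothetical helper: build FFmpeg arguments that drop the video stream
// and write the audio track as 16 kHz mono 16-bit PCM.
function buildExtractArgs(inputName: string, outputName: string): string[] {
  return [
    "-i", inputName,      // input file (video, CAF, or another container)
    "-vn",                // ignore any video stream
    "-ac", "1",           // downmix to a single (mono) channel
    "-ar", "16000",       // resample to 16 kHz
    "-c:a", "pcm_s16le",  // uncompressed 16-bit little-endian PCM
    outputName,           // e.g. "out.wav"; container inferred from extension
  ];
}
```

With @ffmpeg/ffmpeg 0.12, an array like this would typically be handed to the `exec` method of an `FFmpeg` instance after the input has been written to its in-memory filesystem with `writeFile`; the resulting WAV can then be read back with `readFile` and decoded by the browser.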

Everything runs on your machine: no server, no upload. Engines and models are downloaded from CDNs once, then cached by the browser.

Run from a local web server