Text to Speech

Type or paste text and turn it into natural-sounding speech, right in your browser. Everything runs locally, so your text never leaves your computer.

Text

Voice

Each voice requires a small (~0.5 MB) sample download, which is cached after first use.

Speech is synthesized with Kokoro-82M, an open-weight text-to-speech model. It runs in the browser through kokoro-js 1.2.1, which is built on Transformers.js 3.5.1 and ONNX Runtime Web. WebGPU is used when available; otherwise the model runs on the CPU.
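The WebGPU-or-CPU choice can be sketched as a small helper. This is an illustration only, not this page's actual detection code: `pickDevice` and `NavigatorLike` are hypothetical names, and the `"webgpu"` and `"wasm"` strings assume the device names kokoro-js accepts in the browser, where the CPU path runs through WebAssembly.

```typescript
// Hypothetical helper; the page's real detection logic is not shown in the source.
type Device = "webgpu" | "wasm";

// Minimal structural stand-in for the browser's `navigator`,
// so the function can be exercised outside a browser.
interface NavigatorLike {
  gpu?: unknown;
}

function pickDevice(nav: NavigatorLike | undefined): Device {
  // Browsers that implement WebGPU expose a `gpu` object on `navigator`.
  // Anything else falls back to the CPU (WebAssembly) backend.
  return nav !== undefined && nav.gpu != null ? "webgpu" : "wasm";
}
```

In a page, the helper would be called with the global `navigator`. This is a presence check only: a stricter version would also request a GPU adapter and fall back to `"wasm"` if none is returned.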

Text is converted to phonemes with phonemizer, then the model generates a 24 kHz waveform that's assembled into a downloadable WAV.
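As a rough illustration of the last step, the sketch below packs a waveform into a WAV file. It assumes mono audio, float samples in the range [-1, 1], and 16-bit PCM output; the page's own encoder is not shown in the source, and `encodeWav` is a hypothetical name. Only the 24 kHz default comes from the description above.

```typescript
// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number = 24000): Uint8Array {
  const bytesPerSample = 2;
  const headerSize = 44;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(headerSize + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  // RIFF container header (all multi-byte fields are little-endian).
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");

  // "fmt " chunk describing the audio format.
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // format 1 = PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);                  // sample rate
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample

  // "data" chunk holding the samples.
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  samples.forEach((sample, i) => {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, sample));
    view.setInt16(headerSize + i * bytesPerSample, Math.round(clamped * 32767), true);
  });

  return new Uint8Array(buffer);
}
```

In a browser, the returned bytes could be wrapped in a `Blob` with type `audio/wav` and passed to `URL.createObjectURL` to produce a download link.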

No server is involved and nothing is uploaded. The model (onnx-community/Kokoro-82M-v1.0-ONNX) and the voices are downloaded once from the Hugging Face Hub and then cached by the browser.

Run from a local web server

Not supported in Safari yet