Type or paste text and turn it into natural-sounding speech, right in your browser. Everything runs locally, so your text never leaves your computer.
Speech is synthesized with Kokoro-82M, an open-weight text-to-speech model. It runs in the browser through kokoro-js 1.2.1, built on Transformers.js 3.5.1 and ONNX Runtime Web, using WebGPU when available and falling back to the CPU otherwise.
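The WebGPU-or-CPU choice can be pictured with a small sketch. This is not the app's actual code: `pickDevice` and `NavigatorLike` are hypothetical names, and the strings "webgpu" and "wasm" are assumed to be the device names passed on to the runtime. It only checks whether the browser exposes `navigator.gpu`; a fuller check would probably also request an adapter before committing to WebGPU.

```typescript
// Hypothetical helper, not taken from the app: pick an execution device.
type Device = "webgpu" | "wasm";

// Minimal stand-in for the part of `navigator` inspected here.
interface NavigatorLike {
  gpu?: unknown;
}

function pickDevice(nav: NavigatorLike | undefined): Device {
  // Browsers with WebGPU support expose a `gpu` object on `navigator`.
  // Without it, fall back to the CPU path (assumed to be WebAssembly).
  return nav !== undefined && nav.gpu !== undefined && nav.gpu !== null
    ? "webgpu"
    : "wasm";
}
```

In a page, this would be called as `pickDevice(navigator)`; passing `undefined` (for example, outside a browser) yields the CPU fallback.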
Text is converted to phonemes with phonemizer, then the model generates a 24 kHz waveform that's assembled into a downloadable WAV.
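As a rough illustration of the last step, the sketch below wraps raw samples in a WAV container. It assumes mono audio held in a `Float32Array` with values in [-1, 1] and writes 16-bit PCM; `encodeWav` is a hypothetical name, and the app's own encoder may differ (for instance, it may store 32-bit float samples instead).

```typescript
// Hypothetical sketch: encode mono float samples as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate = 24000): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + audio data
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = integer PCM
  view.setUint16(22, 1, true); // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    const scaled = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }
  return buffer;
}
```

In a browser, the result could be offered as a download with `new Blob([encodeWav(samples)], { type: "audio/wav" })` and `URL.createObjectURL`.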
No server processes your text and nothing is uploaded. The model (onnx-community/Kokoro-82M-v1.0-ONNX) and voices are downloaded once from the Hugging Face Hub and then cached by the browser.