Describe some music and your browser will compose a short clip — entirely locally, no server, no upload. The first run downloads a ~600 MB model (cached after that), and composing is compute-heavy: expect a minute or two for a short clip.
Music is generated by MusicGen-small, an open-weight text-to-music model from Meta, run in the browser by Transformers.js 3.5.1 on ONNX Runtime Web (CPU).
Your description is encoded with a T5 text encoder. MusicGen's language model then composes EnCodec audio tokens at 50 frames per second of music (each frame holds 4 codebook tokens), which are decoded into a 32 kHz waveform and assembled into a downloadable WAV.
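To make the numbers above concrete, here is a minimal TypeScript sketch of the two mechanical steps in that pipeline: working out how many token frames a clip needs, and wrapping decoded samples in a WAV container. The function names (framesForDuration, encodeWav) are hypothetical and not taken from the app. The sketch assumes mono audio and 16-bit PCM output, and the app's actual WAV writer may differ.

```typescript
// Assumed constants, taken from the description above.
const FRAMES_PER_SECOND = 50; // EnCodec token frames per second of music
const SAMPLE_RATE = 32_000;   // output waveform sample rate in Hz

// Number of token frames the language model must produce for a clip.
function framesForDuration(seconds: number): number {
  return Math.ceil(seconds * FRAMES_PER_SECOND);
}

// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
// (Assumption: mono, 16-bit; hypothetical helper, not the app's own code.)
function encodeWav(samples: Float32Array, sampleRate: number = SAMPLE_RATE): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte canonical header
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);                // remaining file size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // format 1 = PCM
  view.setUint16(22, 1, true);                           // channels = mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample, then scale it to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, Math.round(clamped * 32767), true);
  }
  return new Uint8Array(buffer);
}
```

Under these assumptions, a 10-second clip needs framesForDuration(10) = 500 token frames and decodes to 320,000 samples, which encodeWav turns into a 640,044-byte file (44 header bytes plus 2 bytes per sample).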
Generation runs entirely on your machine: no server, no upload. The quantized model (Xenova/musicgen-small, ~600 MB) downloads once from the Hugging Face Hub and is then cached by the browser.
Note: MusicGen's model weights are released under CC BY-NC 4.0 — generated audio is for non-commercial use.