Music Generation

Describe some music and your browser will compose a short clip — entirely locally, no server, no upload. The first run downloads a ~600 MB model (cached after that), and composing is compute-heavy: expect a minute or two for a short clip.

Describe the music

Duration Longer clips take proportionally longer to compose. Prompt adherence How closely the music follows your description.

Music is generated by MusicGen-small, an open-weight text-to-music model from Meta, run in the browser by Transformers.js 3.5.1 on ONNX Runtime Web (CPU).

Your description is encoded with a T5 text encoder, then MusicGen's language model composes EnCodec audio tokens — 50 per second of music — which are decoded into a 32 kHz waveform and assembled into a downloadable WAV.

Everything runs on your machine — no server, no upload. The quantized model (Xenova/musicgen-small, ~600 MB) downloads once from the Hugging Face Hub, then is cached by the browser.

Note: MusicGen's model weights are released under CC BY-NC 4.0 — generated audio is for non-commercial use.

Run from a local web server