Chat with an open language model running entirely in your browser. Everything runs locally on your machine, so your messages are never sent to a server. Models small enough to run in a browser are still limited, so this is just a demo for now; expect it to improve over time.
Responses come from Gemma 3 1B, an open instruction-tuned model from Google, run in the browser by Transformers.js 4.2.0 on ONNX Runtime Web with WebGPU.
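As a rough illustration of the WebGPU dependency, the sketch below chooses an execution device from a navigator-like object. The `pickDevice` name, the `NavigatorLike` type, and the fallback to `"wasm"` are assumptions made for illustration; nothing here is taken from this page's actual code.

```typescript
// Hypothetical helper: decide which backend to request, based on whether
// the browser exposes the WebGPU entry point (navigator.gpu).
// NavigatorLike is a structural stand-in so the sketch compiles without
// WebGPU type definitions.
interface NavigatorLike {
  gpu?: unknown;
}

type Device = "webgpu" | "wasm";

function pickDevice(nav: NavigatorLike): Device {
  // Presence of `gpu` only shows that the API exists. A fuller check would
  // also await navigator.gpu.requestAdapter() and treat a null adapter as
  // "no usable GPU".
  return nav.gpu != null ? "webgpu" : "wasm";
}

// Stand-in objects instead of the real `navigator`, so this runs anywhere.
console.log(pickDevice({ gpu: {} }));
console.log(pickDevice({}));
```

The returned string is meant to be passed as the `device` option when creating a Transformers.js pipeline. Whether this demo falls back to WebAssembly at all is not stated here, so treat the fallback branch as one possible design rather than a description of the page.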
Inference runs entirely on your machine: no server, no upload. The model (onnx-community/gemma-3-1b-it-ONNX, ~1 GB) is downloaded once from the Hugging Face Hub and then cached by the browser.
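The download-once behaviour follows a general cache-first pattern, sketched below under stated assumptions. `CacheLike`, `MemoryCache`, and `cacheFirst` are hypothetical names, and the in-memory store stands in for the browser's cache; this is not the library's actual caching code.

```typescript
// Structural stand-in for a browser cache, so the sketch also runs
// outside a browser. All names here are illustrative.
interface CacheLike<T> {
  match(url: string): Promise<T | undefined>;
  put(url: string, value: T): Promise<void>;
}

type Downloader<T> = (url: string) => Promise<T>;

// Return the cached copy if there is one; otherwise download once,
// store the result, and return it.
async function cacheFirst<T>(
  url: string,
  cache: CacheLike<T>,
  download: Downloader<T>
): Promise<T> {
  const hit = await cache.match(url);
  if (hit !== undefined) return hit;
  const fresh = await download(url);
  await cache.put(url, fresh);
  return fresh;
}

// Simple in-memory cache used to exercise cacheFirst.
class MemoryCache<T> implements CacheLike<T> {
  private store: Record<string, T> = {};
  async match(url: string): Promise<T | undefined> {
    return this.store[url];
  }
  async put(url: string, value: T): Promise<void> {
    this.store[url] = value;
  }
}
```

With a real browser cache the stored values would be `Response` objects, and a response generally has to be cloned before being stored if its body is also going to be read. The generic type parameter sidesteps that detail here.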