Chat Demo

Chat with an open language model running entirely in your browser. Everything runs locally on your machine, so nothing is sent to a server. Models small enough to run in a browser are still limited, so treat this as a demo for now and expect it to improve over time.

Responses come from Gemma 3 1B, an open instruction-tuned model from Google, run in the browser by Transformers.js 4.2.0 on ONNX Runtime Web with WebGPU.
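As a rough illustration of how a chat turn could flow through a Transformers.js text-generation pipeline, here is a minimal sketch. It is not this demo's actual source. The names `ChatMessage`, `ChatGenerator`, `buildMessages`, and `reply` are hypothetical, and the `max_new_tokens` value of 256 is an arbitrary choice. The generator is passed in as a parameter so that the sketch stays self-contained.

```typescript
// Hypothetical types for illustration; they mirror the chat-style input and
// output shape of a text-generation pipeline but are not library exports.
type ChatMessage = { role: "user" | "assistant"; content: string };

type ChatGenerator = (
  messages: ChatMessage[],
  options?: { max_new_tokens?: number }
) => Promise<Array<{ generated_text: ChatMessage[] }>>;

// Append the new user turn to the running conversation without mutating it.
function buildMessages(history: ChatMessage[], userText: string): ChatMessage[] {
  return [...history, { role: "user", content: userText.trim() }];
}

// Run one chat turn and return the text of the model's reply.
// Assumption: for chat input, generated_text holds the whole conversation
// and its last entry is the newly generated assistant message.
async function reply(
  generator: ChatGenerator,
  history: ChatMessage[],
  userText: string
): Promise<string> {
  const messages = buildMessages(history, userText);
  const output = await generator(messages, { max_new_tokens: 256 });
  const turns = output[0]?.generated_text ?? [];
  const last = turns[turns.length - 1];
  return last ? last.content : "";
}
```

In a browser, the generator would presumably come from the library's `pipeline("text-generation", "onnx-community/gemma-3-1b-it-ONNX", { device: "webgpu", dtype: "q4" })` call, which matches the model, backend, and quantization listed on this page. Creating it once and reusing it for every turn avoids reloading the model.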

Model: Gemma 3 1B (instruction-tuned)
Parameters: ~1 billion
Context window: 32K tokens (32,768)
Quantization: 4-bit (q4)
Download size: ~1 GB
Knowledge cutoff: August 2024
Languages: English

It runs entirely on your machine: no server, no upload. The model (onnx-community/gemma-3-1b-it-ONNX, ~1 GB) downloads once from the Hugging Face Hub and is then cached by the browser.
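To check whether the model files are already in the browser cache, a sketch along these lines may help. It assumes that Transformers.js stores downloads through the browser Cache API under a cache named `transformers-cache`, and that cached request URLs contain the repository id. Both are assumptions that may not hold for every version. The interfaces and the `cachedModelFiles` function are hypothetical names introduced here.

```typescript
// Minimal structural views of the browser Cache API, declared here so the
// sketch is self-contained. The global `caches` object fits this shape.
interface CacheLike {
  keys(): Promise<ReadonlyArray<{ url: string }>>;
}

interface CacheStorageLike {
  has(name: string): Promise<boolean>;
  open(name: string): Promise<CacheLike>;
}

// List cached request URLs that mention the given model repository.
// Assumption: the cache name defaults to "transformers-cache".
async function cachedModelFiles(
  storage: CacheStorageLike,
  repoId: string,
  cacheName: string = "transformers-cache"
): Promise<string[]> {
  // Avoid creating an empty cache as a side effect of open().
  if (!(await storage.has(cacheName))) {
    return [];
  }
  const cache = await storage.open(cacheName);
  const requests = await cache.keys();
  return requests.map((request) => request.url).filter((url) => url.includes(repoId));
}
```

In a page, this could be called as `cachedModelFiles(caches, "onnx-community/gemma-3-1b-it-ONNX")`. An empty result would suggest that the next load has to download the model again.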

Run from a local web server

Not built for mobile