What makes the 'web' model different?
I am trying to learn about finetuning models. I would love to finetune a Gemma model, but I am looking to use it in a web application, which needs special consideration regarding the type of model I use.
In your web version of this model, what special settings or operations are used - I assume during conversion - that make a model web-friendly vs. non-web-friendly? Thank you!
Hi @nlisac ,
Welcome to Gemma models, thanks for reaching out to us. The most direct indicators of web-friendliness are the model's file format and the runtime environment it's designed for.
Standard Model: Typically distributed in formats like PyTorch (.pth, .safetensors) or Hugging Face Transformers, which are designed for powerful server-side GPUs.
Web-Friendly Model: The suffix litert-lm in the model name stands for LiteRT-LM, which is Google's dedicated framework for on-device or edge inference.
- The model is converted into a proprietary, highly optimized format like .litertlm (or sometimes a .task file bundle containing a mix of smaller optimized TFLite files).
- This format is designed to be used with the MediaPipe LLM Inference API or the underlying LiteRT-LM runtime, which can run efficiently within a web worker thread using WebAssembly (Wasm).
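To make this concrete, here is a hedged sketch of what consuming a `.litertlm` model from a web page looks like with the MediaPipe LLM Inference API for Web (the `@mediapipe/tasks-genai` package). The CDN URL, model path, and option values below are placeholders, not official recommendations - check the MediaPipe documentation for the current API surface.

```js
// Sketch: running a converted .litertlm Gemma model in the browser with
// the MediaPipe LLM Inference API (package: @mediapipe/tasks-genai).
// The Wasm CDN URL and model path are placeholder assumptions.
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

async function runGemma() {
  // Resolve the Wasm files that back the on-device runtime.
  const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
  );

  // Load the converted, quantized model bundle served alongside the app.
  const llm = await LlmInference.createFromOptions(genai, {
    baseOptions: { modelAssetPath: '/models/gemma-3n-E2B-it-int4.litertlm' },
    maxTokens: 512,
  });

  const answer = await llm.generateResponse('Why is the sky blue?');
  console.log(answer);
}

runGemma();
```

Because the heavy lifting happens in Wasm, this can be moved off the main thread into a web worker so the UI stays responsive while the model generates.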
The base Gemma 3n architecture already includes features that make it inherently more web-friendly before conversion:
1. Selective Parameter Activation (MatFormer): Gemma 3n uses the Matryoshka Transformer architecture, which allows for the selective activation of only a subset of the total parameters based on the task or device resources. This reduces the effective parameter count (the "E2B" in the model name stands for "effective 2B") and the computational cost per request.
2. Per-Layer Embedding (PLE) Caching: This technique allows key embedding parameters to be cached to fast, local storage, reducing runtime memory requirements.
3. On-Device Focus: The Gemma family, particularly the 3n variants, is specifically engineered by Google for efficient execution on low-resource devices like mobile phones and, by extension, web browsers.
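The Matryoshka idea behind point 1 can be illustrated with a toy example. This is purely illustrative and is not Gemma's actual implementation: the point is just that a larger weight matrix can nest smaller sub-matrices that work on their own, so a constrained device activates only the nested block instead of the full model.

```javascript
// Toy illustration of the Matryoshka (MatFormer) idea: a large weight
// matrix nests smaller sub-matrices, so a resource-constrained device
// can activate only the top-left d×d block. Illustrative only - not
// how Gemma 3n is actually implemented.

// Extract the nested d×d sub-model from a full weight matrix.
function nestedSubMatrix(weights, d) {
  return weights.slice(0, d).map((row) => row.slice(0, d));
}

// Apply a (sub-)matrix to an input vector: y = W · x.
function matVec(weights, x) {
  return weights.map((row) =>
    row.reduce((sum, w, j) => sum + w * x[j], 0)
  );
}

// A 4×4 "full" model; the 2×2 top-left block is the nested small model.
const full = [
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0],
  [0, 0, 0, 1],
];

const small = nestedSubMatrix(full, 2); // [[1, 0], [0, 1]]
const y = matVec(small, [3, 4]);        // [3, 4]
console.log(small, y);
```

The same input can be run through either the full matrix or the nested block; the nested block costs a fraction of the multiply-adds, which is the trade-off MatFormer exploits per task or per device.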
Thanks.
Thanks for the detailed answer.
A follow up question from my side:
What's the difference then between gemma-3n-E2B-it-int4-Web.litertlm and gemma-3n-E2B-it-int4.litertlm? Both models are already in the litertlm format.
Could you please clarify the sources from which you are referencing these two models?
Thanks.
@BalakrishnaCh
From this repository (the Files tab):
https://huggingface.co/google/gemma-3n-E2B-it-litert-lm/tree/main