Commit 6f1208d
Parent(s): 61e3c79

Add paper link to model card

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
README.md (CHANGED)
@@ -17,6 +17,8 @@ pipeline_tag: image-text-to-text
 # gemma-3-4b-it-qat-4bit-mobile
 
+> **Paper**: [On-Device Multimodal LLM Optimization: Fitting Gemma 3 into 2 GB](https://atomgradient.github.io/swift-gemma-cli/)
+
 Aggressively optimized version of [gemma-3-4b-it-qat-4bit](https://huggingface.co/mlx-community/gemma-3-4b-it-qat-4bit) for iPhone/iPad (8 GB RAM). Reduces model size from 2.8 GB to 2.1 GB with split weights for text-only lazy loading, significantly lower runtime memory, and reduced thermal output.
 
 ## Optimizations Applied