---
license: apache-2.0
tags:
- conversational
- efficient
- i3-architecture
- custom_code
datasets:
- starhopp3r/TinyChat
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# i3 Model - Ultra-Efficient Pretraining Language Model
## Model Description
The **i3 Model** is designed to optimize **pretraining efficiency** while retaining core language modeling capabilities.
Its architecture allows training on **memory-constrained hardware**, including CPU-only setups, without sacrificing sequence modeling performance.
> [!NOTE]
> The i3 architecture is built for highly efficient pretraining: it is designed to **reduce memory usage**, **speed up training**, and enable pretraining from scratch on tiny hardware. Internal details are abstracted for simplicity.
---
## Use
```python
from transformers import pipeline
pipe = pipeline("text-generation", model="FlameF0X/i3-12m")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
---
## Model Statistics
* **Vocabulary Size:** 4,466 (variable-length chunks)
* **Hidden Dimension:** 512
* **Number of Layers:** 12
* **Max Sequence Length:** 256
* **Total Parameters:** 12,691,186
* **Tokenization:** Memory-efficient variable-length chunking (2–3 characters)
* **Total tokens:** 334,524,736
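The exact chunking rules of the i3 tokenizer are not documented here, but a greedy 2–3 character chunker can be sketched as follows (`chunk_text` is a hypothetical illustration, not the model's tokenizer):

```python
def chunk_text(text, max_len=3):
    """Greedy chunker: take 3-char pieces, splitting the tail as 2+2 when a
    lone trailing character would otherwise remain, so every chunk of a
    text with length >= 2 is 2-3 characters long. Illustrative only."""
    chunks = []
    i = 0
    while i < len(text):
        remaining = len(text) - i
        # If exactly 4 chars remain, take 2 now so the tail splits 2+2.
        take = max_len if remaining != max_len + 1 else 2
        chunks.append(text[i:i + take])
        i += take
    return chunks

print(chunk_text("hello world"))  # ['hel', 'lo ', 'wor', 'ld']
```

Chunks of 2–3 characters keep the vocabulary small (4,466 entries here) compared with word-level tokenization, at the cost of longer token sequences per text.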
---
## Key Features
1. **Memory-Optimized:** Designed to train on tiny hardware with minimal RAM usage
2. **Pretraining-Focused Architecture:** i3 layers provide efficient sequence modeling, low-rank linear updates, and factorized attention
3. **Variable-Length Tokenization:** 2β3 character chunks for compact embeddings
4. **Conversational Readiness:** Optimized for dialogue and text generation
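The memory savings from low-rank linear updates (feature 2) come from factorizing a full weight matrix `W` into two skinny matrices. A minimal NumPy sketch, using the card's hidden dimension of 512 and an assumed rank of 32 (the actual rank used by i3 is not stated):

```python
import numpy as np

d_in, d_out, rank = 512, 512, 32  # rank 32 is an assumed value for illustration

# Full linear layer: one d_out x d_in weight matrix.
full_params = d_out * d_in                    # 512 * 512 = 262,144 weights

# Low-rank factorization W ~= B @ A with B: (d_out, r) and A: (r, d_in).
low_rank_params = d_out * rank + rank * d_in  # 2 * 512 * 32 = 32,768 weights

rng = np.random.default_rng(0)
B = rng.normal(size=(d_out, rank)) * 0.01
A = rng.normal(size=(rank, d_in)) * 0.01
x = rng.normal(size=d_in)

# Applying A then B gives the same output shape as a full linear layer.
y = B @ (A @ x)
print(f"full: {full_params:,}  low-rank: {low_rank_params:,} "
      f"({full_params // low_rank_params}x fewer weights)")
```

At rank 32 this stores 8x fewer weights per 512x512 projection; lower ranks save more memory but constrain the layer's expressiveness.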
---
## i3 Architecture (Abstract Overview)
### Design Philosophy
The i3 model targets **CPU-friendly, memory-constrained pretraining**, emphasizing:
* Long-range sequence modeling
* Low-rank weight updates for memory savings
* Efficient factorized attention
* 4-bit weights and microbatching for minimal memory footprint
### Open-source technologies used in the i3 architecture
* [Low-Rank Pre-training](https://github.com/FlameF0X/Low-Rank-Pretraining) - LoRA-style low-rank factorization applied to pretraining.
### Conceptual Layout
```
Input Tokens
β
+-----------------+
| Embedding Layer |
+-----------------+
β
+-----------------+
| i3 Architecture |
+-----------------+
β
+------------------------+
| KQV Low-Rank Attention |
+------------------------+
β
+-----------------------+
| LayerNorm + Residuals |
+-----------------------+
β
+-------------------+
| Output Projection |
+-------------------+
β
Predicted Tokens
```
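The "KQV Low-Rank Attention" stage above can be sketched by routing each of the K/Q/V projections through a rank bottleneck. This is an illustrative single-head NumPy sketch with an assumed rank of 16; i3's actual layer internals are abstracted and may differ:

```python
import numpy as np

def low_rank_attention(x, rank=16, seed=0):
    """Single-head attention where each d x d projection is replaced by
    two skinny matrices (d x r and r x d). Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    seq_len, d = x.shape

    def proj(x):
        # Factorized projection: down to rank r, then back up to d.
        down = rng.normal(size=(d, rank)) / np.sqrt(d)
        up = rng.normal(size=(rank, d)) / np.sqrt(rank)
        return x @ down @ up

    q, k, v = proj(x), proj(x), proj(x)
    scores = q @ k.T / np.sqrt(d)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

x = np.random.default_rng(1).normal(size=(8, 64))  # (seq_len, hidden)
out = low_rank_attention(x)
```

The attention math is unchanged; only the projection matrices are factorized, which is where the memory savings come from.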
> Key idea: Every component is optimized for **memory efficiency** and **pretraining speed** on small hardware, while preserving essential transformer dynamics.
---
## Training Details
* **Sequence length:** 128–512 tokens
* **Model size:** ~12M parameters (CPU-friendly)
* **Optimizer:** AdamW or Lion (4-bit / mixed precision)
* **Dataset:** TinyChat (~50–200 MB)
* **Training loop:** gradient checkpointing + recomputation
* **Objective:** token prediction / text generation
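The microbatching idea referenced above (accumulate gradients over small chunks, then take one optimizer step per effective batch) can be sketched with a toy linear-regression loop in NumPy. This is illustrative only, not the i3 training code:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(4)                                  # model weights
X = rng.normal(size=(64, 4))                     # toy dataset
y = X @ np.array([1.0, -2.0, 0.5, 3.0])          # targets from known weights

micro_batch, lr = 8, 0.05
for epoch in range(200):
    grad_sum = np.zeros_like(w)
    # Process the batch in micro-batches so peak memory stays bounded;
    # each chunk contributes its share of the full-batch MSE gradient.
    for start in range(0, len(X), micro_batch):
        xb = X[start:start + micro_batch]
        yb = y[start:start + micro_batch]
        grad_sum += xb.T @ (xb @ w - yb) / len(X)
    w -= lr * grad_sum  # one update per full pass over the data
```

The gradient is mathematically identical to a full-batch step; only the peak activation memory changes, which is what makes CPU-only pretraining feasible.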
## Citation
```bibtex
@software{lorpt2025,
title={LoRPt: Low-Rank Pretraining for Resource-Efficient Language Models},
author={FlameF0X},
year={2025},
url={https://github.com/FlameF0X/Low-Rank-Pretraining}
}
``` |