Update README.md
README.md
CHANGED
@@ -120,7 +120,7 @@ Dhara-70M is a novel diffusion language model that achieves:
 | **FF Dimension** | 1024 |
 | **Attention Heads** | 8 |
 | **KV Heads** | 4 (GQA) |
-| **Context Length** |
+| **Context Length** | 1024 tokens |
 | **Position Encoding** | RoPE |
 | **Normalization** | RMSNorm |
 | **Special Layers** | Canon (depthwise causal convolutions) |