codelion/dhara-70m


codelion committed 5 days ago

Commit 47ee41f (verified) · 1 parent: 7d1c75a

Update README.md

Files changed (1)

README.md +1 -1


@@ -120,7 +120,7 @@ Dhara-70M is a novel diffusion language model that achieves:
 | **FF Dimension** | 1024 |
 | **Attention Heads** | 8 |
 | **KV Heads** | 4 (GQA) |
-| **Context Length** | 2048 tokens |
+| **Context Length** | 1024 tokens |
 | **Position Encoding** | RoPE |
 | **Normalization** | RMSNorm |
 | **Special Layers** | Canon (depthwise causal convolutions) |
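
Since this commit lowers the documented context length from 2048 to 1024 tokens, callers that prepared inputs for the old limit may need to shorten them. The snippet below is a minimal sketch of one way to do that, not code from this repository: the helper `clamp_to_context` and its `max_new_tokens` parameter are hypothetical, and it assumes the prompt and any generated tokens share the same 1024-token window, which the README does not state.

```python
# Context length as documented in the updated README table.
CONTEXT_LENGTH = 1024


def clamp_to_context(token_ids, max_new_tokens=0, context_length=CONTEXT_LENGTH):
    """Keep the most recent tokens so prompt + generation fits in the window.

    Hypothetical helper: assumes prompt and generated tokens share one window.
    """
    if max_new_tokens < 0 or max_new_tokens >= context_length:
        raise ValueError("max_new_tokens must be in [0, context_length)")
    budget = context_length - max_new_tokens  # tokens left for the prompt
    # Negative slicing keeps the tail; shorter inputs are returned unchanged.
    return list(token_ids[-budget:])


if __name__ == "__main__":
    prompt = list(range(2000))  # stand-in for real token IDs
    clamped = clamp_to_context(prompt, max_new_tokens=24)
    print(len(clamped))
```

Truncating from the left keeps the most recent tokens; whether that is the right side to drop depends on the task, so treat it as a design choice rather than a requirement of the model.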