Update README.md

README.md CHANGED

@@ -11,7 +11,9 @@ library_name: transformers
 
 [](LICENSE)
 
-PCMind-2.1-Kaiyuan-2B is a fully-
+PCMind-2.1-Kaiyuan-2B is a cutting-edge, **fully open-source language model** trained using Ascend 910A clusters.
+With 1.4B non-embedding parameters and training on 2.2 trillion tokens,
+it achieves performance competitive with current state-of-the-art fully open models and even rivals some leading open-weight models of similar scale.
 
 ## Introduction

@@ -33,7 +35,7 @@ achieved through several key innovations:
 Spark-based framework optimized with [Chukonu](https://pacman.cs.tsinghua.edu.cn/~cwg/publication/chukonu-2021/),
 delivering exceptional efficiency for large-scale deduplication and sorting tasks.
 
-5. **Architecture for Training Stability:** Optimized for training on 910A clusters (FP16 precision, similar to V100),
+5. **Architecture for Training Stability:** Optimized for training on Ascend 910A clusters (FP16 precision, similar to V100),
 the Kaiyuan-2B architecture integrates QK norm, sandwich norm, and soft-capping techniques to ensure stable and robust pre-training.
 
 ## Usage
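The first hunk's context mentions a Spark-based framework (accelerated with Chukonu) for large-scale deduplication and sorting. The actual pipeline is not shown in the diff; the sketch below only illustrates the generic sort-based exact-dedup pattern such frameworks distribute (hash each document, sort by digest, drop adjacent duplicates). Function name and hashing choice are illustrative, not from the PCMind codebase.

```python
import hashlib

def dedup_by_hash(docs):
    """Exact deduplication via the sort-based pattern: key every document
    by a content hash, sort by digest, and keep one copy per digest.
    Distributed engines parallelize exactly this shuffle-sort step."""
    keyed = sorted((hashlib.sha256(d.encode("utf-8")).hexdigest(), d) for d in docs)
    unique, prev = [], None
    for digest, doc in keyed:
        if digest != prev:  # digests are sorted, so duplicates are adjacent
            unique.append(doc)
            prev = digest
    return unique
```

On a cluster, the `sorted(...)` step becomes a distributed sort over hash keys, which is why the diff groups "deduplication and sorting" as one workload.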
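The stability techniques named in the second hunk (QK norm and soft-capping; sandwich norm is a layer-norm placement and is omitted here) are standard tricks for keeping FP16 logits bounded. This is a minimal unofficial sketch, assuming conventional definitions; the cap value and function names are mine, not Kaiyuan-2B's.

```python
import math

def soft_cap(logit: float, cap: float = 50.0) -> float:
    """Soft-capping: smoothly squash a score into (-cap, cap) with tanh,
    so attention/output logits cannot overflow in FP16."""
    return cap * math.tanh(logit / cap)

def l2_normalize(v: list, eps: float = 1e-6) -> list:
    """QK norm (sketch): L2-normalize a query or key vector before the
    dot product, which bounds the magnitude of attention logits."""
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]
```

Both tricks attack the same failure mode: unbounded dot products saturating the narrow FP16 range on 910A-class hardware.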