Update README.md

README.md CHANGED

@@ -11,7 +11,9 @@ library_name: transformers
 
 [](LICENSE)
 
-PCMind-2.1-Kaiyuan-2B is a fully-
+PCMind-2.1-Kaiyuan-2B is a cutting-edge, **fully open-source language model** trained using Ascend 910A clusters.
+With 1.4B non-embedding parameters and training on 2.2 trillion tokens,
+it achieves performance competitive with current state-of-the-art fully open models and even rivals some leading open-weight models of similar scale.
 
 ## Introduction

@@ -33,7 +35,7 @@ achieved through several key innovations:
 Spark-based framework optimized with [Chukonu](https://pacman.cs.tsinghua.edu.cn/~cwg/publication/chukonu-2021/),
 delivering exceptional efficiency for large-scale deduplication and sorting tasks.
 
-5. **Architecture for Training Stability:** Optimized for training on 910A clusters (FP16 precision, similar to V100),
+5. **Architecture for Training Stability:** Optimized for training on Ascend 910A clusters (FP16 precision, similar to V100),
 the Kaiyuan-2B architecture integrates QK norm, sandwich norm, and soft-capping techniques to ensure stable and robust pre-training.
 
 ## Usage
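The first hunk's context mentions a Spark-based framework (accelerated with Chukonu) for large-scale deduplication and sorting. The actual pipeline is not shown in the diff; the sketch below only illustrates the generic sort-based exact-dedup pattern such frameworks distribute (hash each document, sort by digest, drop adjacent duplicates). Function name and hashing choice are illustrative, not from the PCMind codebase.

```python
import hashlib

def dedup_by_hash(docs):
    """Exact deduplication via the sort-based pattern: key every document
    by a content hash, sort by digest, and keep one copy per digest.
    Distributed engines parallelize exactly this shuffle-sort step."""
    keyed = sorted((hashlib.sha256(d.encode("utf-8")).hexdigest(), d) for d in docs)
    unique, prev = [], None
    for digest, doc in keyed:
        if digest != prev:  # digests are sorted, so duplicates are adjacent
            unique.append(doc)
            prev = digest
    return unique
```

On a cluster, the `sorted(...)` step becomes a distributed sort over hash keys, which is why the diff groups "deduplication and sorting" as one workload.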
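The stability techniques named in the second hunk (QK norm and soft-capping; sandwich norm is a layer-norm placement and is omitted here) are standard tricks for keeping FP16 logits bounded. This is a minimal unofficial sketch, assuming conventional definitions; the cap value and function names are mine, not Kaiyuan-2B's.

```python
import math

def soft_cap(logit: float, cap: float = 50.0) -> float:
    """Soft-capping: smoothly squash a score into (-cap, cap) with tanh,
    so attention/output logits cannot overflow in FP16."""
    return cap * math.tanh(logit / cap)

def l2_normalize(v: list, eps: float = 1e-6) -> list:
    """QK norm (sketch): L2-normalize a query or key vector before the
    dot product, which bounds the magnitude of attention logits."""
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]
```

Both tricks attack the same failure mode: unbounded dot products saturating the narrow FP16 range on 910A-class hardware.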