harryleafchen committed
Commit 94aed2c · verified · 1 Parent(s): 7cab9ca

Update README.md

Files changed (1)
  1. README.md +8 -4
README.md CHANGED
@@ -11,13 +11,17 @@ library_name: transformers
 
  [![License](https://img.shields.io/badge/License-Apache-f5de53?&color=f5de53)](LICENSE)
 
- PCMind-2.1-Kaiyuan-2B is a cutting-edge, **fully open-source language model** trained using Ascend 910A clusters.
+ PCMind-2.1-Kaiyuan-2B is a cutting-edge, **fully open-source language model** (i.e., with an open dataset) trained on an Ascend 910A cluster.
  With 1.4B non-embedding parameters and training on 2.2 trillion tokens,
  it achieves performance competitive with current state-of-the-art fully open models and even rivals some leading open-weight models of similar scale.
 
- ## Introduction
+ <center>
+ ![Model Performance Comparison](model_performance_comparison.svg)
+ </center>
+
+ We will publish the datasets used to train Kaiyuan-2B soon.
 
- ![](model_performance_comparison.png)
+ ## Introduction
 
  Our data preprocessing and pre-training pipeline is designed for enhanced training efficiency and model quality,
  achieved through several key innovations:
@@ -43,7 +47,7 @@ achieved through several key innovations:
  The model architecture is similar to [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B),
  and can be easily loaded by libraries like `transformers`.
 
- Please use [`demo.py`](demo.py) as an example of use.
+ Please use [`demo.py`](demo.py) as an example.
 
  *Note: This is a pretrained base model only and has not undergone fine-tuning,
  reinforcement learning (RL), or any other post-training procedures.
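
The README kept by this commit notes that the architecture follows Qwen3-1.7B and loads with standard `transformers` APIs. As a quick illustration, here is a minimal sketch of loading and sampling from the base model; the repo id below is a hypothetical placeholder, not confirmed by this diff (use the model's actual Hub id, or follow [`demo.py`](demo.py) as the README recommends):

```python
# Minimal sketch of loading the base model with Hugging Face transformers.
# The repo id is an assumed placeholder; substitute the actual Hub id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PCMind/PCMind-2.1-Kaiyuan-2B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# This is a pretrained base model with no post-training, so prompt it with
# plain text completion rather than a chat template.
inputs = tokenizer("Fully open-source language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the model has not undergone fine-tuning or RL, plain completion as above is the appropriate usage; instruction-style prompting would require further post-training.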