harryleafchen committed
Commit 94aed2c · verified · 1 Parent(s): 7cab9ca

Update README.md

Files changed (1)
  1. README.md +8 -4
README.md CHANGED
@@ -11,13 +11,17 @@ library_name: transformers
 
  [![License](https://img.shields.io/badge/License-Apache-f5de53?&color=f5de53)](LICENSE)
 
- PCMind-2.1-Kaiyuan-2B is a cutting-edge, **fully open-source language model** trained using Ascend 910A clusters.
+ PCMind-2.1-Kaiyuan-2B is a cutting-edge, **fully open-source language model** (i.e., with an open dataset) trained on an Ascend 910A cluster.
  With 1.4B non-embedding parameters and training on 2.2 trillion tokens,
  it achieves performance competitive with current state-of-the-art fully open models and even rivals some leading open-weight models of similar scale.
 
- ## Introduction
+ <center>
+ ![Model Performance Comparison](model_performance_comparison.svg)
+ </center>
+
+ We will publish the datasets used to train Kaiyuan-2B soon.
 
- ![](model_performance_comparison.png)
+ ## Introduction
 
  Our data preprocessing and pre-training pipeline is designed for enhanced training efficiency and model quality,
  achieved through several key innovations:
@@ -43,7 +47,7 @@ achieved through several key innovations:
  The model architecture is similar to [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B),
  and can be easily loaded by libraries like `transformers`.
 
- Please use [`demo.py`](demo.py) as an example of use.
+ Please use [`demo.py`](demo.py) as an example.
 
  *Note: This is a pretrained base model only and has not undergone fine-tuning,
  reinforcement learning (RL), or any other post-training procedures.
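
The README kept by this commit notes that the architecture follows Qwen3-1.7B and loads with standard `transformers` APIs. As a quick illustration, here is a minimal sketch of loading and sampling from the base model; the repo id below is a hypothetical placeholder, not confirmed by this diff (use the model's actual Hub id, or follow [`demo.py`](demo.py) as the README recommends):

```python
# Minimal sketch of loading the base model with Hugging Face transformers.
# The repo id is an assumed placeholder; substitute the actual Hub id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PCMind/PCMind-2.1-Kaiyuan-2B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# This is a pretrained base model with no post-training, so prompt it with
# plain text completion rather than a chat template.
inputs = tokenizer("Fully open-source language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the model has not undergone fine-tuning or RL, plain completion as above is the appropriate usage; instruction-style prompting would require further post-training.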