Update README.md with benchmark results (#2)
Commit 01424c4ff768ab5c93fd15cca763be316a72d344
Co-authored-by: Bowen Li <bowenli@users.noreply.huggingface.co>
README.md CHANGED
@@ -36,16 +36,34 @@ InternLM2.5 has open-sourced a 7 billion parameter base model and a chat model t
- **Outstanding reasoning capability**: State-of-the-art performance on math reasoning, surpassing models like Llama3 and Gemma2-9B.

- **1M Context window**: Nearly perfect at finding needles in the haystack with a 1M-long context, with leading performance on long-context tasks like LongBench. Try it with [LMDeploy](https://github.com/InternLM/InternLM/blob/main/chat/lmdeploy.md) for 1M-context inference and a [file chat demo](https://github.com/InternLM/InternLM/tree/main/long_context).

- **Stronger tool use**: InternLM2.5 supports gathering information from more than 100 web pages; the corresponding implementation will be released in [Lagent](https://github.com/InternLM/lagent/tree/main) soon. InternLM2.5 also has stronger tool-use capabilities in instruction following, tool selection, and reflection. See [examples](https://github.com/InternLM/InternLM/blob/main/agent/lagent.md).

## InternLM2.5-7B-Chat-1M

InternLM2.5-7B-Chat-1M is the 1M-long-context version of InternLM2.5-7B-Chat.

### Performance Evaluation

We employed the "*needle in a haystack*" approach to evaluate the model's ability to retrieve information from long texts. The results show that InternLM2.5-7B-Chat-1M can accurately locate key information in documents up to 1M tokens long.

<p align="center">
  <img src="https://github.com/libowen2121/InternLM/assets/19970308/2ce3745f-26f5-4a39-bdcd-2075790d7b1d" alt="drawing" width="700"/>
</p>
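
As a concrete illustration of the protocol, here is a minimal sketch: a known "needle" sentence is buried at varying depths in long filler text, and the model is asked to retrieve it. The `chat` callable, the needle string, and the exact-match scoring are assumptions for illustration, not the harness used to produce the figure above.

```python
# Minimal "needle in a haystack" sketch (illustrative only; not the exact
# harness behind the figure above). `chat` is any function that sends a
# prompt to the model and returns its reply as a string.
import random

NEEDLE = "The secret passphrase is tiger-lily-42."  # hypothetical needle
QUESTION = "What is the secret passphrase mentioned in the document?"

def build_haystack(filler_sentences, needle, depth, n_sentences):
    """Bury `needle` at relative `depth` (0.0 = start, 1.0 = end) in filler text."""
    haystack = [random.choice(filler_sentences) for _ in range(n_sentences)]
    haystack.insert(int(depth * n_sentences), needle)
    return " ".join(haystack)

def needle_eval(chat, filler_sentences, depths=(0.0, 0.25, 0.5, 0.75, 1.0),
                n_sentences=50_000):
    """Return pass/fail per insertion depth, using exact-match scoring."""
    results = {}
    for depth in depths:
        context = build_haystack(filler_sentences, NEEDLE, depth, n_sentences)
        answer = chat(f"{context}\n\n{QUESTION}")
        results[depth] = "tiger-lily-42" in answer
    return results
```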

We also used the [LongBench](https://github.com/THUDM/LongBench) benchmark to assess long-document comprehension. Our model achieved leading performance in these tests.

<p align="center">
  <img src="https://github.com/libowen2121/InternLM/assets/19970308/1e8f7da8-8193-4def-8b06-0550bab6a12f" alt="drawing" width="800"/>
</p>

### LMDeploy

Since Hugging Face Transformers does not directly support inference with a 1M-long context, we recommend using LMDeploy. The conventional usage with Hugging Face Transformers is also shown below.

LMDeploy is a toolkit for compressing, deploying, and serving LLMs, developed by the MMRazor and MMDeploy teams.

Here is an example of 1M-long context inference. **Note: 1M context length requires 4xA100-80G!**
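
The sketch below uses LMDeploy's `pipeline` API with its TurboMind backend. The engine settings (`session_len`, `rope_scaling_factor`, `tp`, `cache_max_entry_count`) are illustrative assumptions for a 4xA100-80G setup, not values confirmed by this commit.

```python
# Hedged sketch of 1M-long-context inference with LMDeploy's pipeline API.
# Engine settings are assumptions for 4xA100-80G, not official values.
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(
    rope_scaling_factor=2.5,    # assumed RoPE scaling for long context
    session_len=1048576,        # 1M-token context window
    max_batch_size=1,
    cache_max_entry_count=0.7,  # fraction of free GPU memory for the KV cache
    tp=4,                       # tensor parallelism across the 4 GPUs
)
pipe = pipeline("internlm/internlm2_5-7b-chat-1m", backend_config=backend_config)

# Replace this with a long document plus a question about its contents.
prompt = "Your 1M-token document and question go here."
response = pipe(prompt, gen_config=GenerationConfig(max_new_tokens=1024))
print(response)
```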
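
For the conventional Hugging Face Transformers usage mentioned above, a minimal sketch follows. `model.chat` is the chat helper shipped with InternLM's remote code rather than a Transformers API, and this path does not fit the full 1M context, so treat it as short-context usage only.

```python
# Hedged sketch of conventional Transformers usage (short contexts only;
# this path does not support the full 1M context window).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm2_5-7b-chat-1m"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True
).cuda().eval()

# `chat` is provided by InternLM's remote code loaded via trust_remote_code.
response, history = model.chat(tokenizer, "Hello! Please introduce yourself.", history=[])
print(response)
```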