Update README.md with benchmark results (#2)
Commit 01424c4ff768ab5c93fd15cca763be316a72d344
Co-authored-by: Bowen Li <bowenli@users.noreply.huggingface.co>
README.md CHANGED
@@ -36,16 +36,34 @@ InternLM2.5 has open-sourced a 7 billion parameter base model and a chat model t
- **Outstanding reasoning capability**: State-of-the-art performance on math reasoning, surpassing models like Llama3 and Gemma2-9B.

- **1M Context window**: Nearly perfect at finding needles in the haystack with a 1M-long context, with leading performance on long-context tasks like LongBench. Try it with [LMDeploy](https://github.com/InternLM/InternLM/blob/main/chat/lmdeploy.md) for 1M-context inference and a [file chat demo](https://github.com/InternLM/InternLM/tree/main/long_context).

- **Stronger tool use**: InternLM2.5 supports gathering information from more than 100 web pages; the corresponding implementation will be released in [Lagent](https://github.com/InternLM/lagent/tree/main) soon. InternLM2.5 also has stronger tool-use capabilities in instruction following, tool selection, and reflection. See [examples](https://github.com/InternLM/InternLM/blob/main/agent/lagent.md).

## InternLM2.5-7B-Chat-1M

InternLM2.5-7B-Chat-1M is the 1M-long-context version of InternLM2.5-7B-Chat.

### Performance Evaluation

We employed the "*needle in a haystack*" approach to evaluate the model's ability to retrieve information from long texts. The results show that InternLM2.5-7B-Chat-1M can accurately locate key information in documents up to 1M tokens long.

<p align="center">
  <img src="https://github.com/libowen2121/InternLM/assets/19970308/2ce3745f-26f5-4a39-bdcd-2075790d7b1d" alt="drawing" width="700"/>
</p>
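
As a concrete illustration of the protocol, here is a minimal sketch: a known "needle" sentence is buried at varying depths in long filler text, and the model is asked to retrieve it. The `chat` callable, the needle string, and the exact-match scoring are assumptions for illustration, not the harness used to produce the figure above.

```python
# Minimal "needle in a haystack" sketch (illustrative only; not the exact
# harness behind the figure above). `chat` is any function that sends a
# prompt to the model and returns its reply as a string.
import random

NEEDLE = "The secret passphrase is tiger-lily-42."  # hypothetical needle
QUESTION = "What is the secret passphrase mentioned in the document?"

def build_haystack(filler_sentences, needle, depth, n_sentences):
    """Bury `needle` at relative `depth` (0.0 = start, 1.0 = end) in filler text."""
    haystack = [random.choice(filler_sentences) for _ in range(n_sentences)]
    haystack.insert(int(depth * n_sentences), needle)
    return " ".join(haystack)

def needle_eval(chat, filler_sentences, depths=(0.0, 0.25, 0.5, 0.75, 1.0),
                n_sentences=50_000):
    """Return pass/fail per insertion depth, using exact-match scoring."""
    results = {}
    for depth in depths:
        context = build_haystack(filler_sentences, NEEDLE, depth, n_sentences)
        answer = chat(f"{context}\n\n{QUESTION}")
        results[depth] = "tiger-lily-42" in answer
    return results
```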

We also used the [LongBench](https://github.com/THUDM/LongBench) benchmark to assess long-document comprehension. Our model achieved leading performance in these tests.

<p align="center">
  <img src="https://github.com/libowen2121/InternLM/assets/19970308/1e8f7da8-8193-4def-8b06-0550bab6a12f" alt="drawing" width="800"/>
</p>

### LMDeploy

Since Hugging Face Transformers does not directly support inference with a 1M-long context, we recommend using LMDeploy. The conventional usage with Hugging Face Transformers is also shown below.

LMDeploy is a toolkit for compressing, deploying, and serving LLMs, developed by the MMRazor and MMDeploy teams.

Here is an example of 1M-long context inference. **Note: 1M context length requires 4xA100-80G!**
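
The sketch below uses LMDeploy's `pipeline` API with its TurboMind backend. The engine settings (`session_len`, `rope_scaling_factor`, `tp`, `cache_max_entry_count`) are illustrative assumptions for a 4xA100-80G setup, not values confirmed by this commit.

```python
# Hedged sketch of 1M-long-context inference with LMDeploy's pipeline API.
# Engine settings are assumptions for 4xA100-80G, not official values.
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(
    rope_scaling_factor=2.5,    # assumed RoPE scaling for long context
    session_len=1048576,        # 1M-token context window
    max_batch_size=1,
    cache_max_entry_count=0.7,  # fraction of free GPU memory for the KV cache
    tp=4,                       # tensor parallelism across the 4 GPUs
)
pipe = pipeline("internlm/internlm2_5-7b-chat-1m", backend_config=backend_config)

# Replace this with a long document plus a question about its contents.
prompt = "Your 1M-token document and question go here."
response = pipe(prompt, gen_config=GenerationConfig(max_new_tokens=1024))
print(response)
```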
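
For the conventional Hugging Face Transformers usage mentioned above, a minimal sketch follows. `model.chat` is the chat helper shipped with InternLM's remote code rather than a Transformers API, and this path does not fit the full 1M context, so treat it as short-context usage only.

```python
# Hedged sketch of conventional Transformers usage (short contexts only;
# this path does not support the full 1M context window).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm2_5-7b-chat-1m"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True
).cuda().eval()

# `chat` is provided by InternLM's remote code loaded via trust_remote_code.
response, history = model.chat(tokenizer, "Hello! Please introduce yourself.", history=[])
print(response)
```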