---
base_model:
- scb10x/llama3.1-typhoon2-70b-instruct
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
library_name: transformers
license: llama3.1
tags:
- mergekit
- merge
pipeline_tag: text-generation
---

**Typhoon2-DeepSeek-R1-70B**: a Thai reasoning large language model (research preview).

**Typhoon2-DeepSeek-R1-70B** is a Thai 🇹🇭 reasoning large language model with 70 billion parameters, built by merging DeepSeek R1 70B Distill with Typhoon2 70B Instruct, combined with supervised fine-tuning (SFT).

For more details, please see our [blog](https://blog.opentyphoon.ai/introducing-typhoon2-r1-70b-enhanced-reasoning-with-deepseek-r1-abcedaa9b7ad). A demo is available on [opentyphoon.ai](https://playground.opentyphoon.ai/playground).

Paper: [https://arxiv.org/abs/2502.09056](https://arxiv.org/abs/2502.09056)

*To acknowledge Meta's effort in creating the foundation model and to comply with the license, we explicitly include "llama-3.1" in the model name.


## **Key Highlights**
- **Advanced Reasoning:** Reasoning capabilities comparable to DeepSeek R1 70B Distill.
- **Enhanced Math & Coding Performance:** Up to 6 times more accurate than Typhoon2 70B Instruct.
- **Strong Thai Language Proficiency:** Performs on par with Typhoon2 70B Instruct.
- **Cross-Domain Reasoning:** Adapts to various domains, going beyond math and coding.

## **Model Description**

- **Model type**: A 70B-parameter, decoder-only instruct model based on the Llama architecture.
- **Requirement**: transformers 4.45.0 or newer.
- **Primary Language(s)**: Thai 🇹🇭 and English 🇬🇧
- **License**: [Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)
- **Demo**: [opentyphoon.ai](https://playground.opentyphoon.ai/playground).


## **Performance**

**Reasoning Performance**

<div align="center">
  <img src="https://storage.googleapis.com/typhoon-public/assets/typhoon2-text/r1_reasoning.jpg" alt="Typhoon2 DeepSeek R1 70B Reasoning Performance" width="100%" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
</div>


**General Instruction-Following Performance**

<div align="center">
  <img src="https://storage.googleapis.com/typhoon-public/assets/typhoon2-text/r1_general.jpg" alt="Typhoon2 DeepSeek R1 70B General Performance" width="100%" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
</div>


## **Recommended system prompts**

More controllable | less reasoning capability
```
You are a helpful assistant named Typhoon that always reasons before answering. First, provide reasoning in English, starting with "Alright!" and ending with the `</think>` token. Then, always respond to the user in the language they use or request. Avoid unnecessary affirmations or filler phrases such as "Alright", "Okay", etc., in the response.
```
More reasoning capability | less controllable
```
You are a helpful assistant named Typhoon that always reasons before answering. First, provide reasoning in English, starting with "Alright!", then respond to the user in the language they use or request.
```
Highest reasoning capability | least controllable
```
# No system prompt
```

## **Usage Example**

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "scb10x/llama3.1-typhoon2-deepseek-r1-70b-preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": 'You are a helpful assistant named Typhoon that always reasons before answering. First, provide reasoning in English, starting with "Alright!" and ending with the `</think>` token. Then, always respond to the user in the language they use or request. Avoid unnecessary affirmations or filler phrases such as "Alright", "Okay", etc., in the response'},
    {"role": "user", "content": "จำนวนเต็มบวกที่น้อยที่สุดที่เป็นผลคูณของ 30 ซึ่งสามารถเขียนได้ด้วยตัวเลข 0 และ 2 เท่านั้นคืออะไร"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
]

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
response = outputs[0][input_ids.shape[-1]:]
# To drop the thinking trace and keep only the final answer, use:
# tokenizer.decode(response, skip_special_tokens=True).split('</think>')[-1]
print(tokenizer.decode(response, skip_special_tokens=True)) # <think> Okay, .... </think> ดังนั้น จำนวนเต็มบวกที่น้อยที่สุดที่เป็นผลคูณของ 30 และเขียนได้ด้วยตัวเลข 0 และ 2 เท่านั้นคือ 2220 boxed{2220}
```
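If you want only the final answer, split on the `</think>` marker as the comment above suggests. Below is a minimal helper continuing from the example (the `strip_thinking` name is ours, not part of the model's API):

```python
def strip_thinking(text: str) -> str:
    """Drop the reasoning trace and return only the final answer.

    The model closes its reasoning with `</think>`; if the marker is
    absent, the text is returned unchanged.
    """
    return text.split("</think>")[-1].strip()

# Reuses `tokenizer` and `response` from the example above.
answer = strip_thinking(tokenizer.decode(response, skip_special_tokens=True))
print(answer)
```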

## **Inference Server Hosting Example**
```bash
pip install vllm
vllm serve scb10x/llama3.1-typhoon2-deepseek-r1-70b-preview --tensor-parallel-size 2 --gpu-memory-utilization 0.95 --max-model-len 16384 --enforce-eager
# Hosting a 70B model requires at least two 80GB GPUs (e.g., A100, H100).
# To serve a longer context (90k tokens), four GPUs are required; you can then omit --enforce-eager to improve throughput.
# See https://docs.vllm.ai/ for more information.
```
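Once the server is up, vLLM exposes an OpenAI-compatible API (on port 8000 by default). A minimal client sketch using the `openai` package follows; the `api_key` value is a placeholder, since the local server does not check it by default:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible API at /v1 on port 8000 by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="scb10x/llama3.1-typhoon2-deepseek-r1-70b-preview",
    messages=[
        {"role": "user", "content": "What is the smallest positive multiple of 30 that uses only the digits 0 and 2?"},
    ],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)
```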

## **Chat template**
This model uses the DeepSeek R1 70B Distill chat template, not the Llama 3 template, so take care when formatting prompts manually.
```
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}
```
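To inspect exactly what this template renders, you can apply it without tokenizing; a small sketch:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("scb10x/llama3.1-typhoon2-deepseek-r1-70b-preview")

messages = [
    {"role": "system", "content": "You are a helpful assistant named Typhoon."},
    {"role": "user", "content": "Hello"},
]

# tokenize=False returns the rendered prompt string, which makes the
# DeepSeek-style markers (<|User|>, <|Assistant|>) visible.
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```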

## **Tool use**
We do not recommend using tool calling with this model.

## **Intended Uses & Limitations**

This model is a reasoning instruct model; however, it is still under development. It incorporates some guardrails, but it may still produce answers that are inaccurate, biased, or otherwise objectionable in response to user prompts. We recommend that developers assess these risks in the context of their use case.

## **Follow us**

**https://twitter.com/opentyphoon**

## **Support**

**https://discord.gg/us5gAYmrxw**

## **Citation**


If you find Typhoon 2 / Typhoon 2 R1 useful for your work, please cite it using:
```
@misc{typhoon2,
      title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models}, 
      author={Kunat Pipatanakul and Potsawee Manakul and Natapong Nitarach and Warit Sirichotedumrong and Surapon Nonesung and Teetouch Jaknamon and Parinthapat Pengpun and Pittawat Taveekitworachai and Adisai Na-Thalang and Sittipong Sripaisarnmongkol and Krisanapong Jirayoot and Kasima Tharnpipitchai},
      year={2024},
      eprint={2412.13702},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13702}, 
}
```
```
@misc{pipatanakul2025adaptinglanguagespecificllmsreasoning,
      title={Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging - An Open Recipe}, 
      author={Kunat Pipatanakul and Pittawat Taveekitworachai and Potsawee Manakul and Kasima Tharnpipitchai},
      year={2025},
      eprint={2502.09056},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.09056}, 
}
```