CoT data construction
#1
by
anaconda123
- opened
The model used to build the CoT dataset in the paper and code is not consistent. So is the data produced using the GLM abandoned?
In the preprint paper: "Subsequently, we employ the pure-thinking model GLM-4.1V-Thinking (Hong et al., 2025b) to generate CoT rationales for both the query and the target of each pair."
In the code:
response = client.chat.completions.create(
model="gpt-4o-2024-08-06",
messages=messages,
temperature=0.2,
max_tokens=8192,
)
Thanks for your question!
The script create_vision_cot_data.py is not the one we used to construct the actual CoT dataset. The real data construction process followed exactly what was described in the paper.