CoT data construction

#1
by anaconda123 - opened

The model used to build the CoT dataset in the paper and code is not consistent. So is the data produced using the GLM abandoned?

In the preprint paper: "Subsequently, we employ the pure-thinking model GLM-4.1V-Thinking (Hong et al., 2025b) to generate CoT rationales for both the query and the target of each pair."

In the code:
response = client.chat.completions.create(
model="gpt-4o-2024-08-06",
messages=messages,
temperature=0.2,
max_tokens=8192,
)

Thanks for your question!

The script create_vision_cot_data.py is not the one we used to construct the actual CoT dataset. The real data construction process followed exactly what was described in the paper.

Sign up or log in to comment