---

library_name: transformers
license: mit
datasets:
- Geraldine/Ead-Instruct-4k-Distilled
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
---


# Gemini-Distill-Qwen2.5-0.5B-ead: Qwen2.5-0.5B-Instruct Fine-Tuned on EAD/XML (Distilled from Gemini-2.0-Flash-Thinking-Exp)

## Model Description
This model is a fine-tuned version of **Qwen2.5-0.5B-Instruct**, trained via knowledge distillation from **Gemini-2.0-Flash-Thinking-Exp**. The goal of this fine-tuning process is to teach the model to reason through and generate **Encoded Archival Description (EAD/XML)** outputs.

It follows a structured reasoning approach:
1. **First**, the model provides detailed reasoning.
2. **Then**, it outputs the final **EAD/XML** response.

This structure ensures that the model justifies its output before producing the archival XML format, improving interpretability and accuracy.
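
For illustration, a hypothetical exchange could look like this (invented output: the exact wording and the `<controlaccess>` content shown are not taken from the model, only the two-part structure is):

```
User: Give me an example of <controlaccess> content.

Assistant (reasoning): The <controlaccess> element groups indexed access points
such as <persname>, <corpname>, <geogname>, and <subject>, each of which can
carry a source attribute pointing to a controlled vocabulary...

Assistant (final answer):
<controlaccess>
  <persname source="lcnaf">Hugo, Victor, 1802-1885</persname>
  <subject source="lcsh">French literature--19th century</subject>
</controlaccess>
```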

---
## Training Details

### **Dataset**
- Dataset: [Geraldine/Ead-Instruct-4k-Distilled](https://huggingface.co/datasets/Geraldine/Ead-Instruct-4k-Distilled)
- **Columns:**
  - `tag`: EAD/XML element
  - `prompt`: User query
  - `reasoning`: Gemini-generated reasoning traces
  - `final_output`: EAD/XML archival response
  - `completion`: Concatenation of `reasoning` and `final_output`
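
For a quick look at the data, the dataset can be loaded with the 🤗 Datasets library (a minimal sketch; the `train` split name is an assumption):

```python
from datasets import load_dataset

# Load the distilled instruction dataset from the Hugging Face Hub
dataset = load_dataset("Geraldine/Ead-Instruct-4k-Distilled", split="train")

# Each row pairs a user prompt with the Gemini reasoning trace and the final EAD/XML
example = dataset[0]
print(example["prompt"])
print(example["reasoning"][:200])
print(example["final_output"][:200])
```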

### **Training Process**
- **Hardware:** NVIDIA A100-SXM4-80GB
- **Distillation Source:** Gemini-2.0-Flash-Thinking-Exp
- **Training Format:** user prompt → Gemini reasoning trace → final EAD/XML response
- **Tokenization Strategy** (see the masking sketch after this list):
  - **Assistant (reasoning):** marks the start of the reasoning section
  - **Assistant (final answer):** marks the start of the XML output
  - Labels are masked (`-100`) before the reasoning phase, so the loss is computed only on the assistant's reasoning and final XML tokens
- **Training Hyperparameters:**
  - **Batch Size:** 4 per device, with gradient accumulation (steps=2)
  - **Max Sequence Length:** 4096 tokens
  - **Precision:** bf16
  - **Epochs:** 5
  - **Gradient Checkpointing:** enabled (reduces memory usage)
  - **Dataloader Efficiency:** `dataloader_pin_memory=True`, `dataloader_num_workers=4`
  - **Warmup Steps:** 100
  - **Checkpointing:** model saved at every epoch, keeping at most 2 checkpoints (`save_total_limit=2`)
  - **Evaluation Strategy:** evaluation after each epoch (`eval_strategy="epoch"`)
  - **Logging:** logs stored in `./logs`
  - **Other:** `dataloader_drop_last=False` to preserve all batches
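
A minimal sketch of the masking logic (illustrative, not the exact training script; the `build_example` helper, its arguments, and the list-based return format are assumptions):

```python
IGNORE_INDEX = -100  # positions with this label are excluded from the loss

def build_example(tokenizer, prompt_text, completion_text, max_length=4096):
    """Tokenize prompt + completion and mask the prompt tokens.

    `prompt_text` covers everything up to the assistant's reasoning header;
    `completion_text` is the reasoning trace followed by the final EAD/XML.
    """
    prompt_ids = tokenizer(prompt_text, add_special_tokens=False)["input_ids"]
    completion_ids = tokenizer(completion_text, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + completion_ids)[:max_length]
    # -100 over the prompt span: the model only learns to produce the completion
    labels = ([IGNORE_INDEX] * len(prompt_ids) + completion_ids)[:max_length]
    return {"input_ids": input_ids, "labels": labels}
```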



This setup balances performance and memory efficiency, leveraging gradient accumulation and checkpointing for stable training on long sequences.
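
For reference, the hyperparameters above correspond roughly to the following 🤗 Transformers `TrainingArguments` (a reconstruction from the list, not the original training script; `output_dir` is an assumed value):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",       # assumed; the actual path is not documented
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=5,
    bf16=True,
    gradient_checkpointing=True,
    warmup_steps=100,
    save_strategy="epoch",            # checkpoint at every epoch
    save_total_limit=2,               # keep at most 2 checkpoints
    eval_strategy="epoch",            # evaluate after each epoch
    logging_dir="./logs",
    dataloader_pin_memory=True,
    dataloader_num_workers=4,
    dataloader_drop_last=False,       # preserve all batches
)
```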



### **Training Notebook**

[https://www.kaggle.com/code/geraldinegeoffroy/ead-distilled-qwen2-5-0-5b-instruct](https://www.kaggle.com/code/geraldinegeoffroy/ead-distilled-qwen2-5-0-5b-instruct)



---

## Model Usage



### **Load Model**

To use the model with the 🤗 Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Geraldine/Gemini-Distill-Qwen2.5-0.5B-ead"

# Load the tokenizer and model; device_map="auto" places weights on GPU when available
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
```


### **Inference Example**
```python
# Build a chat-formatted prompt using the model's chat template
prompt = "Give me an example of <controlaccess> content."
messages = [
    {"role": "system", "content": "You are an archivist expert in EAD/XML format for archival records metadata."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the response: reasoning first, then the final EAD/XML
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
)

# Drop the prompt tokens so only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
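
Because the model emits its reasoning before the EAD/XML, you may want to keep only the XML part. The split below is a naive heuristic, not a documented output format: it assumes the XML block begins at the first line opening with a tag, and can misfire if the reasoning itself starts a line with a literal element.

```python
# Naive split: keep everything from the first line that starts with "<".
# This is an assumption about the output layout, not a documented format.
lines = response.splitlines()
first_tag_line = next(
    (i for i, line in enumerate(lines) if line.lstrip().startswith("<")),
    None,
)
ead_xml = "\n".join(lines[first_tag_line:]) if first_tag_line is not None else response
print(ead_xml)
```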

---
## Limitations & Future Improvements
- **Training Data Size:** The dataset consists of **4,000 distilled samples**, which may limit generalization.
- **Inference Speed:** Ensure that **Sliding Window Attention (SWA) is disabled**, as it may slow down inference.
  - To disable: `model.config.sliding_window = None`
- **Potential Future Steps:**
  - Fine-tuning on larger datasets
  - Exploring LoRA/QLoRA for parameter-efficient fine-tuning (see the sketch below)
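
As an illustration of that direction, a LoRA setup with the 🤗 PEFT library might look like this (a minimal sketch; the rank, alpha, dropout, and target modules are assumed values, not settings used for this model):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Illustrative LoRA configuration; hyperparameters here are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
```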

---
## Citation & Acknowledgments
If you use this model in research or production, please cite:
```
@misc{geoffroy2025ead,
  author = {Géraldine Geoffroy},
  title = {Qwen2.5-0.5B-Instruct Fine-Tuned on EAD/XML},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Geraldine/Gemini-Distill-Qwen2.5-0.5B-ead}
}
```