---
license: apache-2.0
datasets:
- IVUL-KAUST/MOLE-plus
metrics:
- f1
base_model:
- Qwen/Qwen2.5-3B-Instruct
---
## Model Description
MeXtract 3B is a light-weight model for metadata extraction from scientific papers. The model was created by finetuning Qwen2.5 3B Instruct
on a synthetically generated dataset. Metadata attributes are defined using a schema-based approach where for each attribute we define the type,
min length, max length, and options if applicable.
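To illustrate the idea behind schema-based extraction, the sketch below shows how such per-attribute constraints (type, min/max length, allowed options) can be checked. This is an illustrative, self-contained sketch, not the MeXtract implementation; the `FieldSpec` class and `validate` function are hypothetical names introduced here.

```python
# Illustrative sketch of schema-based validation (not the MeXtract API):
# each attribute carries a type, min/max length (or value range for ints),
# and an optional list of allowed options.
from dataclasses import dataclass, field

@dataclass
class FieldSpec:
    type_: type                                   # expected Python type
    min_len: int                                  # min length, or min value for ints
    max_len: int                                  # max length, or max value for ints
    options: list = field(default_factory=list)   # allowed values, if constrained

def validate(value, spec: FieldSpec) -> bool:
    """Check a single extracted value against its field specification."""
    if not isinstance(value, spec.type_):
        return False
    size = value if isinstance(value, int) else len(value)
    if not (spec.min_len <= size <= spec.max_len):
        return False
    if spec.options:
        items = value if isinstance(value, list) else [value]
        return all(v in spec.options for v in items)
    return True

# Example: "Age" must be an int between 1 and 100.
age_spec = FieldSpec(int, 1, 100)
print(validate(25, age_spec))   # True
print(validate(150, age_spec))  # False
```

Constraining each attribute this way lets the extractor reject malformed model outputs instead of passing them through unchecked.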
## Usage
Follow the instructions from [MeXtract](https://github.com/IVUL-KAUST/MeXtract) to install all the dependencies, then:
```python
from schema import TextSchema
from type_classes import *
from search import extract
class ExampleSchema(TextSchema):
    Name: Field(Str, 1, 5)
    Hobbies: Field(List[Str], 1, 1, ['Hiking', 'Swimming', 'Reading'])
    Age: Field(Int, 1, 100)
    Married: Field(Bool, 1, 1)

text = """
My name is Zaid. I am 25 years old. I like swimming and reading. I am married.
"""

metadata = extract(
    text, "IVUL-KAUST/MeXtract-3B", schema=ExampleSchema, backend="transformers"
)
print(metadata)
## {'Name': 'Zaid', 'Hobbies': ['Swimming'], 'Age': 25, 'Married': True}
```
## Model Details
- Developed by: IVUL at KAUST
- Model type: transformer-based causal language model, finetuned from Qwen2.5 3B Instruct
- Language(s): multilingual; evaluated on Arabic, English, Japanese, French, and Russian papers
- Datasets: a synthetically generated metadata-extraction dataset
## Evaluation Results
The model is evaluated on the [MOLE+](https://huggingface.co/IVUL-KAUST/MOLE-plus) benchmark. All scores are F1.
| **Model** | **ar** | **en** | **jp** | **fr** | **ru** | **multi** | **model** | **Average** |
| ------------------------ | --------- | --------- | --------- | --------- | --------- | --------- | --------- | ----------- |
| **Falcon3 3B Instruct** | 20.46 | 16.30 | 20.29 | 17.81 | 17.23 | 16.13 | 15.96 | 17.74 |
| **Llama3.2 3B Instruct** | 28.77 | 25.17 | 33.14 | 27.73 | 22.21 | 22.58 | 33.37 | 27.57 |
| **Gemma 3 4B It** | 44.88 | 46.50 | 48.46 | 43.85 | 46.06 | 42.05 | 56.04 | 46.83 |
| **Qwen2.5 3B Instruct** | 49.99 | 56.72 | 61.13 | 57.08 | 64.10 | 52.07 | 59.05 | 57.16 |
| **MOLE 3B** | 23.03 | 50.88 | 50.83 | 50.05 | 57.72 | 43.34 | 17.17 | 41.86 |
| **Nuextract 2.0 4B** | 44.61 | 43.57 | 43.82 | 48.96 | 47.78 | 40.14 | 49.90 | 45.54 |
| **Nuextract 2.0 8B** | 51.93 | 58.93 | 62.11 | 58.41 | 63.21 | 38.21 | 53.70 | 55.21 |
| **MeXtract 0.5B** | 65.96 | 69.95 | 73.79 | 68.42 | 72.07 | 68.20 | 32.41 | 64.40 |
| **MeXtract 1.5B** | 67.06 | 73.71 | 75.08 | 71.57 | 76.28 | 71.87 | 52.05 | 69.66 |
| **MeXtract 3B** | **70.81** | **78.02** | **78.32** | **72.87** | **77.51** | **74.92** | **60.18** | **73.23** |
## Use and Limitations
### Limitations and Bias
The model is optimized for metadata extraction; it may not perform well on general NLP tasks.
## License
The model is licensed under Apache 2.0.
## Citation
```
@misc{mextract,
title={MeXtract: Light-Weight Metadata Extraction from Scientific Papers},
author={Zaid Alyafeai and Maged S. Al-Shaibani and Bernard Ghanem},
year={2025},
eprint={2510.06889},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.06889},
}
```