Update README.md
README.md CHANGED

@@ -1,7 +1,7 @@
 ---
 license: apache-2.0
 datasets:
-- IVUL-KAUST/MOLE
+- IVUL-KAUST/MOLE-plus
 metrics:
 - f1
 base_model:
@@ -10,23 +10,17 @@ base_model:
 
 ## Model Description
 MeXtract 3B is a light-weight model for metadata extraction from scientific papers. The model was created by finetuning Qwen2.5 3B Instruct
-on a synthetically generated dataset.
+on a synthetically generated dataset. Metadata attributes are defined using a schema-based approach: for each attribute we define the type,
+the minimum and maximum length, and, where applicable, a fixed set of options.
 
 ## Usage
 
 Follow the instructions from [MeXtract](https://github.com/IVUL-KAUST/MeXtract) to install all the dependencies, then:
 
 ```python
-from search import get_metadata
-from rich import print
 from schema import TextSchema
 from type_classes import *
-
-def extract(text, model_name, schema_name="ar", backend="openrouter", max_model_len=8192, max_output_len=2084, schema=None):
-    message, metadata, cost, error = get_metadata(
-        text, model_name, schema_name=schema_name, backend=backend, log=False, max_model_len=max_model_len, max_output_len=max_output_len, schema=schema
-    )
-    return metadata
+from search import extract
 
 
 class ExampleSchema(TextSchema):
@@ -43,15 +37,19 @@ metadata = extract(
 )
 print(metadata)
 
+## {'Name': 'Zaid', 'Hobbies': ['Swimming'], 'Age': 25, 'Married': True}
 ```
 
 ## Model Details
-- Developed by: IVUL
+- Developed by: IVUL at KAUST
 - Model type: transformer-based language model, finetuned from Qwen2.5 3B Instruct
 - Language(s): evaluated on Arabic, English, Japanese, French, and Russian papers, as well as a multilingual split
-- Datasets:
+- Datasets: a synthetically generated metadata-extraction dataset
 
 ## Evaluation Results
+
+The model is evaluated on the [MOLE+](https://huggingface.co/IVUL-KAUST/MOLE-plus) benchmark.
+
 | **Model**               | **ar** | **en** | **jp** | **fr** | **ru** | **multi** | **model** | **Average** |
 | ----------------------- | ------ | ------ | ------ | ------ | ------ | --------- | --------- | ----------- |
 | **Falcon3 3B Instruct** | 20.46  | 16.30  | 20.29  | 17.81  | 17.23  | 16.13     | 15.96     | 17.74       |
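
To make the schema-based setup described in the Model Description concrete, the sketch below illustrates the idea in plain Python: each metadata attribute carries a type, a minimum and maximum length, and optionally a fixed set of allowed options, and an extracted record is checked against those constraints. The `Attribute` dataclass and `validate` helper are illustrative assumptions, not part of MeXtract's `schema`/`type_classes` API, and the record being checked is the example output shown in the Usage section.

```python
# Illustrative sketch only: a minimal, self-contained take on schema-based
# metadata constraints (type, min/max length, allowed options). The real
# MeXtract schemas are declared as TextSchema subclasses instead.
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    type: type                   # expected Python type, e.g. str, int, bool, list
    min_length: int = 0          # minimum length (characters for str, items for list)
    max_length: int = 1000       # maximum length
    options: list | None = None  # fixed set of allowed values, if any


def validate(record: dict, schema: list[Attribute]) -> list[str]:
    """Return a list of constraint violations for an extracted metadata record."""
    errors = []
    for attr in schema:
        value = record.get(attr.name)
        if value is None:
            errors.append(f"missing attribute: {attr.name}")
            continue
        if not isinstance(value, attr.type):
            errors.append(f"{attr.name}: expected {attr.type.__name__}, got {type(value).__name__}")
            continue
        if isinstance(value, (str, list)) and not (attr.min_length <= len(value) <= attr.max_length):
            errors.append(f"{attr.name}: length {len(value)} outside [{attr.min_length}, {attr.max_length}]")
        if attr.options is not None and value not in attr.options:
            errors.append(f"{attr.name}: {value!r} not among the allowed options")
    return errors


# Field names mirror the example output printed in the Usage section above.
example_schema = [
    Attribute("Name", str, min_length=1, max_length=100),
    Attribute("Hobbies", list, min_length=0, max_length=10),
    Attribute("Age", int),
    Attribute("Married", bool),
]
extracted = {"Name": "Zaid", "Hobbies": ["Swimming"], "Age": 25, "Married": True}
print(validate(extracted, example_schema))  # -> [] (no violations)
```

In MeXtract itself, the corresponding constraints are declared on a `TextSchema` subclass such as `ExampleSchema` above; refer to the MeXtract repository for the exact field syntax and the arguments accepted by `extract`.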