Update README.md
README.md CHANGED

@@ -1,7 +1,7 @@
 ---
 license: apache-2.0
 datasets:
-- IVUL-KAUST/MOLE
+- IVUL-KAUST/MOLE-plus
 metrics:
 - f1
 base_model:
@@ -10,23 +10,17 @@ base_model:
 
 ## Model Description
 MeXtract 3B is a light-weight model for metadata extraction from scientific papers. The model was created by finetuning Qwen2.5 3B Instruct
-on a synthetically generated dataset.
+on a synthetically generated dataset. Metadata attributes are defined using a schema-based approach: for each attribute we define the type,
+the minimum and maximum length, and, where applicable, a fixed set of options.
 
 ## Usage
 
 Follow the instructions from [MeXtract](https://github.com/IVUL-KAUST/MeXtract) to install all the dependencies, then:
 
 ```python
-from search import get_metadata
-from rich import print
 from schema import TextSchema
 from type_classes import *
-
-def extract(text, model_name, schema_name="ar", backend="openrouter", max_model_len=8192, max_output_len=2084, schema=None):
-    message, metadata, cost, error = get_metadata(
-        text, model_name, schema_name=schema_name, backend=backend, log=False, max_model_len=max_model_len, max_output_len=max_output_len, schema=schema
-    )
-    return metadata
+from search import extract
 
 
 class ExampleSchema(TextSchema):
@@ -43,15 +37,19 @@ metadata = extract(
 )
 print(metadata)
 
+## {'Name': 'Zaid', 'Hobbies': ['Swimming'], 'Age': 25, 'Married': True}
 ```
 
 ## Model Details
-- Developed by: IVUL
+- Developed by: IVUL at KAUST
 - Model type: transformer-based language model, finetuned from Qwen2.5 3B Instruct
 - Language(s): evaluated on Arabic, English, Japanese, French, and Russian papers, as well as a multilingual split
-- Datasets:
+- Datasets: a synthetically generated metadata-extraction dataset
 
 ## Evaluation Results
+
+The model is evaluated on the [MOLE+](https://huggingface.co/IVUL-KAUST/MOLE-plus) benchmark.
+
 | **Model**               | **ar** | **en** | **jp** | **fr** | **ru** | **multi** | **model** | **Average** |
 | ----------------------- | ------ | ------ | ------ | ------ | ------ | --------- | --------- | ----------- |
 | **Falcon3 3B Instruct** | 20.46  | 16.30  | 20.29  | 17.81  | 17.23  | 16.13     | 15.96     | 17.74       |
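
To make the schema-based setup described in the Model Description concrete, the sketch below illustrates the idea in plain Python: each metadata attribute carries a type, a minimum and maximum length, and optionally a fixed set of allowed options, and an extracted record is checked against those constraints. The `Attribute` dataclass and `validate` helper are illustrative assumptions, not part of MeXtract's `schema`/`type_classes` API, and the record being checked is the example output shown in the Usage section.

```python
# Illustrative sketch only: a minimal, self-contained take on schema-based
# metadata constraints (type, min/max length, allowed options). The real
# MeXtract schemas are declared as TextSchema subclasses instead.
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    type: type                   # expected Python type, e.g. str, int, bool, list
    min_length: int = 0          # minimum length (characters for str, items for list)
    max_length: int = 1000       # maximum length
    options: list | None = None  # fixed set of allowed values, if any


def validate(record: dict, schema: list[Attribute]) -> list[str]:
    """Return a list of constraint violations for an extracted metadata record."""
    errors = []
    for attr in schema:
        value = record.get(attr.name)
        if value is None:
            errors.append(f"missing attribute: {attr.name}")
            continue
        if not isinstance(value, attr.type):
            errors.append(f"{attr.name}: expected {attr.type.__name__}, got {type(value).__name__}")
            continue
        if isinstance(value, (str, list)) and not (attr.min_length <= len(value) <= attr.max_length):
            errors.append(f"{attr.name}: length {len(value)} outside [{attr.min_length}, {attr.max_length}]")
        if attr.options is not None and value not in attr.options:
            errors.append(f"{attr.name}: {value!r} not among the allowed options")
    return errors


# Field names mirror the example output printed in the Usage section above.
example_schema = [
    Attribute("Name", str, min_length=1, max_length=100),
    Attribute("Hobbies", list, min_length=0, max_length=10),
    Attribute("Age", int),
    Attribute("Married", bool),
]
extracted = {"Name": "Zaid", "Hobbies": ["Swimming"], "Age": 25, "Married": True}
print(validate(extracted, example_schema))  # -> [] (no violations)
```

In MeXtract itself, the corresponding constraints are declared on a `TextSchema` subclass such as `ExampleSchema` above; refer to the MeXtract repository for the exact field syntax and the arguments accepted by `extract`.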