Zaid committed · Commit a80d793 · verified · 1 Parent(s): 89a070c

Update README.md

Files changed (1): README.md (+10, -12)
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 license: apache-2.0
 datasets:
-- IVUL-KAUST/MOLE
+- IVUL-KAUST/MOLE-plus
 metrics:
 - f1
 base_model:
@@ -10,23 +10,17 @@ base_model:
 
 ## Model Description
 MeXtract 3B is a light-weight model for metadata extraction from scientific papers. The model was created by finetuning Qwen2.5 3B Instruct
-on synthetically generated dataset.
+on a synthetically generated dataset. Metadata attributes are defined using a schema-based approach where for each attribute we define the type,
+min length, max length, and options where applicable.
 
 ## Usage
 
 Follow the instructions from [MeXtract](https://github.com/IVUL-KAUST/MeXtract) to install all the dependencies, then
 
 ```python
-from search import get_metadata
-from rich import print
 from schema import TextSchema
 from type_classes import *
-
-def extract(text, model_name, schema_name="ar", backend="openrouter", max_model_len=8192, max_output_len=2084, schema=None):
-    message, metadata, cost, error = get_metadata(
-        text, model_name, schema_name=schema_name, backend=backend, log=False,
-        max_model_len=max_model_len, max_output_len=max_output_len, schema=schema,
-    )
-    return metadata
+from search import extract
 
 
 class ExampleSchema(TextSchema):
@@ -43,15 +37,19 @@ metadata = extract(
 )
 print(metadata)
 
+## {'Name': 'Zaid', 'Hobbies': ['Swimming'], 'Age': 25, 'Married': True}
 ```
 
 ## Model Details
-- Developed by: IVUL-KAUST
+- Developed by: IVUL at KAUST
 - Model type: The model is based on transformers as it was finetuned from Qwen2.5
 - Language(s): languages supported in the model if it is an LLM
-- Datasets: datasets used for pretraining or finetuning
+- Datasets: we use a synthetically generated dataset
 
 ## Evaluation Results
+
+The model is evaluated on [MOLE+](https://huggingface.co/IVUL-KAUST/MOLE-plus).
+
 | **Model** | **ar** | **en** | **jp** | **fr** | **ru** | **multi** | **model** | **Average** |
 | ------------------------ | --------- | --------- | --------- | --------- | --------- | --------- | --------- | ----------- |
 | **Falcon3 3B Instruct** | 20.46 | 16.30 | 20.29 | 17.81 | 17.23 | 16.13 | 15.96 | 17.74 |
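
The schema-based approach the updated description refers to (each attribute carries a type, a min length, a max length, and optionally a closed set of allowed options) can be sketched as follows. This is a minimal, hypothetical illustration using plain dataclasses; the `Attribute` class and `example_schema` names are assumptions for this sketch, not the actual `TextSchema`/`type_classes` API from the MeXtract repository.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a schema-based metadata attribute (not the real
# MeXtract API): each attribute declares a type, a min/max length, and an
# optional closed set of allowed options.
@dataclass
class Attribute:
    type_: type
    min_length: int = 0
    max_length: int = 1
    options: list = field(default_factory=list)

    def validate(self, value) -> bool:
        # Scalars count as a single value; lists use their own length.
        values = value if isinstance(value, list) else [value]
        if not all(isinstance(v, self.type_) for v in values):
            return False
        if not (self.min_length <= len(values) <= self.max_length):
            return False
        # An empty options list means any value of the right type is allowed.
        return not self.options or all(v in self.options for v in values)

# A schema mirroring the fields implied by the usage snippet's example output.
example_schema = {
    "Name": Attribute(str, 1, 1),
    "Hobbies": Attribute(str, 0, 5),
    "Age": Attribute(int, 1, 1),
    "Married": Attribute(bool, 1, 1),
}

metadata = {"Name": "Zaid", "Hobbies": ["Swimming"], "Age": 25, "Married": True}
valid = all(example_schema[k].validate(v) for k, v in metadata.items())
```

Under this sketch, extracted metadata that violates a type or length constraint fails validation, which is what lets the extractor enforce the per-attribute schema at output time.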