---
language:
- ko
metrics:
- accuracy
library_name: transformers
---
# KLUE RoBERTa-base for legal documents
<!-- Provide a quick summary of what the model is/does. -->
- This model was obtained by further pre-training klue/roberta-base on legal_text_merged02_light.txt, a corpus of Korean court rulings.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** J.Park @ KETI
- **Model type:** klue/roberta-base
- **Language(s) (NLP):** Korean
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]
### Training procedure
```python
from transformers import (
    AutoTokenizer,
    RobertaForMaskedLM,
    LineByLineTextDataset,  # note: deprecated in recent transformers releases
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = 'klue/roberta-base'
base_tokenizer = 'klue/roberta-base'
fpath_dataset = 'legal_text_merged02_light.txt'  # training corpus of court rulings
output_dir = './klue-roberta-base-legal'         # example checkpoint directory

model = RobertaForMaskedLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_tokenizer)

# Read the corpus line by line and tokenize each line into blocks of up to 512 tokens.
dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path=fpath_dataset,
    block_size=512,
)

# Dynamically mask 15% of tokens for masked-language-model training.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    num_train_epochs=5,
    per_device_train_batch_size=18,
    save_steps=100,
    save_total_limit=2,
    seed=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)

train_metrics = trainer.train()
trainer.save_model(output_dir)
trainer.push_to_hub()
```
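For intuition about what `DataCollatorForLanguageModeling(mlm=True, mlm_probability=0.15)` does to each batch, the sketch below reimplements its standard 80/10/10 masking scheme in pure Python: 15% of tokens are selected as prediction targets; of those, 80% are replaced with the mask token, 10% with a random token, and 10% are left unchanged, while labels at unselected positions are set to `-100` so the loss ignores them. The `MASK_ID` and `VOCAB_SIZE` constants are illustrative assumptions, not values read from the klue/roberta-base tokenizer.

```python
import random

MASK_ID = 4          # illustrative [MASK] token id (check tokenizer.mask_token_id)
VOCAB_SIZE = 32000   # illustrative vocabulary size for the sketch

def mlm_mask(input_ids, mlm_probability=0.15, rng=None):
    """Return (masked_ids, labels); labels are -100 wherever no prediction is made."""
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in input_ids:
        if rng.random() < mlm_probability:
            labels.append(tok)  # original token becomes the training target
            r = rng.random()
            if r < 0.8:
                masked.append(MASK_ID)                    # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(rng.randrange(VOCAB_SIZE))  # 10%: random token
            else:
                masked.append(tok)                        # 10%: keep original
        else:
            labels.append(-100)  # position ignored by the cross-entropy loss
            masked.append(tok)
    return masked, labels

ids = list(range(100, 120))
masked, labels = mlm_mask(ids)
```

Because masking is re-sampled every time the collator is called, each epoch sees a different set of masked positions over the same corpus.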