
HiTZ's Capitalization & Punctuation model for Basque

Model description

| Model architecture | Model size | Language |
| --- | --- | --- |

[Brief introduction to the model: What does it do? Is it fine-tuned, and from which base model? Which framework or toolkit was used for training? How many parameters? Any links to documentation?]

Usage

Required packages:

  • torch
  • transformers
  • sentencepiece
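The packages above can be installed with pip (a typical invocation; adjust to your environment):

```shell
pip install torch transformers sentencepiece
```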

Capitalizing using Python:

Clone the repository to download the model:

git clone https://huggingface.co/HiTZ/eu_norm-eu
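Model weights on the Hugging Face Hub are typically stored with Git LFS; if the cloned folder contains only small pointer files instead of the full weights, installing Git LFS before cloning should fetch them (assuming git-lfs is available on your system):

```shell
# Ensure Git LFS is set up so large model files are downloaded, not just pointers
git lfs install
git clone https://huggingface.co/HiTZ/eu_norm-eu
```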

Let MARIANMT_MODEL_FOLDERPATH be the path to the downloaded eu_norm-eu folder.

from transformers import pipeline

device = 0 # 0-->GPU, -1-->CPU

segment_list = [
    "kaixo egun on guztioi",
    "faktoria e i te beko irratian entzuten da",
    "gutxi gora behera ehuneko berrogeita bikoa",
    "lau zortzi hamabost hamasei hogeita hiru berrogeita bi",
    "nire jaio urtea mila bederatziehun eta laurogeita hamasei da",
    "informazio gehiago hitz puntu e hatxe u puntu eus web horrian",
]

translator = pipeline(task="translation", model=MARIANMT_MODEL_FOLDERPATH, tokenizer=MARIANMT_MODEL_FOLDERPATH, device=device)
result_list = translator(segment_list)
cp_segment_list = [result["translation_text"] for result in result_list]

for text, cp_text in zip(segment_list, cp_segment_list):
    print(f"Normalized: {text}\n  With C&P: {cp_text}\n")
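MarianMT models accept only a limited input length, so very long normalized transcripts may need to be split before being passed to the pipeline. A minimal sketch of a word-based splitter (the helper name and the 50-word chunk size are illustrative assumptions, not documented model limits):

```python
def chunk_words(text, max_words=50):
    """Split a whitespace-separated string into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
```

Each chunk can then be passed to the translator pipeline separately and the outputs rejoined.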

Expected output:

Normalized: kaixo egun on guztioi
  With C&P: Kaixo, egun on guztioi.

Normalized: faktoria e i te beko irratian entzuten da
  With C&P: Faktoria EiTBko irratian entzuten da.

Normalized: gutxi gora behera ehuneko berrogeita bikoa
  With C&P: Gutxi gora behera %42koa.

Normalized: lau zortzi hamabost hamasei hogeita hiru berrogeita bi
  With C&P: Lau, zortzi, hamabost, hamasei, hogeita hiru, berrogeita bi.

Normalized: nire jaio urtea mila bederatziehun eta laurogeita hamasei da
  With C&P: Nire jaio urtea 1996 da.

Normalized: informazio gehiago hitz puntu e hatxe u puntu eus web horrian
  With C&P: Informazio gehiago hitz.ehu.eus web horrian.

Training

Data preparation

[Brief description of the data used for training and evaluation: how many corpora, how many sentences, and, if possible, links to the datasets used]

Training procedure

[Was a pre-trained model used, and which one? Were GPUs used, and how many? Which toolkit was used? Were any other data-preparation scripts, such as tokenizers, used?]

Performance

[Which metrics are used for measuring the performance of the model? Add a table with the results for each evaluation split]

Limitations

[Describe the limitations of the model]

Additional Information

Author

HiTZ Basque Center for Language Technology - Aholab Signal Processing Laboratory, University of the Basque Country UPV/EHU.

Copyright

Copyright (c) 2025 HiTZ Basque Center for Language Technology - Aholab Signal Processing Laboratory, University of the Basque Country UPV/EHU.

Licensing Information

[Link to the License]

Funding

[Add information]

References

[Add here any referenced tools or papers]

Disclaimer

The models published in this repository are intended for a generalist purpose and are available to third parties. These models may have biases and/or other undesirable distortions.

When third parties deploy or provide systems and/or services to other parties using any of these models (or systems based on these models), or become users of the models themselves, they should note that it is their responsibility to mitigate the risks arising from their use and, in any event, to comply with applicable regulations, including those regarding the use of Artificial Intelligence.

In no event shall the owner and creator of the models (HiTZ Basque Center for Language Technology - Aholab Signal Processing Laboratory, University of the Basque Country UPV/EHU) be liable for any results arising from the use made by third parties of these models.
