This is the model card for Gervásio 70B PTPT Decoder quantized at 4 bits.
This model is integrated in the Evaristo.ai chatbot, where its generative capabilities can be tried out on the fly through a GUI. You may also be interested in other models of the Albertina (encoders), Gervásio (decoders) and Serafim (sentence encoder) families.



Gervásio 70B PTPT

Gervásio PT* is a fully open decoder for the Portuguese language.

It is a decoder of the LLaMA family, based on the Transformer neural architecture and developed over the LLaMA-3.3 70B Instruct model. It was further improved through additional training on language resources that include new Portuguese instruction datasets prepared for this purpose (extraGLUE-Instruct, NatInst-PTPT, MMLU-PTPT, Wiki-PTPT).

Gervásio 70B PTPT is developed by NLX-Natural Language and Speech Group, at the University of Lisbon, Faculty of Sciences, Department of Informatics, Portugal.

For the record, its full name is Gervásio Produz Textos em Português, which corresponds to the natural acronym GPT PT, and it is known more shortly as Gervásio PT* or, even more briefly, just as Gervásio, among its acquaintances.


The model in this repository is a version of Gervásio 70B PTPT quantized at 4 bits (Q4_K_M) in GGUF format. The non-quantized version can be found here.
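
If you only need the GGUF weight file rather than the whole repository, it can be fetched with the huggingface_hub client. The sketch below is a minimal, non-authoritative example; the "*.gguf" pattern is used because the exact filename is not listed in this card.

# Minimal sketch: download only the GGUF file from this repository.
# The "*.gguf" glob is an assumption, since the exact filename is not stated here.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit",
    allow_patterns=["*.gguf"],
)
print(local_dir)  # path of the local folder containing the downloaded GGUF file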


Model Description

This model card is for Gervásio 70B PTPT quantized to 4 bits. The model has 70 billion parameters, a hidden size of 8,192 units, an intermediate size of 28,672 units, 64 attention heads, 80 hidden layers, and a vocabulary size of 128,000 tokens.
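
These hyperparameters can be checked programmatically against the model configuration. The sketch below is a minimal example and assumes that the non-quantized checkpoint is published under the repository id PORTULAN/gervasio-70b-portuguese-ptpt-decoder; adjust it to the actual repository of the non-quantized version linked above.

# Minimal sketch: read the architecture hyperparameters from the config of the
# non-quantized checkpoint. The repo id below is an assumption based on the
# naming of this repository; use the actual non-quantized repo linked above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("PORTULAN/gervasio-70b-portuguese-ptpt-decoder")
print(config.hidden_size)          # expected: 8192
print(config.intermediate_size)    # expected: 28672
print(config.num_attention_heads)  # expected: 64
print(config.num_hidden_layers)    # expected: 80
print(config.vocab_size)           # expected: around 128k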

Gervásio 70B PTPT is distributed under an MIT license.


Training Data

Gervásio 70B PTPT was trained on various datasets, either native to European Portuguese or translated into European Portuguese:

Translated datasets:

We selected only datasets whose translation into European Portuguese preserves, in the target language, the linguistic properties at stake.

  • MMLU (multiple choice question answering)

  • Subset of Natural Language Instructions (multiple choice question answering)

From GLUE, we resorted to the following four tasks:

  • MRPC (paraphrase detection).
  • RTE (recognizing textual entailment).
  • STS-B (semantic textual similarity).
  • WNLI (coreference and natural language inference).

And from SuperGLUE, we included these other four tasks:

  • BoolQ (yes/no question answering).
  • CB (inference with 3 labels).
  • COPA (reasoning).
  • MultiRC (question answering).

Native dataset:

  • Wikipedia, a human-curated subset of the Portuguese Wikipedia pertaining to Portuguese history, society, and culture.

Furthermore, instruction templates have been manually crafted for each task.
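
As a purely illustrative sketch, not the actual templates used for training, such a template typically wraps each example's fields in a fixed Portuguese instruction, along these lines:

# Purely illustrative sketch of an instruction template for the RTE task;
# the templates actually used for Gervásio are those released with
# extraGLUE-Instruct and may differ from this hypothetical format.
def rte_prompt(premise: str, hypothesis: str) -> str:
    return (
        "Indique se a hipótese pode ser inferida da premissa. "
        "Responda apenas 'sim' ou 'não'.\n"
        f"Premissa: {premise}\n"
        f"Hipótese: {hypothesis}\n"
        "Resposta:"
    )

print(rte_prompt("O modelo foi treinado em Lisboa.", "O modelo foi treinado em Portugal."))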

Training Details

A technical report is forthcoming.

Performance

For testing, we evaluate on the translated datasets MRPC (similarity), RTE (inference), COPA (reasoning/QA), MMLU (question answering), MMLU-Pro (question answering), and GPQA-Diamond (question answering). The respective scores in the table below were obtained with the 16-bit version.

We also evaluate on the translated DoNotAnswer-PT (answer refusal) and on Tuguesice-PT, a dataset specifically created to assess question answering on Portuguese culture. The respective scores in the table below were obtained with the 4-bit version.

| Model | MRPC (F1) | RTE (F1) | COPA (F1) | MMLU (Acc.) | MMLU-Pro (Acc.) | GPQA-Diamond (Acc.) | Tuguesice-PT (Acc.) | DoNotAnswer-PT (Acc.) |
|---|---|---|---|---|---|---|---|---|
| Gervásio 70B PTPT | 79.13 | 90.97 | 96.00 | 82.04 | 58.67 | 45.96 | 39.76 | 86.9 |
| LLaMA-3.3 70B Instruct (English) | 72.93 | 89.89 | 95.00 | 81.67 | 61.61 | 45.96 | 25.69 | 91.8 |

How to use

You can use this model directly with a pipeline for causal language modeling:

>>> from transformers import pipeline
>>> generator = pipeline(model='PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit')
>>> generator("A comida portuguesa é", max_new_tokens=10)
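
Since the weights in this repository are distributed as a GGUF file, the model can also be run with llama.cpp-based tooling. Below is a minimal sketch using the llama-cpp-python bindings; the "*Q4_K_M.gguf" filename pattern and the generation settings are assumptions, not values confirmed by this card.

# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The "*Q4_K_M.gguf" glob is an assumed filename pattern; the context size
# and GPU offload settings are illustrative only.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="PORTULAN/gervasio-70b-portuguese-ptpt-decoder-quantized-4bit",
    filename="*Q4_K_M.gguf",  # assumed to match the single GGUF file in the repo
    n_ctx=4096,               # illustrative context window
    n_gpu_layers=-1,          # offload all layers to GPU if one is available
)

output = llm("A comida portuguesa é", max_tokens=10)
print(output["choices"][0]["text"])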

Please cite

@misc{gervasio,
      title={Advancing Generative AI for Portuguese with
             Open Decoder Gervásio PT-*}, 
      author={Rodrigo Santos and João Silva and Luís Gomes and
              João Rodrigues and António Branco},
      year={2024},
      eprint={2402.18766},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Please use the above canonical reference when using or citing this model.

Acknowledgments

The research reported here was partially supported by: PORTULAN CLARIN—Research Infrastructure for the Science and Technology of Language, funded by Lisboa 2020, Alentejo 2020 and FCT—Fundação para a Ciência e Tecnologia under the grant PINFRA/22117/2016; research project GPT-PT - Transformer-based Decoder for the Portuguese Language, funded by FCT—Fundação para a Ciência e Tecnologia under the grant CPCA-IAC/AV/478395/2022; innovation project ACCELERAT.AI - Multilingual Intelligent Contact Centers, funded by IAPMEI, I.P. - Agência para a Competitividade e Inovação under the grant C625734525-00462629, of Plano de Recuperação e Resiliência, call RE-C05-i01.01 – Agendas/Alianças Mobilizadoras para a Reindustrialização.
