# Mambarim-110M
## Model Summary
Mambarim-110M is a pioneering 110-million-parameter language model for Portuguese built on the Mamba architecture. Unlike traditional Transformer models, which rely on self-attention whose cost grows quadratically with sequence length, Mamba is a state-space model (SSM) that processes sequences with linear complexity.
This design leads to significantly faster inference and lower memory consumption, especially on long sequences. Mamba employs a selection mechanism that lets it focus on the relevant parts of the context, making it a powerful and efficient alternative to Transformers. Mambarim-110M is one of the first Mamba-based models developed specifically for Portuguese.
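For readers new to SSMs, the core recurrence can be sketched as follows (a simplified form of the Mamba paper's zero-order-hold formulation; the notation here is illustrative and not taken from this card):

```latex
% Simplified selective SSM step; \Delta_t, B_t, C_t are computed from the
% input x_t, which is the "selection" mechanism.
h_t = \bar{A}_t \, h_{t-1} + \bar{B}_t \, x_t, \qquad y_t = C_t \, h_t,
\qquad \bar{A}_t = \exp(\Delta_t A), \quad \bar{B}_t \approx \Delta_t B_t
```

Because each step only updates a fixed-size hidden state, generation cost grows linearly with sequence length rather than quadratically.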
## Details
- Architecture: a Mamba model pre-trained via causal language modeling
- Size: 119,930,880 parameters
- Context length: 2048 tokens
- Dataset: Pt-Corpus-Instruct-tokenized-large (6.2B tokens)
- Language: Portuguese
- Number of steps: 758,423
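The size figure above can be sanity-checked directly from the published checkpoint; a minimal sketch, assuming the `transformers` Mamba classes shown in the usage section below:

```python
# Minimal sketch: load the checkpoint and confirm the parameter count
# reported above (119,930,880).
from transformers import MambaForCausalLM

model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")
print(f"{sum(p.numel() for p in model.parameters()):,}")  # expect 119,930,880
```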
## Training & Reproducibility
This model was trained to be fully open and reproducible. You can find all the resources used below:
- Source Code: GitHub Repository
- Training Notebook: Open in Colab
- Training Metrics: View on Weights & Biases
## Intended Uses
This model is intended for a variety of text generation tasks in Portuguese. Given its size, it is particularly well-suited for environments with limited computational resources.
- General-Purpose Text Generation: The model can be used for creative writing, continuing a story, or generating text based on a prompt.
- Research and Education: As one of the first Portuguese Mamba models, it serves as an excellent resource for researchers studying State-Space Models, computational efficiency in LLMs, and NLP for non-English languages. Its smaller size also makes it an accessible tool for educational purposes.
- Fine-tuning Base: The model can be fine-tuned on specific datasets to create more specialized models for tasks like simple chatbots, content creation aids, or domain-specific text generation.
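As an illustration of the last point, here is a minimal causal-LM fine-tuning sketch using the Hugging Face `Trainer`; the dataset name `my_pt_corpus`, the `text` column, and the hyperparameters are placeholders, not part of this card:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          MambaForCausalLM, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # fallback if no pad token is set

dataset = load_dataset("my_pt_corpus", split="train")  # hypothetical dataset

def tokenize(batch):
    # Truncate to the model's 2048-token context length.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mambarim-110m-ft",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```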
## Out-of-scope Use
The model is not intended for use in critical applications without comprehensive testing and fine-tuning. Users should be aware of the following limitations:
- Factual Accuracy: This model is not a knowledge base and can generate incorrect or fabricated information ("hallucinate"). It should not be used as a source of truth.
- High-Stakes Decisions: Do not use this model for making important decisions in domains such as medical, legal, or financial advice, as its outputs may be unreliable.
- Bias and Safety: The model was trained on a large corpus of public data from the internet and may reflect societal biases present in that data. It can generate content that is biased, offensive, or otherwise harmful.
## Basic usage
Until `transformers>=4.39.0` is released, you need to install `transformers` from `main`:

```bash
pip install git+https://github.com/huggingface/transformers@main
```
We also recommend installing `causal-conv1d` and `mamba-ssm`, which provide the optimized CUDA kernels:

```bash
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
```
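If these packages are missing, `transformers` silently falls back to a slower pure-PyTorch implementation of the selective scan. A quick way to check which path you will get (a sketch, not an official API):

```python
# Check whether the optimized kernel packages are importable.
import importlib.util

for pkg in ("mamba_ssm", "causal_conv1d"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'available' if found else 'missing (slow path)'}")
```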
You can use the classic generate API:
>>> from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
>>> model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")
>>> input_ids = tokenizer("O Natal é uma", return_tensors="pt")["input_ids"]
>>> out = model.generate(
input_ids,
repetition_penalty=1.2,
temperature=0.8,
top_k=50,
top_p=0.85,
do_sample=True,
max_new_tokens=10
)
>>> print(tokenizer.batch_decode(out))
["<s> O Natal é uma data em que as pessoas passam horas de lazer e"]
## Benchmarks
Evaluations on Brazilian Portuguese benchmarks were performed using a Portuguese implementation of the EleutherAI LM Evaluation Harness (created by Eduardo Garcia).
Detailed results can be found here.
| Model | Average | ENEM | BLUEX | OAB Exams | ASSIN2 RTE | ASSIN2 STS | FaQuAD NLI | HateBR | PT Hate Speech | tweetSentBR | Architecture |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TeenyTinyLlama-460m | 28.86 | 20.15 | 25.73 | 27.02 | 53.61 | 13.00 | 46.41 | 33.59 | 22.99 | 17.28 | LlamaForCausalLM |
| TeenyTinyLlama-160m | 28.20 | 19.24 | 23.09 | 22.37 | 53.97 | 0.24 | 43.97 | 36.92 | 42.63 | 11.39 | LlamaForCausalLM |
| MulaBR/Mula-4x160-v0.1 | 26.24 | 21.34 | 25.17 | 25.06 | 33.57 | 11.35 | 43.97 | 41.50 | 22.99 | 11.24 | MixtralForCausalLM |
| TeenyTinyLlama-460m-Chat | 25.49 | 20.29 | 25.45 | 26.74 | 43.77 | 4.52 | 34.00 | 33.49 | 22.99 | 18.13 | LlamaForCausalLM |
| Mambarim-110M | 14.16 | 18.40 | 10.57 | 21.87 | 16.09 | 1.89 | 9.29 | 15.75 | 17.77 | 15.79 | MambaForCausalLM |
| GloriaTA-3B | 4.09 | 1.89 | 3.20 | 5.19 | 0.00 | 2.32 | 0.26 | 0.28 | 23.52 | 0.19 | GPTNeoForCausalLM |