Mambarim-110M

Model Summary

Mambarim-110M is a pioneering 110-million-parameter language model for Portuguese, built upon the Mamba architecture. Unlike traditional Transformer models that rely on quadratic self-attention, Mamba is a State-Space Model (SSM) that processes sequences with linear complexity.

This design choice leads to significantly faster inference and reduced memory consumption, especially for long sequences. Mamba employs a selection mechanism that allows it to effectively focus on relevant information in the context, making it a powerful and efficient alternative to Transformers. Mambarim-110M is one of the first Mamba-based models developed specifically for the Portuguese language.
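
For intuition, the sketch below shows the shape of such a recurrence: a fixed-size hidden state is carried across the sequence, so each token costs constant work and memory, and the projections that write to and read from the state depend on the input itself, which is the selection mechanism. This is plain PyTorch with toy dimensions, not the fused Mamba kernel; all names are illustrative.

import torch

def selective_scan(x, A, proj_B, proj_C):
    """Toy selective SSM scan: linear in seq_len, fixed-size state."""
    seq_len, d = x.shape
    n = A.shape[-1]
    h = torch.zeros(d, n)                      # fixed-size hidden state
    ys = []
    for t in range(seq_len):
        B_t = proj_B(x[t])                     # input-dependent write ("selection")
        C_t = proj_C(x[t])                     # input-dependent read-out
        h = A * h + x[t].unsqueeze(-1) * B_t   # state update: (d, n)
        ys.append(h @ C_t)                     # per-token output: (d,)
    return torch.stack(ys)                     # (seq_len, d)

d, n = 16, 8
proj_B = torch.nn.Linear(d, n)
proj_C = torch.nn.Linear(d, n)
A = 0.9 * torch.rand(d, n)                     # decay factors, |A| < 1 for stability
y = selective_scan(torch.randn(32, d), A, proj_B, proj_C)  # -> (32, 16)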

Details

  • Architecture: a Mamba model pre-trained via causal language modeling
  • Size: 119,930,880 parameters
  • Context length: 2048 tokens
  • Dataset: Pt-Corpus-Instruct-tokenized-large (6.2B tokens)
  • Language: Portuguese
  • Number of steps: 758,423
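
The parameter count listed above can be verified directly from the released checkpoint:

>>> from transformers import MambaForCausalLM
>>> model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")
>>> sum(p.numel() for p in model.parameters())
119930880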

Training & Reproducibility

This model was trained to be fully open and reproducible. You can find all the resources used below:

Intended Uses

This model is intended for a variety of text generation tasks in Portuguese. Given its size, it is particularly well-suited for environments with limited computational resources.

  • General-Purpose Text Generation: The model can be used for creative writing, continuing a story, or generating text based on a prompt.
  • Research and Education: As one of the first Portuguese Mamba models, it serves as an excellent resource for researchers studying State-Space Models, computational efficiency in LLMs, and NLP for non-English languages. Its smaller size also makes it an accessible tool for educational purposes.
  • Fine-tuning Base: The model can be fine-tuned on specific datasets to create more specialized models for tasks like simple chatbots, content creation aids, or domain-specific text generation (a minimal sketch follows this list).
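
For the fine-tuning use case, a minimal causal-LM training loop with the Hugging Face Trainer might look like the sketch below. This is illustrative rather than a tested recipe: corpus_pt.txt stands in for your own text data, and the hyperparameters are arbitrary placeholders.

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    MambaForCausalLM,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batch padding

# Placeholder corpus: any dataset with a "text" column works here.
dataset = load_dataset("text", data_files={"train": "corpus_pt.txt"})["train"]

def tokenize(batch):
    # Truncate to the model's 2048-token context length.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mambarim-110m-ft",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=5e-5,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()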

Out-of-scope Use

The model is not intended for use in critical applications without comprehensive testing and fine-tuning. Users should be aware of the following limitations:

  • Factual Accuracy: This model is not a knowledge base and can generate incorrect or fabricated information ("hallucinate"). It should not be used as a source of truth.
  • High-Stakes Decisions: Do not use this model to make important decisions in domains such as medicine, law, or finance, as its outputs may be unreliable.
  • Bias and Safety: The model was trained on a large corpus of public data from the internet and may reflect societal biases present in that data. It can generate content that is biased, offensive, or otherwise harmful.

Basic usage

You need to install transformers from the main branch until transformers>=4.39.0 is released:

pip install git+https://github.com/huggingface/transformers@main

We also recommend installing both causal-conv1d and mamba-ssm:

pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
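
These two packages provide the fused CUDA kernels; if they are missing, transformers falls back to a slower sequential implementation (and logs a warning). A quick way to confirm they are importable:

>>> import importlib.util
>>> importlib.util.find_spec("causal_conv1d") is not None
True
>>> importlib.util.find_spec("mamba_ssm") is not None
True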

You can use the classic generate API:

>>> from transformers import MambaForCausalLM, AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
>>> model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")
>>> input_ids = tokenizer("O Natal é uma", return_tensors="pt")["input_ids"]
>>> out = model.generate(
...     input_ids,
...     repetition_penalty=1.2,
...     temperature=0.8,
...     top_k=50,
...     top_p=0.85,
...     do_sample=True,
...     max_new_tokens=10,
... )
>>> print(tokenizer.batch_decode(out))
["<s> O Natal é uma data em que as pessoas passam horas de lazer e"]

Benchmarks

Evaluations on Brazilian Portuguese benchmarks were performed using a Portuguese implementation of the EleutherAI LM Evaluation Harness (created by Eduardo Garcia).
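
For reference, an invocation along the lines below reproduces this kind of run, assuming the fork keeps the upstream lm_eval entry point and flags; the task identifiers here are assumptions and should be checked against the fork's documentation:

lm_eval --model hf \
    --model_args pretrained=dominguesm/mambarim-110m \
    --tasks enem_challenge,bluex,oab_exams,assin2_rte,assin2_sts,faquad_nli,hatebr_offensive,portuguese_hate_speech,tweetsentbr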

Detailed results can be found here.

| Model | Average | ENEM | BLUEX | OAB Exams | ASSIN2 RTE | ASSIN2 STS | FaQuAD-NLI | HateBR | PT Hate Speech | tweetSentBR | Architecture |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TeenyTinyLlama-460m | 28.86 | 20.15 | 25.73 | 27.02 | 53.61 | 13 | 46.41 | 33.59 | 22.99 | 17.28 | LlamaForCausalLM |
| TeenyTinyLlama-160m | 28.2 | 19.24 | 23.09 | 22.37 | 53.97 | 0.24 | 43.97 | 36.92 | 42.63 | 11.39 | LlamaForCausalLM |
| MulaBR/Mula-4x160-v0.1 | 26.24 | 21.34 | 25.17 | 25.06 | 33.57 | 11.35 | 43.97 | 41.5 | 22.99 | 11.24 | MixtralForCausalLM |
| TeenyTinyLlama-460m-Chat | 25.49 | 20.29 | 25.45 | 26.74 | 43.77 | 4.52 | 34 | 33.49 | 22.99 | 18.13 | LlamaForCausalLM |
| Mambarim-110M | 14.16 | 18.4 | 10.57 | 21.87 | 16.09 | 1.89 | 9.29 | 15.75 | 17.77 | 15.79 | MambaForCausalLM |
| GloriaTA-3B | 4.09 | 1.89 | 3.2 | 5.19 | 0 | 2.32 | 0.26 | 0.28 | 23.52 | 0.19 | GPTNeoForCausalLM |