Alvaro Bartolome
AI & ML interests
machine learning @huggingface (inference + cloud)
Recent Activity
updated a dataset about 13 hours ago: huggingface/DEH-image-scan-data
updated a dataset about 16 hours ago: google-cloud-partnership/finewiki-en-100k-embeddings
published a dataset about 16 hours ago: google-cloud-partnership/finewiki-en-100k-embeddings
Critique Models (CM) on the 🤗 Hub
This collection contains some Critique Models (CM) for LLM evaluation available on the Hugging Face Hub
- openbmb/UltraCM-13b
  Text Generation • Updated • 33 • 20
- prometheus-eval/prometheus-7b-v1.0
  Text Generation • Updated • 32 • 31
- prometheus-eval/prometheus-13b-v1.0
  Text Generation • Updated • 145 • 143
- prometheus-eval/prometheus-7b-v2.0
  Text Generation • 7B • Updated • 20.7k • 100
AIF Datasets (with distilabel)
Small to medium-sized datasets that are synthetically generated, labelled with AI Feedback (AIF), or both
NER in Spanish
Fine-tuned models for NER in Spanish, built with the SpanMarker framework and different encoders and datasets
- alvarobartt/bert-base-multilingual-cased-ner-spanish
  Token Classification • 0.2B • Updated • 5 • 3
- alvarobartt/span-marker-xlm-roberta-large-conll-2002-es
  Token Classification • Updated • 2 • 2
- alvarobartt/span-marker-roberta-base-bne-conll-2002-es
  Token Classification • Updated • 14 • 1
From zero to GPT-hero
Reading list to fully understand GPT (and GPT-2) and implement them from scratch
- Neural Machine Translation of Rare Words with Subword Units
  Paper • 1508.07909 • Published • 4
- Attention Is All You Need
  Paper • 1706.03762 • Published • 112
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 26
- Generating Wikipedia by Summarizing Long Sequences
  Paper • 1801.10198 • Published • 3
Studio Ghibli Diffusion
Text-to-image fine-tunes in the Studio Ghibli style
- FLUX.1 Studio Ghibli LoRA
  Running on Zero • 22
  Generate Studio Ghibli-style images from text prompts
- alvarobartt/ghibli-characters
  Viewer • Updated • 9 • 48 • 9
- black-forest-labs/FLUX.1-dev
  Text-to-Image • Updated • 707k • 12.3k
- alvarobartt/ghibli-characters-flux-lora
  Text-to-Image • Updated • 187 • 64
About ORPO
Information and experiments on fine-tuning LLMs with 🤗 `trl.ORPOTrainer`
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 72
- HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
  Text Generation • 141B • Updated • 191 • 269
- alvarobartt/mistral-orpo-mix
  Text Generation • 7B • Updated • 5 • 1
- alvarobartt/Mistral-7B-v0.1-ORPO
  Text Generation • 7B • Updated • 19 • 14
Apple MLX-compatible 7B LLMs on the 🤗 Hub
This collection contains the model weights for 7B LLMs for Apple's MLX framework. Find more information at https://github.com/ml-explore/mlx
🇪🇸 Datasets in Spanish for LLM Evaluation
This collection contains some datasets for LLM evaluation in Spanish, from nlp.uoregon.edu, translated using ChatGPT (including English counterparts)
🔒 Models for PII Protection
A collection of text-classification and token-classification models for PII protection