Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token and is trained on a mix of open-source and proprietary datasets. PowerMoE-3B shows promising results compared to dense models with twice as many active parameters across various benchmarks, including natural language multiple-choice, code generation, and math reasoning. Paper: https://arxiv.org/abs/2408.13359
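For intuition only, here is a minimal sketch of a power-law learning-rate decay of the general kind the Power scheduler builds on. The constants `a`, `b`, the warmup length, and the exact functional form below are illustrative placeholders, not values or formulas from the paper; see the paper for the actual batch-size- and token-count-agnostic schedule.

```python
# Illustrative power-law learning-rate decay (placeholder constants and form,
# not the paper's exact formulation).
def power_law_lr(tokens_seen: int,
                 max_lr: float = 3e-4,          # assumed peak learning rate
                 a: float = 1.0,                # placeholder scale constant
                 b: float = 0.5,                # placeholder decay exponent
                 warmup_tokens: int = 1_000_000) -> float:
    if tokens_seen < warmup_tokens:
        # linear warmup to the peak learning rate
        return max_lr * tokens_seen / warmup_tokens
    # power-law decay in the number of trained tokens, capped at the peak
    return min(max_lr, a * tokens_seen ** (-b))

for t in (500_000, 10_000_000, 1_000_000_000):
    print(t, power_law_lr(t))
```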
This is a GGUF quantized version. It requires the latest llama.cpp to run.
Here is a simple example of how to use the PowerMoE GGUF:
```bash
./llama-cli -m PowerMoE4x800M_q3km.gguf -p "How about a snack?"
```
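If you prefer Python, the following is a minimal sketch using the llama-cpp-python bindings; it assumes a recent llama-cpp-python release that supports this architecture and that the GGUF file is in the current directory.

```python
# Minimal sketch using llama-cpp-python (assumes a recent release with
# support for this model architecture).
from llama_cpp import Llama

llm = Llama(model_path="PowerMoE4x800M_q3km.gguf", n_ctx=2048)

output = llm("How about a snack?", max_tokens=64)
print(output["choices"][0]["text"])
```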
Base model: ibm-research/PowerMoE-3b
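If you want the original, unquantized weights instead, a sketch along these lines should work with the transformers library (assuming a release recent enough to include this architecture):

```python
# Sketch: load the original (non-GGUF) base model with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-research/PowerMoE-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("How about a snack?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```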