
tuxsentience-beta3

Our second open-weight model, currently in progress. For now, this page documents progress and details.

Model Information

This model will be based on Qwen3 8B.

Like the last one, it will most likely be 4-bit, but thanks to our new training methods (detailed below) we may release larger sizes.
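To put the 4-bit choice in perspective, here is a back-of-the-envelope estimate of how much space the raw weights take at different precisions. It is only a sketch: it assumes "8B" means exactly 8 billion parameters, and it ignores quantization metadata, activations, and optimizer state. The function name is illustrative.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the raw weights in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "8B" is treated as exactly 8 billion parameters.
NOMINAL_PARAMS = 8e9

for bits in (4, 8, 16):
    size = weight_footprint_gb(NOMINAL_PARAMS, bits)
    print(f"{bits}-bit weights: ~{size:.1f} GB")
```

Real files will be somewhat larger, since quantized formats also store scaling factors and some layers are often kept at higher precision.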

Training Information

We are attempting to train this model via distributed computing. This is how our current setup looks so far:

  • i9-10910, 32GB RAM, RX 7600 (8GB)
  • i5-13420H, 16GB RAM, RTX 3050 Mobile (6GB)
  • i5-12400, 32GB RAM, RTX 3060 (12GB)
  • Ryzen 7 9800X3D, 32GB RAM, RTX 3080 (10GB)

Together, these amount to around 98.47 TFLOPS.
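The memory side of the setup listed above can be tallied the same way. This is a minimal sketch using only the figures from the list; the `Node` class and its field names are illustrative and not part of our tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    cpu: str
    ram_gb: int
    gpu: str
    vram_gb: int


# Figures copied from the hardware list above.
NODES = [
    Node("i9-10910", 32, "RX 7600", 8),
    Node("i5-13420H", 16, "RTX 3050 Mobile", 6),
    Node("i5-12400", 32, "RTX 3060", 12),
    Node("Ryzen 7 9800X3D", 32, "RTX 3080", 10),
]

total_ram = sum(n.ram_gb for n in NODES)
total_vram = sum(n.vram_gb for n in NODES)
smallest = min(NODES, key=lambda n: n.vram_gb)

print(f"Total system RAM: {total_ram} GB")
print(f"Total VRAM: {total_vram} GB")
print(f"Smallest GPU: {smallest.gpu} ({smallest.vram_gb} GB)")
```

Note that the pooled VRAM is not one contiguous block: each machine has to fit its own share of the work, so the smallest card is likely to be the limiting factor if the load is split evenly.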

We hope to acquire better hardware in the future, and an RX 9070 XT is planned for future models. Currently we are attempting to use Unsloth + Ray for distributed computing.

Benchmarks

Coming soon to an accuracy near you

FAQ

  • Q: This implies the existence of beta1 and alpha versions
  • A: They do exist; however, they were never published and most likely never will be

Made possible by

Model tree for GrainWare/tuxsentience-beta3

Base model: Qwen/Qwen3-8B-Base
Finetuned: Qwen/Qwen3-8B
Finetuned: this model