---
base_model:
  - HiDream-ai/HiDream-I1-Full
base_model_relation: quantized
pipeline_tag: text-to-image
tags:
- dfloat11
- df11
- lossless compression
- 70% size, 100% accuracy
---

# DFloat11 Compressed Model: `HiDream-ai/HiDream-I1-Full`

This is a **DFloat11 losslessly compressed** version of the original `HiDream-ai/HiDream-I1-Full` model. It reduces model size by **30%** compared to the original BFloat16 model, while maintaining **bit-identical outputs** and supporting **efficient GPU inference**.

🔥🔥🔥 Thanks to DFloat11 compression, HiDream-I1-Full can now run smoothly on a single 32 GB GPU without any quality loss. 🔥🔥🔥

### 📊 Performance Comparison

| Metric                                          | HiDream-I1-Full (BFloat16) | HiDream-I1-Full (DFloat11) |
| ----------------------------------------------- | -------------------------- | -------------------------- |
| Model Size                                      | 34.21 GB                   | 24.19 GB                   |
| Peak GPU Memory<br>(1024×1024 image generation) | 35.61 GB                   | 26.42 GB                   |
| Generation Time<br>(A100 GPU)                   | 140 seconds                | 161 seconds                |
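
The percentages quoted in this card can be checked directly against the table. A minimal sketch that only reuses the reported numbers (nothing is measured here); the 32 GB budget is the single-GPU size mentioned above:

```python
# Figures copied from the performance table (GB and seconds); nothing is measured.
bf16 = {"size_gb": 34.21, "peak_mem_gb": 35.61, "time_s": 140}
df11 = {"size_gb": 24.19, "peak_mem_gb": 26.42, "time_s": 161}


def reduction(before: float, after: float) -> float:
    """Percentage reduction when going from `before` to `after`."""
    return 100.0 * (before - after) / before


size_saving = reduction(bf16["size_gb"], df11["size_gb"])
mem_saving = reduction(bf16["peak_mem_gb"], df11["peak_mem_gb"])
size_ratio = 100.0 * df11["size_gb"] / bf16["size_gb"]  # DF11 size as % of BF16
slowdown = df11["time_s"] / bf16["time_s"]              # >1 means DF11 is slower

GPU_BUDGET_GB = 32  # single 32 GB GPU
fits_bf16 = bf16["peak_mem_gb"] <= GPU_BUDGET_GB
fits_df11 = df11["peak_mem_gb"] <= GPU_BUDGET_GB

print(f"Model size: {size_ratio:.1f}% of BF16 ({size_saving:.1f}% smaller)")
print(f"Peak memory: {mem_saving:.1f}% lower")
print(f"Generation time: {slowdown:.2f}x the BF16 time")
print(f"Fits in {GPU_BUDGET_GB} GB: BF16={fits_bf16}, DF11={fits_df11}")
```

With these inputs the script reports the DFloat11 checkpoint at about 70.7% of the BFloat16 size (29.3% smaller), peak memory about 25.8% lower, generation about 1.15× slower, and only the DFloat11 run fitting in 32 GB.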

### 🔧 How to Use

1. Install or upgrade the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*:

    ```bash
    pip install -U dfloat11[cuda12]
    # or if you have CUDA version 11:
    # pip install -U dfloat11[cuda11]
    ```

2. Install or upgrade the diffusers library.

    ```bash
    pip install -U diffusers
    ```

3. To use the DFloat11 model, run the following example code in Python:

    ```python
    import torch
    from transformers import AutoTokenizer
    from diffusers import HiDreamImagePipeline
    from dfloat11 import DFloat11Model

    # Fourth text encoder: a DFloat11-compressed Llama 3.1 8B Instruct, loaded on the CPU.
    tokenizer_4 = AutoTokenizer.from_pretrained("DFloat11/Llama-3.1-8B-Instruct-DF11")
    text_encoder_4 = DFloat11Model.from_pretrained("DFloat11/Llama-3.1-8B-Instruct-DF11", device="cpu")
    # HiDream conditions on the encoder's intermediate hidden states, so ask the model to return them.
    text_encoder_4.config.output_hidden_states = True
    text_encoder_4.config.output_attentions = True

    # Build the pipeline from the original repository, swapping in the compressed text encoder.
    pipe = HiDreamImagePipeline.from_pretrained(
        "HiDream-ai/HiDream-I1-Full",
        tokenizer_4=tokenizer_4,
        text_encoder_4=text_encoder_4,
        torch_dtype=torch.bfloat16,
    )
    # Load the DFloat11-compressed weights into the pipeline's existing transformer module.
    DFloat11Model.from_pretrained(
        "DFloat11/HiDream-I1-Full-DF11",
        device="cpu",
        bfloat16_model=pipe.transformer,
    )
    # Keep components on the CPU and move each one to the GPU only while it runs.
    pipe.enable_model_cpu_offload()

    image = pipe(
        'A cat wearing a vintage astronaut suit, floating inside a spaceship and gazing out the window at Earth.',
        height=1024,
        width=1024,
        guidance_scale=5.0,
        num_inference_steps=50,
        generator=torch.Generator("cuda").manual_seed(0),  # fixed seed for reproducible output
    ).images[0]
    image.save("output.png")
    ```
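
Step 1 asks you to pick the pip extra that matches your CUDA major version. As a small sketch, assuming the CUDA version your PyTorch build reports (`torch.version.cuda`) is the one to match, the choice can be made programmatically; `dfloat11_pip_spec` is a hypothetical helper, not part of the `dfloat11` package:

```python
from typing import Optional


def dfloat11_pip_spec(cuda_version: Optional[str]) -> str:
    """Return the pip requirement for a CUDA version string such as "12.1".

    Hypothetical helper: it only knows the two extras listed in step 1
    (cuda12 and cuda11) and rejects anything else.
    """
    if not cuda_version:
        # torch.version.cuda is None on CPU-only PyTorch builds.
        raise RuntimeError("No CUDA build detected; DFloat11 requires a CUDA-compatible GPU.")
    major = int(cuda_version.split(".")[0])
    if major not in (11, 12):
        raise ValueError(f"No dfloat11 extra is listed here for CUDA {cuda_version}")
    return f"dfloat11[cuda{major}]"
```

For example, on a PyTorch build compiled against CUDA 12.1, `dfloat11_pip_spec(torch.version.cuda)` returns `"dfloat11[cuda12]"`, which you would then pass to `pip install -U`.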


### 🔍 How It Works

We apply **Huffman coding** to losslessly compress the exponent bits of BFloat16 model weights, which are highly compressible (their 8 bits carry only ~2.6 bits of actual information). To enable fast inference, we implement a highly efficient CUDA kernel that performs on-the-fly weight decompression directly on the GPU.
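
To make the idea concrete, here is a minimal pure-Python sketch of exponent-only Huffman coding. It is an illustration under stated assumptions, not the DFloat11 implementation: synthetic Gaussian values stand in for real model weights, BFloat16 patterns are produced by truncating float32 values, and decoding walks the bitstream one bit at a time on the CPU instead of in a CUDA kernel. The final check shows why the scheme is lossless: the untouched sign and mantissa bits plus the decoded exponents rebuild every original 16-bit pattern exactly.

```python
import heapq
import math
import random
import struct
from collections import Counter


def to_bf16_bits(x: float) -> int:
    """Truncate a float32 to its upper 16 bits (a BFloat16 bit pattern)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def huffman_code(freqs: Counter) -> dict:
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs a one-bit code
        return {s: "0" for s in freqs}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def decode(bitstream: str, code: dict) -> list:
    """Emit a symbol each time the accumulated bits form a complete codeword."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bitstream:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


random.seed(0)
# Synthetic stand-in for model weights: small zero-mean Gaussian values.
weights = [random.gauss(0.0, 0.02) for _ in range(20_000)]
bf16 = [to_bf16_bits(w) for w in weights]

# BFloat16 layout: 1 sign bit | 8 exponent bits | 7 mantissa bits.
signs = [b >> 15 for b in bf16]
exponents = [(b >> 7) & 0xFF for b in bf16]
mantissas = [b & 0x7F for b in bf16]

# Only the exponents are entropy-coded; sign and mantissa stay raw.
freqs = Counter(exponents)
code = huffman_code(freqs)
encoded = "".join(code[e] for e in exponents)

# Lossless round trip: rebuild the exact 16-bit patterns.
decoded = decode(encoded, code)
rebuilt = [(s << 15) | (e << 7) | m for s, e, m in zip(signs, decoded, mantissas)]
assert rebuilt == bf16

n = len(exponents)
entropy = -sum((c / n) * math.log2(c / n) for c in freqs.values())
avg_len = len(encoded) / n
bits_per_weight = 1 + avg_len + 7  # sign + coded exponent + mantissa (code table not counted)
print(f"distinct exponents: {len(freqs)}")
print(f"exponent entropy: {entropy:.2f} bits, Huffman average: {avg_len:.2f} bits")
print(f"estimated size: {100 * bits_per_weight / 16:.1f}% of BFloat16")
```

The printed entropy depends on the synthetic data and is not a reproduction of the ~2.6-bit figure above; the size estimate also ignores the storage needed for the code table.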

The result is a model that is **~30% smaller**, delivers **bit-identical outputs**, and runs with only modest overhead relative to the original BFloat16 model (161 s vs. 140 s on an A100 in the table above).

Learn more in our [research paper](https://arxiv.org/abs/2504.11651).

### 📄 Learn More

* **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651)
* **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11)
* **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11)