Papers
arXiv:2510.25602

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

Published on Oct 29
· Submitted by Mengzhao Chen on Nov 3
#3 Paper of the day
Abstract

A comprehensive comparison of low-precision floating-point and integer quantization in large language models reveals that fine-grained integer formats, especially MXINT8, offer superior accuracy and efficiency over floating-point formats in many cases.
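To make the "fine-grained integer format" idea concrete, here is a minimal sketch (not the paper's code) of MX-style block-wise INT8 quantization: each contiguous block of 32 values shares one power-of-two scale and the values are stored as signed 8-bit integers. Function and variable names are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of MXINT8-style block quantization (assumed names, not the paper's code).
import numpy as np

BLOCK_SIZE = 32   # MX formats group 32 elements per shared scale
INT8_MAX = 127    # symmetric signed 8-bit range [-127, 127]

def mxint8_quantize(x: np.ndarray):
    """Quantize a 1-D tensor block-wise to INT8 with a power-of-two scale per block."""
    assert x.size % BLOCK_SIZE == 0, "pad the tensor to a multiple of the block size"
    blocks = x.reshape(-1, BLOCK_SIZE)
    # Shared scale per block: smallest power of two that maps the block's
    # absolute maximum into the INT8 range.
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    amax = np.where(amax == 0, 1.0, amax)          # avoid log2(0) for all-zero blocks
    exponents = np.ceil(np.log2(amax / INT8_MAX))
    scales = 2.0 ** exponents
    q = np.clip(np.round(blocks / scales), -INT8_MAX, INT8_MAX).astype(np.int8)
    return q, scales

def mxint8_dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct the float values from the INT8 codes and per-block scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

# Usage: quantize a toy activation vector and check the reconstruction error.
x = np.random.randn(4 * BLOCK_SIZE).astype(np.float32)
q, s = mxint8_quantize(x)
x_hat = mxint8_dequantize(q, s)
print("max abs error:", np.abs(x - x_hat).max())
```

The point of the fine granularity is that a single outlier only inflates the scale of its own 32-element block rather than the whole tensor, which is why INT8 can stay competitive with FP8 at this block size.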

AI-generated summary

Modern AI hardware, such as Nvidia's Blackwell architecture, is increasingly embracing low-precision floating-point (FP) formats to handle the pervasive activation outliers in Large Language Models (LLMs). Despite this industry trend, a unified comparison of FP and integer (INT) quantization across varying granularities has been missing, leaving algorithm and hardware co-design without clear guidance. This paper fills that gap by systematically investigating the trade-offs between FP and INT formats. We reveal a critical performance crossover: while FP excels in coarse-grained quantization, the comparison at fine-grained (block-wise) levels is more nuanced. Our comprehensive comparison demonstrates that for popular 8-bit fine-grained formats (e.g., MX with block size 32), MXINT8 is superior to its FP counterpart in both algorithmic accuracy and hardware efficiency. However, for 4-bit formats, FP (e.g., MXFP4, NVFP4) often holds an accuracy advantage, though we show that NVINT4 can surpass NVFP4 when outlier-mitigation techniques like Hadamard rotation are applied. We also introduce a symmetric clipping method that resolves gradient bias in fine-grained low-bit INT training, enabling nearly lossless performance for MXINT8 training. These findings challenge the current hardware trajectory, demonstrating that a one-size-fits-all FP approach is suboptimal and advocating that fine-grained INT formats, particularly MXINT8, offer a better balance of accuracy, power, and efficiency for future AI accelerators.
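The Hadamard-rotation result for 4-bit INT is easier to see with a small sketch. The idea (illustrative only, not the paper's implementation) is that rotating a vector with an orthonormal Hadamard matrix spreads a single large outlier across all coordinates, shrinking the per-block dynamic range the INT4 grid has to cover; the rotation is exactly invertible, so it can be undone after quantization. All names below are assumptions for the sketch.

```python
# Illustrative sketch: Hadamard rotation before symmetric block-wise INT4 quantization.
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)   # orthonormal, so the rotation is exactly invertible

def int4_block_quantize(x: np.ndarray, block: int = 16) -> np.ndarray:
    """Symmetric INT4 quantization with one float scale per block; returns dequantized values."""
    blocks = x.reshape(-1, block)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 7.0   # symmetric INT4 range [-7, 7]
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(blocks / scale), -7, 7)
    return (q * scale).reshape(-1)

n = 64
x = np.random.randn(n)
x[3] = 40.0                                       # inject one activation outlier

H = hadamard(n)
plain = int4_block_quantize(x)                    # quantize directly
rotated = H.T @ int4_block_quantize(H @ x)        # rotate, quantize, rotate back

print("mean abs error without rotation:", np.abs(x - plain).mean())
print("mean abs error with rotation:   ", np.abs(x - rotated).mean())
```

With the outlier present, the rotated version typically reconstructs with noticeably lower error, which is the intuition behind INT4 closing the gap to FP4 once such outlier mitigation is applied.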

Community

Paper submitter

Is Nvidia wrong? This paper shows that MXINT8 is better than MXFP8.


From the paper:
We train 1B and 3B Llama3-style [13] models on the OLMo2-Mix-1124 [33] pretraining dataset, with 100B and 200B training tokens, respectively.

Once it has been tested on larger models and datasets, it will be a game changer.

Thanks!


Has this approach been verified on larger-scale data and models?

