nielsr (HF Staff) committed
Commit 5e26e3a (verified) · Parent(s): 740fb86

Improve model card: add metadata and refine description


This PR improves the model card for LYNX by:

- Adding `pipeline_tag: text-generation` to reflect the model's primary function of controlling and optimizing text generation from LLMs.
- Adding `library_name: transformers` as the model is designed to work with Hugging Face causal LMs, providing compatibility with the `transformers` library for model loading and tokenization. This enables the automated "how to use" widget on the Hub.
- Replacing the entire paper abstract with a concise and informative description directly from the project's GitHub README, improving readability and adhering to best practices for model cards.
- Including the full BibTeX citation from the GitHub repository.

The existing `license: apache-2.0`, arXiv, and GitHub links are maintained. In keeping with the project's guidelines, no sample usage snippet has been added to the card itself, since the LYNX GitHub README contains no direct inference code.
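For context only, here is a minimal sketch of the kind of `transformers` usage the new `library_name` metadata points to: loading one of the base reasoning models named in the card. The checkpoint id is an assumption (a placeholder for whichever base model is used), not inference code from the LYNX repository.

```python
# Sketch of loading a base causal LM via transformers, as the new
# `library_name: transformers` metadata implies. The repo id below is a
# placeholder assumption; the LYNX card itself ships no inference code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
# output_hidden_states=True exposes the intermediate layers a probe would read.
model = AutoModelForCausalLM.from_pretrained(model_id, output_hidden_states=True)

inputs = tokenizer("Solve: 12 * 7 = ?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```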

Files changed (1): README.md (+27 −3)
README.md CHANGED
@@ -1,12 +1,36 @@
 ---
 license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
 ---
 
+# LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning
+
 <!-- Badges -->
 [![arXiv](https://img.shields.io/badge/arXiv-2512.05325-b31b1b.svg)](https://arxiv.org/abs/2512.05325)
 [![GitHub](https://img.shields.io/badge/GitHub-Code-black?logo=github)](https://github.com/farukakgul/LYNX)
 
-<div style="font-size: 15px; line-height: 1.55;">
+LYNX turns a reasoning model’s own hidden states into **confidence‑controlled early exits**. At naturally occurring cue tokens (e.g., `hmm`, `wait`, `alternatively`), LYNX:
+1. Extracts features from a few intermediate layers.
+2. Uses a lightweight probe to predict whether the final answer will be correct if we stop now.
+3. Wraps the probe with split conformal prediction to get a **user‑tunable confidence level** and explicit guarantees.
+
+This repository contains a minimal, self‑contained implementation of that pipeline for open‑weight LMs (e.g., DeepSeek‑R1‑1.5B, QwQ‑32B, and Llama‑3.1‑Nemotron‑8B), featuring an HF‑only pipeline for training, calibration, and evaluation.
+
+For more details, refer to the [paper](https://huggingface.co/papers/2512.05325) and the [GitHub repository](https://github.com/farukakgul/LYNX).
+
+## Citation
+
+If you find LYNX useful, please cite the accompanying paper:
 
-Large reasoning models achieve strong performance on complex tasks by generating extended chains of thought, but they often “overthink”: continuing to reason long after they internally have enough information to answer correctly. This wastes inference-time compute and can even hurt accuracy. Existing attempts to stop early either manipulate decoding with extra sampling and heuristics, rely on auxiliary verifier models, or operate only as post-hoc analysis pipelines without formal guarantees. We introduce LYNX, an online early-exit mechanism that turns a model’s own hidden-state awareness into confidence-controlled stopping decisions. LYNX attaches exit decisions to naturally occurring reasoning cues (e.g., “hmm”, “wait”) during generation, trains a lightweight probe on hidden states at those cue tokens using supervision from forced exits, and wraps the resulting scores in split conformal prediction to obtain distribution-free control over the rate of premature exits. Crucially, we train and calibrate this probe once on a generic mathematical corpus and then reuse it unchanged across benchmarks, decoding temperatures, and even non-mathematical tasks. Across three model families spanning 1.5B to 32B parameters (DeepSeek-R1-1.5B, QwQ-32B, and Llama-3.1-Nemotron-8B), a single mathematically trained probe per base model yields strong accuracy–efficiency tradeoffs. On GSM8K, LYNX matches or improves baseline accuracy while reducing tokens by 40–65%; on MATH-500 it improves accuracy by up to 12 points with roughly 35–60% fewer tokens; on AIME 2024 it recovers baseline accuracy with more than 50% token savings; and on CommonsenseQA, a non-math benchmark, it transfers zero-shot with modest accuracy gains and up to 70% fewer tokens. Compared to state-of-the-art early-exit methods, LYNX offers competitive or superior Pareto frontiers while remaining fully online, requiring no proxy models at inference, and providing explicit, user-tunable confidence guarantees. Code is available at https://github.com/farukakgul/LYNX.
-</div>
+```bibtex
+@misc{akgül2025lynxlearningdynamicexits,
+      title={LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning},
+      author={Ömer Faruk Akgül and Yusuf Hakan Kalaycı and Rajgopal Kannan and Willie Neiswanger and Viktor Prasanna},
+      year={2025},
+      eprint={2512.05325},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.05325},
+}
+```
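
To make the three-step description in the new card concrete, here is an illustrative sketch of a hidden-state exit probe gated by a split-conformal threshold. All class names, shapes, and the calibration recipe are assumptions chosen for exposition; they are not the LYNX repository's actual API.

```python
# Illustrative sketch, NOT the LYNX repository's code: a lightweight probe
# over hidden states at cue tokens, gated by a split-conformal threshold.
import numpy as np
import torch
import torch.nn as nn

class ExitProbe(nn.Module):
    """Small MLP scoring "the answer would already be correct if we stopped
    now" from features concatenated over a few intermediate layers at a cue
    token (e.g., "hmm", "wait"). Architecture is an assumption."""
    def __init__(self, hidden_size: int, n_layers_tapped: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size * n_layers_tapped, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(feats)).squeeze(-1)

def calibrate_exit_threshold(scores_if_wrong: np.ndarray, alpha: float = 0.1) -> float:
    """Split conformal calibration (one plausible recipe): scores_if_wrong
    are probe scores at held-out cue points where a forced exit produced a
    WRONG answer. Taking tau at the conformal (1 - alpha) quantile gives a
    distribution-free bound P(a premature exit clears tau) <= alpha, under
    exchangeability of calibration and test points."""
    n = len(scores_if_wrong)
    q = min(np.ceil((n + 1) * (1.0 - alpha)) / n, 1.0)
    return float(np.quantile(scores_if_wrong, q, method="higher"))

# At generation time, on each cue token: score the concatenated hidden
# states and stop reasoning once the probe's score clears tau, e.g.:
# exit_now = probe(cue_feats).item() > tau
```

Note the design point this sketch mirrors from the card: the threshold is calibrated once (on a generic math corpus, per the abstract) and then reused unchanged, so the only inference-time cost is a single probe forward pass at each cue token.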