Collections including paper arxiv:1611.01578

- DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
  Paper • 2504.07128 • Published • 86
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108
- BitNet b1.58 2B4T Technical Report
  Paper • 2504.12285 • Published • 75
- FAST: Efficient Action Tokenization for Vision-Language-Action Models
  Paper • 2501.09747 • Published • 27
- AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
  Paper • 2011.09011 • Published • 2
- HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
  Paper • 2005.14187 • Published • 2
- BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
  Paper • 2003.11142 • Published • 2
- Efficient Architecture Search by Network Transformation
  Paper • 1707.04873 • Published • 2