Article Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models Sep 29 • 21
Article Breaking Language Barriers in Mathematical AI: Introducing Hebrew Math Tutor Sep 7 • 2
Article Introducing HELMET: Holistically Evaluating Long-context Language Models Apr 16 • 40
Article Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques Mar 24 • 20
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models Paper • 2502.09390 • Published Feb 13 • 16
Speculative Decoding Draft Models Collection Collection of OpenVINO optimized efficient draft models for speculative decoding • 4 items • Updated Sep 16 • 10
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Paper • 2408.02545 • Published Aug 5, 2024 • 39
Article Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding Jan 30, 2024 • 9
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs Paper • 2306.16601 • Published Jun 28, 2023 • 4
Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots • Running on CPU Upgrade • 13.7k