
Ofir Zafrir

ofirzaf

AI & ML interests

Sparsity, Quantization, Model Compression

Recent Activity



upvoted an article about 2 months ago

Article

Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models

Sep 29

published an article about 2 months ago

Article

Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models

Sep 29

liked a model about 2 months ago

OpenVINO/Qwen3-pruned-6L-from-0.6B-int8-ov

Updated Sep 24 • 75 • 1

upvoted an article 3 months ago

Article

Breaking Language Barriers in Mathematical AI: Introducing Hebrew Math Tutor

Sep 7

liked 2 models 5 months ago

OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov

Updated Dec 16, 2024 • 78 • 5

OpenVINO/Phi-4-mini-FastDraft-120M-int8-ov

Updated May 8 • 14 • 2

upvoted an article 7 months ago

Article

Introducing HELMET: Holistically Evaluating Long-context Language Models

Apr 16

upvoted an article 8 months ago

Article

Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques

Mar 24

upvoted a paper 9 months ago

SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models

Paper • 2502.09390 • Published Feb 13 • 16

upvoted a collection 9 months ago

Speculative Decoding Draft Models

Collection

Collection of OpenVINO optimized efficient draft models for speculative decoding • 4 items • Updated Sep 16 • 10

liked a model 9 months ago

OpenVINO/Llama-3.1-8B-Instruct-FastDraft-150M-int8-ov

Updated Dec 16, 2024 • 1.77k • 9

upvoted an article 10 months ago

Article

A Chatbot on your Laptop: Phi-2 on Intel Meteor Lake

Mar 20, 2024

authored 2 papers about 1 year ago

Q8BERT: Quantized 8Bit BERT

Paper • 1910.06188 • Published Oct 14, 2019 • 2

FastDraft: How to Train Your Draft

Paper • 2411.11055 • Published Nov 17, 2024 • 11

upvoted a paper about 1 year ago

FastDraft: How to Train Your Draft

Paper • 2411.11055 • Published Nov 17, 2024 • 11

upvoted a paper over 1 year ago

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

Paper • 2408.02545 • Published Aug 5, 2024 • 39

published an article over 1 year ago

Article

A Chatbot on your Laptop: Phi-2 on Intel Meteor Lake

Mar 20, 2024

published an article almost 2 years ago

Article

Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding

Jan 30, 2024

authored a paper over 2 years ago

An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs

Paper • 2306.16601 • Published Jun 28, 2023 • 4

liked a Space over 2 years ago

Open LLM Leaderboard

🏆

13.7k

Track, rank and evaluate open LLMs and chatbots
