Zihan Wang
ZihanWang99
AI & ML interests
None yet
Organizations
MOE
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 56
- Mixtral of Experts
  Paper • 2401.04088 • Published • 159
- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 52
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 38
reading comprehension
long context LLM
- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
  Paper • 2401.06951 • Published • 26
- Extending LLMs' Context Window with 100 Samples
  Paper • 2401.07004 • Published • 16
- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
  Paper • 2401.03462 • Published • 27
- The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
  Paper • 2402.04347 • Published • 15
COT
Code Generation
LLM infer