Zihan Wang
ZihanWang99
AI & ML interests
None yet
Organizations
MOE
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 56
- Mixtral of Experts
  Paper • 2401.04088 • Published • 159
- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 52
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 38
reading comprehension
long context LLM
- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
  Paper • 2401.06951 • Published • 26
- Extending LLMs' Context Window with 100 Samples
  Paper • 2401.07004 • Published • 16
- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
  Paper • 2401.03462 • Published • 27
- The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
  Paper • 2402.04347 • Published • 15
COT
Code Generation
LLM infer