GEON ZH

GEONTT

AI & ML interests

None yet

Organizations

None yet

GEONTT's collections (7)

ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Paper • 2402.15220 • Published Feb 23, 2024 • 22
Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28, 2024 • 111
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection

Paper • 2403.19888 • Published Mar 29, 2024 • 12
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2, 2024 • 107

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Paper • 2402.15627 • Published Feb 23, 2024 • 38
Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 53
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46
Stealing Part of a Production Language Model

Paper • 2403.06634 • Published Mar 11, 2024 • 91

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27, 2024 • 21
Learning and Leveraging World Models in Visual Representation Learning

Paper • 2403.00504 • Published Mar 1, 2024 • 33
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3, 2024 • 29
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Paper • 2403.05438 • Published Mar 8, 2024 • 21

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

Paper • 2406.11896 • Published Jun 14, 2024 • 20

Seamless Human Motion Composition with Blended Positional Encodings

Paper • 2402.15509 • Published Feb 23, 2024 • 15
TripoSR: Fast 3D Object Reconstruction from a Single Image

Paper • 2403.02151 • Published Mar 4, 2024 • 16
3D-VLA: A 3D Vision-Language-Action Generative World Model

Paper • 2403.09631 • Published Mar 14, 2024 • 11
Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting

Paper • 2403.09981 • Published Mar 15, 2024 • 8

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27, 2024 • 195
MusicHiFi: Fast High-Fidelity Stereo Vocoding

Paper • 2403.10493 • Published Mar 15, 2024 • 19
Music Consistency Models

Paper • 2404.13358 • Published Apr 20, 2024 • 14
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Paper • 2406.02430 • Published Jun 4, 2024 • 38

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Paper • 2403.00483 • Published Mar 1, 2024 • 15
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Paper • 2403.01779 • Published Mar 4, 2024 • 30
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Paper • 2401.11605 • Published Jan 21, 2024 • 23
FiT: Flexible Vision Transformer for Diffusion Model

Paper • 2402.12376 • Published Feb 19, 2024 • 48
