ryanafufu's Collections
my_read_book
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper
•
2407.08083
•
Published
•
32
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper
•
2408.11039
•
Published
•
63
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Paper
•
2408.15237
•
Published
•
42
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper
•
2409.11355
•
Published
•
30
OmniGen: Unified Image Generation
Paper
•
2409.11340
•
Published
•
115
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Paper
•
2409.12183
•
Published
•
39
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Paper
•
2409.12568
•
Published
•
50
Imagine yourself: Tuning-Free Personalized Image Generation
Paper
•
2409.13346
•
Published
•
70
Training Language Models to Self-Correct via Reinforcement Learning
Paper
•
2409.12917
•
Published
•
140
MaskBit: Embedding-free Image Generation via Bit Tokens
Paper
•
2409.16211
•
Published
•
17
Emu3: Next-Token Prediction is All You Need
Paper
•
2409.18869
•
Published
•
95
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Paper
•
2412.09626
•
Published
•
21
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
•
2412.09871
•
Published
•
108
ColorFlow: Retrieval-Augmented Image Sequence Colorization
Paper
•
2412.11815
•
Published
•
26
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper
•
2412.18319
•
Published
•
39
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Paper
•
2501.06186
•
Published
•
65
Transformer^2: Self-adaptive LLMs
Paper
•
2501.06252
•
Published
•
54
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
•
2501.08313
•
Published
•
301
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Paper
•
2501.06751
•
Published
•
32
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
•
2501.12948
•
Published
•
425
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
Paper
•
2501.12202
•
Published
•
48
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
Paper
•
2502.00299
•
Published
•
2
Region-Adaptive Sampling for Diffusion Transformers
Paper
•
2502.10389
•
Published
•
53
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Paper
•
2502.18364
•
Published
•
37
Transformers without Normalization
Paper
•
2503.10622
•
Published
•
171
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models
Paper
•
2503.18886
•
Published
•
22
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Paper
•
2504.09454
•
Published
•
11
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Paper
•
2503.10772
•
Published
•
19
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Paper
•
2503.12271
•
Published
•
9
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Paper
•
2504.16080
•
Published
•
15
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation
Paper
•
2503.10618
•
Published
•
18
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
Paper
•
2504.20966
•
Published
•
32
Flow-GRPO: Training Flow Matching Models via Online RL
Paper
•
2505.05470
•
Published
•
86
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Paper
•
2505.04588
•
Published
•
65
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Paper
•
2505.04601
•
Published
•
29
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
•
2505.03335
•
Published
•
186
Align Your Flow: Scaling Continuous-Time Flow Map Distillation
Paper
•
2506.14603
•
Published
•
19
Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning
Paper
•
2506.02327
•
Published
•
20
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
Paper
•
2506.09985
•
Published
•
29
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development
Paper
•
2506.05010
•
Published
•
79
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing
Paper
•
2506.17450
•
Published
•
64
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Paper
•
2508.05004
•
Published
•
127
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
Paper
•
2508.02193
•
Published
•
130
Representation Shift: Unifying Token Compression with FlashAttention
Paper
•
2508.00367
•
Published
•
15
Qwen-Image Technical Report
Paper
•
2508.02324
•
Published
•
261
Task structure and nonlinearity jointly determine learned representational geometry
Paper
•
2401.13558
•
Published
DCPO: Dynamic Clipping Policy Optimization
Paper
•
2509.02333
•
Published
•
21
DoPE: Denoising Rotary Position Embedding
Paper
•
2511.09146
•
Published
•
87