Multimodal Pre-training
Exploring pre-training paradigms of large models across modalities towards Artificial General Intelligence (AGI).
Paper • 2405.16528 • Published • 3 • Note: Efficient Training | See TSAIL (https://ml.cs.tsinghua.edu.cn/) and GaLore (https://github.com/jiaweizzhao/GaLore) for more.
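As a companion to the GaLore pointer above, a toy NumPy sketch of the gradient low-rank projection idea: compress a layer's gradient into a rank-r subspace, update there, and project back. The function name, the plain-SGD update, and the refresh schedule are illustrative assumptions, not the galore-torch API.

```python
import numpy as np

def galore_like_step(W, grad, P, lr=1e-2, rank=4, step=0, gap=200):
    """One toy update: project the gradient into a rank-r subspace,
    apply the optimizer there, then project back (GaLore-style idea)."""
    # Refresh the projector from the gradient's top singular vectors
    # every `gap` steps (assumed schedule, for illustration only).
    if P is None or step % gap == 0:
        U, _, _ = np.linalg.svd(grad, full_matrices=False)
        P = U[:, :rank]                 # (m, r) orthonormal basis
    low_rank_grad = P.T @ grad          # compress: (r, n)
    update = -lr * low_rank_grad        # plain SGD in the subspace (toy choice)
    W += P @ update                     # project the update back to (m, n)
    return W, P

# Usage: a random weight matrix and gradient
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
grad = rng.normal(size=(64, 32))
W, P = galore_like_step(W, grad, P=None)
```

The optimizer state lives in the r-dimensional subspace, which is where the memory savings over full-rank Adam-style states come from.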
Scaling Vision Pre-Training to 4K Resolution
Paper • 2503.19903 • Published • 41 • Note: Scalability | Research on the scalability of large models typically involves several common paradigms. Scaling laws study how model performance improves as parameters, data, and compute increase, often by fitting empirical power-law relationships; this helps predict the returns of training larger models. Key papers: Kaplan et al., "Scaling Laws for Neural Language Models" (2020); Henighan et al., "Scaling Laws for Autoregressive Generative Modeling" (2020).
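The note above mentions fitting empirical power laws; a minimal SciPy sketch of such a fit, using made-up (model size, loss) points purely for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(N, a, alpha, c):
    # L(N) = c + a * N^(-alpha): loss falls as a power of parameter count
    return c + a * np.power(N, -alpha)

# Hypothetical (parameter count, validation loss) pairs, for illustration only
N = np.array([1e7, 1e8, 1e9, 1e10])
L = np.array([4.2, 3.4, 2.9, 2.6])

(a, alpha, c), _ = curve_fit(power_law, N, L, p0=[10.0, 0.1, 2.0], maxfev=10000)
print(f"fitted exponent alpha = {alpha:.3f}, irreducible loss c = {c:.2f}")

# Extrapolate to predict the return from a 10x larger model
print(f"predicted loss at 1e11 params: {power_law(1e11, a, alpha, c):.2f}")
```

The extrapolation step is exactly how such fits are used to predict the returns of training larger models before committing the compute.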
mohdmus99/slurm_commands
Viewer • Updated • 73 • 9 • Note: DeepOps | SLURM skills range from basic job submission to expert-level cluster management. Level 1 covers essential commands like sbatch, srun, and squeue, enabling users to run simple jobs. Level 2 focuses on efficient resource requests (--mem, --cpus-per-task), logging, and job history (sacct). Level 3 introduces advanced scheduling techniques such as job dependencies (--dependency), priority management, partitions, and job arrays. Level 4 involves expert skills such as diagnosing scheduler behavior.
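The flags named in the note (--mem, --cpus-per-task, --dependency) compose naturally from Python; a minimal sketch assuming sbatch and sacct are on PATH, with train.sh and evaluate.sh as placeholder scripts:

```python
import subprocess

def submit(script, *flags):
    """Submit a batch script with sbatch and return the job id.
    --parsable makes sbatch print only the id."""
    out = subprocess.run(
        ["sbatch", "--parsable", *flags, script],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip().split(";")[0]

# Level 2: explicit resource requests
train_id = submit("train.sh", "--mem=32G", "--cpus-per-task=8")

# Level 3: chain a job that runs only if training exits successfully
submit("evaluate.sh", f"--dependency=afterok:{train_id}")

# Level 2: inspect job history afterwards with sacct, e.g.
# subprocess.run(["sacct", "-j", train_id, "--format=JobID,State,Elapsed,MaxRSS"])
```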
lldacing/flash-attention-windows-wheel
Updated • 267 • Note: Speed-up attention | Other work includes xformers and SageAttention (https://github.com/thu-ml/SageAttention). NB: CUDA compiler required.
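For context on what these kernels buy you: PyTorch 2.x exposes a fused scaled-dot-product attention that dispatches to a FlashAttention-style kernel on supported CUDA GPUs, avoiding materializing the O(seq²) attention matrix. A minimal sketch comparing it against a naive reference (on CPU it falls back to the math backend, so this only verifies equivalence):

```python
import torch
import torch.nn.functional as F

batch, heads, seq, dim = 2, 8, 1024, 64
q, k, v = (torch.randn(batch, heads, seq, dim) for _ in range(3))

# Fused path: dispatched to a FlashAttention-style kernel on supported GPUs
fused = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Naive reference (materializes the full attention matrix; for checking only)
mask = torch.tril(torch.ones(seq, seq, dtype=torch.bool))
scores = (q @ k.transpose(-2, -1)) / dim**0.5
scores = scores.masked_fill(~mask, float("-inf"))
reference = scores.softmax(dim=-1) @ v

print(torch.allclose(fused, reference, atol=1e-4))  # True
```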
Embodied-CoT/ecot-openvla-7b-oxe
Robotics • 8B • Updated • 135 • 2 • Note: Minkowski-style sparse tensors, to speed up sparse voxels in space. | Recommended: pruning of Large Reconstruction Model (LRM)-style or NeRF-style work, TensoRF, and distilled fields of 3D Gaussians. A pretrained LRM could be leveraged for mesh generation. Some game-engine companies: Unity, Roblox (https://github.com/Roblox/cube/tree/main/cube3d). Recent impressive works and pre-trained backbones: Hunyuan3D, TRELLIS, InstantMesh, Zero123, ... Also check Dream-style works for more on generation.
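To make the sparse-voxel point concrete: a self-contained PyTorch sketch of the (coordinates, features) representation and the coordinate hash map that Minkowski-style sparse convolutions are built on. The helper names and the toy neighbor gather are illustrative assumptions, not the MinkowskiEngine API.

```python
import torch

# Keep only occupied voxels as (coordinates, features) instead of a dense grid
res, channels, n_occupied = 256, 16, 5000
coords = torch.randint(0, res, (n_occupied, 3))           # occupied voxels only
coords[1] = coords[0] + torch.tensor([1, 0, 0])           # plant one neighbor
feats = torch.randn(n_occupied, channels)

dense_bytes = res**3 * channels * 4                       # float32 dense grid
sparse_bytes = coords.numel() * 8 + feats.numel() * 4
print(f"dense: {dense_bytes/1e9:.1f} GB vs sparse: {sparse_bytes/1e6:.2f} MB")

# Coordinate hash map: the lookup structure sparse convolutions use to find
# which kernel offsets land on occupied voxels.
index = {tuple(c.tolist()): i for i, c in enumerate(coords)}

def gather_neighbors(voxel, offsets):
    """Collect features of occupied neighbors at the given offsets."""
    hits = [index.get(tuple((voxel + o).tolist())) for o in offsets]
    return feats[[h for h in hits if h is not None]]

offsets = torch.tensor([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]])
print(gather_neighbors(coords[0], offsets).shape)  # (num occupied neighbors, C)
```

The memory comparison is the core argument: at this occupancy, the dense grid costs ~1 GB while the sparse form costs well under 1 MB, and compute scales with occupied voxels rather than grid volume.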
BartenderXD/Hunyuan3D
Updated • 6
BartenderXD/MVRLT
Viewer • Updated • 128 • 5
Scaling Language-Free Visual Representation Learning
Paper • 2504.01017 • Published • 32
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 262
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
Paper • 2505.11594 • Published • 75
ranjaykrishna/visual_genome
Updated • 523 • 79
BLINK-Benchmark/BLINK
Viewer • Updated • 3.81k • 13.4k • 33
gorilla-llm/Berkeley-Function-Calling-Leaderboard
Preview • Updated • 3.61k • 89
LightSwitch: Multi-view Relighting with Material-guided Diffusion
Paper • 2508.06494 • Published • 3
nvidia/Cosmos-Reason1-7B
Image-Text-to-Text • 8B • Updated • 289k • 196
Frywind/REVPT-7B
Image-Text-to-Text • 8B • Updated • 30 • 1