Multimodal Pre-training
Exploring pre-training paradigms of large models across modalities towards Artificial General Intelligence (AGI).
Paper • 2405.16528 • Published • 3 • Note: Efficient Training | See TSAIL (https://ml.cs.tsinghua.edu.cn/) and GaLore (https://github.com/jiaweizzhao/GaLore) for more.
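As a companion to the GaLore pointer above, a toy NumPy sketch of the gradient low-rank projection idea: compress a layer's gradient into a rank-r subspace, update there, and project back. The function name, the plain-SGD update, and the refresh schedule are illustrative assumptions, not the galore-torch API.

```python
import numpy as np

def galore_like_step(W, grad, P, lr=1e-2, rank=4, step=0, gap=200):
    """One toy update: project the gradient into a rank-r subspace,
    apply the optimizer there, then project back (GaLore-style idea)."""
    # Refresh the projector from the gradient's top singular vectors
    # every `gap` steps (assumed schedule, for illustration only).
    if P is None or step % gap == 0:
        U, _, _ = np.linalg.svd(grad, full_matrices=False)
        P = U[:, :rank]                 # (m, r) orthonormal basis
    low_rank_grad = P.T @ grad          # compress: (r, n)
    update = -lr * low_rank_grad        # plain SGD in the subspace (toy choice)
    W += P @ update                     # project the update back to (m, n)
    return W, P

# Usage: a random weight matrix and gradient
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
grad = rng.normal(size=(64, 32))
W, P = galore_like_step(W, grad, P=None)
```

The optimizer state lives in the r-dimensional subspace, which is where the memory savings over full-rank Adam-style states come from.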
Scaling Vision Pre-Training to 4K Resolution
Paper • 2503.19903 • Published • 41 • Note: Scalability | Research on the scalability of large models typically involves several common paradigms. Scaling laws study how model performance improves as parameters, data, and compute increase, often by fitting empirical power-law relationships; this helps predict the returns of training larger models. Key papers: Kaplan et al., "Scaling Laws for Neural Language Models" (2020); Henighan et al., "Scaling Laws for Autoregressive Generative Modeling" (2020).
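The note above mentions fitting empirical power laws; a minimal SciPy sketch of such a fit, using made-up (model size, loss) points purely for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(N, a, alpha, c):
    # L(N) = c + a * N^(-alpha): loss falls as a power of parameter count
    return c + a * np.power(N, -alpha)

# Hypothetical (parameter count, validation loss) pairs, for illustration only
N = np.array([1e7, 1e8, 1e9, 1e10])
L = np.array([4.2, 3.4, 2.9, 2.6])

(a, alpha, c), _ = curve_fit(power_law, N, L, p0=[10.0, 0.1, 2.0], maxfev=10000)
print(f"fitted exponent alpha = {alpha:.3f}, irreducible loss c = {c:.2f}")

# Extrapolate to predict the return from a 10x larger model
print(f"predicted loss at 1e11 params: {power_law(1e11, a, alpha, c):.2f}")
```

The extrapolation step is exactly how such fits are used to predict the returns of training larger models before committing the compute.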
mohdmus99/slurm_commands
Viewer • Updated • 73 • 9 • Note: DeepOps | SLURM skills range from basic job submission to expert-level cluster management. Level 1 covers essential commands like sbatch, srun, and squeue, enabling users to run simple jobs. Level 2 focuses on efficient resource requests (--mem, --cpus-per-task), logging, and job history (sacct). Level 3 introduces advanced scheduling techniques such as job dependencies (--dependency), priority management, partitions, and job arrays. Level 4 involves expert skills such as diagnosing scheduler behavior.
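The flags named in the note (--mem, --cpus-per-task, --dependency) compose naturally from Python; a minimal sketch assuming sbatch and sacct are on PATH, with train.sh and evaluate.sh as placeholder scripts:

```python
import subprocess

def submit(script, *flags):
    """Submit a batch script with sbatch and return the job id.
    --parsable makes sbatch print only the id."""
    out = subprocess.run(
        ["sbatch", "--parsable", *flags, script],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip().split(";")[0]

# Level 2: explicit resource requests
train_id = submit("train.sh", "--mem=32G", "--cpus-per-task=8")

# Level 3: chain a job that runs only if training exits successfully
submit("evaluate.sh", f"--dependency=afterok:{train_id}")

# Level 2: inspect job history afterwards with sacct, e.g.
# subprocess.run(["sacct", "-j", train_id, "--format=JobID,State,Elapsed,MaxRSS"])
```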
lldacing/flash-attention-windows-wheel
Updated • 267 • Note: Speed-up attention | Other work includes xformers and SageAttention (https://github.com/thu-ml/SageAttention). NB: CUDA compiler required.
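For context on what these kernels buy you: PyTorch 2.x exposes a fused scaled-dot-product attention that dispatches to a FlashAttention-style kernel on supported CUDA GPUs, avoiding materializing the O(seq²) attention matrix. A minimal sketch comparing it against a naive reference (on CPU it falls back to the math backend, so this only verifies equivalence):

```python
import torch
import torch.nn.functional as F

batch, heads, seq, dim = 2, 8, 1024, 64
q, k, v = (torch.randn(batch, heads, seq, dim) for _ in range(3))

# Fused path: dispatched to a FlashAttention-style kernel on supported GPUs
fused = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Naive reference (materializes the full attention matrix; for checking only)
mask = torch.tril(torch.ones(seq, seq, dtype=torch.bool))
scores = (q @ k.transpose(-2, -1)) / dim**0.5
scores = scores.masked_fill(~mask, float("-inf"))
reference = scores.softmax(dim=-1) @ v

print(torch.allclose(fused, reference, atol=1e-4))  # True
```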
Embodied-CoT/ecot-openvla-7b-oxe
Robotics • 8B • Updated • 135 • 2 • Note: Minkowski-style sparse tensors, to speed up sparse voxels in space. | Recommended: pruning of Large Reconstruction Model (LRM)-style or NeRF-style work, TensoRF, and distilled fields of 3D Gaussians. A pretrained LRM could be leveraged for mesh generation. Some game-engine companies: Unity, Roblox (https://github.com/Roblox/cube/tree/main/cube3d). Recent impressive works and pre-trained backbones: Hunyuan3D, TRELLIS, InstantMesh, Zero123, ... Also check Dream-style works for more on generation.
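To make the sparse-voxel point concrete: a self-contained PyTorch sketch of the (coordinates, features) representation and the coordinate hash map that Minkowski-style sparse convolutions are built on. The helper names and the toy neighbor gather are illustrative assumptions, not the MinkowskiEngine API.

```python
import torch

# Keep only occupied voxels as (coordinates, features) instead of a dense grid
res, channels, n_occupied = 256, 16, 5000
coords = torch.randint(0, res, (n_occupied, 3))           # occupied voxels only
coords[1] = coords[0] + torch.tensor([1, 0, 0])           # plant one neighbor
feats = torch.randn(n_occupied, channels)

dense_bytes = res**3 * channels * 4                       # float32 dense grid
sparse_bytes = coords.numel() * 8 + feats.numel() * 4
print(f"dense: {dense_bytes/1e9:.1f} GB vs sparse: {sparse_bytes/1e6:.2f} MB")

# Coordinate hash map: the lookup structure sparse convolutions use to find
# which kernel offsets land on occupied voxels.
index = {tuple(c.tolist()): i for i, c in enumerate(coords)}

def gather_neighbors(voxel, offsets):
    """Collect features of occupied neighbors at the given offsets."""
    hits = [index.get(tuple((voxel + o).tolist())) for o in offsets]
    return feats[[h for h in hits if h is not None]]

offsets = torch.tensor([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]])
print(gather_neighbors(coords[0], offsets).shape)  # (num occupied neighbors, C)
```

The memory comparison is the core argument: at this occupancy, the dense grid costs ~1 GB while the sparse form costs well under 1 MB, and compute scales with occupied voxels rather than grid volume.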
BartenderXD/Hunyuan3D
Updated • 6
BartenderXD/MVRLT
Viewer • Updated • 128 • 5
Scaling Language-Free Visual Representation Learning
Paper • 2504.01017 • Published • 32
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 262
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
Paper • 2505.11594 • Published • 75
ranjaykrishna/visual_genome
Updated • 523 • 79
BLINK-Benchmark/BLINK
Viewer • Updated • 3.81k • 13.4k • 33
gorilla-llm/Berkeley-Function-Calling-Leaderboard
Preview • Updated • 3.61k • 89
LightSwitch: Multi-view Relighting with Material-guided Diffusion
Paper • 2508.06494 • Published • 3
nvidia/Cosmos-Reason1-7B
Image-Text-to-Text • 8B • Updated • 289k • 196
Frywind/REVPT-7B
Image-Text-to-Text • 8B • Updated • 30 • 1