LAION eV

non-profit

https://laion.ai

laion_ai

LAION-AI

AI & ML interests

datasets, computer vision

Recent Activity


mehdidc

updated a model 7 minutes ago

laion/scaling-laws-for-comparison

Updated 7 minutes ago • 1

ChristophSchuhmann

updated a dataset 32 minutes ago

laion/freesound-commercially-permissive-subset-with-captions

Viewer • Updated 32 minutes ago • 397k • 33 • 2

TianyuZhang

authored 10 papers 9 days ago

ClimateGAN: Raising Climate Change Awareness by Generating Images of Floods

Paper • 2110.02871 • Published Oct 6, 2021

MuPT: A Generative Symbolic Music Pretrained Transformer

Paper • 2404.06393 • Published Apr 9, 2024 • 16

Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

Paper • 2211.06687 • Published Nov 12, 2022 • 4

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Paper • 2412.04626 • Published Dec 5, 2024 • 14

STRICT: Stress Test of Rendering Images Containing Text

Paper • 2505.18985 • Published May 25

A Single Merging Suffices: Recovering Server-based Learning Performance in Decentralized Learning

Paper • 2507.06542 • Published Jul 9

MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

Paper • 2406.07529 • Published Jun 11, 2024

Improving GUI Grounding with Explicit Position-to-Coordinate Mapping

Paper • 2510.03230 • Published Oct 3 • 3

Chronological Thinking in Full-Duplex Spoken Dialogue Language Models

Paper • 2510.05150 • Published Oct 2

Scope: Selective Cross-modal Orchestration of Visual Perception Experts

Paper • 2510.12974 • Published 25 days ago

W4ng1204

authored a paper 9 days ago

InteractComp: Evaluating Search Agents With Ambiguous Queries

Paper • 2510.24668 • Published 11 days ago • 96

TianyuZhang

authored a paper 9 days ago

Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published 10 days ago • 202

mrfakename

posted an update 12 days ago


Trained an emotion-controllable TTS model, based on MiMo audio, on LAION's dataset.

Still very early in the training run, and it does have an issue with hallucinating, but results seem pretty good so far.

Will probably kick off a new run later with some settings tweaked.

Put up a demo here: mrfakename/EmoAct-MiMo

(Turn 🔊 on to hear audio samples)


sheryc

authored a paper 19 days ago

Scope: Selective Cross-modal Orchestration of Visual Perception Experts

Paper • 2510.12974 • Published 25 days ago

W4ng1204

authored a paper 23 days ago

VeritasFi: An Adaptable, Multi-tiered RAG Framework for Multi-modal Financial Question Answering

Paper • 2510.10828 • Published 27 days ago • 1

sheryc

authored a paper 25 days ago

VeritasFi: An Adaptable, Multi-tiered RAG Framework for Multi-modal Financial Question Answering

Paper • 2510.10828 • Published 27 days ago • 1

sheryc

authored a paper about 1 month ago

Improving GUI Grounding with Explicit Position-to-Coordinate Mapping

Paper • 2510.03230 • Published Oct 3 • 3

huu-ontocord

authored a paper about 1 month ago

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

Paper • 2509.25531 • Published Sep 29 • 7