Yamata Zen
yamatazen
AI & ML interests
None yet
Recent Activity
- upvoted an article 1 day ago: We're open-sourcing our text-to-image model and the process behind it
- upvoted a paper 4 days ago: UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
Organizations
None yet
Autoregressive image generation
AGI
- Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
  Paper • 2507.00951 • Published • 24
- On Path to Multimodal Generalist: General-Level and General-Bench
  Paper • 2505.04620 • Published • 82
- What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
  Paper • 2507.06952 • Published • 7
Multilingual LLMs
- Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability
  Paper • 2306.06688 • Published
- Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs
  Paper • 2412.14471 • Published
- Language Models' Factuality Depends on the Language of Inquiry
  Paper • 2502.17955 • Published • 33
- Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
  Paper • 2410.01335 • Published • 5
AI censorship
- GuardReasoner: Towards Reasoning-based LLM Safeguards
  Paper • 2501.18492 • Published • 88
- Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging
  Paper • 2412.19512 • Published • 9
- Course-Correction: Safety Alignment Using Synthetic Preferences
  Paper • 2407.16637 • Published • 26
- Refusal in Language Models Is Mediated by a Single Direction
  Paper • 2406.11717 • Published • 4
Grokking
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
  Paper • 2405.15071 • Published • 41
- Grokking at the Edge of Numerical Stability
  Paper • 2501.04697 • Published • 2
- Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
  Paper • 2506.21551 • Published • 28
Optimizers
GGUF tools
Model merging
- Arcee's MergeKit: A Toolkit for Merging Large Language Models
  Paper • 2403.13257 • Published • 21
- Model Stock: All we need is just a few fine-tuned models
  Paper • 2403.19522 • Published • 13
- Mergenetic: a Simple Evolutionary Model Merging Library
  Paper • 2505.11427 • Published • 14
- Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
  Paper • 2410.01335 • Published • 5
Japanese LLMs
- mradermacher/Himeyuri-v0.1-12B-i1-GGUF
  12B • Updated • 236 • 2
- spow12/ChatWaifu_12B_v2.0
  Text Generation • 12B • Updated • 26 • 22
- Local-Novel-LLM-project/Vecteus-v1
  Text Generation • 7B • Updated • 104 • 30
- Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs
  Paper • 2412.14471 • Published
LLM leaderboards
- UGI Leaderboard
  Running • 1.26k • Uncensored General Intelligence Leaderboard
- Open Japanese LLM Leaderboard
  Running on CPU Upgrade • 101 • Explore and compare LLM models with interactive filters and visualizations
- Open LLM Leaderboard
  Running on CPU Upgrade • 13.7k • Track, rank and evaluate open LLMs and chatbots
- LMArena Leaderboard
  Running • 4.66k • Display LMArena Leaderboard
Genshin Impact