With the arrival of Twinkle April, the open-source celebration Twinkle AI holds every April, our community is excited to unveil its very first project: Twinkle Eval.
Unlike traditional evaluation tools such as iKala's ievals (https://github.com/ikala-ai/ievals), which can only evaluate language models (LMs) one sample at a time, Twinkle Eval is designed with Large Reasoning Models (LRMs) in mind. As reasoning time grows with more complex models, sequential tools become increasingly inefficient: evaluating an LRM on the ikala/tmmluplus benchmark, for example, could run for half a day and still not finish.
One question we were especially curious about: does shuffling the order of multiple-choice answers affect model accuracy? 🤔 See "Changing Answer Order Can Decrease MMLU Accuracy" (arXiv:2406.19470v1).
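For intuition, shuffling here means permuting the A/B/C/D options and remapping the gold label, so only the presentation order changes while the correct answer text stays the same. The snippet below is a generic illustration of that operation, not Twinkle Eval's or the paper's actual code:

```python
import random
import string

def shuffle_choices(question, choices, answer_idx, seed=None):
    """Permute the options and return (prompt, new index of the gold answer)."""
    rng = random.Random(seed)
    order = list(range(len(choices)))
    rng.shuffle(order)  # order[pos] = original index shown at position pos
    lines = [question] + [
        f"{string.ascii_uppercase[pos]}. {choices[orig]}"
        for pos, orig in enumerate(order)
    ]
    return "\n".join(lines), order.index(answer_idx)

# The gold answer moves with its option text, so a robust model's pick should not change.
prompt, gold = shuffle_choices("2 + 2 = ?", ["3", "4", "5", "6"], answer_idx=1, seed=0)
```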
To address these challenges, Twinkle Eval brings three key innovations to the table:
1️⃣ Parallelized evaluation of samples
2️⃣ Multi-round testing for stability
3️⃣ Randomized answer order to test robustness
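Conceptually, the three ideas combine into something like the sketch below: a thread pool fans requests out in parallel, each sample gets its own shuffle seed per round, and accuracy is averaged across rounds. This is a simplified, hypothetical sketch rather than Twinkle Eval's actual implementation; `ask` stands in for whatever model API call you use, and the `{"question", "choices", "answer_idx"}` sample format is assumed.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

LABELS = "ABCD"

def evaluate_sample(ask: Callable[[str], str], sample: dict, seed: int) -> bool:
    """Shuffle the options, query the model once, and score against the remapped gold label."""
    rng = random.Random(seed)                        # per-sample seed keeps runs reproducible
    order = list(range(len(sample["choices"])))
    rng.shuffle(order)                               # 3) randomized answer order
    prompt = sample["question"] + "\n" + "\n".join(
        f"{LABELS[pos]}. {sample['choices'][orig]}" for pos, orig in enumerate(order)
    )
    prediction = ask(prompt).strip().upper()[:1]
    return prediction == LABELS[order.index(sample["answer_idx"])]

def run_eval(ask: Callable[[str], str], dataset: list[dict],
             rounds: int = 3, workers: int = 16) -> float:
    """Average accuracy over several rounds, evaluating samples concurrently within each round."""
    round_scores = []
    for r in range(rounds):                                      # 2) multi-round testing
        with ThreadPoolExecutor(max_workers=workers) as pool:    # 1) parallel evaluation
            futures = [pool.submit(evaluate_sample, ask, s, r * len(dataset) + i)
                       for i, s in enumerate(dataset)]
            results = [f.result() for f in futures]
        round_scores.append(sum(results) / len(results))
    return statistics.mean(round_scores)
```

Because each sample is an I/O-bound API call, even a plain thread pool (or an async client) recovers large wall-clock speedups without touching the model itself.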
After running experiments, we observed that Twinkle Eval can speed up evaluation by up to 15×. Interestingly, most models scored slightly lower under the 2️⃣ and 3️⃣ settings (multi-round, randomized answer order) than their claimed performance, suggesting further benchmarking is needed.
This framework also comes with additional tunable parameters and detailed per-question logging of LM behavior, perfect for those who want to dive deeper.
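As a purely hypothetical illustration (the fields and format are assumptions, not Twinkle Eval's actual schema), writing one JSON line per sample per round makes it easy to audit exactly where a model gains or loses accuracy:

```python
import json

# Hypothetical per-question record, for illustration only.
record = {
    "round": 1,
    "sample_id": "sample-0017",
    "shuffled_order": [2, 0, 3, 1],   # permutation applied to the original choices
    "gold": "C",
    "prediction": "B",
    "correct": False,
    "latency_s": 4.82,
}
with open("eval_log.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```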
If you find Twinkle Eval useful, please ⭐ the project and help spread the word!