Attention Is All You Need for KV Cache in Diffusion LLMs • arXiv:2510.14973 • Published Oct 2025
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention • arXiv:2509.24006 • Published Sep 28, 2025
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning • arXiv:2509.25760 • Published Sep 30, 2025
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain • arXiv:2509.26507 • Published Sep 30, 2025
Set Block Decoding is a Language Model Inference Accelerator • arXiv:2509.04185 • Published Sep 4, 2025
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax • arXiv:2504.20966 • Published Apr 29, 2025
Tokenization Falling Short: The Curse of Tokenization • arXiv:2406.11687 • Published Jun 17, 2024
TroL: Traversal of Layers for Large Language and Vision Models • arXiv:2406.12246 • Published Jun 18, 2024
PowerInfer-2: Fast Large Language Model Inference on a Smartphone • arXiv:2406.06282 • Published Jun 10, 2024
The Prompt Report: A Systematic Survey of Prompting Techniques • arXiv:2406.06608 • Published Jun 6, 2024
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities • arXiv:2405.18669 • Published May 29, 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models • arXiv:2405.09818 • Published May 16, 2024
Better & Faster Large Language Models via Multi-token Prediction • arXiv:2404.19737 • Published Apr 30, 2024
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing • arXiv:2404.12253 • Published Apr 18, 2024