speechlessai's Collections Reading Papers
Self-Rewarding Language Models
Paper
• 2401.10020
• Published
• 152
ReFT: Reasoning with Reinforced Fine-Tuning
Paper
• 2401.08967
• Published
• 31
Tuning Language Models by Proxy
Paper
• 2401.08565
• Published
• 22
TrustLLM: Trustworthiness in Large Language Models
Paper
• 2401.05561
• Published
• 69
Paper
• 2401.04088
• Published
• 160
MoE-Mamba: Efficient Selective State Space Models with Mixture of
Experts
Paper
• 2401.04081
• Published
• 74
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper
• 2401.02954
• Published
• 53
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper
• 2401.02038
• Published
• 65
LLM Augmented LLMs: Expanding Capabilities through Composition
Paper
• 2401.02412
• Published
• 38
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
Paper
• 2401.02330
• Published
• 18
DocLLM: A layout-aware generative language model for multimodal document
understanding
Paper
• 2401.00908
• Published
• 189
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language
Models
Paper
• 2401.01335
• Published
• 68
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper
• 2401.01325
• Published
• 27
A Comprehensive Study of Knowledge Editing for Large Language Models
Paper
• 2401.01286
• Published
• 21
Improving Text Embeddings with Large Language Models
Paper
• 2401.00368
• Published
• 82
LARP: Language-Agent Role Play for Open-World Games
Paper
• 2312.17653
• Published
• 33
Extending LLMs' Context Window with 100 Samples
Paper
• 2401.07004
• Published
• 16
DeepSeekMoE: Towards Ultimate Expert Specialization in
Mixture-of-Experts Language Models
Paper
• 2401.06066
• Published
• 59
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper
• 2312.16862
• Published
• 31
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale
Pretraining Corpus for Math
Paper
• 2312.17120
• Published
• 28
MobileVLM : A Fast, Reproducible and Strong Vision Language Assistant
for Mobile Devices
Paper
• 2312.16886
• Published
• 22
Human101: Training 100+FPS Human Gaussians in 100s from 1 View
Paper
• 2312.15258
• Published
• 10
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with
Refined Data Generation
Paper
• 2312.14187
• Published
• 49
Reasons to Reject? Aligning Language Models with Judgments
Paper
• 2312.14591
• Published
• 18
AppAgent: Multimodal Agents as Smartphone Users
Paper
• 2312.13771
• Published
• 54
Time is Encoded in the Weights of Finetuned Language Models
Paper
• 2312.13401
• Published
• 20
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
Paper
• 2312.13789
• Published
• 15
TinyGSM: achieving >80% on GSM8k with small language models
Paper
• 2312.09241
• Published
• 39
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper
• 2312.07987
• Published
• 41
PromptBench: A Unified Library for Evaluation of Large Language Models
Paper
• 2312.07910
• Published
• 16
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Paper
• 2312.06674
• Published
• 8
LLM360: Towards Fully Transparent Open-Source LLMs
Paper
• 2312.06550
• Published
• 57
Beyond Human Data: Scaling Self-Training for Problem-Solving with
Language Models
Paper
• 2312.06585
• Published
• 29
Context Tuning for Retrieval Augmented Generation
Paper
• 2312.05708
• Published
• 16
Evaluation of Large Language Models for Decision Making in Autonomous
Driving
Paper
• 2312.06351
• Published
• 6
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Paper
• 2312.03818
• Published
• 34
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Paper
• 2312.04461
• Published
• 62
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Paper
• 2312.04474
• Published
• 34
Pearl: A Production-ready Reinforcement Learning Agent
Paper
• 2312.03814
• Published
• 15
OneLLM: One Framework to Align All Modalities with Language
Paper
• 2312.03700
• Published
• 24
LivePhoto: Real Image Animation with Text-guided Motion Control
Paper
• 2312.02928
• Published
• 18
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on
Open-Source Large Language Models
Paper
• 2312.02969
• Published
• 14
Training Chain-of-Thought via Latent-Variable Inference
Paper
• 2312.02179
• Published
• 10
Magicoder: Source Code Is All You Need
Paper
• 2312.02120
• Published
• 82
Segment and Caption Anything
Paper
• 2312.00869
• Published
• 20
Paper
• 2312.00860
• Published
• 10
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper
• 2312.00752
• Published
• 150
Dolphins: Multimodal Language Model for Driving
Paper
• 2312.00438
• Published
• 15
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Paper
• 2311.13600
• Published
• 47
Using Human Feedback to Fine-tune Diffusion Models without Any Reward
Model
Paper
• 2311.13231
• Published
• 28
Exponentially Faster Language Modelling
Paper
• 2311.10770
• Published
• 119
Orca 2: Teaching Small Language Models How to Reason
Paper
• 2311.11045
• Published
• 77
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language
Model-based Agents in Real-world Systems
Paper
• 2311.11315
• Published
• 7
ToolTalk: Evaluating Tool-Usage in a Conversational Setting
Paper
• 2311.10775
• Published
• 9
ProAgent: From Robotic Process Automation to Agentic Process Automation
Paper
• 2311.10751
• Published
• 10
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
Paper
• 2311.10702
• Published
• 19
SelfEval: Leveraging the discriminative nature of generative models for
evaluation
Paper
• 2311.10708
• Published
• 17
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper
• 2311.10093
• Published
• 58
ML-Bench: Large Language Models Leverage Open-source Libraries for
Machine Learning Tasks
Paper
• 2311.09835
• Published
• 11
Routing to the Expert: Efficient Reward-guided Ensemble of Large
Language Models
Paper
• 2311.08692
• Published
• 13
Unifying the Perspectives of NLP and Software Engineering: A Survey on
Language Models for Code
Paper
• 2311.07989
• Published
• 26
Instruction-Following Evaluation for Large Language Models
Paper
• 2311.07911
• Published
• 22
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads
to Answers Faster
Paper
• 2311.08263
• Published
• 16
The ART of LLM Refinement: Ask, Refine, and Trust
Paper
• 2311.07961
• Published
• 11
ChatAnything: Facetime Chat with LLM-Enhanced Personas
Paper
• 2311.06772
• Published
• 35
LayoutPrompter: Awaken the Design Ability of Large Language Models
Paper
• 2311.06495
• Published
• 12
Lumos: Learning Agents with Unified Data, Modular Design, and
Open-Source LLMs
Paper
• 2311.05657
• Published
• 30
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Paper
• 2311.05556
• Published
• 87
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper
• 2311.05437
• Published
• 51
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Paper
• 2311.04934
• Published
• 32
Can LLMs Follow Simple Rules?
Paper
• 2311.04235
• Published
• 13
Levels of AGI for Operationalizing Progress on the Path to AGI
Paper
• 2311.02462
• Published
• 37
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper
• 2311.03285
• Published
• 31
CogVLM: Visual Expert for Pretrained Language Models
Paper
• 2311.03079
• Published
• 27
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Paper
• 2311.02103
• Published
• 20
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
Paper
• 2311.02303
• Published
• 12
CoVLM: Composing Visual Entities and Relationships in Large Language
Models Via Communicative Decoding
Paper
• 2311.03354
• Published
• 7
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation
Paper
• 2311.00272
• Published
• 11
ChipNeMo: Domain-Adapted LLMs for Chip Design
Paper
• 2311.00176
• Published
• 9
Learning From Mistakes Makes LLM Better Reasoner
Paper
• 2310.20689
• Published
• 29
Does GPT-4 Pass the Turing Test?
Paper
• 2310.20216
• Published
• 17
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Paper
• 2310.20624
• Published
• 13
Paper
• 2310.20707
• Published
• 11
LoRAShear: Efficient Large Language Model Structured Pruning and
Knowledge Recovery
Paper
• 2310.18356
• Published
• 24
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language
Modeling Likewise
Paper
• 2310.19019
• Published
• 9
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper
• 2310.17631
• Published
• 35
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper
• 2310.16795
• Published
• 27
InstructExcel: A Benchmark for Natural Language Instruction in Excel
Paper
• 2310.14495
• Published
• 2
Auto-Instruct: Automatic Instruction Generation and Ranking for
Black-Box Language Models
Paper
• 2310.13127
• Published
• 12
ToolChain*: Efficient Action Space Navigation in Large Language Models
with A* Search
Paper
• 2310.13227
• Published
• 15
Tuna: Instruction Tuning using Feedback from Large Language Models
Paper
• 2310.13385
• Published
• 10
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Paper
• 2310.12823
• Published
• 36
Self-RAG: Learning to Retrieve, Generate, and Critique through
Self-Reflection
Paper
• 2310.11511
• Published
• 78
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper
• 2310.09263
• Published
• 40
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Paper
• 2310.08659
• Published
• 27
Prometheus: Inducing Fine-grained Evaluation Capability in Language
Models
Paper
• 2310.08491
• Published
• 57
Lemur: Harmonizing Natural Language and Code for Language Agents
Paper
• 2310.06830
• Published
• 33
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical
Reasoning
Paper
• 2310.03731
• Published
• 29
DSPy: Compiling Declarative Language Model Calls into Self-Improving
Pipelines
Paper
• 2310.03714
• Published
• 37
SmartPlay : A Benchmark for LLMs as Intelligent Agents
Paper
• 2310.01557
• Published
• 13
VMamba: Visual State Space Model
Paper
• 2401.10166
• Published
• 40
Medusa: Simple LLM Inference Acceleration Framework with Multiple
Decoding Heads
Paper
• 2401.10774
• Published
• 59
ActAnywhere: Subject-Aware Video Background Generation
Paper
• 2401.10822
• Published
• 13
Rambler: Supporting Writing With Speech via LLM-Assisted Gist
Manipulation
Paper
• 2401.10838
• Published
• 9
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and
Generating with Multimodal LLMs
Paper
• 2401.11708
• Published
• 30
Large Language Models are Superpositions of All Characters: Attaining
Arbitrary Role-play via Self-Alignment
Paper
• 2401.12474
• Published
• 36
Orion-14B: Open-source Multilingual Large Language Models
Paper
• 2401.12246
• Published
• 14
Small Language Model Meets with Reinforced Vision Vocabulary
Paper
• 2401.12503
• Published
• 32
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language
Models
Paper
• 2401.12522
• Published
• 12
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper
• 2401.13601
• Published
• 48
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent
Diffusion Models for Virtual Try-All
Paper
• 2401.13795
• Published
• 68
DeepSeek-Coder: When the Large Language Model Meets Programming -- The
Rise of Code Intelligence
Paper
• 2401.14196
• Published
• 71
Genie: Achieving Human Parity in Content-Grounded Datasets Generation
Paper
• 2401.14367
• Published
• 8
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language
Modeling
Paper
• 2401.16380
• Published
• 51
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper
• 2401.15947
• Published
• 53
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual
Perception
Paper
• 2401.16158
• Published
• 20
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement
Learning
Paper
• 2401.16013
• Published
• 26
SymbolicAI: A framework for logic-based approaches combining generative
models and solvers
Paper
• 2402.00854
• Published
• 22
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper
• 2402.07456
• Published
• 46
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts
Models
Paper
• 2402.07033
• Published
• 19
ChemLLM: A Chemical Large Language Model
Paper
• 2402.06852
• Published
• 30
LiRank: Industrial Large Scale Ranking Models at LinkedIn
Paper
• 2402.06859
• Published
• 12
AutoMathText: Autonomous Data Selection with Language Models for
Mathematical Texts
Paper
• 2402.07625
• Published
• 16
Chain-of-Thought Reasoning Without Prompting
Paper
• 2402.10200
• Published
• 109
Generative Representational Instruction Tuning
Paper
• 2402.09906
• Published
• 54
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Paper
• 2402.09727
• Published
• 38
How to Train Data-Efficient LLMs
Paper
• 2402.09668
• Published
• 43
BitDelta: Your Fine-Tune May Only Be Worth One Bit
Paper
• 2402.10193
• Published
• 21
DreamMatcher: Appearance Matching Self-Attention for
Semantically-Consistent Text-to-Image Personalization
Paper
• 2402.09812
• Published
• 16
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Paper
• 2402.10176
• Published
• 38
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM
Workflows
Paper
• 2402.10379
• Published
• 31
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large
Language Models
Paper
• 2402.10524
• Published
• 23
Large Language Models as Zero-shot Dialogue State Tracker through
Function Calling
Paper
• 2402.10466
• Published
• 18
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video
Editing
Paper
• 2402.10294
• Published
• 27
OpenCodeInterpreter: Integrating Code Generation with Execution and
Refinement
Paper
• 2402.14658
• Published
• 84
Beyond A*: Better Planning with Transformers via Search Dynamics
Bootstrapping
Paper
• 2402.14083
• Published
• 47
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper
• 2402.14289
• Published
• 20
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
Paper
• 2402.14261
• Published
• 10
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper
• 2402.13753
• Published
• 116
Aria Everyday Activities Dataset
Paper
• 2402.13349
• Published
• 31
Coercing LLMs to do and reveal (almost) anything
Paper
• 2402.14020
• Published
• 13
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
Paper
• 2402.13720
• Published
• 7
Dolma: an Open Corpus of Three Trillion Tokens for Language Model
Pretraining Research
Paper
• 2402.00159
• Published
• 65
Specialized Language Models with Cheap Inference from Limited Domain
Data
Paper
• 2402.01093
• Published
• 47
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
Paper
• 2402.01622
• Published
• 38
Nomic Embed: Training a Reproducible Long Context Text Embedder
Paper
• 2402.01613
• Published
• 15
Training-Free Consistent Text-to-Image Generation
Paper
• 2402.03286
• Published
• 67
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open
Language Models
Paper
• 2402.03300
• Published
• 141
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper
• 2402.01739
• Published
• 28
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image
Editing
Paper
• 2402.02583
• Published
• 8
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper
• 2402.03620
• Published
• 117
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning
Tasks
Paper
• 2402.04248
• Published
• 32
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper
• 2402.03766
• Published
• 15
Vision Superalignment: Weak-to-Strong Generalization for Vision
Foundation Models
Paper
• 2402.03749
• Published
• 15
Multi-line AI-assisted Code Authoring
Paper
• 2402.04141
• Published
• 10
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Paper
• 2402.04291
• Published
• 50
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper
• 2402.04615
• Published
• 44
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
Paper
• 2402.04379
• Published
• 8
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
Paper
• 2402.04858
• Published
• 15
Grandmaster-Level Chess Without Search
Paper
• 2402.04494
• Published
• 69
More Agents Is All You Need
Paper
• 2402.05120
• Published
• 57
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
Paper
• 2402.05140
• Published
• 23
An Interactive Agent Foundation Model
Paper
• 2402.05929
• Published
• 30
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Paper
• 2402.05930
• Published
• 39
Training Generative Question-Answering on Synthetic Data Obtained from
an Instruct-tuned Model
Paper
• 2310.08072
• Published
• 1
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for
Language Models
Paper
• 2402.13064
• Published
• 50
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language
Models
Paper
• 2402.10986
• Published
• 81
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Paper
• 2402.11131
• Published
• 42
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper
• 2402.12226
• Published
• 45
Paper
• 2402.12219
• Published
• 17
Rethinking Data Selection for Supervised Fine-Tuning
Paper
• 2402.06094
• Published
• 1
What Makes Good Data for Alignment? A Comprehensive Study of Automatic
Data Selection in Instruction Tuning
Paper
• 2312.15685
• Published
• 16
SelectLLM: Can LLMs Select Important Instructions to Annotate?
Paper
• 2401.16553
• Published
• 3
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for
Instruction Fine-Tuning
Paper
• 2402.04833
• Published
• 5
A Systematic Survey of Prompt Engineering in Large Language Models:
Techniques and Applications
Paper
• 2402.07927
• Published
• 2
Simple linear attention language models balance the recall-throughput
tradeoff
Paper
• 2402.18668
• Published
• 20
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized
Diffusion Model
Paper
• 2402.17412
• Published
• 23
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist
Autonomous Agents for Desktop and Web
Paper
• 2402.17553
• Published
• 25
Training-Free Long-Context Scaling of Large Language Models
Paper
• 2402.17463
• Published
• 24
FuseChat: Knowledge Fusion of Chat Models
Paper
• 2402.16107
• Published
• 39
StructLM: Towards Building Generalist Models for Structured Knowledge
Grounding
Paper
• 2402.16671
• Published
• 27
Seamless Human Motion Composition with Blended Positional Encodings
Paper
• 2402.15509
• Published
• 14
Design2Code: How Far Are We From Automating Front-End Engineering?
Paper
• 2403.03163
• Published
• 98
Finetuned Multimodal Language Models Are High-Quality Image-Text Data
Filters
Paper
• 2403.02677
• Published
• 18
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal
Datasets
Paper
• 2403.03194
• Published
• 15
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Paper
• 2403.07508
• Published
• 77
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Paper
• 2403.07816
• Published
• 44
Simple and Scalable Strategies to Continually Pre-train Large Language
Models
Paper
• 2403.08763
• Published
• 51
Adapting Large Language Models via Reading Comprehension
Paper
• 2309.09530
• Published
• 82
Recurrent Drafter for Fast Speculative Decoding in Large Language Models
Paper
• 2403.09919
• Published
• 21
RAFT: Adapting Language Model to Domain Specific RAG
Paper
• 2403.10131
• Published
• 72
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document
Understanding
Paper
• 2403.12895
• Published
• 32
TnT-LLM: Text Mining at Scale with Large Language Models
Paper
• 2403.12173
• Published
• 20
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic
Prompt Compression
Paper
• 2403.12968
• Published
• 25
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Paper
• 2403.18421
• Published
• 23
PowerInfer-2: Fast Large Language Model Inference on a Smartphone
Paper
• 2406.06282
• Published
• 39
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated
Parameters
Paper
• 2406.05955
• Published
• 27
RegMix: Data Mixture as Regression for Language Model Pre-training
Paper
• 2407.01492
• Published
• 40
ColPali: Efficient Document Retrieval with Vision Language Models
Paper
• 2407.01449
• Published
• 51
MIRAI: Evaluating LLM Agents for Event Forecasting
Paper
• 2407.01231
• Published
• 18
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for
Sparse Architectural Large Language Models
Paper
• 2407.01906
• Published
• 46
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End
Modeling with LM Knowledge Distillation
Paper
• 2408.00205
• Published
• 5
Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy
Curvature of Attention
Paper
• 2408.00760
• Published
• 7
The Llama 3 Herd of Models
Paper
• 2407.21783
• Published
• 117
Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Paper
• 2407.21705
• Published
• 27
Towards Achieving Human Parity on End-to-end Simultaneous Speech
Translation via LLM Agent
Paper
• 2407.21646
• Published
• 18
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal
Domain
Paper
• 2407.19584
• Published
• 66
Self-Training with Direct Preference Optimization Improves
Chain-of-Thought Reasoning
Paper
• 2407.18248
• Published
• 33
Meta-Rewarding Language Models: Self-Improving Alignment with
LLM-as-a-Meta-Judge
Paper
• 2407.19594
• Published
• 21
Wolf: Captioning Everything with a World Summarization Framework
Paper
• 2407.18908
• Published
• 32
Vript: A Video Is Worth Thousands of Words
Paper
• 2406.06040
• Published
• 28
AppWorld: A Controllable World of Apps and People for Benchmarking
Interactive Coding Agents
Paper
• 2407.18901
• Published
• 35
GTA: A Benchmark for General Tool Agents
Paper
• 2407.08713
• Published
• 17
Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from
User's Casual Sketches
Paper
• 2408.04567
• Published
• 26
ShortCircuit: AlphaZero-Driven Circuit Design
Paper
• 2408.09858
• Published
• 17
Factorized-Dreamer: Training A High-Quality Video Generator with Limited
and Low-Quality Data
Paper
• 2408.10119
• Published
• 17
Automated Design of Agentic Systems
Paper
• 2408.08435
• Published
• 40
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge
Discovery
Paper
• 2409.05591
• Published
• 31
Top-nσ: Not All Logits Are You Need
Paper
• 2411.07641
• Published
• 24
LLM4SR: A Survey on Large Language Models for Scientific Research
Paper
• 2501.04306
• Published
• 35
Autonomy-of-Experts Models
Paper
• 2501.13074
• Published
• 44
Process Reinforcement through Implicit Rewards
Paper
• 2502.01456
• Published
• 62
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open
Software Evolution
Paper
• 2502.18449
• Published
• 75
Paper
• 2502.14855
• Published
• 7
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through
Reflective Puzzle Solving
Paper
• 2502.20238
• Published
• 23
LongRoPE2: Near-Lossless LLM Context Window Scaling
Paper
• 2502.20082
• Published
• 36
CODESYNC: Synchronizing Large Language Models with Dynamic Code
Evolution at Scale
Paper
• 2502.16645
• Published
• 21
The Entropy Mechanism of Reinforcement Learning for Reasoning Language
Models
Paper
• 2505.22617
• Published
• 131
Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large
Reasoning Models
Paper
• 2505.21765
• Published
VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
Paper
• 2506.22694
• Published
• 3
Deep Researcher with Test-Time Diffusion
Paper
• 2507.16075
• Published
• 68