Softpick: No Attention Sink, No Massive Activations with Rectified Softmax Paper • 2504.20966 • Published Apr 29 • 32
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10 • 189
Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published Sep 4 • 75
Set Block Decoding is a Language Model Inference Accelerator Paper • 2509.04185 • Published Sep 4 • 52
LLM-based Optimization of Compound AI Systems: A Survey Paper • 2410.16392 • Published Oct 21, 2024 • 16
Advances in Speech Separation: Techniques, Challenges, and Future Trends Paper • 2508.10830 • Published Aug 14 • 14
Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation Paper • 2508.12040 • Published Aug 16 • 14
Evaluating Podcast Recommendations with Profile-Aware LLM-as-a-Judge Paper • 2508.08777 • Published Aug 12 • 15
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published Aug 6 • 129
Refining Contrastive Learning and Homography Relations for Multi-Modal Recommendation Paper • 2508.13745 • Published Aug 19 • 1
mSCoRe: a Multilingual and Scalable Benchmark for Skill-based Commonsense Reasoning Paper • 2508.10137 • Published Aug 13 • 2
Leuvenshtein: Efficient FHE-based Edit Distance Computation with Single Bootstrap per Cell Paper • 2508.14568 • Published Aug 20 • 2
Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis Paper • 2508.15754 • Published Aug 21 • 4
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting Paper • 2508.11408 • Published Aug 15 • 8
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs Paper • 2508.14896 • Published Aug 20 • 22