VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification Paper • 2604.01569 • Published 3 days ago • 8
Apriel-Reasoner: RL Post-Training for General-Purpose and Efficient Reasoning Paper • 2604.02007 • Published 3 days ago • 5
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published 8 days ago • 151
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization Paper • 2604.02268 • Published 3 days ago • 77
HippoCamp: Benchmarking Contextual Agents on Personal Computers Paper • 2604.01221 • Published 3 days ago • 24
Paper Reconstruction Evaluation: Evaluating Presentation and Hallucination in AI-written Papers Paper • 2604.01128 • Published 3 days ago • 11
Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants Paper • 2604.00842 • Published 3 days ago • 8
Embarrassingly Simple Self-Distillation Improves Code Generation Paper • 2604.01193 • Published 3 days ago • 20