Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Paper • 2601.06021 • Published 9 days ago • 39
OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation Paper • 2511.20211 • Published Nov 25, 2025 • 12
Glyph: Scaling Context Windows via Visual-Text Compression Paper • 2510.17800 • Published Oct 20, 2025 • 67
Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models Paper • 2510.11683 • Published Oct 13, 2025 • 14
LLaDA-8B-BGPO Collection Boundary-Guided Policy Optimization for Memory-Efficient RL of Diffusion Large Language Models • 4 items • Updated Oct 11, 2025 • 4