Xingtai Lv

XingtaiHF

AI & ML interests

LLM

Recent Activity

liked a model 13 days ago

deepseek-ai/DeepSeek-Math-V2

upvoted a paper 22 days ago

P1: Mastering Physics Olympiads with Reinforcement Learning

Paper • 2511.13612 • Published 22 days ago • 132

upvoted 2 papers 2 months ago

Cache-to-Cache: Direct Semantic Communication Between Large Language Models

Paper • 2510.03215 • Published Oct 3 • 97

From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones

Paper • 2509.25123 • Published Sep 29 • 20

upvoted 5 papers 3 months ago

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18 • 114

Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Paper • 2509.08721 • Published Sep 10 • 660

A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10 • 189

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Paper • 2509.09674 • Published Sep 11 • 80

Towards a Unified View of Large Language Model Post-Training

Paper • 2509.04419 • Published Sep 4 • 75

upvoted 2 papers 4 months ago

SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published Aug 14 • 97

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published Jul 28 • 82

upvoted 2 papers 6 months ago

RLPR: Extrapolating RLVR to General Domains without Verifiers

Paper • 2506.18254 • Published Jun 23 • 31

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28 • 131

upvoted a paper 8 months ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120

upvoted a paper 9 months ago

Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models

Paper • 2503.11224 • Published Mar 14 • 28

upvoted 2 papers 10 months ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16 • 166

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published Feb 3 • 61

upvoted 2 papers 12 months ago

Sparse Low-rank Adaptation of Pre-trained Language Models

Paper • 2311.11696 • Published Nov 20, 2023 • 2

Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Paper • 2412.17739 • Published Dec 23, 2024 • 41