SEGAgentRL

non-profit

AI & ML interests

We aim to improve agent reinforcement learning along three axes: stability (S), efficiency (E), and generalization (G).

Recent Activity

dwenlong submitted a paper 21 days ago

Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models

dwenlong submitted a paper about 2 months ago

For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs

dwenlong authored a paper 3 months ago

Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents

Collections 1

Models 10

SEGAgentRL/LLDS-A-GRPO-Llama3.2-3B-Base-MA

Reinforcement Learning • 4B • Updated Jan 16 • 2

SEGAgentRL/LLDS-A-GRPO-Qwen2.5-3B-Ins

Reinforcement Learning • 3B • Updated Jan 15 • 3

SEGAgentRL/LLDS-R-GRPO-Qwen2.5-3B-Base

Reinforcement Learning • 3B • Updated Jan 15 • 4 • 1

SEGAgentRL/LLDS-R-GSPO-Qwen2.5-3B-Ins

Reinforcement Learning • 3B • Updated Jan 15 • 5 • 1

SEGAgentRL/LLDS-A-GSPO-Qwen2.5-3B-Ins

Reinforcement Learning • 3B • Updated Jan 15 • 4 • 1

SEGAgentRL/LLDS-R-GRPO-Qwen2.5-3B-Ins

Reinforcement Learning • 3B • Updated Jan 15 • 4 • 1

SEGAgentRL/LLDS-A-GRPO-Qwen2.5-3B-Base

Reinforcement Learning • 3B • Updated Jan 15 • 4

SEGAgentRL/LLDS-A-GRPO-Qwen2.5-3B-Base-MA

Reinforcement Learning • 3B • Updated Jan 15 • 5 • 1

SEGAgentRL/LLDS-A-GRPO-Qwen2.5-7B-Base

Reinforcement Learning • 8B • Updated Jan 15 • 5 • 2

SEGAgentRL/LLDS-A-GRPO-Qwen2.5-7B-Ins

Reinforcement Learning • 8B • Updated Jan 15 • 4 • 2

Datasets 0

None public yet