Zhaopeng Tu

zptu

http://www.zptu.net

AI & ML interests

None yet

Recent Activity

upvoted a paper 6 days ago

Deep Research: A Systematic Survey

commented on a paper 29 days ago

Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

authored a paper 29 days ago

Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

Organizations

None yet

upvoted a paper 6 days ago

Deep Research: A Systematic Survey

Paper • 2512.02038 • Published 15 days ago • 61

commented on a paper 29 days ago

Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

Paper • 2511.04962 • Published Nov 7 • 52

authored 11 papers 29 days ago

Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

Paper • 2505.14681 • Published May 20 • 10

The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement

Paper • 2503.16024 • Published Mar 20 • 1

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

Paper • 2505.23754 • Published May 29 • 15

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

Paper • 2505.13445 • Published May 19

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents

Paper • 2507.03112 • Published Jul 3 • 32

CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards

Paper • 2507.17147 • Published Jul 23

The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models

Paper • 2503.02875 • Published Mar 4 • 1

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

Paper • 2509.09675 • Published Sep 11 • 28

VISTA: Enhancing Vision-Text Alignment in MLLMs via Cross-Modal Mutual Information Maximization

Paper • 2505.10917 • Published May 16

BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs

Paper • 2509.26514 • Published Sep 30 • 3

Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

Paper • 2511.04962 • Published Nov 7 • 52

commented on 2 papers 29 days ago

Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

Paper • 2511.04962 • Published Nov 7 • 52

Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

Paper • 2511.04962 • Published Nov 7 • 52

upvoted a paper 2 months ago

BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs

Paper • 2509.26514 • Published Sep 30 • 3

commented on a paper 2 months ago

BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs

Paper • 2509.26514 • Published Sep 30 • 3

authored a paper 3 months ago

SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning

Paper • 2509.16548 • Published Sep 20

upvoted a paper 5 months ago

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents

Paper • 2507.03112 • Published Jul 3 • 32

authored a paper 7 months ago

Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms

Paper • 2505.20322 • Published May 23 • 14
