HaluMem: Evaluating Hallucinations in Memory Systems of Agents Paper • 2511.03506 • Published Nov 5 • 93
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Paper • 2504.10481 • Published Apr 14 • 85
Internal Consistency and Self-Feedback in Large Language Models: A Survey Paper • 2407.14507 • Published Jul 19, 2024 • 46