ReTraceQA: Evaluating Reasoning Traces of Small Language Models in Commonsense Question Answering Paper • 2510.09351 • Published Oct 10, 2025
AgREE: Agentic Reasoning for Knowledge Graph Completion on Emerging Entities Paper • 2508.04118 • Published Aug 6, 2025
A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents Paper • 2602.08964 • Published Feb 9, 2026 • 1
EAGER: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling Paper • 2510.11170 • Published Oct 13, 2025 • 3
Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering Paper • 2503.14996 • Published Mar 19, 2025 • 3
Steering Large Language Models for Machine Translation Personalization Paper • 2505.16612 • Published May 22, 2025 • 6
Mergenetic: a Simple Evolutionary Model Merging Library Paper • 2505.11427 • Published May 16, 2025 • 14
Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results Paper • 2504.13677 • Published Apr 18, 2025 • 1
Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces Paper • 2503.05283 • Published Mar 7, 2025 • 4
QE4PE: Word-level Quality Estimation for Human Post-Editing Paper • 2503.03044 • Published Mar 4, 2025 • 6
MERGE$^3$: Efficient Evolutionary Merging on Consumer-grade GPUs Paper • 2502.10436 • Published Feb 9, 2025 • 1
Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions Paper • 2408.05212 • Published Aug 10, 2024
FlanEC: Exploring Flan-T5 for Post-ASR Error Correction Paper • 2501.12979 • Published Jan 22, 2025 • 1
MSTS: A Multimodal Safety Test Suite for Vision-Language Models Paper • 2501.10057 • Published Jan 17, 2025 • 10
ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering Paper • 2410.05077 • Published Oct 7, 2024 • 5
Echoes from Alexandria: A Large Resource for Multilingual Book Summarization Paper • 2306.04334 • Published Jun 7, 2023 • 2