DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture Paper • 2509.19274 • Published Sep 23
ObfusQAte: A Proposed Framework to Evaluate LLM Robustness on Obfuscated Factual Question Answering Paper • 2508.07321 • Published Aug 10