MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers Paper β’ 2508.20453 β’ Published Aug 28 β’ 63
view article Article πΊπ¦ββ¬ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs Dec 4, 2024 β’ 80