ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning

Published in NeurIPS 2025

Recommended citation: S Huang, L Yang, Y Song, S Chen, L Cui, Z Wan, Q Zeng, Y Wen, K Shao. (2025). "ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning." NeurIPS 2025.