ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning
Published in NeurIPS 2025
Recommended citation: S Huang, L Yang, Y Song, S Chen, L Cui, Z Wan, Q Zeng, Y Wen, K Shao. (2025). "ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning." NeurIPS 2025.
