USTBench Reveals Limits of AI in Urban Spatiotemporal Reasoning



Evaluating large language models' spatiotemporal reasoning in urban tasks

From arXiv

Large language models (LLMs) are promising tools for supporting urban decision-making, but how well they reason about spatiotemporal tasks remains unclear. USTBench is a new benchmark designed to evaluate these models' fine-grained reasoning in urban settings.

Why it matters: Knowing where LLMs' spatiotemporal reasoning succeeds and where it fails can guide their use in urban planning, traffic management, and smart city applications.

The big picture: USTBench evaluates LLMs across four key reasoning dimensions: understanding, forecasting, planning, and reflection with feedback.

Stunning stat: USTBench includes 62,466 structured question-answer pairs to rigorously test LLMs in diverse urban scenarios.

The stakes: Current LLMs struggle with long-term planning and with using reflection on feedback to adapt in dynamic urban environments, which limits their real-world effectiveness.