DSI-Bench: A Benchmark for Dynamic Spatial Intelligence

¹Zhejiang University    ²Alibaba Group    ³Shanghai AI Lab
* Equal contribution.
Dynamic Spatial Intelligence Overview
Dynamic Spatial Intelligence. Unlike static settings, dynamic scenarios involve evolving spatial relationships among the observer, the observed objects, and the environment. Humans intuitively perceive these changes, whereas Vision-Language Models (VLMs) often hallucinate and show biases in dynamic spatial reasoning, because they are misled by semantic cues and struggle to understand coupled motion.
Motion Distribution in DSI-Bench
Left: task distribution in DSI-Bench; middle: observer motion distribution; right: observed motion distribution.

Failure Cases

Model Performance

The table below reports the performance of 14 models on DSI-Bench under two evaluation protocols: sample-wise, where each augmented video is scored independently, and group-wise, where the score is the fraction of original video groups with at least 3 correct predictions among their variants.

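Sample-wise Evaluation

| Models | Object-Scene: Fixed-Obs. | Object-Scene: Dyn-Obs. | Observer-Scene: Static-Sce. | Observer-Scene: Dyn-Sce. | Observer-Object: Distance | Observer-Object: Orientation | Overall |
|---|---|---|---|---|---|---|---|
| Random | 25.00% | 25.00% | 25.00% | 25.00% | 25.00% | 25.00% | 25.00% |
| Gemini-2.5-Pro | 45.54% | 44.76% | 54.89% | 40.39% | 69.75% | 47.94% | 46.90% |
| Nova-Pro-V1 | 38.92% | 37.33% | 34.57% | 28.64% | 46.01% | 15.29% | 34.06% |
| Qwen2.5-VL-32B | 37.16% | 37.54% | 29.46% | 34.06% | 58.15% | 32.35% | 36.73% |
| Qwen2.5-VL-72B | 41.35% | 41.92% | 33.26% | 34.56% | 58.15% | 39.71% | 39.61% |
| Seed-1.6 | 45.27% | 45.15% | 35.00% | 35.52% | 54.17% | 41.47% | 41.38% |
| Seed-1.6-Vision | 45.54% | 42.87% | 50.76% | 39.21% | 79.71% | 38.53% | 45.70% |
| GPT-4o | 42.43% | 39.65% | 37.61% | 28.78% | 55.98% | 32.35% | 37.23% |
| GPT-5 | 43.37% | 36.73% | 39.13% | 34.61% | 73.55% | 40.59% | 40.14% |
| InternVL3.5-8B | 39.73% | 37.97% | 26.20% | 32.60% | 62.14% | 29.12% | 36.41% |
| InternVL3.5-38B | 42.02% | 37.63% | 34.57% | 34.70% | 67.39% | 37.65% | 39.10% |
| InternVL3.5-30BA3B | 42.70% | 39.13% | 35.43% | 32.10% | 60.14% | 34.12% | 38.24% |
| InternVL3.5-241BA30B | 44.06% | 43.51% | 37.28% | 33.88% | 61.78% | 30.88% | 40.59% |
| VGGT | -- | -- | 35.55% | 22.50% | -- | -- | -- |
| SpatialTrackerV2 | 46.35% | 50.42% | 41.96% | 38.98% | 27.09% | -- | 42.43% |

Group-wise Evaluation

| Models | Object-Scene: Fixed-Obs. | Object-Scene: Dyn-Obs. | Observer-Scene: Static-Sce. | Observer-Scene: Dyn-Sce. | Observer-Object: Distance | Observer-Object: Orientation | Overall |
|---|---|---|---|---|---|---|---|
| Random | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% | 0.05% |
| Gemini-2.5-Pro | 21.62% | 18.21% | 42.61% | 21.86% | 64.49% | 31.76% | 27.13% |
| Nova-Pro-V1 | 13.52% | 10.82% | 18.26% | 8.56% | 38.41% | 5.88% | 13.29% |
| Qwen2.5-VL-32B | 10.81% | 10.48% | 14.78% | 16.39% | 50.72% | 17.65% | 16.40% |
| Qwen2.5-VL-72B | 12.44% | 10.31% | 15.65% | 10.93% | 46.38% | 35.29% | 15.43% |
| Seed-1.6 | 16.76% | 13.40% | 15.22% | 10.93% | 42.75% | 25.88% | 16.11% |
| Seed-1.6-Vision | 18.38% | 13.23% | 39.13% | 21.49% | 74.64% | 30.59% | 25.33% |
| GPT-4o | 14.06% | 10.31% | 24.78% | 9.29% | 45.65% | 20.00% | 15.49% |
| GPT-5 | 17.84% | 13.23% | 26.96% | 17.49% | 70.29% | 25.88% | 21.88% |
| InternVL3.5-8B | 19.46% | 12.03% | 13.04% | 19.13% | 50.72% | 10.59% | 18.08% |
| InternVL3.5-38B | 22.16% | 12.03% | 18.26% | 11.84% | 58.70% | 28.24% | 18.25% |
| InternVL3.5-30BA3B | 22.70% | 12.03% | 22.61% | 18.21% | 43.48% | 23.53% | 19.44% |
| InternVL3.5-241BA30B | 16.76% | 14.60% | 19.57% | 11.48% | 49.28% | 18.82% | 17.41% |
| VGGT | -- | -- | 31.30% | 16.58% | -- | -- | -- |
| SpatialTrackerV2 | 38.96% | 42.39% | 39.13% | 35.70% | 5.98% | -- | 35.72% |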
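
To make the difference between the two protocols concrete, here is a minimal Python sketch of how sample-wise and group-wise accuracy could be computed from per-video results. It is not the benchmark's official evaluation code. The record layout (`group_id`, `correct`), the function names, and the `min_correct` parameter are assumptions made for illustration; only the threshold of 3 correct predictions per group comes from the description of the table.

```python
from collections import defaultdict


def sample_wise_accuracy(records):
    """Fraction of individual (augmented) videos answered correctly."""
    if not records:
        return 0.0
    return sum(1 for r in records if r["correct"]) / len(records)


def group_wise_accuracy(records, min_correct=3):
    """Fraction of original-video groups with at least `min_correct`
    correct predictions among their variants.

    `group_id` is a hypothetical key that ties each augmented video
    back to the original video it was derived from.
    """
    correct_per_group = defaultdict(int)
    for r in records:
        # Adding 0 still registers the group, so groups with no
        # correct prediction are counted in the denominator.
        correct_per_group[r["group_id"]] += int(r["correct"])
    if not correct_per_group:
        return 0.0
    passed = sum(1 for n in correct_per_group.values() if n >= min_correct)
    return passed / len(correct_per_group)


# Toy data: two original videos, each with four variants (assumed count).
records = [
    {"group_id": "a", "correct": True},
    {"group_id": "a", "correct": True},
    {"group_id": "a", "correct": True},
    {"group_id": "a", "correct": False},
    {"group_id": "b", "correct": True},
    {"group_id": "b", "correct": True},
    {"group_id": "b", "correct": False},
    {"group_id": "b", "correct": False},
]

print(f"sample-wise: {sample_wise_accuracy(records):.3f}")
print(f"group-wise:  {group_wise_accuracy(records):.3f}")
```

In this toy data 5 of 8 videos are correct, but only group `a` reaches the threshold, so the group-wise score is lower than the sample-wise one. The same effect shows up in the table, where every model scores lower under the group-wise protocol.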

Citation

@misc{zhang2025dsibenchbenchmarkdynamicspatial,
      title={DSI-Bench: A Benchmark for Dynamic Spatial Intelligence}, 
      author={Ziang Zhang and Zehan Wang and Guanghao Zhang and Weilong Dai and Yan Xia and Ziang Yan and Minjie Hong and Zhou Zhao},
      year={2025},
      eprint={2510.18873},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.18873}, 
}