Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports

1Fudan University   2Shanghai Artificial Intelligence Laboratory   3Shanghai Jiao Tong University   4East China University of Science and Technology   5Southeast University   6Zhejiang University   7Beihang University   8Hong Kong University of Science and Technology   9East China Normal University
*Indicates Equal Contribution   Corresponding Author
Teaser image

TL;DR We introduce CourtSI and CourtSI-Bench, the first large-scale dataset and benchmark dedicated to spatial intelligence in sports, facilitated by a semi-automatic data engine.

Abstract

Sports have long attracted broad attention as they push the limits of human physical and cognitive capabilities. Amid growing interest in spatial intelligence for vision-language models (VLMs), sports provide a natural testbed for understanding high-intensity human motion and dynamic object interactions. To this end, we present CourtSI, the first large-scale spatial intelligence dataset tailored to sports scenarios. CourtSI contains over 1M QA pairs, organized under a holistic taxonomy that systematically covers spatial counting, distance measurement, localization, and relational reasoning across representative net sports, including badminton, tennis, and table tennis. Leveraging well-defined court geometry as metric anchors, we develop a semi-automatic data engine to reconstruct sports scenes, enabling scalable curation of CourtSI. In addition, we introduce CourtSI-Bench, a high-quality evaluation benchmark comprising 3,686 QA pairs with rigorous human verification. We evaluate 25 proprietary and open-source VLMs on CourtSI-Bench, revealing a persistent human–AI performance gap and limited generalization from existing spatial intelligence benchmarks. These findings indicate that sports scenarios expose spatial intelligence limitations that existing benchmarks fail to capture. Further, fine-tuning Qwen3-VL-8B on CourtSI improves accuracy on CourtSI-Bench by 23.5 percentage points. The adapted model also generalizes effectively to CourtSI-Ext, an evaluation set built on a similar but unseen sport, and demonstrates enhanced spatially aware commentary generation. Together, these findings demonstrate that CourtSI provides a scalable pathway toward advancing the spatial intelligence of VLMs in sports.

🧪 Data

(Interactive data viewer: browse QA examples from CourtSI-Bench, each with a category, question, and answer.)
CourtSI: 1M QA pairs   CourtSI-Bench: 3,686 QA pairs

Relational Reasoning: Ball-Zone, Ball-Player, Camera-Player, Player-Zone, Player-Player, Player-Line
Distance Measurement: Camera-Object, Height, Object-Line, Object-Object
Spatial Counting: Ball, Player
Localization: Object
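To make the taxonomy concrete, here is a hypothetical illustration of what a single multiple-choice QA pair in one of these categories might look like. Field names and values are illustrative only, not the dataset's actual schema:

```python
# Hypothetical CourtSI-Bench QA pair (illustrative schema, not the real one).
qa_example = {
    "category": "Distance Measurement / Object-Object",
    "sport": "badminton",
    "question": "What is the approximate distance in meters between the two players?",
    "options": {"A": "2.1", "B": "4.3", "C": "6.8", "D": "9.5"},
    "answer": "B",
}
```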

🏆 Leaderboard

    *To add your model's performance to the leaderboard, please contact the authors.

    *[parsed] denotes results extracted by an LLM from the outputs of models with notable instruction-compliance issues.

    Models | Overall | Distance Measurement: Cam.-Obj., Height, Obj.-Line, Obj.-Obj. | Counting: Ball, Player | Loc.: Obj. | Relational Reasoning: Ball-Zone, Ball-Player, Cam.-Player, Player-Zone, Player-Player, Player-Line

    [Full results withheld until the release date (2026.3)] 💡 Although spatial intelligence models are specifically fine-tuned for spatial tasks, we do not observe consistent improvements over their base models on CourtSI-Bench. This suggests that sports scenarios introduce additional spatial reasoning challenges not sufficiently captured by existing large-scale spatial intelligence benchmarks.

    🔧 Semi-Automatic Data Engine

    Annotation

    💡 By leveraging court geometry and incorporating human-in-the-loop supervision, we develop a data engine for accurate, world-grounded reconstruction in sports scenarios.
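As one illustration of how known court dimensions can serve as metric anchors, the sketch below (a simplified stand-in, not the paper's actual pipeline) estimates a plane-to-plane homography from four annotated court corners to a badminton court's known 6.1 m × 13.4 m footprint, then projects an image pixel (e.g., a detected player's foot point) into metric court coordinates:

```python
import numpy as np

def estimate_homography(img_pts, world_pts):
    """Direct Linear Transform: find H with world ~ H @ [x, y, 1]."""
    A = []
    for (x, y), (X, Y) in zip(img_pts, world_pts):
        A.append([x, y, 1, 0, 0, 0, -X * x, -X * y, -X])
        A.append([0, 0, 0, x, y, 1, -Y * x, -Y * y, -Y])
    # The null vector of A (right-singular vector with the smallest
    # singular value) is the flattened 3x3 homography.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def image_to_court(H, pt):
    """Project an image pixel into metric court-plane coordinates."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]

# Four annotated court corners in the image (pixel values are hypothetical),
# matched to the badminton court's known size: 6.1 m wide, 13.4 m long.
corners_img = [(100, 400), (540, 400), (480, 120), (160, 120)]
corners_world = [(0.0, 0.0), (6.1, 0.0), (6.1, 13.4), (0.0, 13.4)]
H = estimate_homography(corners_img, corners_world)
```

With `H` in hand, Euclidean distances between projected ground-plane points yield metric answers for distance-measurement and localization questions; a full pipeline would add more line correspondences, robust fitting, and human verification on top of this.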

    (Interactive comparison sliders: reconstructed vs. original frames.)

    BibTeX

    @misc{yang2026CourtSI,
          title={Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports},
          author={Yuchen Yang and Yuqing Shao and Duxiu Huang and Linfeng Dong and Yifei Liu and Suixin Tang and Xiang Zhou and Yuanyuan Gao and Wei Wang and Yue Zhou and Xue Yang and Yanfeng Wang and Xiao Sun and Zhihang Zhong},
          year={2026},
          eprint={2603.09896},
          archivePrefix={arXiv},
          primaryClass={cs.CV},
          url={https://arxiv.org/abs/2603.09896}, 
    }