Date: March 7th, 2026
Time: 8:30 AM to 12:00 PM
Location: TBD
Tutorial Description
The increasing availability of geospatial data from heterogeneous modalities, including aerial and satellite imagery, ground-level views, and textual descriptions, has made cross-view geo-localization a critical research area with applications in autonomous navigation, urban monitoring, and augmented reality. Despite progress, challenges remain in handling extreme viewpoint variations, scaling across diverse domains, and integrating multimodal information. Recent developments in multimodal learning and Generative AI, such as Large Multimodal Models (LMMs), have introduced new paradigms for geo-localization. LMMs enable more generalized cross-view matching by incorporating language as an additional modality, supporting tasks such as text-based geo-localization, scene description, and multimodal reasoning. These capabilities not only improve performance but also expand the scope of cross-view geo-localization to broader multimodal applications. This tutorial will provide a comprehensive overview of these developments, highlighting the latest methodologies, datasets, and open research directions that are shaping the future of cross-view geo-localization.
Organizers
- Chen Chen, University of Central Florida, Orlando, FL, USA
- Safwan Wshah, University of Vermont, Burlington, VT, USA
- Xiaohan Zhang, University of Vermont, Burlington, VT, USA
Speakers
- Weijia Li, Sun Yat-Sen University, Guangzhou, China
- Zimin Xia, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Schedule
- 8:30 - 8:40: Welcome
- 8:40 - 9:20: Cross-View Geo-localization: Past, Present, and Future
  Speaker: Chen Chen
- 9:20 - 10:00: Cross-View Geo-localization with Natural Language Descriptions
  Speaker: Weijia Li
- 10:00 - 10:20: Coffee Break
- 10:20 - 11:00: From Retrieval to Precision: Fine-Grained Cross-View Geo-Localization
  Speaker: Zimin Xia
- 11:00 - 11:40: Cross-View Association, Reasoning, and Explainability
  Speaker: Safwan Wshah
- 11:40 - 12:00: Panel Discussion
Covered Publications
- Ye, Junyan, Honglin Lin, Leyan Ou, Dairong Chen, Zihao Wang, Qi Zhu, Conghui He, and Weijia Li. "Where am I? Cross-View Geo-localization with Natural Language Descriptions." arXiv preprint arXiv:2412.17007 (2024).
- Xia, Zimin, and Alexandre Alahi. "FG^2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching." In Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 6362-6372. 2025.
- Xia, Zimin, Chenghao Xu, and Alexandre Alahi. "Loc2: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching." arXiv preprint arXiv:2509.09792 (2025).
- Zhang, Xiaohan, Tavis Shore, Chen Chen, Oscar Mendez, Simon Hadfield, and Safwan Wshah. "VICI: VLM-Instructed Cross-view Image-localisation." arXiv preprint arXiv:2507.04107 (2025).
- Ye, Junyan, Zhutao Lv, Weijia Li, Jinhua Yu, Haote Yang, Huaping Zhong, and Conghui He. "Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network." In European Conference on Computer Vision, pp. 74-90. Cham: Springer Nature Switzerland, 2024.
- Arrabi, Ahmad, Xiaohan Zhang, Waqas Sultani, Chen Chen, and Safwan Wshah. "Cross-view meets diffusion: Aerial image synthesis with geometry and text guidance." In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 5356-5366. IEEE, 2025.
- Zhu, Sijie, Taojiannan Yang, and Chen Chen. "Vigor: Cross-view image geo-localization beyond one-to-one retrieval." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3640-3649. 2021.
- Ye, Junyan, Jun He, Xiang Zhang, Yi Lin, Honglin Lin, Conghui He, and Weijia Li. "Satellite Image Synthesis From Street View With Fine-Grained Spatial Textual Guidance: A novel framework." IEEE Geoscience and Remote Sensing Magazine (2025).
- Zhang, Xiaohan, Xingyu Li, Waqas Sultani, Chen Chen, and Safwan Wshah. "GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement." IEEE Transactions on Pattern Analysis and Machine Intelligence (2024).
- Wilson, Daniel, Xiaohan Zhang, Waqas Sultani, and Safwan Wshah. "Image and object geo-localization." International Journal of Computer Vision 132, no. 4 (2024): 1350-1392.
- Zhang, Xiaohan, Xingyu Li, Waqas Sultani, Yi Zhou, and Safwan Wshah. "Cross-view geo-localization via learning disentangled geometric layout correspondence." In Proceedings of the AAAI conference on artificial intelligence, vol. 37, no. 3, pp. 3480-3488. 2023.
- Ye, Junyan, Jun He, Weijia Li, Zhutao Lv, Yi Lin, Jinhua Yu, Haote Yang, and Conghui He. "Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis." arXiv preprint arXiv:2408.01812 (2024).
Content: Xiaohan Zhang 2026.