WACV 2026 Tutorial

Date: March 7th, 2026

Time: 8:30 AM to 12:00 PM US Mountain Time

Location: AZ Ballroom Salon 9

Tutorial Description

The increasing availability of geospatial data from heterogeneous modalities, including aerial and satellite imagery, ground-level views, and textual descriptions, has made cross-view geo-localization a critical research area with applications in autonomous navigation, urban monitoring, and augmented reality. Despite this progress, challenges remain in handling extreme viewpoint variations, scaling across diverse domains, and integrating multimodal information. Recent developments in multimodal learning and Generative AI, such as Large Multimodal Models (LMMs), have introduced new paradigms for geo-localization. LMMs enable more generalized cross-view matching by incorporating language as an additional modality, supporting tasks such as text-based geo-localization, scene description, and multimodal reasoning. These capabilities not only improve performance but also extend cross-view geo-localization to broader multimodal applications. This tutorial will provide a comprehensive overview of these developments, highlighting the latest methodologies, datasets, and open research directions that are shaping the future of cross-view geo-localization.

Organizers

Chen Chen, University of Central Florida, Orlando, Florida, USA
Safwan Wshah, University of Vermont, Burlington, Vermont, USA
Xiaohan Zhang, University of Vermont, Burlington, Vermont, USA

Speakers

Weijia Li, Shenzhen International Graduate School, Tsinghua University, China
Zimin Xia, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

Schedule (US Mountain Time)

8:30 - 8:40: Welcome
8:40 - 9:20: Cross-View Geo-localization: Past, Present, and Future

Speaker: Chen Chen

📄 Slides
9:20 - 10:00: Cross-View Geo-Localization: From Image Retrieval to Multi-Modal Reasoning

Speaker: Weijia Li

📄 Slides
10:00 - 10:20: Coffee Break
10:20 - 11:00: From Retrieval to Precision: Fine-Grained Cross-View Geo-Localization

Speaker: Zimin Xia

📄 Slides
11:00 - 11:40: Cross-View Association, Reasoning, and Explainability

Speaker: Safwan Wshah
11:40 - 12:00: Panel Discussion

Covered Publications

Ye, Junyan, Honglin Lin, Leyan Ou, Dairong Chen, Zihao Wang, Qi Zhu, Conghui He, and Weijia Li. "Where am I? Cross-View Geo-localization with Natural Language Descriptions." arXiv preprint arXiv:2412.17007 (2024).
Xia, Zimin, and Alexandre Alahi. "FG²: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching." In Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 6362-6372. 2025.
Xia, Zimin, Chenghao Xu, and Alexandre Alahi. "Loc2: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching." arXiv preprint arXiv:2509.09792 (2025).
Zhang, Xiaohan, Tavis Shore, Chen Chen, Oscar Mendez, Simon Hadfield, and Safwan Wshah. "VICI: VLM-Instructed Cross-view Image-localisation." arXiv preprint arXiv:2507.04107 (2025).
Ye, Junyan, Zhutao Lv, Weijia Li, Jinhua Yu, Haote Yang, Huaping Zhong, and Conghui He. "Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network." In European Conference on Computer Vision, pp. 74-90. Cham: Springer Nature Switzerland, 2024.
Arrabi, Ahmad, Xiaohan Zhang, Waqas Sultani, Chen Chen, and Safwan Wshah. "Cross-view meets diffusion: Aerial image synthesis with geometry and text guidance." In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 5356-5366. IEEE, 2025.
Zhu, Sijie, Taojiannan Yang, and Chen Chen. "VIGOR: Cross-view image geo-localization beyond one-to-one retrieval." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3640-3649. 2021.
Ye, Junyan, Jun He, Xiang Zhang, Yi Lin, Honglin Lin, Conghui He, and Weijia Li. "Satellite Image Synthesis From Street View With Fine-Grained Spatial Textual Guidance: A novel framework." IEEE Geoscience and Remote Sensing Magazine (2025).
Zhang, Xiaohan, Xingyu Li, Waqas Sultani, Chen Chen, and Safwan Wshah. "GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement." IEEE Transactions on Pattern Analysis and Machine Intelligence (2024).
Wilson, Daniel, Xiaohan Zhang, Waqas Sultani, and Safwan Wshah. "Image and object geo-localization." International Journal of Computer Vision 132, no. 4 (2024): 1350-1392.
Zhang, Xiaohan, Xingyu Li, Waqas Sultani, Yi Zhou, and Safwan Wshah. "Cross-view geo-localization via learning disentangled geometric layout correspondence." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 3, pp. 3480-3488. 2023.
Ye, Junyan, Jun He, Weijia Li, Zhutao Lv, Yi Lin, Jinhua Yu, Haote Yang, and Conghui He. "Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis." arXiv preprint arXiv:2408.01812 (2024).

Content: Xiaohan Zhang 2026.

