From Retrieval to Precision: Fine-Grained Cross-View Geo-Localization
Abstract:
Over the past decade, cross-view geo-localization has commonly been formulated as an image retrieval task, in which the location of a ground-level image is approximated by the center of the retrieved aerial image tile. However, when fine-grained localization is required, as in autonomous navigation or AR/VR applications, retrieval-based methods need densely overlapping aerial tiles, which is highly inefficient because the number of tiles to store and search grows quadratically as the tile spacing shrinks. This talk explores an emerging direction that moves beyond image retrieval and instead directly estimates the precise camera pose (position and azimuth) within a reference aerial image of the local surroundings. It will also discuss how projection geometry, local feature matching, and depth cues contribute to fine-grained cross-view geo-localization, and it will highlight recent advances and open challenges in this evolving field.