Cross-View Geo-localization with Natural Language Descriptions

Speaker: Weijia Li

Time: 9:20 - 10:00

Abstract:

Cross-view geo-localization determines the location of a street-view image by matching it against geo-tagged satellite images or OpenStreetMap (OSM) data. Most existing studies focus on image-to-image retrieval; far fewer address text-guided retrieval, a task vital for applications such as pedestrian navigation and emergency response. This talk covers the recently proposed task of cross-view geo-localization with natural language descriptions, which aims to retrieve the corresponding satellite images or OSM data from scene text descriptions. We will introduce the newly curated dataset and the scene text generation approach used to build it, which leverages the annotation capabilities of Vision Language Models to produce high-quality scene text descriptions with localization details. We will also present a novel text-based retrieval method for localization that tackles this challenging problem.