The main idea of this paper is to determine the location where an image was taken, with or without EXIF metadata, and to plot that location on a map. With the increasing volume of digital imagery shared through online platforms, there is growing research interest in identifying the geographical origin of an image even when no embedded geotags or EXIF metadata are available. This work introduces a novel approach for estimating image locations from EXIF metadata when it exists, or otherwise from visual content, eliminating total dependence on GPS data. The proposed framework combines object detection and semantic understanding to infer spatial information from contextual features within an image. Using the YOLOv8 model, key elements such as landmarks, objects, and scene context are first detected. These visual cues are then interpreted through the CLIP (Contrastive Language–Image Pretraining) model, which maps both image and text features into a shared embedding space. By applying cosine similarity between image and textual location embeddings, the system identifies the most plausible location description for the given image.
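The following is a minimal sketch of the two-stage idea described above: read GPS coordinates from EXIF metadata when they are present, and otherwise fall back to ranking candidate location descriptions by cosine similarity against the CLIP image embedding. It assumes the open-source `clip` (OpenAI) and Pillow packages; the YOLOv8 detection step that supplies the visual cues, and the candidate description list itself, are illustrative placeholders rather than the paper's exact pipeline.

```python
import torch
import clip
from PIL import Image, ExifTags


def exif_gps(path):
    """Return (lat, lon) from EXIF GPS tags, or None if the image has no geotag."""
    exif = Image.open(path).getexif()
    gps_ifd = exif.get_ifd(0x8825) if exif else {}  # 0x8825 = GPSInfo IFD
    if not gps_ifd:
        return None
    gps = {ExifTags.GPSTAGS.get(k, k): v for k, v in gps_ifd.items()}

    def to_deg(dms, ref):
        d, m, s = (float(x) for x in dms)
        deg = d + m / 60 + s / 3600
        return -deg if ref in ("S", "W") else deg

    lat = to_deg(gps["GPSLatitude"], gps["GPSLatitudeRef"])
    lon = to_deg(gps["GPSLongitude"], gps["GPSLongitudeRef"])
    return lat, lon


def clip_location(path, candidate_descriptions, device="cpu"):
    """Rank textual location descriptions by cosine similarity to the image embedding."""
    model, preprocess = clip.load("ViT-B/32", device=device)
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    text = clip.tokenize(candidate_descriptions).to(device)
    with torch.no_grad():
        img_emb = model.encode_image(image)
        txt_emb = model.encode_text(text)
    # Normalize so the dot product equals cosine similarity.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    sims = (img_emb @ txt_emb.T).squeeze(0)
    best = sims.argmax().item()
    return candidate_descriptions[best], sims[best].item()


# Usage: try EXIF first, fall back to visual-content matching.
coords = exif_gps("photo.jpg")
if coords is not None:
    print("GPS from EXIF:", coords)
else:
    candidates = [  # hypothetical candidate location descriptions
        "a photo taken at the Eiffel Tower, Paris",
        "a photo taken at Times Square, New York",
    ]
    print("CLIP best match:", clip_location("photo.jpg", candidates))
```

In the full framework, the candidate descriptions would be built from the landmarks, objects, and scene context detected by YOLOv8 rather than hard-coded as above.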