The goal of this paper is to build an object detection model that assists visually impaired people, as well as commercial users, by detecting objects at a certain distance. Existing object detection algorithms require a large amount of training data, which makes training slow and complex. To avoid this, the paper presents a computer vision approach that converts detected objects to text by importing a pre-trained CAFFEMODEL (a machine learning model created with the Caffe framework); the text is then converted to speech. The method can detect multiple objects on the same screen and supports real-time object detection. The paper discusses the concept, methodology, and system architecture of the implementation, together with the intermediate results obtained, and analyzes the tools used in the proposed system. The system can be integrated into other systems, for example portable gadgets that detect objects at a certain distance from a visually impaired person and transmit a voice signal.
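As a rough illustration of the object-to-text step, the following minimal sketch turns a list of detections into a sentence that could be handed to a speech engine. It is not the paper's implementation: the function name `detections_to_sentence`, the `(label, confidence)` input format, and the 0.5 confidence threshold are all assumptions made for this example. In a full system the detections would come from the pre-trained Caffe model and the resulting sentence would be passed to a text-to-speech component, both of which are left out here.

```python
from collections import Counter


def detections_to_sentence(detections, min_confidence=0.5):
    """Build a short sentence from (label, confidence) pairs.

    `detections` is a hypothetical input format: one pair per detected
    object. Detections below `min_confidence` (an assumed threshold)
    are ignored.
    """
    labels = [label for label, conf in detections if conf >= min_confidence]
    if not labels:
        return "No objects detected."
    # Count repeated labels so several objects of one class are reported once.
    counts = Counter(labels)
    parts = [f"{n} {label}" for label, n in counts.items()]
    return "Detected: " + ", ".join(parts) + "."


# Example: two people and a chair pass the threshold; the dog does not.
example = [("person", 0.91), ("chair", 0.78), ("person", 0.66), ("dog", 0.30)]
print(detections_to_sentence(example))
```

Grouping by label keeps the spoken message short when several objects of the same class appear on the screen at once.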