Individuals who experience significant vision loss or blindness often face challenges navigating the world. They frequently depend on supports or aids, such as guide dogs and assistive white canes, to move through their surroundings, and many also rely on heightened auditory processing. Despite these aids, navigating the modern world remains challenging from a safety perspective.
In recent years, computer vision algorithms have been developed to interpret and report the contents of image streams. Applications of these algorithms have often focused on the automated labeling of data (e.g., image analysis and audio processing) within vast media databases; however, they can also be leveraged to operate on live camera streams. In this research, the authors present an overview of a proof-of-concept technology stack that implements a danger detection system for individuals with significant visual impairment. The system integrates a cloud-driven approach employing a fine-tuned YOLO model for hazard detection and obstacle avoidance, segmentation for scene understanding, text recognition through optical character recognition (OCR), and visual question answering (VQA) for user queries. These technologies collectively address critical issues such as dynamic hazard recognition, spatial awareness, and personalized assistance.
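As a concrete illustration of the detection component, the following minimal sketch shows how detections from a live camera stream could be filtered down to hazard alerts. It assumes the publicly available Ultralytics YOLO API and OpenCV; the weights file and hazard class list are hypothetical placeholders, not the fine-tuned model described here.

```python
# Illustrative sketch: run a YOLO detector over a live camera stream and
# flag classes treated as hazards. The weights file and hazard class list
# are hypothetical placeholders, not the fine-tuned model from this work.
import cv2
from ultralytics import YOLO

HAZARD_CLASSES = {"car", "bicycle", "person", "stairs"}  # assumed hazard labels

model = YOLO("yolov8n.pt")   # placeholder weights; a fine-tuned model would be used in practice
cap = cv2.VideoCapture(0)    # default camera as a stand-in for the live stream

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)[0]
    for box in results.boxes:
        label = results.names[int(box.cls)]
        conf = float(box.conf)
        if label in HAZARD_CLASSES and conf > 0.5:
            # In the full system this event would trigger an audio alert to the user.
            print(f"Hazard detected: {label} (confidence {conf:.2f})")

cap.release()
```

In a cloud-driven deployment, the frame capture would remain on the device while inference and the downstream segmentation, OCR, and VQA stages run remotely; the loop above simply sketches the per-frame detection step.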