YOLO Demystified

YOLO’s working mechanism, strengths, limitations and potential

Sagun Raj Lage
Jan 19



You Only Look Once (YOLO) is one of the most popular and powerful real-time object detection algorithms. Released in 2016, it is fast: it can process images and detect objects at 45 frames per second, and Fast YOLO, a smaller version of the system, delivers an even better rate of 155 frames per second [1]. This means the system can efficiently detect objects not only in images but also in videos. YOLO has kept evolving since its release, and with each new version its speed and accuracy have improved. The latest official version is YOLOv8, published in 2023, which is “a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility” [2].

YOLO uses convolutional neural networks to detect the significant objects in an image and draws bounding boxes around them with their class labels. Its working mechanism differs from that of traditional object-detection algorithms. While the traditional approach applies a detection model at multiple locations and scales and keeps the high-scoring regions of the image, YOLO applies a single neural network to the whole image [3]. The network splits the image into regions, detects the objects in those regions, and returns the image with bounding boxes around the detected objects, each labeled with its predicted class and the model’s confidence in that prediction. As the name suggests, the algorithm analyzes the image ‘only once’ to detect the objects and their locations in the image.
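As a rough illustration of that single-pass design, the output layout described in the original YOLO paper can be sketched as follows (the numbers use YOLOv1’s 7 × 7 grid, 2 boxes per cell, and 20 PASCAL VOC classes; this is an illustrative sketch of the tensor shape, not a trained model):

```python
# Sketch of YOLOv1's single-pass output layout (illustrative, not a trained model).
S, B, C = 7, 2, 20   # grid cells per side, boxes per cell, classes (PASCAL VOC)
depth = B * 5 + C    # each box carries x, y, w, h, and a confidence score

# One forward pass over the whole image yields an S x S x depth prediction tensor:
print(f"{S} x {S} x {depth} = {S * S * depth} predictions per image")
```

Every region of the image is thus scored in one pass, instead of re-running a classifier over many crops.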

YOLO detects objects in three steps. First, the system takes the input image and resizes it to 448 × 448. Second, a single convolutional neural network runs on the image and proposes many candidate bounding boxes around the detected objects, each with a predicted class and a probability score. Finally, the non-max suppression algorithm is applied to each detected object to select the bounding box with the highest probability score and remove all the other boxes that overlap with it…
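The suppression step above can be sketched in plain Python. This is a minimal greedy non-max suppression over `(x1, y1, x2, y2)` boxes; the function names and the 0.5 overlap threshold are illustrative choices, not YOLO’s actual implementation:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two overlapping detections of the same object, plus one distant detection:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # the 0.8 box overlaps the 0.9 box and is dropped
```

The greedy loop is why only one box survives per object: once the highest-scoring box is kept, every sufficiently overlapping candidate is assumed to be a duplicate detection of the same object.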



Sagun Raj Lage

Author of Getting Started with WidgetKit (2021) | Research Assistant at University of Houston-Victoria | iOS Engineer | Full Stack Engineer