YOLO Demystified

YOLO’s working mechanism, strengths, limitations and potential

Sagun Raj Lage
Jan 19, 2023


Photo by Gertrūda Valasevičiūtė on Unsplash

You Only Look Once (YOLO), released in 2016, is one of the most popular and powerful real-time object detection algorithms. It is fast: the base model processes images at 45 frames per second, and Fast YOLO, a smaller version of the network, reaches 155 frames per second (both measured on a Titan X GPU) [1]. This means the system can efficiently detect objects not only in still images but also in video. YOLO has kept evolving since its release, and successive versions have generally improved on both speed and accuracy. At the time of writing, the latest version is YOLOv8, published by Ultralytics in 2023, which is described as “a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility” [2].

YOLO uses a convolutional neural network to detect the significant objects in an image and draws bounding boxes around them with their class labels. Its working mechanism differs from that of traditional object detection algorithms. The traditional approach applies a detection model at multiple locations and scales and treats the high-scoring regions as detections, whereas YOLO applies a single neural network to the whole image [3]. The network divides the image into regions, detects the objects in those regions, and returns bounding boxes around the detected objects together with each object’s predicted class and the model’s confidence in that prediction. As the name suggests, the algorithm looks at the image ‘only once’ to detect the objects and their locations.
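To make the idea of splitting the image into regions more concrete, here is a minimal sketch based on the original YOLO paper [1], where the image is divided into an S × S grid (S = 7) and the grid cell that contains an object’s center is responsible for detecting that object. The function name `responsible_cell` and the sample coordinates are hypothetical and chosen only for illustration; later YOLO versions use different grid layouts.

```python
def responsible_cell(cx, cy, image_size=448, grid_size=7):
    """Return (row, col) of the grid cell that contains the point (cx, cy).

    Assumes a square image of image_size pixels split into
    grid_size x grid_size equal cells, as in the original YOLO paper.
    """
    if not (0 <= cx < image_size and 0 <= cy < image_size):
        raise ValueError("center must lie inside the image")
    cell_size = image_size / grid_size  # 448 / 7 = 64 pixels per cell
    col = int(cx // cell_size)
    row = int(cy // cell_size)
    return row, col


# Hypothetical object centered at x=300, y=100 in a 448 x 448 image.
print(responsible_cell(300, 100))  # (1, 4)
```

With 64-pixel cells, x = 300 falls in column 4 and y = 100 falls in row 1, so that cell would be the one expected to predict the object.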

YOLO detects objects in three steps. First, the system resizes the input image to 448 × 448 pixels. Next, a single convolutional neural network runs on the image and produces many candidate bounding boxes, each with a predicted class and a probability score. Finally, non-max suppression is applied to the candidates for each detected object: it keeps the bounding box with the highest probability score and removes the other boxes that overlap heavily with it [4]. The system thus returns an image in which each detected object is surrounded by a single bounding box labeled with its predicted class.
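The last step can be sketched in a few lines of plain Python. This is a simplified, greedy version of non-max suppression and not the exact code used by any YOLO release. The box format `(x1, y1, x2, y2)`, the overlap threshold of 0.5, and the sample boxes and scores are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the boxes to keep, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring box still in play
        keep.append(best)
        # Discard every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Hypothetical detections: two overlapping boxes on one object, one elsewhere.
boxes = [(100, 100, 200, 200), (110, 110, 210, 210), (300, 300, 380, 380)]
scores = [0.9, 0.75, 0.8]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap at all and is kept. In practice this procedure is usually run separately for each predicted class.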

YOLO is fast because it simply runs one neural network on an image to predict detections, without requiring a complex multi-stage pipeline [5]. Its predictions also benefit from context: because the network sees the whole image, it considers not only the appearance of an object but also its surroundings when predicting a class, which reduces the number of background patches mistaken for objects [1]. In addition, YOLO learns generalizable representations of objects, so it is less likely to break down when it is applied to a new domain or fed unexpected inputs [1].

Limitations

Although YOLO is a powerful algorithm, it has some limitations. It does not perform well at detecting small objects that appear in groups, such as a flock of birds. It also struggles with objects that have new or unusual aspect ratios. Finally, YOLO is less accurate than slower object detection algorithms such as Fast R-CNN at pinpointing the location of an object and drawing the bounding box around it. In other words, it has a higher localization error [1].
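To illustrate what a localization error looks like, the sketch below labels a single prediction by how much it overlaps the ground-truth box. The thresholds follow the error-analysis convention used in [1] (correct class with IoU above 0.5 is correct; correct class with IoU between 0.1 and 0.5 is a localization error; IoU below 0.1 is a background error). The function names, the merged "wrong class" label, and the sample boxes are simplifying assumptions of this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classify_detection(pred_box, true_box, same_class=True):
    """Label one prediction against one ground-truth box."""
    overlap = iou(pred_box, true_box)
    if overlap < 0.1:
        return "background"    # barely touches the object at all
    if not same_class:
        return "wrong class"   # simplification: merges the paper's finer labels
    if overlap > 0.5:
        return "correct"
    return "localization"      # right class, but the box is poorly placed


truth = (100, 100, 200, 200)  # hypothetical ground-truth box
print(classify_detection((110, 110, 210, 210), truth))  # correct
print(classify_detection((150, 100, 250, 200), truth))  # localization
print(classify_detection((300, 300, 400, 400), truth))  # background
```

The second prediction has the right class but is shifted sideways by half the object’s width, giving an IoU of about 0.33, which is the kind of mistake counted as a localization error.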

Conclusion

To conclude, YOLO is one of the most powerful algorithms in the field of computer vision. Its greatest strength is that it can detect objects efficiently in real time in the wild. It can, for example, be applied to live video from surveillance cameras to detect people in dangerous or restricted areas, or used to automate inspection tasks. Similarly, in an era when technologies like self-driving cars are being introduced, fast and efficient algorithms like YOLO can help cars detect signs, pedestrians, and other vehicles. It can also contribute to fields such as medical science, by detecting diseases or their signs in scans, and agriculture, by monitoring and counting animals. In industry, technologies like YOLO can save many of the resources needed for quality control. YOLO has thus opened the door to incorporating computer vision into day-to-day applications and has unlocked potential for further research, development, and improvement of the algorithm.

References

[1] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[2] “Ultralytics YOLOv8 Docs,” Ultralytics, 2023. [Online]. Available: https://docs.ultralytics.com/.

[3] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” arXiv preprint arXiv:1804.02767, 2018.

[4] A. Singh, “Selecting the Right Bounding Box Using Non-Max Suppression (with implementation),” Analytics Vidhya, 3 August 2020. [Online]. Available: https://www.analyticsvidhya.com/blog/2020/08/selecting-the-right-bounding-box-using-non-max-suppression-with-implementation/.

[5] J. Redmon, “YOLO: Real-Time Object Detection,” 2018. [Online]. Available: https://pjreddie.com/darknet/yolo/.

