
Comparison of Faster R-CNN and YOLOv5 for Overlapping Objects Recognition




Computer vision, Convolutional neural network, Faster R-CNN, Kitchen utensils, Overlapping object recognition, YOLO


Classifying overlapping objects is one of the main challenges in object detection and recognition. Most available algorithms can only classify or recognize objects that are separated from one another or that appear alone in a scene; they do not handle overlapping kitchen utensils. In this project, the Faster R-CNN and YOLOv5 algorithms were applied to detect and classify overlapping objects in a kitchen area. In both models, the filters (kernels) in the dedicated layers are expected to separate the overlapping objects. A kitchen utensil benchmark image database, together with images of overlapping kitchen utensils collected from the internet, served as the base benchmark objects. The data were split into 20% for evaluation and 80% for training/validation. This project evaluated the performance of the two techniques and analyzed their strengths and speeds based on accuracy, precision and F1 score. The analysis concluded that YOLOv5 produces accurate bounding boxes, whereas Faster R-CNN detects more objects. In an identical testing environment, YOLOv5 performed better than Faster R-CNN: accuracy was 0.8912 (89.12%) for YOLOv5 and 0.8392 (83.92%) for Faster R-CNN, while the loss value was 0.1852 for YOLOv5 and 0.2166 for Faster R-CNN. This comparison of the two methods is current and has not previously been applied to overlapping objects, especially kitchen utensils.
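As a rough illustration of the 80%/20% split and of the precision and F1 metrics named in the abstract, the sketch below uses plain Python. It is not the authors' code: the function names `split_dataset` and `precision_recall_f1`, the seeded random shuffle, the image identifiers, and the example counts are all assumptions made for illustration, since the abstract does not state how the split was drawn or how detections were matched to ground truth.

```python
import random


def split_dataset(items, eval_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and return (train_val, evaluation).

    Hypothetical helper: a seeded random split is assumed here; the
    abstract only gives the 80%/20% proportions.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from detection counts.

    tp: detections matched to a ground-truth object
    fp: detections with no matching ground-truth object
    fn: ground-truth objects that were not detected
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up image identifiers, used only to show the split sizes.
    image_ids = [f"img_{i:03d}" for i in range(100)]
    train_val, evaluation = split_dataset(image_ids)
    print(len(train_val), len(evaluation))

    # Made-up counts, not results from the paper.
    print(precision_recall_f1(tp=80, fp=10, fn=20))
```

For object detectors, whether a predicted box counts as a true positive normally depends on a matching rule such as an overlap threshold against the ground-truth box; that rule is left out of this sketch because the abstract does not specify it.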



