Object detection is a long-standing application of computer vision and machine learning that detects instances of certain classes of semantic objects in digital images and videos.
What is Object Detection?
Object detection technology usually refers to detecting the position and corresponding category of objects in an image. It is a fundamental task on which image segmentation, object tracking, key point detection, and other tasks build. Object detection is a computer technology related to computer vision and image processing. Well-studied sub-areas include face detection and pedestrian detection. Object detection has applications in many fields, including image retrieval and video surveillance.
In object recognition, the task is to distinguish which objects appear in a picture: the input is the picture, and the output is a category label with a probability. An object detection algorithm must not only determine what objects are in the picture but also output an outer frame (x, y, width, height) that locates each object.
Object detection must accurately find the location of each object in a picture and label its category. This is challenging because object sizes vary widely, the angle and pose of an object are uncertain, an object can appear anywhere in the picture, and multiple categories can be present at once.
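As a concrete picture of what a detector returns, here is a hypothetical result for one image; the labels, scores, and coordinates are made up for illustration, with each outer frame given as (x, y, width, height):

```python
# Hypothetical output of a detector for one image: each detection
# carries a category label, a confidence score, and an outer frame.
detections = [
    {"label": "person", "score": 0.92, "box": (34, 50, 120, 260)},
    {"label": "dog",    "score": 0.81, "box": (210, 180, 95, 70)},
]

for det in detections:
    x, y, w, h = det["box"]
    print(f"{det['label']} ({det['score']:.2f}) at x={x}, y={y}, w={w}, h={h}")
```

An image classifier, by contrast, would return only a single label/probability pair for the whole picture.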
The Difference Between Image Classification, Object Detection, and Image Segmentation:
- Image classification: The input image often contains only one object, and the goal is to determine what object each image shows. It is an image-level task, relatively simple and the fastest-developing.
- Object detection: There are often many objects in the input image, and the goal is to determine the location and category of each object; this is a core task in computer vision.
- Image segmentation: The input is similar to object detection, but each pixel must be assigned a category, making it a pixel-level classification task. Image segmentation and object detection are closely related, and models for the two tasks can borrow from each other.
The Difference Between Traditional and Deep Learning Object Detection:
- Traditional object detection: Traditional object detection, developed before deep learning, is usually divided into three stages: region selection, feature extraction, and feature classification.
- Region selection: Select positions in the image where an object may appear. Since the position and size of the object are not fixed, traditional algorithms usually use a sliding-window approach, which produces many redundant candidate frames and has high computational complexity.
- Feature extraction: Once candidate positions are obtained, a manually designed extractor is usually applied. The quality of the extracted features is limited: a hand-designed extractor has few parameters and poor robustness.
- Feature classification: Classify the features obtained in the previous step, usually with classifiers such as SVM or AdaBoost.
- Object detection with deep learning: The large number of parameters in a deep neural network allows it to extract features that are more robust and more semantic, and the learned classifier performs better.
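The region-selection step above can be sketched as a plain sliding window. This is a minimal single-scale sketch (real pipelines also iterate over window scales), and it makes the redundancy concrete: even one window size over a modest image yields thousands of candidate frames.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield candidate regions (x, y, win_w, win_h) over the image.

    Classic region selection: every position is tried (and, in practice,
    every scale), which is why traditional pipelines produce many
    redundant candidate frames at high computational cost.
    """
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)

# A 640x480 image with a single 64x128 window at stride 8:
windows = list(sliding_windows(640, 480, 64, 128, 8))
print(len(windows))  # 3285 candidates, before any scale variation
```

Each of these windows would then be passed through the hand-designed feature extractor and the classifier, which is where most of the computation goes.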
The Difference Between Object Detection and Other Computer Vision Problems:
The difference between object detection and image recognition/classification is that classification only assigns a label to the whole image, while object detection comprises two tasks: localization and classification. In the real world, object detection is more widely applicable, because real photos are complex and may contain multiple target objects; a classification task can only identify the most prominent one, whereas object detection can identify several.
Going a step further from object detection, it is often desirable not only to find objects in an image but also to find the pixel mask of each detected object, a problem called Instance Segmentation.
Performance Indicators for Object Detection:
- Intersection over Union (IoU): IoU measures the overlap between the candidate box predicted by the model and the ground-truth box, and thus the localization accuracy of the prediction. It is a value between 0 and 1. In practice a threshold is set, and candidate boxes that do not reach the IoU threshold are discarded. The most common threshold is 0.5: if IoU > 0.5, the prediction is considered a true detection; otherwise it is considered a false detection.
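The overlap measure can be computed directly from box coordinates; a minimal sketch, assuming boxes are given as (x_min, y_min, x_max, y_max) corners:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    # Corners of the intersection rectangle (empty if they cross over).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# 50 px^2 of overlap over a 150 px^2 union -> IoU of 1/3,
# below the usual 0.5 threshold, so this would be a false detection.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Note the boxes here use corner coordinates; a detector that outputs (x, y, width, height) frames needs a one-line conversion first.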
- mean Average Precision (mAP): For each class, average precision (AP) summarizes the detector's precision-recall curve. mAP is the mean of the per-class APs: the sum of AP over all classes divided by the number of classes. Taking this mean prevents a detector that is strong on some classes and weak on others from appearing uniformly good.
mAP is usually computed at a fixed IoU threshold, but a detector can then score well while emitting many loosely localized candidate boxes. Averaging mAP over a range of IoU thresholds penalizes large numbers of poorly localized or misclassified boxes.
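The definitions above can be sketched in Python. This is a minimal, non-interpolated AP computation, assuming each detection has already been matched against ground truth at a fixed IoU threshold; benchmarks such as PASCAL VOC and COCO use interpolated variants and stricter matching rules.

```python
def average_precision(matches, num_gt):
    """AP for one class.

    `matches` lists True/False flags for the class's detections, sorted
    by descending confidence (True = matched a ground-truth box at the
    chosen IoU threshold); `num_gt` is the number of ground-truth boxes.
    Integrates precision over recall with rectangles (no interpolation).
    """
    true_positives = 0
    ap = 0.0
    prev_recall = 0.0
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            true_positives += 1
            recall = true_positives / num_gt
            precision = true_positives / rank
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap

def mean_average_precision(ap_per_class):
    """mAP: the mean of per-class APs, so strength on one class
    cannot hide weakness on another."""
    return sum(ap_per_class) / len(ap_per_class)

# Two correct detections with one false positive between them,
# against two ground-truth boxes:
ap = average_precision([True, False, True], num_gt=2)
print(round(ap, 3))  # 0.833
```

To average over IoU thresholds, one would recompute the `matches` flags at each threshold and take the mean of the resulting mAP values.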
Main Algorithms of Object Detection:
- Traditional object detection algorithms: Cascade + HOG/DPM + Haar/SVM, and many improvements and optimizations of these methods.
- Deep learning algorithms: Object detection algorithms based on deep learning models can be divided into two categories:
- Two-stage detection algorithms: Divide the detection problem into two stages: first generate region proposals, then classify the candidate regions (generally also refining their positions). Typical representatives are the region-proposal-based R-CNN family, such as R-CNN, Fast R-CNN, and Faster R-CNN.
- One-stage detection algorithms: Skip the region-proposal stage and directly generate the class probabilities and position coordinates of objects; typical algorithms include YOLO and SSD.
The key performance indicators of an object detection model are detection accuracy and speed. For accuracy, object detection must account for the localization accuracy of each object, not just classification accuracy. In general, two-stage algorithms have an advantage in accuracy, while one-stage algorithms have an advantage in speed. However, as research develops, both types of algorithms have improved on both fronts.