Google AI’s New Object Detection Competition

Just a few days ago, Google AI launched an object detection competition on Kaggle called the Open Images Challenge. It’s great to see, since the computer vision community hasn’t had a new competition of this scale in a while.

For several years, ImageNet was the “gold standard” competition in computer vision. Many teams competed every year to get the lowest error rate on the ImageNet dataset. Thanks to deep learning, we’ve recently seen massive advances in image recognition, with top models reported to surpass human-level accuracy on this benchmark.

ImageNet was a huge competition, with 1000 different classes and 1.2 million training images! The sheer scale of the data is really what made ImageNet so challenging. Beyond learning how to classify images very well, one of the most important things to come out of such large-scale competitions is the feature extractors that we can reuse for other tasks. Feature extraction networks pre-trained on ImageNet are used in many other computer vision tasks, including object detection, segmentation, and tracking.

In addition, the general style or design of the network is often adopted for these other tasks. For example, shortcut connections were popularized by the 2015 winning ImageNet entry (ResNet), and have since been used in a large share of the CNNs in computer vision! It’s a great thing when we can work on one simpler task and see the results carry over to more complex but related tasks.
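To make the shortcut idea concrete, here is a minimal sketch in plain Python. It is a toy illustration, not the actual ResNet code: the names `linear`, `relu`, and `residual_block` are made up for this example, and small fully connected layers stand in for the convolutions a real network would use. The one thing it is meant to show is that the block's input is added back to its output.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU: negative values are clamped to zero."""
    return [max(0.0, x) for x in v]


def linear(weights: Matrix, v: Vector) -> Vector:
    """Matrix-vector product; stands in for a convolution in this toy example."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Compute relu(F(x) + x), where F(x) = W2 * relu(W1 * x).

    The "+ x" term is the shortcut connection: the input skips past
    the two layers and is added to their output.
    """
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With all-zero weights F(x) is zero, so the block just passes x through.
    print(residual_block([1.0, 2.0], zeros, zeros))
```

With all weights at zero, the block simply passes a non-negative input through unchanged. That is a big part of why shortcuts help: a layer that has nothing useful to add can fall back to the identity instead of degrading the signal.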

Google AI’s new object detection competition, hosted on Kaggle, is a step in that positive direction. Thus far, the COCO detection challenge has been the big one for object detection. But in comparison to ImageNet, it’s quite small: COCO only has 80 categories and 330K images. That’s not nearly as complex as what you would see in the real world, and many practitioners find object detection in the wild to be extremely challenging. ImageNet, at least, had enough images and enough classes that it became very useful for pre-training networks and then using them for transfer learning. Perhaps with a large enough dataset, our object detectors can become just as useful for transfer learning.