Silicon Valley is easily one of my favourite shows. It has impeccable comedic timing and a splendid cast, and it is the perfect caricature of all things tech. One of my favourite scenes from the show is when one of the aspiring entrepreneurs, Jian Yang, develops a seemingly useless app which classifies food as “hot dog” or “not hot dog”.
I can't do the scene justice by describing it, so here is a video.
Warning: The video contains coarse language.
Unsurprisingly, there already exist many versions of SeeFood. No doubt, this is a billion dollar idea. My boyfriend and I decided to take a stab at it to gauge how difficult it is to build a simple binary classifier. You can find our repo here.
Collecting the Data
The first step is to find enough images of hot dogs and “not hot dogs” to teach our classifier what a hot dog looks like.
Using Selenium, we were able to put together a simple script which crawls Google Images for hot dogs. After a couple of hours, we had 2,000 images of hot dogs downloaded.
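For a rough idea of what such a crawler can look like, here is a minimal sketch. It is not the script from our repo: the function names, the number of scrolls and the choice of Chrome are assumptions made for illustration, and Google's markup changes often, so a real crawler needs more care. As written it mostly picks up thumbnail URLs.

```python
from urllib.parse import quote_plus


def search_url(query):
    """Build a Google Images search URL (tbm=isch selects image search)."""
    return "https://www.google.com/search?tbm=isch&q=" + quote_plus(query)


def collect_image_urls(query, scrolls=5):
    """Open the results page, scroll to load more results, return image URLs."""
    # Imported here so search_url() can be used without Selenium installed.
    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(search_url(query))
        for _ in range(scrolls):
            # Scrolling to the bottom triggers loading of further results.
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(1)
        urls = set()
        for img in driver.find_elements(By.TAG_NAME, "img"):
            src = img.get_attribute("src")
            if src and src.startswith("http"):
                urls.add(src)
        return sorted(urls)
    finally:
        driver.quit()
```

Calling collect_image_urls("hot dog") would return a list of URLs that can then be downloaded one by one.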
To create the “not hot dog” class, we sampled a variety of images from ImageNet. Having a large variety of these images is important because it prevents the classifier from making decisions based solely on colour, size or shape.
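One simple way to get that variety is to draw a few images from every class rather than many from a handful. The sketch below assumes the ImageNet images sit in one folder per class. The function name, the folder layout and the exclude parameter are hypothetical, not taken from our repo.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def sample_not_hot_dogs(root, per_class, exclude=(), seed=0):
    """Pick up to `per_class` random images from each class folder under `root`.

    Folders named in `exclude` (for example the hot dog class itself) are skipped.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    picked = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir() or class_dir.name in exclude:
            continue
        images = sorted(
            p for p in class_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
        )
        picked.extend(rng.sample(images, min(per_class, len(images))))
    return picked
```

Sampling per folder keeps any single class from dominating the “not hot dog” set.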
The next step is what Jian Yang was complaining about. We manually looked through both groups to ensure that all images were properly classified. This step was incredibly tedious and made us very hungry.
Transfer Learning with Inception V3
Transfer learning is a machine learning technique which reuses the knowledge a model learned on one task to solve a different, related task. It saves us the time and effort of training our own classifier from scratch and makes up for our lack of data. For this exercise, we used the Inception V3 model pre-trained on ImageNet.
Inception V3 already has a hot dog class, but we decided to fine-tune the model anyway to strengthen the classifier. Unlike the original Inception V3, the new classifier only distinguishes between hot dogs and “not hot dogs”.
To do this, we removed the softmax layer of the pre-trained model (the layer which predicts the class probabilities) and replaced it with a new layer that only predicts the two classes we care about.
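To make the idea concrete, here is a minimal NumPy sketch of training only a new two-class softmax layer on top of fixed features. It is an illustration under stated assumptions, not the code from our repo: the features are meant to stand in for the vector the pre-trained network outputs just before its original softmax layer (2048 values per image for Inception V3), and the function names, learning rate and epoch count are made up for the example.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_head(features, labels, n_classes=2, lr=0.1, epochs=300, seed=0):
    """Fit a new softmax layer on frozen features with plain gradient descent.

    features: (n_images, n_features) array from the frozen pre-trained network
    labels:   (n_images,) integer array, e.g. 0 = not hot dog, 1 = hot dog
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    weights = rng.normal(scale=0.01, size=(d, n_classes))
    bias = np.zeros(n_classes)
    one_hot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        probs = softmax(features @ weights + bias)
        grad = (probs - one_hot) / n  # gradient of mean cross-entropy w.r.t. logits
        weights -= lr * (features.T @ grad)
        bias -= lr * grad.sum(axis=0)
    return weights, bias


def predict(features, weights, bias):
    """Return class probabilities, one row per image."""
    return softmax(features @ weights + bias)
```

Only the new layer's weights and bias change during training; the network that produced the features is left untouched, which is why so little data is needed.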
Testing the Model
This step was tricky. We didn’t want to test the classifier with images that were potentially used to train Inception V3 or with the images we scraped. This means that no hot dog image found on the internet was safe.
It did, however, give us an excuse to buy and eat some hot dogs.
Out of the original images used for testing, Inception V3 classified 9/10 hot dogs correctly, while our classifier classified 10/10 hot dogs correctly. Both classifiers identified the “not hot dog” images with 100% accuracy.