Intro
OK, to all the baristas reading this: you don't need to worry about your job being taken by a robot barista just yet. However, the power of the law of accelerating returns is not to be underestimated. About a year and a half ago, Bosch unveiled at CES a robotic system that would make your coffee using a fully automatic machine and print your name on the cup. At this year's CES, a new robot from Beijing showed that it can make a cappuccino using a traditional espresso machine with a steam wand completely on its own (save for the cleaning). In this article you will see the remarkable results you can achieve by retraining Google's convolutional neural network, Inception, on just over 1000 coffee and tea images.
Methods
The images I used come from five drink categories on ImageNet: hot chocolate, drip coffee, cappuccino, tea and Turkish coffee. I had to weed out some broken files and, surprisingly, some images that clearly did not fit their category. I could have also downloaded latte pictures, but that data set was filled with pictures that, in my judgement, could just as well have passed as cappuccino. The distinction between the two is somewhat arbitrary, as they are both essentially just espresso and steamed milk.
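The broken-file part of that cleanup can be partly automated. Below is a minimal sketch, not the procedure actually used here: it assumes the downloads are JPEG files sorted into one folder per category, and it only checks that each file starts and ends with the standard JPEG markers. The function names and the `coffee_images` folder are hypothetical. Images that are valid but mislabeled still need a human eye.

```python
from pathlib import Path

JPEG_START = b"\xff\xd8"  # start-of-image marker
JPEG_END = b"\xff\xd9"    # end-of-image marker


def looks_like_complete_jpeg(path):
    """Heuristic: True if the file begins and ends with the JPEG markers."""
    data = Path(path).read_bytes()
    return len(data) > 4 and data.startswith(JPEG_START) and data.endswith(JPEG_END)


def find_broken_images(root):
    """Return .jpg/.jpeg files under root that fail the marker check."""
    broken = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() in {".jpg", ".jpeg"} and not looks_like_complete_jpeg(path):
            broken.append(path)
    return broken


# Example usage (hypothetical folder name):
# for path in find_broken_images("coffee_images"):
#     print("suspect file:", path)
```

This catches truncated downloads and HTML error pages saved with a .jpg extension. It can also flag a few files that would still open, since some cameras append extra bytes after the end marker, so treat the output as a list to review rather than a list to delete.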
Results
Here I downloaded some novel images to see how the classifier stacks up. I expected the network to struggle to distinguish between tea and drip coffee but it did surprisingly well:
Final test accuracy = 77.4% (N=106)
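With only 106 test images, that figure carries a fair amount of uncertainty. As a rough sketch, assuming the test images are independent and using the simple normal approximation for a proportion (the helper name `accuracy_interval` is made up for this example), a 95% interval can be estimated like this:

```python
import math


def accuracy_interval(accuracy, n, z=1.96):
    """Normal-approximation confidence interval for an accuracy measured on n samples."""
    standard_error = math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - z * standard_error, accuracy + z * standard_error


low, high = accuracy_interval(0.774, 106)
print(f"95% interval: {low:.1%} to {high:.1%}")
```

Under those assumptions the interval runs from roughly 69% to 85%, so small differences between training runs on a test set this size should not be over-interpreted.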
Discussion
Before training on five drinks, I had also trained Inception on just two categories: images of coffee and a random selection of "not coffee" images from ImageNet. Test accuracy was very high (96.7%). However, when fed non-coffee images of drinks such as tea, hot chocolate or even an empty cup, the network was easily fooled. Perhaps there is some advantage in training two networks independently: one to decide whether there is a drink in the image, and a second to classify what kind of drink it is. But it would be much simpler to add "not coffee" as another category, along with adequate images, to the five-drink network.
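To make the two-network idea concrete, here is a minimal sketch of how such a cascade could be wired together. Nothing here was built for this article: `drink_detector` and `drink_classifier` are hypothetical callables standing in for two separately trained networks, and the 0.5 threshold is an arbitrary assumption.

```python
def classify_two_stage(image, drink_detector, drink_classifier, threshold=0.5):
    """Stage 1 decides whether the image shows a drink; stage 2 names the drink.

    drink_detector(image)   -> probability that the image contains a drink
    drink_classifier(image) -> dict mapping drink label to probability
    """
    p_drink = drink_detector(image)
    if p_drink < threshold:
        # Stop early: the second network never sees non-drink images.
        return "not a drink", 1.0 - p_drink
    scores = drink_classifier(image)
    label = max(scores, key=scores.get)
    return label, scores[label]


# Stand-in models so the sketch runs without any trained network.
def fake_detector(image):
    return 0.9 if image == "cup_photo" else 0.1


def fake_classifier(image):
    return {"cappuccino": 0.7, "tea": 0.2, "drip coffee": 0.1}


print(classify_two_stage("cup_photo", fake_detector, fake_classifier))
print(classify_two_stage("cat_photo", fake_detector, fake_classifier))
```

The trade-off is that mistakes in the first stage are final: a drink rejected by the detector is never classified. The single-network alternative with an extra "not coffee" category avoids that hand-off.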
Conclusion
If a robot barista is going to make specialty coffee and talk about coffee with its customers, it should at least be able to look at a drink and tell what it is. There is still work to be done on this subject, but we are making remarkable progress.