A Virtual Robot Barista is Learning to See what You Are Drinking

Re-training TensorFlow's Inception to classify coffee and tea images

Posted by Nicholas Schmidt on August 11, 2017

Intro

  OK, all you baristas reading this: you don't need to worry about being replaced by a robot barista just yet. However, the power of the law of accelerating returns is not to be underestimated. Just about a year and a half ago at CES, Bosch unveiled a robotic system that would make your coffee using a fully automatic machine and print your name on the cup. At this year's CES, a new robot from Beijing showed that it can make a cappuccino on a traditional espresso machine with a steam wand completely on its own (save the cleaning). In this article you will see what remarkable results you can achieve by retraining Google's convolutional neural network, Inception, on just over 1,000 coffee and tea images.

Methods

  The images I used come from five drink categories on ImageNet: hot chocolate, drip coffee, cappuccino, tea and Turkish coffee. I had to weed out some broken files and, surprisingly, some images that clearly did not fit their category. I could also have downloaded latte pictures, but in my judgement that data set was filled with pictures that could just as well have passed as cappuccino. The distinction between the two is somewhat arbitrary, as both are essentially just espresso and steamed milk.
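
  The broken-file cleanup can be partly automated. The sketch below is not the procedure used for this post; it is a cheap heuristic that assumes the downloads are saved as .jpg files and flags any file that does not begin and end with the standard JPEG start and end markers. The function names are made up for illustration.

```python
from pathlib import Path

JPEG_START = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_END = b"\xff\xd9"    # JPEG end-of-image marker


def looks_like_complete_jpeg(data: bytes) -> bool:
    """Return True if the bytes begin and end with the JPEG markers.

    This is only a quick heuristic: it catches truncated downloads and
    HTML error pages saved as .jpg, but it does not decode the image.
    """
    return data.startswith(JPEG_START) and data.endswith(JPEG_END)


def find_broken_jpegs(image_dir):
    """Yield paths of .jpg files under image_dir that fail the marker check."""
    for path in sorted(Path(image_dir).rglob("*.jpg")):
        if not looks_like_complete_jpeg(path.read_bytes()):
            yield path
```

  Calling find_broken_jpegs on the folder holding the category subfolders lists the files worth inspecting or deleting. Images that are valid but sit in the wrong category still have to be weeded out by eye.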

Results

  Here I downloaded some novel images to see how the classifier stacks up. I expected the network to struggle to distinguish between tea and drip coffee but it did surprisingly well:

  
Final test accuracy = 77.4% (N=106)
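
  With only 106 test images, the reported accuracy carries a fair amount of uncertainty. As a rough check, the sketch below recovers the number of correctly classified images from the two reported figures and computes a normal-approximation 95% interval. The interval formula is a textbook approximation that I am assuming is adequate here.

```python
import math

n_test = 106               # N from the reported result
reported_accuracy = 0.774  # 77.4%

# 82 is the only whole number of correct images that rounds to 77.4% of 106.
n_correct = round(reported_accuracy * n_test)
p = n_correct / n_test

# Normal-approximation 95% interval for a proportion.
half_width = 1.96 * math.sqrt(p * (1 - p) / n_test)

print(f"{n_correct}/{n_test} correct = {p:.1%}")
print(f"approx. 95% interval: {p - half_width:.1%} to {p + half_width:.1%}")
```

  The interval comes out at roughly 69% to 85%, so the headline number is best read as "somewhere around three in four".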
  
https://raw.githubusercontent.com/166inter/166inter.github.io/master/test_photos/german_test.jpg
                            
python label_image.py five_drinks/cappucino_test.jpg 
cappuccino (score = 0.74192)
hot chocolate (score = 0.15428)
tea (score = 0.06182)
turkish coffee (score = 0.02311)
drip coffee (score = 0.01887)

python label_image.py five_drinks/drip_coffee_test.jpg 
drip coffee (score = 0.60740)
turkish coffee (score = 0.19728)
tea (score = 0.11440)
hot chocolate (score = 0.07358)
cappuccino (score = 0.00735)

python label_image.py five_drinks/drip_coffee_test2.jpg 
drip coffee (score = 0.58516)
tea (score = 0.31279)
hot chocolate (score = 0.04909)
turkish coffee (score = 0.03188)
cappuccino (score = 0.02107)

python label_image.py five_drinks/hot_chocolate_test.jpg 
hot chocolate (score = 0.64988)
tea (score = 0.27281)
cappuccino (score = 0.04756)
drip coffee (score = 0.02354)
turkish coffee (score = 0.00621)

python label_image.py five_drinks/hot_chocolate_test2.jpg 
drip coffee (score = 0.50001)
hot chocolate (score = 0.37443)
turkish coffee (score = 0.06313)
tea (score = 0.04177)
cappuccino (score = 0.02066)

python label_image.py five_drinks/hot_chocolate_test3.jpg 
cappuccino (score = 0.53695)
hot chocolate (score = 0.30773)
drip coffee (score = 0.07558)
tea (score = 0.04336)
turkish coffee (score = 0.03638)

python label_image.py five_drinks/tea_test.jpg 
tea (score = 0.71377)
hot chocolate (score = 0.21653)
turkish coffee (score = 0.03549)
drip coffee (score = 0.03029)
cappuccino (score = 0.00393)

python label_image.py five_drinks/tea_test2.jpg 
tea (score = 0.87573)
hot chocolate (score = 0.08557)
drip coffee (score = 0.02440)
turkish coffee (score = 0.01111)
cappuccino (score = 0.00318)

python label_image.py five_drinks/tea_test3.jpg 
tea (score = 0.73230)
drip coffee (score = 0.13145)
turkish coffee (score = 0.08367)
hot chocolate (score = 0.03333)
cappuccino (score = 0.01925)

python label_image.py five_drinks/turkish_coffee_test.jpg
hot chocolate (score = 0.52486)
turkish coffee (score = 0.19466)
tea (score = 0.14934)
cappuccino (score = 0.06875)
drip coffee (score = 0.06238)

python label_image.py five_drinks/turkish_coffee_test2.jpg
turkish coffee (score = 0.89664)
cappuccino (score = 0.05632)
hot chocolate (score = 0.02193)
tea (score = 0.01747)
drip coffee (score = 0.00763)

python label_image.py five_drinks/turkish_coffee_test3.jpg
hot chocolate (score = 0.42020)
turkish coffee (score = 0.26819)
drip coffee (score = 0.14916)
tea (score = 0.08913)
cappuccino (score = 0.07332)
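
  To summarize the twelve runs above, the sketch below tallies how often the correct label came out on top and how often it was among the top two. It assumes that each file name states the true drink (for example, hot_chocolate_test2.jpg is a hot chocolate); the predictions are copied from the first two lines of each output.

```python
# (true label, top-1 prediction, top-2 prediction) for the twelve runs above.
# The true label is taken from the test file name, which is an assumption.
RESULTS = [
    ("cappuccino", "cappuccino", "hot chocolate"),
    ("drip coffee", "drip coffee", "turkish coffee"),
    ("drip coffee", "drip coffee", "tea"),
    ("hot chocolate", "hot chocolate", "tea"),
    ("hot chocolate", "drip coffee", "hot chocolate"),
    ("hot chocolate", "cappuccino", "hot chocolate"),
    ("tea", "tea", "hot chocolate"),
    ("tea", "tea", "hot chocolate"),
    ("tea", "tea", "drip coffee"),
    ("turkish coffee", "hot chocolate", "turkish coffee"),
    ("turkish coffee", "turkish coffee", "cappuccino"),
    ("turkish coffee", "hot chocolate", "turkish coffee"),
]


def top_k_hits(results, k):
    """Count rows whose true label appears among the first k predictions."""
    return sum(true_label in predictions[:k] for true_label, *predictions in results)


print(f"top-1: {top_k_hits(RESULTS, 1)}/{len(RESULTS)}")  # top-1: 8/12
print(f"top-2: {top_k_hits(RESULTS, 2)}/{len(RESULTS)}")  # top-2: 12/12
```

  All three tea images and both drip coffee images were classified correctly. The four misses are hot chocolate and Turkish coffee images, and in each of them the correct label was the second guess.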

Discussion

  Before training on five drinks, I had also trained Inception on just two categories: images of coffee and a random selection of "not coffee" images from ImageNet. The test accuracy was very high (96.7%). However, when fed images of other drinks such as tea or hot chocolate, or even an empty cup, the network was easily fooled. Perhaps there is some advantage in training two networks independently: one to identify whether there is a drink in the image, and a second to classify what kind of drink it is. But it would be much simpler to add "not coffee" as another category, along with adequate images, to the five-drinks network.
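
  For illustration, the two-network idea could be wired together roughly as follows. This is a sketch of the control flow only, not something I built: is_drink and drink_type are hypothetical stand-ins for the two trained networks, and the 0.5 threshold is an arbitrary assumption.

```python
from typing import Callable, Dict, Optional

Scores = Dict[str, float]


def classify_drink(
    image_path: str,
    is_drink: Callable[[str], float],     # hypothetical stage 1: P(image shows a drink)
    drink_type: Callable[[str], Scores],  # hypothetical stage 2: score per drink label
    threshold: float = 0.5,               # arbitrary cut-off, chosen for illustration
) -> Optional[str]:
    """Return the most likely drink label, or None if stage 1 rejects the image."""
    if is_drink(image_path) < threshold:
        return None  # not a drink, so stage 2 is never consulted
    scores = drink_type(image_path)
    return max(scores, key=scores.get)
```

  The simpler alternative needs no extra code at all: the extra category just becomes one more label in the output of the single network.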

Conclusion

  If a robot barista is going to make specialty coffee and talk about coffee with its customers, it should definitely be able to look at a drink and tell what it is. There is still work to be done on this subject, but we are making remarkable progress.