A Virtual Robot Barista is Learning to See what You Are Drinking

Intro

OK, all you baristas reading: you don't need to worry about being replaced by a robot barista just yet. However, the power of the law of accelerating returns is not to be underestimated. Just about a year and a half ago, Bosch unveiled at CES a robotic system that would make your coffee using a fully automatic machine and print your name on the cup. At this year's CES, a new robot from Beijing showed that it can make a cappuccino using a traditional espresso machine with a steam wand completely on its own (save for the cleaning). In this article you will see the remarkable results you can achieve by retraining Google's convolutional neural network, Inception, on just over 1000 coffee and tea images.

Methods

The images I used come from five drink categories on ImageNet: hot chocolate, drip coffee, cappuccino, tea and Turkish coffee. I had to weed out some broken files and, surprisingly, images that clearly did not fit their category. I could also have downloaded latte pictures, but that data set was filled with pictures that, in my judgement, could just as well have passed as cappuccino. The distinction between the two is somewhat arbitrary, as both are essentially just espresso and steamed milk.
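Weeding out broken files can be partly automated. The snippet below is a minimal sketch, not the script I actually used: it assumes the downloads are JPEGs stored in one subfolder per category, and the function names and the `drink_photos` folder are made up for illustration. It only flags files that do not start and end with the JPEG markers, so images that are intact but sit in the wrong category still have to be reviewed by eye.

```python
from pathlib import Path

JPEG_START = b"\xff\xd8"  # start-of-image marker
JPEG_END = b"\xff\xd9"    # end-of-image marker


def looks_like_complete_jpeg(path):
    """Coarse check: the file begins and ends with the JPEG markers.

    Truncated downloads and HTML error pages saved as .jpg fail this test.
    A valid JPEG with extra trailing bytes would also be flagged, so treat
    the result as a list of files to inspect, not files to delete blindly.
    """
    data = Path(path).read_bytes()
    return len(data) > 4 and data.startswith(JPEG_START) and data.endswith(JPEG_END)


def find_suspect_images(image_dir):
    """Map each category folder name to a sorted list of suspect file names."""
    suspects = {}
    categories = sorted(p for p in Path(image_dir).iterdir() if p.is_dir())
    for category in categories:
        bad = sorted(
            f.name
            for f in category.iterdir()
            if f.is_file() and not looks_like_complete_jpeg(f)
        )
        if bad:
            suspects[category.name] = bad
    return suspects


if __name__ == "__main__":
    # "drink_photos" is a hypothetical folder name (assumption).
    if Path("drink_photos").is_dir():
        for category, files in find_suspect_images("drink_photos").items():
            print(category, files)
```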

Results

To see how the classifier stacks up, I downloaded some novel images. I expected the network to struggle to distinguish between tea and drip coffee, but it did surprisingly well on both; the misses in the runs below are all hot chocolate or Turkish coffee images:

  
Final test accuracy = 77.4% (N=106)

	`python label_image.py five_drinks/cappucino_test.jpg`
	cappuccino (score = 0.74192)
	hot chocolate (score = 0.15428)
	tea (score = 0.06182)
	turkish coffee (score = 0.02311)
	drip coffee (score = 0.01887)

	`python label_image.py five_drinks/drip_coffee_test.jpg`
	drip coffee (score = 0.60740)
	turkish coffee (score = 0.19728)
	tea (score = 0.11440)
	hot chocolate (score = 0.07358)
	cappuccino (score = 0.00735)

	`python label_image.py five_drinks/drip_coffee_test2.jpg`
	drip coffee (score = 0.58516)
	tea (score = 0.31279)
	hot chocolate (score = 0.04909)
	turkish coffee (score = 0.03188)
	cappuccino (score = 0.02107)

	`python label_image.py five_drinks/hot_chocolate_test.jpg`
	hot chocolate (score = 0.64988)
	tea (score = 0.27281)
	cappuccino (score = 0.04756)
	drip coffee (score = 0.02354)
	turkish coffee (score = 0.00621)

	`python label_image.py five_drinks/hot_chocolate_test2.jpg`
	drip coffee (score = 0.50001)
	hot chocolate (score = 0.37443)
	turkish coffee (score = 0.06313)
	tea (score = 0.04177)
	cappuccino (score = 0.02066)

	`python label_image.py five_drinks/hot_chocolate_test3.jpg`
	cappuccino (score = 0.53695)
	hot chocolate (score = 0.30773)
	drip coffee (score = 0.07558)
	tea (score = 0.04336)
	turkish coffee (score = 0.03638)

	`python label_image.py five_drinks/tea_test.jpg`
	tea (score = 0.71377)
	hot chocolate (score = 0.21653)
	turkish coffee (score = 0.03549)
	drip coffee (score = 0.03029)
	cappuccino (score = 0.00393)

	`python label_image.py five_drinks/tea_test2.jpg`
	tea (score = 0.87573)
	hot chocolate (score = 0.08557)
	drip coffee (score = 0.02440)
	turkish coffee (score = 0.01111)
	cappuccino (score = 0.00318)

	`python label_image.py five_drinks/tea_test3.jpg`
	tea (score = 0.73230)
	drip coffee (score = 0.13145)
	turkish coffee (score = 0.08367)
	hot chocolate (score = 0.03333)
	cappuccino (score = 0.01925)

	`python label_image.py five_drinks/turkish_coffee_test.jpg`
	hot chocolate (score = 0.52486)
	turkish coffee (score = 0.19466)
	tea (score = 0.14934)
	cappuccino (score = 0.06875)
	drip coffee (score = 0.06238)

	`python label_image.py five_drinks/turkish_coffee_test2.jpg`
	turkish coffee (score = 0.89664)
	cappuccino (score = 0.05632)
	hot chocolate (score = 0.02193)
	tea (score = 0.01747)
	drip coffee (score = 0.00763)

	`python label_image.py five_drinks/turkish_coffee_test3.jpg`
	hot chocolate (score = 0.42020)
	turkish coffee (score = 0.26819)
	drip coffee (score = 0.14916)
	tea (score = 0.08913)
	cappuccino (score = 0.07332)
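The twelve runs above are easier to compare once they are tallied. The sketch below copies the top-scoring label for each image from that output and treats the file name as the true category; the helper names are my own, and the alias entry exists only because the cappuccino test file is spelled with one "c". Twelve images is far too few to draw firm conclusions, so read the tally as a quick sanity check next to the 77.4% test accuracy.

```python
# Top-1 prediction for each novel image, copied from the runs above.
TOP1 = {
    "cappucino_test.jpg": "cappuccino",
    "drip_coffee_test.jpg": "drip coffee",
    "drip_coffee_test2.jpg": "drip coffee",
    "hot_chocolate_test.jpg": "hot chocolate",
    "hot_chocolate_test2.jpg": "drip coffee",
    "hot_chocolate_test3.jpg": "cappuccino",
    "tea_test.jpg": "tea",
    "tea_test2.jpg": "tea",
    "tea_test3.jpg": "tea",
    "turkish_coffee_test.jpg": "hot chocolate",
    "turkish_coffee_test2.jpg": "turkish coffee",
    "turkish_coffee_test3.jpg": "hot chocolate",
}

# The cappuccino test file is spelled with a single "c" on disk.
ALIASES = {"cappucino": "cappuccino"}


def true_label(filename):
    """Derive the intended category from a name like 'hot_chocolate_test2.jpg'."""
    stem = filename.split("_test")[0]
    label = stem.replace("_", " ")
    return ALIASES.get(label, label)


def tally(top1):
    """Return {label: (hits, total)} over all test images."""
    per_class = {}
    for filename, predicted in top1.items():
        label = true_label(filename)
        hits, total = per_class.get(label, (0, 0))
        per_class[label] = (hits + (predicted == label), total + 1)
    return per_class


per_class = tally(TOP1)
correct = sum(hits for hits, _ in per_class.values())
total = sum(n for _, n in per_class.values())

print(f"top-1 on novel images: {correct}/{total}")
for label, (hits, n) in per_class.items():
    print(f"  {label}: {hits}/{n}")
```

The network gets 8 of the 12 images right. Tea and drip coffee are all correct, while two of the three hot chocolate images and two of the three Turkish coffee images are mislabeled.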

Discussion

Before training on five drinks, I had also trained Inception on just two categories: images of coffee and a random selection of "not coffee" images from ImageNet. The test accuracy was very high (96.7%). However, when fed images of other drinks such as tea or hot chocolate, or even an empty cup, the network was easily fooled. Perhaps there is some advantage in training two networks independently: one to detect whether there is a drink in the image, and a second that subsequently classifies what kind of drink it is. But it would be much simpler to add "not coffee" as another category, along with adequate images, to the five drinks network.
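The two-network idea amounts to a simple cascade. The sketch below is only an illustration of that control flow: I did not build it, and `is_drink`, `drink_scores`, the stub models and the 0.5 threshold are all hypothetical stand-ins for whatever the two trained networks would return.

```python
def classify_drink(image, is_drink, drink_scores, threshold=0.5):
    """Two-stage cascade: gate on 'is there a drink?', then classify it.

    is_drink(image)     -> probability that the image shows a drink
    drink_scores(image) -> dict mapping drink label to score
    Both callables are placeholders for trained networks (assumption).
    """
    p_drink = is_drink(image)
    if p_drink < threshold:
        # The second network is never consulted for rejected images.
        return "not a drink", p_drink
    scores = drink_scores(image)
    label = max(scores, key=scores.get)
    return label, scores[label]


# Stub models so the example runs without any trained network.
def fake_gate(image):
    return 0.9 if image == "latte_art.jpg" else 0.1


def fake_scores(image):
    return {
        "cappuccino": 0.74,
        "hot chocolate": 0.15,
        "tea": 0.06,
        "turkish coffee": 0.03,
        "drip coffee": 0.02,
    }


print(classify_drink("latte_art.jpg", fake_gate, fake_scores))
print(classify_drink("stapler.jpg", fake_gate, fake_scores))
```

The simpler alternative, a sixth "not coffee" category, needs no gate at all: the rejection is just another label in the same set of scores, at the cost of having to collect enough varied negative images.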

Conclusion

If a robot barista is going to make specialty coffee and talk about coffee with its customers, it should definitely be able to look at a drink and tell what it is. There is still work to be done on this subject, but we are making remarkable progress.