What is British or German styling and can a machine tell the difference?

Intro

I was walking home the other day, looking at the various cars on the street (German, Japanese, American and so on), and it got me thinking. What is it that I find more visually appealing about German and Japanese cars? Why do I react differently to cars from certain countries? Is there something fundamental about British styling, Italian styling, and so on? Not too long ago, Google open-sourced a pre-trained deep neural network with 40+ layers called Inception. Alongside it there is a tutorial, TensorFlow for Poets, which shows how to retrain the network's last layer so that it can classify your very own image datasets!

Methods

I followed the tutorial and then tried it again with a dataset of about 500 images, roughly a fifth from each of five car origins: Germany, Japan, the United States, Italy and Great Britain. There are Google Chrome extensions that can bulk download Google Images results. I had to clean up what I downloaded and delete some photos that were out of place. The photos are about 10-20 KB each, which makes training faster.
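The tutorial's retraining script takes a folder with one subfolder of images per label, so it is worth checking that the five classes really are roughly balanced after the cleanup. Here is a minimal sketch of such a check. The `cars` directory name and the list of file extensions are assumptions made for illustration, not something taken from my actual setup.

```python
import os
from collections import Counter

# Assumed set of image extensions; adjust to whatever the downloader produced.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_class(image_dir):
    """Return a Counter mapping each subfolder name (label) to its image count."""
    counts = Counter()
    for label in sorted(os.listdir(image_dir)):
        class_dir = os.path.join(image_dir, label)
        if not os.path.isdir(class_dir):
            continue  # skip stray files sitting next to the class folders
        counts[label] = sum(
            1
            for name in os.listdir(class_dir)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        )
    return counts


if __name__ == "__main__":
    image_dir = "cars"  # hypothetical folder with german/, japanese/, ... inside
    if os.path.isdir(image_dir):
        for label, n in count_images_per_class(image_dir).items():
            print(f"{label}: {n} images")
```

If one origin ends up with far fewer photos than the others, the 20% random-guess baseline no longer quite applies, so this is a cheap sanity check before training.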

Results

While training with the flower dataset given in the tutorial yielded around 90% test accuracy, the network achieved just 73.1% on my much smaller dataset of car photos. Keep in mind that with five roughly equal classes, random guessing would average about 20% accuracy. Below are the last lines of the training log, followed by test results for some novel downloaded images, two from each car origin. The network does make some funny predictions, but other predictions are very good:

  
2017-07-22 04:57:55.607929: Step 490: Train accuracy = 76.0%
2017-07-22 04:57:55.608473: Step 490: Cross entropy = 0.764508
2017-07-22 04:57:55.916217: Step 490: Validation accuracy = 58.0% (N=100)
2017-07-22 04:57:58.682838: Step 499: Train accuracy = 74.0%
2017-07-22 04:57:58.683158: Step 499: Cross entropy = 0.778337
2017-07-22 04:57:58.989198: Step 499: Validation accuracy = 60.0% (N=100)
Final test accuracy = 73.1% (N=52)

	`label_image.py german_test.jpg italian (score = 0.25074) german (score = 0.22876) japanese (score = 0.21431) american (score = 0.15670) british (score = 0.14949)`
	`label_image.py japanese_test.jpg american (score = 0.27312) german (score = 0.25250) british (score = 0.21902) italian (score = 0.18549) japanese (score = 0.06987)`
	`python label_image.py british_test.jpg british (score = 0.53528) american (score = 0.20071) german (score = 0.15943) italian (score = 0.08448) japanese (score = 0.02009)`
	`python label_image.py italian_test.jpg italian (score = 0.58346) german (score = 0.19087) american (score = 0.08351) japanese (score = 0.07983) british (score = 0.06233)`
	`python label_image.py american_test.jpg german (score = 0.47028) british (score = 0.25817) american (score = 0.23781) italian (score = 0.02941) japanese (score = 0.00434)`
	`label_image.py german_test_2.jpg german (score = 0.45688) british (score = 0.23318) italian (score = 0.15506) american (score = 0.12687) japanese (score = 0.02800)`
	`label_image.py japanese_test_2.jpg japanese (score = 0.91580) german (score = 0.03405) american (score = 0.01899) italian (score = 0.01780) british (score = 0.01336)`
	`python label_image.py british_test_2.jpg german (score = 0.46645) british (score = 0.23532) italian (score = 0.12662) american (score = 0.08663) japanese (score = 0.08498)`
	`python label_image.py italian_test_2.jpg italian (score = 0.58812) british (score = 0.13577) japanese (score = 0.12287) german (score = 0.12016) american (score = 0.03308)`
	`python label_image.py american_test_2.jpg german (score = 0.48871) british (score = 0.36794) american (score = 0.08926) italian (score = 0.02975) japanese (score = 0.02435)`
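To summarize the ten novel images, the top prediction for each can be tallied against the expected origin. The pairs below are transcribed by hand from the output above. The helper names are my own and are not part of `label_image.py`.

```python
from collections import defaultdict

# (expected origin, top prediction) for each novel image, in the order shown above.
results = [
    ("german", "italian"),
    ("japanese", "american"),
    ("british", "british"),
    ("italian", "italian"),
    ("american", "german"),
    ("german", "german"),
    ("japanese", "japanese"),
    ("british", "german"),
    ("italian", "italian"),
    ("american", "german"),
]


def top1_accuracy(pairs):
    """Fraction of images whose top prediction matches the expected origin."""
    return sum(expected == predicted for expected, predicted in pairs) / len(pairs)


def hits_per_class(pairs):
    """Map each expected origin to [correct, total] counts."""
    tally = defaultdict(lambda: [0, 0])
    for expected, predicted in pairs:
        tally[expected][1] += 1
        if expected == predicted:
            tally[expected][0] += 1
    return dict(tally)


print(f"Top-1 accuracy on novel images: {top1_accuracy(results):.0%}")
for origin, (correct, total) in hits_per_class(results).items():
    print(f"{origin}: {correct}/{total}")
```

That works out to 5 of the 10 images: both Italian cars are right, both American cars are wrong (each was labelled German), and the other three origins get one of two. Ten images is far too few to draw conclusions from, but the tally does show the network leaning towards "german" when it is unsure.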

Discussion

This neural network is not going to outsmart a car fanatic quite yet. However, for a network trained on around 500 images with all hyperparameters left untouched, it is quite impressive that its test accuracy is more than 3x what random guessing would achieve. Having cars from a very large time span probably added to the network's difficulty in making predictions. Design languages are not mutually exclusive from origin to origin, so it can be difficult even for a car fanatic to make some distinctions. For example, Kia Motors hired a German designer a few years back to design the Kia Optima, and it naturally had some German design cues.
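The comparison with random guessing can be made concrete with a little arithmetic. This is a back-of-the-envelope sketch, assuming the five classes are equally likely and using the usual normal approximation for the standard error of a proportion. It is a rough estimate, not a rigorous significance test.

```python
import math

num_classes = 5        # German, Japanese, American, Italian, British
test_accuracy = 0.731  # final test accuracy reported by the retraining run
test_size = 52         # N reported alongside the final test accuracy

# Expected accuracy of uniform random guessing over equally likely classes.
chance = 1 / num_classes

# How many times better than chance the network did on the test set.
ratio = test_accuracy / chance

# Approximate number of correctly classified test images.
approx_correct = round(test_accuracy * test_size)

# Standard error of a proportion estimated from test_size samples.
std_error = math.sqrt(test_accuracy * (1 - test_accuracy) / test_size)

print(f"Chance level: {chance:.0%}")
print(f"Improvement over chance: {ratio:.2f}x")
print(f"Roughly {approx_correct} of {test_size} test images correct")
print(f"Standard error: about {std_error:.1%}")
```

With only 52 test images the standard error comes out at roughly 6 percentage points, so the 73.1% figure is a coarse estimate, though it still sits far above the 20% chance level.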

Conclusion

There does seem to be a correlation between the country of origin of various cars and their respective design cues. The network could not simply read the badges on the cars, so it had to recognize actual design features. The network did not do a very good job, but there is plenty of room for improvement.

Future work

Since neural networks generally work better with more data, adding more training examples would likely yield better results. The hyperparameters could be tweaked. We could see how a simpler convolutional neural network with fewer layers would fare (this would require going beyond simply retraining Inception and creating our own network). The dataset could also be refined and cleaned to include only cars from a specified era to see if this would give better results.