We now see that these classifiers are not learning what a "cat" is; rather, they are learning the types of images in which cats appear, in other words, a cat in context. Change the context and the classifier misclassifies.
This has cropped up dramatically in practice. For example, researchers training a neural net to recognize skin cancers discovered they had instead trained a ruler detector: in their training data, images of cancerous lesions almost always included a ruler for scale.
Google's DeepDream experiments likewise showed that a neural net trained to recognize dumbbells considered the beefy arm attached to them to be part of the object.
The "obvious" solution seems to be that the neural nets need to segment images into distinct objects and then classify the objects. This is not a trivial problem.
Indeed, that is the "general vision problem" that has stumped AI researchers since MIT's 1966 Summer Vision Project, one of the great unsolved "hard" problems of computer science.
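To make the pipeline concrete, here is a minimal sketch of segment-then-classify, assuming PyTorch with a recent torchvision: a pretrained Mask R-CNN proposes object regions, and a pretrained ResNet-50 classifies each cropped region in isolation. Both model choices and the input file name "photo.jpg" are illustrative assumptions, not anything the experiments above actually used.

```python
# Sketch: segment an image into candidate objects, then classify each
# object in isolation. Model choices are assumptions, not prescriptions.
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models import resnet50
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

detector = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
classifier = resnet50(weights="DEFAULT").eval()

# Standard ImageNet preprocessing for the classifier.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = to_tensor(Image.open("photo.jpg").convert("RGB"))  # hypothetical input

with torch.no_grad():
    detections = detector([img])[0]  # dict of boxes, labels, scores, masks
    for box, score in zip(detections["boxes"], detections["scores"]):
        if score < 0.8:                      # keep only confident proposals
            continue
        x1, y1, x2, y2 = box.int().tolist()
        if x2 <= x1 or y2 <= y1:             # skip degenerate boxes
            continue
        crop = img[:, y1:y2, x1:x2]          # the object, stripped of context
        logits = classifier(preprocess(crop).unsqueeze(0))
        print(score.item(), logits.argmax(dim=1).item())  # ImageNet class id
```

Note that even this only pushes the problem back a step: the region proposals come from a network that was itself trained on objects photographed in context.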