Let's think about how Google might have implemented this.
(1) Google already automatically classifies images, so it is reasonable to assume they would try to leverage / reuse their image classifiers.
(2) Since video is simply a bunch of still images, it is reasonable to assume Google simply takes stills from the video and passes them to its existing (and already trained) image classifiers.
(3) It is pointless to process every single frame of the video: that would be prohibitively expensive, and there usually isn't much change from one frame to the next.
(4) Google probably selects only the key frames (I-frames) for classification. Depending on computation cost, Google may also drop key frames that are similar to other key frames; there is no point classifying the same, or a very similar, image over and over again. Whether that is worthwhile depends on whether image classification is more expensive than comparing two key frames.
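A toy sketch of point (4), under my assumptions above: classify only key frames, and skip any key frame that is nearly identical to the last one kept. The frames here are stand-ins (flat lists of grayscale pixel values); a real pipeline would pull I-frames out of the compressed stream with a tool like ffmpeg, and the difference metric and threshold are placeholders I made up.

```python
def frame_difference(a, b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def select_keyframes(keyframes, threshold=10.0):
    """Keep a key frame only if it differs enough from the last kept one."""
    kept = []
    last = None
    for frame in keyframes:
        if last is None or frame_difference(frame, last) > threshold:
            kept.append(frame)
            last = frame
    return kept

# Three "frames": the second is almost identical to the first, so it is dropped.
f1 = [100] * 16
f2 = [102] * 16   # differs by 2 per pixel -> below threshold, dropped
f3 = [200] * 16   # big change -> kept
print(len(select_keyframes([f1, f2, f3])))  # 2
```

Whether this saves anything depends, as noted, on the comparison being much cheaper than a classifier pass, which a simple pixel difference certainly is.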
(5) It should be obvious that every inserted image is an I-frame, so it WILL be classified.
(6) Google has some algorithm (or neural net) that boils the contents of a video (several thousand or even millions of frames) down to a single classification. Clearly, if you film a walk around a city, you will get cars, buildings, people, trees, etc., yet Google's classifier has to come back with a single answer. That answer is probably weighted by the confidence of the individual frame classifications; cars, buildings, laptops, and plates of food probably come back with a higher classification confidence.
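Point (6) could be as simple as summing confidences per label and taking the winner. This is purely my guess at a minimal version, not anything Google has described:

```python
from collections import defaultdict

def aggregate_by_confidence(frame_labels):
    """frame_labels: list of (label, confidence) pairs, one per classified frame.
    Returns the single label with the highest total confidence."""
    totals = defaultdict(float)
    for label, conf in frame_labels:
        totals[label] += conf
    return max(totals, key=totals.get)

# A city walk: cars dominate, with a stray low-confidence tree.
frames = [("car", 0.9), ("building", 0.8), ("car", 0.85), ("tree", 0.4)]
print(aggregate_by_confidence(frames))  # car
```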
I imagine that, over time, Google will tweak the final classification to give more weight to how long a single classification persists rather than to its confidence (or perhaps some admixture of the two).
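That admixture could be a single blending parameter between duration share and mean confidence. Again, a hypothetical sketch with a made-up weighting, just to make the trade-off concrete:

```python
from collections import defaultdict

def aggregate_mixed(frame_labels, alpha=0.5):
    """Score each label as alpha * duration_share + (1 - alpha) * mean_confidence.
    alpha=1.0 rewards only how long a label persists across frames;
    alpha=0.0 rewards only how confidently it was detected."""
    counts = defaultdict(int)
    conf_sums = defaultdict(float)
    for label, conf in frame_labels:
        counts[label] += 1
        conf_sums[label] += conf
    n = len(frame_labels)

    def score(label):
        duration_share = counts[label] / n
        mean_conf = conf_sums[label] / counts[label]
        return alpha * duration_share + (1 - alpha) * mean_conf

    return max(counts, key=score)

# "tree" appears in most frames but weakly; "car" appears once but strongly.
frames = [("tree", 0.3), ("tree", 0.35), ("tree", 0.3), ("car", 0.95)]
print(aggregate_mixed(frames, alpha=0.9))   # tree (duration wins)
print(aggregate_mixed(frames, alpha=0.1))   # car  (confidence wins)
```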
On the other hand, I could be blowing smoke since I have absolutely no idea how Google is doing this, but this is the way I would approach it.
(A pint, because we could spend hours arguing over how this is or should be implemented - or even if it should be implemented at all.)