So..
Jump starting the (inevitable) subversion of iPhone X security then?
With the Internet already groaning under the weight of d!ck pics and facial scanning increasingly used instead of passwords, do we really want AI to turn flat images into accurate 3D renderings of their content? A group of boffins from the University of Nottingham and Kingston University think so, and alarmingly so do 400,000 …
When the X was introduced I saw an article quoting a guy who works with 3D facial recognition in the high-end security market. He said the iPhone X appeared to have the necessary sensors to tell the difference not only between a perfectly printed 3D mask and a real face, but between a living face and the head of a corpse. It's not just the eyes that can help tell the difference, but also the translucence of living skin and areas of different temperature due to blood flow, etc.
He said that the software to do this is very complex, so he doubted they'd be able to do it from day one, but that Apple could improve it over time. So by the time someone is able to use this software to create a perfect 3D mask with pig eyes or something, it may not fool Face ID (or if it does, it won't fool it for long).
There's also the fact that computational needs go up sharply with resolution, and from what I've read so far, this is something Apple has addressed quite comprehensively. I just wish they hadn't done this Animoji rubbish; maybe it's just me, but it actually took away from the achievement.
Front views of faces show that the two sides are usually asymmetric - one side thinner than the other. Using a mirror image of each half produces two different faces.
Does this process manage to find reasonable clues about this in a particular face? It seems unlikely if they are working on just a profile rather than a three-quarter angle.
You mean above the neck.
I'd have used lighting / highlight / shadow cues for a particular face, coupled to a database of ethnic head shapes keyed to face outline (chin shape), eye shape, etc. Morph the best-match database head to fit the positions of the cheeks, eyes, mouth, nose, face outline, etc.
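A minimal sketch of that morph step, assuming the database head is represented as a set of 2D landmarks (all names and coordinates here are invented for illustration):

```python
# Hypothetical morph step: nudge each landmark of the best-match
# database head part-way towards the landmark detected in the photo.
template = {"chin": (0.0, -1.0), "nose": (0.0, 0.0), "left_eye": (-0.3, 0.4)}
detected = {"chin": (0.05, -1.1), "nose": (0.0, 0.02), "left_eye": (-0.28, 0.38)}

def morph(template, detected, weight=0.8):
    """Move each template landmark `weight` of the way to its detected position."""
    morphed = {}
    for name, (tx, ty) in template.items():
        dx, dy = detected[name]
        morphed[name] = (tx + weight * (dx - tx), ty + weight * (dy - ty))
    return morphed

print(morph(template, detected)["chin"])
```

In practice the template would be a full 3D mesh and the warp something like a thin-plate spline, but the blend-towards-detected-landmarks idea is the same.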
I don't see why ML or CNN is needed.
Why doesn't it do the hair line / hair interface better?
> I don't see why ML or CNN is needed
Because there isn't enough information in one photograph to create a 3D mesh using the traditional software methods (which calculate the depth of features from points on multiple photographs). The ML is required because, to create a convincing 3D mesh from an image, the software must make assumptions about the way that people's faces work (e.g. are dark pixels shadow, or areas of darker skin pigmentation?), assumptions gleaned from prior 'knowledge' of human faces.
Mage is kinda right. You can run a lighting model backwards to infer the geometry of a scene without CNN special sauce. That's pretty much astronomy, but people have been doing it with photos of daily life.
I think ML is just, in essence, deriving a more optimal model. Instead of a naive Phong model it derives a complicated set of cases. Again, look at astronomy: if a particular patch of light with a particular colour flashes in a particular way, then we know its distance. If the patches of light around it are of a particular form, then they will be about the same distance away, and so on.
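The "run the lighting model backwards" idea can be sketched with a toy Lambertian shader (even simpler than Phong); every number here is invented, and the point is only that the inversion needs an albedo assumption:

```python
import math

def lambert_intensity(albedo, cos_angle):
    """Forward model: Lambertian shading, I = albedo * max(0, n.l)."""
    return albedo * max(0.0, cos_angle)

def infer_cos_angle(intensity, albedo):
    """Backwards: recover n.l (hence surface orientation) from one pixel.
    Only solvable if the albedo is already known - exactly the
    shadow-vs-pigmentation ambiguity mentioned above."""
    return min(1.0, max(0.0, intensity / albedo))

# the same 0.35-brightness pixel under two different albedo assumptions:
steep = infer_cos_angle(0.35, 1.0)   # pale surface, tilted away from the light
facing = infer_cos_angle(0.35, 0.4)  # darker surface, nearly facing the light
print(math.degrees(math.acos(steep)), math.degrees(math.acos(facing)))
```

The ML version effectively learns which albedo assumption is plausible for which kind of pixel, instead of being told.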
> You can run a lighting model backwards to infer the geometry of a scene
It's that very process of *inferring* which requires prior 'knowledge' if the input data is limited. With two photos of a scene for input it is easy for software to distinguish between a flat plane (a billboard advertising a car) and a 3D object (a real car) - it doesn't require prior 'knowledge' of cars, merely of planes. A single photo doesn't give you such a simple path.
When we create a lighting model of a scene we use HDR 24bit images of that scene - basically that means that each pixel can be brighter or darker than what can be printed or shown on a monitor, so that lightbulbs aren't merely white like paper is white, but *bright*. There are systems that attempt to infer limited geometry from an HDR image, but merely as regards the position of light and shadow - easy enough if lightbulb pixels hold values an order of magnitude greater than floor or sofa pixels. The results are often a good enough approximation. Normal web images give a far cruder approximation.
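That order-of-magnitude gap is easy to exploit; a toy sketch (radiance values invented) of picking light-source pixels out of one HDR scanline:

```python
import statistics

# linear radiance values for one scanline; the lightbulb pixels sit far
# above anything a monitor could display at face value
hdr_row = [0.2, 0.3, 0.25, 12.0, 14.5, 0.28, 0.22]

median = statistics.median(hdr_row)
lights = [i for i, v in enumerate(hdr_row) if v > 10 * median]
print(lights)  # indices of probable light-source pixels -> [3, 4]
```

An 8-bit web image clips those values to 255, which is why it gives only the far cruder approximation.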
If this software can do that from one 2D photo, just think how much better an image future software could make from TWO or more 2D photos. Enough to fool, say, an iPhone X with supposedly failsafe ID technology.
That's the thing with the iPhone X - Apple say there's a 1 in a million chance of it being fooled by the wrong face, as opposed to a 1 in 50,000 chance of the fingerprint reader giving a false positive. BUT there's a 100% chance that two of my photos are easily accessible to any member of the public, and only a very small chance that my fingerprint is.
It is hard to know exactly what they mean by 1 in a million. Do they mean there are 7,000 people in the world who look enough like me to unlock my phone? Or does this have to do with how the algorithm works? Obviously there has to be some 'fuzz' built in or if you wake up with puffy eyes you wouldn't look like you anymore and your phone would refuse to unlock.
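A back-of-envelope check of that first reading (the ~7 billion world population is an assumption, as in the comment above):

```python
world_population = 7_000_000_000
false_match_odds = 1_000_000          # Apple's claimed 1-in-a-million rate
lookalikes = world_population // false_match_odds
print(lookalikes)  # -> 7000 people who might unlock your phone
```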
Perhaps the algorithm that converts your face data into a "mathematical representation" effectively works like a perfect hash table with a million entries. In that case the one in a million false positives could be totally random - your second cousin Eddie who everyone says looks exactly like you might not be a match, but some rural Chinese grandma is.
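That analogy can be made concrete with a toy bucketing scheme (entirely hypothetical - nothing like how Face ID actually works), where collisions are random rather than lookalike-driven:

```python
import hashlib

def bucket(face_embedding: bytes, n_buckets: int = 1_000_000) -> int:
    """Map a face 'embedding' to one of a million buckets; a false
    positive is any other face landing in your bucket by chance."""
    digest = hashlib.sha256(face_embedding).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets

# near-identical inputs give unrelated buckets, because the hash
# destroys any notion of 'looks similar'
print(bucket(b"me"), bucket(b"cousin Eddie"))
```

A real matcher almost certainly compares distances in an embedding space rather than hashing, in which case lookalikes *would* be the likely false positives - which is exactly why the two readings differ.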
> If this software can do that from one 2D photo, just think how much better of an image future software could make from TWO or more 2D photos. Enough to fool, say, an iPhone X with supposed failsafe ID technology
Using multiple photos is a very different way to infer depth. Even with two photos, the accuracy of inferred depth would never come close to that of a system projecting its own reference grid.
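A toy triangulation sketch (focal length, baseline and disparity all invented numbers) shows why: two-camera depth comes from Z = f·B/d, so a single pixel of matching error shifts the result long before structured-light accuracy is reached:

```python
focal_px = 1600.0    # focal length in pixels (assumed)
baseline_m = 0.0625  # distance between the two camera centres, metres (assumed)

def depth_from_disparity(disparity_px):
    """Classic two-view triangulation: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

print(depth_from_disparity(200.0))  # 0.5 m
print(depth_from_disparity(199.0))  # one pixel of matching error: ~0.503 m
```

A projected reference grid sidesteps the matching problem entirely, because the sensor knows exactly which dot it is looking at.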
Also, say you did have a Weta-quality virtual head stored digitally - how do you create the hardcopy required to fool the phone's sensors? Some sort of lightfield projector? Isn't it just easier to threaten the owner with a wrench?