Google, YouTube, Twitter tell face-rec upstart Clearview to stop harvesting people's content – that's their job

Google, YouTube, and Twitter have sent cease-and-desist demands to Clearview, ordering the controversial startup to stop scraping people's photos from their websites to train its facial-recognition software. Last month, it emerged Clearview built a tool to match people's faces to their internet identities: you can show its …

  1. JohnFen Silver badge

    Well, that settles that

    > the startup’s founder and CEO Hoan Ton-That believed his New-York-City-based biz has a “First Amendment right to public information.”

    It's rare that we get such a strong and overt indication that a company is just straight-up evil.

  2. IGotOut

    Ok...

    So how exactly are they scraping and matching the images?

    With all this bullshit AI Google et al keep spouting, they seem unable to spot a bot trawling millions and millions of profiles? I don't buy it at all.

    1. diodesign (Written by Reg staff) Silver badge

      Re: Ok...

      It's fairly trivial facial recognition - it's the scale and the source of the training data that's causing people to kick off.

      Here's how I'd do it. You take 3 billion pairs of images scraped from online profiles and URLs to those profiles. You train a convolutional neural network – or a series of networks – to map images to their source profiles. To make life easier, assign each profile an ID number. Thus a particular face will map to ID 1000, another to ID 1001, and so on.

      So when you show it a face, it predicts what the correct profile ID should be. Thus if you show it an image it hasn't seen before, it will try to map it to the closest matching face and its profile ID. You then turn that ID number into a profile. You now have a suggested identity for that input face.

      The neural network can output profile ID numbers with a confidence value, so a face could return ID 1001 90% confidence, ID 3000 70% confidence, ID 2000 10% confidence, etc. Just take the highest two or three.
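A minimal sketch of that last step only, assuming the network has already produced a confidence score per profile ID. The `PROFILE_URLS` table, the `top_matches` helper, and the example.com URLs are hypothetical names made up for illustration; the scores are the ones from the example above.

```python
# Hypothetical lookup table: profile ID -> profile URL (placeholder URLs).
PROFILE_URLS = {
    1001: "https://example.com/profiles/1001",
    2000: "https://example.com/profiles/2000",
    3000: "https://example.com/profiles/3000",
}


def top_matches(scores, k=3):
    """Return the k highest-confidence (profile_id, confidence, url) tuples."""
    # Sort profile IDs by confidence, highest first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Turn each surviving ID back into a profile URL (None if unknown).
    return [(pid, conf, PROFILE_URLS.get(pid)) for pid, conf in ranked[:k]]


# Assumed model output for one input face: profile ID -> confidence.
scores = {1001: 0.90, 3000: 0.70, 2000: 0.10}
matches = top_matches(scores, k=2)

for pid, conf, url in matches:
    print(f"ID {pid}: {conf:.0%} confidence -> {url}")
```

Nothing here does any recognition; it only shows how a ranked list of suggested identities falls out of per-ID scores.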

      Depending on the training and input data preparation, the training process and the network architecture, it'll be accurate or not very accurate.

      As for the scraping: buy a lot of cloud instances and parallelize your curl fetches, crawling webpages, building a graph network.

      C.

      PS: Cache a copy of the page per URL so that if profiles disappear online, you still have a copy. These pages will contain stuff like names, personal info, links to other profiles owned by the person, etc
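The crawl-cache-and-graph idea can be sketched without touching the network. This is a toy, sequential version, assuming a hypothetical in-memory `FAKE_WEB` dictionary in place of real fetches; `fetch`, `crawl`, and the example.com URLs are illustrative names, and the parallel cloud-instance part is left out entirely.

```python
from collections import deque

# Hypothetical stand-in for the web: URL -> (page text, outgoing links).
FAKE_WEB = {
    "https://example.com/a": ("page A", ["https://example.com/b",
                                         "https://example.com/c"]),
    "https://example.com/b": ("page B", ["https://example.com/a"]),
    "https://example.com/c": ("page C", []),
}


def fetch(url):
    """Pretend fetch: returns (text, links), or None if the page is gone."""
    return FAKE_WEB.get(url)


def crawl(start_url, max_pages=100):
    """Breadth-first crawl that caches each page and records the link graph."""
    cache = {}   # URL -> cached copy of the page
    graph = {}   # URL -> list of URLs it links to
    queue = deque([start_url])
    while queue and len(cache) < max_pages:
        url = queue.popleft()
        if url in cache:
            continue  # already visited
        result = fetch(url)
        if result is None:
            continue  # page missing; nothing to cache
        text, links = result
        cache[url] = text
        graph[url] = list(links)
        queue.extend(link for link in links if link not in cache)
    return cache, graph


cache, graph = crawl("https://example.com/a")
```

The cached copy is what survives if the original page later disappears, which is the point of the PS.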

    2. wolfetone

      Re: Ok...

      Even in this day and age of Cloudflare, or whatever service likes to promote itself as a bot-killer/preventer, it's still incredibly easy to circumvent these services and scrape a website to your heart's content undetected.

  3. whoseyourdaddy

    What? Someone *OTHER* than the Russian government is doing this?

    Meh. I'm watermarking a copyright on everything I share anyway. Figured people were dumb for using all the "isn't this cool?" filters on FB/Insta that age you 20 years, make you popular, add influencer creds, etc.

    "I guess you figured all this happens inside your phone.. LOL. Guess what?"

  4. Kevin McMurtrie Silver badge

    Google asking a company to not scrape everything? Is this the Chewbacca defense?

  5. Pascal Monett Silver badge

    Well, if he puts it like that, he just might have a point

    "So if it's public and it's out there and could be inside Google search engine, it can be inside ours as well."

    I've gotta say, I find it difficult to argue with that. If I can find that photo through Google, why can't I find it through someone else?

    I don't like what he's doing, but he does seem to have an argument.

    1. Graham Cobb

      Re: Well, if he puts it like that, he just might have a point

      I think there is a difference.

      Google reverse image search is really a lookup. It shows you pages where that image appears. An indexing function.

      Clearview is creating a profile of the person and is processing all the images it can find of the person in order to recognise an unknown image and show you the profile - not just pages containing the image.

      The technology involved may be similar but the purpose and effects are different. In GDPR it is the processing, and the purpose, which are important, not just the data.

  6. This post has been deleted by its author

  7. Anonymous Coward

    Google, YouTube, and Twitter have sent cease-and-desist demands to Clearview

    yeah, don't forget to lock them stable doors, eh.

    but hey, you can't say now that these big, nasty, evil google etc, do NOT care about your concerns, eh?

    1. SVV Silver badge

      Re: Google, YouTube, and Twitter have sent cease-and-desist demands to Clearview

      Do I seem to remember a fuss about something called Google News? Which merrily harvests "other people's content" for its own use and profit? I'm not surprised they've only sent cease and desist letters, rather than filed a lawsuit, as the company are only doing exactly what Google does. They would lose that lawsuit badly, and they know it.

      However, the who-cares-about-the-consequences attitude of the guy who has set up this Big Brother Inc is just another symptom of the decade we are entering, where computer surveillance for profit will get out of hand, and often go badly wrong, and we won't be able to stop it because hey, are you some sort of commie that hates capitalism or something..................

      1. matt 83

        Re: Google, YouTube, and Twitter have sent cease-and-desist demands to Clearview

        Isn't the difference here that in Clearview's case they are accessing the pictures specifically in breach of the T&Cs?

        In the case of Google News, all the news sites wanted their content accessed by Google so it could be indexed and found in search results. They just didn't want Google to do anything more than display a bare link (no snippet of the content included). The fuss was about whether it was fair for Google to reuse small sections of a news story if you wanted them to index that news story. Stopping Google from harvesting your content is just a robots.txt away. It's just that most sites that disable Google's indexing find they're the ones who suffer the most.
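For anyone who hasn't seen it, a rough sketch of what "a robots.txt away" looks like, checked with Python's standard `urllib.robotparser`. The robots.txt content, the `build_parser` helper, the `SomeOtherBot` agent name, and the news.example.com URL are made up for illustration. It only binds crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site: block Googlebot, allow the rest.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text):
    """Parse robots.txt content from a string rather than fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
story = "https://news.example.com/story.html"

# A well-behaved crawler asks this question before fetching a page.
googlebot_allowed = parser.can_fetch("Googlebot", story)
other_allowed = parser.can_fetch("SomeOtherBot", story)

print("Googlebot allowed:", googlebot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```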

  8. Plest

    Great

    I love how even if you've locked down your Hue network to a local subnet, they can still drive-by and infect a node independently and then infect the network. So the only answer is to switch off all your lights and sit in the dark until they patch this? Nice one!

    I have my Hue system on a closed network but with auto-updates and yet I notice my genuine Hue bulbs still have not had any patches since January last year, the hub was updated 22-Jan this year. So despite this problem being known about for a week or two Philips still have not issued any workable patches.

    I love the convenience of having software controlled lights, and saving money on my leccy bill is my main driver, but do I now need to start installing firewalls and isolating my lights onto their own VPN? Looks like it.

    1. Anonymous Coward

      Re: Great

      You replied to the wrong article, mate!


Biting the hand that feeds IT © 1998–2020