Let me get this straight
Information is published on a website and properly blocked by robots.txt. Google does not index it, but shady third-party crawlers do. Google then indexes those crawlers' sites.
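To be clear about why that happens: robots.txt is purely advisory, so it only stops crawlers that choose to check it. A minimal sketch using Python's stdlib `urllib.robotparser` (the domain and paths here are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking /private/ for every crawler
robots_txt = """
User-agent: *
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(robots_txt)

# A well-behaved crawler checks before fetching and backs off:
rp.can_fetch("Googlebot", "https://example.com/private/data.html")  # False

# A shady crawler simply never performs this check and fetches anyway;
# nothing in the protocol enforces the Disallow rule.
rp.can_fetch("Googlebot", "https://example.com/public/index.html")   # True
```

Once the shady crawler republishes the page on its own site (with no robots.txt restriction), that copy is fair game for Google's indexer.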
Now Google is supposed to go through its entire index and, for each site, check whether the content matches that of the site it was never supposed to crawl in the first place, then remove it. Hmmmm...