Back in the early 2000s, while working for a major IT firm that dealt with mainstream news and early social media, I created (and the company patented) a series of spiders that crawled the web looking for public profiles. The system was designed to self-learn using some rudimentary NLP. The collected data was processed so that profiles across multiple sites could be linked to identify individuals, based on common profile elements (username, profile picture, declared location, interests, etc.) plus some basic writing-style analysis via NLP. Each combination of two or more profiles was scored on how likely it was that they belonged to the same individual.
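The linking step can be sketched as a pairwise scoring function. To be clear, everything below is hypothetical: the field names, the similarity measures, and the weights are my illustration, not the patented system (which also did writing-style analysis and learned from the data):

```python
from difflib import SequenceMatcher

def jaccard(a, b):
    """Overlap between two sets of declared interests (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def match_score(p1, p2):
    """Rough likelihood that two profiles belong to the same person."""
    # Fuzzy username similarity, ignoring case.
    username = SequenceMatcher(None, p1["username"].lower(),
                               p2["username"].lower()).ratio()
    # Exact (case-insensitive) match on declared location.
    location = 1.0 if p1["location"].lower() == p2["location"].lower() else 0.0
    # Shared interests.
    interests = jaccard(p1["interests"], p2["interests"])
    # Weights are illustrative; a real system would tune or learn them.
    return 0.5 * username + 0.2 * location + 0.3 * interests

a = {"username": "jdoe42", "location": "London",
     "interests": {"cycling", "photography", "linux"}}
b = {"username": "j_doe42", "location": "london",
     "interests": {"photography", "linux", "chess"}}
print(match_score(a, b))  # high score: near-identical username, same city
```

Profile pairs scoring above some threshold would then be clustered into a single candidate identity.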
The best example was a person the tool found on 44 different websites, including CV sites and sites tied to his various interests. All the data mined was openly public and not restricted in any way beyond the standard robots.txt file.
The project was never released to market; the Legal department decided it was 'too legally grey'. But it was demoed to several of the company's customers, including government departments and think tanks.
I wouldn't be surprised if services exist today that do the same thing.