Is the way you select the data an algorithm? I thought it was just about collecting all the possible data and then taking some subsamples with random sampling.

There's a huge body of work on ML training methods. It's not "just about" anything short enough to put in a forum post.

You could spend several days just reading Adrian Colyer's summaries of ML-related papers in the archives of his blog, the morning paper. This is a field that has been around for decades and has been very active for the past decade.

(Also, I'll note that random sampling is an algorithm.)
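
To make that last point concrete, here's a minimal sketch of one well-known random sampling algorithm, reservoir sampling (often called Algorithm R). It draws k items uniformly at random from a stream whose length you don't know in advance, which is roughly the "collect everything, then subsample" situation. The function name and parameters are my own for illustration, not taken from any particular ML library, and real training pipelines usually do something more elaborate.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Hypothetical helper for illustration: a plain implementation of
    reservoir sampling (Algorithm R). The stream is read once and only
    k items are held in memory at any time.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Draw 5 examples from a "dataset" of 1000 record ids.
    print(reservoir_sample(range(1000), 5, random.Random(42)))
```

Even something this small involves real choices: whether to sample with or without replacement, whether every item gets equal weight, and how to cope with data that doesn't fit in memory.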

