"But can they do it ... without the owner being present while creating the mask?"
You don't need to make a clay mask on their physical face.
You can (literally) glean a lot of 3D information from a single photo, inferred from shading and reflectivity. More photos, including side shots, allow further reduction in error.
This puts anybody with publicly available photos at risk - felons with mug shots on record, actors, models, politicians, Mark Zuckerberg (https://www.theverge.com/2017/9/18/16327906/3d-model-face-photograph-ai-machine-learning), Mr. or Ms. Anybody with a bunch of photos on Facebook.
But still, the victims would need to be deliberately targeted, with planning - it's not a huge risk for someone whose phone got pickpocketed at random.