A small number of users (71) can, but often does not, give a good correlation.
With a small sample, the participants tend to share similar demographics and other attributes.
Who were the testers?
Were they Apple users, Windows users, Linux users...?
What were their ages?
The article itself states that the findings do not reflect real-life use:
"To get comparable, interpretable results from this experiment, we had to ask users to do very focused, short tasks on a single page. In real life, users don’t do tasks that way. They arrive to your site, and don’t know who you are or what you do. They navigate to pages, and don’t know for sure that they’ll find what they’re looking for there. They explore offerings and options."
Calling these results official is less than scientific or honest. More likely, the writer of the article is not a fan (perhaps even a hater) of flat design.