Alexa data set is poor
Not least because it contains a lot of duplicate hosts, but also because it's not very representative. HTTP Archive recently switched to using the CrUX dataset, which is both more representative, because Google collects anonymised data across all those websites, and free of duplicate hosts. That data set puts HTTPS at around 75% of websites.
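
To illustrate why duplicate hosts matter for a figure like this, here is a minimal sketch that compares a naive HTTPS share (counting every row) with a share computed over unique hosts. The `https_share` helper and the sample URLs are made up for illustration; they are not taken from the Alexa or CrUX data, and counting a host as HTTPS if any of its rows uses `https` is just one possible rule.

```python
from urllib.parse import urlsplit


def https_share(urls):
    """Return the fraction of unique hosts that have at least one https URL."""
    by_host = {}
    for url in urls:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()  # hostnames are case-insensitive
        if not host:
            continue  # skip rows that do not parse to a host
        # Assumption: a host counts as HTTPS if any of its rows uses https.
        by_host[host] = by_host.get(host, False) or parts.scheme == "https"
    if not by_host:
        return 0.0
    return sum(by_host.values()) / len(by_host)


# Hypothetical sample: one host listed three times, another listed once.
sample = [
    "http://example.com/",
    "http://example.com/about",
    "http://EXAMPLE.com/contact",
    "https://example.org/",
]

naive = sum(u.startswith("https://") for u in sample) / len(sample)
deduped = https_share(sample)
print(f"naive: {naive:.0%}, per unique host: {deduped:.0%}")
```

In this toy sample the repeated host drags the naive figure down, which is the kind of skew a list with many duplicate hosts can introduce.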