They can afford Post-it notes. Doesn't take a genius to work that one out.
Students who get good grades have better passwords than their less academically successful peers, though this finding should be considered alongside several caveats. JV Roig, consulting director and software developer at Asia Pacific College (APC) in the Philippines, wanted to find out whether school smarts had any bearing on …
Thursday 10th May 2018 19:54 GMT Anonymous Coward
Friday 11th May 2018 08:30 GMT jvroig
Re: Pass the salt...
Hi, I'm JV, the paper author.
The methodology is described in detail in the paper, but the short version is:
1.) The process is no different from how a normal login works: only hashes are stored in a database, so whatever you type is hashed first and then compared against the authentication database. This is why a modern login system never really needs to handle a plaintext password, other than to hash it before the actual comparison.
2.) In our case specifically, since we control the internal systems, whenever a user submits a password to log in, we hash it according to the standard used by Troy Hunt's list (SHA-1). That SHA-1 hash of the password is then compared against Troy Hunt's list. If we get a match, we know it's a breached/compromised password. We don't know what the password was, though. To make sure we handle all data ethically, we never store plaintext passwords within the research infrastructure. We don't even store usernames: we applied a form of hashing to the usernames themselves, so that even the user identities are anonymized within the research databases (yes, despite not storing passwords in the first place).
In general, what we did was really no different than what the NIST June 2017 guidelines mandate. We just figured that process would contain interesting information, so we just tried to see what interesting data can be uncovered.
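A minimal sketch of the lookup described above, in Python. The `HASH:COUNT` line format and the names `load_breached_hashes` and `is_breached` are assumptions made for illustration, not the code used in the paper. A real corpus of hundreds of millions of hashes would sit in an indexed database rather than an in-memory set.

```python
import hashlib


def sha1_hex(password: str) -> str:
    """Return the uppercase hex SHA-1 digest of a UTF-8 encoded password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def load_breached_hashes(lines):
    """Build a set of SHA-1 hashes from lines shaped like 'HASH' or 'HASH:COUNT'.

    The line format is an assumption made for this sketch.
    """
    hashes = set()
    for line in lines:
        line = line.strip()
        if line:
            hashes.add(line.split(":", 1)[0].upper())
    return hashes


def is_breached(password: str, breached_hashes) -> bool:
    """Compare only the hash; the plaintext is never stored or logged."""
    return sha1_hex(password) in breached_hashes


# Tiny stand-in corpus built from two well-known weak passwords.
corpus = load_breached_hashes([sha1_hex("password") + ":3", sha1_hex("qwerty")])
print(is_breached("qwerty", corpus))
print(is_breached("correct horse battery staple", corpus))
```

A match only says that the hash appears in the corpus; nothing in this flow recovers or records the password itself.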
Friday 11th May 2018 10:35 GMT d3rrial
Saturday 12th May 2018 11:38 GMT jvroig
Re: Pass the salt...
This is JV, the paper author.
Yes, indeed SHA-1 is broken. We don't use SHA-1 for storing the ACTUAL passwords. In fact, Asia Pacific College **DOES NOT STORE PASSWORDS AT ALL** (caps only to emphasize, not shout).
We manage to function despite not storing any passwords by simply off-loading that whole problem to Microsoft - we're a Microsoft Showcase School, and our single-sign-on back-end relies on Microsoft's Active Directory hosted on Office 365 (cloud). In effect, Microsoft's own infra, and none of ours, ends up storing actual passwords, in whatever form.
So what was the role of SHA-1 in the experiment? It was merely to be able to match Troy Hunt's list. He gave away a list of 320M (now 500M+) breached passwords. But to make sure that list is next-to-impossible to weaponize, he didn't release them in plaintext: instead, every password in that list has been transformed to its SHA-1 hash. So his list, ultimately, is a list of SHA-1 hashes.
On our end, then, when a user logs in, we just pass the password through Microsoft's single-sign-on API. For this research, we added an extra thing - after passing to Microsoft and getting a confirmation of user validity, that same password we just passed is hashed using SHA-1, then sent to our research infrastructure that hosts Troy Hunt's list of SHA-1 "passwords". If there's a match, we know the password is compromised, even though we never end up knowing what that actual password is.
I hope this clears it up. The confusion is understandable. The paper itself has this detail, but of course it's probably too technical to be included in the media article above.
Friday 11th May 2018 17:00 GMT anothercynic
Saturday 12th May 2018 11:38 GMT jvroig
Re: Pass the salt...
No, we haven't really been using that. The entire research activity that produced this paper (and the more expansive paper currently under conference consideration and peer review) has mostly been centered around matching with Troy Hunt's breach corpus and the NIST "match-against-bad-passwords" guideline.
We'll definitely consider looking into having that included in the design of follow-up experiments.
I'm very glad this Register article attracted a fellow security researcher. I hope reading the paper itself gave you some new insights into your own job, as your comment here has given us an interesting thing to consider for our future ones! [science++]
Thursday 10th May 2018 20:21 GMT katgod
Smarter people are more secretive? Smarter people have more to hide? Smarter people are more paranoid? There are a number of takeaways from this and, of course, as someone else pointed out, if you arrange the data correctly you can prove just about anything, also known as p-hacking if I am not mistaken.
Friday 11th May 2018 07:33 GMT Alister
Friday 11th May 2018 10:55 GMT LucreLout
How about: smarter people may have a larger vocabulary or a greater imagination to come up with and then remember more complex passwords?
Or, to shorten that for those without good grades: Smarter people are less dumb.
I guess we can add a lifetime of getting pwned to the list of self inflicted harm stemming from f***ing around at school, along with crap grades, crap education, crap jobs, and crap salaries. I am of course, widely generalising here.
Work hard at school kids, or you'll have to work a helluva lot harder after it.*
* Based again on a sweeping generalisation that manual low skilled jobs are harder than more cerebral career jobs. I've done both, and while manual work was boring, it wasn't specifically hard - you just got paid crap all per hour so had to do a lot of hours.
Thursday 10th May 2018 20:25 GMT Mark 85
Thursday 10th May 2018 20:27 GMT Korev
Thursday 10th May 2018 22:02 GMT The Nazz
Jeremy Kyle on catchup.
Perhaps the less (academically) intelligent ones are watching JK on catchup?
Or, better still, applying to appear on the show online.
Password required. : Must be at least one letter in the range A to Z. Other funny shaped symbols on keys also accepted.
Typical Captcha :
Is this :
a) a Kat
b) a doggie, or
c) summat else, innit.
d) how many close family members have you punched today?
Any key press is acceptable.
Maybe the actual *intelligent* ones are out at work in the cash economy, paying the minimum of tax and living a very nice lifestyle?
Thursday 10th May 2018 22:11 GMT Jay Lenovo
Friday 11th May 2018 05:37 GMT Dodgy Geezer
Friday 11th May 2018 06:57 GMT Chris G
Friday 11th May 2018 08:36 GMT jvroig
Well, it doesn't really work that way (author of the cited paper here)
This paper - which I just uploaded to arXiv instead of any peer-reviewed conference or journal - is just a side effect of a larger-scale effort that centered mostly around compliance with NIST's latest (June 2017) guidelines regarding password handling. A more expansive paper with more relevant statistics (% of compromised password use, length, gender, etc) is currently under consideration and review for an international conference presentation (so I'm not sure this comment section is the best area to expound on it). The cited paper in this article, compared to that paper currently under review, is really more of a curiosity. In fact, the paper itself mentions that - as this article itself quotes at the end.
Aside from that, we (my team) do a lot of other things in cryptography, security, disaster recovery and databases. The cited paper here is pretty much one of the blander results/output we have. That it got covered here is a surprise to me. I had no idea there are journalists who scour arXiv (a pre-print sharing site), and certainly no idea Tom would find it worth writing up.
What this article doesn't share, though, is that the methodology section in the paper also shares a useful thing: a simple way to implement the NIST guidelines of checking against compromised passwords. Not only does it point potential readers to Troy Hunt's password trove, it also lays out the process and potential implementation that can be used to adhere to the new NIST password guidelines.
Friday 11th May 2018 07:14 GMT Potemkine!
Saturday 12th May 2018 10:43 GMT Anonymous Coward
Re: WTF is GPA?
Whilst a high Grade Point Average is to some extent a measure of how smart someone is it's also a very good measure of how good their memory is.
If a person's memory isn't good and they choose a very complex password, they won't remember it, will get it reset, and will choose something simpler that they can remember. As such the paper's finding does not come as a massive surprise. The phrase "No shit Sherlock" springs to mind, which incidentally is a weak password despite being 16 characters long and mixed case.
Friday 11th May 2018 09:22 GMT FlamingDeath
I must be a genius
All my passwords are unique pseudo-random alphanumerics, quite lengthy too, heck some even change over time (OTP) and sometimes 2FA or 3FA
I always knew really, I didn't need their confirmation of my genius
It's a shame the weakest link is probably the shonky coding of the web application, because the company fails to properly invest in resources, especially IT security
How many breaches have we seen?
How many have we yet to hear about?
How many will we potentially never hear about?
My lengthy unique pseudo-random alphanumerical passwords don't look so great now, but it's better than having qwerty, or the password being the same as the user name (that one is a favourite of managed services companies BTW) - Convenience!
Friday 11th May 2018 09:36 GMT Sheepykins
Humanity is just inherently lazy.
The question shouldn't be "how complex are they making their passwords?" But rather, "What steps are we taking to ensure the passwords are created to be complex?"
1. Default character limit
2. Add numbers, symbols, and uppercase
3. Rotated at minimum every 3 months
What can we do to improve upon that?
2FA is a good start. Personally, if I were smart enough, I'd create a password creation system that doesn't allow proper words from a dictionary at all.
Friday 11th May 2018 10:44 GMT defiler
I'd create a password creation system that doesn't allow proper words from a dictionary at all.
Which dictionary? And how short do the words have to be to be excluded?
"A"? "I"? How about "is", "at", "to", "on" and "or"?
Not trying to pick holes...
Ah shit, it's Friday. Of course I'm trying to pick holes. But all facetiousness aside, my point about "which dictionary" is still valid.
Also, the more rules you put on a system, the more you reduce the search space for brute-force attacks.
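A rough sketch of that last point, counting how much of the search space survives a "must contain one of each" rule. The class sizes (26 lower, 26 upper, 10 digits, 32 symbols, 94 printable characters in total) are assumptions for illustration, and the count uses plain inclusion-exclusion.

```python
from itertools import combinations

# Character classes on a typical keyboard (sizes are an assumption for this sketch).
CLASS_SIZES = {"lower": 26, "upper": 26, "digit": 10, "symbol": 32}
ALPHABET = sum(CLASS_SIZES.values())  # 94 printable characters


def unrestricted(length: int) -> int:
    """Number of passwords of this length with no composition rules."""
    return ALPHABET ** length


def all_classes_required(length: int) -> int:
    """Number of passwords with at least one character from every class.

    Inclusion-exclusion: alternately subtract and add the passwords that
    avoid each possible subset of classes.
    """
    sizes = list(CLASS_SIZES.values())
    total = 0
    for k in range(len(sizes) + 1):
        for excluded in combinations(sizes, k):
            total += (-1) ** k * (ALPHABET - sum(excluded)) ** length
    return total


for n in (8, 10, 12):
    share = all_classes_required(n) / unrestricted(n)
    print(f"length {n}: {share:.1%} of the unrestricted space remains")
```

Every candidate the rule forbids is one an attacker no longer has to try, which is the sense in which extra rules shrink the search space.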
Friday 11th May 2018 14:12 GMT Robert Helpmann??
Lazy, lazy people
1. Default character limit
2. Add numbers, symbols, and uppercase
3. Rotated at minimum every 3 months
What can we do to improve upon that? 2FA is a good start. Personally, if I were smart enough, I'd create a password creation system that doesn't allow proper words from a dictionary at all.
2FA is a really good start. Definitely none of this biometric, my fingerprint is both my UID and my password crap. How about a check by sites that rely on password using a hash comparison much as was done for this study?
As far as not allowing proper words, if you just rely on the math, you could allow it if you stipulated a minimum number of words be used to get the same degree of complexity as a more standard password requiring upper, lower, numeric and special characters. You might also have to adjust hashing to avoid collisions due to the greater number of characters involved. An unabridged English dictionary has about 470,000 entries (https://www.merriam-webster.com/help/faq-how-many-english-words). Knocking that down to the most common words, let's call it 100,000, still gets you huge variability. More educated people are apt to have a larger vocabulary, but less educated people are more likely to misspell words, so from this very loose analysis there is little practical difference in terms of resistance to brute force or dictionary attacks.
A four word pass phrase, assuming any may be capitalized, would yield somewhere around 1.6E21 combinations. Assuming 100 possible characters for use with a more standard style password, it would have to be 10 or 11 characters in length to achieve the same.
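The arithmetic behind those figures, as a short sketch. It assumes each of the four words is drawn independently from the 100,000-word list and may or may not be capitalized, which is one reading of "any may be capitalized".

```python
import math

WORDS = 100_000     # common-word list size used in the estimate above
CASE_VARIANTS = 2   # each word is either capitalized or not (assumption)
PHRASE_LEN = 4      # words per passphrase
CHARSET = 100       # characters available to a conventional password

# Each word slot has WORDS * CASE_VARIANTS possibilities.
phrase_space = (WORDS * CASE_VARIANTS) ** PHRASE_LEN

# Length L of a random password over CHARSET with the same space: CHARSET**L == phrase_space.
equivalent_chars = math.log(phrase_space) / math.log(CHARSET)

print(f"passphrase combinations: {phrase_space:.1e}")
print(f"equivalent password length: {equivalent_chars:.1f} characters")
print(f"entropy: {math.log2(phrase_space):.1f} bits")
```

The equivalent length lands between 10 and 11, so a random 100-symbol password needs 11 characters to exceed the passphrase.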
Perhaps an interesting follow up on this might be passwords as used by mobile users vs those generated from a regular keyboard.
Friday 11th May 2018 10:10 GMT Pat Harkin
Saturday 12th May 2018 11:37 GMT jvroig
Re: Sample size 1252
This is JV, the author of the paper.
Those things you are asking for are actually in the original paper, linked to in this Register article by Tom.
But I don't want to hash them out here (you can easily look at the paper), because the more central point is this: as this article's parting words go (which are literally just a quote from how the paper itself ends), these findings are a curiosity, not the be-all and end-all. They are nothing more than an interesting data point for a potential series of future experiments.
You shouldn't treat the results here as anything more than "hmmm... interesting, we should do more and possibly bigger experiments like this...", because even the author (that's me) thinks exactly just that. No more, no less.
Friday 11th May 2018 10:58 GMT NonSSL-Login
The article says that 215 students' hashes were in Troy's database and states this was down to bad/unsafe passwords. Wrong. They are in Hunt's databases because they happened to be signed up to websites that got hacked. There is no relation to IQ at all.
Maybe a correlation between how many sites someone signs up to, or quality of site, could be made relating to IQ but that is not what the article says.
Saturday 12th May 2018 23:40 GMT jvroig
(JV Roig here, the author of the cited paper)
re: "The article says that 215 students' hashes were in Troy's database and states this was down to bad/unsafe passwords. Wrong. They are in Hunt's databases because they happened to be signed up to websites that got hacked. There is no relation to IQ at all."
Yes, that's actually a pretty good point, although I would like to say it's a little bit more nuanced than that. Not everyone on the list would have been personally pwned - some have just settled on really bad passwords (especially passwords or patterns that look strong, but really aren't due to predictability).
For example, if their password is outwardly (seemingly) "strong" / "secure" (say, it's long and has some substitutions), the chances of another person thinking of the same password should be statistically nil (infinite combinations vs very finite human users) - unless of course, the chosen "secure" password is actually predictable due to human nature. Microsoft itself actually has some interesting research findings about this, particularly about how users will predictably choose passwords given very strict, arcane password requirements.
If they are in the habit of re-using passwords, then that password (and habit itself) really should still be considered unsafe. A way to think about that would be: a password can be unsafe because of "physical" characteristics (e.g., too short), "psychological" characteristics (e.g., too predictable based on human nature, such as adding "1" at the end of a common password when asked to use at least one number), or "environmental" characteristics (e.g., an otherwise strong password that is now much more vulnerable [weak] due to being used widely in different sites).
So either way, unsafe.
But as mentioned in my opening, yours is a very good insight, thanks. That's exactly why any interesting findings, be they just curiosities at the moment, are shared - so that others can see them and then potentially draw from them interesting insight, such as yours. Further studies, for example, can now explicitly look for, attempt to mitigate, or control for, or simply measure the impact of, that factor. Science at work!
Friday 25th May 2018 12:55 GMT NonSSL-Login
Hi JV, thanks for the reply.
Having read the article I appear to have missed the fact that Troy had shared a database of hashes and the comparison was done against that. Apologies. I usually blame lack of caffeine for my mistakes as otherwise I would never make any *cough* :)
There are many variables in passwords that cross the IQ barrier. Not everyone has a job related to their IQ, as ambition and other factors are involved, but if you generalise that a manual labourer may have a lower IQ than a director of a company, you may expect the labourer to have a weaker password. A fair percentage of the time it's their favorite sports team or player with maybe a capitalised first letter and a number at the end if the signup forces those attributes. Yet a lot of CEOs will also use their sports team as a password too.
There are differences: say, a fruit seller on a stall in London may have a football (soccer) related password, while a CEO who went to Oxford might have a rugby-related password, as socioeconomic groups also play a part.
At the same time both groups may use a password based on a crush/partner/kids or dog or a date of birth.
Both high IQ and low IQ people know they are supposed to have good passwords. Is it down to IQ about who puts the effort in?
Sunday 27th May 2018 06:15 GMT jvroig
Hi Non-SSL Login,
Yeah, that's exactly right. There should be ZERO difference, from my point of view (see my last comment posted before this one, where I explain what my original assumption was when the experiment started).
As I wrote in my last comment, I even pre-wrote the abstract to highlight how the results show that weak passwords stem from issues of disinterest, not intelligence (as expected). It's just that the data (limited by the constraints as noted in the paper) showed otherwise when it finally came in, so I have to write the paper as the data says, instead of what I think it should say. All I could do, though, was note all the constraints at the conclusion, as well as emphasizing that with all the limitations, this is just a curiosity and should only be regarded as a first step in a series of more-refined experiments.
In the end, I'm sure improvements in data size (more people), different environments (different school, different company), and improvements in the measurement tool itself (updated breach corpus, adding localization) will end up showing that intelligence is not really a big factor when it comes to weak passwords or sub-optimal password habits. I'm squarely in the camp that human/psychological issues are the main determinant, hence the updated NIST / Microsoft Research guidelines that are less technical and more user-friendly.
Friday 11th May 2018 14:12 GMT Jtom
Naw. I know an idiot who has the best password protection. He creates an eighteen-character (or the longest permitted length) password of random alpha-numeric, upper/lower case, and special characters, does not maintain a copy of it, and doesn't try to memorize it. Then he resets the password every time he wants to log in.
Saturday 12th May 2018 08:34 GMT Anonymous Coward
"Then he resets the password every time he wants to log in."
Provided nobody else has access to his email account, that isn't too insecure.
But how does he log in to his email account?
For a few annoying companies that I trade with perhaps once a year and that want me to set up an account, I admit I often do this. But then the email account I use follows good password practice.
Saturday 12th May 2018 11:38 GMT jvroig
Thanks for sharing that very nice insight, which I'll quote here for user-friendliness to those who would end up reading this comment.
Jtom said: "Naw. I know an idiot who has the best password protection. He creates an eighteen-character (or the longest permitted length) password of random alpha-numeric, upper/lower case, and special characters, does not maintain a copy of it, and doesn't try to memorize it. Then he resets the password every time he wants to log in."
That is exactly how many users deal with complex password rules. This is why NIST's latest guidelines (as well as Microsoft's latest research publication on the matter) both point toward a saner set of rules for the future, such as:
1.) No more arcane requirements.
2.) Let them use ANY character they want, even emojis. No character should be off limits. (Looking at you, old bank systems)
3.) Just demand a minimum length
4.) Check the password against a (regularly-updated) list of the most common bad / breached passwords, and notify users.
Basically, these new rules DECREASE the technical requirements, so that we end up decreasing the mental load on (very uninterested) users. In other words, we're decreasing the technical features in an attempt to increase the human/psychological features. And authentication (passwords, remembering them, typing them...) being a very human-centric activity, users can use all the psychological-friendliness they can get. The end goal? We hope, at least, that users would then have a far easier time actually thinking of good passwords, would stop writing them down on a Post-it, and that less of IT's time would be wasted on recurring "please reset my password" support calls.
We're betting, basically, that accounting for human factors, instead of just solely technical factors like most of the traditional password policies, will make the whole thing much better.
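As a rough sketch of what rules 1 to 4 look like in code (the function name `check_new_password`, the 8-character minimum, and the in-memory set of breached SHA-1 hashes are all assumptions for illustration, not something taken from the paper):

```python
import hashlib

MIN_LENGTH = 8  # assumed minimum for this sketch; use whatever your policy requires


def check_new_password(password: str, breached_sha1: set) -> list:
    """Return a list of problems; an empty list means the password is accepted.

    There are no composition rules: any character, emoji included, is allowed.
    """
    problems = []
    # len() counts Unicode code points, not bytes.
    if len(password) < MIN_LENGTH:
        problems.append(f"must be at least {MIN_LENGTH} characters")
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    if digest in breached_sha1:
        problems.append("appears in a list of breached passwords")
    return problems


# Stand-in breach list holding a single known-bad password.
breached = {hashlib.sha1(b"password1").hexdigest().upper()}
print(check_new_password("password1", breached))
print(check_new_password("🙂 plenty long and unseen", breached))
```

The whole policy is two checks, and the messages it returns are things a user can act on without decoding a list of character-class requirements.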
Thanks for the interesting comment, Jtom!
Saturday 12th May 2018 20:08 GMT eionmac
minimum length but what maximum length
These rules seem sane, but how is a user to know the cut-off length of the password database field? Do you have a fixed or unfixed maximum length? Say 14 is the minimum: is the maximum 40, or 400, or 4000?
Clue: I usually use two lines of poetry (mixed up), each line from a different language, so I often have easily remembered passwords 60 to 80 characters long. I find no information on many sites about maximum length. (The fault with defining a maximum is that it limits password searches to a fixed x to y characters, for any attacker's convenience.)
Sunday 13th May 2018 07:39 GMT jvroig
Re: minimum length but what maximum length
Hello, eionmac, that's a very interesting question.
Your question in essence is "how will the user know what the MAX LENGTH of an acceptable password is?". Let's call that "Question 1". The way you specifically phrased it, though, leads to another very interesting nuance - the length of the password field itself (within the database), as it relates to the maximum acceptable length of any password. Let's call this "Question 2".
For Question 1, that's something that authenticators / sites / apps / systems should clearly tell the user. As far as NIST is concerned, though, the maximum should be long enough so as to be practically unlimited as far as the vast majority of users are concerned (say, 100-200 characters). The fact that we need to limit them at all is simply to avoid potential DOS (denial-of-service) attacks. This is something the security community in general learned the hard way when the Django framework was found to be DOS-able just by copy-pasting several MBs of text into the password field and submitting. The back-end then has to hash that giant piece of text, possibly creating a huge bottleneck in the server.
For Question 2, it's important to understand that whatever the policy is for the maximum password length a user can have, it has no relation to the length of the password field in the database. This is because you should only be storing the *HASH* of the password - therefore, whether your password was only 5 characters long, or 500 characters long, and whether the salt is 4 characters or 40 characters, in the end the final value to be stored is always the length of the resulting hash.
For sites that don't specify a maximum length, yes, that is a scenario that should certainly be improved. The maximum can easily be set to something like 1000 characters without much threat of an easy DOS (effectively that means we'll just be hashing 1KB of data, which isn't really a problem, but that could depend on your specific algorithms and implementation, as well as the capabilities of your back-end server/s and expected volume of users). Providers/authenticators/services/sites should then give that sort of information clearly. The only reason most sites don't do that right now is, in my view, they have too many complex rules. By the time we move on, and the rules that users need to know are just 2 or 3 things, this situation will probably solve itself.
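A small sketch of both answers together: cap the input length before hashing, and note that the stored value has the same size whatever the password or salt length. PBKDF2-HMAC-SHA256 with 100,000 iterations and the 1000-character cap are assumptions chosen for illustration, not a recommendation from the paper.

```python
import hashlib
import os

MAX_LENGTH = 1000  # assumed cap, there purely to bound the hashing cost


def hash_for_storage(password: str, salt: bytes) -> bytes:
    """Reject oversized input, then derive a fixed-size hash with PBKDF2."""
    if len(password) > MAX_LENGTH:
        raise ValueError("password too long")
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# A 5-character password with a 4-byte salt, and a 500-character one with a 40-byte salt.
short = hash_for_storage("abcde", os.urandom(4))
long_ = hash_for_storage("x" * 500, os.urandom(40))
print(len(short), len(long_))
```

Both digests come out at the same length, so the database column is sized for the hash output and never for the longest password a user might type.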
Saturday 12th May 2018 15:43 GMT John 48
Does a password found on "Have I Been Pwned?" actually indicate *it* was breached?
I may be misunderstanding this, but I was under the impression that if a set of access credentials is listed on Troy's site, it generally indicates that the credentials were collected from a compromised web server or similar. I.e. the presence on the list alone can't be taken as an indication that the credentials themselves were necessarily weak, just that in many cases they were inadequately protected by a site that was hacked.
Sunday 13th May 2018 08:31 GMT jvroig
Just sharing some details from the paper
Hello, Register community.
I'm JV Roig, the paper author. I just spent the past two days answering some of the more interesting comments, and I haven't really had the time (until now) to actually make a post that I wanted - a post that shares some interesting things about the paper itself and how it came about.
First, let me go ahead and share what I believe are the actually interesting findings (which the article does not share). This is the table of results when users with breached passwords are categorized per GPA tier:
GPA tier | Total students in tier | Students using breached passwords | % of tier with breached passwords
3.5 – 4.00 | 39 | 5 | 12.82%
3.0 – 3.49 | 203 | 32 | 15.76%
2.5 – 2.99 | 446 | 66 | 14.80%
2.0 – 2.49 | 464 | 92 | 19.83%
1.5 – 1.99 | 99 | 19 | 19.19%
1.0 – 1.49 | 1 | 1 | 100.00%*
*The single student who had a lower than 1.5 GPA also happened to use an unsafe password. At least he/she is consistent, eh?
This table shows students sorted according to tiers of GPA. The top GPA tier is pretty low at 12.82%, and gets up to 19% near the bottom. The rise isn't perfectly linear, but you can clearly see that the trend goes up.
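For anyone who wants to check the percentages, a short sketch that recomputes them from the counts in the table. The tier totals add up to the 1252 students and 215 breached passwords mentioned elsewhere in this thread.

```python
# (GPA tier, total students, students with breached passwords), copied from the table
tiers = [
    ("3.5 - 4.00", 39, 5),
    ("3.0 - 3.49", 203, 32),
    ("2.5 - 2.99", 446, 66),
    ("2.0 - 2.49", 464, 92),
    ("1.5 - 1.99", 99, 19),
    ("1.0 - 1.49", 1, 1),
]

# Per-tier share of students whose password hash matched the breach list.
for tier, total, breached in tiers:
    print(f"{tier} | {total:>3} | {breached:>2} | {breached / total:.2%}")

all_students = sum(total for _, total, _ in tiers)
all_breached = sum(breached for _, _, breached in tiers)
print(f"overall: {all_breached}/{all_students} = {all_breached / all_students:.2%}")
```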
I wish the article itself showed this (instead of the data snippet that ended up being featured instead), but Tom Claburn (due to timezone difference with me) did not have enough time to reach me before needing to publish. I woke up with this article already done and published, so all I could do was join the forum for some interesting discussions.
****** ABOUT THE EXPERIMENT ***********
This experiment took us over 3 months to finish data collection, since we basically had to let the midterms and finals periods conclude. Those are the peak seasons of system activity (professors submitting grades, students checking grades online, etc), and so most users are captured by our login hooks during those periods.
***** WHAT I ORIGINALLY THOUGHT THE DATA WOULD SAY **********
Another thing I'd like to share is that well before I got data to crunch, I already thought I knew how this would turn out - with absolutely no difference between the highest-GPA group and the lowest-GPA group. In fact, my abstract then (I prepared in advance because I thought I knew what the data would say) concluded with:
"Correlating these with academic performance data from each student’s grade history, the researchers found no relationship between a student’s academic performance and the likelihood of using a weak, compromised password. Our results suggest that weak passwords aren’t because of any level of intelligence, but simple disinterest or lack of awareness. Relevant password policies should be formulated taking this into account."
Once the data was crunched, I had to revise the abstract as necessary.
******** GOING FORWARD *************
I still believe, however, that this result is a fluke due to how many fewer students are in the top GPA tier, and the overall low population of students sampled. There's also concern that the metric used here is too blunt (all these concerns are noted in the paper's conclusion).
A far bigger sample of students (from our sister school) will shed more light on it, which is already a planned follow-up study. In the end, repeated experiments and studies (across more institutions) would likely converge on my original (planned) conclusion - the reasons for weak passwords are more psychological than intellectual.
But, until those studies come in, I really had no choice but to describe the data as it arrived. There's a small difference found here, but this really should be taken as more of a curiosity and a first step in what should be a series of more experiments.
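One way to see why the small top tier matters is to put an interval around each percentage. The sketch below uses a 95% Wilson score interval on two tiers from the table; the choice of method is an assumption made for illustration and is not an analysis from the paper.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# (tier, students with breached passwords, total students), from the table above
for label, breached, total in [("3.5 - 4.00", 5, 39), ("2.0 - 2.49", 92, 464)]:
    low, high = wilson_interval(breached, total)
    print(f"{label}: {breached / total:.1%} (95% interval {low:.1%} to {high:.1%})")
```

The interval for the 39-student tier comes out far wider than the one for the 464-student tier, and the two overlap, which is consistent with treating the difference as a curiosity until larger samples arrive.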