* Posts by jvroig

19 posts • joined 11 May 2018

Boffin: Dump hardware number generators for encryption and instead look within

jvroig

Re: Interesting effect, wrong explanation

Transistor variability as a factor in making results unpredictable is really just to remove the obvious concern of "well, if the target machine is using an i7-7700K, then I can know the possible values from his RNG seeder just by buying an i7-7700K myself!" I call it the "same CPU" loophole, since it kinda makes sense that having the same CPU *should* result in collecting the same timing values (all else being equal, like OS and all other platform stack components).

But that's not so. In the cited Lawrence Livermore National Laboratory paper, they had thousands of identical servers, and no two of them showed the same characteristics when profiled under similar load.

As for running the same task (again, after making sure it isn't optimized away by the compiler, since our point is to "make the CPU work, get the running time, rinse & repeat"), there are lots of factors at play other than transistor variability: data locality, cache access, temperature, voltage, task scheduling and background tasks, thread migration, dynamic power and frequency scaling... there's a lot going on, and right now it's extremely hard to account for all of them. We just know it works, because we've tested on a wide variety of platforms, from an Arduino Uno microcontroller and Raspberry Pis to small-core and big-core AMD/Intel parts.
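
To make the "time the work, aggregate the jitter" idea concrete, here is a minimal Python sketch. This is purely illustrative and not the actual SideRand/MBTA code: the function names, the loop body, and the SHA-256 aggregation step are my own choices here.

```python
import time
import hashlib

def cpu_work(scale=1000):
    # Busy work for the CPU; the exact computation doesn't matter,
    # only that its running time carries hard-to-predict jitter.
    x = 1
    for i in range(scale):
        x = (x * 31 + i) % 1000003
    return x

def collect_samples(samples=1000, scale=1000):
    # Time the same piece of work repeatedly and keep the raw timings.
    timings = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        cpu_work(scale)
        timings.append(time.perf_counter_ns() - start)
    return timings

def seed_from_timings(timings):
    # Aggregate the jitter in the timings into a fixed-size seed.
    return hashlib.sha256(",".join(map(str, timings)).encode()).digest()

if __name__ == "__main__":
    print(seed_from_timings(collect_samples()).hex())
```

Because the interpreter actually executes the loop, there is no optimizing compiler to worry about in this form; in C, the same loop would need to be compiled without optimization.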

The best we could do to minimize OS noise is to run each test at the absolute highest priority (nice -20). We also make sure each machine has minimal services running. For machines that are physically accessible, we also turned off the network adapters.

The Arduino Uno is probably the best case. It literally does nothing but run the uploaded sketch, which is just that loop over and over, sending the timings to a laptop that collects the data. It still works.

Now, I have no doubt there's more work to be done. If, 10 years from now, we want that generation of IT people to think "Huh? Why did you old-timers ever have a problem with seeding? Every CPU is an entropy machine, why did you ever need those HWRNGs?", and we want OS RNG seeding to become a problem of the past and actively a non-concern, then we should be working on simplifying the work loop (it has to be auditable, extremely so, to deter backdoors and other sorts of "oopsies"), testing on all platforms, and standardizing on the simplest, most auditable, yet still effective technique across the board (all devices and platforms).

That's where I hope this research is headed. I want the simplest way of gathering entropy, so simple that it's auditable in one scan of an eyeball, even on live environments. And I want this simplest way to apply, mostly identically, across all devices: embedded, IoT, phones, laptops, large servers, everything. That's the blue-sky scenario. When our tech gets to the point that seeding the OS RNG requires nothing but the CPU itself, and it only ever needs one standard algorithm across the board, then we've hit nirvana. Anyone who audits the seeding will expect the same thing everywhere, so it's harder to get confused, and therefore harder for anyone to get away with a backdoor in the seeding process. And if we rely on just the CPU (which, by definition, is what makes a device a device), then we know all our devices will get access to stronger cryptography. If we demand that manufacturers add diodes, HWRNGs, or come up with their own "safe" implementation of a noise source, you know they just won't do it (cost factor) or they'll screw it up (we can't have people roll their own crypto; that's why we need standards and research).

jvroig

Re: Does nobody do a literature search anymore?

You actually forgot one (which was less popular, less disseminated): Maxwell by Sandy Harris: https://www.kernel.org/doc/ols/2014/ols2014-harris.pdf

I saw all four: HAVEGE (2002 research from IRISA), haveged (Gary Wuertz's implementation), Maxwell, and Jitter Entropy. I knew about HAVEGE and haveged from the start; I only learned about Maxwell and Jitter Entropy later on in the research. (Hi, I'm the cited paper author.)

The main problem I have with HAVEGE / haveged (and other researchers have it too - see, for example, the Maxwell paper) is that it's too complex, or at least perceived as such, and seems to require specific CPU features and per-architecture tuning.

Jitter Entropy is a lot better, more recent, and actively maintained. It just does things that aren't necessary. In my view, that's why it's great for Linux, but will prevent it from scaling across all types of devices and platforms. (Also, Jitter Entropy MUST be compiled without optimization too. Stephan Mueller was pretty clear about that here: http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html)

The conference paper pre-print is very limited in details due to the page limit. However, I write more about the paradigm, key guiding principles, and implementation design of my work on the accompanying research website: https://research.jvroig.com/siderand. There I also deal briefly with the key differences from HAVEGE/haveged, Maxwell, and Jitter Entropy.

Also, what else do you know that works not just in C/C++ (which have close-to-metal features that allow direct memory manipulation, as used in HAVEGE/Jitter Entropy), but even in languages like PHP, Ruby, and Python3, with a wealth of supporting data? As far as I found, nothing else. Micro-benchmark timing aggregation is a straightforward way to guarantee platform-agnosticism, making implementations for any purpose simple and auditable.

Also, what else works without requiring a high-performance (nanosecond-precision) timer? Again, nothing else that I could find - not HAVEGE / haveged, Maxwell, or Jitter Entropy. In fact, my research so far works even on an Arduino Uno, which has an extremely simple processor (only 16 MHz) and a very low-res timer (only 4-microsecond precision), and still shows a collection rate of 3,000 bits per second.

jvroig

Re: Hey El Reg Peeps, Paper Author Here

Hey hammarbtyp,

I'm looking into that now. The main site with my blog (https://jvroig.com) doesn't have a problem, so it looks like only the subdomains are borked. They're all supposed to have Let's Encrypt certificates.

I'll check my cPanel to see what's wrong. This is a very low-traffic site, so it's only on the entry-level SiteGround plan. They're supposed to handle this automatically (which is why the main site is OK), but perhaps subdomains need more involved configuration.

UPDATE: Before actually posting this message, I looked into the panel and it was totally my fault - I forgot to add the "research" subdomain to the Let's Encrypt interface. It's added now.

jvroig

Re: Just tested it

Yeah, it comes with the caveat of "don't compile with optimization".

The whole point of the "algorithm" is just to measure the time the CPU does work; if you let the compiler remove all the work, then of course there's nothing for us to measure.

If you chose the clock() timer (usually the lowest-resolution timer on most modern systems), then those are your worst-case results (again, not counting the optimized builds). Using the nanosecond-level timers will improve your score. If you're on Windows, the jump will be extreme, because for some reason Windows' default timer is super low-res.

But even with just a 75% MFV (most frequent value), you're already golden: that's roughly 0.4 bits of min-entropy per sample, so collecting 1,000 samples gets you about 400 bits of entropy, more than enough for seeding. The versions of the POC after the code cited here switched around the SCALE and SAMPLES settings - I found it was more efficient to lower the scale (how many times to loop before measuring) and increase the samples (how many measurements to take).
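
For reference, here's a minimal Python illustration of that MFV-based estimate. This is not the actual measurement tooling from the site; the toy timing data and function name are mine.

```python
from collections import Counter
import math

def min_entropy_per_sample(timings):
    # Most Frequent Value (MFV) estimate: if the most common timing value
    # appears with probability p, min-entropy per sample is -log2(p).
    counts = Counter(timings)
    p = counts.most_common(1)[0][1] / len(timings)
    return -math.log2(p)

# Toy data with an MFV of 75%: about 0.415 bits per sample,
# so 1,000 samples yield roughly 400 bits of min-entropy.
timings = [7, 7, 7, 8] * 250
per_sample = min_entropy_per_sample(timings)
print(per_sample, per_sample * len(timings))
```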

Even an Arduino Uno (measly 16MHz CPU with a low-res 4-microsecond-precision timer) gets to collect 3,000 bits of entropy per second. That's already the super low-end of results.

Anyway, all these and more are in the updated supplementary site: http://research.jvroig.com/siderand/

jvroig

Re: Just tested it

(Hey, JV Roig here, cited paper author)

Or, more simply, just specifically compile without optimization.

Another good alternative (for systems that support it) is to use a non-compiled language - I tested prototypes in Python3, Ruby and PHP as well, and they run as-is with no need to worry about any optimizing compilers.

jvroig

Yeah, that's indeed a limitation of the C prototype, and of any production C implementation of it.

(JV Roig here, cited paper author).

This isn't a limitation of my design. It's a C thing, and even Stephan Mueller's Jitter Entropy has the same caveat to never compile with optimizations.

However, I do have prototypes in other languages (Python3, Ruby, PHP), and those need no such hand-holding. They just run as is. (The siderand webpage that Tom linked contains all the prototypes and the measurement tools)

In fact, as of today, if you were to ask me what the ideal implementation would be in systems that support it, I'd choose Python. It's not significantly slower (we only need to seed rarely), and it makes the code directly and easily inspectable and auditable even in live environments.

Of course, embedded devices are limited to whatever their dev environment is (so, embedded C). In such cases, they just have to be careful not to compile the seeder code with optimizations. I wish I could remove that small caveat completely, to avoid "oops!" moments, but so far I don't have a good alternative.

jvroig

Re: Very platform dependant

Hey Red Ted,

JV Roig here, the cited paper author.

Testing on embedded devices is indeed a problem. However, I have tested on a lot of platforms that are a good stand-in for the embedded device market:

-Raspberry Pi 3 (quad core, ARM Cortex A53, 1.2GHz, in-order-execution)

-Raspberry Pi 1 (single core, ARM11, 700 MHz, in-order-execution)

-Arduino Uno (ATmega328p, 16MHz, in-order-execution, low-res timer only [4-microsecond precision])

The worst case for my micro-benchmark timing aggregation technique (or "SideRand", as it's still called in this article and paper, but that's the old name) is the Arduino Uno. Yet even there, it gathered 3,000 bits of entropy per second. So for now, I'm pretty confident that micro-benchmark timing aggregation will work on all sorts of devices, from embedded to giant servers.

HWRNG randomness "audits" are unfortunately easy to spoof, since they can only measure the output and can't really infer anything about the source. Imagine a malicious HWRNG that merely iterates through the digits of pi, say 10-20 digits at a time, and hashes them with SHA-256 to give you 256 bits of "randomness". That'll look super random and pass every test. But the adversary who backdoored it to use pi digits can, from just one output, easily predict the next one and recover the numbers you got previously.
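
To illustrate the point, here's a hypothetical backdoored generator sketched in Python, using a hard-coded secret plus a counter as a stand-in for the pi-digit trick described above. The output would sail through statistical tests while being fully predictable to whoever knows the internal state.

```python
import hashlib

# A deliberately "malicious" generator: the internal state is nothing but a
# hard-coded secret and a counter (standing in for the pi-digit example).
# The output goes through SHA-256, so it looks perfectly random to any
# statistical test, yet anyone who knows the secret can reproduce every
# past and future value.
SECRET = b"known-only-to-the-backdoor-author"   # hypothetical constant

def malicious_hwrng(counter):
    return hashlib.sha256(SECRET + counter.to_bytes(8, "big")).hexdigest()

for i in range(3):
    print(malicious_hwrng(i))   # "random-looking", but fully predictable
```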

Or imagine Intel's Secure Key HWRNG. The final step, if I remember correctly, is AES output. If it's basically an AES-CTR DRBG, then it could just be running a counter under a hard-coded key (known to whatever favorite boogeyman you have, like the NSA or Cobra). It will pass every single test you throw at it, and the NSA could still perfectly predict the output.

In a nutshell, all that a statistical test can tell you is that there aren't any obvious patterns, and that your HWRNG isn't obviously broken. Whether it's actually unpredictable is an entirely different issue.

jvroig

Re: For best performance...

The whole "printf" thing isn't really production code. It's just there to enable the research to gather the entropy and analyze the results using frequency distribution.

I know that part wasn't clear in the article, so just explicitly saying it here now.

The prototypes store and output the way they do because they're only meant to collect and output entropy for experimental verification (not just in C, but also the Python3, Ruby and PHP prototype codes; see http://research.jvroig.com/siderand/ for all prototypes and measurement tools as well). They're all designed just to see how much entropy we can gather across all these types of machines - from Arduino Uno, RPi, small-core x86, to big-core x86 machines.

jvroig

Hey El Reg Peeps, Paper Author Here

Hi El Reg!

This is JV Roig, the cited paper author. Glad to be back! (Some of you may remember me from the Password Research article of Tom a few months ago: https://www.theregister.co.uk/2018/05/10/smart_people_passwords/)

I was going to join the comments section earlier (Tom FYI'd me about this article a day ago), but I was busy updating the supplementary site. It's done. If you visit https://research.jvroig.com/siderand now, it contains a lot of new information that deals with the problem better.

I've separated the concepts into two more palatable sections: first, the paradigm ("Virtual Coin Flip"); second, the specific implementation ("Micro-Benchmark Timing Aggregation", which replaces the SideRand name). Please give that page a visit. Not only does it have a good discussion of those two concepts, it also contains all the experimental data, for those of you interested in checking it out.

It also deals more with previous work such as HAVEGE/haveged, Jitter Entropy, and Maxwell, particularly how they measure up against my trust criteria for sources of randomness, and the key differences from my MBTA.

A note on reproducibility: the C code, by its nature, must not be optimized. Remember, we are trying to make the CPU do work; optimization removes that work, so there's nothing for us to measure. This is a C limitation, and exactly the same caveat applies to Stephan Mueller's Jitter Entropy.

However, you'll find that the PHP, Ruby and Python prototypes need no such hand-holding. Download those prototypes from the webpage, along with the tools I used to gather and profile the resulting entropy. That's all you need to see how much entropy it can gather on your system. And of course, none of this is production code - I imagine it will end up mostly intact and similar to actual production-grade code, but primarily these are prototypes for measuring how much entropy is there.

A final note on embedded devices: how confident am I that this works even on embedded? 100% confident. It's not included in the pre-print you've read, as that still needs to be updated with the newer results, but I've tested on a bunch of original RPi 1 boards (700MHz ARM11, a very old in-order-execution CPU), and it still works.

In fact, I've also tested on an Arduino Uno - that's a microcontroller with a very slow, simple 16MHz processor and a low-res timer (4-microsecond precision). The optimized MBTA code there was able to extract 3,000 bits of entropy per second. That's overkill for seeding, even in such a worst-case environment (a combo of simple CPU + low-res timer).

Bombshell discovery: When it comes to passwords, the smarter students have it figured

jvroig

Hi Non-SSL Login,

Yeah, that's exactly right. There should be ZERO difference, from my point of view (see my last comment posted before this one, where I explain what my original assumption was when the experiment started).

As I wrote in my last comment, I even pre-wrote the abstract to highlight how the results would show that weak passwords stem from disinterest, not intelligence (as I expected). It's just that the data (limited by the constraints noted in the paper) showed otherwise when it finally came in, so I had to write the paper as the data said, instead of what I thought it should say. All I could do was note all the constraints in the conclusion, and emphasize that, with all the limitations, this is just a curiosity and should only be regarded as a first step in a series of more refined experiments.

In the end, I'm sure improvements in data size (more people), different environments (different school, different company), and improvements in the measurement tool itself (updated breach corpus, adding localization) will end up showing that intelligence is not really a big factor when it comes to weak passwords or sub-optimal password habits. I'm squarely in the camp that human/psychological issues are the main determinant, hence the updated NIST / Microsoft Research guidelines that are less technical and more user-friendly.

jvroig

Just sharing some details from the paper

Hello, Register community.

I'm JV Roig, the paper author. I just spent the past two days answering some of the more interesting comments, and I haven't really had the time (until now) to actually make a post that I wanted - a post that shares some interesting things about the paper itself and how it came about.

First, let me go ahead and share what I believe are the actually interesting findings (which the article does not show). This is the table of results when users with breached passwords are categorized per GPA tier:

GPA tier | Total students | Students with breached passwords | % with breached passwords

3.5 - 4.00 | 39 | 5 | 12.82%
3.0 - 3.49 | 203 | 32 | 15.76%
2.5 - 2.99 | 446 | 66 | 14.80%
2.0 - 2.49 | 464 | 92 | 19.83%
1.5 - 1.99 | 99 | 19 | 19.19%
1.0 - 1.49 | 1 | 1 | 100.00%*

*The single student who had a lower than 1.5 GPA also happened to use an unsafe password. At least he/she is consistent, eh?

This table shows students sorted according to GPA tiers. The top GPA tier is pretty low at 12.82%, and the rate climbs to around 19% near the bottom. The rise isn't perfectly linear, but you can clearly see that the trend goes up.

I wish the article itself showed this (instead of the data snippet that ended up being featured), but Tom Claburn (due to the timezone difference with me) did not have enough time to reach me before needing to publish. I woke up with this article already done and published, so all I could do was join the forum for some interesting discussions.

****** ABOUT THE EXPERIMENT ***********

Data collection for this experiment took us over 3 months, since we basically had to let the midterm and finals periods conclude. Those are the peak seasons of system activity (professors submitting grades, students checking grades online, etc.), so most users are captured by our login hooks during those periods.

***** WHAT I ORIGINALLY THOUGHT THE DATA WOULD SAY **********

Another thing I'd like to share is that well before I got data to crunch, I already thought I knew how this would turn out: with absolutely no difference between the highest-GPA group and the lowest-GPA group. In fact, my abstract at the time (prepared in advance because I thought I knew what the data would say) concluded with:

"Correlating these with academic performance data from each student’s grade history, the researchers found no relationship between a student’s academic performance and the likelihood of using a weak, compromised password. Our results suggest that weak passwords aren’t because of any level of intelligence, but simple disinterest or lack of awareness. Relevant password policies should be formulated taking this into account."

Once the data was crunched, I had to revise the abstract as necessary.

******** GOING FORWARD *************

I still believe, however, that this result is a fluke due to how few students are in the top GPA tier, and the overall small population of students sampled. There's also the concern that the metric used here is too blunt (all these concerns are noted in the paper's conclusion).

A far bigger sample of students (from our sister school) will shed more light on it; that's already a planned follow-up study. In the end, repeated experiments and studies (across more institutions) would likely converge on my original (planned) conclusion: the reasons for weak passwords are more psychological than intellectual.

But, until those studies come in, I really had no choice but to describe the data as it arrived. There's a small difference found here, but this really should be taken as more of a curiosity and a first step in what should be a series of more experiments.

Regards,

-JV

jvroig

Re: minimum length but what maximum length

Hello, eionmac, that's a very interesting question.

Your question in essence is "how will the user know what the MAX LENGTH of an acceptable password is?". Let's call that "Question 1". The way you specifically phrased it, though, leads to another very interesting nuance - the length of the password field itself (within the database), as it relates to the maximum acceptable length of any password. Let's call this "Question 2".

For Question 1, that's something that authenticators / sites / apps / systems should clearly tell the user. As far as NIST is concerned, though, the maximum should be long enough to be practically unlimited for the vast majority of users (say, 100-200 characters). The only reason we need to limit it at all is to avoid potential DoS (denial-of-service) attacks. This is something the security community learned the hard way when the Django framework was found to be DoS-able just by copy-pasting several MBs of text into the password field and submitting; the back-end then has to hash that giant piece of text, potentially creating a huge bottleneck on the server.

For Question 2, it's important to understand that whatever the policy is for the maximum password length, it has no relation to the length of the password field in the database. That's because you should only be storing the *HASH* of the password - whether the password was 5 characters long or 500, and whether the salt is 4 characters or 40, the final value to be stored is always the length of the resulting hash.

For sites that don't specify a maximum length, yes, that's a scenario that should certainly be improved. The maximum can easily be set to something like 1,000 characters without much threat of an easy DoS (that effectively means we'll just be hashing 1KB of data, which isn't really a problem, though it depends on your specific algorithms and implementation, as well as the capabilities of your back-end servers and expected volume of users). Providers/authenticators/services/sites should then state that sort of information clearly. The only reason most sites don't do it right now is, in my view, that they have too many complex rules. Once we move on and the rules users need to know are just 2 or 3 things, this situation will probably solve itself.
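
As a rough Python sketch of both points (the length cap against hashing-DoS, and the fixed-size stored hash), with PBKDF2 standing in for whatever slow KDF a real system would use; the constants here are illustrative, not a recommendation.

```python
import hashlib
import os

MAX_PASSWORD_LENGTH = 1000   # generous cap, only there to prevent hashing-DoS

def store_password(password: str) -> bytes:
    if len(password) > MAX_PASSWORD_LENGTH:
        raise ValueError("password too long")
    salt = os.urandom(16)
    # PBKDF2 here stands in for whatever KDF you actually use (bcrypt,
    # scrypt, Argon2, ...). Whatever the input length, the stored value
    # is always salt + a fixed-size digest.
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt + digest          # 16 + 32 = 48 bytes, regardless of input

print(len(store_password("short")), len(store_password("x" * 500)))  # 48 48
```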

jvroig

Hello NonSSL-Login,

(JV Roig here, the author of the cited paper)

re: "The article says that 215 students hashes were in Troys database and states this was down to bad/unsafe passwords. Wrong. They are in Hunts databases because they happened to be signed up to websites that got hacked. There is no relation to IQ at all."

Yes, that's actually a pretty good point, although I would like to say it's a little bit more nuanced than that. Not everyone on the list would have been personally pwned - some have just settled on really bad passwords (especially passwords or patterns that look strong, but really aren't due to predictability).

For example, if their password is outwardly (seemingly) "strong" / "secure" (say, it's long and has some substitutions), the chances of another person thinking of the same password should be statistically nil (infinite combinations vs very finite human users) - unless of course, the chosen "secure" password is actually predictable due to human nature. Microsoft itself actually has some interesting research findings about this, particularly about how users will predictably choose passwords given very strict, arcane password requirements.

If they are in the habit of re-using passwords, then that password (and habit itself) really should still be considered unsafe. A way to think about that would be: a password can be unsafe because of "physical" characteristics (e.g., too short), "psychological" characteristics (e.g., too predictable based on human nature, such as adding "1" at the end of a common password when asked to use at least one number), or "environmental" characteristics (e.g., an otherwise strong password that is now much more vulnerable [weak] due to being used widely in different sites).

So either way, unsafe.

But as mentioned in my opening, yours is a very good insight, thanks. That's exactly why any interesting findings, even if they're just curiosities at the moment, are shared: so that others can see them and potentially draw interesting insights from them, such as yours. Further studies, for example, can now explicitly look for that factor, attempt to mitigate or control for it, or simply measure its impact. Science at work!

Regards,

-JV

jvroig

Hello Jtom!

Thanks for sharing that very nice insight, which I'll quote here for the benefit of those who end up reading this comment.

Jtom said: "Naw. I know an idiot who has the best password protection. He creates an eighteen-character (or the longest permitted length) password of random alpha-numeric, upper/lower case, and special characters, does not maintain a copy of it, and doesn't try to memorize it. Then he resets the password everytime he wants to log in."

That is exactly a common way users deal with complex password rules. This is why NIST's latest guidelines (as well as Microsoft's latest research publication on the matter) both point toward a saner set of rules for the future, such as:

1.) No more arcane requirements.

2.) Let them use ANY character they want, even emojis. No character should be off limits. (Looking at you, old bank systems)

3.) Just demand a minimum length.

4.) Check the password against a (regularly updated) list of the most common bad / breached passwords, and notify users (a minimal sketch of such a check follows below).
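
Here is a minimal Python sketch of what rules 3 and 4 might look like in code. The helper name and the toy breached-password set are mine; a real deployment would load a regularly updated corpus such as Troy Hunt's Pwned Passwords.

```python
MIN_LENGTH = 8
BREACHED = {"password", "123456", "qwerty1"}   # toy stand-in for a real corpus

def check_password(candidate):
    problems = []
    if len(candidate) < MIN_LENGTH:
        problems.append("too short")
    if candidate.lower() in BREACHED:
        problems.append("appears in a known breach")
    # Note what's *not* here: no composition rules, no forced symbol/digit
    # requirements, no banned characters.
    return problems

print(check_password("qwerty1"))   # ['too short', 'appears in a known breach']
```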

Basically, these new rules DECREASE the technical requirements, so that we end up decreasing the mental load on (very uninterested) users. In other words, we're trading technical features for human/psychological ones. And since authentication (passwords, remembering them, typing them...) is a very human-centric activity, users can use all the psychological friendliness they can get. The end goal? We hope, at least, that users will then have a far easier time actually thinking of good passwords, stop writing them down on post-its, and waste less of IT's time on recurring "please reset my password" support calls.

We're betting, basically, that accounting for human factors, instead of solely technical factors like most traditional password policies do, will make the whole thing much better.

Thanks for the interesting comment, Jtom!

Regards,

-JV

jvroig

Re: Pass the salt...

Hello, anothercynic.

No, we haven't really been using that. The entire research activity that produced this paper (and the more expansive paper currently under conference consideration and peer review) has mostly been centered around matching with Troy Hunt's breach corpus and the NIST "match-against-bad-passwords" guideline.

We'll definitely consider looking into having that included in the design of follow-up experiments.

I'm very glad this Register article attracted a fellow security researcher. I hope reading the paper itself gave you some new insights into your own job, as your comment here has given us an interesting thing to consider for our future ones! [science++]

Regards,

-JV

jvroig

Re: Pass the salt...

Hello d3rrail,

This is JV, the paper author.

Yes, indeed SHA-1 is broken. We don't use SHA-1 for storing the ACTUAL passwords. In fact, Asia Pacific College **DOES NOT STORE PASSWORDS AT ALL** (caps only to emphasize, not shout).

We manage to function despite not storing any passwords by simply off-loading that whole problem to Microsoft - we're a Microsoft Showcase School, and our single-sign-on back-end relies on Microsoft's Active Directory hosted on Office 365 (cloud). In effect, Microsoft's own infra, and none of ours, ends up storing actual passwords, in whatever form.

So what was the role of SHA-1 in the experiment? It was merely there to match against Troy Hunt's list. He gave away a list of 320M (now 500M+) breached passwords. But to make sure that list is next to impossible to weaponize, he didn't release them in plaintext: instead, every password in that list has been transformed into its SHA-1 hash. So his list, ultimately, is a list of SHA-1 hashes.

On our end, then, when a user logs in, we just pass the password through Microsoft's single-sign-on API. For this research, we added an extra step: after passing it to Microsoft and getting a confirmation of user validity, that same password is hashed using SHA-1, and the hash is sent to our research infrastructure, which hosts Troy Hunt's list of SHA-1 hashes. If there's a match, we know the password is compromised, even though we never end up knowing what that actual password is.
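
A minimal Python sketch of that matching step, assuming the corpus has already been loaded into a set of uppercase SHA-1 hex digests; the function name and toy set are mine, not the actual research code.

```python
import hashlib

def is_breached(password, sha1_set):
    # Troy Hunt's corpus is distributed as SHA-1 hex digests,
    # so we only ever compare hashes, never the plaintext itself.
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest in sha1_set

# Toy stand-in for the real 500M+ entry list loaded into the research store.
known_breached = {hashlib.sha1(b"password123").hexdigest().upper()}
print(is_breached("password123", known_breached))                  # True
print(is_breached("correct horse battery staple", known_breached)) # False
```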

I hope this clears it up. The confusion is understandable. The paper itself has this detail, but of course it's probably too technical to be included in the media article above.

Thanks.

-JV

jvroig

Re: Sample size 1252

Hi Pat,

This is JV, the author of the paper.

The things you're asking for are actually in the original paper, linked to in this Register article by Tom.

But I don't want to hash them out here (you can easily look at the paper), because the more central point is this: as this article's parting words go (which are literally a quote of how the paper itself ends), these findings are a curiosity, and not the be-all and end-all. They're nothing more than an interesting data point for a potential series of future experiments.

You shouldn't treat the results here as anything more than "hmmm... interesting, we should do more and possibly bigger experiments like this...", because even the author (that's me) thinks exactly just that. No more, no less.

Thanks.

-JV

jvroig

Re: So

Well, it doesn't really work that way (author of the cited paper here).

This paper - which I just uploaded to arXiv rather than to any peer-reviewed conference or journal - is just a side effect of a larger-scale effort centered mostly around compliance with NIST's latest (June 2017) guidelines on password handling. A more expansive paper with more relevant statistics (% of compromised password use, length, gender, etc.) is currently under consideration and review for an international conference presentation (so I'm not sure this comment section is the best place to expound on it). The paper cited in this article, compared to the one currently under review, is really more of a curiosity. In fact, the paper itself says so - as this article quotes at the end.

Aside from that, we (my team) do a lot of other things in cryptography, security, disaster recovery and databases. The paper cited here is pretty much one of the blander outputs we have. That it got covered here is a surprise to me. I had no idea there were journalists who scour arXiv (a pre-print sharing site), and certainly no idea Tom would find it worth writing up.

What this article doesn't share, though, is that the methodology section in the paper also shares a useful thing: a simple way to implement the NIST guidelines of checking against compromised passwords. Not only does it point potential readers to Troy Hunt's password trove, it also lays out the process and potential implementation that can be used to adhere to the new NIST password guidelines.

jvroig

Re: Pass the salt...

Hi, I'm JV, the paper author.

The methodology is described in detail in the paper, but the short version is:

1.) The process is no different from how a normal login works: only hashes are stored in the database, so whatever you typed is hashed first before being compared against the authentication database. This is why a modern login system never really needs to process a plaintext password, other than to hash it to start the actual comparison.

2.) In our case specifically, since we control internal systems, whenever a user submits a password to log in, we basically just hash it according to Troy Hunt's list standard (SHA-1). That SHA-1 of the password is then compared to Troy Hunt's list. If we get a match, then we know it's a breached/compromised password - though we don't know what the password itself was. To make sure we handle all data ethically, we never store plaintext passwords within the research infrastructure. We also don't even store usernames: we hash the usernames themselves as well, so that even user identities are anonymized within the research databases (yes, despite not storing passwords in the first place). A rough sketch of that flow is below.
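
The sketch below (Python) shows the general shape of that flow, with keyed HMAC-SHA256 as my illustrative stand-in for the username hashing; the names, key, and record format are assumptions, not the actual research code.

```python
import hashlib
import hmac

# Illustrative stand-in for the "some sort of hashing of usernames" above.
RESEARCH_KEY = b"per-study secret, kept off the auth systems"  # hypothetical

def anonymize_username(username):
    return hmac.new(RESEARCH_KEY, username.lower().encode(), hashlib.sha256).hexdigest()

def record_login_check(username, password, breached_sha1):
    # Only a pseudonymous user ID and a yes/no flag ever reach the research DB;
    # neither the plaintext password nor the real username is stored.
    pw_hash = hashlib.sha1(password.encode()).hexdigest().upper()
    return {"user": anonymize_username(username), "breached": pw_hash in breached_sha1}

print(record_login_check("jdoe", "password123",
                         {hashlib.sha1(b"password123").hexdigest().upper()}))
```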

In general, what we did was really no different from what the NIST June 2017 guidelines mandate. We just figured that process would contain interesting information, so we tried to see what interesting data could be uncovered.

Thanks,

-JV
