back to article 'First ever' SHA-1 hash collision calculated. All it took were five clever brains... and 6,610 years of processor time

Google researchers and academics have today demonstrated it is possible – following years of number crunching – to produce two different documents that have the same SHA-1 hash signature. This proves what we've long suspected: that SHA-1 is weak and can't be trusted. This is bad news because the SHA-1 hashing algorithm is used …

    1. Kiwi
      Boffin

      What would have been truly clever if the documents had differed by more than 0.0147%, all that work and to make it work they change next to nothing.

      What if it's an executable or source code, and that 0.0147% change defeats some sort of password checking or other security routine? Maybe it changes a variable (if password=correct set correctpass=1 else correctpass=0) to a constant (set correctpass=1)1? How many bytes would that need?

      1 Ok, probably not even close to any RW code but I hope you get the idea. If not, back to remedial English comprehension for you..

  1. Anonymous Coward
    Anonymous Coward

    Nice work!

    Everyone by now should realize computer/data security is a moving target...

    Out with the old in with the new.

    Long live SHA-256!

    Well...for at least...hopefully...to the end of the decade.

  2. Ken Hagan Gold badge

    Do we need to do anything about old content?

    If we have, say, a legal contract that was timestamped and signed with SHA-1, will it be possible in the future to produce a different contract with the same (past) timestamp? If so, is it possible to defend against these attacks now by counter-signing them today with SHA-256? (I'm thinking that the counter-signature would prove that someone in 2017, whilst the SHA-1 signature was still worth something, vouched for the original contents and *that* counter-signature won't be similarly vulnerable for ... well, a few more years yet?)

    1. Anonymous Coward
      Anonymous Coward

      Re: Do we need to do anything about old content?

      That is a different class of attack: A second pre-image or "Find a Y with the same hash as X".

      A collision is "Find any X and Y with the same hash".

      1. Charles 9

        Re: Do we need to do anything about old content?

        But isn't that what was demonstrated here just now? They copied the first document and then altered it to produce the same hash as the first one?

        1. MJB7

          Re: Do we need to do anything about old content?

          No. They manipulated both first and second document until they got an identical hash.

          1. Kiwi
            FAIL

            Re: Do we need to do anything about old content?

            No. They manipulated both first and second document until they got an identical hash.

            Back when SHA-1 was new and "not in your lifetime" crackable, you signed a contract saying that "in the case of early termination of your contract we will pay you $10,000".

            With a slight bit of re-working, it now says "in the case of early termination of your contract you will pay us $10,000"

            Yes, there could be reason to have concern about old documents being tampered with.

            1. Ken Moorhouse Silver badge

              Re: Do we need to do anything about old content?

              Too late.

              If old content has been stored in the cloud, or on any other kind of backup medium, then it can be recalled for further analysis when the time comes.

    2. Anonymous Coward
      Anonymous Coward

      Re: Do we need to do anything about old content?

      Good questions! I'm not a security professional, merely a senior linux admin, but I can imagine this scenario if you don't mind.

      "will it be possible in the future to produce a different contract with the same (past) timestamp?"

      Yes, unless you wrap the storage of these items in a tight atomic transaction and perhaps have a live auditing task that can see/log a "new" document being added back into the repo using a past creation date, or some other similar flagging of suspicious activity, or you lock down the access using FACLs and audit similarly "bad things" could happen to those documents at rest on disk. And you should also be leveraging any access control abilities from the storage subsystem. With physical access a miscreant could still gain access and enough knowledge of the system to hide this kind of activity, so it's not completely bullet proof, but you see where I'm going with this. In the Linux/Unix world there are commands that can easily set creation/access/modification times to anything within range, most likely throughout the "Unix Epoch," as long as you gain write access to it. FACLs can further reduce this exposure, but at most sites the traditional file permissions are still used. Hint; if you have super important docs like the ones you described, use all the methods above and available, and reduce the number of humans able to reach them, and confine the humans to role based access lists; document creators can only create, readers can only read, modifiers can read/modify but not create, etc. Or archive to an offsite, offline repo if you don't want to rework the document encryption and don't need near-line access. There's a lot you can do to protect data.

      "is it possible to defend against these attacks now by counter-signing them today with SHA-256?"

      Yes, re-encrypting the SHA-1 docs into SHA-256 should be trivial, and you would not need to decrypt the old ones, unless you think that would confuse future caretakers of the document storage systems. In other words, it does not matter if you remove it or not, the outer most layer of encryption protects it, and extra encryptions underneath that are just icing on your security cake. Even if it's rot13, it's another layer to wade through to get the goods.

      1. Red Bren
        Joke

        Re: Do we need to do anything about old content?

        "Even if it's rot13, it's another layer to wade through to get the goods."

        How many layers of ROT13 should I use? At least two...

  3. Tom 7

    Can anyone prove SHA256 is any better?

    You'd think it might be but is it mathematically proven that it wont have collisions within the same range?

    As an aside just proving a 1/2 million character PDF document has not been changed is irrelevant as I doubt its ever been read in the first place - that's potentially >10,000 pages.

    1. streaky

      Re: Can anyone prove SHA256 is any better?

      It's more computationally expensive and the key space is larger so.. yes? Prove is like trying to prove god doesn't exist though..

      1. Elf
        Holmes

        Re: Can anyone prove SHA256 is any better?

        >> Prove is like trying to prove god doesn't exist though.

        ... 'But, says Man, the Babel fish is a dead giveaway, isn't it? It could not have evolved by chance. It proves you exist, and, by your own arguments, you don't. QED.'

      2. Naselus

        Re: Can anyone prove SHA256 is any better?

        "Prove is like trying to prove god doesn't exist though.."

        No, it isn't. Math is math and you can create a proof that something can't happen.

        1. streaky

          Re: Can anyone prove SHA256 is any better?

          You can already mathematically prove it's more expensive and time consuming - and to what degree. Once you physically prove the state of it you've by definition already found a collision and thereby proven that god doesn't exist.

  4. Will Godfrey Silver badge
    Angel

    Phew!

    Only a month ago I went through the pain of changing to SHA 256. When your system is so old even the spiders won't grace it with their webs, it's difficult to remember how you set it up, and what you need to do to update it.

  5. Ken Moorhouse Silver badge

    Any such proof of authenticity...

    ...effectively has a "time-limit" on it.

    This means that if you want a document to be of indisputable provenance, it still needs to be stored traditionally e.g., in a solicitor's safe.

    1. Charles 9

      Re: Any such proof of authenticity...

      Unless, of course, someone cracks the safe and then surreptitiously alters the contents...

      1. Ken Moorhouse Silver badge

        Re: Surreptitiously

        "Unless, of course, someone cracks the safe and then surreptitiously alters the contents..."

        Hmm, having surreptitiously broken into the building, the office, and the safe?

        As the contents of solicitor's offices are not of particular interest to the average burglar, it would arguably be quite easy to guess motives in such an eventuality.

        1. Charles 9

          Re: Surreptitiously

          But if done right, hard to prove. You can create a "he said, she said" situation. And this can be of significance if the document involved is, say, a last will.

    2. danbi

      Re: Any such proof of authenticity...

      The irony being that the digital realm of documents doesn't add much credibility and authentic proof over what was available with stone age documents. Both can be manipulated, after a while -- and we enjoy the wonders of "history", full of fairy tales that rival contemporary fantasy genre...

      No wonder, some older cultures didn't trust written "knowledge".

      1. Charles 9

        Re: Any such proof of authenticity...

        "No wonder, some older cultures didn't trust written "knowledge"."

        But at the same time, it's hard for people like us to believe people once relied on other people's memory, which we now know has plenty of potential to get muddled and messed up, especially with age. And I haven't even touched on deliberate fabrications (eg. one lies, the other swears by it).

        Seems you can't win either way.

  6. This post has been deleted by its author

  7. Andy 73 Silver badge

    Court of law

    If you were relying on a document in a court of law and it had been signed with SHA1, then my assumption is that if two parties produced two documents with the same hash, there would be some examination of the documents. The guess is that the original would be 'clean' and the doctored one would have the doctored text (the subtle addition of the word 'not' where appropriate) PLUS some additional garbage to bring the signatures back into line. As it's astronomically unlikely to produce the same signature when only making a meaningful semantic change to the text, it should be possible for a forensic examination of the documents to identify which is 'clean'.

    However, if SHA1 is only being used in an automated system (Git etc.), then the issue is much more pressing - the assumption being that the system will not be coded to identify a collision, which would be processed and actioned unnoticed.

    1. Ken Hagan Gold badge

      Re: Court of law

      "... PLUS some additional garbage ..."

      The problem with that is that some popular document formats, such as DOC, have such garbage in the original as well.

      (For those too young to remember DOC rather than DOCX, or who have never looked into the actual format, all the original Microsoft Office file formats are layered on top of something called Structured Storage, which is like a miniature file system within a single file. Like any other file system, this divides the container into fixed sized blocks and if an individual stream is not an exact multiple of the block size then there is wasted space within the container. It costs a little time to zero out such blocks, so it isn't always done. In particular, I think WORD had a "fast saves" option that was on by default which would have this effect. Consequently, I rather suspect that the majority of older DOC files have plenty of legitimate garbage that you could use to compensate for your malicious modification.)

      (I believe that DOCX and ODT are also structured formats but the container in this case is a ZIP archive. I don't know if that is similarly blighted with wasted space. Perhaps someone will jump in and tell us. If so, then the vast majority of documents that one might want to sign have plenty of wiggle room for this sort of attack.)

      1. allthecoolshortnamesweretaken

        Re: Court of law

        Ahh, that explains a lot... I always knew that disabling the "fast save" option in, say WinWord 2.0, was a good idea as "fast save" had a tendency to fuck up your files (not something you'd want, especially while working on your thesis), but now I know why.

        As to the the article - "Today, 10 years after of SHA-1 was first introduced, we are announcing the first practical technique for generating a collision," the research team said today. - I can't shake the feeling that maybe another team, possibly based in Anne Arundel County (MD), got there already a couple of years ago, but didn't bother with announcing it.

  8. Crazy Operations Guy

    This is why I use multiple hashes

    Hashes are quick and easy to compute, so there is no reason to not calculate multiple hashes for the same data.

    How much compute power would be required to create a counterfeit document that matches both the original's SHA-1 *and* MD5 hashes.

    Since its so quick and easy to do, I've calculated the MD5, SHA-1, and SHA-256 hashes for everything important on my systems / network.

    1. -tim
      Coat

      Re: This is why I use multiple hashes

      For things that I need to make sure haven't been changed, I tend to use a modified version of an existing hash as well. All modern hashes are seeded using a table. MD5 uses a table of sines, SHA1 uses a values that start out as 0x67 45 23 01 and SHA-256 uses fractional parts of cube roots of primes. Simply swapping the 6 and 7 in SHA-1 and rebuilding the source provides a different hash that should be as cryptographically as secure as SHA-1 yet an attacker would need to know how you modified the seeds as well to compute a colliding document. Swapping the 6 and 7 does produce different hashes for the example files. If you treat your modified table as a secret key, you have that strength but even if you publish your table, that forces an opponent away from precomputed rainbow tables. This is much like seeding in password hashes and since the tables seem to be arbitrary chosen numbers, it shouldn't have the same problems that picking the wrong s-box values does in code like DES.

    2. Ken Hagan Gold badge

      Re: This is why I use multiple hashes

      Matching SHA-1 and MD5 simultaneously might be easier than you (or I) think, else the security pros would have recommended exactly that technique instead of inventing SHA-256.

    3. Tomato42
      Boffin

      Re: This is why I use multiple hashes

      Using two hashes doesn't work to increase security above the security of the stronger one: https://www.iacr.org/cryptodb/archive/2004/CRYPTO/1472/1472.pdf

      Just use SHA-256, it has been in all cryptographic libraries for over a decade already!

      1. Anonymous Coward
        Anonymous Coward

        Re: This is why I use multiple hashes

        Tomato42@,

        Of course, you are only as secure as the 'Strongest Hash' but doesn't the recording of Multiple Hashes make the calculation of a Hash Collision harder (impossible ???) as you would have to calculate a hash collision for all the Hashes, with the same document.

        (Any mis-match of any of the hashes would flag up a problem, so even if SHA-256 was broken you would still have to match the other hashes.)

        I don't have the math skills to prove whether this is feasible or even possible. !!!

        1. Charles 9

          Re: This is why I use multiple hashes

          The problem is that all hashing functions work on the same fundamental principles, plus there's the Pigeonhole Principle to consider (due to hashes being smaller than their documents, collisions MUST occur). The paper above demonstrates you can correlate multiple hashing functions so that finding a collision for all is as easy as finding a collision for one.

          Now, an alternative proposal may be to chain hashes by hashing the whole document as well as particular segments of the document, producing multiple overlapping checks. The Merkle Tree is an example of this technique, though in this case a fixed-structure hash chain would probably be better-suited and more robust against preimage attacks. The technique is also algorithm-agnostic so can be moved up from SHA-1 to SHA-3 or whatever.

  9. Ken Moorhouse Silver badge

    Concoct forged document

    The point about forging a document and the difficulty of making that forgery convincing is compelling.

    However, with the passage of time and the likelihood of people cracking these algorithms, the likelihood of being able to produce a convincing forgery is increased to the extent that people should never rely on electronic signatures if the life-span of the document's importance exceeds say one or two years.

  10. Anonymous Coward
    Anonymous Coward

    Its interesting to note`

    That some of the earlier versions of Cryptowall used MD5 as well as weaker versions of SHA in their functions early on, before generating the 2048 bit AES key from a hash of various unique identifiers.

    Using this method, it might be possible to reverse engineer these versions to fool the software into undoing the damage by faking a paid ransom key.

    This assumes the following.

    1) you have an identical copy of the machine, byte for byte,

    2) the date, time etc of the machine is also identical

    3) the hardware is also identical eg RAM SPD data, CPUID, LAN address etc.

    This would be the computational equivalent of running time backwards, then copying the key as it is being generated and using this to extract the original files.

    Any takers?

  11. This post has been deleted by its author

  12. Oh Homer
    Headmaster

    Sound and fury, signifying nothing

    Given enough time and resources, all security will be breached. The point of security is not to stop the inevitable, but merely to slow it down long enough for the breach to no longer matter.

    Calculating just a single collision in 6,500 years of CPU time is not exactly what I'd call the death blow for SHA1. That's still significant enough that manufacturing collisions for low-grade material just won't be worth the effort, and even military stuff will be old news by the time you've compromised it.

    Maybe if you could spoof target coordinates in under 60 seconds using a smartphone you might have a problem, or if you'd be inclined to waste 100k+ on the equipment necessary to fiddle with the terms of somebody's 50k contract, but as it stands none of that is likely any time soon, which means the security is doing its job as designed.

    That will not always be the case for SHA1, but it's far too soon to hit the panic button, unless your primary interest is hype not security.

    1. Anonymous Coward
      Anonymous Coward

      Re: Sound and fury, signifying nothing

      "Maybe if you could spoof target coordinates in under 60 seconds using a smartphone you might have a problem, or if you'd be inclined to waste 100k+ on the equipment necessary to fiddle with the terms of somebody's 50k contract, but as it stands none of that is likely any time soon, which means the security is doing its job as designed."

      Unless, of course, that 50k contract has knock-on effects that can propel the total into the millions, in which case you now have good RoI.

    2. tom dial Silver badge

      Re: Sound and fury, signifying nothing

      6,500 CPU-Years is a few weeks of work on a 150,000 unit botnet. It is well not to make too much of numbers that criminals can be downscale easily. For that matter, the US DoD has nearly three quarters of a million civilian employees, nearly all of whom are provided workstations that could, in principle, be recruited for about 100 hours a week, producing 6,500 CPU-Years in a week or so.

      Producing a document tailored to specific semantic change might (or might not) be many orders of magnitude more expensive, but sticking with SHA-1 based on some big numbers seems rather foolish.

  13. WatAWorld

    Is this the same Google that has been unable to implement an automatic update for Android?

    Dumb question, I know, but is the Google that sponsored this the same Google that has been unable to implement an automatic update for Android?

    Maybe instead of publishing "how to hacks" for other people's products and systems they should put some more time and effort into making their own products secure:

    1. Figuring out how to make automatic updates work (like Apple did decades ago) and,

    2. Getting vendors and re-sellers to agree to let that automatic update process work (like Microsoft did a decade ago).

    Or is this some different Google?

    I know this other stuff, zero days in other people's products and systems, is important.

    But JOB ONE for Google should be taking care of its own products, its own customers and making stuff attached to its name secure -- rather than sitting up their in their giant glass house throwing stones.

    1. Tomato42
      Facepalm

      Re: Is this the same Google that has been unable to implement an automatic update for Android?

      you know that google is a large company, and thus it's impossible for all of their employees to work on the same thing? You know, "too many cooks" and all that...

    2. Oh Homer
      Headmaster

      Re: "unable to implement an automatic update for Android"

      Not defending Google, in fact I agree with the sentiment at least, but the fact is there is no such thing as one "Android OS" that Google can deploy to millions of handsets. Instead there are countless proprietary smartphone operating systems based on Android, deployed by various manufacturers to devices that Google has no access rights to, at least not for the purpose of updating the OS.

      And if Google could somehow update the OS remotely, it simply wouldn't be the same custom OS as provided by the vendor, it'd be vanilla Android, missing all the proprietary bits necessary to keep the vendor's proprietary UI and shovelware running, missing the vendor's proprietary drivers necessary to keep the hardware working, and would thus also invalidate your warranty, which is with the vendor, not Google.

      This is the problem with OEM arrangements in general, not just with Android in particular, and is a problem that Apple avoids by being both the OS developer and the hardware vendor.

      Of course another way to avoid this problem would be to abolish all the proprietary junk in both the hardware and OS, and have an open hardware, open specification, open source solution where everyone could apply daily, incremental updates to every part of the system, including apps and core OS components, but then companies like Samsung would whine about losing their "competitive advantage", as we'd all be able to construct our own smartphones from kits in Maplin.

      Actually an even more fundamental reason for all that proprietary junk in smartphones is that the FCC/CEPT/Ofcom mandate that a lot of it must be a black box, incapable of being fiddled with by consumers, but that's another story.

      1. Charles 9

        Re: "unable to implement an automatic update for Android"

        "Of course another way to avoid this problem would be to abolish all the proprietary junk in both the hardware and OS, and have an open hardware, open specification, open source solution where everyone could apply daily, incremental updates to every part of the system, including apps and core OS components, but then companies like Samsung would whine about losing their "competitive advantage", as we'd all be able to construct our own smartphones from kits in Maplin."

        Due to the competitive nature of the market, particularly in mobile, open hardware is not going to happen, as trade secrets and patents (and we're talking hardware here, so their use here is valid) are in play. This is also one reason one can't just make a completely open mobile OS because a lot of mobile hardware is black-boxed to prevent "Giving Information To The Enemy" and the interfaces only come in binary blobs complete with contracts and so on.

  14. YARR
    Boffin

    In the spirit of unit-y...

    I propose that this house agree on the need to establish a definitively quantifiable unit of elapsed processing that is not time or cost based, and to adopt said unit henceforth.

    1. Vath

      Re: In the spirit of unit-y...

      The need is seconded.

      I'd like to point out that although the need is great so too is the task. It is my opinion that measurements of computational exertion can only be divided into the number of operations required or the time required for such operations to complete on a known set of hardware. With this assumption I'd like to make a few suggestions:

      * The Moon Mission (estimated total processing power used in the Apollo 11 mission)

      * The Colossus (estimated total processing power used by the Colossus machine to break enigma during WWII)

      * The BlockChain (estimated total processing power required to mine all 1 million BitCoin)

      These are just my suggestions.

      1. YARR

        Re: In the spirit of unit-y...

        Another possibility could be the computation required to complete the Human Genome Project.

        However a unit should ideally be a determinate amount of processing as opposed to a expected / estimated amount of processing that usually applies to code-cracking algorithms.

  15. bombastic bob Silver badge
    Pirate

    this is how it started with WPA

    and this is how it first started with WPA, which was later shown to be crackable in an extremely short period of time using more sophisticated techniques.

    I expect that now it's been done, someone will invest the effort to reduce the number of calculations down to something a bit more usable, maybe by exploiting other weaknesses that COULD be derived from this proof of concept...

  16. Prst. V.Jeltz Silver badge

    I suggest a new system involving snowflakes

    1. Charles 9
  17. JimmyPage Silver badge
    Stop

    Stop using PDFs ?

    And any other fancy formats ?

    Could you carry on using SHA-1 with bog-standard ASCII documents, like a .TXT ?

    I'd like to see two of those identical in length, with the same SHA-1, where one could be a acceptable version of another.

    1. sysconfig

      Re: Stop using PDFs ?

      That's a very good point you're making there, JimmyPage.

      Since false certificates were part of this discussion, I'd like to see that too. A cert is nothing but a ASCII text document of a very specific format. That should be a lot harder to pull off than using binary blob formats like PDF, which would allow you to hide a lot of stuff quite easily to tweak the hash to your liking.

      Having said that, I'm not defending SHA-1. It was already known that its days are numbered.

      Also, let's not use the term "calculate" when we refer to this stunt Google pulled off. Anything that uses 6500 years of compute time sounds a lot more like trial & error to me... or trial, verify, dismiss, repeat. Not quite a straight forward calculation. So SHA-1 is not really broken; it's just too weak as compute power becomes cheaper.

      EDIT TO ADD, even if wandering off on a tangent: There are better ways to break SSL encryption, regardless of the hash used. How many of the Certificate Authorities that your OS&browser know, do YOU know? How many of them do you personally TRUST? SSL is fundamentally broken by design; unfortunately with no feasible alternative as yet.

POST COMMENT House rules

Not a member of The Register? Create a new account here.

  • Enter your comment

  • Add an icon

Anonymous cowards cannot choose their icon

Other stories you might like