'First ever' SHA-1 hash collision calculated. All it took were five clever brains... and 6,610 years of processor time

Google researchers and academics have today demonstrated it is possible – following years of number crunching – to produce two different documents that have the same SHA-1 hash signature. This proves what we've long suspected: that SHA-1 is weak and can't be trusted. This is bad news because the SHA-1 hashing algorithm is used …

  1. Tom 7

    9,223,372,036,854,775,808 sha1 calculations

    to get a collision. It's not ideal but I'd hardly call it a security hole. If they could publish the two documents I'll try not to use them as passwords.

    1. Bruce Hoult

      Re: 9,223,372,036,854,775,808 sha1 calculations

      They have published the two documents, I've downloaded them and confirmed their claims.

      They are 422435 byte PDFs, differ in 62 of those bytes, have the same sha1 hash, and show different contents (the background colour). I haven't checked if there are any other differences in the display.
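
      For anyone who wants to repeat that check, here is a minimal Python 3 sketch (the filenames are the ones used on the researchers' demo site, so treat them as an assumption):

          # Compare sizes, differing bytes, and digests of the two published PDFs.
          import hashlib

          with open("shattered-1.pdf", "rb") as f:
              a = f.read()
          with open("shattered-2.pdf", "rb") as f:
              b = f.read()

          print(len(a), len(b))                     # both 422435 bytes
          print(sum(x != y for x, y in zip(a, b)))  # 62 differing bytes
          print(hashlib.sha1(a).hexdigest())        # identical ...
          print(hashlib.sha1(b).hexdigest())        # ... SHA-1 digests
          # SHA-256 still tells them apart:
          print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())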

      1. Preston Munchensonton

        Re: 9,223,372,036,854,775,808 sha1 calculations

        They are 422435 byte PDFs, differ in 62 of those bytes, have the same sha1 hash, and show different contents (the background colour).

        Still not much of an attack, IMO. Unless someone can arbitrarily set the difference to something meaningful, it only proves that it's possible to overcome SHA-1. It's still really really REALLY improbable to produce a meaningful difference.

        1. Robert Helpmann??
          Childcatcher

          Re: 9,223,372,036,854,775,808 sha1 calculations

          It's still really really REALLY improbable to produce a meaningful difference.

          Remember the bit in the article about this getting easier over time? That means that SHA-1 is broken and will only get more so until it is trivial to roll it like an old carpet. This attack is just the tip... not what you want to hear if your security depends on this algorithm.

        2. Chris Miller

          Re: 9,223,372,036,854,775,808 sha1 calculations

          You need to read up on some of the discussions that took place when similar attacks were first identified for MD5 hashes. This is a serious problem - I could get you to digitally sign a contract, and if I can produce a similar document with the word 'not' inserted (say) and the same SHA-1 hash, you're in trouble. Similarly, someone could use SHA-1 to authenticate a valid web server, then make a slight change to the URL but keep the hash the same ...

          Right now it needs a massive amount of computing power to defeat SHA-1, but the NSA can probably do it already. And these attacks only ever get quicker - not just by Moore's Law, but because someone clever will read the paper and think of a way of doing it 10x faster.

          1. Christoph

            Re: 9,223,372,036,854,775,808 sha1 calculations

            It depends whether you can make a particular required change and still produce a hash collision.

            Yes, it's a problem if you can insert 'not'. It's less of a problem if you can only produce the collision if you insert 'KJ"BIUE_D H£(*ERNY£'

            1. Charles 9

              Re: 9,223,372,036,854,775,808 sha1 calculations

              But if you can insert "not" AND just stash away the "KJ"BIUE_D H£(*ERNY£" in a garbage area, you're sorted.

              What I want to know is whether this is more than a collision: a preimage attack (or, more severely, a SECOND-preimage attack that found a collision with a specific target).

              1. Preston Munchensonton

                Re: 9,223,372,036,854,775,808 sha1 calculations

                But if you can insert "not" AND just stash away the "KJ"BIUE_D H£(*ERNY£" in a garbage area, you're sorted.

                While true, the likelihood of doing this AND getting a collision is highly improbable. Yes, it's only a matter of time, as many have pointed out. But then again, our solar system only has a matter of time before the Sun engulfs it. What's important isn't what is eventual. It's the time scale on which it will happen. For now, that time scale is still very far out for turning this finding into something meaningful. I doubt it happens in my lifetime, even with the looming presence of quantum computing around the bend.

                1. theblackhand

                  Re: 9,223,372,036,854,775,808 sha1 calculations

                  Re: "While true, the likelihood of doing this AND getting a collision is highly improbable"

                  As long as you can pad the document in addition to making the required changes, AND making a change results in a financial advantage for you of more than US$130,000, I would be reluctant to call this highly improbable.

                  Based on previous hashes, the discovery of collisions has led to more weaknesses being found, and usually large collision spaces within a hash function that need to be avoided. What costs ~US$130,000 today will likely cost less than US$10,000 within 5 years.

                  Does it mean we have to throw away existing SHA-1 hashes? Probably not unless the financial incentive to attack them exists.

                  Does it mean we need to start deploying SHA-2 and later hash functions now for verifying important documents or software versions? Definitely - mainly because if we don't, it just doesn't happen * stares at MD5 hashes *

                  Someone mentioned financial websites - all current browsers have required SHA-2 hash functions on certificates since January 2017. The only real exception I am aware of is old and cruddy Java 7 (or earlier) apps that refuse to upgrade or to die.

              2. CAPS LOCK

                "just stash away the "KJ"BIUE_D H£(*ERNY£" in a garbage area, you're sorted."

                No, you now have a document of a DIFFERENT SIZE. The SHA1 AND the size have to be identical.

                1. Adam 1

                  Re: "just stash away the "KJ"BIUE_D H£(*ERNY£" in a garbage area, you're sorted."

                  Why does the size have to be identical? As long as the artifact is a believable size you can launch an attack.

                  Imagine an executable file downloaded through some sort of update mechanism that uses SHA-1 to validate the binary before executing it. No-one will notice that the 64MB upgrade.exe is 25KB larger, but now the attacker has replaced the intended payload with their own. It would be interesting to know more about the collision algorithm: does it require the two files to be nearly identical? Does a large file take longer to generate a collision?
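
                  A sketch of the vulnerable pattern I mean, with a hypothetical file name and placeholder digest - an updater that pins nothing but a SHA-1 digest:

                      # Hypothetical updater that trusts a SHA-1 digest alone before
                      # running a downloaded binary. Nothing here pins the file
                      # length, so a colliding file of any plausible size passes.
                      import hashlib
                      import subprocess

                      EXPECTED_SHA1 = "0000000000000000000000000000000000000000"  # placeholder

                      def verify_and_run(path):
                          h = hashlib.sha1()
                          with open(path, "rb") as f:
                              for chunk in iter(lambda: f.read(8192), b""):
                                  h.update(chunk)
                          if h.hexdigest() != EXPECTED_SHA1:
                              raise ValueError("hash mismatch")
                          subprocess.run([path])  # executes whatever matched the digest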

                  1. CAPS LOCK

                    "Why does the size have to be identical? "

                    In the article the pdfs are the same size. If you are checking for tampering, checking the file size is a lot easier than checking the SHA1.

                    1. Alan W. Rateliff, II
                      Paris Hilton

                      Re: "Why does the size have to be identical? "

                      Just a few thoughts.

                      PDFs are compressed containers, right? That being the case, you can change quite a lot of the payload, and all you have to do is ensure the compressed output of your manipulated file has the same size and hash. The input to the hash generator is, in effect, already manipulated in a manner which obfuscates the source document.

                      Of course, that is a narrow application but still seems practical. Going a similar route, look at any compressed file you might download: .zip, .7z, .tgz, etc. For full confidence a check for all parts should be applied, not just the download but the individual hashes of each and every part. For instance, the hash of the archive, the hash of all parts combined, the hash of each individual file, hunk, etc. Doing so gets heavy and resource expensive rather quickly.

                      More broadly, think of the Open Document formats which are zipped containers, among others.

                      But what of TLS sessions? Consider that a spook or nefarious agency (arguably one and the same) has a packet capture of a TLS-encrypted session signed with a SHA-1 certificate. We already know sessions signed by a certificate generated with poor entropy (the debacle from a few years back) can be undone. $130k is nothing to throw at this for such agencies, maybe even enough to get the GPU calculation requirements down to something more reasonable than a year (how much was paid for the San Bernardino iPhone hack?). Whatever was in that session is at risk, be it an email, web search, forum posting, or penis-enhancement purchase confirmation page.

                      1. Cynic_999

                        Re: "Why does the size have to be identical? "

                        "

                        PDFs are compressed containers, right?

                        "

                        No. They are the very opposite of "compressed". Open a PDF in a text editor and you will see.

                    2. dajames

                      Re: "Why does the size have to be identical? "

                      If you are checking for tampering, checking the file size is a lot easier than checking the SHA1.

                      That assumes that you know what the size should be. If you have two documents with the same apparently correct signature you know that one is a forgery, but you have no idea which.

                      Checking the size does not tell you that -- unless you have some out-of-band mechanism for obtaining the expected size of the genuine document ... and how do you check the veracity of the out-of-band data? Well, it'll be signed, using a hash ...

                      1. tom dial Silver badge

                        Re: "Why does the size have to be identical? "

                        To verify the hash meaningfully, no matter what kind is used, you need to know what the output of your hash computation should be. That is no different from knowing what the size should be. That you might need to get that out-of-band may not be a major objection.

                        1. Graham Cobb Silver badge

                          Re: "Why does the size have to be identical? "

                          @tomdial: I don't think you understand how signing works.

                          1. tom dial Silver badge

                            Re: "Why does the size have to be identical? "

                            I think I do, and I also think the article was primarily about sha-1 sums of data, not the signing of the sha-1 sums.

                        2. dajames

                          Re: "Why does the size have to be identical? "

                          To verify the hash meaningfully, no matter what kind is used, you need to know what the output of your hash computation should be.

                          We're talking about signed PDFs, here. You know exactly what the hash should be, because it is part of a signature block that's been signed with the document author's private key.

                          [The article is about weakness in the hash process, not about any shortcomings of PKI, so the validity of the hash in the signature block can be taken as a given.]

                          That is no different from knowing what the size should be.

                          It's very different, because the size of the document is not part of the signature block.

              3. Anonymous Coward
                Anonymous Coward

                Re: 9,223,372,036,854,775,808 sha1 calculations

                > What I want to know if this is more than a collision but a preimage attack (or more severely, a SECOND-preimage attack that found a collision with a specific target).

                One presumes so.

                If they were just going for birthday collisions, I would think that the chances of getting two random documents which are both valid PDFs with readable text in them is rather low.

                OTOH, the chances of it being a valid Perl script are quite high :-)

            2. Hans 1
              Boffin

              Re: 9,223,372,036,854,775,808 sha1 calculations

              >Yes, it's a problem if you can insert 'not'. It's less of a problem if you can only produce the collision if you insert 'KJ"BIUE_D H£(*ERNY£'

              Hey, you know that, when it comes to PDF, you could place 'KJ"BIUE_D H£(*ERNY£' behind the image or something ...

              Remember, this is bytes. So you want the hash of 000011100010001xxxxxxx0001000111001100[...] to equal the hash of 00001111101000100100010001000111001100[...], where x can be anything because it sits behind an image, or the font colour is white or the same as the background, etc ... you name it. This has now been proven to be feasible - expensive, 130k or so, but feasible ... and it will only get cheaper ... in 10 years' time, you and I might be able to do it over a weekend for the lulz, with better hw and a better algo to compute this ... ;-)

              You could also say the random bits you want are for an image with total transparency ... you know, the sky's the limit.

              1. MJB7

                Re: 9,223,372,036,854,775,808 sha1 calculations

                @Hans 1 You are almost right *but* what they can do is get

                000011100010001xxxxxxx0001000111001100[...] to equal the hash of 000011111010001yyyyyyy0001000111001100[...], where x and y can be anything.

                This is the difference between a collision (which we now have), and a second pre-image attack (which we don't have - yet).

              2. Tony Haines

                Re: 9,223,372,036,854,775,808 sha1 calculations

                >"But if you can insert "not" AND just stash away the "KJ"BIUE_D H£(*ERNY£" in a garbage area, you're sorted."

                > "Hey, you know that, when it comes to PDF, you could place 'KJ"BIUE_D H£(*ERNY£' behind the image or something ..."

                It seems to me this is an argument for deprecating the pdf format (and anything else which has undefined buffers or hidden data) for contracts and anything else which needs to be digitally signed, to be honest.

                If the document was in a form where you couldn't tweak it by tinkering with effectively 'wild' bits, it would be enormously harder to produce a document of the same size.

                Larger documents would still be almost as easy, of course, so you'd need to add an exact filesize to the hash.

          2. Ian Michael Gumby

            @Chris Miller Re: 9,223,372,036,854,775,808 sha1 calculations

            No, sorry, it's not that easy.

            For a very long time, it's been known that there's a potential for a SHA-1 collision to occur. It was just that no one had found one.

            While a random collision can occur, creating a specific modified document and then figuring out how to create a SHA-1 hash equivalent that is also the same size? Not that easy.

        3. Charles 9

          Re: 9,223,372,036,854,775,808 sha1 calculations

          "Still not much of an attack, IMO. Unless someone can arbitrarily set the difference to something meaningful, it only proves that its possible to overcome SHA-1. It's still really really REALLY improbable to produce a meaningful difference."

          Sometimes, just a few characters can be enough to change the whole meaning of a document, such as the inclusion or exclusion of a single "not".

          1. Anonymous Coward
            Anonymous Coward

            Re: 9,223,372,036,854,775,808 sha1 calculations

            "Sometimes, just a few characters can be enough to change the whole meaning of a document, such as the inclusion or exclusion of a single "not"."

            But again, they have to be meaningful characters you've changed. You have to include/exclude that "not" and *then* run through 9 quintillion variations of meaningless padding at the end of the document and run the sha1 calculations until you get a collision.

            Not saying it's impossible, we've always known that. It's just becoming less-difficult. Which is why we're already migrating to SHA-256.

          2. JMcL

            Re: 9,223,372,036,854,775,808 sha1 calculations

            ...or insertion of 000 for that matter!

          3. Cynic_999

            Re: 9,223,372,036,854,775,808 sha1 calculations

            "

            Sometimes, just a few characters can be enough to change the whole meaning of a document, such as the inclusion or exclusion of a single "not".

            "

            The point is that after inserting the "not" you will have to change a large amount of additional data in order to achieve the same hash. In a text document those changes will not make any sense, and so the change would be exposed. This can only work on data that can have many changes applied without being noticed - e.g. a BMP or JPG image where lots of bytes can be changed but still leave a recognisable image.

            In a certificate or other small or well-structured file with few bytes that could be changed without invalidating the format, it will not be possible to create a viable but altered file with the same hash.

        4. dajames

          Re: 9,223,372,036,854,775,808 sha1 calculations

          It's still really really REALLY improbable to produce a meaningful difference.

          Not really any more improbable than the demonstration ... if you can make changes to the background colour (say) until you get the hash you want, what's to stop you changing the content to whatever you want it to say, and then changing the background colour so that the hash matches that of the original document? The difficulty must be about the same as the demonstration.

          What I'd like to see is an estimate of the cost of those 9,223,372,036,854,775,808 sha1 calculations and umpty-bazillion hours of GPU time. How much value does a document have to have for it to be worth doctoring it in this way?

          There's nothing much new here apart from the fact that some folk have demonstrated something that was already known to be theoretically possible. We all know that SHA-1 is deprecated and that SHA-2 and SHA-3 are more secure. It's rather sad if it takes an actual demonstration to prove that the theoretical risk is real.

          1. oxfordmale78

            Re: 9,223,372,036,854,775,808 sha1 calculations

            It costs nothing if you use someone else's hardware... think botnets.

        5. Anonymous Coward
          Anonymous Coward

          Re: 9,223,372,036,854,775,808 sha1 calculations

          > It's still really really REALLY improbable to produce a meaningful difference.

          No it's not. You make a meaningful difference, and put some hidden padding somewhere to make the SHA1 match the desired value.

          This is the same process as when MD5 was broken - in that case they used it to create a forged SSL certificate. It had a "tumor" in it to fudge the MD5 hash, but was still a valid certificate.

          https://www.win.tue.nl/hashclash/rogue-ca/

          In short, it is now practical to forge a certificate with an SHA-1 hash. You should be worried - efforts to disable SHA-1 in browsers have not come early enough.

      2. Bruce Hoult

        Re: 9,223,372,036,854,775,808 sha1 calculations

        There are a total of 150 bits different, by the way.

      3. Anonymous Coward
        Anonymous Coward

        Re: 9,223,372,036,854,775,808 sha1 calculations

        But did they change the account number and sort code of an account to receive gizillions, rather than some random shite? As for certs, your OS already trusts many govt-controlled CAs, so that claim is pointless.

        Anyone care to share a financial institution with a sha-1 cert?

    2. Mage Silver badge
      Facepalm

      Re: 9,223,372,036,854,775,808 sha1 calculations

      SHA-1 was already deprecated because it was already known this would be possible soon.

      It will only get easier.

      That's why SHA-256 exists already.

      A special offer weekend deal on AWS just when someone wants to do this?

    3. oxfordmale78

      Re: 9,223,372,036,854,775,808 sha1 calculations

      If someone has a botnet of 150,000 nodes, it just takes a couple of days to find a SHA1 collision.

  2. Snowy Silver badge
    FAIL

    What would have been truly clever is if the documents had differed by more than 0.0147%; all that work, and to make it work they changed next to nothing.

    Edit: beaten to it by Preston Munchensonton; better said than I could, have an upvote.

    1. Chris Miller

      The point of a hash function is that even if you change just a single bit, it should give a completely different output. There are far more possible documents than there are hash function outputs, so collisions will always exist - but it should be computationally impossible to find them. This attack proves that it no longer is.
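
      A quick illustration of that single-bit property with Python's hashlib (the inputs are arbitrary; the two strings differ in exactly one bit):

          import hashlib

          a = hashlib.sha1(b"sign this contract").digest()
          b = hashlib.sha1(b"sign this contracu").digest()  # 't' -> 'u' flips one bit

          # Count how many of the 160 output bits changed - typically around half.
          diff = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
          print(diff, "of 160 output bits differ")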

      1. Deltics

        No hash function makes finding collisions computationally /impossible/, the best you can hope for is /practically useless/.

        As in, too impractical to be of any utility to a miscreant.

        As others have said, the point is not the ability to contrive a collision, but to establish a means to change a document or file in a way that has a practical result (changing the terms of a contract, introducing functional malware or a backdoor) without changing the byte count of the file (nobody relies on JUST the hash, I hope), and still contrive to create a collision.

        AND the result of the change also has to remain undetectable when subject to an eyeball test. i.e. It's no good being able to change the terms of a contract if to achieve that whilst preserving the file size and the hash you have also introduced what is obvious junk content as well.

        Changing the background color of a PDF? I'm not sure what the possible malicious application of that could be. This doesn't prove that an effective attack is possible, only that a pointless attack is. Surely?

        I can drive my car into a concrete wall at 100mph and effectively demonstrate that the collision protection offered by my vehicle is inadequate to protect the occupants in the event of such a collision. But that doesn't mean that the collision protection in the vehicle is inadequate.

        1. Trigonoceps occipitalis

          PDF BACKGROUND

          Is it possible that the change of background colour is a part of a suite of changes that hide the addition of "not"?

          1. Chris Miller

            Re: PDF BACKGROUND

            I think* that PDFs (and Word documents and almost any other document format except plain text) have a sufficient number of 'filler' characters that can be set to any value you like without significantly changing the document format. This gives scope for a clever algorithm to produce the necessary collisions.

            *I haven't read the paper, but this is how it worked for MD5

          2. Tom 7

            Re: PDF BACKGROUND

            I have produced PDF documents that produce different output when printed on different printers. And that's without 'low ink'. It should be irrelevant to this discussion, but if you want a PDF document to say what it should, print it and check it yourself.

        2. Hollerithevo

          A date, for instance?

          A delivery date in a contract? A price?

        3. Kiwi
          Boffin

          AND the result of the change also has to remain undetectable when subject to an eyeball test. i.e. It's no good being able to change the terms of a contract if to achieve that whilst preserving the file size and the hash you have also introduced what is obvious junk content as well.

          Take a 1,000 or so word document. One you know well and, as you're in HR, have read half a dozen times today alone with new hires coming into the company. Spot the one small change in the dry legalese that is numbing to the mind of even the most dedicated HR drone.

          A decent text-comparison program should pick it up. The human eyeball probably never will, unless something else stands out, e.g. a paragraph that should have one word on the last line has 2. Dropping a character elsewhere could fix that.

          Remember that research has shown that we respond more to the shape of words than the actual letters used in the words, and we can speed-read badly spelt documents without even noticing the errors. A human eye will not spot a small change in a large document or bit of source code.

      2. Dan 55 Silver badge

        Well, all hashes are going to fail sometime, as you convert a really long string of bits into a shorter string of bits.

      3. Adam 1

        > There are far more possible documents than there are hash function outputs

        Known as the Pigeonhole principle. Where the size of the input exceeds the size of the hash output, there not only "can be" but "must be" collisions. To those harping on about file size differences: Windows Explorer rounds to the closest KB unless you specifically check the details in the file's properties. For most use cases, having a slightly different file size is unimportant, but it is impressive nonetheless that they could locate a collision even while constraining themselves to a valid file format and the same file size.

        Designing effective hash functions is really hard. I had to last year and stuffed it big time. It wasn't a cryptographic hash but one that would see hundreds of thousands of our objects get hashed into different buckets for faster dictionary lookups. So basically you had an infinite combination of inputs that had to hash to 32 bits (about 4 billion). I managed to create an accidental swarm to zero which meant that whilst there was good distribution generally, a substantial proportion of real world objects would end up in bucket 0. After fixing it, the worst I saw was in the 3 objects per bucket out of a million range.
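
        For what it's worth, a distribution check along these lines would have caught that "swarm to zero" early. A rough sketch, with the bucket count and keys as assumptions:

            # Hash a pile of synthetic keys into buckets and inspect the heaviest ones.
            import hashlib
            from collections import Counter

            BUCKETS = 1 << 20  # ~1M buckets, chosen arbitrarily for the sketch

            def bucket(key: str) -> int:
                # Truncating a strong hash avoids most accidental clustering
                # that hand-rolled mixing functions can introduce.
                digest = hashlib.sha256(key.encode()).digest()
                return int.from_bytes(digest[:4], "big") % BUCKETS

            counts = Counter(bucket(f"object-{i}") for i in range(1_000_000))
            print(counts.most_common(3))  # worst buckets should hold only a few items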

        1. Ken Moorhouse Silver badge

          Pigeonhole Principle

          +1 for naming the problem.

          Publishing different hash signatures for the same document, and/or partitioning the document and using the same hash method to "cover" overlapping portions of it*, have got to be the only ways of minimising the effect of the Pigeonhole Principle without moving to a different scheme which has a higher number of pigeonholes.

          Using this multiplicity technique effectively increases the number of pigeonholes used to "describe" a given mapping, and has a similar effect to triangulating measurements for a map using a theodolite (i.e., reduces the chance of error in positioning).

          With multiple hashes, let's be clear on this: Every single one has to match before provenance can be established. So long as the statistics show a suitably low collision probability for the number of hashes utilised I don't think it matters how secure the underlying crypto used to generate each of those hashes is. However, using different "salts" on the same method is probably inadvisable.

          *Commentators here have mentioned background colouring and formatting obfuscation techniques to render an identical hash: Why not "layer" the hashing mechanism in such a way that the document is one layer and the presentation is a separate layer? This will mean each layer bearing its own hash. Only when the two hashes are identical are we looking at the authentic document. Of course, this method of hash generation would have to be put into the public domain for it to be useful.

          1. Adam 1

            Re: Pigeonhole Principle

            > or partitioning the document and using the same hash method to "cover" overlapping portions of it

            Just be aware that whilst the pigeonhole principle shows it is mathematically certain that two colliding documents must exist where the input size exceeds the hash output size, it does not follow that two inputs smaller than the hash output cannot collide. In a well-designed algorithm, collisions should be both very rare and computationally infeasible to find, i.e. nothing short of brute force.
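
            A toy example of why that brute force scales with the square root of the output space: truncate SHA-1 to 32 bits and a birthday search collides in seconds, whereas the full 160 bits stays far out of reach:

                import hashlib

                def trunc_sha1(data: bytes, nbytes: int = 4) -> bytes:
                    # Keep only the first 32 bits of the digest for the toy search.
                    return hashlib.sha1(data).digest()[:nbytes]

                seen = {}
                i = 0
                while True:
                    msg = b"msg-%d" % i
                    t = trunc_sha1(msg)
                    if t in seen:  # ~2**16 tries expected for a 32-bit output
                        print("collision:", seen[t], msg, t.hex())
                        break
                    seen[t] = msg
                    i += 1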

            1. Charles 9

              Re: Pigeonhole Principle

              Maybe, but because computer technology continues to improve, brute force gets easier and easier. Imagine if you have a Mirai-class botnet at your disposal and you set them to the task of trying to perform a second-preimage attack.

      4. This post has been deleted by its author

      5. DaLo

        Not just computationally impossible to find them, which may be a big ask as you could always brute force with enough computing power.

        It is key that you cannot premeditatedly change a document and manipulate it to produce the same hash which is why this shows SHA1 as broken.

        You may just find, unlikely but possible, that your picture of your cat creates the same hash as your contract for your house. However no one would expect that your solicitor got you to buy a house based on a cat picture.

        If however you can work out that by changing a little bit of text you can then add extra text/bytes/graphics and work out what they should be to force a collision then it is broken.

        If the only other documents that produce the same hash are all random garbage or completely unconnected to the original, and it is impossible with current resources to premeditatedly work one out, then it isn't broken.

    2. Kiwi
      Boffin

      What would have been truly clever is if the documents had differed by more than 0.0147%; all that work, and to make it work they changed next to nothing.

      What if it's an executable or source code, and that 0.0147% change defeats some sort of password checking or other security routine? Maybe it changes a variable (if password=correct set correctpass=1 else correctpass=0) to a constant (set correctpass=1)¹? How many bytes would that need?

      ¹ Ok, probably not even close to any RW code, but I hope you get the idea. If not, back to remedial English comprehension for you.

  3. Anonymous Coward
    Anonymous Coward

    Nice work!

    Everyone by now should realize computer/data security is a moving target...

    Out with the old in with the new.

    Long live SHA-256!

    Well...for at least...hopefully...to the end of the decade.

  4. Ken Hagan Gold badge

    Do we need to do anything about old content?

    If we have, say, a legal contract that was timestamped and signed with SHA-1, will it be possible in the future to produce a different contract with the same (past) timestamp? If so, is it possible to defend against these attacks now by counter-signing them today with SHA-256? (I'm thinking that the counter-signature would prove that someone in 2017, whilst the SHA-1 signature was still worth something, vouched for the original contents and *that* counter-signature won't be similarly vulnerable for ... well, a few more years yet?)

    1. Anonymous Coward
      Anonymous Coward

      Re: Do we need to do anything about old content?

      That is a different class of attack: A second pre-image or "Find a Y with the same hash as X".

      A collision is "Find any X and Y with the same hash".

      1. Charles 9

        Re: Do we need to do anything about old content?

        But isn't that what was demonstrated here just now? They copied the first document and then altered it to produce the same hash as the first one?

        1. MJB7

          Re: Do we need to do anything about old content?

          No. They manipulated both first and second document until they got an identical hash.

          1. Kiwi
            FAIL

            Re: Do we need to do anything about old content?

            No. They manipulated both first and second document until they got an identical hash.

            Back when SHA-1 was new and "not in your lifetime" crackable, you signed a contract saying that "in the case of early termination of your contract we will pay you $10,000".

            With a slight bit of re-working, it now says "in the case of early termination of your contract you will pay us $10,000"

            Yes, there could be reason to have concern about old documents being tampered with.

            1. Ken Moorhouse Silver badge

              Re: Do we need to do anything about old content?

              Too late.

              If old content has been stored in the cloud, or on any other kind of backup medium, then it can be recalled for further analysis when the time comes.

    2. Anonymous Coward
      Anonymous Coward

      Re: Do we need to do anything about old content?

      Good questions! I'm not a security professional, merely a senior linux admin, but I can imagine this scenario if you don't mind.

      "will it be possible in the future to produce a different contract with the same (past) timestamp?"

      Yes, unless you wrap the storage of these items in a tight atomic transaction, and perhaps have a live auditing task that can see/log a "new" document being added back into the repo using a past creation date (or some other similar flagging of suspicious activity), or you lock down the access using FACLs and audit similarly. "Bad things" could happen to those documents at rest on disk, and you should also be leveraging any access-control abilities of the storage subsystem. With physical access a miscreant could still gain access, and enough knowledge of the system to hide this kind of activity, so it's not completely bullet-proof, but you see where I'm going with this.

      In the Linux/Unix world there are commands that can easily set creation/access/modification times to anything within range, most likely anywhere within the "Unix Epoch", as long as you gain write access. FACLs can further reduce this exposure, but at most sites the traditional file permissions are still used.

      Hint: if you have super-important docs like the ones you described, use all the methods above, reduce the number of humans able to reach them, and confine the humans to role-based access lists; document creators can only create, readers can only read, modifiers can read/modify but not create, etc. Or archive to an offsite, offline repo if you don't want to rework the document encryption and don't need near-line access. There's a lot you can do to protect data.

      "is it possible to defend against these attacks now by counter-signing them today with SHA-256?"

      Yes, re-encrypting the SHA-1 docs into SHA-256 should be trivial, and you would not need to decrypt the old ones, unless you think that would confuse future caretakers of the document storage systems. In other words, it does not matter if you remove it or not, the outer most layer of encryption protects it, and extra encryptions underneath that are just icing on your security cake. Even if it's rot13, it's another layer to wade through to get the goods.

      1. Red Bren
        Joke

        Re: Do we need to do anything about old content?

        "Even if it's rot13, it's another layer to wade through to get the goods."

        How many layers of ROT13 should I use? At least two...

  5. Tom 7

    Can anyone prove SHA256 is any better?

    You'd think it might be, but is it mathematically proven that it won't have collisions within the same range?

    As an aside, just proving a half-million-character PDF document has not been changed is irrelevant, as I doubt it's ever been read in the first place - that's potentially >10,000 pages.

    1. streaky

      Re: Can anyone prove SHA256 is any better?

      It's more computationally expensive and the key space is larger so.. yes? Prove is like trying to prove god doesn't exist though..

      1. Elf
        Holmes

        Re: Can anyone prove SHA256 is any better?

        >> Prove is like trying to prove god doesn't exist though.

        ... 'But, says Man, the Babel fish is a dead giveaway, isn't it? It could not have evolved by chance. It proves you exist, and, by your own arguments, you don't. QED.'

      2. Naselus

        Re: Can anyone prove SHA256 is any better?

        "Prove is like trying to prove god doesn't exist though.."

        No, it isn't. Math is math and you can create a proof that something can't happen.

        1. streaky

          Re: Can anyone prove SHA256 is any better?

          You can already mathematically prove it's more expensive and time consuming - and to what degree. Once you physically prove the state of it you've by definition already found a collision and thereby proven that god doesn't exist.

  6. Will Godfrey Silver badge
    Angel

    Phew!

    Only a month ago I went through the pain of changing to SHA 256. When your system is so old even the spiders won't grace it with their webs, it's difficult to remember how you set it up, and what you need to do to update it.

  7. Ken Moorhouse Silver badge

    Any such proof of authenticity...

    ...effectively has a "time-limit" on it.

    This means that if you want a document to be of indisputable provenance, it still needs to be stored traditionally e.g., in a solicitor's safe.

    1. Charles 9

      Re: Any such proof of authenticity...

      Unless, of course, someone cracks the safe and then surreptitiously alters the contents...

      1. Ken Moorhouse Silver badge

        Re: Surreptitiously

        "Unless, of course, someone cracks the safe and then surreptitiously alters the contents..."

        Hmm, having surreptitiously broken into the building, the office, and the safe?

        As the contents of solicitors' offices are not of particular interest to the average burglar, it would arguably be quite easy to guess the motive in such an eventuality.

        1. Charles 9

          Re: Surreptitiously

          But if done right, hard to prove. You can create a "he said, she said" situation. And this can be of significance if the document involved is, say, a last will.

    2. danbi

      Re: Any such proof of authenticity...

      The irony being that the digital realm of documents doesn't add much credibility or proof of authenticity over what was available with stone-age documents. Both can be manipulated, after a while -- and we enjoy the wonders of "history", full of fairy tales that rival the contemporary fantasy genre...

      No wonder, some older cultures didn't trust written "knowledge".

      1. Charles 9

        Re: Any such proof of authenticity...

        "No wonder, some older cultures didn't trust written "knowledge"."

        But at the same time, it's hard for people like us to believe people once relied on other people's memory, which we now know has plenty of potential to get muddled and messed up, especially with age. And I haven't even touched on deliberate fabrications (eg. one lies, the other swears by it).

        Seems you can't win either way.

  8. This post has been deleted by its author

  9. Andy 73 Silver badge

    Court of law

    If you were relying on a document in a court of law and it had been signed with SHA1, then my assumption is that if two parties produced two documents with the same hash, there would be some examination of the documents. The guess is that the original would be 'clean' and the doctored one would have the doctored text (the subtle addition of the word 'not' where appropriate) PLUS some additional garbage to bring the signatures back into line. As it's astronomically unlikely to produce the same signature when only making a meaningful semantic change to the text, it should be possible for a forensic examination of the documents to identify which is 'clean'.

    However, if SHA1 is only being used in an automated system (Git etc.), then the issue is much more pressing - the assumption being that the system will not be coded to identify a collision, which would be processed and actioned unnoticed.

    1. Ken Hagan Gold badge

      Re: Court of law

      "... PLUS some additional garbage ..."

      The problem with that is that some popular document formats, such as DOC, have such garbage in the original as well.

      (For those too young to remember DOC rather than DOCX, or who have never looked into the actual format, all the original Microsoft Office file formats are layered on top of something called Structured Storage, which is like a miniature file system within a single file. Like any other file system, this divides the container into fixed sized blocks and if an individual stream is not an exact multiple of the block size then there is wasted space within the container. It costs a little time to zero out such blocks, so it isn't always done. In particular, I think WORD had a "fast saves" option that was on by default which would have this effect. Consequently, I rather suspect that the majority of older DOC files have plenty of legitimate garbage that you could use to compensate for your malicious modification.)

      (I believe that DOCX and ODT are also structured formats but the container in this case is a ZIP archive. I don't know if that is similarly blighted with wasted space. Perhaps someone will jump in and tell us. If so, then the vast majority of documents that one might want to sign have plenty of wiggle room for this sort of attack.)

      1. allthecoolshortnamesweretaken

        Re: Court of law

        Ahh, that explains a lot... I always knew that disabling the "fast save" option in, say WinWord 2.0, was a good idea as "fast save" had a tendency to fuck up your files (not something you'd want, especially while working on your thesis), but now I know why.

        As to the article - "Today, 10 years after of SHA-1 was first introduced, we are announcing the first practical technique for generating a collision," the research team said today. - I can't shake the feeling that maybe another team, possibly based in Anne Arundel County (MD), got there already a couple of years ago, but didn't bother with announcing it.

  10. Crazy Operations Guy

    This is why I use multiple hashes

    Hashes are quick and easy to compute, so there is no reason to not calculate multiple hashes for the same data.

    How much compute power would be required to create a counterfeit document that matches both the original's SHA-1 *and* MD5 hashes?

    Since it's so quick and easy to do, I've calculated the MD5, SHA-1, and SHA-256 hashes for everything important on my systems / network.
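
    A one-pass sketch of that habit in Python (the filename is hypothetical; see the Joux paper cited below for why this buys less than it appears to):

        import hashlib

        def multi_digest(path):
            # One read of the file feeds all three digests.
            hashes = [hashlib.md5(), hashlib.sha1(), hashlib.sha256()]
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    for h in hashes:
                        h.update(chunk)
            return {h.name: h.hexdigest() for h in hashes}

        print(multi_digest("important.bin"))  # hypothetical file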

    1. -tim
      Coat

      Re: This is why I use multiple hashes

      For things that I need to make sure haven't been changed, I tend to use a modified version of an existing hash as well. All modern hashes are seeded using a table: MD5 uses a table of sines, SHA-1 uses values that start out as 0x67 45 23 01, and SHA-256 uses the fractional parts of the cube roots of primes. Simply swapping the 6 and 7 in SHA-1 and rebuilding the source provides a different hash that should be as cryptographically secure as SHA-1, yet an attacker would need to know how you modified the seeds as well to compute a colliding document. Swapping the 6 and 7 does produce different hashes for the example files.

      If you treat your modified table as a secret key, you have that strength, but even publishing your table forces an opponent away from precomputed rainbow tables. This is much like salting in password hashes, and since the tables seem to be arbitrarily chosen numbers, it shouldn't have the same problems that picking the wrong s-box values does in code like DES.

    2. Ken Hagan Gold badge

      Re: This is why I use multiple hashes

      Matching SHA-1 and MD5 simultaneously might be easier than you (or I) think, else the security pros would have recommended exactly that technique instead of inventing SHA-256.

    3. Tomato42
      Boffin

      Re: This is why I use multiple hashes

      Using two hashes doesn't work to increase security above the security of the stronger one: https://www.iacr.org/cryptodb/archive/2004/CRYPTO/1472/1472.pdf

      Just use SHA-256, it has been in all cryptographic libraries for over a decade already!

      1. Anonymous Coward
        Anonymous Coward

        Re: This is why I use multiple hashes

        Tomato42@,

        Of course, you are only as secure as the strongest hash, but doesn't recording multiple hashes make the calculation of a hash collision harder (impossible???), as you would have to calculate a hash collision for all the hashes with the same document?

        (Any mismatch of any of the hashes would flag up a problem, so even if SHA-256 was broken you would still have to match the other hashes.)

        I don't have the math skills to prove whether this is feasible or even possible!!!

        1. Charles 9

          Re: This is why I use multiple hashes

          The problem is that all hashing functions work on the same fundamental principles, plus there's the Pigeonhole Principle to consider (due to hashes being smaller than their documents, collisions MUST occur). The paper above demonstrates you can correlate multiple hashing functions so that finding a collision for all is as easy as finding a collision for one.

          Now, an alternative proposal may be to chain hashes by hashing the whole document as well as particular segments of the document, producing multiple overlapping checks. The Merkle Tree is an example of this technique, though in this case a fixed-structure hash chain would probably be better-suited and more robust against preimage attacks. The technique is also algorithm-agnostic so can be moved up from SHA-1 to SHA-3 or whatever.
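
          A minimal sketch of such a fixed-structure chained hash, assuming SHA-256 and an arbitrary 64KB segment size:

              import hashlib

              CHUNK = 64 * 1024  # segment size, an arbitrary choice

              def chained_digest(data: bytes) -> str:
                  # Hash the whole document, hash each fixed-size segment,
                  # then hash all of those digests together.
                  whole = hashlib.sha256(data).digest()
                  pieces = [hashlib.sha256(data[i:i + CHUNK]).digest()
                            for i in range(0, len(data), CHUNK)]
                  return hashlib.sha256(whole + b"".join(pieces)).hexdigest()

              print(chained_digest(b"example document contents"))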

  11. Ken Moorhouse Silver badge

    Concoct forged document

    The point about forging a document and the difficulty of making that forgery convincing is compelling.

    However, with the passage of time and the likelihood of these algorithms being cracked, the chance of being able to produce a convincing forgery increases, to the extent that people should never rely on electronic signatures if the lifespan of the document's importance exceeds, say, one or two years.

  12. Anonymous Coward
    Anonymous Coward

    It's interesting to note

    That some of the earlier versions of Cryptowall used MD5 as well as weaker versions of SHA in their functions early on, before generating the 2048 bit AES key from a hash of various unique identifiers.

    Using this method, it might be possible to reverse engineer these versions to fool the software into undoing the damage by faking a paid ransom key.

    This assumes the following.

    1) you have an identical copy of the machine, byte for byte,

    2) the date, time etc of the machine is also identical

    3) the hardware is also identical eg RAM SPD data, CPUID, LAN address etc.

    This would be the computational equivalent of running time backwards, then copying the key as it is being generated and using this to extract the original files.

    Any takers?

  13. This post has been deleted by its author

  14. Oh Homer
    Headmaster

    Sound and fury, signifying nothing

    Given enough time and resources, all security will be breached. The point of security is not to stop the inevitable, but merely to slow it down long enough for the breach to no longer matter.

    Calculating just a single collision in 6,500 years of CPU time is not exactly what I'd call the death blow for SHA1. That's still significant enough that manufacturing collisions for low-grade material just won't be worth the effort, and even military stuff will be old news by the time you've compromised it.

    Maybe if you could spoof target coordinates in under 60 seconds using a smartphone you might have a problem, or if you'd be inclined to waste 100k+ on the equipment necessary to fiddle with the terms of somebody's 50k contract, but as it stands none of that is likely any time soon, which means the security is doing its job as designed.

    That will not always be the case for SHA1, but it's far too soon to hit the panic button, unless your primary interest is hype not security.

    1. Anonymous Coward
      Anonymous Coward

      Re: Sound and fury, signifying nothing

      "Maybe if you could spoof target coordinates in under 60 seconds using a smartphone you might have a problem, or if you'd be inclined to waste 100k+ on the equipment necessary to fiddle with the terms of somebody's 50k contract, but as it stands none of that is likely any time soon, which means the security is doing its job as designed."

      Unless, of course, that 50k contract has knock-on effects that can propel the total into the millions, in which case you now have good RoI.

    2. tom dial Silver badge

      Re: Sound and fury, signifying nothing

      6,500 CPU-years is a few weeks of work on a 150,000-unit botnet. It is well not to make too much of numbers that criminals can downscale easily. For that matter, the US DoD has nearly three quarters of a million civilian employees, nearly all of whom are provided workstations that could, in principle, be recruited for about 100 hours a week, producing 6,500 CPU-years in a week or so.

      Producing a document tailored to specific semantic change might (or might not) be many orders of magnitude more expensive, but sticking with SHA-1 based on some big numbers seems rather foolish.

  15. WatAWorld

    Is this the same Google that has been unable to implement an automatic update for Android?

    Dumb question, I know, but is the Google that sponsored this the same Google that has been unable to implement an automatic update for Android?

    Maybe instead of publishing "how to hacks" for other people's products and systems they should put some more time and effort into making their own products secure:

    1. Figuring out how to make automatic updates work (like Apple did decades ago) and,

    2. Getting vendors and re-sellers to agree to let that automatic update process work (like Microsoft did a decade ago).

    Or is this some different Google?

    I know this other stuff, zero days in other people's products and systems, is important.

    But JOB ONE for Google should be taking care of its own products, its own customers and making stuff attached to its name secure -- rather than sitting up there in their giant glass house throwing stones.

    1. Tomato42
      Facepalm

      Re: Is this the same Google that has been unable to implement an automatic update for Android?

      You know that Google is a large company, and thus it's impossible for all of their employees to work on the same thing? You know, "too many cooks" and all that...

    2. Oh Homer
      Headmaster

      Re: "unable to implement an automatic update for Android"

      Not defending Google, in fact I agree with the sentiment at least, but the fact is there is no such thing as one "Android OS" that Google can deploy to millions of handsets. Instead there are countless proprietary smartphone operating systems based on Android, deployed by various manufacturers to devices that Google has no access rights to, at least not for the purpose of updating the OS.

      And if Google could somehow update the OS remotely, it simply wouldn't be the same custom OS as provided by the vendor, it'd be vanilla Android, missing all the proprietary bits necessary to keep the vendor's proprietary UI and shovelware running, missing the vendor's proprietary drivers necessary to keep the hardware working, and would thus also invalidate your warranty, which is with the vendor, not Google.

      This is the problem with OEM arrangements in general, not just with Android in particular, and is a problem that Apple avoids by being both the OS developer and the hardware vendor.

      Of course another way to avoid this problem would be to abolish all the proprietary junk in both the hardware and OS, and have an open hardware, open specification, open source solution where everyone could apply daily, incremental updates to every part of the system, including apps and core OS components, but then companies like Samsung would whine about losing their "competitive advantage", as we'd all be able to construct our own smartphones from kits in Maplin.

      Actually an even more fundamental reason for all that proprietary junk in smartphones is that the FCC/CEPT/Ofcom mandate that a lot of it must be a black box, incapable of being fiddled with by consumers, but that's another story.

      1. Charles 9

        Re: "unable to implement an automatic update for Android"

        "Of course another way to avoid this problem would be to abolish all the proprietary junk in both the hardware and OS, and have an open hardware, open specification, open source solution where everyone could apply daily, incremental updates to every part of the system, including apps and core OS components, but then companies like Samsung would whine about losing their "competitive advantage", as we'd all be able to construct our own smartphones from kits in Maplin."

        Due to the competitive nature of the market, particularly in mobile, open hardware is not going to happen, as trade secrets and patents (and we're talking hardware here, so their use here is valid) are in play. This is also one reason one can't just make a completely open mobile OS because a lot of mobile hardware is black-boxed to prevent "Giving Information To The Enemy" and the interfaces only come in binary blobs complete with contracts and so on.

  16. YARR
    Boffin

    In the spirit of unit-y...

    I propose that this house agree on the need to establish a definitively quantifiable unit of elapsed processing that is not time or cost based, and to adopt said unit henceforth.

    1. Vath

      Re: In the spirit of unit-y...

      The need is seconded.

      I'd like to point out that although the need is great so too is the task. It is my opinion that measurements of computational exertion can only be divided into the number of operations required or the time required for such operations to complete on a known set of hardware. With this assumption I'd like to make a few suggestions:

      * The Moon Mission (estimated total processing power used in the Apollo 11 mission)

      * The Colossus (estimated total processing power used by the Colossus machines to break the Lorenz cipher during WWII)

      * The BlockChain (estimated total processing power required to mine all 21 million Bitcoin)

      These are just my suggestions.

      1. YARR

        Re: In the spirit of unit-y...

        Another possibility could be the computation required to complete the Human Genome Project.

        However, a unit should ideally be a determinate amount of processing, as opposed to the expected / estimated amount of processing that usually applies to code-cracking algorithms.

  17. bombastic bob Silver badge
    Pirate

    this is how it started with WPA

    and this is how it first started with WPA, which was later shown to be crackable in an extremely short period of time using more sophisticated techniques.

    I expect that now it's been done, someone will invest the effort to reduce the number of calculations down to something a bit more usable, maybe by exploiting other weaknesses that COULD be derived from this proof of concept...

  18. Prst. V.Jeltz Silver badge

    I suggest a new system involving snowflakes

    1. Charles 9
  19. JimmyPage Silver badge
    Stop

    Stop using PDFs ?

    And any other fancy formats ?

    Could you carry on using SHA-1 with bog-standard ASCII documents, like a .TXT?

    I'd like to see two of those, identical in length, with the same SHA-1, where one could be an acceptable version of the other.

    1. sysconfig

      Re: Stop using PDFs ?

      That's a very good point you're making there, JimmyPage.

      Since false certificates were part of this discussion, I'd like to see that too. A cert is nothing but an ASCII text document of a very specific format. That should be a lot harder to pull off than using binary blob formats like PDF, which would allow you to hide a lot of stuff quite easily to tweak the hash to your liking.

      Having said that, I'm not defending SHA-1. It was already known that its days are numbered.

      Also, let's not use the term "calculate" when we refer to this stunt Google pulled off. Anything that uses 6,500 years of compute time sounds a lot more like trial & error to me... or trial, verify, dismiss, repeat. Not quite a straightforward calculation. So SHA-1 is not really broken; it's just too weak as compute power becomes cheaper.

      EDIT TO ADD, even if wandering off on a tangent: There are better ways to break SSL encryption, regardless of the hash used. How many of the Certificate Authorities that your OS & browser trust do YOU know? How many of them do you personally TRUST? SSL is fundamentally broken by design; unfortunately there is no feasible alternative as yet.

      1. Kiwi
        Boffin

        Re: Stop using PDFs ?

        Also, let's not use the term "calculate" when we refer to this stunt Google pulled off. Anything that uses 6500 years of compute time sounds a lot more like trial & error to me... or trial, verify, dismiss, repeat. Not quite a straightforward calculation. So SHA-1 is not really broken; it's just too weak as compute power becomes cheaper.

        Thomas Edison is said to have found hundreds of ways that don't work for making light bulbs. For a long time there were 0 light bulbs in the world, then there was one, and suddenly there were thousands, then millions, and whole new industries besides (there was very little reason to run power cables to homes before electric light, so very few towns even had electricity). In a very short time, massive amounts of industry, and the research into improved efficiency that goes with it, came about because one person figured out how to do something.

        Google have shown it can be done. Someone will optimise the code and shave some time off (at 6,500 CPU-years, even a one-in-a-billion reduction in instructions saves a hell of a lot of time!), someone else will look at the maths and see a better equation, and someone will throw the job at their rather large botnet. First run might be 6,500 years; the second might be 1,000 hours.

      2. Tomato42
        Boffin

        Re: Stop using PDFs ?

        > A cert is nothing but an ASCII text document of a very specific format.

        there must have been some serious changes to the ASN.1 DER encoding, because last time I checked it was very much a binary format, storing the RSA parameters as big-endian integers, etc.
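
        (The PEM file you open in a text editor is just base64 armour around that binary DER. A minimal Python sketch of peeling it off – the file name is a placeholder:)

          import base64

          # Strip the PEM armour and inspect the raw DER bytes underneath.
          # "cert.pem" stands in for any PEM-encoded certificate on disk.
          with open("cert.pem") as f:
              body = "".join(line.strip() for line in f
                             if line.strip() and not line.startswith("-----"))
          der = base64.b64decode(body)

          # DER opens with an ASN.1 SEQUENCE tag (0x30), not printable ASCII.
          print(der[:4].hex())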

        > Also, let's not use the term "calculate" when we refer to this stunt Google pulled off. Anything that uses 6500 years of compute time sounds a lot more like trial & error to me...

        if it was brute force, it wouldn't take 6500 CPU-years to compute; it would take a good few orders of magnitude more – around a hundred thousand times more.
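
        Back-of-the-envelope, using round figures from the paper (about 2^63 SHA-1 computations for the attack versus about 2^80 for a generic birthday search):

          # Rough cost comparison: generic birthday bound vs the published attack.
          attack      = 2 ** 63   # ~SHA-1 computations reported for the attack
          brute_force = 2 ** 80   # ~expected computations for a generic collision
          print(f"~{brute_force // attack:,} times more work")  # ~131,072 times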

  20. ritey

    HMAC-SHA1

    is still fine if the data it's authenticating is small and of a known structure.

    But for large documents SHA1 should have been scrapped years ago.

    1. Tomato42
      Boffin

      Re: HMAC-SHA1

      HMAC-SHA1 is safe _only_ if it is used as a MAC, with a secret key; it's just as insecure as regular SHA-1 when it's used as a plain hash function.
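
      The distinction in code, roughly – a minimal Python sketch with a made-up key and message:

        import hashlib
        import hmac

        key = b"a-secret-key-the-attacker-never-sees"  # made-up key
        msg = b"some small, known-structure payload"   # made-up message

        # Plain SHA-1: anyone can compute it, and collisions are now practical.
        digest = hashlib.sha1(msg).hexdigest()

        # HMAC-SHA1: keyed, so the shattered collision technique doesn't apply;
        # forging a tag still requires knowing the secret key.
        tag = hmac.new(key, msg, hashlib.sha1).hexdigest()

        print(digest)
        print(tag)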

  21. Sorry, you cannot reuse an old handle.

    double check?

    has anyone tried to MD5 the two documents and see whether those checksums are the same?

    I understand the significance of the feat and the rush to use more and more computationally secure hashing algorithms, but wouldn't it be easier to calculate both MD5 and SHA-1 on any document and check that BOTH match? If one of the two differs, then it's not the same document...

    [don't underestimate the power of combining two "stupid" systems to make a "smart" one]
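
    Something like the following would do the double-check; a minimal Python sketch (the file names stand in for the two published PDFs):

      import hashlib

      # Compare both digests; if either differs, the files differ.
      def both_digests(path):
          data = open(path, "rb").read()
          return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()

      a = both_digests("shattered-1.pdf")
      b = both_digests("shattered-2.pdf")
      print("md5  match:", a[0] == b[0])   # expected: False
      print("sha1 match:", a[1] == b[1])   # expected: True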

    1. Tomato42
      Boffin

      Re: double check?

      or you could have migrated to SHA-256 a good 10 years ago, and have a single hash that takes less space than MD5 and SHA-1 combined, and is faster to compute at that.

      and while those two documents most likely have different MD5 hashes, creating documents that have the same MD5 and SHA-1 hash is not significantly more complex.
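
      (Concretely, on the space point – a quick Python check of the digest sizes:)

        import hashlib

        # One SHA-256 digest vs an MD5 digest plus a SHA-1 digest, in bytes.
        print(hashlib.sha256().digest_size)                            # 32
        print(hashlib.md5().digest_size + hashlib.sha1().digest_size)  # 16 + 20 = 36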

  22. Paul Woodhouse

    hmm, wouldn't it be possible to have code automatically detect the padding that must be required to actually generate the collisions after you've altered the original document? I would imagine it would be very, very rare for a collision to be possible from document alterations that are meaningful by themselves. Hmm, or keep an eye out for weird-looking meta tags...
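
    As a first pass you can at least see where a colliding pair differs; a minimal Python sketch (the file names stand in for the published pair):

      # Report where two equal-length files differ, byte by byte.
      def diff_offsets(path_a, path_b):
          a = open(path_a, "rb").read()
          b = open(path_b, "rb").read()
          assert len(a) == len(b), "the published collision pair is equal-length"
          return [i for i in range(len(a)) if a[i] != b[i]]

      offsets = diff_offsets("shattered-1.pdf", "shattered-2.pdf")
      print(len(offsets), "differing bytes, first few at", offsets[:5])

    There is also published "counter-cryptanalysis" work aimed at flagging files produced by this kind of attack one at a time, which is closer to what you describe.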

  23. cosmogoblin

    Newsworthy?

    This is obviously of interest on The Register, but is it actually interesting more generally?

    I remember when the very first hash collision was identified, being astonished that it had ever been considered impossible. As several commentards have pointed out, if the hash is smaller than the document then OBVIOUSLY collisions exist, by the pigeonhole principle: there are more possible documents than possible hash values (whether or not you can find one in less than 10^n years).

    On a separate note, surely you could just use separate hashes. Finding a collision on one algorithm is difficult; finding documents which collide on five or six would be near-impossible.

    1. Charles 9

      Re: Newsworthy?

      As another commenter noted (with citation), it's actually easier than you think. You're better off using one strong hash than multiple weaker ones: the paper shows that the combined result is at best roughly as strong as your strongest component, and at worst as weak as your weakest.

      1. cosmogoblin

        Re: Newsworthy?

        Very interesting, thanks for pointing that out. The cryptomaths is beyond me, but I assume the abstract is supported. It's a very surprising and counterintuitive result.

  24. Version 1.0 Silver badge
    Big Brother

    Not news

    I think the evidence that SHA-1 is broken has been around for a long time. It may look like a lot of computing is needed, but a few organisations have more than enough capacity to do this anytime.

  25. Matt Bucknall

    History repeating...

    The same debate has already been had over MD5... What's the point in repeating it? Move to SHA-2/3, get on with life.

    1. Kiwi

      Re: History repeating...

      The same debate has already been had over MD5... What's the point in repeating it? Move to SHA-2/3, get on with life.

      What about older documents, like that contract you signed 8 years ago? What about documents signed with SHA-2 today, in 10 years' time? Maybe not everyone can dismiss this so easily and "get on with life"?

      1. Tomato42
        Boffin

        Re: History repeating...

        XAdES-A, PAdES-A and CAdES-A exist and are designed with exactly that issue in mind.

        The solution is to timestamp the whole document (including old timestamps) using new cryptography before the old crypto is deemed obsolete.
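
        A toy sketch of the layering idea in Python – just the concept, not the actual XAdES/CAdES encoding, and all the inputs are made up:

          import hashlib

          # Toy re-timestamping: bind the document and all of its old evidence
          # (signature, previous timestamp) under a newer, stronger hash while
          # the old crypto is still trustworthy. Real *AdES-A archives use
          # proper ASN.1/XML structures, not simple concatenation.
          document      = b"...contract signed years ago..."
          old_signature = b"...its RSA/SHA-1 signature..."
          old_timestamp = b"...the SHA-1-era timestamp token..."

          archive_hash = hashlib.sha256(document + old_signature + old_timestamp).hexdigest()
          print(archive_hash)  # the value a new SHA-256 timestamp would cover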

  26. Anonymous Coward
    Anonymous Coward

    SHA1 in Git

    Nitpick: Linus himself has said the security model of Git doesn’t depend on the hashes being cryptographically secure. That's what signed commits are for.

  27. pkoning

    "...have managed to alter a PDF without changing its SHA-1 hash value", says the article.

    WRONG. The actual work is "have managed to generate a pair of PDF files that have the same SHA-1 hash". That is a very different statement. Both are a security concern, but the security implications of a collision pair are quite different from (less severe than) those of a hypothetical "alter a given PDF without changing its hash" – that would be a second-preimage attack.
