Google open sources very slow compression algorithm

Google has open sourced a new compression algorithm called Zopfli that it says is a slower-but-stronger data squasher than the likes of zlib. The product of Googler Lode Vandevenne's 20 per cent time, the day a week Google allows its staff to work on side projects, Zopfli is said to reduce files to sizes 3.7–8.3 per cent …

COMMENTS

This topic is closed for new posts.
  1. MacroRodent
    Headmaster

    It is even slower than you say

    > The catch is that it takes one hundred per cent longer to do so.

    Actually, the web site says it is ~100x slower, not the same thing. 100% slower would be only twice as slow.

    1. Silverburn

      Re: It is even slower than you say

      Indeed.

      Also this: 3.7–8.3 per cent smaller than its rivals (PDF).

      So... it's 100x slower, for only an 8% decrease in size. Hardly seems worth it, really, given storage is hardly a big issue these days, even on phones.

      Maybe he should direct his obvious skills into some new forms of real-time encryption/decryption, which would be more relevant today?

      1. Graham 24

        Re: It is even slower than you say

        Storage isn't the issue - it's about reducing transmission times / bandwidth usage, not storing the content once it's arrived at where it's going.

      2. James Hughes 1

        Re: It is even slower than you say

        And Silverburn wins today's prize for 'best comment without comprehending the article'

        1. Silverburn
          Facepalm

          Re: It is even slower than you say

          Actually, I understood it fully. But maybe I could have made my point clearer.

          Given the meagre 3-8% reduction in size, the disproportionate increase in CPU (and thus battery drain) needed to obtain it will not mean any significant advantage for the consumer.

          It might make the carriers happy, but even then 3% (even if *every* device used this new algo) will hardly dent their year-on-year data increases, and the 8% figure probably only applies in specific scenarios.

          The consumer will hardly notice the benefit on local storage, given that the "growth" of the phone's OS and application footprints will probably wipe out any gains there.

          So given real-world usage, chances are all the user will see is a massive increase in battery drain, with no real benefit in *either* local storage or data transmission performance.

          Far better to improve security on the devices, as this will have immediate benefits to the consumer, given it's likely the smartphone will become a significant attack vector for hackers in the coming years. Preferably one with minimum CPU requirements.

          1. Timmay
            FAIL

            Re: It is even slower than you say

            You just showed you didn't understand it fully - the intention is this compression would be used only on the server side, and the decompression is no more taxing than existing algorithms (the same in fact), so unless you're running your servers on batteries, this would not affect battery life in any way.

          2. stanimir

            Re: It is even slower than you say

            @Silverburn,

            The client enjoys fewer CPU cycles and less bandwidth - it's the server/compressor that has to do way too much work for little benefit.

            As for storage - bzip2/xzip is the way to go if you wish to save space (bzip2 is 20% better than Zopfli on the enwik8 test... and orders of magnitude faster).

            1. Paul Shirley

              Re: -bzip2/xzip is the way to go

              On a completely arbitrary and wholly inappropriate dataset, using 7z, lzma2 did 2.8% better than bz2.

              Neither of them will decompress with zlib, and there's no realistic chance of updating enough of the world's browsers with alternatives in a hurry - I imagine Microsoft would resist supporting bzip2 with a passion! We have universal zlib because PNG requires it (or a compatible implementation) and the permissive licence makes using real zlib a no-brainer. Even for Microsoft ;)

              There's no killer application to drive bzip, lzma2 or any other advanced compression. I expect someone to improve the compression speed pretty rapidly so it probably doesn't matter anyway.

              1. stanimir

                Re: -bzip2/xzip is the way to go

                > Neither of them will decompress with zlib

                @Paul,

                I quite realize that (having first-hand experience porting zlib to Java - Java lacks Z_SYNC_FLUSH support, lame as it gets). However, the idea is that if even one browser starts providing support for a better algorithm and there is a mod for apache/tomcat/nginx/whatever, it will be automatically available.

                The use of compression is quite flexible w/ HTTP: the user agent (browser) can announce any supported encoding and it's up to the server to decide. It could be a good selling point and it doesn't require a lot of work, actually.

                Also, LZMA SDK is placed in the public domain. So there is no issue w/ the licensing terms.

                Btw I posted results about lzma (xzip): the compression level is noticeably improved compared to bzip2, but the CPU cycles increase by an order of magnitude too - still a lot better than the "new compression".

                1. Paul Shirley

                  Re: -bzip2/xzip is the way to go

                  @stanimir: I suspect without IE support it would be as successful as WebM/P, JPEG2000 and countless other badly supported file format changes have been :(

                  Be nice to see exhaustive search applied to more compressors, to see how far they converge on the same 'perfect' compression ratio. Or, more immediately, all the PNG files on the web recompressed, though the streamability requirement might limit the gain.

            2. Eddie Edwards
              Facepalm

              Re: It is even slower than you say

              >> bzip2 is 20% better than zopfli on enwik8 test

              Wat? So Google have made a compressor that's 100x slower and already beaten by BWT from 20 years ago?

              I guess they could add BWT and make it better, then? :)

              1. stanimir

                @Eddie Edwards -- Re: It is even slower than you say

                The idea is that the compression works w/ deflate/inflate, which is readily available in every browser - so there is no need to change anything on the client side.

                On a wimpy Opteron 2372 (@2.1GHz) -- benchmark of size/speed:

                corpus: enwik8, 100,000,000 bytes

                gzip -9: 36'445'248 enwik8.gz (~10 sec) - matches the PDF results

                bzip2: 29'008'758 enwik8.bz2 (~17 sec)

                xz: 26'375'764 enwik8.xz (123 sec) [lzma -- the algorithm is just better]

                Zopfli: 34'995'756 (didn't wish to wait 15+ minutes for it to complete; figure taken from the linked PDF)

          3. Chris007
            Trollface

            Re: It is even slower than you say @silverburn 09:58

            Twice in one day - care to go for a hat-trick of not-understanding-the-article replies?

            1. Anonymous Coward
              Anonymous Coward

              Re: care to go for a hat trick

              He'll be keeping quiet and hoping nobody notices.

            2. david 12 Silver badge

              Re: It is even slower than you say @silverburn 09:58

              There are no stupid questions.

              Only moronic commentards who have nothing better to do than fill up storage space, CPU cycles and bandwidth with content-free comments trying to indicate how much more clever they are than other commentators.

  2. Jamie Jones Silver badge
    Thumb Down

    Benchmarks....

    No testing of bzip2 or xz in the benchmarks...

    1. This post has been deleted by its author

    2. Michael Wojcik Silver badge

      Re: Benchmarks....

      > No testing of bzip2 or xz in the benchmarks...

      Because. They. Are. Not. Deflate. Compressors.

      Good lord. If this wasn't clear enough in the article, it's frickin' crystal in the (very short) paper.

      The point of this exercise is to produce a Deflate-compatible output that's smaller than what the best current Deflate compressors produce.

      And really, there's very little point in testing against bzip2, other than its relative popularity in the obscure-compressor sweepstakes. The Burrows-Wheeler Transform, while clever, is really nothing more than a roundabout way of constructing a Hidden Markov Model, and consequently a BWT-based compressor is just a less-successful variant of better HMM-based compressors, such as those implementing some version of PPM. The reigning lossless compressors, for minimal output size on English text corpora, seem to be PPM or hybrid ones that use multiple predictive models.

  3. ratfox
    Happy

    Heh… Zopfli

    It means little jar…

    1. frank ly

      Re: Heh… Zopfli

      It can't be called a .jar file (which is a shame), so I hope they call it a .zop file.

  4. stanimir

    zlib compression

    By default deflate (the zlib compression) does not use maximum compression nor maximum memory for dictionaries. -9 would be max compression, but not max memory.

    Also, gzip uses a crc32 that's practically unnecessary over TCP, so they should have tested via simple deflate.
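
    For anyone curious what those knobs look like, here is a minimal sketch using Python's zlib module (the payload is a made-up blob): level/memLevel control compression effort and dictionary memory, negative wbits gives a bare Deflate stream, and the gzip wrapper only adds a header plus the CRC32/length trailer on top of the same stream.

```python
import zlib

payload = b"<html><body>some static page body</body></html>" * 200  # made-up input

# zlib defaults to level 6 / memLevel 8; 9/9 is the most Deflate allows,
# though the match window is still capped at 32 KB either way.
raw = zlib.compressobj(level=9, memLevel=9, wbits=-zlib.MAX_WBITS)     # raw Deflate, no framing
deflated = raw.compress(payload) + raw.flush()

gz = zlib.compressobj(level=9, memLevel=9, wbits=16 + zlib.MAX_WBITS)  # gzip framing
gzipped = gz.compress(payload) + gz.flush()

# The gzip stream carries the same Deflate data plus ~18 bytes of
# header/CRC32/length - the overhead being discussed above.
print(len(deflated), len(gzipped))
```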

    80x, as cited in the PDF, is just too slow for a few per cent on files megabytes in size. The deflate algorithm is essentially 16-bit as well. When transferring such files they would be compressed w/ better algorithms, not necessarily compatible w/ inflate on the client side. And "web content" like images and video that is large enough is already compressed internally, so there is nothing much to gain.

    @jamie jones

    bzip is not "inflate" compatible on the decompression end.


    1. Jamie Jones Silver badge

      Re: zlib compression

      "@jamie jones

      bzip is not "inflate" compatible on the decompression end."

      Thanks for pointing that out, I missed the bit about it being compatible with current decoders.

  5. heyrick Silver badge

    "where any reduction in the amount of time radios work means better battery life and lower bills for data consumption."

    This from the company wanting to push all sorts of advertising on their mobile platform. D'you think in-app adverts and such don't consume part of the allocated bandwidth?

    1. Field Marshal Von Krakenfart
      Boffin

      > This from the company wanting to push all sorts of advertising on their mobile platform. D'you think in-app adverts and such don't consume part of the allocated bandwidth?

      Always follow the money trail and see who gains from this. Either Google just want to make more room for the ads, or they are trying to develop their own compression algorithms so that they don't have to pay royalties to anyone. Perhaps Google are going to serve all Google searches compressed in this format, and mobile phone makers will have to pay Google to use the decompression algorithm? Either way I'm sure Google are not doing this for the benefit of the users.

      The fact that this "new" algorithm is 100x slower than other compression methods may explain why it is not used in other compression algorithms; either that or the chocolate factory is following the advice of Mark Weatherford and has employed people who do not understand the problems of combinatorial explosion in search functions.

      1. Paul Shirley

        @Field Marshal CantReadALot

        Did you even read the article? Is Slashdot having a holiday? We seem to be overrun with the 'can write but can't read' halfwits usually seen there.

  6. heyrick Silver badge

    The question I have is...

    ...how much processing power/time is necessary to decompress?

    1. stanimir
      Holmes

      Re: The question I have is...

      From the linked PDF -> the decompression is the same (inflate) as for any of the other algorithms compared. Probably even slightly better due to better cache hits (smaller size).

  7. Lee D Silver badge

    Could be useful.

    I see a market for a device that sits between a webserver and the Internet and applies Zopfli compression to HTTP replies transparently (it's gzip-compatible, and gzipped web pages are extremely common to save bandwidth), and caches common results just like a Squid reverse-proxy or similar. Make it an embedded device that does nothing but Zopfli compression, implement the algorithm in an ASIC or similar embedded chip (so the speed difference would hardly begin to matter) and there's no reason it can't lop another 8% off your data transfer for practically zero cost if you're large enough.

    No changes required to your viewers - gzip compressed content is bog-standard - no changes required to your webservers, and an 8% saving in traffic sent externally for the cost of a set of Zopfli boxes on the outgoing lines. I can see where someone like Google, Facebook etc. could make a large profit on their investment there, if they have big enough lines.
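
    A rough sketch of that compress-once idea, assuming the stock zopfli command-line tool is on the PATH (by default it writes FILE.gz next to FILE); the static root is made up, and a real deployment would hang this off a build step or a caching proxy rather than a loose script:

```python
import subprocess
from pathlib import Path

STATIC_ROOT = Path("./static")   # hypothetical document root

for asset in STATIC_ROOT.rglob("*"):
    if asset.suffix in {".html", ".css", ".js", ".svg"}:
        # Burn the extra CPU once, at publish time; zopfli emits asset.gz alongside.
        subprocess.run(["zopfli", str(asset)], check=True)
        # The web server then serves asset.gz with "Content-Encoding: gzip" to any
        # client announcing "Accept-Encoding: gzip" - the browser decodes it with
        # its ordinary inflate, so nothing changes on the client side.
```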

    (Strangely, the Google homepage does not seem to use gzip compression, but BBC and Facebook appear to).

    Probably not for the likes of us mere mortals, though.

    1. stanimir

      Re: useful

      > and an 8% saving in traffic sent externally for the cost of a set of Zopfli boxes on the outgoing lines

      The quoted 8% would be applicable to large enough static text files that are actually cacheable.

      If the file is cacheable, the client would likely fetch it once only anyway. So there you have it - if you have a huge, huge javascript file, Zopfli can be of tiny use (and you can just compress it yourself anyway instead of using a proxy). "Tiny" - b/c the clients will use their own cached version instead of downloading it on each request.

      Facebook and the BBC would have mostly dynamic content, so they need online compression. The Zopfli compression algorithm is way, way too slow for online compression. Deflate is fast enough, though.

      Side note: sites should use 'deflate', not 'gzip' - gzip comes w/ a crc32 and an extra header.

      For reference, compared to the LZMA algorithm Zopfli is still ~25% worse in size and still slower to process.

      To save bandwidth, some browsers should start supporting LZMA and announce that in the Accept-Encoding header - the server would pick the best from gzip/deflate/lzma and serve the content w/o wasting tons of cycles and bandwidth on a dated 16-bit compression.
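
      A minimal sketch of that negotiation, assuming a hypothetical set of pre-compressed variants; only gzip/deflate are universally understood today, so anything better has to wait for the client to actually announce it:

```python
# Hypothetical encodings we have pre-compressed variants for, smallest first.
PREFERENCE = ["lzma", "deflate", "gzip"]

def pick_encoding(accept_encoding):
    """Return the best Content-Encoding the client says it understands."""
    offered = {token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")}
    for enc in PREFERENCE:
        if enc in offered:
            return enc
    return None  # identity: send it uncompressed

print(pick_encoding("gzip, deflate"))        # -> deflate
print(pick_encoding("gzip, deflate, lzma"))  # -> lzma
print(pick_encoding(""))                     # -> None
```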

    2. Ian Yates

      Opera Turbo, then?

  8. chrisbrown

    It's a standard

    I imagine zip is popular because it is in the public domain. But using a file format - on the internet - where the header is at the end of the file is really dumb.

    1. stanimir

      Re: It's a standard (zip format)

      The CEN (central directory) is at the end of the file to allow easier adding of new files w/o necessarily decompressing/recompressing the entire archive.
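
      As an illustration (Python's stdlib zipfile, nothing to do with Zopfli): appending to an archive writes only the new entry plus a fresh central directory at the tail; the existing compressed data is left untouched.

```python
import zipfile

with zipfile.ZipFile("archive.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "original entry")

# Append mode writes b.txt's compressed data after a.txt's, then rewrites only
# the small central directory ("CEN") at the tail of the file.
with zipfile.ZipFile("archive.zip", "a", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("b.txt", "added without recompressing the existing entry")

print(zipfile.ZipFile("archive.zip").namelist())  # ['a.txt', 'b.txt']
```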

      ...but the zip file format has nothing to do w/ the article.

    2. Field Marshal Von Krakenfart

      Re: It's a standard (ZIP)

      I don't think the zip standard is public domain.

    3. Roland6 Silver badge

      Re: It's a standard

      >But using a file format - on the internet - where the header is at the end of the file, is really dumb.

      Depends upon the content and the intended use. Remember the checksum is always at the end, so only when the recipient has received the entire communication will they know whether the transfer was corrupt or not. Hence for system updates, for example, it might be wise to wait to receive the entire transmission before starting to process it...
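
      A small demonstration of that point with gzip specifically, whose CRC32/length trailer sits in the last 8 bytes: flip a bit in the trailer and nothing complains until the final bytes of the stream are processed.

```python
import gzip
import zlib

blob = bytearray(gzip.compress(b"system update payload " * 500))
blob[-5] ^= 0xFF   # corrupt one byte of the trailing CRC32

d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)     # gzip framing
body = d.decompress(bytes(blob[:-8]))                 # the payload decodes without complaint
try:
    d.decompress(bytes(blob[-8:]))                    # the integrity check only runs here
except zlib.error as exc:
    print("corruption reported only at end of stream:", exc)
```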

  9. Zmodem

    7z/7zip is open source, and 7z is better than rar and uha on all filetypes

  10. Irongut

    3% better than zlib? Pathetic!

    7z is still king.

    And look it's open source too!

    1. Michael Wojcik Silver badge

      Re: 3% better than zlib? Pathetic!

      Oh, the ninnies are out in force today.

      7zip supports a variety of compression algorithms. 7zip with deflate produces output larger than Zopfli on the tested corpora. 7zip with LZMA is not compatible with Deflate, which is the whole goddamned point.

      Incidentally, for large English-text corpora, there are encoders which produce smaller output than LZMA. Do a little research.

  11. Tom 13

    Re: systems that cram data into wonderfully tiny bundles.

    But has anybody ever managed to recreate the neutron packing density Novell achieved on their 3.5-inch driver disk for NetWare 3.12? You put that thing in and could leave the top floor of the World Trade Center (cause it was still there back then), walk down to the corner, sit at the counter, order a cup of coffee and a danish, finish both, and then come back as the disk was just about done unpacking.

  12. John Savard

    Developing World

    The first thing I thought was that this algorithm would likely be very useful in meeting the needs discussed in the earlier article about the Mobile World Conference in Barcelona.

    For myself, as any extra compression reduces redundancy, I'd see a use for this with cryptography.

  13. John Savard

    Slow

    When I read the article, it seemed they were saying it was 100 times slower to compress, but only 2 or 3 times slower to decompress, and so the latter was seen as negligible, while the former restricted it to compress-once, decompress-many applications.

    1. Michael Wojcik Silver badge

      Re: Slow

      Same decompression time, on average, as any other Deflate implementation. Read the paper, John - it's only three pages.

  14. Anonymous Coward
    FAIL

    100x increase in CPU use = a lot of baby polar bears falling into the sea

  15. Domino
    Thumb Up

    How big are the JavaScript files on YouTube? Say 100k? So saving at least 3k on millions (billions?) of transfers per day seems a small trade for a one-off cost of 100x compression time. Any extra power/time used will be saved many times over daily, not just for Google, but across the internet's routers. The PDF says it's deflate-compatible, so it sounds like the only change needed is server-side, to use the compression. The server would need to be smart enough to know what's static content and what's not, which is why I used JS as my example - an easy target to implement :)
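
    Back-of-the-envelope, using the comment's own illustrative figures (100k of script, the "at least 3k" saving, a notional billion transfers a day - none of these numbers come from Google):

```python
script_bytes = 100 * 1024           # "say 100k"
extra_saving = 0.03                 # "saving at least 3k" on that 100k
transfers_per_day = 1_000_000_000   # "millions (billions?)"

saved = script_bytes * extra_saving * transfers_per_day
print(f"~{saved / 1e12:.1f} TB of traffic avoided per day")   # ~3.1 TB/day
```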

    1. stanimir
      Meh

      @domino

      I already mentioned it -> the javascript would be downloaded once only, as it'd be cached by the user agent.

      Simple rule - if a resource can be served compressed by that algorithm, it'd be downloaded once only - and the algorithm will not be so "good" on smaller resources.

      Compared to the video sizes, 8k (an optimistic ratio on your quoted 100k) once is less than a drop in the sea.

  16. Michael Wojcik Silver badge

    The algorithm

    Took a quick browse through the code, and it looks like Zopfli is pretty much stock Deflate, augmented with more testing to determine the best places to split blocks and recalculate the dynamic tree. So it's pretty much exactly what you'd do if you wanted to improve the compression ratio of Deflate: put more work into figuring out the optimal Deflate representation of the input.
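
    A toy sketch of that flavour of search, using plain zlib as a stand-in for Zopfli's real cost model - this is not Zopfli's actual algorithm, just the shape of the idea: try candidate split points and keep whichever segmentation encodes smallest.

```python
import zlib

def encoded_size(chunk):
    # Raw Deflate (no zlib/gzip framing), maximum effort - a crude cost model.
    co = zlib.compressobj(level=9, wbits=-zlib.MAX_WBITS)
    return len(co.compress(chunk) + co.flush())

def best_two_block_split(data, step=4096):
    """Brute-force a single split point; Zopfli searches far more cleverly."""
    best_size, best_cut = encoded_size(data), None   # baseline: one block
    for cut in range(step, len(data), step):
        size = encoded_size(data[:cut]) + encoded_size(data[cut:])
        if size < best_size:
            best_size, best_cut = size, cut
    return best_size, best_cut

# Two regions with very different statistics; prints the cheapest segmentation
# found (cut is None if a single block is already the smallest).
sample = b"A" * 20000 + bytes(range(256)) * 100
print(best_two_block_split(sample))
```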

    So not groundbreaking, but a perfectly solid piece of incremental improvement.

    Still, it's fun to see yet another bit of performance squeezed out of Lempel-Ziv '77. That algorithm is older than most of the people using it.

This topic is closed for new posts.