Never mind your little brother - happy 10th birthday, H.264

As technology advances, video codecs come and go naturally enough. But while H.265 is still waiting in the wings, we should pay tribute to the groundbreaking H.264, which is a decade old this month. H.264 is possibly not the snappiest or most memorable name, but even 10 years on it remains an important video coding standard, …

COMMENTS

This topic is closed for new posts.
  1. FartingHippo
    Facepalm

    My. Brain. Hurts.

    I vaguely, in a hand-wavy kind of way, knew that information in a frame that didn't change from the previous one could be 'compressed'. Meaning it didn't have to be transmitted again, just the location of the unchanged pixel. And thus digital television worked!

    However, it appears my understanding was like thinking quantum mechanics is all about rolling dice...

    Fascinating article. Cheers John.
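
    The hand-wavy picture above can be written down as a toy sketch: send only the pixels that differ from the previous frame and let the receiver patch its copy. This is an illustration under simple assumptions (frames as flat lists of pixel values, hypothetical function names `diff_frame` and `apply_diff`), not how H.264 actually works; real codecs predict whole blocks with motion compensation and then transform and entropy-code the residual.

```python
def diff_frame(prev, curr):
    """Return (index, new_value) pairs for pixels that changed since prev."""
    if len(prev) != len(curr):
        raise ValueError("frames must be the same size")
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]


def apply_diff(prev, changes):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(prev)
    for i, value in changes:
        frame[i] = value
    return frame


# A bright two-pixel object moves one pixel to the right.
frame1 = [10, 10, 10, 200, 200, 10, 10, 10]
frame2 = [10, 10, 10, 10, 200, 200, 10, 10]

changes = diff_frame(frame1, frame2)
print(changes)  # [(3, 10), (5, 200)]
print(apply_diff(frame1, changes) == frame2)  # True
```

    Only two of the eight pixels need sending here, which is the whole trick; the hard part in a real codec is everything that makes the differences small in the first place.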

    1. Lee D Silver badge

      Re: My. Brain. Hurts.

      For years, I spent my early coding days trying to "beat" ZIP compression in a similar way. If we have some data, can we try to predict what comes and somehow send the difference between our predictions and reality. The idea started when I learned a technique in maths class for finding a polynomial that fits any sequence of numbers you give it. I used it for everything, including my "tri-square numbers" A-level project on finding a pattern in the numbers that are both triangular and square, which failed abysmally through lack of time and insight but got me an A through sheer determination and range of techniques, even though I'd missed the obvious.

      Of course, I was too dumb at the time to notice that most file data isn't all that predictable (there could be an argument made that you could predict the next word in a document, or the next part of a clipart image, but it's inherently poor because it's TOO unpredictable). I spent so much time trying to squeeze some form of compressed data and/or a partial "correction" vector through to a decoder that it cost me far more than I'd ever have saved, even in the time of floppies and 33K modems, and 90% of the time my data turned out to be larger than the original because the "corrections" for that application had to be lossless.

      Maybe I should have applied it to photographic images back then, and I could be holding a handful of patents. Fact was, though, that full-colour images were rare and huge and hardly anybody had hardware capable of displaying them at that time. (Which reminds me: anyone remember an old DOS image viewer program called "display"? It's IMPOSSIBLE to find via Google - they obviously didn't think that name through! All I remember is that it was called Display, had a DOS EDIT-blue screen, displayed full-colour images through VESA modes, had a million and one options and had someone's name on it, probably the author's, but I'll be damned if I can find them.)

      With video, obviously the jump to temporal encoding springs up and makes things easier (I believe that, pretty much, the first image from a GOP is basically a JPEG, which was a problem already solved), not that I could claim to be able to do it myself with any efficiency.

      Anyone else remember the old full-page adverts for the early MPEG cards that said you could store 3 minutes of video footage on a floppy? Seemed SO impressive back then but, hell, I have 1.5MB MPEGs that - if you knocked the resolution back to what we had back then - could play for longer than even that.

      1. BoyModernist
        Thumb Up

        Re: My. Brain. Hurts.

        I used to use Cshow. It fit on a bootable floppy and used VESA modes to display pretty much any image on pretty much any PC. That and Vernon Buerg's List are among my all-time favourite programs.

      2. Suricou Raven

        Re: My. Brain. Hurts.

        I dabbled in compression software too. Only one of my programs turned out to be any good, a block-level deduplicator that functions as a drop-in replacement for a compressor. It only works on data with block-aligned duplications - a constraint that happens to include media images and virtual machines. If you need to archive a VM, it'll do wonderfully. Takes ages though.
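
        A minimal sketch of the block-level deduplication idea described above, not the poster's actual program: split the input into fixed-size blocks, keep one copy of each distinct block keyed by its hash, and store the ordered list of hashes as a "recipe". The 4096-byte block size and the names `dedup` and `restore` are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; a real tool would make this configurable


def dedup(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy of each distinct block."""
    store = {}   # digest -> block bytes, one entry per unique block
    recipe = []  # digests in original order, enough to rebuild the input
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original bytes from the unique blocks and the recipe."""
    return b"".join(store[digest] for digest in recipe)


# Three identical blocks, one different block, a repeat, and a short tail.
data = b"A" * 4096 * 3 + b"B" * 4096 + b"A" * 4096 + b"tail"
store, recipe = dedup(data)
print(len(recipe), len(store))  # 6 3
print(restore(store, recipe) == data)  # True
```

        This also shows why the duplicates have to be block-aligned: shift the data by one byte and every block hashes differently, so nothing is saved.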

      3. jake Silver badge

        @Lee D (was: Re: My. Brain. Hurts.)

        For that kind of thing, I'd look at the Simtel archive, now sadly offline. Try this link:

        http://archive.org/details/cdrom-simtelcoasttocoast11994

      4. Oscar Pops
        Thumb Up

        Re: My. Brain. Hurts.

        There's a DOS app called display here:

        http://freesoft.cyberside.net.ee/FreeSoft/graphics.htm

        I never used it so don't know if it's the one you're looking for?

      5. Michael Wojcik Silver badge

        Re: My. Brain. Hurts.

        For years, I spent my early coding days trying to "beat" ZIP compression in a similar way. If we have some data, can we try to predict what comes and somehow send the difference between our predictions and reality.

        This is precisely what predictive encoders like PPMd do, and they do beat Lempel-Ziv-style adaptive-dictionary encoders (such as zip's Deflate), in terms of compression ratios, for most general-purpose applications. They basically build an adaptive Markov (context) model that predicts the next symbol based on the prior stream, and then the encoder sends corrections. It's the same idea as the time-domain subtractive encoding discussed in the article. (The BWT, of bzip2 fame, does something similar, in a roundabout fashion.)

        Maybe I should have applied it to photographic images back then, and I could be holding a handful of patents.

        Maybe, but you would have had to do some serious research. The general idea has been around for a while. The Cutler / Bell Labs DPCM patent was filed in 1950. Of course many a patent has been issued in this area since, but those resulted from dedicated research in the area, since the basic technique of differential coding and mutual prediction for compression was well-documented.
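
        The "predict, then send the difference" idea in this sub-thread can be sketched with the simplest possible predictor: assume each sample equals the previous one. This is a toy differential coder in the spirit of DPCM, not PPMd and not the scheme from the patent mentioned above; the function names `encode` and `decode` are made up for the example.

```python
def encode(samples):
    """Predict each sample as the previous one and emit the prediction errors."""
    residuals = []
    prediction = 0  # both ends agree to start the predictor at zero
    for sample in samples:
        residuals.append(sample - prediction)
        prediction = sample
    return residuals


def decode(residuals):
    """Run the same predictor and add each correction back on."""
    samples = []
    prediction = 0
    for residual in residuals:
        sample = prediction + residual
        samples.append(sample)
        prediction = sample
    return samples


signal = [100, 102, 104, 105, 105, 104, 102, 100]
residuals = encode(signal)
print(residuals)  # [100, 2, 2, 1, 0, -1, -2, -2]
print(decode(residuals) == signal)  # True
```

        The residuals are no shorter as a list, but after the first one they are small numbers clustered around zero, which an entropy coder can pack into fewer bits. Feed it unpredictable data and the residuals are as large as the input, which matches the experience described at the top of this sub-thread.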

  2. Anonymous Coward
    Anonymous Coward

    I wonder who that could be?

    "After all, what is more cynical than the broadcaster that uses a high bit rate when the HD service is introduced and then lowers it after people have bought their sets? It's deplorable behaviour but then again, there are some UK broadcasters that are no strangers to scandal."

    Nope. No idea. *cough*Sky*cough*

    1. Anonymous Coward
      Anonymous Coward

      Re: I wonder who that could be?

      I was thinking of a more recent scandal. Of course OFCOM don't help when they sell off broadcast bandwidth for supposedly better economic causes, and a certain broadcaster believes it has to maintain (crappy) quality parity between a bandwidth-constrained terrestrial service and a less constrained satellite service.

    2. DrXym

      Re: I wonder who that could be?

      Sky can only utilise the bitrate that is available on the satellites. The 28.2E cluster of satellites has become severely congested in recent years as more channels were piled in and there was a clamour for HD channels too. Fortunately congestion is easing up as Astra 1N and 2F have stepped in and 2E and 2G are planned to launch soon too.

      I think the best thing Sky (and Freesat) could do is put a very clear end of life roadmap for MPEG-2 broadcasts for channels hosted through their respective services, e.g. five years hence and all MPEG-2 completely ceases. That's plenty of time to get rid of the older boxes and it means that channels can move to AVC and save bitrate for the same picture quality, or increase picture quality, or go HD. Some channels even broadcast in SD and HD so the SD channel could get the chop.

    3. Tom 38

      Re: I wonder who that could be?

      Broadcasters lowering bitrates is not always the issue it is made out to be.

      The BBC were accused of this, but they were simply using an encoder tuned for a particular Constant Rate Factor (CRF), which attempts to encode at a particular quality level, not a particular bitrate. The encoder was replaced, and similar scenes were encoded much more efficiently with the new encoder, which reduced the required bit rate for a particular quality factor. The new stream had a lower bitrate (9Mb/s) than the old stream (17Mb/s), but (allegedly¹) the same quality.

      Course, it's not always like this. ITV are notorious for cramming their channels into any bandwidth (I know it's MPEG-2, but look at ITV-4 on Freeview: the bitrate is so low you can see artefacts on almost every frame).

      ¹ Lots of people disagreed, there was lots of arguing, eventually the encoders were replaced again, this time with VBR encoders that used, on average, 9 Mb/s, but could be bursty in high action segments and use up to 17 Mb/s briefly.

      1. Anonymous Coward
        Anonymous Coward

        Re: I wonder who that could be?

        On my Dell D400 laptop, iPlayer BBC4 shows regular "fuzzy block" artefacts when a scene changes or there is constant movement like trees or water. I have wondered if that is due to the laptop's 2GHz Centrino CPU. My broadband is 12Mbps and big Microsoft update downloads achieve 8Mbps.

        1. Tom 38

          Re: I wonder who that could be?

          No, the blockiness will be down to the bitrate used for iplayer being quite low. The 'HD' iplayer streams are around 832x468 in size, and a bitrate of around 1400 Kbit/s.

          For blockiness, CPU doesn't come into it at all. Internet download speed only matters if you cannot download the file fast enough, which would present as pausing/buffering and not blockiness - although if your download speed is not sufficient, iplayer could switch you to an even lower bitrate stream, which would of course appear blocky more often.

          1. Richard 22

            Re: I wonder who that could be?

            No - the HD iPlayer streams are 1280x720 at around 3500kb/s (not all programs have this stream). You're thinking of the highest rate SD streams. Both are rather lower than you might want though.

  3. SD24576
    Thumb Up

    Fascinating, thank you.

  4. frank ly

    MPEG-3, .mp3 ?

    Are audio .mp3 files related to some aspect of the MPEG-3 standard? I have a vague memory of reading that they are. Lately, I've seen audio .mp4 files and am wondering how much better/different they are.

    1. Joerg

      Re: MPEG-3, .mp3 ?

      .mp3 is just MPEG-1 Audio Layer 3 (later extended by MPEG-2 for lower sample rates).

      The work planned for MPEG-3 was folded into MPEG-2, so MPEG-3 never officially existed as a standard.

    2. Tom 38

      Re: MPEG-3, .mp3 ?

      MP3 is actually short for "MPEG-1/2 Audio Layer 3", and a ".mp3" file describes both the codec the audio has been encoded with and the file format.

      ".mp4" on the other hand is simply a generic container as described by the MPEG-4 Part 14. It's a container format that can contain all kinds of data, audio data, video data, subtitles, pictures - all kinds of things, all kinds of codecs. The file suffix may be ".mp4", but there are lots of other file suffixes used for the same format (m4a, m4v, m4b...). The audio is typically AAC, or some variant like HE-AAC, but can be almost anything, including things like Apple Lossless, or even MP3.

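      To make the "generic container" point concrete, here is a rough sketch of walking the top-level boxes of an MP4-style file. It relies on the basic box layout (a 4-byte big-endian size, a 4-byte type, then the payload, with size 1 meaning a 64-bit size follows and size 0 meaning "to end of file"). The function name `iter_boxes` and the hand-built sample bytes are assumptions for illustration; it does not descend into nested boxes or decode any audio or video.

```python
import struct


def iter_boxes(data):
    """Yield (type, payload) for each top-level box in an MP4-style byte string."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means the real size is a 64-bit field after the type.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the data.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            break  # truncated or malformed; stop rather than guess
        yield box_type.decode("ascii", "replace"), data[offset + header:offset + size]
        offset += size


# Hand-built sample: a tiny 'ftyp' box followed by a 4-byte 'mdat' box.
sample = (
    struct.pack(">I4s", 16, b"ftyp") + b"M4A \x00\x00\x00\x00"
    + struct.pack(">I4s", 12, b"mdat") + b"\x01\x02\x03\x04"
)
for box_type, payload in iter_boxes(sample):
    print(box_type, len(payload))
```

      The file suffix never enters into it: an .m4a and an .mp4 parse the same way, and what the audio codec is gets recorded in boxes nested further inside.
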
  5. BigNose
    Coffee/keyboard

    I spent a number of years with a video conferencing company, so needed a lot of understanding of H.264.

    Whenever I was at home watching the likes of Euro football on Channel 5, I used to habitually complain about how hard the football was to see as it drifted across the crowd background.

    Football in HD sez Channel 5, but the grass looked horrible too and all the time I knew it was about the bandwidth, probably due to streaming it back over the landlines at the time, across Europe to the UK.

    I was desperate for an indicator for any transmission showing the BW used.

    All the time we use IP, BW will be the real indicator of quality.

    Nice article.

    1. hayseed

      Security and sports cameras should NOT use excessive blurring of objects in motion to save bandwidth!!

  6. Anonymous Coward
    Anonymous Coward

    H.264 was not required for HDTV

    H.264 was NOT required to enable HDTV transmission, as almost every TV and TV station in the United States demonstrates - ATSC is MPEG2.

    Would it have been nice if ATSC were H.264? Yes - I could then directly stream from my Mythbox to my tablet. But as the 54" TV in my video room demonstrates every time I tune in a local HD station, MPEG2 is perfectly capable of enabling over-the-air transmission of 1080i60 content.

  7. Anonymous Coward
    Anonymous Coward

    I'm reminded of this description of an inertial guidance system:

    The aircraft knows where it is at all times. It knows this because it knows where it isn't. By subtracting where it is from where it isn't, or where it isn't from where it is (whichever is the greater), it obtains a difference, or deviation.

    The Inertial Guidance System uses deviations to generate error signal commands which instruct the aircraft to move from a position where it is to a position where it isn't, arriving at a position where it wasn't, or now is. Consequently, the position where it is, is now the position where it wasn't; thus, it follows logically that the position where it was is the position where it isn't.

    In the event that the position where the aircraft now is, is not the position where it wasn't, the Inertial Guidance System has acquired a variation. Variations are caused by external factors, the discussions of which are beyond the scope of this report.

    A variation is the difference between where the aircraft is and where the aircraft wasn't. If the variation is considered to be a factor of significant magnitude, a correction may be applied by the use of the autopilot system. However, use of this correction requires that the aircraft now knows where it was because the variation has modified some of the information which the aircraft has, so it is sure where it isn't.

    Nevertheless, the aircraft is sure where it isn't (within reason) and it knows where it was. It now subtracts where it should be from where it isn't, where it ought to be from where it wasn't (or vice versa) and integrates the difference with the product of where it shouldn't be and where it was; thus obtaining the difference between its deviation and its variation, which is variable constant called "error".

    1. Suricou Raven

      Re: I'm reminded of this description of an inertial guidance system:

      It's even funnier in audio form.

      http://birds-are-nice.me/CANary/SHA1/37b44697d2eb92315ecada158824171b6797552a/audio/x-wav/guidance.wav

    2. Michael Wojcik Silver badge

      Re: I'm reminded of this description of an inertial guidance system:

      Hmm. The mysterious Reg downvoters strike again. We are a cranky bunch, aren't we?

  8. IGnatius T Foobar

    MPEG is a patent cartel

    Technology aside, MPEG is a patent cartel that impedes, not promotes, the progress of digital video.

    1. Tom 38

      Re: MPEG is a patent cartel

      No it isn't. The license fees payable are only necessary if commercially distributing video, and even then they are trivial, and having the best technology disseminated by all parties is the best possible outcome.

      Going back to each platform having its own proprietary-yet-massively-identical-to-h264 codec would be a fucking nightmare. Do you want a world where Sorenson Spark is still relevant?

      1. Vladimir Plouzhnikov

        Re: MPEG is a patent cartel

        If you want a cartel - go to DVD FLLC or BD ALO.

        It is those people who force manufacturers to put in their products and you to pay for malware intended for the sole purpose of inconveniencing you and infringing your rights when you buy any DVD or BD players or discs...

      2. Michael Wojcik Silver badge

        Re: MPEG is a patent cartel

        Do you want a world where Sorenson Spark is still relevant?

        Isn't it still the codec for Adobe Flash? I know many people feel Flash is no longer relevant, but there's still a lot of it on the web, despite the efforts of His Late Jobsness et alia.

  9. Anonymous Coward
    Anonymous Coward

    Kudos to the Reg...

    ...For having experts talk about their work... It would be nice to see more quality articles like this.

    P.S.

    Sometimes it can be embarrassing recommending the Reg to colleagues for tech news on days when the front page has stories like this: "CRUNCH: 'Drunk' chap cuffed in high-speed car nookie prang rumpus".

  10. This post has been deleted by its author

  11. Parax
    Trollface

    Actually

    I think you'll find it's Pronounced H462

  12. John Smith 19 Gold badge
    Unhappy

    I miss the days when I could flip between 3 programs simultaneously.

    Today it takes about 5 secs to switch between 2 channels.

    I wonder how frequently they send those re-synch frames in the data stream?

    1. Tom 38

      Re: I miss the days when I could flip between 3 programs simultaneously.

      I wonder how frequently they send those re-synch frames in the data stream?

      This was talked about briefly in the article: the stream is arranged into GOPs (Groups Of Pictures), each a list of frames. Each frame is one of three types: I, P or B. I-frames are the entire picture, P-frames are forward predictive and B-frames are bi-directionally predictive, i.e. they store only differences from the previous and/or next reference frame. If you haven't decoded the frame being referenced, the decoder can only show the difference, and the video looks "corrupted".

      Obviously, I-frames take the most space, P-frames, being one way predictive, take up less space, and B-frames take even less space, so for optimum quality at a given bandwidth, you want as few I-frames as possible, fewer P-frames and more B-frames. However, as you point out, you need as many I-frames as possible to make seeking/scrubbing not appear corrupted.

      A typical GOP may have something like this structure "IBBPBBPBBPBBI". The next I-frame is 12 frames after the first I-frame, and this is the GOP length (N). The maximum distance between 2 reference frames (like an I or P frame) in the GOP is 3, which is the GOP size (M), so this GOP would be described as N=12 M=3. The GOP length is also called the Intra Period.

      Anyway, all of these things are configurable by the encoder. So the answer to your question is, "as infrequently as the encoder thought they could get away with". You want enough I-frames that you can seek easily, you want as many P/B-frames as possible to keep the bitrate down. Pay your dollar, make your choice.
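
      The N and M figures above can be read straight off a frame-type string. A small sketch, assuming the pattern starts on an I-frame and includes the I-frame that opens the next GOP, as in the example; the function name `gop_params` is made up for illustration.

```python
def gop_params(pattern):
    """Return (N, M) for a frame-type string such as "IBBPBBPBBPBBI".

    N is the distance from the first I-frame to the next one (GOP length).
    M is the largest distance between consecutive reference (I or P) frames.
    """
    if not pattern or pattern[0] != "I":
        raise ValueError("pattern must start with an I-frame")
    next_i = pattern.find("I", 1)
    if next_i == -1:
        raise ValueError("pattern must include the next I-frame")
    gop = pattern[:next_i + 1]
    # Positions of the frames that other frames can be predicted from.
    refs = [i for i, frame in enumerate(gop) if frame in "IP"]
    m = max(b - a for a, b in zip(refs, refs[1:]))
    return next_i, m


print(gop_params("IBBPBBPBBPBBI"))  # (12, 3)
```

      An all-I-frame stream such as "II" comes out as N=1 M=1: instant seeking, worst possible bitrate.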

      1. Vladimir Plouzhnikov

        Re: I miss the days when I could flip between 3 programs simultaneously.

        While DVD program streams are limited by DVD specs to 12 (PAL) or 15 (NTSC) frames per GOP, the broadcast video uses transport streams with GOPs as wide as 40 frames or more. One frame in PAL lasts 0.04 sec, so 40 will mean 1.6 seconds between I-frames.
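
        The arithmetic above as a tiny sketch, with one added assumption: the viewer tunes in at a uniformly random point in the GOP, so the average wait is half the worst case. The function name `keyframe_wait` is hypothetical, and the figure covers only the wait for an I-frame, not tuner lock, demultiplexing or buffering.

```python
def keyframe_wait(gop_frames, fps):
    """Return (worst_case, average) seconds until the next I-frame arrives."""
    worst = gop_frames / fps  # a full GOP if we just missed an I-frame
    average = worst / 2       # assumes a uniformly random tune-in point
    return worst, average


worst, average = keyframe_wait(40, 25)   # 40-frame GOP at PAL's 25 frames/s
print(worst, average)
print(keyframe_wait(12, 25)[0])          # DVD-style 12-frame PAL GOP
```

        With a 40-frame GOP at 25 frames per second the worst case is the 1.6 seconds quoted above, and the 12-frame DVD limit works out to just under half a second.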

      2. Lee D Silver badge

        Re: I miss the days when I could flip between 3 programs simultaneously.

        Though the technical details are interesting, with modern TVs' dual, triple and even quad decoders, etc., why can't it just receive the next / previous programme at all times and have the I-frames ready to go? Teletext used to be the same kind of deal, so TVs started to cache it (and I remember my first WinTV card with cached teletext - fabulously fast).

        I know you can't "guess" what the viewer is likely to view next, so you can't cache everything, but what's to stop one tuner showing the program, another getting the previous channel still (so you can flick between two channels without jumps), and the others getting your "favourite" program and/or the one that the EPG is currently looking at the schedule for?

        Though I understand we can't have it all, with modern tech and throwaway DVB chips, it's still ridiculous that TV manufacturers are pushing voice recognition and TV user "profiles", but nobody has bothered to make channel switching faster.

        1. P. Lee

          Re: I miss the days when I could flip between 3 programs simultaneously.

          +1

          You only need to cache from the last keyframe and you only need one tuner per multiplex. That must be a minuscule cost on a 40" screen.

          Add some smarts (do stats on the most often used multiplexes or look at whereabouts the user is in the EIT) and you can probably cut the number of tuners dramatically.

          Perhaps it's all moot with people recording everything rather than watching live.

      3. Steve Medway

        Re: I miss the days when I could flip between 3 programs simultaneously.

        If you want instant channel hopping it's perfectly feasible today whilst being able to simultaneously time shift + record every TV channel being broadcast at the same time. Cover all the muxes with a tuner, simples and nowadays cheap.

        EyeTV on the Mac has always been able to decode all channels on a mux, since back in the PowerPC era.

        xbmc + Linux is one alternative if you don't have a Mac, but it's less friendly to set up. Five DVB-T tuners plus one DVB-T2 tuner is all that's required for every channel in the UK. From experience it'll behave a lot nicer on a PC/Mac with an SSD for the live buffer.

        Why on earth YouView didn't go this route to reduce net bandwidth requirements I'll never know. Oh, hang on: far too 'free' a solution for the UK TV cartel to stomach.

      4. Michael Wojcik Silver badge

        Re: I miss the days when I could flip between 3 programs simultaneously.

        You want enough I-frames that you can seek easily, you want as many P/B-frames as possible to keep the bitrate down. Pay your dollar, make your choice.

        Of course for most folks that's pay your dollar, live with someone else's choice. It's not like the broadcasters and cable-TV companies are going to let their subscribers dictate the GOP length or composition.

        Though personally I can't muster too much discontent over their choices. It's just entertainment; if I decide it's no longer a good value, I'll cancel my subscription. There's no shortage of books to be read.

  13. mark l 2 Silver badge

    I remember when Virgin Media (think they were still NTL then) first started broadcasting HD streams they were using MPEG2. I think they have now moved over to H.264 with the TiVo boxes but the older V+ boxes are still using MPEG2 for HD, I think.

    1. Frank Bough

      Nothing wrong with MPEG2 if you've got the bandwidth.

  14. Anonymous Coward
    Facepalm

    BBC HD

    " After all, what is more cynical than the broadcaster that uses a high bit rate when the HD service is introduced and then lowers it after people have bought their sets? "

    Ah the BBC HD services. Once a benchmark standard, they slowly rotted away the quality to please the bean counters.

  15. asdf
    Trollface

    congrats h264

    Only with the latest TV technology have you finally shown why you are better than Xvid (MPEG-4 ASP) when done by an expert. Only took a decade.

  16. The Mighty Spang
    Mushroom

    re: patents and licencing on camera

    really interesting and often missed thing is if you are using something like a canon 5dmk2 or even panasonic eng cameras like 151 and 371 (and all others i would imagine, those are just the ones where i've looked at the manual) and filming with h264 or avc:

    you are ***NOT*** licenced to use the recording for commercial work.

    unfortunately i can't seem to copy and paste out of the pdfs, so check the 5d2 english manual pg 241 - "about mpeg-4 licencing" - or the panny hpx-370 manual pg 6.
