ITU-T wants video sizes to halve again by 2020

The International Telecommunication Union wants researchers to get busy on new video compression codecs, setting an ambitious target to double their compression power by 2020. H.264 is probably today's most-used video codec, but the ITU-T is asking experts to submit proposals to double down on the H.265 codec, also …

  1. Steve Davies 3 Silver badge

    The only winners here

    will be the Patent Lawyers.

    There are lots of patents out there that describe compressing a video signal to reduce the bandwidth during transmission with the all-important "with a Computer" and "Over a network".

    These owners/trolls will want their cut of the money.

    Why do I say this?

    Even Google is having a hard time with getting VP9 accepted and there have been reports of patent suits filed against Google despite their claims that their thing is patent free.

    The ITU might want this but the lawyers will have a big say in stopping/delaying/encumbering any new codecs etc.

    That is the sad state of play as I see it.

  2. John Latham

    End game

    As I understand it (dimly) this is basically a CPU/bandwidth tradeoff.

    At what point (resolution? local processing power?) is the compressed representation of the source components of scene smaller than its compressed high-def rendering?

    Will a future video stream read like "fat white male in scruffy clothes walks into an empty 1950s American-style bar and orders a drink from a platinum blonde waitress in a short black dress", with the local device left to render the scene as it sees fit?

    i.e. is the limit case of video compression just telling a story?

    1. Martin an gof Silver badge

      Re: End game

      is the limit case of video compression just telling a story?

      It's ancient wisdom, but as they say, the pictures are always better on radio :-)

      There's another crossover point, I think. Reducing the bandwidth (streaming or storage) is always good, but given that the availability of both streaming and storage bandwidth is constantly increasing, does there come a point where it's just not worth bothering? Where the computational requirements of squeezing another 5% reduction in size are outweighed by the hardware required to perform those computations?

      M.

    2. The Mole

      Re: End game

      "As I understand it (dimly) this is basically a CPU/bandwidth tradeoff."

      Yes, and no. It is a CPU+memory/bandwidth/CPU+memory trade-off: encoding and decoding normally aren't symmetrical in complexity. Normally the CPU needed to decode is much less than that needed to work out the optimal encoding (which is good, given that decoding needs to be able to run in real time). Generally, modern video compression works by saying 'this frame differs from these frames in these ways'. In an idealised world, optimal compression would allow you to reference an infinite number of different frames; in the real world, player memory constraints are a big factor in constraining efficiency.
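A minimal sketch of the "this frame differs from these frames in these ways" idea, assuming frames are flat lists of pixel values. The function names (`encode_delta`, `decode_delta`, `encode_best`) are hypothetical and this is not how any real codec stores its data; it only illustrates the asymmetry: the encoder searches every reference frame, while the decoder applies a single delta but has to keep the references in memory.

```python
def encode_delta(reference, frame):
    """List the (index, new_value) pairs where frame differs from reference."""
    return [(i, v) for i, (r, v) in enumerate(zip(reference, frame)) if r != v]


def decode_delta(reference, delta):
    """Rebuild a frame by patching a copy of the reference (cheap for the player)."""
    out = list(reference)
    for i, v in delta:
        out[i] = v
    return out


def encode_best(references, frame):
    """Try every stored reference and keep the smallest delta (costly for the encoder)."""
    best_idx, best_delta = None, None
    for idx, ref in enumerate(references):
        delta = encode_delta(ref, frame)
        if best_delta is None or len(delta) < len(best_delta):
            best_idx, best_delta = idx, delta
    return best_idx, best_delta


# Two reference frames the player must hold in memory.
references = [[0] * 8, [5] * 8]
frame = [5, 5, 5, 9, 5, 5, 5, 5]

ref_idx, delta = encode_best(references, frame)  # (1, [(3, 9)])
restored = decode_delta(references[ref_idx], delta)
```

The more references the encoder may choose from, the smaller the best delta tends to be, but every extra reference is memory the player has to set aside.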

    3. 2+2=5 Silver badge
      Joke

      Re: End game

      > i.e. is the limit case of video compression just telling a story?

      I've got a new, extremely high compression system - you send just the video's title and the telly downloads it from the nearest Torrent server. No need for the rights holders to have to worry about that nasty, technical compression stuff. Just get the punter to pay for access to the URL and profit!

  3. John Smith 19 Gold badge

    Is this even possible?

    50% reduction on an already highly compressed format?

    Odds on bet this will also be "lossy" but how much of an image can you actually throw out before it's not an image any longer?

    Let's find out.

    1. Anonymous Coward

      Re: Is this even possible?

      See the Google AI "news" article earlier this week where you show it an 8-pixel by 8-pixel (maybe a bit more, but not much) picture and it fills in the rest for you at much higher resolution. By guesswork, and based on what you've told it about the scene (it's a face, here are some faces we saw earlier, it's a bedroom, here are some bedrooms we saw earlier).

      What could possibly go wrong, with either idea?

      1. ArrZarr Silver badge

        Re: Is this even possible?

        If you were to provide a single image of the face to the computer and have the computer understand concepts like "smiling", "glaring" all the way down to stuff like "self deprecating smirk where you've been asked to do something that you would expect somebody in your role to know but you haven't ever had to deal with it", that would probably cut down on bandwidth requirements a hell of a lot but increase the compute requirements too.

        Given that all the information is theoretically creatable, it should be possible to improve the guesswork specifically for the video.

      2. Dave 126 Silver badge

        Re: Is this even possible?

        Such an approach would require a daft amount of processing at playback, and actors in the background might look like humans but won't necessarily look like the actual actors that were present. If taken to its extreme, this approach would be akin to just telling the computer 'Dark-haired man in white t-shirt walks into a room'. It relies on the output computer already having an idea of what a 'man' looks like.

        If it could work for video, it could be useful for rapid story-boarding.

        1. HausWolf

          Re: Is this even possible?

          Thern gets eaten by a gru

          1. Ben Bonsall

            Re: Is this even possible?

            Thern gets eaten by a gru

            Easy.

            Former Swedish National Football captain's head on a plate, Cartoon villain holding knife and fork.

      3. Dale 3

        Re: Is this even possible?

        "Enhance!"

    2. Dave 126 Silver badge

      Re: Is this even possible?

      It will be a lossy format. Non-lossy compressed formats are only used for capture and editing etc.

  4. Charles 9 Silver badge

    Like a suitcase.

    The suitcase is only so big and can only weigh so much. Eventually, you really can't cram further without losing too much. Is there a way to tell how close we are to that limit?

    1. Dave 126 Silver badge

      Re: Like a suitcase.

      >Is there a way to tell how close we are to that limit?

      Not really, because to do so would require a perfect understanding of what we humans perceive; it's not purely a mathematical, computational question. Our eyes can only resolve a small area of our visual field at high resolution - for argument's sake, let's say that the DVD player had control over which part of the screen our eyes flitted over - it would then only have to render the parts of the screen our eyes flit over at maximum resolution. Obviously this wouldn't happen, but it makes the point that our brain does a lot of filling in the gaps, and that a lot of the data sent to a screen is wasted on us. The trouble is, which data is 'wasted' on us varies from viewing to viewing, and from viewer to viewer.

      So, the question becomes 'what is the human-relevant information in the video?'. It doesn't matter if we are only viewing Humphrey Bogart in grainy black and white - we are still taking in the emotional content that the director intended. Watching him slap a bad guy around isn't improved by using high res and HDR. On the other hand, a David Attenborough nature documentary would benefit.

      1. Anonymous Coward

        Re: Like a suitcase.

        "Humphrey Bogart in grainy black and white - we are still taking in the emotional content that the director intended. Watching him slap a bad guy around isn't improved by using high res and HDR. On the other hand, a David Attenborough nature documentary would benefit."

        Supercompression (not to be confused with Supermarionation) only really helps if the supercompression process's design assumptions are a good match for the image content *and* the way the viewer's visual processing works.

        Anecdote is not evidence and all that, but...

        I watched A Hard Day's Night (remastered, but still black and white) in HD on a 40" screen a few weeks back. I've previously (and recently) seen it on everything from original cinema release, through Apple Quicktime (?) on CD-ROM, on DVD, and now remastered on HD. The extra detail did significantly improve the viewing experience in a variety of ways. Two of those ways were: (1) no visible quantisation in circular dark (but not black) areas on walls - "pools of light" no longer had distracting steps in them, which the digital non-HD versions all did; (2) previously illegible text on certain wall signs was now legible, and often added to the content and context of the story.
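The "distracting steps" in dark pools of light can be illustrated with a toy example. This is only a sketch under stated assumptions: a one-dimensional ramp of 8-bit-style brightness values and a hypothetical `quantise` helper that floors each value to a multiple of a step size. Real codecs quantise transform coefficients rather than raw pixels, so this shows the banding effect, not an actual codec stage.

```python
def quantise(values, step):
    """Floor every value to a multiple of step (coarser step = fewer levels)."""
    return [(v // step) * step for v in values]


# A gentle dark gradient: 64 pixels rising smoothly from 16 to 23.
ramp = [16 + i // 8 for i in range(64)]

coarse = quantise(ramp, 4)

levels_before = len(set(ramp))    # 8 distinct shades
levels_after = len(set(coarse))   # 2 distinct shades: a visible step
worst_error = max(abs(a - b) for a, b in zip(ramp, coarse))  # 3
```

Each pixel is off by at most 3 levels, which sounds harmless, but the eight gentle shades collapse into two flat bands with one hard edge between them, and that edge is what the eye picks up.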

        That said, I really don't see how supercompression in general is going to help anyone except broadcasters (including cable), who want to squeeze more and more carp into the same (or less) bandwidth. What's increased compression done for terrestrial and cable and satellite in the UK so far? Made most channels less watchable, that's what.

        I used to know what the basic compression theory is about (including 4D compression and other such delights, including whatever the visual equivalent of psychoacoustics theory is now called), all of which rely on the validity of certain assumptions about the picture and/or the elements within it.

        If the assumptions in the theory don't adequately match the practicalities in the picture content and in the viewer, the picture *must* by definition lose detail, and lots of it. Which already matters in standard definition, and is going to matter even more in UHD, if UHDTV isn't going to go the same way as 3DTV has done.

        A wise person once said:

        “In theory, there is no difference between theory and practice. But, in practice, there is.”

  5. John Sager

    Run the Battle of Helms Deep locally

    That was a lot of AI avatars interacting in Weta Digital's CPU farm. I guess it'll take a few more iterations of Moore's Law before we see that capacity in our smart TVs.

  6. Frumious Bandersnatch Silver badge

    4d interferometry

    AFAIK, pretty much all video codecs assume that the video to be compressed is 2D and intermediate frames only take account of the difference between one frame and the next. Both are reasonable simplifications if you want something that's fast to encode or decode, but they mean that a lot of exploitable structure is ignored. Another feature common to most codecs is that self-similarity within a frame is mostly ignored, with most focus being put on motion estimation as a way to compress inter-frame differences in common cases (eg, panning, moving objects within the frame).
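As a rough illustration of the motion estimation mentioned above, here is a minimal block-matching sketch. It assumes frames are lists of rows of integer brightness values and uses an exhaustive search with the sum of absolute differences (SAD) as the cost. The helper names (`make_frame`, `sad`, `best_motion_vector`) are made up for this example; real encoders use much smarter search strategies.

```python
def make_frame(width, height, ox, oy):
    """Black frame with a distinctive 2x2 object whose top-left corner is (ox, oy)."""
    frame = [[0] * width for _ in range(height)]
    frame[oy][ox], frame[oy][ox + 1] = 10, 20
    frame[oy + 1][ox], frame[oy + 1][ox + 1] = 30, 40
    return frame


def sad(prev, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a shifted block in prev."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
    return total


def best_motion_vector(prev, cur, bx, by, size, search):
    """Exhaustively search +/- search pixels; return (cost, dx, dy) of the best match."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that would fall outside the previous frame.
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(prev, cur, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


prev_frame = make_frame(8, 8, 2, 2)
cur_frame = make_frame(8, 8, 4, 3)   # object moved 2 right, 1 down

# The block at (4, 3) in the current frame is found at offset (-2, -1) in the previous one.
result = best_motion_vector(prev_frame, cur_frame, 4, 3, 2, 2)  # (0, -2, -1)
```

A cost of 0 means the block can be sent as just the vector (-2, -1) with no residual, which is the cheap "common case" (panning, moving objects) the comment describes.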

    If you think about algorithms that can turn images (or objects in them) into 3D approximations, this is a lot easier to do if you have a video camera attached to a vehicle (or carried) than if you present the algorithm with an unordered collection of stills of the same target from different vantage points. It's easier to reason about the relative motion of the camera between frames. It's going to be more smooth, and looking at a sequence of images it's going to be easier to divide up areas between static (modified only by relative viewpoint) and transient (moving objects passing through the frame).

    If the cost of encoding isn't so much of a problem, you could apply interferometric analysis to a sequence of images. For the relatively fixed objects, you could build up a 3d approximation of those objects and generate a pixmap to skin them. Taking a sequence of images like this might also help to sharpen the image, hence cutting down on the amount of noise, leading to better compression. You can't sharpen single images, but you can with multi-sampling over time or slightly different viewpoints. To make interferometry work, you'd have to be able to adapt to things like focus and motion blur, detecting it on the way in (and tagging affected regions per frame) and adding it on the way out.
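The point about multi-sampling over time can be sketched with something much simpler than interferometry: plain averaging of several noisy captures of the same static scene. The example assumes perfectly aligned frames, independent Gaussian noise, and hypothetical helper names (`noisy_copy`, `temporal_average`, `rms_error`); it does not attempt the viewpoint or blur handling described above.

```python
import math
import random


def rms_error(frame, truth):
    """Root-mean-square difference between a frame and the noise-free scene."""
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(frame, truth)) / len(truth))


def noisy_copy(truth, sigma, rng):
    """Simulate one capture: the true scene plus independent Gaussian sensor noise."""
    return [t + rng.gauss(0, sigma) for t in truth]


def temporal_average(frames):
    """Average the same pixel position across all (already aligned) frames."""
    count = len(frames)
    return [sum(pixels) / count for pixels in zip(*frames)]


rng = random.Random(0)                           # fixed seed so runs are repeatable
truth = [float(i % 32) for i in range(256)]      # made-up static scene
frames = [noisy_copy(truth, 10.0, rng) for _ in range(16)]

single = rms_error(frames[0], truth)
averaged = rms_error(temporal_average(frames), truth)
```

With independent noise, averaging N frames should cut the noise by roughly a factor of sqrt(N), so 16 frames give about a fourfold reduction. Less noise means fewer bits wasted encoding detail that was never in the scene.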

    Videos also have various spatial self-similarities, besides the time-based ones. The most easily-exploitable option for compression is to assume that self-similar blocks will be neighbouring each other, and that's how most codecs work (mostly through compressing the palette across neighbouring blocks, AFAIK). If the codec tried representing areas as simple 3d meshes with pixmaps, then it could maintain a cache of these over an extended period. An algorithm would explicitly compress these mesh+pixmap objects based on their self-similarity. If a transient object moves across a surface, it wouldn't necessarily mean that the data about what's currently invisible due to the occlusion gets kicked out of the cache, meaning that once the transient object has passed, it should be cheap for the decoder to repair the "damage". Likewise with things like fast cuts, where the data for one bunch of frames can be re-used when the camera comes back to them a few seconds later rather than starting with a new key frame each time.

    If encoding cost is no object, then you can try to reverse engineer lighting information from the original stream. When the contribution from lighting is removed from each area, you can compress the forest of mesh+pixmap cache objects much more efficiently. Or, you can use it to refine your idea of what a surface is by tessellating its original mesh and throwing out a lot of the pixmap data (which takes up a lot of space relative to a mesh + lighting model).

    Going from (effectively) a simple block-based compressor to one with meshes, textures and lighting does, of course, make things a lot more costly for the decoder. Still, if there aren't too many light sources or reflectance, I could imagine a next-gen GPU managing to handle this. (Too much reflected light turns it into a generic ray tracer, which has very poor locality of memory references)

    This sort of thing could handle fairly static objects, but there's also the problem of how to compress deformable objects like faces or the silhouettes of transient objects that aren't spatially modelled. Probably some completely different approach is warranted there.

    This all sounds pretty pie in the sky, but getting an extra 30%-50% out of existing approaches probably won't be easy, IMO.

  7. Mike 16 Silver badge

    Frog Boiling

    I suspect that the main driver of advanced compression techniques will be slowly acclimating viewers to an ever-worse viewing experience. Bonus points: follow the cable trend in eliminating most (if not all) of the error correction. That will work even better on a mobile device, amiright?

    So the tradeoff is: when I try to watch an entire movie on my phone, do I want to hit my monthly data cap before it ends, or have the battery explode?

    BTW: The "story-telling" thread reminds me of an article I read years ago, about how speech-compression researchers were bummed that bandwidth was getting cheaper, faster than their compression was getting better, so they re-tooled their gear and wrote new grant proposals around "speech recognition" and "more natural sounding speech synthesis".

  8. low_resolution_foxxes

    I think it's a good headline goal that would help achieve a budget. Remember that 4K UHD is about to hit and that will use substantial bandwidth.

    It will be a holistic process - film studios using standard colours and backgrounds, fade in/outs, etc. Compression does a number of cool quirky things to change the image - only change the things that matter. Perhaps you could use multiple reference frames throughout sections of the movie based on the computed best frame matches.

    For reference, the H.264 standard was published in 2003, ergo the vast majority of patents will have expired by 2022, so the MPEG LA patent royalty group will lose its patent royalties in a few years and will push to apply its newfangled patents, whether they are worthwhile or not.

Biting the hand that feeds IT © 1998–2019