Google vice president of engineering Urs Hölzle has warned that unless we update the internet's underlying protocols, any improvements to network bandwidth will be wasted. "It's very clear that the network speed itself will increase," Hölzle said today during a keynote speech at the internet-infrastructure obsessed Velocity …
Nothing to see here, move along.
Speed is pretty important. But I'm quite sure they're quite happy with any extra information leaking toward them as "a side effect", like their DNS proposal. Yes, of course it's all for our own good, and of course google is not at all evil, nosiree. So that's alright, carry on then.
Other issues to consider first...
Almost every website I visit loads fast enough that I don't care about how fast it actually is. In fact there are only two situations where I have long load times.
1. Website won't load because I am waiting for (according to my browser's status bar) "adserver.effing-slow.com" or whatever shitty ad service the site uses.
2. Waiting for "analytics.google.com"
So maybe they should do something about that first.
Waiting for analytics.google.com?
3. Websites that are just a blank page with a Flash application. After maybe 1/4 second a "LOADING 0%" image appears, replaced with one saying "LOADING 10%" after about 10 seconds.
Usually I don't wait for the whole thing to load, because I assume that it won't be worth the wait...
The really annoying wait while all the crap you aren't interested in appears instantly and the content you actually want fades in slowly because it is "cool" to do it that way.
Talking of annoyances and flash, what about all the "click here to see a larger image" which then draws a small box, slowly zooms it out to half a screen or so, changes colour, then fades in the same image you were looking at in the first place, the same size as it was when you first saw it but in a larger window. Gah!
Slow loading web pages
Have you considered Firefox with Adblock Plus?
If you make extremely effective use of its block lists, then a request to
can be blocked from being issued.
When the browser DOES NOT even MAKE a request to such a server, your page load times improve.
I know some lonely houses by the road
How many websites are like that any more? I can't remember the last time I saw one. Sure, this was an annoyance (not a problem, an annoyance) seven years or so ago, but should I ever happen on one again I think I'd be tempted to take a look, for old times' sake.
Also, people commenting here that websites are fast enough are probably members of the digital rich, with fast internet connections and modern computers. You should count your blessings rather than say speed improvements are not required.
So Larry Page says that "speed is a product's most important feature. Everything else is secondary"
Really? By that logic a broken product that runs incredibly fast is better than a working product that runs slowly.
Speed IS important, but to say that "everything else", which would surely include security, reliability and yes, even aesthetics, is secondary... well, that's clearly false.
I know Larry probably didn't mean it literally... but he represents one of the few companies that knows how to do IT right, why is he spouting marketing bullshit?
But this is exactly how google do their IT.
Stuff in Beta forever. Placing the importance of getting stuff out there over it working well.
All well and good if it's a little web app. Not such a hot idea when you're changing protocols.
(that's before we even get onto the "wisdom" of screwing with a protocol to cope with bad and/or obese web page design)
I'm not an expert in such things, but...
If the webpage was just one simple chunk of HTML, then the latency only counts one round trip. But if the initial webpage goes back again and again and again and again, spawning an ever growing cascade (word chosen intentionally) of Internet calls, then the latency is going to be multiplied by some factor such as ten or fifty (or whatever).
This is very obvious for those poor slobs stuck on satellite Internet with huge "a-thousand-and-one" latency. They visit certain simple websites and click, delay, bang, done. They visit other more-complex website, and it can take 10 or 15 seconds before the silly thing finishes loading all the bits and pieces.
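The multiplication effect described above can be put in rough numbers. A back-of-envelope sketch (the wave/connection model and all the figures are illustrative assumptions, not from the article; it models latency only, ignoring transfer time):

```python
def load_time_ms(rtt_ms, waves, requests_per_wave, parallel_connections):
    """Total page load time when resources are discovered in dependent
    waves (HTML -> CSS -> images...). Each wave must finish before the
    next begins; within a wave, requests share a fixed number of
    parallel connections. Transfer time is ignored -- latency only."""
    total = 0.0
    for n in requests_per_wave[:waves]:
        # ceil-divide: requests go out in batches, one RTT per batch
        batches = -(-n // parallel_connections)
        total += batches * rtt_ms
    return total

# Same hypothetical page (1 HTML fetch, then 10 resources, then 20
# sub-resources) over satellite (~1000 ms RTT) vs. DSL (~30 ms):
page = [1, 10, 20]
print(load_time_ms(1000, 3, page, 2))  # satellite, 2 connections
print(load_time_ms(30, 3, page, 2))    # DSL, 2 connections
```

With two connections this page costs 16 round trips either way; the satellite user eats 16 seconds of pure latency while the DSL user barely notices half a second.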
"One simple chunk of HTML"
That's not strictly true. Most web servers use a window size of 2k for HTTP, which could have been designed specifically to highlight latency problems. Increasing this to a more realistic figure in the tens of kilobytes, and thus trusting TCP to do its job of ensuring delivery, would result in an immediate decrease in latency sensitivity.
Also increasing the arbitrary restriction to two threads per client that most web servers implement would allow much faster loading of most pages, as the clients would be able to load a number of page elements in parallel, again reducing latency sensitivity.
About time TCP/IP got a workover
...and the 'do no evil' bulldozer can push it hard and fast, but header compression that 'bypasses' lower layers is a bit worrying as that is where the security stuff is handled, isn't it?
It annoys me when my browser is waiting for a picture to load that it already loaded a minute ago. ...Yes, I know that there are cache settings that can make this worse or better...
The browser should ask for a webpage's hierarchical 'parts list', with each element or branch timestamped. Then the browser could send back a much smaller parts list of the webpage elements that it actually needs (comparing timestamps). With such an approach, *applied throughout the entire Internet*, at every switch and router, the actual traffic would just be what's actually changed since the last buffering.
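As a minimal sketch of this idea (the manifest shape, names, and timestamps are hypothetical, not any real protocol):

```python
def parts_to_fetch(server_manifest, client_cache):
    """Return only the parts whose server timestamp is newer than the
    client's cached copy, or that the client lacks entirely. Both
    arguments map URLs to integer timestamps."""
    return {url: ts for url, ts in server_manifest.items()
            if client_cache.get(url, -1) < ts}

server = {"/logo.png": 100, "/style.css": 250, "/app.js": 300}
client = {"/logo.png": 100, "/style.css": 200}   # /app.js never cached
print(parts_to_fetch(server, client))  # only the stale/missing parts
```

The point of the comparison: only `/style.css` (stale) and `/app.js` (missing) would travel; the logo would not.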
Einstein at work?
Sorry, but what are you smoking?
There are already plenty of headers allowing browsers to check for modified content and, dependent upon user settings, optimise page reloading. But this is of no help when you load the page for the first time.
As for a list of parts - well, yes, this might be nice in theory but doesn't work for HTML. HTML is a flat-file data serialisation that might or might not contain embedded content (images) that the browser can display. The list of parts can only be deduced once the DOM (document object model) has been built from the HTML.
Yes, this is horribly inefficient, but that's how it works unless you move over to a different format for content: binary, such as PDF, which has a convenient index? Or do without the DOM altogether: JSON? Proxies such as Opera Turbo do something like this to reduce both the initial number of requests and the time it takes to transfer. The only fly in the ointment is getting everyone on board.
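For what it's worth, the existing mechanism mentioned above boils down to the If-Modified-Since dance. A simplified sketch of the server-side decision (real servers also honour ETags and If-None-Match, which this ignores):

```python
from email.utils import parsedate_to_datetime, format_datetime
from datetime import datetime, timezone

def respond(last_modified, if_modified_since):
    """Decide between 304 Not Modified and a full 200 response, the way
    the If-Modified-Since request header works. `last_modified` is the
    resource's timestamp; `if_modified_since` is the raw header value
    (or None on a first visit, when caching can't help)."""
    if if_modified_since is not None:
        ims = parsedate_to_datetime(if_modified_since)
        if last_modified <= ims:
            return 304      # client's copy is still fresh: send no body
    return 200              # full response, body and all

lm = datetime(2010, 1, 10, tzinfo=timezone.utc)
print(respond(lm, format_datetime(datetime(2010, 1, 15, tzinfo=timezone.utc))))
print(respond(lm, None))   # first visit: headers are no help at all
```

The first call returns 304 (cached copy newer than the resource), the second 200, which is exactly the "no help on first load" point.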
Einstein at work? Patently!
Actually, the lists of "parts" of an HTML document can be determined from the HTML asynchronously as the DOM is being populated -- in fact pretty much every modern browser already does this. Sure the list isn't verifiably complete until the HTML is completely loaded, but web authors can (and some do) ensure that browsers can start checking/loading external resources (CSS, scripts, media files, etc) as soon as possible by referencing them early in the HTML -- in meta-tags, for example. (Opera, for example, might have up to 8 connections to a server and 20 total connections open at any time [this is the default; like almost everything else in Opera, it's configurable in preferences ; ])
The biggest delay isn't in enumerating or navigating this list of parts, it's in the underlying network protocol.
An article in MSDN magazine about optimizing web sites from 2008 (http://msdn.microsoft.com/en-us/magazine/dd188562.aspx) highlighted exactly the issue Google is highlighting in their research: TCP starts every transfer at a very slow rate and ramps up speed until errors are detected, at which point it backs off to a stable rate. This works quite well for large files, but means that small files often don't reach an optimum speed. Google's proposed solution is to change the way TCP works. The MSDN article, coming from a website developer's perspective, suggested (among other things) combining small files needed for the same page into bigger files (e.g, making one big image with all of the page's images on it and using CSS sizing, positioning, and clipping to place the sections of the image in the right place on the page) which could allow more of the data to travel at full speed.
No need for buzzwords
He's saying to move the cache information that you now get with headers up to ride with the first page, instead of each piece requiring another TCP three-way handshake to get. How you do that (another set of headers, amended HTML, HTTP keep-alive, ...) is less important. And yes, some of that already exists. The simplest thing you can already do now is set up a (bigger than the built-in one, dedicated) cache service on your side of your slow uplink.
I think it would also be nice if webdesigners tested their sites more with slow link emulators (or actual slow links) and learned to reduce the number of different hosts their pages rely on.
I forgot who mentioned it or even who made it, but there have been efforts previously to pack it all up into a single file. PDF doesn't support display-while-loading because the ToC is in the back (same with, oh, zip and rar at least; tar has only "this next file is" headers, so it could). HTTP keep-alive does much the same, only at the HTTP level, provided both sides support it and the content doesn't need to come from a different host. Perhaps it would be useful to extend its support for multiplexing various streams over a single channel. Once that's underway, we might notice that we're being tripped up by how "fairness" currently works in TCP (as we already notice WRT bittorrent et al). But I digress.
Anyway. One could, for example, have the HTTP server pre-parse the HTML and draw up a parts list with timestamps in the headers. That's not unreasonable because HTTP is, after all, a protocol drawn up to transport HTML.
Display while loading..
"PDF doesn't support display-while-loading because the ToC is in the back "
I often see PDF files displayed while they are still loading using the browser plugin. It's more of a page at a time thing rather than individual elements but you can also scroll to any page in the document and it will load that before loading the surrounding pages.
So I don't get the whole Google Chrome speed thing: it's rendering static text and images quickly. OK, it's quicker than some of the other browsers, but that's only because they're at 1990s levels of speed in the first place; it's like being fastest in a one-legged sack race.
I mean, OSes render quickly, and games render way more stuff at a phenomenal rate, with considerably more complex data.
But yeah, congrats for rendering some text and images quickly.
Browsers slower than native code? Whodathunkit?
See that's the problem. A computer game is whizzy and fast, and more than likely running native code. It's probably been made in C, C++ or some other compiled language that's designed around being speedyquick, with maybe some custom high-level script welded into the game engine itself.
Frankly, I'm surprised browsers run as quickly as they do, given the amount of crap that goes into your average rendering engine these days (a gajillion different lexers/parsers for different markup languages, exceptions for common badly-coded pages that don't follow the standards, exceptions for pages coded for badly-programmed browsers by Microsoft..).
Oh, the obligatory XKCD link: http://xkcd.com/676/
U is for Unicorn
Browsers do need to do an awful lot of stuff. In that respect the people at the w3c are masters of confusing the world with their abstract dreams of supposedly human readable gibberish. It doesn't help that their recommendations are all written from the perspective of the "web author" and are very little help if you'd like to write a browser from scratch. Most approaches at resolving ambiguity mean they've allowed all possibilities so now browsers have to, yes, implement them all.
I've poked around in the mozilla code enough to know it itself isn't terribly efficient, as code goes, I mean. It runs in circles through layer upon layer of abstraction built in a very 90s framework (the word "framework" itself has become a buzzword among developers, implying "bloat", even more so if that other dreaded buzzword appears, "lightweight", qv LDAP), en passant reinventing in C what C++ was invented to do -- though one might argue it was perhaps excusable back when they did it, but that's a debate for another time.
We could do better.
It takes not only better software, but better file formats too, more capable protocols where we have HTTP now, and perhaps some tweaks to the underlying protocols. TCP does eventually need fixing to cope with ever-increasing bandwidth*delay products, though I don't know if what google is doing is going to be useful beyond the fact it'll be their usual embracing and extending, done with exceeding cleverness but not so much long-term vision. I'd rather have the IETF debate the stuffing out of it first, before google tries to win hearts and minds with their "we're a big corp but we're not evil, honest" shtick. Because that's why they're pushing it forward. "Look mom, we make teh intarwebz better."
Eventually we might get around to doing better. But I can already tell you that if SGML was a mistake, XML is a very creative reinvention of the exact same mistakes. And that patching up the underlying stuff (TCP is pretty good already, if decidedly non-trivial. HTTP much less so) is not going to provide a fundamental improvement, only an incremental one.
Games renderers vs browser renderers
Remember that games almost exclusively use hardware support for 3D rendering - it's currently very unusual (outside R&D depts) for browsers to have HW-accelerated rendering.
The OS itself also uses HW acceleration for much of its 2D display - and although some of this is also used by browsers as a side effect, in general the sort of 2D acceleration provided isn't what browsers need.
In short, a browser has to do its own rendering in SW; games and the OS have lots of HW support.
It's an interesting fact that even though the data is more complex, the hardware now available means 3D rendering is faster than 2D rendering, and in some systems the 3D render HW is used to do the 2D rendering!
instant speed boost
@anonymous coward #2
Find your hosts file, add a line which reads
save the file and restart your PC.
I'm sure you mean ...
Oh, and reboot your computer? I lol'd ...
Aww, don't lol too much
He probably got it from Computer Weekly...
reboot your computer
or nbtstat -R
Ahhh, http headers...
Http headers are especially ugly for ajax (which Google tends to do quite a lot) - I've seen a 12-byte stock quote update get wrapped in 600 bytes of http headers. Not to mention that you get to choose either HTTP/1.0, where you can do one request/response per tcp connection (resetting which gets you back to Google's noted TCP initial receive window deal), or HTTP/1.1, in which you can have multiple requests and responses, but they have to come in order, so a single slow request can block subsequent requests that might just be waiting at the server.
So yes, we need an HTTP/2.0, and we need it in 1999. If it takes a corporation as ethically murky and tasteless as Google to get us off some seriously ancient protocols, I guess that makes them a necessary evil.
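The overhead is easy to demonstrate. A sketch with a made-up 12-byte quote and a plausible set of response headers (the header names are real HTTP/1.1 headers; every value is illustrative, not measured from any server):

```python
# Hypothetical 12-byte AJAX payload (a stock quote) wrapped in typical
# HTTP/1.1 response headers.
payload = b"GOOG,530.25\n"
headers = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Mon, 15 Nov 2010 10:00:00 GMT\r\n"
    b"Server: Apache/2.2.14 (Unix)\r\n"
    b"Cache-Control: no-cache, no-store, must-revalidate\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Length: 12\r\n"
    b"Set-Cookie: session=abc123; path=/; HttpOnly\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)
overhead = len(headers) / len(payload)
print(f"{len(headers)} header bytes for {len(payload)} payload bytes "
      f"({overhead:.0f}x overhead)")
```

Even this modest header set is well over ten times the payload; pile on cookies and cache directives and the 600-byte figure quoted above is entirely believable.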
Exploit the extra bandwidth, he says?
"...However, if you don't fix the protocols, we will not be able to exploit that extra bandwidth."
You mean, you will not be able to shove more goddamn' advertising in our faces.
There, fixed it.
Are you sure?
Turned the market on its head? Are you sure about that? I personally don't agree. Sure, it has an optimised JS engine, but then everything Google do is massively heavy in terms of JS load, so yeah, it's probably faster on sites of their creation. For pretty much everything else the gain is minimal, certainly not ground-breaking or something that "turned the market on its head" even 2 years ago.
"The problem is the ##junk##"
Block ads/flash/js & I can load a stonking page like <http://www.theoildrum.com/node/6645> images & all in 3 seconds with a full refresh from source (ie. bypass local cache). Anything normal-size is subsecond.
Who here thinks an average web page size of 320K is ridiculous? This page is currently 12.55 KB (12,856 bytes) as I post, but growing with every comment. Is he talking about all the embedded animated GIFs and such that my browser is never going to request, or may already have cached (such as the icons below)? That's my suspicion: they want to push some embedded image model to make it easier to shove ads down our throats, or at least to eliminate the performance benefit of blocking them.
"You mean, you will not be able to shove more goddamn' advertising in our faces" I'm amused -- the Reg article you are commenting on not only has more ads than a google search results page 25 vs. about 12), but they are more intrusive (how about that big Microsoft cloud services ad right in the middle of the copy?). Furthermore, El Reg has a lot of "white papers" that are just thinly disguised ads. At least Google's ads are clearly marked as such. So, where is your disdain for El Reg?
Ads?? What Ads, I don't see no G--D--- ADS!!!
Oh, that is right, I use Firefox with Adblock Plus and NoScript.
Use the hosts file
Using one of the many ad blocking hosts files available is much more effective as it isn't just restricted to Firefox.
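To be fair, the hosts-file trick is browser-agnostic. A sketch of the technique (the hostname is a placeholder, not a real ad server):

```
# Add to /etc/hosts (Unix) or %SystemRoot%\System32\drivers\etc\hosts
# (Windows). Any lookup of the named host resolves locally, so the
# request never leaves the machine. "ads.example-tracker.com" is a
# placeholder hostname, not a real server.
0.0.0.0    ads.example-tracker.com
```

The pre-built blocking hosts files mentioned above are just thousands of lines in this shape.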
No fox icons, you'll have to settle for just the fire...
Want speed? Use Links or Lynx.
I watch Opera's count of what it loads on a page, and some of those have about 80 items on them. Want to see a page faster? Use Links or Lynx. No pictures, no script, just grab the text and display it.
Yes, HTTP is overkill for a lot of things. Then add on SOAP, and it starts getting ridiculous, absurd even. Pile something on top of HTTP and SOAP, and I call it protocol abuse.
And the solution is to tweak TCP?
The browsers need to kill slow connections much quicker. The main page loads in a couple of seconds, but everything stops dead in its tracks because an ad server or something else that's completely irrelevant to enjoying the page is slow. OK, just kill that item. Or better yet, don't load it at all. So use a text-only browser.
You'd have a point if elinks or lynx was any good
They do basically do the job they're designed for, but in general usage they're somewhat of a pain in the arse.
It's undoubtedly true that websites should be written to work in text browsers, but few are.
Still, I'm glad tmux, elinks/lynx and other tools make a basic Unix shell account considerably more usable.
Is this actually news?
I only ask as I remember messing about adjusting TCP receive/congestion windows when ADSL was first trialled in the UK.
The google pdf is interesting and all, but it's pretty bloody obvious to anyone with more than a passing familiarity with TCP.....
Ajax, hinting, content linking, verbosity, and ads are the problem!
HTTP/1.1, with persistent connections and (occasionally used) request pipelining, addressed slow start already. It doesn't address pulling content from a dozen or more different servers.
What Google really wants is to speed up ajax interactions, stuff like search suggestions and all those other crappy things that pop up stuff whenever your mouse goes over ANYTHING not whitespace. F me, I can't open a web page without playing Mario Brothers running my mouse along whitespace to get to something I want to click and see!
I hate XML. It's gotten faster over the years, but it still wastes lots of bandwidth. Look at a Word doc! Worse, look at a Word doc converted to HTML. Dynamic server pages are nearly as bad and as wasteful of bandwidth. BTW, something else slow in their paper - Gmail and Google maps!
Finally, TCP/IP is still evolving after 30 years. HTTP connections originally closed like normal TCP connections with FIN handshake, long replaced by FIN RST. IP Implementations, network congestion, speed differentials, router memory, and traffic dropping (shaping) policies all impact behavior besides connection bandwidths, server settings, and client settings. All the bounding factors might have opened up enough most places so their changes work.
I hate nearly all the interactive web page crap shoved at me. Google's changes will just encourage more of it. Give me a static page and let me click what I want to see!
Yep speed is good...
...as it means you can squeeze more ads in between the pages as they load and pass the clicks back down the line to... now let me see, who takes a cut of clicks and ad revenue... hmmm, tricky one!
Umm, smal question..
How much of this new Speedy protocol is dedicated to giving Google information about what you do on the Net?
Yes, I'm paranoid - they have given me plenty of reason to be. That's why resolving via Google DNS isn't going to happen here any time soon either. Google is NOT independent.
Rather than change TCP, upgrade HTTP
HTTP is basically a hack of FTP, specifically tailored to ASCII content. For what it's designed for, ie transferring ASCII HTML pages over the network, HTTP is very good. About the only 2 advanced features of HTTP 1.1 are compressing content with deflate and resuming downloads. And that's it. For binary transfer it's useless, and while FTP is better, its issues with NAT and firewalls over passive/active transfers don't make it compatible enough.
However nowadays, with so much crap (I'm looking at you, Flash, MP3 streamers, etc) also streaming over HTTP, it'd be far, far better to introduce a whole new transport protocol for the web, that allowed the usage of both UDP (for streaming) and TCP (for transacted items), then with the JPEG progressive format even images could be speeded up considerably by 'streaming' them over UDP.
Furthermore, multicasting would also be particularly useful for services like iPlayer as well - all of which HTTP can't actually do itself but has been hacked to do.
Then you could look at multiple connections built into the protocol so that a browser can open multiple concurrent connections.
Then there are session cookies, which web servers use to track a user's path through the site only because HTTP doesn't maintain a connection. So make the protocol keep the state and channel open to the server, and the session can be built into the protocol as well, saving the need for cookie support.
Lastly, a good, hard look at HTTP headers is desperately needed as the invasion of privacy and browser fingerprinting is at an all time high.
The problem is that by taking a good hard look at cookie support and HTTP headers, Google is possibly the most polarized WRONG sponsor of such a protocol as a result...
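A loopback sketch of why UDP suits the streaming half of this proposal (no connection setup before data flows, each datagram independent; the ports and "frame" payloads are arbitrary):

```python
import socket

# Receiver: a plain datagram socket -- no handshake needed before data flows.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))          # kernel picks a free port
port = recv_sock.getsockname()[1]

# Sender: fire-and-forget "frames" of a stream. A lost or late frame
# would simply be skipped by the player, instead of stalling everything
# behind it the way TCP's in-order delivery does.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for seq in range(3):
    send_sock.sendto(b"frame-%d" % seq, ("127.0.0.1", port))

recv_sock.settimeout(1.0)
frames = [recv_sock.recvfrom(64)[0] for _ in range(3)]
print(sorted(frames))
send_sock.close()
recv_sock.close()
```

On the loopback interface nothing gets lost, but over a real network the receiver would just play whatever frames arrived in time, which is the whole appeal for streaming over a transacted protocol like HTTP-over-TCP.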
Might want to look at SCTP then.
Or think of some HTTP type thing on top of that.
Code or it never happened
If SPDY is so good, and google want us all to change our infrastructure based on their research, then where the fuck is the code. Implement a mod_spdy handler for apache, provide a spdy interface for chrome and let us decide.
This is going to cause problems in the Enterprise...
From the PDF:
"Applications using concurrent TCP connections: Traffic patterns from applications using multiple concurrent TCP connections with a large init cwnd represent one of the worst-case scenarios where latency can be adversely impacted by bottleneck buffer overflow."
So for any website running multiple virtual servers, or in a virtualized server farm (such as a VMware cluster running multiple virtual systems), these changes are going to make performance go to hell. I know many environments where 20-30+ virtual machines (and in some cases MANY more) are running on each physical server.
Also, I'm going to have to go and dig, but there were a number of cases where SGI/IRIX systems and IBM AIX systems running on the same network saw issues, due to the way IRIX aggressively attempted to stream IP packets while AIX systems liked to "play nice" and not force themselves into a network, instead "waiting their turn" to communicate. I see this as just a more current version of IRIX's attempt to aggressively continue streaming data until someone else on the network forcefully requested "their turn" to stream. SGI had to add a new kernel tuneable allowing admins to force this option off globally in the IRIX TCP stack, so their systems wouldn't perform as well within mixed-vendor networks but would "play well with others" once implemented.
This proposal doesn't appear to have been tested with anything more than Linux and current Microsoft OSes, and they do not clearly describe how many nodes were involved in the testing. Let's see what happens when you supernet (a 255.255.254.0 or 255.255.252.0 subnet mask), put 500+ nodes within a broadcast domain at GbE speeds, and mix a few other OS versions in, before this gets seriously considered.
I would love to hear what the NANOG guys are saying about this proposal.
Comments on EAK's and prone's comments
Mr. Horizontal (prone!): HTML was developed by a physicist, not a computer scientist, to help communication at CERN with a simple mark-up language authored by hand using a text editor. While many terse, efficient, binary document+graphics page mark-up languages already existed, most were proprietary (Interleaf, Frame, MS Word, PageMaker, Quark, WordPerfect, etc.). Berners-Lee chose to create his own, modelled on the most verbose and inefficient mark-up language, SGML: a product of committee meant to satisfy every need, and an ISO standard.
HTTP is used outside its scope because it was the easiest way to get data through most firewalls. Additionally, it survives "traffic shaping" and QoS better than any UDP traffic on non-well known ports, but not as well as VoIP.
You are 70% wrong about HTTP not maintaining a connection. Only embedded systems use the simpler V1 implementations; V2 allows connection persistence. Pipelined requests (doing GETs on new objects before responses to outstanding ones have arrived) happen far less than they should, due to all the layers and components of software constructing and sending content from servers - it's just less bug-prone and complicated for each bloated package to serve its part on its own connection. HTTP connections are also kept open using a technique called browser comforting, which tells browsers not to time out the GET because more data is on the way.
Google's changes also accommodate (encourage?) that inefficient and obese server software that's too complicated to funnel disparate pieces over already established connections. Google's changes supply another out for bad software, besides all those revenue generating land mines on pages.
Eric, you made some interesting points. Virtual servers can be a blessing. With only 65K TCP port numbers (16 bits) per IP address, and being required to wait before reusing closed ones, running out is sometimes a reality, previously solved by load balancers proxying connections to multi-homed server interfaces (a pool of IP addresses to use).
You are spot on about interaction issues. TCP window sizes are listed in the Google paper for various implementations. Vista and Win7 are tiny compared to the others, especially MacOS. There are actually two reasons for it; the built-in QoS scheduler is the main one. Higher-priority packets will see a long queuing delay if they have to wait for a gap in a long HTTP stream. It has more to do with the behavior of the server and the devices in between than the client. Microsoft plays it safe so that high-priority applications get good service despite unknown and uncontrollable devices. The workaround for poor network file-sharing speeds on LANs (where the "feature" is noticed the most) while listening to streaming audio was turning off streaming multimedia priority in the registry.
I've seen a bug in another Unix stack that caused a problem like the one you had with IRIX. Using Wireshark, I saw that the server had sent all the data followed by a FIN, saying it had no more data and was ready to close. The client didn't get one of the last packets, and ACKed only up to just before the hole. The server responded with the missing packets OUT OF ORDER, followed by a FIN. Repeat infinitely; the connection appears "hung". If the client had been able to re-order packets, this would not have been a problem. I think it was a Linux client, tuned as a server, so it couldn't spare the time and memory to hold some number of packets for reordering in the hope that the missing pieces would arrive. If either the bug didn't exist or the client did reordering, all would have been fine.
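The fix being described, client-side reordering, is a few lines of buffering. A simplified sketch, with small integer sequence numbers standing in for TCP byte offsets:

```python
def reassemble(packets):
    """Buffer out-of-order segments and emit the stream in sequence --
    the client-side reordering that would have unwedged the connection.
    `packets` is a list of (seq, data) pairs in arrival order."""
    buffered, stream, next_seq = {}, [], 0
    for seq, data in packets:
        if seq >= next_seq:            # ignore duplicates of old data
            buffered[seq] = data
        while next_seq in buffered:    # drain everything now in order
            stream.append(buffered.pop(next_seq))
            next_seq += 1
    return b"".join(stream)

# Retransmitted packets arriving out of order, as in the trace:
print(reassemble([(0, b"GET"), (2, b"1.1"), (1, b" /x "), (3, b"\r\n")]))
```

Segment 2 sits in the buffer until segment 1 fills the hole, then both drain at once; the stream comes out intact regardless of arrival order.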
There is more than just http to consider
Whilst http and its use on the Internet may be important, it is not the only application to use TCP. In corporate environments, http is likely to be behind file serving and databases in terms of TCP traffic volumes. When tinkering with TCP parameters/algorithms, it is important to consider the function and performance of all applications in use.
The snag with the idea of using multiple TCP connections (like common P2P apps) is that this subverts the typical "per connection" sharing of bandwidth and is likely to be countered by traffic shaping measures. Given that some P2P apps masquerade as http traffic to get around firewall restrictions, it is reasonable to assume that TCP connections on http or https ports will already be considered for such constraints (Google Maps sometimes falls foul of such measures).
@Eric Kimminau TREG, spot on. In a nice clear network, having a huge congestion window speeds things up. In a "not-nice" congested network, having a huge congestion window leads to high latency and perhaps congestion collapse.