Protocol buffer: it’s the object serialization scheme the pretentious little shit on your development team has been talking at you about during lunch hours for the past couple of days. You’ve been feigning interest with a steady stream of “oh-yeahs” and “that-sounds-cools”, so you don’t really know what it is. Well today is your …
Ah spot the newbie
When reading the article I got the feeling this was a first attempt at a Reg article. Clicking on the 'more articles by this author' link only confirmed this.
Someone who's been told what the reg style is like, and decided that means using swear words, and comparing programming code to having fun with the Prom Queen.
It was actually a fairly interesting article fact wise, and I'm sure the style will settle down after a few more posts / some time spent reading a few more posts etc.
You don't need swear words in every sentence to make a reg article, that's what the comments section is there for.
A couple of points
"Google invented the protocol buffer because they found XML parsing to be too slow, and XML messages too large"
No shit? I think this...
...sums up the nonsense about XML quite nicely. It's just nice to see that someone as obviously cool and trendy as google has finally seen through this most transparent (or should that be obfuscated and opaque) horror.
As for google's "protocol buffer" serialisation scheme, what's wrong with a good old fashioned (ie - it works) csv file (or similar). Something like...
Ah, of course! That's the problem isn't it! It may work. It may be uber-efficient (sorry - I don't have an umlaut on my keyboard) and it may be infinitely well understood. But is it cool? Errr.... not really. Is it trendy and therefore almost certainly obsolete this time next year? Errr.... noooo... probably not. Oh dear then - consign it to the dustbin of coding simplicity.
Good Article, Bad Language
Interesting and amusing overview of the subject, but why the swearing?
Try transmitting binary data that way, and see how far you get.
Also, where it matters, Protocol Buffers and Thrift and ASN.1 and so forth are generally more efficient than weird tokenisers for typeless CSV.
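For anyone wondering what "transmitting binary data that way" costs in practice, here is a minimal sketch, assuming the usual workaround of base64-encoding the payload so it survives as a CSV field. The function names and the two-column layout are made up for illustration, and base64 grows the payload by roughly a third.

```python
import base64
import csv
import io


def write_row(name: str, blob: bytes) -> str:
    """Return one CSV line holding a label and a base64-encoded binary blob."""
    buf = io.StringIO()
    # base64 output is plain ASCII, so commas, quotes and newlines inside the
    # raw bytes can no longer collide with the CSV delimiters.
    csv.writer(buf).writerow([name, base64.b64encode(blob).decode("ascii")])
    return buf.getvalue()


def read_row(line: str):
    """Parse a line produced by write_row back into (label, raw bytes)."""
    name, encoded = next(csv.reader(io.StringIO(line)))
    return name, base64.b64decode(encoded)


# Every possible byte value, including ones that would break naive CSV.
payload = bytes(range(256))
line = write_row("thumbnail", payload)
```

It works, but each reader and writer has to agree out of band that the second column is base64, which is exactly the kind of typing information a schema-based format carries for you.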
El Reg hacked
Has El Reg been pwned by the Twat-o-Tron?
If I'm passing objects between two functions in a program, I don't care how it is done. I'm happy for the compiler to solve that problem for me. Not only are its solutions probably at least as efficient as anything I dream up, I can always change compiler if they aren't. (Dynamically linked libraries put a little wrinkle into that, but we seem to manage.)
Now move that same problem into the persistence or transport domain. Suddenly you are no longer free to change your mind. You can no longer be sure that the other end of the connection is, or was, the same version of the software. It could either pre-date or post-date your version.
OK, any fule can stick a version number in the software. However, if your serialisation scheme is automatic, this means it is keyed off implementation details. A different version may not *have* anything corresponding to the element that you've just serialised. You've just sent the moral equivalent of a core dump to your past-or-future-self and you're expecting an automatic system to somehow translate. It's a bit like reverse engineering, twice, and then glueing together.
Fine, so you sit down and *design* a protocol/format that you are willing to support indefinitely, thereby avoiding this fatal flaw, and let your automated system transport *that* instead.
Oh bugger, actual *design* was the hard problem we were hoping to avoid with this system. Once we've gone to all the effort of a proper design, actually mapping its primitive data types to (say) network-ordered raw binary is less than an hour's work and actually sending it over the wire or off to storage is a one-liner.
Reflection has its place. That place is in-proc. Never export implementation details outside of your process, unless you already know that your product is doomed to failure and so you'll never have to produce "version 2".
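As a rough illustration of the "less than an hour's work" claim above, here is a minimal sketch that maps a deliberately designed record onto network-ordered raw binary with Python's struct module. The record layout (a version number, a user id and a balance) is an assumption made up for the example, not anything from the article.

```python
import struct
from typing import Tuple

# Hypothetical wire format, chosen for illustration only:
#   uint16 format version, uint32 user id, int64 balance.
# The "!" prefix selects network (big-endian) byte order with no padding.
RECORD = struct.Struct("!HIq")
VERSION = 1


def encode(user_id: int, balance: int) -> bytes:
    """Pack one record into its fixed-size wire representation."""
    return RECORD.pack(VERSION, user_id, balance)


def decode(payload: bytes) -> Tuple[int, int]:
    """Unpack one record, rejecting versions this code was not designed for."""
    version, user_id, balance = RECORD.unpack(payload)
    if version != VERSION:
        raise ValueError(f"unsupported format version {version}")
    return user_id, balance


wire = encode(42, -1500)
```

The point of the sketch is that the struct format string is the designed contract: it stays the same however the in-memory objects on either end are refactored.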
grind that axe baby!
lots of reiterations of the pretentious little doo doo. Most of us just laugh quietly at them - what set you off on this rampage of hatred?
Nice article in any case, very funny and quite informative.
Symbol use Ironic more than indicative
Good article, good language.
Could someone explain what "weird tokenisers for typeless CSV" means, please?
This discussion is already mouldy.
"...sums up the nonsense about XML quite nicely."
No. It sums up the lameness of people writing blogs while they should be spending quality time in front of the TV with a beer & chips. Or maybe having useful discussions with Jehova's Witnesses.
"what's wrong with a good old fashioned (ie - it works) csv file"
Exactly. It doesn't do trees for one.
Turnoff = Bad language
The article's content was very much ruined by the bad language. I'm sorry, but I am in a work environment here... talking about f***ing the prom queen and all that s*** does not really fly when the CEO walks past.
You can code your data any way you like - binary data is no problem. XML manages as do lots of other encoding schemes. It's just the endless baggage that comes with it that you don't need.
And yes, I accept you may need something a little more descriptive than a plain csv, but again, you don't need all the baggage of something like XML, and in a huge number of cases, a variable:value list is perfectly good; most Unix config files have used this for donkeys without much trouble, and with the not-insignificant advantage that they are actually human readable (yes, XML is, technically, human readable, but in practice, it can be horribly complex and with no discernible advantage for that complexity).
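To show how little machinery a variable:value list needs, here is a minimal sketch of a parser for that style of config file. The "#" comment convention and the sample keys are assumptions for the example; real Unix config files vary in their details.

```python
def parse_config(text: str) -> dict:
    """Parse 'name: value' lines into a dict of strings."""
    settings = {}
    for raw in text.splitlines():
        # Assumed convention: '#' starts a comment, so values cannot contain '#'.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # skip blank lines and comment-only lines
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'name: value' line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


sample = """
# hypothetical service settings
host: example.org
port: 8080   # still a string after parsing
"""
config = parse_config(sample)
```

Note that every value comes back as a string; deciding that port is an integer is left to the caller, which is the "typeless" trade-off raised earlier in the thread.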
Translation for Protocol Buffers?
"I'm about to drop some science up in this bitch."
Uh? Once again, in a language that your readers actually speak?
"You define your object in terms of its primitives in a special language"
Ahhh, gotcha. Perhaps English, next time?
Agree with others - passable article content-wise, distractingly full of expletives.
We're all pretending to be grown ups here, so no need for the "look at me, I've used a naughty word!!!" malarkey.
Has 50 Cent suddenly taken an interest in application development and got a job with El Reg? Or is this some lame attempt from a noob writer to be edgy & street?
Cos it was a little embarrassing in a "Ali G" kind of way. Don't do it again.
Re: This discussion is already mouldy.
Q: "what's wrong with a good old fashioned (ie - it works) csv file"
A: Exactly. It doesn't do trees for one.
Yes it does! I remember old Windows 3.1 ".ini" files used to use something like....
Some more data
Even more data
It works. It's reliable. It's human-readable. It's not difficult, and it doesn't require a boat load of "<" and ">" characters that don't actually DO anything
You can do trees in CSV
I've been on the receiving end, it means that there are relationships and ordering assumed by the CSV as in record type A is the head of the tree, B a leaf C a leaf of a leaf etc.. Seen this with geographical data supplied as CSV.
It stops being funny after a while (debugging and testing is a nightmare) and you wish they'd just used XML, honest. It may be "slower" but the time you spend reconstructing the trees and wondering if the hierarchy is deep enough (as in some committee would have to add a leaf type F for example) is more than repaid by using structured data.
That said, haven't read about the Google thingamabob and don't care. Structured data for structured things, CSV is fine for long lists of heterogeneous stuff that doesn't have relationships between records. Horses for courses and all that jazz.
(Aside: Firefox's dictionary likes thingamabob, at least for UK dictionary).
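The tree-in-CSV scheme described above (record type A is the head, B a leaf, C a leaf of a leaf) can be sketched roughly as follows. The type-to-depth table and the sample geographical rows are invented for illustration; the real feeds presumably had their own conventions.

```python
import csv
import io

# Assumed convention: the first column is a record type whose letter implies
# the depth in the tree. Anything not listed here is unknown to the reader.
DEPTH = {"A": 0, "B": 1, "C": 2}


def build_tree(text: str) -> list:
    """Rebuild nested nodes from flat rows, relying purely on row order."""
    roots = []
    stack = []  # stack[d] is the most recently seen node at depth d
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        rtype, name = row[0], row[1]
        depth = DEPTH[rtype]  # KeyError here is the "leaf type F" problem
        if depth > len(stack):
            raise ValueError(f"{rtype} record {name!r} has no parent row")
        node = {"type": rtype, "name": name, "children": []}
        del stack[depth:]  # forget deeper nodes from the previous branch
        if depth == 0:
            roots.append(node)
        else:
            stack[depth - 1]["children"].append(node)
        stack.append(node)
    return roots


flat = "A,Europe\nB,France\nC,Paris\nC,Lyon\nB,Spain\n"
tree = build_tree(flat)
```

Everything structural lives in the row order and the DEPTH table rather than in the data, so a reordered file or a new record type breaks the reader, which matches the debugging pain the comment describes.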
No wai homey, 50 Cent is too cool for this shiznitz, they've hired Eminem instead.
Disappointing, what could have been a decent article was ruined by the style and should have been tagged NSFW. Lose the filthy language, young padawan.
More like, embarrassing in a "Richard Madeley does Ali G" kind of way.
pretentious little shit ...
... wait, do you work at the same place as me ...?
as for the swearing, I think it adds a great deal, keep it up.
good article, more (reality) please
WTF is happening to The Register?
First all the "Climate change a pinko plot" posts and now "no swearing please"?!??!
"Think of how scalable that shit's gonna be. You'll put a real hurt on all that imaginary load your system is taking. Then, you get to go home and fuck the prom queen."
is the first thing I've laughed out loud at in the Register for months. More please. And add swearing in the _middle_ of words next time.
No free speech here please
Can El Reg contributors please refrain from using words that I personally find offensive, and only use a writing style that I personally like.
I'm all for free speech but there is a limit, and it's whatever point I personally find unnecessary.
I am mentally retarded, and while I can program a computer no problem, I can't quite understand freedom of speech.
Right, back to Fucking The Prom Queen (XBox, £29.99).
how many reps?
It does suggest a boy trying to talk dirty and not quite getting it right. Forgiveable I suppose, but it ends up ludicrously with "flex your nuts".
2. How many pounds?
3. How many reps?
Just had to put in my two <local-minor-denomination>s for the affirmative team - I found this article extremely amusing. Frankly I'm amazed people still see this 'language', let alone find it offensive. Indeed, this article reads exactly like conversation from our office.
Mine's the one with the fur collar and cane.
Trees and CSV
Windows .ini files, yes, they're not really CSV files are they? Anyway, bollocks to trees, where it really gets interesting is graphs. XML has a (semi-)standard way to reference another object in the tree: by tightly defining the "id" attribute as a unique identifier for a node. Or by using XPath.
There's no standard way to do this in CSV, and I haven't seen this in ASN.1 yet (but maybe it's there). Either way, serializing an object graph is the general case, and while CSV, .ini files, Apache-style config files and so on are probably simpler for specific cases, if you want a standardized catch-all solution then XML/XPath is one option, and maybe this google thing is another.
Oh, almost forgot: "arse". Wouldn't want this comment to raise the tone.
May I suggest some proper alternatives:
2. Swamp Donkey
3. Chutney Ferret
6. By George!
Perhaps they would go over better with the tea drinking crowd here, Ted ;-)
(Commence incineration in 5, 4, 3, 2....)
Sorry, crap article
If you're discussing a new technology, tell me what problems it solves, give me an example or two. Tell me in a factual way why it’s something I should invest my time and money in to learn. Don’t tell me I can go home and fuck the prom queen. Although I’m from the UK, I get the impression that most “prom queens” are unintelligent morons that care more about being popular and looks than getting an education. Frank Zappa’s “Valley Girls” springs to mind, “Gag me with a spoon” indeed.
“Thrift failed to gain heavy traction because its name isn't terribly cool, nor does it give way to an acronym that contains the letters J or X.”
Ok, that’s disturbing, Teddy boy says that a technology failed because its name was not “cool” and failed to have a decent acronym.
I am not an aggressive guy, far from it in fact. But, had I been in a meeting with my peers and Teddy boy piped up with that little nugget, I would not be able to stop myself from saying BOLLOCKS!
What programmer, worth anything, would base his or her choice of technology on a bloody name or acronym? I don’t care if it’s called “Tinkerbelle Web API Technology” or T.W.A.T. If it’s good, and solves problems it’s worth learning.
As for the AC “A couple of points” commenting “It's just nice to see that someone as obviously cool and trendy as google has finally seen through this most transparent (or should that be obfuscated and opaque) horror.”
Sorry, the same goes for you, “Cool” and “Trendy” don’t wash.
Like it or loathe it, XML is here and understanding it is currently a better career choice than refusing to. Arguing that Google or anyone else has a better way won’t get you far unless it’s adopted by the masses.
I, for one, won’t spend my time to learn Google’s answer because somebody tells me it’s “cool” without first backing it up with some real world examples.
"I'm about to drop some science up in this bitch."
Loved it. Srsly.
Re: Pad's comment on prom queens. You're right. Dumb, (usually) blonde, bimbos. That's what makes 'em great, for nocturnal activities, not long term relationships. Or even extended conversations the next morning...
/in b4 anyone correctly points out that a prom queen wouldn't have touched me with a bargepole back then. Or now.
Ted - grow up.
I have to agree in part with many of the previous comments, that the amount of (seemingly pointless) swearing detracts from the "article" (or, more accurately, rant). Where I disagree with some is in the amount of fact in the article - it seems almost completely devoid of it and I could similarly find no meaningful technical examination.. just a few mumbled protocol names.
The "reason" for protocol buffers was not (just) that XML was too slow - which you might realise if you took your head out of your arse and read a bit.. and if you don't want it, then don't use it... it's just another serialising framework with some features that may prove useful to some people. It may be of no use to you, overly complicated or restrictive.. or the dog's bollocks.
Please don't let what appears to be your intolerance of anything from Google to colour what this might mean to everyone else. I have no axe to grind either way - but don't see much usefulness in such poorly written, technically empty, mindless nonsense either.
Unless you do your own research, any article in any journal will need some fleshing out with original source information. This romp is no different.
"Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:
* are simpler
* are 3 to 10 times smaller
* are 20 to 100 times faster
* are less ambiguous
* generate data access classes that are easier to use programmatically"
As the author notes, Google is long on developing technologies (or honing existing technologies) to address issues of scale. "3 to 10 times smaller" and "20 to 100 times faster" are quantifiable benefits in a large-scale operation.
And for those of you who are not familiar with the concept of "serialization" ... stay away from this technology until you graduate. It's a very useful tool for storing and passing complex data relationships in non-volatile packages.
Oh ... and here's the obligatory: "craphole" ... although I do hope this trend of including curse words becomes optional in the near future. I did enjoy the surprise element during my reading of this article, however, and find it hard to argue with the "pretentious" digs and the comments about the realities of scale faced by the vast majority of programmers.
Actually can do an interview without street talk. You're thinking more of Ghostface from Wu-Tang. You look at the words individually and you know it's English; string them together and you swear he's speaking a foreign language that uses English words.
Um..were you intoxicated whilst writing this!?
Ted Dziuba: Uncov
How did no one notice that this is the uncov guy?
Thank you, El Reg, for making a deal to hire the greatest technical writer of all time.
Ted, keep up the swearing. I have missed you.
Mine's the one that has been shooped.
Summer is here!
...so pundits and web commentariat take the time to dwell on well-known truths. Today:
"The right tool for the right job"
Use a heavy tool with a deep-green ecosystem for a complex, standard or gonzo job. Drop baggage as needed. XML is king here.
Use a simple tool for a simple "XP-style" job. Hand-code as needed, you may even get the escaping correct. INI files go here.
Use a specialized tool in the high mountain ranges in which additional costs and adventurism are justified. Only for sherpas with snow googles.
For fun, take a week off and get into Lex, Yacc or ANTLR to flex muscles.
And we finish with a link about file formats: http://www.faqs.org/docs/artu/ch05s02.html
I notice we are out of proper serialization country, but so what. No swearwords were encountered.
XML is God
I wrote an app a couple of years ago that had 75,000 lines of Java code, and 100,000 lines of XML configuration. Okay, maybe that was a bit extreme, but it was very cool.
There is nothing "wrong" with XML - it meets all of its design goals quite nicely. All those critics of XML should read the XML spec some time, and look at the 10 design goals (my favourite: Terseness in XML markup is of minimal importance). If XML is *used* in ways that don't meet its design goals, that's hardly the fault of XML itself.
Take SOAP, for example - it is a hideous XML dialect, but that wasn't the fault of XML, rather it was the fault of the designers of SOAP.
XML processing doesn't have to be slow. If you use XML with Java, check out the JiBX framework (http://jibx.sourceforge.net/) - XML binding that is 10x faster than most other binding frameworks. You can even get hardware appliances to speed up XML processing if you really need scalability - look at IBM's DataPower box.
You're all surprised because this is how Americans slang it up.
Is this ZDNet?
Protocol buffers are just silly - you have XML when you need a self-describing document with a schema (pick a style) and JSON when you don't. If you're going to need performance, stream it - I mean this guy (at Google) is comparing a heavyweight DOM parser with his PB implementation! If you need smaller size, down the wire, GZIP or RLE (for speed) it then just SERIALIZE THAT if you're trying to do some binary transfer!
The whole point of XML was that it was text based. XML is used as a first class citizen in several frameworks for creating data access classes and even as intermediaries when creating data access models. JSON is a first class citizen on the client side so why create a new bleeding format? Answer: who cares? A file format is a file format is a file format. If you're sitting that far on the bleeding edge you're bound to get your bollocks sliced off sooner or later - so if you are doing any kind of software development which makes money then this API is probably not for you - for the next 24 months at least.
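One way to read the "GZIP it then just serialize that" suggestion above is to compress the text serialisation before it goes down the wire. Here is a minimal sketch under that reading, using JSON as the text format; the record fields are made up for illustration and no claim is made about how the result compares with Protocol Buffers.

```python
import gzip
import json


def pack(records: list) -> bytes:
    """Serialise to compact JSON text, then gzip the UTF-8 bytes."""
    text = json.dumps(records, separators=(",", ":"))
    return gzip.compress(text.encode("utf-8"))


def unpack(blob: bytes) -> list:
    """Reverse pack(): gunzip, decode, and parse the JSON text."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical, highly repetitive payload: repeated field names are exactly
# the overhead that general-purpose compression removes well.
records = [{"id": i, "status": "open", "owner": "example"} for i in range(200)]
blob = pack(records)
plain_size = len(json.dumps(records, separators=(",", ":")).encode("utf-8"))
```

Compression shrinks the bytes on the wire but adds CPU work at both ends and does nothing for parse time once the text is unpacked, so it addresses the size complaint rather than the speed one.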
Good Article -- but not news
Ever since the birth of XML people have been rubbishing it as too verbose -- which it is, human-unreadable -- which it is, and too slow -- which is sort of true, more anon, and promoting XXL hamburger junky sized messages -- which is true.
They then propose alternatives JSON, YAML etc. etc. which nearly all turn out to be more of the above but with smaller message sizes.
The problem really is XML was there first, it works, and is well supported, and none of the alternatives are actually any better.
XML parsing is slow, a problem made worse by neophytes using DOM parsers to read one or two attributes when a stream parser would do the job in a tenth of the time. However JSON, YAML etc. etc. are also quite slow; in theory they should be faster, but XML has been around a long time and a lot of clever people have been optimising those parsers. Plus, while building a DOM structure is slow, it speeds up the rest of your program, as access to complex data is simpler and faster.
XML messages are big -- not actually a problem unless you are stuck on an old X.25 network or a 16K modem. In most modern networks the latency time (the "please mister can I send my data now?" wait time) is nearly as much as the time spent sending a message. This tends to nullify any speed advantage for smaller messages.
I recently read one of those IBM Developer Works articles in which the author was trying out a CORBA over SOAP workaround to get an RPC call through a firewall. Being conscientious he did some performance testing and was surprised to find that RPC over SOAP was faster than the native J2EE CORBA/RMI interface. Bearing in mind that you will never see the words "optimize" or "efficiency" in the WS** or SOAP manuals this is pretty good.
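The DOM-versus-stream point above can be sketched with Python's standard library. This is a minimal illustration, assuming a made-up orders document; it shows the two styles side by side and makes no measurement of the "tenth of the time" figure.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document, invented for the example.
DOC = (
    b'<orders>'
    b'<order id="1" status="open"/>'
    b'<order id="2" status="shipped"/>'
    b'</orders>'
)


def first_status_tree(data: bytes):
    """Tree style: parse the whole document into memory, then look inside."""
    root = ET.fromstring(data)
    return root.find("order").get("status")


def first_status_stream(data: bytes):
    """Stream style: stop as soon as the first <order> start tag is seen."""
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
        if elem.tag == "order":
            # Attributes are already available on the "start" event.
            return elem.get("status")
    return None
```

With a two-element document the difference is invisible; the stream version only pays off when the wanted attribute sits near the top of a large file and the rest never needs to be built into a tree.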
Only five minutes before I read this article, my very own PLS was spouting The Truth about serialization and Google's Protocol Buffer at me while I was desperately trying to absorb coffee.
And then I lost it all over my keyboard.
Jihad Crusaders .... Infidels to Fatima ..... Sub-Prime Coders.
""You define your object in terms of its primitives in a special language"
Ahhh, gotcha. Perhaps English, next time?" ... By Jon Green Posted Monday 14th July 2008 15:26 GMT
ITs Raw Python, Jon. An ACQWired Passion with XXXXCentric Taste. QuITe Necessary for the Full Monty in Virtualised Space and ITs CyberSpaces.
But ITs Transparency makes IT AIdDoddle to Reverse Engineer for Original Source .... OriJinn. Wholly Spirits? Soul Mates?
Which is just an Opinion Reflected and Reinforced/Enriched for Fact.
The bigger better shitter story
I was disappointed in the article - the bigger story about immovable XML-hair-shirts and spindly Googlie-disciples was hinted at but ruined with an unconvincing style which robbed the technical aspects of any authority. Calling someone a pretentious shit kind of marks you out as one of the same. Even if they actually are pretentious shits it's such a self-defeating phrase, such a mealy mouthed geek-nerd retort of unnerving snarkiness it makes you sound like a petulant 5 year old.
So far no-one on the XML side has explained how to deal with fast, minimal, low-latency data; and no-one on the protocol-buffers side has detailed the limits of practicality to their approach.
Has the list of things to argue about run so low we're having flamewars about XML ? XML ?! The whole internet is up in arms over one preferable method over another used as a framework for serialising data for transmission ??? !!! It might have started with Spectrum versus BBC-B but fuck me if the religions of the internet haven't become so utterly tedious I'm not sure I want to logon any more.
PH because she knows as much about protocol buffers and XML implementation as I ever want to. And she's a pretentious little shit!
Who is this article written for?
The Register: it's alright for news, crap for technical content.
If this is an effort at a technical article, please just stop now. This describes nothing specific about the technical content; it just demonstrates the author has "attitude". Not interested.
bad language aside
The article (and Google's new meme) is making a mountain out of absolutely nothing. We all know xml sucks for serialisation. YAML is much nicer than ASN.1 or CSV. Let's have first-order objects instead of simple name, value pairs, please!
Has no-one read the Bile Blog?
I'd have thought that at least someone would have commented on the similarities between this article and some of Hani Suleiman's more mild outpourings on 'Open Sores' and anything else that ticks him off.
Paris - because she knows an angel dies every time someone uses a bad word.
For all the people bitching about the technical content: this is a ***HUMOROUS ARTICLE***, not a technical one - I can't believe anyone could have made that mistake.
If it's not your style of comedy - fine. Personally I thought it was hilarious, and it's exactly the sort of geeky humour that belongs on the Reg.
the people at Google used LEX to generate their parser?
I just read over http://code.google.com/apis/protocolbuffers/docs/overview.html and I dunno quite what to say. Every couple of years we get to see some gee-whiz system mapping classes onto 'generic' text descriptions stirred in with rehashed RPC. If the mapping comes from an egghead in a university nobody much gives a damn. If it comes from Google it is news. If it comes from Microsoft it's less news but it gets used a lot.
Thus the power of money I suppose.
Re : C'mon people
> For all the people bitching about the technical content: this is a ***HUMOROUS ARTICLE***, not a technical one - I can't believe anyone could have made that mistake.
Why can't you believe it ?
... because there *are* actually people in the world who don't follow the arrival of every so-say technical Messiah in the blogging world ?
... or because there's way too much dross of a similar style around that *is* trying to be serious ?
..or because it's not particularly funny ?
Personally i'm going for all three but - as you point out - that's partly because this particular rant is not my style of humour (FMPOV bad-mouthed for the sake of, no technical jokes and making no point - just over-the-top Google bashing which is a pretty fucking easy target).. i've read some of his other stuff and it's generally pretty bloody funny.
This. Is. Shit.
The comment section is way better than the article
"We all know xml sucks for serialisation"
"Your learning is simply amazing, Sir Bedivere. Pray explain again how sheep's bladder can be used for intercourse."
Fuck those whiny tools
"I'm about to drop some science up in this bitch" awesome
THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST