Google releases serialization scheme

Protocol buffer: it’s the object serialization scheme the pretentious little shit on your development team has been talking at you about during lunch hours for the past couple of days. You’ve been feigning interest with a steady stream of “oh-yeahs” and “that-sounds-cools”, so you don’t really know what it is. Well today is …

COMMENTS

  1. Guy
    Flame

    Ah spot the newbie

    When reading the article I got the feeling this was a first attempt at a Reg article. Clicking on the 'more articles by this author' link only confirmed this.

    Someone who's been told what the reg style is like, and decided that means using swear words, and comparing programming code to having fun with the Prom Queen.

    It was actually a fairly interesting article fact wise, and I'm sure the style will settle down after a few more posts / some time spent reading a few more posts etc.

    You don't need swear words in every sentence to make a reg article, that's what the comments section is there for.

  2. Anonymous Coward
    Anonymous Coward

    A couple of points

    "Google invented the protocol buffer because they found XML parsing to be too slow, and XML messages too large"

    No shit? I think this...

    http://www.codinghorror.com/blog/archives/001114.html

    ...sums up the nonsense about XML quite nicely. It's just nice to see that someone as obviously cool and trendy as google has finally seen through this most transparent (or should that be obfuscated and opaque) horror.

    As for google's "protocol buffer" serialisation scheme, what's wrong with a good old fashioned (ie - it works) csv file (or similar). Something like...

    var1, value1
    var2, value2

    Ah, of course! That's the problem isn't it! It may work. It may be uber-efficient (sorry - I don't have an umlaut on my keyboard) and it may be infinitely well understood. But is it cool? Errr.... not really. Is it trendy and therefore almost certainly obsolete this time next year? Errr.... noooo... probably not. Oh dear then - consign it to the dustbin of coding simplicity.
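
    To make the "typeless CSV" point concrete, here is a minimal Python sketch of reading the "var, value" layout above. It assumes a simple two-column file with no embedded commas; the `read_pairs` name and the sample text are illustrative only. Every value comes back as a string, because the file itself carries no types.

```python
import csv
import io

# Illustrative sample in the two-column "name, value" layout.
SAMPLE = """var1, value1
var2, value2
"""


def read_pairs(text: str) -> dict[str, str]:
    """Read 'name, value' lines into a dict of strings.

    skipinitialspace drops the blank after each comma. No type conversion
    happens: CSV has no notion of int vs float vs str, so that is left
    to the caller.
    """
    pairs: dict[str, str] = {}
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        name, value = row  # ValueError if a line does not have exactly 2 fields
        pairs[name] = value
    return pairs


print(read_pairs(SAMPLE))  # {'var1': 'value1', 'var2': 'value2'}
```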

  3. John Miles

    Good Article, Bad Language

    Interesting and amusing overview of the subject, but why the swearing?

  4. Robert Synnott

    @Guy

    Try transmitting binary data that way, and see how far you get.

    Also, where it matters, Protocol Buffers and Thrift and ASN.1 and so forth are generally more efficient than weird tokenisers for typeless CSV.

  5. Alastair Smith
    Dead Vulture

    El Reg hacked

    Has El Reg been pwned by the Twat-o-Tron?

  6. Ken Hagan Gold badge

    Wrongheaded

    If I'm passing objects between two functions in a program, I don't care how it is done. I'm happy for the compiler to solve that problem for me. Not only are its solutions probably at least as efficient as anything I dream up, I can always change compiler if they aren't. (Dynamically linked libraries put a little wrinkle into that, but we seem to manage.)

    Now move that same problem into the persistence or transport domain. Suddenly you are no longer free to change your mind. You can no longer be sure that the other end of the connection is, or was, the same version of the software. It could either pre-date or post-date your version.

    OK, any fule can stick a version number in the software. However, if your serialisation scheme is automatic, this means it is keyed off implementation details. A different version may not *have* anything corresponding to the element that you've just serialised. You've just sent the moral equivalent of a core dump to your past-or-future-self and you're expecting an automatic system to somehow translate. It's a bit like reverse engineering, twice, and then glueing together.

    Fine, so you sit down and *design* a protocol/format that you are willing to support indefinitely, thereby avoiding this fatal flaw, and let your automated system transport *that* instead.

    Oh bugger, actual *design* was the hard problem we were hoping to avoid with this system. Once we've gone to all the effort of a proper design, actually mapping its primitive data types to (say) network-ordered raw binary is less than an hour's work and actually sending it over the wire or off to storage is a one-liner.

    Reflection has its place. That place is in-proc. Never export implementation details outside of your process, unless you already know that your product is doomed to failure and so you'll never have to produce "version 2".
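
    A rough Python sketch of the "map primitives to network-ordered raw binary" step, using the standard `struct` module. The record layout, the field names and the leading version number are hypothetical; the point is that the wire format is written down by hand rather than derived by reflection from in-memory implementation details.

```python
import struct

# Hypothetical, explicitly designed layout:
#   uint16 version, uint32 user_id, int64 timestamp, float64 balance
# "!" means network (big-endian) byte order with standard sizes, no padding.
RECORD_V1 = struct.Struct("!HIqd")


def pack_record(user_id: int, timestamp: int, balance: float) -> bytes:
    """Serialise one record, always stamping it as version 1."""
    return RECORD_V1.pack(1, user_id, timestamp, balance)


def unpack_record(data: bytes) -> tuple[int, int, float]:
    """Deserialise one record, refusing versions this code does not know."""
    version, user_id, timestamp, balance = RECORD_V1.unpack(data)
    if version != 1:
        raise ValueError(f"unsupported record version {version}")
    return user_id, timestamp, balance


wire = pack_record(42, 1215993600, 12.5)
print(len(wire))            # 22
print(unpack_record(wire))  # (42, 1215993600, 12.5)
```

    The `!` prefix pins both byte order and field sizes, so the layout does not depend on the compiler or platform that produced it, and a later version could check the leading number before reading any new fields.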

  7. Scott Watson
    Boffin

    grind that axe baby!

    lots of reiterations of the pretentious little doo doo. Most of us just laugh quietly at them - what set you off on this rampage of hatred?

    Nice article in any case, very funny and quite informative.

    Symbol use ironic more than indicative.

  8. Anonymous Coward
    Anonymous Coward

    Good article, good language.

    Enjoyed it.

    Could someone explain what "weird tokenisers for typeless CSV" means, please?

  9. Destroy All Monsters Silver badge
    Boffin

    This discussion is already mouldy.

    "http://www.codinghorror.com/blog/archives/001114.html

    ...sums up the nonsense about XML quite nicely."

    No. It sums up the lameness of people writing blogs while they should be spending quality time in front of the TV with a beer & chips. Or maybe having useful discussions with Jehovah's Witnesses.

    "what's wrong with a good old fashioned (ie - it works) csv file"

    Exactly. It doesn't do trees for one.

  10. Anonymous Coward
    Thumb Down

    Turnoff = Bad language

    The article's content was very much ruined by the bad language. I'm sorry, but I am in a work environment here... talking about f***ing the prom queen and all that s*** does not really fly when the CEO walks past.

  11. Anonymous Coward
    Anonymous Coward

    @Robert Synnott

    You can code your data any way you like - binary data is no problem. XML manages as do lots of other encoding schemes. It's just the endless baggage that comes with it that you don't need.

    And yes, I accept you may need something a little more descriptive than a plain csv, but again, you don't need all the baggage of something like XML, and in a huge number of cases, a variable:value list is perfectly good; most Unix config files have used this for donkeys without much trouble, and with the not-insignificant advantage that they are actually human readable (yes, XML is, technically, human readable, but in practice, it can be horribly complex and with no discernible advantage for that complexity).

  12. Jon Green
    Thumb Down

    Translation for Protocol Buffers?

    "I'm about to drop some science up in this bitch."

    Uh? Once again, in a language that your readers actually speak?

    "You define your object in terms of its primitives in a special language"

    Ahhh, gotcha. Perhaps English, next time?

  13. Ian K
    Thumb Down

    Arse! Feck!

    Agree with others - passable article content-wise, distractingly full of expletives.

    We're all pretending to be grown ups here, so no need for the "look at me, I've used a naughty word!!!" malarkey.

  14. Gav
    Thumb Down

    Lame

    Has 50 Cent suddenly taken an interest in application development and got a job with El Reg? Or is this some lame attempt from a noob writer to be edgy & street?

    Cos it was a little embarrassing in a "Ali G" kind of way. Don't do it again.

  15. Anonymous Coward
    Anonymous Coward

    Re: This discussion is already mouldy.

    Q: "what's wrong with a good old fashioned (ie - it works) csv file"

    A: Exactly. It doesn't do trees for one.

    Yes it does! I remember old Windows 3.1 ".ini" files used to use something like....

    [blabla:treenode1]
    Some data

    [blabla:treenode2]
    Some more data

    [blabla:treenode3]
    Even more data

    It works. It's reliable. It's human-readable. It's not difficult, and it doesn't require a boatload of "<" and ">" characters that don't actually DO anything.
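
    A small Python sketch of how such colon-separated section names can be turned back into a tree, using the standard `configparser` module. `configparser` expects `key = value` lines, so the sample stores each node's text under an assumed `data` key; the extra `[blabla:treenode3:leaf]` section is likewise hypothetical, added to show a second level.

```python
import configparser

# Hypothetical sample modelled on the example above; "data" is an assumed key.
INI_TEXT = """\
[blabla:treenode1]
data = Some data

[blabla:treenode2]
data = Some more data

[blabla:treenode3]
data = Even more data

[blabla:treenode3:leaf]
data = Deeper still
"""


def ini_to_tree(text: str) -> dict:
    """Turn colon-separated section names into a nested dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    tree: dict = {}
    for section in parser.sections():
        node = tree
        for part in section.split(":"):  # walk (or create) one level per name part
            node = node.setdefault(part, {})
        node.update(parser[section])  # attach this section's key/value pairs
    return tree


print(ini_to_tree(INI_TEXT))
```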

  16. Francis Fish
    Boffin

    You can do trees in CSV

    I've been on the receiving end. It means there are relationships and an ordering implied by the CSV, as in: record type A is the head of the tree, B a leaf, C a leaf of a leaf, etc. I've seen this with geographical data supplied as CSV.

    It stops being funny after a while (debugging and testing is a nightmare) and you wish they'd just used XML, honest. It may be "slower" but the time you spend reconstructing the trees and wondering if the hierarchy is deep enough (as in some committee would have to add a leaf type F for example) is more than repaid by using structured data.

    That said, haven't read about the Google thingamabob and don't care. Structured data for structured things, CSV is fine for long lists of heterogeneous stuff that doesn't have relationships between records. Horses for courses and all that jazz.

    (Aside: Firefox's dictionary likes thingamabob, at least for UK dictionary).
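
    A minimal Python sketch of the reconstruction work described above, assuming a hypothetical two-column file where the record type alone gives the depth (A = root, B = child of the latest A, C = child of the latest B). The sample rows and the `build_tree` name are illustrative.

```python
import csv
import io

LEVELS = "ABC"  # position in this string = depth of the record type
SAMPLE = """A,England
B,Yorkshire
C,Leeds
C,York
B,Kent
C,Canterbury
"""


def build_tree(text: str) -> list[dict]:
    """Rebuild a tree whose parent links exist only as row order."""
    roots: list[dict] = []
    path: list[dict] = []  # path[i] is the most recent node seen at depth i
    for rec_type, name in csv.reader(io.StringIO(text)):
        depth = LEVELS.index(rec_type)  # ValueError on an unknown type such as "F"
        if depth > len(path):
            raise ValueError(f"{rec_type} record {name!r} has no parent")
        node = {"name": name, "children": []}
        del path[depth:]  # forget nodes at this depth or deeper
        (path[-1]["children"] if path else roots).append(node)
        path.append(node)
    return roots


def show(nodes: list[dict], indent: int = 0) -> None:
    for node in nodes:
        print("  " * indent + node["name"])
        show(node["children"], indent + 1)


show(build_tree(SAMPLE))
```

    Because parentage lives only in row order, a reordered row can silently attach later rows to the wrong parent, and a new type such as F means changing `LEVELS` and every consumer of the file.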

  17. Anonymous Coward
    Pirate

    @Gav

    No wai homey, 50 Cent is too cool for this shiznitz, they've hired Eminem instead.

    Disappointing, what could have been a decent article was ruined by the style and should have been tagged NSFW. Lose the filthy language, young padawan.

  18. Jon Green
    Dead Vulture

    Re: Lame

    More like, embarrassing in a "Richard Madeley does Ali G" kind of way.

  19. Anonymous Coward
    Go

    pretentious little shit ...

    ... wait, do you work at the same place as me ...?

    as for the swearing, I think it adds a great deal, keep it up.

    good article, more (reality) please

  20. Anonymous Coward
    Heart

    WTF is happening to The Register?

    First all the "Climate change a pinko plot" posts and now "no swearing please"?!??!

    This:

    "Think of how scalable that shit's gonna be. You'll put a real hurt on all that imaginary load your system is taking. Then, you get to go home and fuck the prom queen."

    is the first thing I've laughed out loud at in the Register for months. More please. And add swearing in the _middle_ of words next time.

  21. Anonymous Coward
    Anonymous Coward

    No free speech here please

    Can El Reg contributors please refrain from using words that I personally find offensive, and only use a writing style that I personally like.

    I'm all for free speech but there is a limit, and it's whatever point I personally find unnecessary.

    I am mentally retarded, and while I can program a computer no problem, I can't quite understand freedom of speech.

    Right, back to Fucking The Prom Queen (XBox, £29.99).

  22. Anonymous Coward
    Anonymous Coward

    how many reps?

    It does suggest a boy trying to talk dirty and not quite getting it right. Forgivable, I suppose, but it ends up ludicrously with "flex your nuts".

    1. What?

    2. How many pounds?

    3. How many reps?

  23. Pete
    Coat

    Whiney Bitches

    Just had to put in my two <local-minor-denomination>s for the affirmative team - I found this article extremely amusing. Frankly I'm amazed people still see this 'language', let alone find it offensive. Indeed, this article reads exactly like conversation from our office.

    Mine's the one with the fur collar and cane.

  24. Anonymous Coward
    Anonymous Coward

    Trees and CSV

    Windows .ini files, yes, but they're not really CSV files, are they? Anyway, bollocks to trees; where it really gets interesting is graphs. XML has a (semi-)standard way to reference another object in the tree: tightly defining the "id" attribute as a unique identifier for a node, or using XPath.

    There's no standard way to do this in CSV, and I haven't seen this in ASN.1 yet (but maybe it's there). Either way, serializing an object graph is the general case, and while CSV, .ini files, Apache-style config files and so on are probably simpler for specific cases, if you want a standardized catch-all solution then XML/XPath is one option, and maybe this google thing is another.

    Oh, almost forgot: "arse". Wouldn't want this comment to raise the tone.
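
    A minimal Python sketch of id-based references, using the standard `xml.etree.ElementTree` module. The `<graph>`/`<node>`/`<link ref>` vocabulary is hypothetical. ElementTree treats `id` as an ordinary attribute, so uniqueness and dangling references are checked by this code rather than by the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document describing a cycle a -> b -> c -> a,
# which a plain tree cannot represent without references.
DOC = """<graph>
  <node id="a"><link ref="b"/></node>
  <node id="b"><link ref="c"/></node>
  <node id="c"><link ref="a"/></node>
</graph>"""


def load_graph(xml_text: str) -> dict[str, list[str]]:
    """Map each node id to the ids it links to, checking every reference."""
    root = ET.fromstring(xml_text)
    nodes = root.findall("node")
    ids = [node.get("id") for node in nodes]
    if len(ids) != len(set(ids)):  # the parser does not enforce uniqueness
        raise ValueError("duplicate node id")
    edges = {node.get("id"): [link.get("ref") for link in node.findall("link")]
             for node in nodes}
    for src, targets in edges.items():
        for dst in targets:
            if dst not in edges:
                raise ValueError(f"node {src!r} refers to unknown id {dst!r}")
    return edges


print(load_graph(DOC))  # {'a': ['b'], 'b': ['c'], 'c': ['a']}
```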

  25. Anonymous Coward
    Joke

    Swearing...

    May I suggest some proper alternatives:

    1. Bollucks

    2. Swamp Donkey

    3. Chutney Ferret

    4. Bloody

    5. Arsehole

    6. By George!

    7. Shite

    Perhaps they would go over better with the tea drinking crowd here, Ted ;-)

    (Commence incineration in 5, 4, 3, 2....)

  26. Anonymous Coward
    Anonymous Coward

    Sorry, crap article

    Why?

    If you're discussing a new technology, tell me what problems it solves, give me an example or two. Tell me in a factual way why it’s something I should invest my time and money in to learn. Don’t tell me I can go home and fuck the prom queen. Although I’m from the UK, I get the impression that most “prom queens” are unintelligent morons who care more about popularity and looks than about getting an education. Frank Zappa’s “Valley Girl” springs to mind: “Gag me with a spoon” indeed.

    “Thrift failed to gain heavy traction because its name isn't terribly cool, nor does it give way to an acronym that contains the letters J or X.”

    Ok, that’s disturbing: Teddy boy says that a technology failed because its name was not “cool” and it failed to have a decent acronym.

    I am not an aggressive guy, far from it in fact. But, had I been in a meeting with my peers and Teddy boy piped up with that little nugget, I would not be able to stop myself from saying BOLLOCKS!

    What programmer, worth anything, would base his or her choice of technology on a bloody name or acronym? I don’t care if it’s called “Tinkerbelle Web API Technology” or T.W.A.T. If it’s good, and solves problems it’s worth learning.

    As for the AC “A couple of points” commenting “It's just nice to see that someone as obviously cool and trendy as google has finally seen through this most transparent (or should that be obfuscated and opaque) horror.”

    Sorry, the same goes for you, “Cool” and “Trendy” don’t wash.

    Like it or loathe it, XML is here and understanding it is currently a better career choice than refusing to. Arguing that Google or anyone else has a better way won’t get you far unless it’s adopted by the masses.

    I, for one, won’t spend my time learning Google’s answer because somebody tells me it’s “cool”, without first backing it up with some real world examples.

    Yours,

    Pad.

  27. J-Wick
    Thumb Up

    "I'm about to drop some science up in this bitch."

    Loved it. Srsly.

    Re: Pad's comment on prom queens. You're right. Dumb, (usually) blonde, bimbos. That's what makes 'em great, for nocturnal activities, not long term relationships. Or even extended conversations the next morning...

    /in b4 anyone correctly points out that a prom queen wouldn't have touched me with a bargepole back then. Or now.

  28. Tim Parker
    Thumb Down

    What dross...

    Ted - grow up.

    I have to agree in part with many of the previous comments, that the amount of (seemingly pointless) swearing detracts from the "article" (or, more accurately, rant). Where I disagree with some is in the amount of fact in the article - it seems almost completely devoid of it, and I could similarly find no meaningful technical examination, just a few mumbled protocol names.

    The "reason" for protocol buffers was not (just) that XML was too slow - which you might realise if you took your head out of your arse and read a bit.. and if you don't want it, then don't use it... it's just another serialising framework with some features that may prove useful to some people. It may be of no use to you, overly complicated or restrictive.. or the dog bollocks.

    Please don't let what appears to be your intolerance of anything from Google to colour what this might mean to everyone else. I have no axe to grind either way - but don't see much usefulness in such poorly written, technically empty, mindless nonsense either.

  29. James Butler
    Boffin

    Details

    Unless you do your own research, any article in any journal will need some fleshing out with original source information. This romp is no different.

    http://code.google.com/apis/protocolbuffers/docs/overview.html

    "Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

    * are simpler

    * are 3 to 10 times smaller

    * are 20 to 100 times faster

    * are less ambiguous

    * generate data access classes that are easier to use programmatically"

    As the author notes, Google is long on developing technologies (or honing existing technologies) to address issues of scale. "3 to 10 times smaller" and "20 to 100 times faster" are quantifiable benefits in a large-scale operation.

    And for those of you who are not familiar with the concept of "serialization" ... stay away from this technology until you graduate. It's a very useful tool for storing and passing complex data relationships in non-volatile packages.

    Oh ... and here's the obligatory: "craphole" ... although I do hope this trend of including curse words becomes optional in the near future. I did enjoy the surprise element during my reading of this article, however, and find it hard to argue with the "pretentious" digs and the comments about the realities of scale faced by the vast majority of programmers.
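
    For a feel of where the size savings come from, here is a hand-rolled Python sketch of the base-128 "varint" encoding that the protocol buffer wire format uses for integer fields. It is not the protobuf library and the function names are hypothetical; the field-1-equals-150 case follows the worked example in Google's encoding documentation.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, lowest group first, high bit set
    on every byte except the last. Non-negative ints only in this sketch."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position just after the varint)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint_field(field_number: int, value: int) -> bytes:
    """A field key is (field_number << 3) | wire_type; wire type 0 = varint."""
    varint_wire_type = 0
    return encode_varint((field_number << 3) | varint_wire_type) + encode_varint(value)


print(encode_varint_field(1, 150).hex())  # 089601
```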

  30. kain preacher

    50 cent

    50 Cent can actually do an interview without street talk. You're thinking more of Ghostface from Wu-Tang. You look at the words individually and you know it's English; string them together and you'd swear he's speaking a foreign language that uses English words.

  31. Benny
    Pirate

    Drunk?

    Um..were you intoxicated whilst writing this!?

  32. Pete
    Coat

    Ted Dziuba: Uncov

    How did no one notice that this is the uncov guy?

    Thank you, El Reg, for making a deal to hire the greatest technical writer of all time.

    Ted, keep up the swearing. I have missed you.

    Mine's the one that has been shooped.

  33. Destroy All Monsters Silver badge
    Linux

    Summer is here!

    ...so pundits and web commentariat take the time to dwell on well-known truths. Today:

    "The right tool for the right job"

    Use a heavy tool with a deep-green ecosystem for a complex, standard or gonzo job. Drop baggage as needed. XML is king here.

    Use a simple tool for a simple "XP-style" job. Hand-code as needed, you may even get the escaping correct. INI files go here.

    Use a specialized tool in the high mountain ranges in which additional costs and adventurism are justified. Only for sherpas with snow goggles.

    For fun, take a week off and get into Lex, Yacc or ANTLR to flex muscles.

    And we finish with a link about file formats: http://www.faqs.org/docs/artu/ch05s02.html

    I notice we are out of proper serialization country, but so what. No swearwords were encountered.

  34. CTG
    Heart

    XML is God

    I wrote an app a couple of years ago that had 75,000 lines of Java code, and 100,000 lines of XML configuration. Okay, maybe that was a bit extreme, but it was very cool.

    There is nothing "wrong" with XML - it meets all of its design goals quite nicely. All those critics of XML should read the XML spec some time, and look at the 10 design goals (my favourite: Terseness in XML markup is of minimal importance). If XML is *used* in ways that don't meet its design goals, that's hardly the fault of XML itself.

    Take SOAP, for example - it is a hideous XML dialect, but that wasn't the fault of XML, rather it was the fault of the designers of SOAP.

    XML processing doesn't have to be slow. If you use XML with Java, check out the JiBX framework (http://jibx.sourceforge.net/) - XML binding that is 10x faster than most other binding frameworks. You can even get hardware appliances to speed up XML processing if you really need scalability - look at IBM's DataPower box.

  35. Anonymous from Mars

    Content

    You're all surprised because this is how Americans slang it up.

  36. Joe Cincotta
    Flame

    Is this ZDNet?

    Protocol buffers are just silly - you have XML when you need a self-describing document with a schema (pick a style) and JSON when you don't. If you're going to need performance, stream it - I mean, this guy (at Google) is comparing a heavyweight DOM parser with his PB implementation! If you need a smaller size down the wire, GZIP it (or RLE it, for speed), then just SERIALIZE THAT if you're trying to do some binary transfer!

    The whole point of XML was that it was text based. XML is used as a first class citizen in several frameworks for creating data access classes and even as intermediaries when creating data access models. JSON is a first class citizen on the client side so why create a new bleeding format? Answer: who cares? A file format is a file format is a file format. If you're sitting that far on the bleeding edge you're bound to get your bollocks sliced off sooner or later - so if you are doing any kind of software development which makes money then this API is probably not for you - for the next 24 months at least.
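
    The "GZIP it" suggestion takes only a few lines with Python's standard `gzip` module. The payload below is a made-up, deliberately repetitive XML string, so the sizes it prints say nothing about real traffic; it only shows the lossless round trip.

```python
import gzip

# Hypothetical, deliberately repetitive XML payload (1,000 similar rows).
rows = "".join(f'<row id="{i}" status="ok"/>' for i in range(1000))
xml_bytes = f"<rows>{rows}</rows>".encode("utf-8")

compressed = gzip.compress(xml_bytes)
restored = gzip.decompress(compressed)

assert restored == xml_bytes  # lossless: the receiver gets the exact bytes back
print(len(xml_bytes), len(compressed))
```

    Compression trades CPU time on both ends for bytes on the wire, so whether it helps depends on the link and the data.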

  37. James Anderson
    Happy

    Good Article -- but not news

    Ever since the birth of XML people have been rubbishing it as too verbose -- which it is; human-unreadable -- which it is; too slow -- which is sort of true, more anon; and as promoting XXL hamburger-junkie-sized messages -- which is true.

    They then propose alternatives, JSON, YAML etc. etc., which nearly all turn out to be more of the above but with smaller message sizes.

    The problem really is XML was there first, it works, and is well supported, and none of the alternatives are actually any better.

    XML parsing is slow, a problem made worse by neophytes using DOM parsers to read one or two attributes when a stream parser would do the job in a tenth of the time. However JSON, YAML etc. etc. are also quite slow; in theory they should be faster, but XML has been around a long time and a lot of clever people have been optimising those parsers. Plus, while building a DOM structure is slow, it speeds up the rest of your program, as access to complex data is simpler and faster.

    XML messages are big -- not actually a problem unless you are stuck on an old X.25 network or a 16K modem. In most modern networks the latency time ( "the please mister can I send my data now?" wait time ) is nearly as much as the time spent sending a message. This tends to nullify any speed advantage for smaller messages.

    I recently read one of those IBM developerWorks articles in which the author was trying out a CORBA over SOAP workaround to get an RPC call through a firewall. Being conscientious, he did some performance testing and was surprised to find that RPC over SOAP was faster than the native J2EE CORBA/RMI interface. Bearing in mind that you will never see the words "optimize" or "efficiency" in the WS-* or SOAP manuals, this is pretty good.
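
    A minimal Python sketch of the stream-parsing approach, using `xml.etree.ElementTree.iterparse` to pull one attribute out of a message. The message layout and the `read_status` name are hypothetical. `iterparse` consumes input in chunks, so on a tiny document like this one everything is parsed anyway; the early return pays off on large messages.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Optional

# Hypothetical message: only the status attribute of <header> is wanted.
MESSAGE = b"""<message>
  <header status="ok" version="2"/>
  <body><row n="1"/><row n="2"/></body>
</message>"""


def read_status(stream: BinaryIO) -> Optional[str]:
    """Return <header status="..."> and stop reading as soon as it is found."""
    # With events=("end",), each element is reported once its end tag is
    # parsed, so <header> arrives before any of the (possibly huge) <body>.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "header":
            return elem.get("status")
    return None


print(read_status(io.BytesIO(MESSAGE)))  # ok
```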

  38. Anonymous Coward
    Thumb Up

    Brilliant

    Only five minutes before I read this article, my very own PLS was spouting The Truth about serialization and Google's Protocol Buffer at me while I was desperately trying to absorb coffee.

    And then I lost it all over my keyboard.

  39. amanfromMars Silver badge

    Jihad Crusaders .... Infidels to Fatima ..... Sub-Prime Coders.

    ""You define your object in terms of its primitives in a special language"

    Ahhh, gotcha. Perhaps English, next time?" ... By Jon Green Posted Monday 14th July 2008 15:26 GMT

    ITs Raw Python, Jon. An ACQWired Passion with XXXXCentric Taste. QuITe Necessary for the Full Monty in Virtualised Space and ITs CyberSpaces.

    But ITs Transparency makes IT AIdDoddle to Reverse Engineer for Original Source .... OriJinn. Wholly Spirits? Soul Mates?

    Which is just an Opinion Reflected and Reinforced/Enriched for Fact.

  40. Anonymous Coward
    Paris Hilton

    The bigger better shitter story

    I was disappointed in the article - the bigger story about immovable XML-hair-shirts and spindly Googlie-disciples was hinted at but ruined with an unconvincing style which robbed the technical aspects of any authority. Calling someone a pretentious shit kind of marks you out as one of the same. Even if they actually are pretentious shits it's such a self-defeating phrase, such a mealy mouthed geek-nerd retort of unnerving snarkiness it makes you sound like a petulant 5 year old.

    So far no-one on the XML side has explained how to deal with fast, minimal, low-latency data; and no-one on the protocol-buffers side has detailed the limits of practicality to their approach.

    Has the list of things to argue about run so low we're having flamewars about XML ? XML ?! The whole internet is up in arms over one preferable method over another used as a framework for serialising data for transmission ??? !!! It might have started with Spectrum versus BBC-B, but fuck me if the religions of the internet haven't become so utterly tedious I'm not sure I want to log on any more.

    PH because she knows as much about protocol buffers and XML implementation as I ever want to. And she's a pretentious little shit!

  41. BristolSteve
    Stop

    Who is this article written for?

    The Register: it's alright for news, crap for technical content.

    If this is an effort at a technical article, please just stop now. This describes nothing specific about the technical content; it just demonstrates the author has "attitude". Not interested.

  42. Frumious Bandersnatch
    Thumb Down

    bad language aside

    The article (and Google's new meme) is making a mountain out of absolutely nothing. We all know xml sucks for serialisation. YAML is much nicer than ASN.1 or CSV. Let's have first-order objects instead of simple name, value pairs, please!

  43. CastorAcer
    Paris Hilton

    Has no-one read the Bile Blog?

    I'd have thought that at least someone would have commented on the similarities between this article and some of Hani Suleiman's more mild outpourings on 'Open Sores' and anything else that ticks him off.

    Paris - because she knows an angel dies every time someone uses a bad word.

  44. Anonymous Coward
    Thumb Up

    C'mon people...

    For all the people bitching about the technical content: this is a ***HUMOROUS ARTICLE***, not a technical one - I can't believe anyone could have made that mistake.

    If it's not your style of comedy - fine. Personally I thought it was hilarious, and it's exactly the sort of geeky humour that belongs on the Reg.

  45. E

    Wanna bet

    the people at Google used LEX to generate their parser?

    I just read over http://code.google.com/apis/protocolbuffers/docs/overview.html and I dunno quite what to say. Every couple of years we get to see some gee-whiz system mapping classes onto 'generic' text descriptions stirred in with rehashed RPC. If the mapping comes from an egghead in a university nobody much gives a damn. If it comes from Google it is news. If it comes from Microsoft it's less news but it gets used a lot.

    Thus the power of money I suppose.

  46. Tim Parker

    Re : C'mon people

    > For all the people bitching about the technical content: this is a ***HUMOROUS ARTICLE***, not a technical one - I can't believe anyone could have made that mistake.

    Why can't you believe it ?

    ... because there *are* actually people in the world who don't follow the arrival of every so-called technical Messiah in the blogging world ?

    ... or because there's way too much dross of a similar style around that *is* trying to be serious ?

    ..or because it's not particularly funny ?

    Personally I'm going for all three but - as you point out - that's partly because this particular rant is not my style of humour (FMPOV bad-mouthed for the sake of it, no technical jokes and making no point, just over-the-top Google bashing, which is a pretty fucking easy target). I've read some of his other stuff and it's generally pretty bloody funny.

    This. Is. Shit.

  47. Anonymous Coward
    Thumb Up

    lol

    funny stuff

  48. Destroy All Monsters Silver badge
    Coat

    The comment section is way better than the article

    "We all know xml sucks for serialisation"

    "Your learning is simply amazing, Sir Bedivere. Pray explain again how sheep's bladder can be used for intercourse."

  49. Seán

    Fuck those whiny tools

    "I'm about to drop some science up in this bitch" awesome

  50. nin-mofo

    @Pete

    THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST THE BEST

  51. Steven Hunter
    Joke

    Nothing to add

    Nothing important to add other than that at first glance I thought the title was "Google releases /sterilization/ scheme"...

  52. parabla
    Flame

    What a bunch of fake programmers you are on here.

    We have some on here saying there is no need for a new protocol, as JSON is suitable.

    JSON still text-encodes numerics! The cost of encoding and decoding numerics to ASCII is the precise reason for binary representation here. It is not mathematically/algorithmically possible for the encoding and decoding of a JSON representation to be even vaguely in the same ballpark in terms of performance, or in the same ballpark in terms of storage.

    Try producing 10 GB of data nightly in terms of analytics (my personal experience: credit and interest rate risk by tenor by position).

    There are plenty of applications where there are appreciable reasons of performance to binary encode message passing, particularly if you are looking at the controller for a large grid of machines for instance, it will be I/O constrained on control messages. Frequently one needs to multiplex the controller for this reason.

    Those criticising the idea are surely programming mickey mouse applications.

    What I can't believe is that people with no experience of performance optimisation of systems which require these sorts of technologies log onto here to spout their ignorance to the world.

    What kind of fool would come on here and spout their opinion about how a highly performant specialised protocol is completely unrequired - it just shows that the person making the comments is clueless as to any significant programming issue of scale.

    Please stick to your Visual Basic Studios, your Java Server Faces, your low demand intranet applications. Please stick to your arguments about whether Eclipse or Netbeans is better, and how "evil" Microsoft are, or how wonderful the .NET runtime is, or whatever you polemicise regarding.

    Standardisation of binary encoded interchange is an excellent concept and highly applicable to a number of issues. Do you suppose that the binary stream that makes up the data packets which travel between the database client library and the database server are XML/text encoded?

    No of course not, they are binary encoded packets. Such a regime of standardisation of binary encoding allows for hassle free development of such protocols.

    I don't know what the rest of you people are doing if you are so performance insensitive.

    JSON comparable with ASN.1? The bloke that wrote that would be whisked out of the interview room before he finished the sentence.
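
    A small Python sketch of the storage half of that argument: the same 1,000 doubles as JSON text and as packed IEEE 754 binary via `struct`. The data is random and purely illustrative, and nothing here measures speed; JSON would fare better on, say, small integers.

```python
import json
import random
import struct

# Illustrative batch of 1,000 float64 figures (seeded so runs are repeatable).
random.seed(0)
values = [random.uniform(-1e6, 1e6) for _ in range(1000)]

as_json = json.dumps(values).encode("ascii")
fmt = f"<{len(values)}d"  # little-endian, 8 bytes per double
as_binary = struct.pack(fmt, *values)

print("binary bytes:", len(as_binary))  # binary bytes: 8000
print("JSON bytes:", len(as_json))

# Both round-trip exactly; the JSON side has to parse decimal text to get there.
assert json.loads(as_json) == values
assert list(struct.unpack(fmt, as_binary)) == values
```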

  53. Ian Michael Gumby
    Paris Hilton

    Now I'm confused.

    Maybe it's my age, but if the point is to do object portability, wasn't this first addressed by CORBA?

    So what is truly exciting and new here?

    Paris Hilton cause maybe I'm having a brain fart.

  54. amanfromMars Silver badge
    Alien

    Going to Work in an Egg and EggenFelden

    "I just read over http://code.google.com/apis/protocolbuffers/docs/overview.html and I dunno quite what to say. Every couple of years we to see some gee-whiz system mapping classes onto 'generic' text descriptions stirred in with rehashed RPC. If the mapping comes from an egghead in a university nobody much gives a damn. If it comes from Google it is news. If it comes from Microsoft it's less news but it gets used a lot." ..... By E Posted Tuesday 15th July 2008 11:38 GMT

    E,

    Are you telling us that some gee-whiz system mapping classes onto 'generic' text descriptions stirred in with rehashed RPC is Current? Who is in ITs Running?

    And re Reg Style Communications ........ well, XXXXCentric IT is but no Less than Transparent in Thought for ITs Deeds. HiBrow UnderGround/OverArching Network?

    Now that is AI Virtual IntelAIgents Hub, is IT not?

  55. Anonymous Coward
    Anonymous Coward

    @parabla

    Hi parabla,

    I have no idea if you include my post in your statements, and please don’t think that I am trying to provoke you in any way. Your post has interested me far more than the original article as you hint at some real world problems, where the article did not.

    Manipulating 10GB of data overnight is clearly a challenge whatever the means of passing the data around. Would you be able to elaborate on that more?

    Was the data exchange handled using XML originally?

    If so, how long did it take?

    What form of binary encoded interchange did you use? Was it proprietary, or did you choose an existing format?

    What speed-ups did you see?

    If you used XML before, how hard was it to change the apps to the new format?

    Did you hit any pitfalls or did the changeover go smoothly?

    I hope you can take the time to respond, as these things are clearly an issue when dealing with large amounts of data and increasing performance.

    I still say I won't bother to spend my time learning a technology just because somebody says it's cool, though. That's why I'm asking you, because you clearly have some experience of this.

    Kind Regards,

    Pad.

  56. BlueGreen

    @Now I'm confused - thrift Vs corba

    Before I start: generally one shouldn't be dealing with serialisation except as an abstracted layer which you can cleanly and easily swap out for another, thus rendering the details (whether XML/ASN.1/roll-yer-own) irrelevant.

    Probably mentioned 50+ times above but bears repeating.
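    A minimal sketch of that abstraction, in Python (the thread has no code of its own): application code depends only on a small serializer interface, so JSON and a binary encoding can be swapped without touching the callers. The `Serializer` interface, the (name, value) record and the binary layout are all hypothetical, chosen purely for illustration.

```python
import json
import struct
from typing import Protocol


class Serializer(Protocol):
    """Hypothetical interface: turn a (name, value) record into bytes and back."""

    def dumps(self, name: str, value: float) -> bytes: ...
    def loads(self, data: bytes) -> tuple[str, float]: ...


class JsonSerializer:
    def dumps(self, name: str, value: float) -> bytes:
        return json.dumps({"name": name, "value": value}).encode("utf-8")

    def loads(self, data: bytes) -> tuple[str, float]:
        obj = json.loads(data.decode("utf-8"))
        return obj["name"], obj["value"]


class BinarySerializer:
    # Hypothetical layout: 2-byte big-endian name length, UTF-8 name, 8-byte double.
    def dumps(self, name: str, value: float) -> bytes:
        raw = name.encode("utf-8")
        return struct.pack(">H", len(raw)) + raw + struct.pack(">d", value)

    def loads(self, data: bytes) -> tuple[str, float]:
        (n,) = struct.unpack_from(">H", data, 0)
        name = data[2:2 + n].decode("utf-8")
        (value,) = struct.unpack_from(">d", data, 2 + n)
        return name, value


def send_reading(codec: Serializer, name: str, value: float) -> bytes:
    # Callers only see the interface, so the wire format is a swappable detail.
    return codec.dumps(name, value)
```

    Swapping formats is then a one-line change at the point where the codec is constructed.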

    Now to the point. I had come across Thrift before and was curious about where CORBA fell down in the eyes of the Thrift developers. I wrote to Mark Slee thus (slightly redacted):

    ---

    Quite by chance I came across your paper Thrift: Scalable Cross-Language Services Implementation. I found it very readable and clear.

    Your conclusions section makes very brief comparisons between Thrift and similar technologies. I've heard nothing good about SOAP, and COM is Microsoft-proprietary, so not with a bargepole. The comparison that does interest me is with CORBA. You describe it as debatably overdesigned, but is this a problem in itself? If you don't have to implement it, it's not your problem (granted, managing it might get a bit painful). You describe it as having a cumbersome installation; while this is undoubtedly not a good thing, it doesn't seem to be a very big thing and conceivably could be automated. The one description that may possibly be damning is calling it a heavyweight, but I don't know whether that refers to its installation size and complexity, or its performance. If it is an issue of performance then that's the killer blow, but if not then all other problems of overdesign/installation/whatever appear to me to be minor, at least compared with the effort of writing Thrift.

    So if you could spare a moment to elaborate, what was the problem with CORBA that forced you down the road to writing your own interoperability framework?

    ---

    He replied thusly (likewise slightly redacted, permission granted to post it here)

    ---

    1/ CORBA requires a heavy CORBA stack. The developer has to *learn* CORBA, whereas Thrift aims to let the developer use standard native code, i.e. in Thrift you get to use HashMap in Java, std::map in C++, transparently.

    2/ CORBA does some really complicated things that aren't usually necessary, i.e. it has the notion of remote objects, rather than just thinking of it as sending data back and forth. This makes it slower and heavier.

    3/ It's not just that it's heavy to install; it's heavy to run and has a lot more runtime overhead than Thrift, which is designed to be as lightweight as reasonably possible.

    4/ Doesn't have transparent application-versioning support like Thrift (i.e. changing object definitions on one side but not the other).

    5/ Not protocol-agnostic. Thrift lets you use Binary, or ASCII, or JSON, or XML and switch between them pretty easily.

    ---

    HTH
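    Point 4 is easiest to see with tagged fields. The sketch below uses a hypothetical tag-length-value layout, not Thrift's actual wire format: each field carries a numeric id and a byte length, so a reader built against an older definition can skip fields it does not recognise instead of failing.

```python
import struct

# Hypothetical tag-length-value layout (NOT Thrift's real wire format):
# 1-byte field id, 4-byte big-endian payload length, then the payload bytes.
HEADER = struct.Struct(">BI")  # 5 bytes, no padding with ">"


def encode(fields: dict[int, bytes]) -> bytes:
    out = bytearray()
    for field_id, payload in sorted(fields.items()):
        out += HEADER.pack(field_id, len(payload))
        out += payload
    return bytes(out)


def decode(data: bytes, known_ids: set[int]) -> dict[int, bytes]:
    result: dict[int, bytes] = {}
    pos = 0
    while pos < len(data):
        field_id, length = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        payload = data[pos:pos + length]
        pos += length
        if field_id in known_ids:
            result[field_id] = payload
        # Unknown ids are skipped via the length prefix, so an older reader
        # tolerates fields added by a newer writer.
    return result


# A "version 2" writer adds field 3; a "version 1" reader only knows 1 and 2.
v2_message = encode({1: b"job-42", 2: b"node-7", 3: b"priority=high"})
v1_view = decode(v2_message, known_ids={1, 2})
```

    The design choice that matters is that every field is self-delimiting; without the length prefix the old reader would have no way to step over data it cannot interpret.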

  57. Twm Davies

    Some more XML horror...

    http://twmdesign.co.uk/theblog/?p=94

  58. Solomon Grundy

    It's all Horrible

    Cause computers and everything to do with them is a silly waste of time and resources. Computers suck.

  59. Drew Furgiuele
    Thumb Up

    JESUS CHRIST, IT'S TED!

    Why the hell did you let Uncov.com lapse?! Write moar please.

  60. Pyros
    Alien

    I just realized something...

    Amanfrommars' wording strongly suggests a rather robust Eliza bot, but with a twist: there's actual external user input involved between the initial assessment and output methods.

    Namely, someone's at the helm and is cherry-picking Amanfrommars' topics of choice to comment on.

    The choice of wording is still peculiar, but that's not hard to hardwire into an Eliza's runcode.

  61. parabla
    Happy

    @Pad

    The most demanding loads I've seen come from controlling clusters of blades. Specifically, we used compression to attempt to ameliorate this, but this created centralised CPU bottlenecks instead.

    That is, independent of any specific grid software such as DataSynapse, and particularly when invoking jobs across several hundred machines, the verbosity of XML creates I/O load on the central machine that is issuing the commands.

    Some of our messages were 50m or more, and we were providing thousands of jobs to run over a several hour period.

    The receiving workers are of course not overloaded, but the central machine is. One way of dealing with this is to compress the XML; however, this now creates a CPU bottleneck in the pipeline.
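    A rough way to see that trade-off, assuming a made-up job-dispatch message of (job id, node id, priority) triples: compressing the XML shrinks it on the wire but adds a compression pass per message on the sending machine, while a fixed-width binary record avoids both the verbosity and the extra CPU work. The message shape and counts are illustrative only and say nothing about the commenter's actual system.

```python
import struct
import zlib

# Hypothetical job-dispatch message: (job_id, node_id, priority) triples.
jobs = [(i, i % 300, 5) for i in range(10_000)]

xml = "<jobs>" + "".join(
    f'<job id="{j}" node="{n}" priority="{p}"/>' for j, n, p in jobs
) + "</jobs>"
xml_bytes = xml.encode("utf-8")

# Option 1: compress the XML. Smaller on the wire, but every message now
# costs a compression pass on the central machine's CPU.
compressed = zlib.compress(xml_bytes, 6)

# Option 2: fixed-width binary records (two unsigned 32-bit ints and one
# unsigned byte per job, big-endian), with no text formatting or compression.
binary = b"".join(struct.pack(">IIB", j, n, p) for j, n, p in jobs)

for label, blob in [("xml", xml_bytes), ("xml+zlib", compressed), ("binary", binary)]:
    print(f"{label:9s} {len(blob):8d} bytes")
```

    Each binary record is exactly 9 bytes, so the binary message size is known before anything is sent.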

    In such circumstances I am always extremely suspicious of the cost of converting floating-point and integer values to text. Likewise I am concerned with string composition from text messages (in optimal performance scenarios one would want a byte-counted string in the string encoding, so that strlen()-type traversal is unnecessary).
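    Both concerns can be illustrated in a few lines, as a hedged sketch: a double packed as 8 raw bytes round-trips exactly without any text formatting or parsing, and a length-prefixed ("byte counted") string lets the reader slice out the value directly instead of scanning for a terminator the way `strlen()` does. The 4-byte big-endian length prefix and the function names are arbitrary choices for the example.

```python
import struct


def write_counted(s: str) -> bytes:
    """Length-prefixed UTF-8 string: 4-byte big-endian byte count, then the bytes."""
    raw = s.encode("utf-8")
    return struct.pack(">I", len(raw)) + raw


def read_counted(buf: bytes, pos: int) -> tuple[str, int]:
    # The length is known up front, so the reader slices directly.
    (n,) = struct.unpack_from(">I", buf, pos)
    start = pos + 4
    return buf[start:start + n].decode("utf-8"), start + n


def read_terminated(buf: bytes, pos: int) -> tuple[str, int]:
    # NUL-terminated alternative: every byte must be inspected to find the end.
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


# Numbers: the binary route is a fixed 8-byte copy; the text route needs
# repr() when writing and float() when reading.
x = 0.1 + 0.2
as_binary = struct.pack(">d", x)
as_text = repr(x).encode("ascii")
```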

    Lastly, the entire approach is representative of code-generation approaches to software. Think of RPC, or of IDL files that generate C stubs for RPC or CORBA: code gen is a useful tool, on occasion for generating entire tiers of the application stack.

    I have done something similar at work, using schemas related to the database to generate the entire data-loading layer of my applications, which obviated 50,000 lines of handwritten code. My generators were simple text-creation tools run as part of the build process.

    In this case, it appears that Google have defined a standard simplified structure description language, which generates strongly typed message descriptions in your language of choice, and serialises to ASN.1 or something to that effect.

    I personally find the technology very interesting indeed, and for places where there is tight coupling within an application suite and performance is an important criterion, this will be under my consideration.

    If I were to compose the message in JSON, however, I would not have a strongly typed class in C++ or Java, and my data tier would lack compile-time type checking.

    Moreover, I would generally write the packing and unpacking of the structure by hand, risking software defects, rather than being given a platform-wide library that supports packing and unpacking byte messages in a variety of languages.
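    The alternative to hand-written packing is to declare the layout once and derive the packing code from it. This hypothetical Python sketch imitates that idea with a dataclass whose field metadata carries `struct` format codes. It is not how protocol buffers work internally, and Python cannot give the compile-time checking mentioned above, but it shows the single source of truth that generated code provides.

```python
import struct
from dataclasses import astuple, dataclass, field, fields


# Hypothetical message type. Each field's binary format is declared once, here;
# pack/unpack are derived from it rather than written by hand per message.
@dataclass
class JobStatus:
    job_id: int = field(metadata={"fmt": "I"})      # unsigned 32-bit
    node_id: int = field(metadata={"fmt": "H"})     # unsigned 16-bit
    progress: float = field(metadata={"fmt": "d"})  # 64-bit double


def _layout(cls) -> struct.Struct:
    # Big-endian, fields in declaration order.
    return struct.Struct(">" + "".join(f.metadata["fmt"] for f in fields(cls)))


def pack(msg) -> bytes:
    return _layout(type(msg)).pack(*astuple(msg))


def unpack(cls, data: bytes):
    return cls(*_layout(cls).unpack(data))


msg = JobStatus(job_id=42, node_id=7, progress=0.75)
wire = pack(msg)
```

    Adding a field means editing the class declaration only; the packing code never changes.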

    This is a 100% useful piece of software.

    BTW, not to denigrate XML: it has its purpose where human legibility is important, where the message or storage format needs to be flexible, or where the requirements are fluctuating. Likewise JSON is the engine of AJAX....

  62. David
    Coat

    C++ Java Python

    Because there ARE no other programming languages.

    This is one area at least where XML has PBs beat, unless you are willing to embed Python where the sun don't shine.

    Mine's the coat with the Python suppository.

  63. Reuben Thomas

    XML is not necessarily text

    I'm amazed that no commenter, let alone the article's author, has made the point that XML is not text. It's simply most commonly represented as such. I presume that Google, having clever people, have other reasons for using a non-XML binary format. I can certainly imagine some: using a language designed to represent objects makes it much easier to give type guarantees, and their protocol buffer scheme might well make guarantees about the performance of the code it generates. The fact that it's not XML is interesting, but it has nothing to do with its use of binary formats.

  64. Henry Cobb
    Alien

    CSV trees

    The data is being stored in relational databases at some point so it's already been normalized, right? So you get your tree structure for free.

    Just send the data as a file of tables, first line is column names, then the data columns followed by a blank line between tables.

    Put a backslash before every backslash, comma and newline and you're good to go. Just gzip the message at the end.
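    Taking the suggestion at face value, here is a minimal sketch of that format: header row first, tables separated by a blank line, backslash escaping for backslash, comma and newline, and gzip over the whole thing. The function names are hypothetical, and one ambiguity is left unresolved: a row consisting of a single empty cell is indistinguishable from the blank line between tables.

```python
import gzip

Table = tuple[list[str], list[list[str]]]  # (header, rows)


def _escape(cell: str) -> str:
    # Escape backslash first so the escapes added for comma/newline stay intact.
    return cell.replace("\\", "\\\\").replace(",", "\\,").replace("\n", "\\\n")


def encode_tables(tables: list[Table]) -> bytes:
    blocks = []
    for header, rows in tables:
        lines = [",".join(_escape(c) for c in header)]
        lines += [",".join(_escape(c) for c in row) for row in rows]
        blocks.append("\n".join(lines))
    return gzip.compress("\n\n".join(blocks).encode("utf-8"))


def decode_tables(blob: bytes) -> list[list[list[str]]]:
    """Return each table as a list of rows, header row first."""
    text = gzip.decompress(blob).decode("utf-8")
    tables: list[list[list[str]]] = []
    rows: list[list[str]] = []
    row: list[str] = []
    cell: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":  # escaped character: take the next one literally
            cell.append(text[i + 1])
            i += 2
            continue
        if ch == ",":
            row.append("".join(cell))
            cell = []
        elif ch == "\n":
            if not row and not cell:  # unescaped blank line: table break
                tables.append(rows)
                rows = []
            else:
                row.append("".join(cell))
                rows.append(row)
                row, cell = [], []
        else:
            cell.append(ch)
        i += 1
    row.append("".join(cell))
    rows.append(row)
    tables.append(rows)
    return tables
```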

    Perhaps we can get SETI in on this. Send out all the collected wisdom of mankind in relational format and watch the ETs keep their distance from the crazy planet.

  65. Eddie Johnson
    Boffin

    Incredibly poor naming

    Perhaps the reason few people understand the "serialization scheme" is that it is an incredibly poor name. WTF does this have to do with the root word "series"?

    An interface by any other name is still just an interface. I guess this gives the markettards a few more opportunities to monetize the transactions and leverage their investment to profitize a subpar ROI?

  66. Anonymous Coward
    Heart

    Keep up the swearing

    The novelty might wear off with time, but this one was funny - more please.

  67. Alexander
    Go

    missed you Ted!

    Glad to see you writing once again...it's refreshing to actually laugh when reading something tech-related. I won't say "don't let the whiners get you down" cause I know you wouldn't dream of it, the pretentious little shits.
