Facebook re-write takes PHP to an enterprise past

Facebook's re-written PHP to transform the dynamic language for fast performance on web-scale server farms without adding additional hardware. The site's engineers have announced HipHop, which turns the popular and dynamic PHP code into highly optimized but static C++ and then compiles it using the GNU C++ compiler, g++. The …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward
    Joke

    Eh?

    "Facebook's re-written PHP to transform the dynamic language for fast performance"

    You could have fooled me! Actually, being a Facebook user, you probably could have ;-)

  2. Anonymous Coward
    Coat

    Hmmm...

    C++ programmers (of which I'm one) have often sneered at PHP programmers (of which I'm one).

    Where does this leave us? Do I have to stop sneering at myself?

  3. Yet Another Anonymous coward Silver badge

    Seems completely sensible

    >The company claimed it's cut the CPU use on its servers by up to 50 per cent

    C++ the eco-friendly alternative !!

  4. Anonymous Coward
    Anonymous Coward

    Personal use?

    Could this thing be used by web designers to create compiled executables which would then help protect their PHP source from being stolen by clients or other people on shared hosting? Some hosts allow you to run your own executables.

    At the moment you have to pay a fortune for software that turns PHP into bytecode files.

    1. dontstopnow

      Re: Personal use?

      "At the moment you have to pay a fortune for software that turns PHP into bytecode files."

      Have you tried eaccelerator, apc or my personal op-code cacher of choice, xcache? All of these are open source IIRC

  5. jim 45
    Happy

    unexciting I guess

    "C++ is traditionally associated with the reliable - but relatively unexciting - world of enterprise and server-side computing."

    Most of the code that really matters, in the world today, is C /C++. "Unexciting" things like Windows, OSX and Linux. Oh and the internet itself. And your browser. And your email program. Probably your phone too.

    1. Poor Coco
      WTF?

      OS X?

      Um, no, that's Objective-C.

      1. Anonymous Coward
        Happy

        OS-X

        Errr, not quite. OS-X applications are generally written in Objective-C but the underlying OS (which is basically BSD) is written in C, and the vast majority of the standard Unix tools that it supports are written in C or C++.

    2. Anonymous Coward
      WTF?

      Quite

      Yes, I read the same sentence and thought exactly the same thing.

      What does the Reg think most of the world's applications run on? Perl? HTML? Java? Visual Basic?

      Of course not - 'real' applications (ie - the ones that actually DO stuff and make your mobile phones, and your washing machines and your bank's huge computers, and TVs etc etc) run on C, and C++, and Cobol (yes - there's still an awful lot of Cobol out there), and (god help us) Pascal etc.

  6. Lusty

    next step

    mainframe...

  7. Inachu
    Unhappy

    I am confused by these reports.

    Growing up as a teen in the 1980s, I also read about programs that convert other programs to make them highly efficient, optimized code, or that translate them into something else.

    This is the problem I see with computer programmers.

    Why must we convert, translate, optimise?

    If we just port everything into the lowest or higher common denominator which would be binary then there will be no need to do these things.

    When the guy at GRC showed how his programs ran vs. a C++ or similar program, his programs were faster, smaller, and used less memory because they were written in assembly language from the ground up.

    Fast forward to today and there is still so much bloat that it still slows everything down.

    Then years ago here at The Register I remember reading about the failed Florida compression company who based their compression on quantum math to fit a 600 meg file onto a 1 meg floppy.

    Also I remember a company that fell afoul of copyright laws when consumers used its product, which would recompile an executable to make it smaller and run faster, and companies were up in arms about the re-engineering of their products.

    Seems in all this everyone forgot about the customer.

    1. Anonymous Coward
      Anonymous Coward

      Want an explanation? Here's one.

      Ever heard of the xor compression scheme that can compress everything to just one byte? No? Look it up, that quantum thing is another leaf from the same book. On with the show.

      Writing assembly has a few drawbacks, such as being time consuming, tricky, bound to a single cpu architecture, prone to exhibit "wrongly sized carpet syndrome", and most of all, lacking high-level abstraction facilities. You can build them, or you can use prefab ones. C was so successful because it was basically the next layer up in abstraction. Of course, there is a cost. What you lose in raw speed you gain in abstraction. C++ is C underneath with some fancy abstraction things added on top. Using them wrong can be horribly expensive in terms of cpu cycles wasted, but using them right can help organize very large programs, doing fancy algorithmic optimisation tricks, or things like that. Also, if your program is large enough a few wasted instructions tend to be less important than using more efficient algorithms. And unless the program usage is high, programmer time is almost always more expensive than cpu time.

      Also, there's programmers, and there's programmers. Some can write large programs in assembly, and some you don't even trust with a toy language like PHP. Yes, PHP is mostly a toy language, written as it was by big-eyed web two-dot-oh teens and taken from there, and is in many ways that affect scalability and efficient use of programmers inferior to so many other languages, but its one huge advantage is popularity, so hiring programmers that can produce programs written in the language is easy. They'll be mostly mediocre programmers and the constructs they use aren't likely to be the most efficient ones even inside PHP, but with them Facebook could move their business forward. And that's all that matters for a startup.

      Now, of course, they ran into the limits of their early beginnings and they hired a bunch of very smart people with a better tool to, well, take their programs and speed them up a bunch.

      Of course, had they stopped and thought for a few moments way back when, they could've organised their systems such that pages are regenerated on each change, into static data that can be served right off of a hard drive, instead of needing to be generated on each page view, something that happens much more often than change. Had they been proficient in a more powerful language they likely also would not have used PHP in the first place. Then they wouldn't have had to pull some "skunk works" trick out of a hat and they wouldn't have a jubilant share-with-the-community feel-good press release.

      Bottom line: No, they didn't forget about the customer. But good engineering it lacked from the start. So, I might have a look at the code just to see what they did, but I'll be doing my damnedest to prevent ever needing any such thing.
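The generate-on-change idea in the comment above can be illustrated with a minimal sketch. This is not how Facebook or HipHop works; `ProfilePage` and its methods are hypothetical names made up for the example, and Python is used purely for illustration. The page is rendered when its data changes, and every view just returns the stored result.

```python
class ProfilePage:
    """Hypothetical page that is re-rendered on write, not on read."""

    def __init__(self, name, status=""):
        self.name = name
        self.status = status
        self.render_count = 0        # how many times we paid the rendering cost
        self._html = self._render()  # render once up front

    def _render(self):
        # Stand-in for the expensive page-building work.
        # (No HTML escaping here; this is only a sketch.)
        self.render_count += 1
        return f"<h1>{self.name}</h1><p>{self.status}</p>"

    def update_status(self, status):
        # A change to the data triggers exactly one re-render.
        self.status = status
        self._html = self._render()

    def view(self):
        # A page view serves the stored output without rendering.
        return self._html


page = ProfilePage("Ann", "hi")
for _ in range(1000):
    page.view()
page.update_status("bye")
print(page.render_count, page.view())
```

The trade-off is the one the commenter describes: this only pays off when views greatly outnumber changes, and it gets harder when one change affects many pages.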

  8. Tom Maddox Silver badge
    Coat

    Let me be the first . . .

    . . . in the inevitable flood of comments which say, "Who even cares? I don't use MyFaceTwitJournal because of a) privacy b) being far too cool c) my massive, engorged, throbbing penis. Ladies, I'm talking to *you*!"

  9. bikerboi87
    Thumb Down

    not a good advert

    I doubt Facebook's recent reliability problems are a very good advertisement for this

  10. Adrian Crooks

    It's not the language, it's the compiler/processor

    So their solution was not to re-architect their system and code, but instead switch to a different language compiler. This seems like a slick workaround to the problem of native code compilation since they only have to support a high level -> high level conversion instead of going all the way to binary.

    The funny thing is that this unfortunately shows how much room there is available for the current PHP engines to improve.

    1. Disco-Legend-Zeke
      Go

      Compiled Code...

      ...especially three or four pass optimized code, will save a lot of carbon. What consumes electrical power in a computer is charging/discharging the wires between memory and the CPU. The better the code, the less power. Simple.

  11. Charlie Clark Silver badge
    FAIL

    meh

    PyPy is much more interesting:

    http://codespeak.net/pypy/dist/pypy/doc/

    Want speed? Go stackless.

  12. Anonymous Coward
    WTF?

    Why not?

    Why not write the whole facebook in C++? Or re-write the PHP interpreter from scratch? Why spend time converting a crap language to a complicated language?

    Every level of indirection kills half the optimizations you could've done. If they got a 50% improvement, it means:

    1. PHP interpreter is *really* crap. No news here.

    2. Their compilation infrastructure is a nightmare

    3. They could get at least another 50% by writing in native C++

    But, of course, PHP "programmers" are twice as easy to find and half the cost of C++ programmers... why bother?

    1. Pandy06269

      C++ != web

      "Why not write the whole facebook in C++?"

      Because C++ is not a HTML-oriented language. PHP was always designed from the ground up to handle HTML. The fact it can run on the command-line is a side-effect.

      "PHP interpreter is *really* crap. No news here."

      Where's your evidence? The reason Facebook are getting a 50% speed increase is because instead of PHP compiling the code down to opcodes on *every* request, it's only doing it once, then just running the already-compiled application.

      They've effectively turned an interpreted language into a compiled language - and everyone knows compiled apps run quicker than interpreted apps.

      Even PHP obfuscator/encoders only optimise the code for the PHP compiler, it still has to be interpreted.

      "They could get at least another 50% by writing in native C++"

      Then they'd also have to interface with every module they use that's available in PHP (memcache, the HTML functionality, cookies, MySQL/PostgreSQL/SQL Server access etc etc.) Plus an Apache module to understand what the C++ app is telling it - redirects, cookies etc.

      "But, of course, PHP "programmers" are twice as easy to find and half the cost of C++ programmers"

      Totally agree with this.

      1. Francis Fish
        Paris Hilton

        Don't confuse the legions of script kiddies with real programmers

        PHP is a nice flexible system that works very well.

        There are a lot of people who "know" PHP, as in they can hack together other people's (sometimes wrong) recipes from the net - this doesn't mean that there aren't quite a few competent people who know how to code producing good stuff very quickly without the Java or .Net bloat.

        It's an easy community to join, which is good. But that doesn't mean you can't do serious things with it.

        I prefer Rails personally, but have been working on some very large PHP based systems recently. It is very good when used by professional programmers.

  13. Anonymous Coward
    Thumb Up

    facebook sux BUT...

    I am interested in a PHP compiler, sure beats a code obfuscator!

    Runtime speed, not so important. Retaining control of my code, yes pls..

    Inachu, portability is the one-word answer to your query. If everything was written in assembly (eg. in CPU-specific code) then it would need to be re-written if it was to be run on a different CPU. Also, dunno if you've looked at assembly, it's quite complicated, higher-level languages such as PHP save us from having to deal with that. Certainly, Steve Gibson is a whiz, not many people can program like he can, though.

  14. Daniel B.
    Boffin

    At last!

    My main problem with most scripting languages is that I believe that interpreted languages add unnecessary overhead on the CPU that could be spent executing instructions, especially in a production environment where code is changed once a week but executed a zillion times between code changes. It isn't like I'm about to change my code every time I run it!

    1. Kristian B
      Alert

      @Daniel B.

      @Daniel B.

      What are you talking about "interpreted languages add unnecessary overhead"?

      Anyone using PHP in a production environment uses an accelerator (opcode cache) so unchanged scripts are immediately executed and not re-parsed and compiled on each request.

      Do you honestly think sites like Yahoo have their servers wasting time interpreting PHP source on every page request?!?

  15. Anonymous Bastard

    "work on PHP caching in the applications server"

    So it's an opcode cache that goes a bit further. Do I understand that correctly? I guess the webserver then communicates via good ol' CGI.

  16. Paul Shirley
    WTF?

    C++ is traditionally associated with...

    Really? I associate it with the exciting world of game programming, where any CV not starting with 3+ years of C++ flies straight from envelope to bin.

    Or it would do if the bloody education system actually taught the language instead of whichever scripted/late bound/JIT based monstrosity happens to be in fashion that year. So we end up accepting self taught C++ hackers with pitiful skill in the language because even that's better than experts in those other languages.

    1. Anonymous Coward
      Anonymous Coward

      You could always train them

      You know, get them 3 years earlier (and cheaper) and train them yourself.

      It can be better than waiting for someone else to train them for you.

  17. Anonymous Coward
    IT Angle

    Que?

    "...the new generation of scripting languages such as PHP" ??

  18. Filippo Silver badge

    write native - or don't

    Inachu: there are many, many excellent reasons not to use assembly, and as projects get larger and more complex and computers get more powerful, those reasons keep getting better and better - to the point that hardly anybody uses assembly even in performance-critical applications. Performance is important, but it's not the only thing. Nothing is ever "the only thing".

    Renato Golin: the article explains why they are not going native C++.

  19. Sentient
    Thumb Up

    Interesting

    It makes a lot of sense. They can keep using PHP as a nice and easy programming language for the devs and compile it to C++ so that it can be compiled to machine code which executes faster, hence uses less power.

    I don't see anything wrong with it.

  20. Mr Tempest

    @Daniel B.

    On PHP-driven sites for which scalability or performance have ever been an issue, scripts are not loaded / parsed / compiled on every request. The compiled opcodes are cached in memory, and the scripts are only re-compiled when instructed to do so, or (optionally) whenever the script is changed.
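The caching behaviour described above can be sketched with a small analogue. This uses Python code objects as a stand-in for PHP opcodes; it is not how APC, XCache or eAccelerator are implemented, and `OpcodeCache` is a hypothetical name. A script is compiled once, the compiled form is kept in memory, and it is recompiled only when the file's modification time changes.

```python
import os
import tempfile


class OpcodeCache:
    """Toy analogue of an opcode cache, keyed by path and modification time."""

    def __init__(self):
        self._cache = {}        # path -> (mtime_ns, compiled code object)
        self.compile_count = 0  # how many times a script was actually compiled

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._cache.get(path)
        if entry is None or entry[0] != mtime:
            # Cache miss, or the script changed on disk: parse and compile.
            with open(path, encoding="utf-8") as f:
                source = f.read()
            entry = (mtime, compile(source, path, "exec"))
            self.compile_count += 1
            self._cache[path] = entry
        return entry[1]

    def run(self, path):
        env = {}
        exec(self.load(path), env)  # execute the cached compiled form
        return env


def demo():
    cache = OpcodeCache()
    with tempfile.TemporaryDirectory() as d:
        script = os.path.join(d, "page.py")
        with open(script, "w", encoding="utf-8") as f:
            f.write("greeting = 'hello'\n")
        for _ in range(3):                      # three "requests", one compile
            env = cache.run(script)
        first = (env["greeting"], cache.compile_count)

        with open(script, "w", encoding="utf-8") as f:
            f.write("greeting = 'goodbye'\n")
        st = os.stat(script)                    # make sure the mtime visibly moves
        os.utime(script, ns=(st.st_atime_ns, st.st_mtime_ns + 10_000_000_000))
        env = cache.run(script)                 # changed file forces a recompile
        second = (env["greeting"], cache.compile_count)
    return first, second


print(demo())
```

Checking the modification time on every request corresponds to the "(optionally) whenever the script is changed" mode; skipping that check and recompiling only on demand saves a `stat` call per request.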

  21. Christian Berger

    Buffer overflows ahead

    C++ is surely better than PHP, but the number of people actually fully knowing the language is about double digit. C++ is just a moving target which takes 4 years to master and changes completely every 4 years.

    I would have rewritten it in a language like object orientated Pascal. It's compiled, but features functions like rangechecks. I have tried it on numerical applications and haven't been able to measure any significant loss in speed.

    1. on_the_road_to_tau

      ..if you like OO Pascal...

      ...take a look at Ada (esp. Ada2005). OO - even for tasks! - a much larger user base, free on Win, Mac and Linux (search for "GNAT Libre").

  22. Hatcat

    C++ use

    "C++ is traditionally associated with the reliable - but relatively unexciting - world of enterprise and server-side computing."

    Let's not forget games. I spend a quarter of my life developing games in C++. Nothing else touches it.

  23. lukewarmdog
    Badgers

    Ask and ye shall receive

    "Recordon said Facebook wants community feedback."

    You're still shit mate.

  24. on_the_road_to_tau
    WTF?

    "C++ is traditionally associated with the reliable"

    Surely this is a strange, new usage of the word "reliable"?...

    1. Anonymous Coward
      Anonymous Coward

      "C++ is traditionally associated with the reliable"

      "C++ is traditionally associated with memory leaks and nightmare-ish security holes"

      Fixed that for you.

      1. Anonymous Coward
        Thumb Up

        lol

        Mark Zuckerberg likes this...

    2. Steven Knox
      Thumb Up

      You beat me to it.

      My sentiments exactly. In fact, I speed-read the sentence as:

      "C++ :: reliable :: enterprise and server-side computing"

      and couldn't stop laughing for several minutes.

  25. Anonymous Coward
    Anonymous Coward

    @inachu

    "If we just port everything into the lowest or higher common denominator which would be binary then there will be no need to do these things"

    Yeah, well good luck with that. While it's possible to create a binary just using a hex editor you'd probably take all day just to get Hello World to work. Now scale that up to a word processor.

    Also people occasionally mention that we should use assembler more often but they're generally people who grew up hacking Z80 opcodes. But comparing Z80 assembler to the monstrosity that is the opcode list for the latest 64 bit Intel or AMD chips is like comparing a bicycle with the space shuttle. I've coded in 32 bit x86 and believe me - it's not fun. There are so many opcodes (and it doesn't help that Intel and AMD seem to add new ones every year), not to mention effectively 3 processors on the same chip (standard register based x86 opcodes, stack based floating point opcodes, SIMD based parallel opcodes) that it's got to the point that a human can't efficiently code in assembler on these architectures.

    And that's just 1 processor type - now you want to port your x86 assembler to SPARC. Instead of a few #defines being added and a recompile of the C++ code you now need to start from scratch again.

    There's a good reason assembler is generally avoided like the plague for anything higher level than OS code these days.

  26. This post has been deleted by its author

    1. Mr Tempest
      Boffin

      Re: C++ Can't do...

      eval() is a very special case, and my guess is that HipHop doesn't allow it, as any website described as 'highly optimised' won't be using it.

      For any non-PHP heads, eval($string) evaluates $string *as PHP code*, allowing a script to build extra lines of code on the fly, and execute them. If that sounds like a bad idea, that's because it is: not only does eval() produce self-writing spaghetti-code, it also negates your opcode cache, because evals have to be evaluated(!) prior to compilation.
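A rough illustration of why run-time evaluation defeats ahead-of-time compilation, using Python's built-in `compile()` and `eval()` as a stand-in for PHP's `eval()`. The function names are hypothetical and this says nothing about what HipHop actually does. The source text in the dynamic version does not exist until the call is made, so it has to be compiled on every call, whereas the static version is compiled once with the rest of the script.

```python
def build_expression(field, threshold):
    # The code to run is assembled as a string from run-time values.
    return f"row[{field!r}] > {threshold}"


def run_dynamic(row, field, threshold):
    source = build_expression(field, threshold)
    # This compile step happens on every call; nothing can cache it ahead
    # of time because the string depends on the arguments.
    code = compile(source, "<dynamic>", "eval")
    return eval(code, {}, {"row": row})


def run_static(row, field, threshold):
    # Same logic written as ordinary code, compiled along with the module.
    return row[field] > threshold


row = {"age": 30}
print(run_dynamic(row, "age", 18), run_static(row, "age", 18))
```

Both calls give the same answer; the difference is that only the static one is visible to a compiler or cache before the request arrives.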

    2. Paul Shirley

      @C++ Can't do...

      When people say this they mostly mean 'it doesn't do that out of the box'. There are very few things C++ cannot do, with enough effort. Reflection is the main one because the information simply isn't there in the binary to allow it but if you need reflection in a production app you got the design wrong!

      There are things it's harder to do (the 1st time) in C++ but hard != impossible. Good C++ programmers also know that other languages can get the coding done faster and very occasionally produce better run times. Something the script kiddies seem to struggle to understand...

  27. Anonymous Coward
    Boffin

    MySpace went through a very similar process

    The original MySpace was built in Adobe ColdFusion, which compiles into Java bytecode. For performance reasons, and because of Adobe's lack of commitment to the CF platform, MySpace switched to the BlueDragon CF engine that compiles to (horror of horrors) .NET bytecode instead.

    Which as an interesting side effect, allowed MySpace to gradually switch to ASP.NET without having to ditch any of the existing code.

  28. Steve Martins

    php optimisation path

    Sounds to me that this is a benefit to PHP, rather than showing how inefficient it is. Now we can knock up a quick scripted project with little up front investment, then if it flies there are ways and means of optimising all the way to static code. Yes there may be more elegant solutions, but for a business this is surely a good thing.

  29. Anonymous Coward
    Thumb Down

    HipHop

    This must be the most stupid naming for anything, ever

    1. Anonymous Coward
      Anonymous Coward

      Word!!!

      <EOM>

  30. antoobrien
    Thumb Down

    Agile is not for 'hacking code'

    "Agile development methodology of hacking code for short project cycles."

    Obviously written by a hack that doesn't know what agile is. I've been using agile methodologies for years on enterprise scale apps and if anything proper development practices are even more important in agile.

  31. Anonymous Coward
    Anonymous Coward

    So...

    They've written a program to take SQL injection vulnerabilities and convert them into buffer overflow vulnerabilities?

  32. Anonymous Coward
    Anonymous Coward

    remember c++

    I look forward to the day we get to move from the thoroughly modern, visual c 1.53 native win32 applications that we maintain, onto a shiny and new c++ :)

    More often than not, companies with inhouse IT want their old applications amending and updating, rather than building completely new ones. A lot of the core applications, that keep them running day to day, are in c/c++, cobol, assembler, etc as they generally live with what they've got, tweaking it a bit here and there. If they've been around a while, they probably had systems done in the 60's and 70's that were given shiny new c/c++ front ends into the 90's when pc's became common. With a mishmash of VB6, php, asp and bits of .net tacked on for the stuff that has been added over the years.

    C++ isn't going anywhere for a long time!

  33. John F***ing Stepp

    Why didn't they do it in (your favorite language.)

    Borland used to have one compiler and several interfaces for it.

    So Turbo Pascal and Turbo C and even Turbo Basic gave you damn near the same Hello world.

    Personally, I liked C and hated the hell out of Pascal; but it is all in what you like to program in.

    (javascript rather than PHP for instance; yeah I know server side rather than client side. We compile the server side stuff.)

    One of the reasons I really like C++ is that you don't absolutely have to use OOP; there are times you want to do a stupid fast program and not have to think too hard to get it done.

    So, grab off a buncha memory for the two or three variables, hack a few lines in there and. . .

    Smoke test time!

    (Ok, sometimes I have to reinstall an OS, and occasionally the hard drive or motherboard.)

    This is why they made microwave popcorn.

  34. Ed L
    Thumb Up

    @C++ Can't do...

    To Tempest and Shirley,

    I would hasten to agree with you that using techniques like this probably are not best practice, and there is always another way of accomplishing the desired outcome. However, sometimes it makes sense as the code is so much more elegant or easier to understand than the alternative method.

    Ultimately of course, anything should be achievable in any language as it all ends up as machine code instructions in the end!

    I think you are right, they probably would just leave this out of HipHop for the reasons discussed above.

    The optimisation (cache) side of things I had not actually considered before, but now I can see that you would probably want to work around such a method if your application is going into the big time... Thanks for your insightful comments.

  35. Anonymous Coward
    Grenade

    it's what you do with it that counts.

    As a software engineer one of your skills should be to determine what language out of the many available is most appropriate for the task at hand. As an example, in a game you might choose Python to handle AI and scripting events as this will let the level designers play around and expand the functionality easily without recompiling or using special tools, but things like the graphics engine and networking code will more likely be in C or C++ for performance reasons. For a web based application that is expandable and frequently changing you would be insane to even think about using C++ (I'm not even sure how you would go about it).

    That so many of you seem to treat languages like football teams that should be 'supported' makes me happy because, let's face it, I'm not going to worry about losing my job to an idiot.

  36. Harry Tuttle
    Headmaster

    Depends on the quality of HipHop's design

           Stage (1)               Stage (2)
           HipHop                  GCC
    PHP ----------------> C++ ------------------> Binary

    We know stage(2) is mature, stable, reliable, powerful, flexible.

    What do we know about stage (1) ?

  37. Pirate Dave Silver badge
    Pirate

    Heh?

    I thought the rise in interpreted languages on the Internet was largely down to the fact that it was supposed to be "safer" as far as buffer overflows and the like, compared to the old way of writing web apps in native code and using a cgi interface. So now Faceplant wants to reverse this and go back to running native code? Why does that seem like a super bad idea?

    1. Anonymous Coward
      Linux

      There have been variants on this approach for years

      So ..the traditional sped up approach,

      1) Front end application implemented as some scripting language web interface

      2) Interface layer implemented as scripting language extensions

      3) Back-end libraries in C & C++

      Quick to develop, easy to enforce constraints in the API layer and access to your favourite scripting language for the glue.

      Instead, we get,

      1) Front end application implemented as some scripting language web interface

      2) Interface layer implemented as scripting language translated into native code

      3) Back-end libraries in translated C & C++ interfaces to native code

      Not exactly a new approach, but interesting that they are targeting C++ as opposed to C,

      perhaps they are preserving more OO constructs or simply they want to generate STL aware code.

      For language generation, a lot of people (myself included) use C as an output interface when knocking up tools for code generation, SWIG seems to be the best known example of the technique, despite the unreadable code it generates.

      Personally, I have long made the assertion that Java is the perfect language for use as an output generation language, i.e. truly laborious to write manually but with enough redundancy in the output to make it easy to generate individual statements.

      Eclipse and other IDEs do this on some level already, but nothing seems to expose the generation layer to developers in quite the right way.

      ttfn

  38. Steve Evans

    Well...

    Just as long as they manage to fix the ever growing URL bug that seems to curse me the moment I try to tag a photograph and requires a manual cropping in the address bar to get things back to a vaguely normal point.

  39. Anonymous Coward
    Anonymous Coward

    Tbh, I don't really understand much of that article..

    ...but what I gather is that they are trying to make Facebook more reliable and less buggy. Well, sorry Facebook, either I'm not in that 90% running on this new HipHop structure, or it just doesn't work, because I still have problems with Facebook. It can be slow; when I get notifications I go to view them and they're not there; I go to view pictures and am told there are no pictures when clearly there are; and then there's that bloody "Oops" error message. Tbh, Facebook, you should sort this out, because for me, who uses Facebook a lot, it's very frustrating.

    Rant Over, just to make it clear Fbook is better than Bebo or Myspace :D

  40. Andy Watt
    FAIL

    Facebook FAIL - is HipHop a way forward?

    Hmmm. I'm now locked out with the all-encompassing "maintenance - try again in a few hours" cop-out. Total fail FB, it was stuttering for a few days then gave up completely. Is it just me? I don't think so...

    The big question - was this system properly tested for the scalability and speed advantages they desperately wanted, and were they stupid enough to roll it out simultaneously with (yet another) unwelcome UI change?

    Another suit idea, filtered through geeks, which results in shit pouring on the consumer...

    Hmmm.
