Facebook's re-written PHP to transform the dynamic language for fast performance on web-scale server farms without adding additional hardware. The site's engineers have announced HipHop, which turns the popular and dynamic PHP code into highly optimized but static C++ and then compiles it using the GNU C++ compiler, g++. The …
"Facebook's re-written PHP to transform the dynamic language for fast performance"
You could have fooled me! Actually, being a Facebook user, you probably could have ;-)
C++ programmers (of which I'm one) have often sneered at PHP programmers (of which I'm one).
Where does this leave us? Do I have to stop sneering at myself?
Seems completely sensible
>The company claimed it's cut the CPU use on its servers by up to 50 per cent
C++ the eco-friendly alternative !!
Could this thing be used by web designers to create compiled executables which would then help protect their PHP source from being stolen by clients or other people on shared hosting? Some hosts allow you to run your own executables.
At the moment you have to pay a fortune for software that turns PHP into bytecode files.
Re: Personal use?
"At the moment you have to pay a fortune for software that turns PHP into bytecode files."
Have you tried eaccelerator, apc or my personal op-code cacher of choice, xcache? All of these are open source IIRC
unexciting I guess
"C++ is traditionally associated with the reliable - but relatively unexciting - world of enterprise and server-side computing."
Most of the code that really matters, in the world today, is C/C++. "Unexciting" things like Windows, OSX and Linux. Oh and the internet itself. And your browser. And your email program. Probably your phone too.
Um, no, that's Objective-C.
Yes, I read the same sentence and thought exactly the same thing.
What does the Reg think most of the world's applications run on? Perl? HTML? Java? Visual Basic?
Of course not - 'real' applications (ie - the ones that actually DO stuff and make your mobile phones, and your washing machines and your bank's huge computers, and TVs etc etc) run on C, and C++, and Cobol (yes - there's still an awful lot of Cobol out there), and (god help us) Pascal etc.
Errr, not quite. OS-X applications are generally written in Objective-C but the underlying OS (which is basically BSD) is written in C, and the vast majority of the standard Unix tools that it supports are written in C or C++.
I am confused by these reports.
Growing up as a teen in the 1980s, I also read about programs that convert other programs into highly efficient, optimized code, or translate them into something else.
This is the problem I see with computer programmers.
Why must we convert, translate, optimise?
If we just port everything into the lowest or higher common denominator which would be binary then there will be no need to do these things.
When the guy at GRC showed how his programs run versus a C++ or similar program, his programs were faster, smaller and used less memory because they were written from the ground up in assembly language.
Fast forward to today and there is still so much bloat that it still slows everything down.
Then, years ago, here at The Register I remember reading about the failed Florida compression company that based its compression on quantum math to fit a 600 meg file onto a 1 meg floppy.
I also remember a company that fell foul of copyright laws when consumers used its product, which would recompile an executable to make it smaller and run faster, and companies were up in arms about the re-engineering of their products.
Seems in all this everyone forgot about the customer.
Want an explanation? Here's one.
Ever heard of the xor compression scheme that can compress everything to just one byte? No? Look it up, that quantum thing is another leaf from the same book. On with the show.
Writing assembly has a few drawbacks, such as being time consuming, tricky, bound to a single cpu architecture, prone to exhibit "wrongly sized carpet syndrome", and most of all, lacking high-level abstraction facilities. You can build them, or you can use prefab ones. C was so successful because it was basically the next layer up in abstraction. Of course, there is a cost. What you lose in raw speed you gain in abstraction. C++ is C underneath with some fancy abstraction things added on top. Using them wrong can be horribly expensive in terms of cpu cycles wasted, but using them right can help organize very large programs, doing fancy algorithmic optimisation tricks, or things like that. Also, if your program is large enough a few wasted instructions tend to be less important than using more efficient algorithms. And unless the program usage is high, programmer time is almost always more expensive than cpu time.
Also, there's programmers, and there's programmers. Some can write large programs in assembly, and some you don't even trust with a toy language like PHP. Yes, PHP is mostly a toy language, written as it was by big-eyed web two-dot-oh teens and taken from there, and is, in many ways that affect scalability and efficient use of programmers, inferior to so many other languages, but its one huge advantage is popularity, so hiring programmers that can produce programs written in the language is easy. They'll be mostly mediocre programmers and the constructs they use aren't likely to be the most efficient ones even inside PHP, but with them Facebook could move their business forward. And that's all that matters for a startup.
Now, of course, they ran into the limits of their early beginnings and they hired a bunch of very smart people with a better tool to, well, take their programs and speed them up a bunch.
Of course, had they stopped and thought for a few moments way back when, they could've organised their systems such that pages are generated on each change, into static data that can be served right off of a hard drive, instead of needing to be generated on each page view, something that happens much more often than change. Had they been proficient in a more powerful language they likely also would not have used PHP in the first place. Then they wouldn't have had to pull some "skunk works" trick out of a hat and they wouldn't have a jubilant share-with-the-community feel-good press release.
Bottom line: No, they didn't forget about the customer. But good engineering it lacked from the start. So, I might have a look at the code just to see what they did, but I'll be doing my damnedest to prevent ever needing any such thing.
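For the curious, the "generate on change, serve from disk" idea in the comment above can be sketched in a few lines. This is a toy in Python rather than PHP, and every name in it (`StaticProfileSite`, `update_status`, `serve`) is made up for illustration; it says nothing about how Facebook's systems are actually organised.

```python
import html
import os


class StaticProfileSite:
    """Render a page when its data changes; serve the stored file on every view."""

    def __init__(self, out_dir):
        self.out_dir = out_dir
        self.renders = 0  # counts how often a page was actually generated

    def _path(self, user):
        # Toy assumption: `user` is already a safe file name.
        return os.path.join(self.out_dir, f"{user}.html")

    def update_status(self, user, status):
        # The expensive step: runs once per change, not once per view.
        page = (
            "<html><body>"
            f"<h1>{html.escape(user)}</h1><p>{html.escape(status)}</p>"
            "</body></html>"
        )
        with open(self._path(user), "w", encoding="utf-8") as fh:
            fh.write(page)
        self.renders += 1

    def serve(self, user):
        # The cheap step: a plain file read, which a web server could do by itself.
        with open(self._path(user), "r", encoding="utf-8") as fh:
            return fh.read()
```

The trade-off is that every change has to know which pages it touches, which gets harder the more each page depends on other people's data.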
Let me be the first . . .
. . . in the inevitable flood of comments which say, "Who even cares? I don't use MyFaceTwitJournal because of a) privacy b) being far too cool c) my massive, engorged, throbbing penis. Ladies, I'm talking to *you*!"
not a good advert
I doubt Facebook's recent reliability problems are a very good advertisement for this.
It's not the language, it's the compiler/processor
So their solution was not to re-architect their system and code, but instead switch to a different language compiler. This seems like a slick workaround to the problem of native code compilation since they only have to support a high level -> high level conversion instead of going all the way to binary.
The funny thing is that this unfortunately shows how much room there is available for the current PHP engines to improve.
...especially three or four pass optimized code, will save a lot of carbon. What consumes electrical power in a computer is charging/discharging the wires between memory and the CPU. The better the code, the less power. Simple.
PyPy is much more interesting:
Want speed? Go stackless.
Why not write the whole facebook in C++? Or re-write the PHP interpreter from scratch? Why spend time converting a crap language to a complicated language?
Every level of indirection kills half the optimizations you could've done. If they got a 50% improvement, it means:
1. PHP interpreter is *really* crap. No news here.
2. Their compilation infrastructure is a nightmare
3. They could get at least another 50% by writing in native C++
But, of course, PHP "programmers" are twice as easy to find and half the cost of C++ programmers... why bother?
C++ != web
"Why not write the whole facebook in C++?"
Because C++ is not an HTML-oriented language. PHP was always designed from the ground up to handle HTML. The fact it can run on the command-line is a side-effect.
"PHP interpreter is *really* crap. No news here."
Where's your evidence? The reason Facebook are getting a 50% speed increase is because instead of PHP compiling the code on *every* request, HipHop compiles it down to C++ only once, and the server then just runs the already-compiled application.
They've effectively turned an interpreted language into a compiled language - and everyone knows compiled apps run quicker than interpreted apps.
Even PHP obfuscator/encoders only optimise the code for the PHP compiler, it still has to be interpreted.
"They could get at least another 50% by writing in native C++"
Then they'd also have to interface with every module they use that's available in PHP (memcache, the HTML functionality, cookies, MySQL/PostgreSQL/SQL Server access etc etc.) Plus an Apache module to understand what the C++ app is telling it - redirects, cookies etc.
"But, of course, PHP "programmers" are twice as easy to find and half the cost of C++ programmers"
Totally agree with this.
Don't confuse the legions of script kiddies with real programmers
PHP is a nice flexible system that works very well.
There are a lot of people who "know" PHP, as in they can hack together other people's (sometimes wrong) recipes from the net - this doesn't mean that there aren't quite a few competent people who know how to code, producing good stuff very quickly without the Java or .Net bloat.
It's an easy community to join, which is good. But that doesn't mean you can't do serious things with it.
I prefer Rails personally, but have been working on some very large PHP based systems recently. It is very good when used by professional programmers.
facebook sux BUT...
I am interested in a PHP compiler, sure beats a code obfuscator!
Runtime speed, not so important. Retaining control of my code, yes pls..
Inachu, portability is the one-word answer to your query. If everything was written in assembly (eg. in CPU-specific code) then it would need to be re-written if it was to be run on a different CPU. Also, dunno if you've looked at assembly, it's quite complicated, higher-level languages such as PHP save us from having to deal with that. Certainly, Steve Gibson is a whiz, not many people can program like he can, though.
My main problem with most scripting languages is that I believe that interpreted languages add unnecessary overhead on the CPU that could be spent executing instructions, especially in a production environment where code is changed once a week but executed a zillion times between code changes. It isn't like I'm about to change my code every time I run it!
What are you talking about "interpreted languages add unnecessary overhead"?
Anyone using PHP in a production environment uses an accelerator (opcode cache) so unchanged scripts are immediately executed and not re-parsed and compiled on each request.
Do you honestly think sites like Yahoo have their servers wasting time interpreting PHP source on every page request?!?
"work on PHP caching in the applications server"
So it's an opcode cache that goes a bit further. Do I understand that correctly? I guess the webserver then communicates via good ol' CGI.
C++ is traditionally associated with...
Really? I associate it with the exciting world of game programming, where any CV not starting with 3+ years of C++ flies straight from envelope to bin.
Or it would do if the bloody education system actually taught the language instead of whichever scripted/late bound/JIT based monstrosity happens to be in fashion that year. So we end up accepting self taught C++ hackers with pitiful skill in the language because even that's better than experts in those other languages.
You could always train them
You know, get them 3 years earlier (and cheaper) and train them yourself.
It can be better than waiting for someone else to train them for you.
"...the new generation of scripting languages such as PHP" ??
write native - or don't
Inachu: there are many, many excellent reasons not to use assembly, and as projects get larger and more complex and computers get more powerful, those reasons keep getting better and better - to the point that hardly anybody uses assembly even in performance-critical applications. Performance is important, but it's not the only thing. Nothing is ever "the only thing".
Renato Golin: the article explains why they are not going native C++.
It makes a lot of sense. They can keep using PHP as a nice and easy programming language for the devs and compile it to C++ so that it can be compiled to machine code which executes faster, hence uses less power.
I don't see anything wrong with it.
On PHP-driven sites for which scalability or performance have ever been an issue, scripts are not loaded / parsed / compiled on every request. The compiled opcodes are cached in memory, and the scripts are only re-compiled when instructed to do so, or (optionally) whenever the script is changed.
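A rough sketch of that caching behaviour, for anyone who hasn't seen it spelled out. It is written in Python using the built-in `compile()` and `exec()`, not PHP, and it is only an analogy: the class name `CodeCache` and the `check_mtime` switch are invented here and are not taken from APC, XCache or eAccelerator.

```python
import os


class CodeCache:
    """Toy analogue of an opcode cache: compile a script once and reuse the
    compiled code object until the file's modification time changes."""

    def __init__(self, check_mtime=True):
        self.check_mtime = check_mtime  # False: never recheck the file on disk
        self._cache = {}                # path -> (mtime_ns, code object)
        self.compiles = 0               # how many real compilations happened

    def load(self, path):
        entry = self._cache.get(path)
        if entry is not None:
            if not self.check_mtime:
                return entry[1]
            if entry[0] == os.stat(path).st_mtime_ns:
                return entry[1]
        # Cache miss or stale entry: parse and compile the source again.
        mtime = os.stat(path).st_mtime_ns
        with open(path, "r", encoding="utf-8") as fh:
            source = fh.read()
        code = compile(source, path, "exec")
        self.compiles += 1
        self._cache[path] = (mtime, code)
        return code

    def run(self, path):
        # Each "request" executes the cached code in a fresh namespace.
        namespace = {}
        exec(self.load(path), namespace)
        return namespace
```

With `check_mtime=False` the script is never recompiled until the cache object is thrown away, which roughly corresponds to the "only when instructed to do so" mode described above.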
Buffer overflows ahead
C++ is surely better than PHP, but the number of people who actually know the language fully is in the double digits. C++ is just a moving target which takes 4 years to master and changes completely every 4 years.
I would have rewritten it in a language like object-oriented Pascal. It's compiled, but has features like range checks. I have tried it on numerical applications and haven't been able to measure any significant loss in speed.
..if you like OO Pascal...
...take a look at Ada (esp. Ada2005). OO - even for tasks! - a much larger user base, free on Win, Mac and Linux (search for "GNAT Libre").
"C++ is traditionally associated with the reliable - but relatively unexciting - world of enterprise and server-side computing."
Let's not forget games. I spend a quarter of my life developing games in C++. Nothing else touches it.
Ask and ye shall receive
"Recordon said Facebook wants community feedback."
You're still shit mate.
"C++ is traditionally associated with the reliable"
Surely this is a strange, new usage of the word "reliable"?...
"C++ is traditionally associated with the reliable"
"C++ is traditionally associated with memory leaks and nightmare-ish security holes"
Fixed that for you.
Mark Zuckerberg likes this...
You beat me to it.
My sentiments exactly. In fact, I speed-read the sentence as:
"C++ :: reliable :: enterprise and server-side computing"
and couldn't stop laughing for several minutes.
"If we just port everything into the lowest or higher common denominator which would be binary then there will be no need to do these things"
Yeah, well good luck with that. While it's possible to create a binary just using a hex editor you'd probably take all day just to get Hello World to work. Now scale that up to a word processor.
Also people occasionally mention that we should use assembler more often but they're generally people who grew up hacking Z80 opcodes. But comparing Z80 assembler to the monstrosity that is the opcode list for the latest 64 bit Intel or AMD chips is like comparing a bicycle with the space shuttle. I've coded in 32 bit x86 and believe me - it's not fun. There are so many opcodes (and it doesn't help that Intel and AMD seem to add new ones every year), not to mention effectively 3 processors on the same chip (standard register based x86 opcodes, stack based floating point opcodes, SIMD based parallel opcodes), that it's got to the point that a human can't efficiently code in assembler on these architectures.
And that's just 1 processor type - now you want to port your x86 assembler to SPARC. Instead of a few #defines being added and a recompile of the C++ code you now need to start from scratch again.
There's a good reason assembler is generally avoided like the plague for anything higher level than OS code these days.
Re: C++ Can't do...
eval() is a very special case, and my guess is that HipHop doesn't allow it, as any website described as 'highly optimised' won't be using it.
For any non-PHP heads, eval($string) evaluates $string *as PHP code*, allowing a script to build extra lines of code on the fly, and execute them. If that sounds like a bad idea, that's because it is: not only does eval() produce self-writing spaghetti-code, it also negates your opcode cache, because evals have to be evaluated(!) prior to compilation.
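To make the difference concrete, here is a small sketch in Python, not PHP. Python's `eval()` only evaluates a single expression, whereas PHP's `eval()` runs whole statements, so treat it as an analogy; the function names and the `formula` parameter are made up for the example.

```python
def line_total_static(price, quantity):
    # Ordinary code: the source is known before the program runs,
    # so it is compiled once along with the rest of the module.
    return price * quantity


def line_total_dynamic(price, quantity, formula):
    # `formula` is a string assembled at run time, so it has to be parsed
    # and compiled at the moment of the call; nothing can do that in advance.
    # An empty __builtins__ limits what the string can reach, but this is
    # NOT a safe sandbox for untrusted input.
    scope = {"__builtins__": {}, "price": price, "quantity": quantity}
    return eval(formula, scope)
```

Both `line_total_static(3, 4)` and `line_total_dynamic(3, 4, "price * quantity")` give 12, but only the first has source that a compiler or cache can see before the request arrives.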
@C++ Can't do...
When people say this they mostly mean 'it doesn't do that out of the box'. There are very few things C++ cannot do, with enough effort. Reflection is the main one because the information simply isn't there in the binary to allow it but if you need reflection in a production app you got the design wrong!
There are things it's harder to do (the 1st time) in C++ but hard != impossible. Good C++ programmers also know that other languages can get the coding done faster and very occasionally produce better run times. Something the script kiddies seem to struggle to understand...
MySpace went through a very similar process
The original MySpace was built in Adobe ColdFusion, which compiles into Java bytecode. For performance reasons, and because of Adobe's lack of commitment to the CF platform, MySpace switched to the BlueDragon CF engine that compiles to (horror of horrors) .NET bytecode instead.
Which, as an interesting side effect, allowed MySpace to gradually switch to ASP.NET without having to ditch any of the existing code.
php optimisation path
Sounds to me that this is a benefit to PHP, rather than showing how inefficient it is. Now we can knock up a quick scripted project with little up front investment, then if it flies there are ways and means of optimising all the way to static code. Yes there may be more elegant solutions, but for a business this is surely a good thing.
This must be the most stupid naming for anything, ever
Agile is not for 'hacking code'
"Agile development methodology of hacking code for short project cycles."
Obviously written by a hack that doesn't know what agile is. I've been using agile methodologies for years on enterprise scale apps and if anything proper development practices are even more important in agile.
They've written a program to take SQL injection vulnerabilities and convert them into buffer overflow vulnerabilities?