When Facebook moved its servers to HipHop for PHP – the code transformer it built to convert PHP into optimized C++ – the company's average CPU usage dropped by 50 per cent. And after six months of additional engineering, the tool was about 1.8 times faster. Now, after another six months, the company says, it has improved …
So they go to all this effort, than hamstring it by using GCC (a compiler well-known for the poor performance of it's object code) rather than shelling out the little bit extra for Intel's compiler? *golf clap*
Go compile it with Intel
And report back with some actual numbers. Then maybe you'll earn some thumbs up.
Can icc compile for ARM chips? or PPC?
Can icc even be trusted to compile optimal code for non Intel x86 chips??
Surely just not use PHP in first place?
If they use the proper language like Java or C# they would not now be in these woes and spend all these monies on converter to legacy C++ which is only the same step back too but just a bit faster.
@peyton? @Bob Gateaux
GCC vs ICC - it's quite widely acknowledged that Intel's compiler does a significantly better job on Intel hardware than gcc.
I've seen bits of C compiled as a .NET app run 80 times slower than the same source code compiled by icc as a native app. Alright, it's was quite maths intensive code, but even so.
C# / Java may indeed be better languages than PHP, but they're by no means ideal when you need to scale to truly large setups like Google, Facebook, etc. If a C# app achieves 90% of the performance of the equivalent C++ app, that can amount to $Millions extra in hardware and electricity costs. Noble sentiments of purity of languages soon go flying out the window when faced with the real and expensive problems of building vast systems. No software engineer can successfully argue that automatic garbage collection (a convenience for lazy programmers in a hurry) is worth $Millions.
By developing HipHop Facebook are avoiding the decision to completely re-implement in C++, but it remains to be seen whether HipHop is a good enough 'bodge' (I use that term positively) to allow them to scale and retain the PHP base.
"I've seen bits of C compiled as a .NET app run 80 times slower than the same source code compiled by icc as a native app."
What are we supposed to draw from that ?
C# / Java developers are approximately 30% more expensive to employ perhaps?
When did C++ become legacy? Did I miss something? For that matter - why use Java at all? It's still an interpreted language, why not use a "proper language" like C++ or Assembly in the first place?
I "love" code snobbery - it's like willy waving for geeks.
"If you write efficient code, then it'll run pretty much just as fast on .NET as native, because after JIT compilation kicks in the first time it'll be compiled to native code and optimised just as efficiently as a native compilter anyway."
An ahead-of-time compiler can take as long as it wants to compile code. So it can apply some very computationally expensive algorithms to try to seek out optimisations. It can analyse as much or as little of the program as it wants to make optimisations.
A just-in-time compiler is designed to do a lot better than interpreting, but doing things quickly is still a concern. So they tend to apply only simple-to-compute optimisations, and don't generally get an opportunity to look at the whole program in overview.
In practical terms, just-in-time compiled stuff usually ends up being in broadly the same ballpark as the precompiled stuff (both being several orders of magnitude better than anything interpreted), but it's false to label it "just as fast". As noted by other commenters, if just-in-time costs you even just 5% of performance then that can amount to hundreds of millions of dollars in a year for an operation the size of Facebook.
It's still a very expensive stopgap (as in, man-years) for poor engineering decisions. Never time to do it right the first time, always time to do it over, and all that.
...will still run like a slug...
So what if no-one uses BigTable?
Apples to oranges!
HipHop allows for greatly accelerating one of the dominant programming languages of today, BigTable, um, doesn't - because it's a storage system not a programming language tool.
IOW, it's Google bashing bias for no good reason.
They were contrasting
That the open-sourced Facebook tools are being widely used. The closed-source Google tools are not. Even if Facebook doesn't have a "do no evil" motto, it's refreshing to see they are "doing some good".
Figures ssem a bit low
a 70% speed up from PHP to C++ seems a bit low to me
does a lot of "dynamic" things, which are still expensive when transformed into C++. Think of all the typing magic and uneccessary memory allocations.
Associative Arrays are a likely cause of slowness:
-- vs --
cout << arr;
spot = lookupIndex("name");
cout << arr[spot];
// yes, it can be combined into one line, but >>most<< will do a sanity check on spot before use....
Text searching can be fairly intensive if you don't have decent indexing methods. This is just one example of where PHP can run doggedly slow compared to a language that doesn't have such conventions, and thus has the programmer overhead of knowing the actual indexes for his variables before-hand.
props to facebook
NIce to see someone knows how to use C++.
And to think, up to now I'd just dismissed them (without knowing anything about the details) as a crew of opportunist turds pandering to the weak-minded. Shame on me.
Missing the point ?
Er, I think you're missing the point.
Facebook's web service(s) are written in PHP. You can agrue whether that's a good or bad decission 'till the cows come home. That's not the point of this exercise.
Facebook's systems are now, not trivial, and they wanted improved performance. They had two choices: Re-write everything in a new language, or improve the language/run-time environment.
Facebook took the decision to improve/enhance the run-time environment. It has nothing to do with security at all. (I'm sure that there was an El Reg article about it at the time. I'll leave it as an exercise for the reader to find it ;-) )
(Oh, and I did take a look at your document. Just a few paragraphs of not much and a few lists.)
@Frank Gerlach: efficiency & safety
You've peddled this claim for your language lots but I don't believe you understand the performance implications of reference counting (what do you think it does to the cache?) and you've claimed memory safety while admitting that cycles have to be broken manually (so if they aren't, mem will inflate potentially unlimitedly, another source of horrid runtime bugs).
Also: 'All Arrays are bounds-checked at runtime..' - wot, always? That'll kill performance, or do you do static analysis to detect safe checks and eliminate them? (likewise for refcounting?)
Ditto the stuff about stack depth checking.
Also, your being able to stack allocate most anything, how do you guarantee it won't outlive the call frame and leave a dangly pointer? Can't see that covered in your manual. I can't see a hell of a lot explained there either.
Validate your claims? I think not.
For the simple reason that the baseline is php and the stick is "improvement upon the baseline". You don't know what the maximum performance of the function is, only the performance of the implementation code as-is. Thus, you cannot discern if "70% faster than unadorned php" is faster or slower than the same thing implemented right away in a "faster" language.
But don't let facts or your fellow commentards get in the way of a good bout of starry-eyed hobby-horsing. By all means, knock yourself out. Just don't expect us to buy into it too.
No MMU? It depends...
Firstly I realize that this is going somewhat off tangent to the article here. Nevertheless...
On very small systems (ie microcontrollers) I can see the advantage of not having memory protection. Hell, just code on the bare metal. Contentiously I would imagine the best CPU one would use would be the simplest one capable of doing the job, and the code be small and simple enough that one or a small team of people would be able to understand thru' and thru'
Now, when you have increadibly large and bloated software designed by "committees", I'm sorry you cannot convince me that memory protection is not necessary.
Consider that even the simplest piece of software may be buggy and sometimes in the unlikeliest manner, that may initially even escape you. Add in the complexity of today's operating environs.
How many bugs are there in Java?
Look, I don't even fully trust my kernel. I hope and pray bugs in there (there most certainly are!) are consistent, observable and occur in a frequency practical enough that someone can find them. And not introduce new more horrible ones with the fix.
But at least, I know with memory protection, I have some chance of restarting some userland thing that crashes without taking the whole system down.
Hmm... you will trust some sort of memory safe language (probably a large project, thus probably containing bugs) running in essence with no memory protection? You will trust multiple processes on such a platform?
I wold say that would make quite a recipe for disaster.
re: Ok folks, once again
a) I doubt most people would argue.
b & c) not relevant to my criticism, which is of your claims for sappeur.
d) you are confused and don't understand that reference counting is merely a means of implementing garbage collection, albeit an inferior one (under most conditions). It is not 'not garbage' collection. It *is* a method of garbage collection <http://en.wikipedia.org/wiki/Reference_counting>. Ref counting is a particularly inefficient (under most conditions) because it keeps inc/decrementing the refcount (unless you optimise this out, which you haven't?) which vastly increases cache/memory pressure, which slows things down. If you use interlocked increment/decrement for thread safety, even worse. Why do you think modern ram-based gc is mostly generational these days? Because it's slower, of course! It's what everyone wants!
e) 'Reference counted-pointers and bounds checks are not *that* expensive' - if you've inserted the static analysis that removes the checks. I'm guessing you haven't because you haven't answered my question of whether you do, so benchmarks please - or stop making unsupported claims. 'clearly doable' != 'done it and delivered'
Also my dangling-stack-allocated pointer, wut about it.
>"you are confused and don't understand that reference counting is "
As always in computer science/IT there is the babylonian language confusion problem. I do not consider refcounting "full" GC, as the programmer must break cycles upon deleting a data structure. Anyway, Sappeur does refcounting, not some "stop the universe, we must do GC" thing. Which is a no-no in realtime systems. Sappeur can indeed be used for soft-realtime systems and for for mixed soft/hard realtime systems were the hard-realtime threads do not perform any heap allocations.
All "full GC" languages cannot do this.
Regarding performance, I am fully aware of the overhead, especially if more than one thread has access to a pointer/reference. In this case I must indeed perform threadsafe refcounting, which puts high load on the whole system if done too often. That's why programmers must make the conscious decision to declare a class "multithreaded" or not. In the latter case, the refcounting takes place in a single core - it is just a plain _rc++ operation. See chapter 9 of the sappeur manual. There are 16 rules pertaining the keyword "multithreaded".
As to you claim "Ref counting is a particularly inefficient (under most conditions)", my claim is that this is untrue for real-world word-processors, CAD/CAE systems, database servers, signal processing/collection systems, guidance and control systems and so on. Because all of these are nowadays using C++ with Boost smart pointers, typically. Some (such as the CAD market leader) uses C++ and macros for smart pointers. I don't know about AI/Lisp and the like, but I suspect the claim comes from them.
Granted, C++ systems don't need to use so much heap as Java/C#, because they can allocate on the stack, which is a major efficiency booster. Sappeur is modelled after that.
>" if you've inserted the static analysis that removes the checks. I'm guessing you haven't because you haven't answered my question of whether you do"
I do not do any significant optimizations at the moment, yet the performance is very good. Of course, I ultimately have to quantify it - that's absolutely true. I simply removed the checking code from the generated C++ code and found the performance difference to be minimal (less than 10 %). But yes, I definitely need to provide benchmarks and look forward to do so with much confidence. If the result is "20% less performance than plain C++" I would be absolutely happy.
"Also my dangling-stack-allocated pointer, wut about it."
That one worked nicely from day one. Every Sappeur pointer is a smart pointer; on allocation it is NULL. When you assign something, it will increment the refcount of the object in question. Upon leaving the current contect, the destructor of the smart pointer will be called and decrement the refcount. It works exactly the same as refcounted Boost pointers.
If you know how (properly coded) C++ classes and smart pointers work, you know Sappeur works. All Sappeur does is to ensure you write memory-safe code. You could think of Sappeur as a "generator of memory safe C++".
Re: Ok folks, once again
> A) Plain C or C++ definitely is *insecure*. More than 60% of exploits can be traced to C or C++. See StuxNet, PING of death, Morris Worm, RTF-based exploits, PDF exploits.
Nope... This is result of the programmer not doing things he/she should do (aka checking memory allocations, input data size etc). Which takes more time/expertise to do it right, which increase costs, which the boss won't like. There you go.
Lemme argue a) anyway.
Yes, C and therefore C++ too allow you to fsck up by failing to check your boundaries, ending up with stuff in memory that doesn't belong there, and so leaving room for people to abuse that gap deliberately. That's an inherent trait of those languages but not an inherent fault of the languages; if you use these languages you're supposed to know and guard against it.
Pointing fingers there just isn't very useful, as even a supposedly safe language like PHP does have serious trouble. Like, oh, allowing externals to scribble over internal variables. That was sold as a *feature* and golly did it leave a security hole wide open. Yes I know there's a "fix" for that, but it's not the default and moreover the designers of the feature hadn't managed to work out the implications. Nevermind that there's still plenty of code around that simply doesn't work with the fix in place.
Not so in C, where it was a deliberate tradeoff. Deciding to use the tool gives you the responsibility to deal with the consequences. Even though the impact of failing to check your input or your buffer sizes (strcat() et al.) and moreover that while connected to a hostile network wasn't known back then, a defence PHP just plain doesn't have.
And then there's SQL injections.
So, no, I wouldn't say that by eliminating that concern for the programmer from the language by design is automatically a good thing. It merely seems to attract language designs that fsck up in other "interesting" ways. Even PHP-based things risk memory exhaustion if they don't put some sort of cap on input sizes, so they need to check anyway. Only this time after the allocation has happened already. The whole thing just became that much less efficient.
In the end, you-the-programmer have the responsibility to not trust any input whatsoever until checked to conform to your expectations. Only then are you allowed to carefully turn those expectations into assumptions and act on them. That goes regardless of language used. Your fingerpointing, in short, is at most 60% of the story, but probably less than that, as your argument fails to address the tendency of a better fool popping up to assault your foolproofing.
Re: Sappeur Stack allocation, Cyclic References, Bounds Checking
Stack Allocation:An Sappeur object allocated on the stack will call it's destructor upon leaving the current scope, as in C++. If it is a pointer or contains pointers, these will decrement the refcounts on the objects it/they are pointing to (called RAII in C++).
Stack allocation is extremely fast and efficient, because the memory is almost always already in the cache and will soon be freed.
Pointer cycles are indeed an issue with C/C++ and Sappeur. You, the programmer, must break them up. If you don't do this, a leak will occur. Fortunately, tools such as Valgrind or IBM's Purify can tell you about them.
Bounds Checking: Currently, Sappeur bounds checking is quite "naive" and there are no optimizations whatsoever. Still, this is not a pressing issue; memory allocators are a much more important issue for real-world programs.
With Sappeur, you can create small programs ("filters") in the unix tradition, which will launch in a few milliseconds, do their work in a few milliseconds and then die. Can you do this in Java ?
frank@frank-desktop:~/sappeur/output$ time ./sprprog.exe >oo;wc -l oo
> Stack Allocation: ...
3 functions, f, g & h. f calls g. g stack allocs something (call it x). g terminates, returning a pointer/reference to x back to caller (technically, x 'escapes'), f. f now calls h which stack allocates a shedload of other stuff all over x, effectively trashing it. h returns to f. f now does stuff with x which is, unrealised by it, total garbage. Crash and burn.
read up on Escape Capture.
> Pointer cycles ...
don't happen if you have a decent GC engine. Which is also going to be more efficient than refcounting anyway (the efficiency of which you haven't addressed). And if you leave it up to the programmer to find unwanted run-time behaviours, they won't get caught reliably. Best way is probably to have a fallback collector, perhaps mark/sweep, triggered occasionally to pick these cycles up. Then again, you might as well (on average) just stick with mark/sweep from the beginning... or even better, generational... but then your RAII idiom pinched from C++ isn't going to work reliably... which is why it's often deprecated in GC languages... which is why modula3 carefully and deliberately closed off that avenue of misuse by make GC only apply to memory, not resources (a lesson repeatedly not learnt by language designers ever after).
Here, an website by an expert: <http://www.cs.kent.ac.uk/people/staff/rej/gc.html>. He also wrote an book which I highly, highly recommend - read the fucker before posting any more silly bollox about refcounting!
> Bounds Checking: ...
well, if you say so. A real-world benchmark on a decent app might find otherwise, but other than a trivial filter, the source of which I can't see, you've provided none.
gpg certainly does only work on local files.
root@BRIDGE:/usr/local/squid/var/logs# ls -l *.log
-rw-r----- 1 squid squid 11350103 2011-04-01 11:31 access.log
-rw-r----- 1 squid squid 46765524 2011-04-01 11:29 access_unfiltered.log
-rw-r----- 1 squid squid 55224 2011-04-01 11:30 cache.log
-rw-r----- 1 squid squid 196443 2011-03-04 14:58 cache_unfiltered.log
-rw-r----- 1 squid squid 16290905 2011-04-01 11:31 store.log
-rw-r----- 1 squid squid 76547872 2011-04-01 11:27 store_unfiltered.log
root@BRIDGE:/usr/local/squid/var/logs# time sort cache.log >/dev/null
root@BRIDGE:/usr/local/squid/var/logs# time fgrep info cache.log >/dev/null
If I can sort a 50k file, and search for a particular phrase in it, using standard C-compiled Linux tools, in less than a quarter of the time your program needs to count instances of a similar phrase on a tiny CVS file, you have bigger problems than a benchmark.
BTW: That was executed in the middle of a working day, on a production system, using Slackware 13.1's standard tools on a desktop-class machine that was also running our Mediawiki / Apache Intranet, firewall/filtering/Squid cache/VPN server for 450 users, incoming/outgoing fax-to-email system, SMS reception and transmission, a Jabber server and several MySQL databases and network monitoring functions, not to mention a network-accessible 3Tb software RAID over Samba for the whole root filesystem (while logged into via SSH from the other end of the site).
4ms? That's enough time to perform 10,000,000 cycles on a 2.5GHz machine. Some people really need to check before they start benchmarking crap like that (and the fact that your program is "cvsbygroup.exe" tells me you're using Cygwin or you're used to using Windows, which might explain something).
@Lee Dowling: You are soo nice
Lee, it would help you to be a little bit less aggressive, especially when you deal with humans as opposed to computers. Thank you.
Regarding your arguments: You perform a sort and an fgrep against a 50kByte file. I did something different, namely a GROUP BY operation on the csv file. We are comparing apples and oranges here. Also, I ran on a 32 bit x86 machine and I don't know whether yours was a 64 bit box.
Finally, my argument was that Sappeur is more efficient than Java, but also memory safe. Your argument missed that one completely.
>"and the fact that your program is "cvsbygroup.exe" tells me you're using Cygwin or you're used to using Windows".
I am accustomed to using windows and all kinds of unix from HPUX to MacOS X.
Most of the time I use Linux, but as am engineer I am always open to different concepts and
I do not believe Linux or Unix in general is the end of meaningful operating system
innovation. Actually, MPE, MVS and MPE do have some strengths on the security side neither
Windows nor C-based Unixes will ever match.
The reason for
$ mv csvgroupby csvgroupby.exe
was simply to avoid a name clash. Insults do not help in a rational discussion...
Right, let's try and wrap this up
I've got plenty of other stuff to do (including tracking down some firefox bugs I've run into - looks like mem exhaustion not checked for, and two race conditions. Wonder how sappeur would prevent those those?)
> I do not consider refcounting "full" GC, as the programmer must break cycles upon deleting a data structure.
And if they don't then mem will bloat till the app fails.
"But they break the cycles - it's their responsibility!" Well, it's the programmer's responsibility in any language to do the right thing, just as Renato aptly pointed out. C/C++ does overload the programmer with too much (IMO) but the principle still holds.
> Sappeur can indeed be used for ... w[h]ere the hard-realtime threads do not perform any heap allocations.
That would be true in C/C++ then, trivially.
> Sappeur can indeed be used for soft-realtime systems and for for mixed soft/hard realtime systems ..." because it uses reference counting? Because you can bound your delays with refcounting? Wrong. Your delays are proportionate to the number of items freed (ignoring destructors specifically for RIAA which add an overhead). If you lose a ref to the head of a linked list then the according to your descriptions, it will be be deallocated - which will cause item 2 to be deallocated - which will cause item 3 to... etc. Delay time ~~ length of list. Unbounded delay. No upper limit. Your claim fails.
If you had read the Richard Jones book I recommended then you'd have had that pointed out to you... and the solution, which is incremental freeing. But then if you attached RIAA destructors to items in the list then resources will not be freed predictably.
Sorry. I think that's a killer blow.
And that's why general GC + RIAA are a headache.
> All "full GC" languages cannot do this.
Dunno. There is research into realtime collectors (probably need programmer input to work well), maybe someone else can comment. Here's a quick link I found while failing to find the interview with a realtime GC researcher at IBM <http://domino.watson.ibm.com/comm/research_projects.nsf/pages/metronome.metronomegc.html>. Google for other stuff.
> my claim is that this[Ref counting is a particularly inefficient ] is untrue for real-world word-processors...
I don't accept this claim. We should test it using eg. a linked list traversal in c & sappeur. Not sure I have time. Looked for some benchmarks of RC overhead, couldn't find any.
> "Stack Allocation": In Sappeur you cannot obtain a pointer on a stack-allocated object...
then saying the programmer can stack allocate seems redundant. If it can be stack alloc'ed and the compiler can detect that trivially, then it's up to the compiler to catch that and optimise it because it's always a win (I think). Programmer should not be involved.
Trouble is, getting a pointer to a SAlloced object isn't always unsafe. If it's only passed up the calling chain rather than down and escaping (as I described re f/g/h functions) then it can be safe. If you prevent it, you block a handy optimisation.
> come up with a benchmark that argues for "stop-the-universe-GC". ...
not all GC is as you describe. They are pretty sophisticated things these days and can induce bounded delays.
> Rather, I analysed that non-trivial systems ... are implemented in C or C++ and do not use "full GC".
That's because c/c++ is in general unsuited to decent GC (necromatic horrors such as conservative GCs can be applied but have a cross and a stake at the ready). Look, I'm no fan of crap languages or their misuse, I just don't accept that sappeur is the miracle you claim. Let's be clear about that.
correct me if I'm wrong, but isn't this all non-allocated objects, just doubles & floats etc?
Will take a look but I'm a linux newbie so am struggling. Question though, what's wrong with me doing this in SQL, being as it's exactly for that and has no pointers etc It's a safe language and fast, so...
Summary: I'm not convinced. Reason I'm getting pissed off and losing my manners is from someone who clearly hasn't got so much experience in language design and theory wandering in, creating their own language (which is fine) then making loud claims about its competence (which isn't). I wouldn't try doing your job because I don't have your experience or knowledge. Read the Richard Jones GC book as a first start. Good luck.
GC researcher interview
Seems to have been removed, nonetheless here's the lady's blog <http://hollycummins.blogspot.com/> which will give you a useful comparison of GCness.
fb to google...
Zuck to Brinn: Ecks-kooz me, sah. Haf yoo evah hurd of da ee-man-seep-ayshun prok lam ation?
Brin: Ayyye dohn't lissen to hip hip...
second system effect
Confounding any performance comparisons is the "second system effect" - recoding ANYTHING, even in the same implementation, will boost performance. Compare the C++ implementation with a second PHP done on the other side of a "Chinese Wall" ? After dropping the "rarely used features" ?
That's not the second system effect.
You are right that recoding the entire stack of facebook PHP in C++ might or might not but would likely have yielded _some_ improvements at least, due to spotting inefficiencies and recoding them away. But that's not the second system effect. And moreover, it's not the bulk of what happened here.
The dropping the "rarely used features" part, as well as the doing away with most serialisation in the caching mechanism, are gains of the kind you point to.
But the bigger chunk of work instead is that they built a software tool to take that PHP and mechanically translate it into C++, compile that, then run it for good performance gain. The original code remains as-is.
Why not try and get it even faster, port it to fortran.
Why the joke ?
The old stuff is actually easier to optimize than C++. There exist lots of optimizer routines which can speed up scientific processing by orders of magnitude. Think of changing the order of nested loops of a matrix multiplication. Lookup Kuck and associates, Cray research, vector processors.
No LLVM at the back-end? I would have thought that would be a better way to go. There is a PHP front-end for it here:
All that effort
And for what? I just don't understand why anyone would ever want to use facebook or it's ilk. But good for them and I hope some puts this code to something that provides a real benefit to people.
Yeah, that Facebook eh
It'll never take off.
1. "Why don't they just program in C/C++ to begin with?" Why don't you? With your faster C/C++ Facebook clone you should take over the world!
2. "Why would anybody use Facebook anyway?" Because they're not special little snowflakes like you.
3. "PHP sucks!" And it's behind a lot of the most popular Web sites and services available today. Those PHP developers sure are dumb.
No, MediaWiki is not using HipHop. I recently said on wikitech-l that we should start working on HipHop support for a compatible release in 1.18 or 1.19, and we've started work on it. We certainly haven't finished.
Suck it. Just use C already.