Recently I met the analytics firebrands Revolution Analytics at their Bay Area offices. Outwardly, the facilities were conventional, and no one called me ‘comrade’. But what they showed me over the next couple of hours sparked my imagination and even a bit of revolutionary fervor. I’m convinced that predictive analytics is the …
R is great
It's like MATLAB but for statisticians. SPSS, Systat (I am that old) and SAS are a straitjacket compared to the freedom of R. I used to do statistics in C (writing my own code for whatever methods I needed); since R came around, I don't do that anymore.
I have to say that I have not found the limitations of conventional R yet - maybe people aren't thinking about their problems properly?
They've added some scalability features to the runtime system?
And a GUI.
Strange that they're in an agreement with IBM. IBM recently bought SPSS.
And where is the comparison to S+?
Given that R started as an open-source version of S-PLUS, the "article" does not really qualify as "journalism" without a comparison with the original commercial product.
For researchers and statisticians, this argument doesn’t wash
Indeed. Some of us even run our R routines from within Python scripts. Because honestly, R's "data structures" are the dumbest thing I've ever seen.
R is like Perl
R is like Perl in many ways - very powerful and very flexible. You can do stuff with a couple of lines of R that would take pages of hideous SAS macro code.
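To give a flavor of that compactness (a trivial sketch on simulated data, not anything from the article):

```r
# Simulate data, fit a regression, and get group means - a few lines in R,
# versus a lengthy DATA step plus PROC calls in SAS.
set.seed(42)
d <- data.frame(g = rep(c("a", "b"), each = 50), x = rnorm(100))
d$y <- 2 * d$x + rnorm(100)

fit <- lm(y ~ x, data = d)    # least-squares fit
coef(fit)                     # estimated slope should be near 2
tapply(d$y, d$g, mean)        # per-group means in one call
```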
But like Perl it has many idiosyncrasies that even its creators would get rid of if they were to do it again. There has been talk of a complete re-write (like Perl). And the technical documentation is possibly the most impenetrably difficult documentation ever created in the history of computer science.
R-A seems to be a closed-source, proprietary and commercial implementation - it's like someone wrote a closed-source, faster version of Perl and tried to sell it.
R is not like Perl
Perl was designed to be flexible, versatile, and fast to write in (with an emphasis on code compactness). R is none of these. In R it's damn nigh impossible to work on any dataset that isn't in 2-D table format, the syntax is a messy, ambiguous jungle (plus it requires many, many times the "optimal" number of keystrokes), and R really does only one thing: statistics (and the associated graphical representation). But that it does quite well.
While R might ("structurally", so to speak) be a programming language, philosophically it is a statistics framework that HAS a language. A bit like how TeX, LaTeX, lout, etc. are languages (Turing-complete, even), but good luck using them for anything other than document layout.
Performance Remains to Be Seen
The performance statistics you quote are from Revolution Analytics, not from an independent analysis. I'm sure the folks at SAS can give you many examples where their computations perform many times faster than Revolution Analytics. When you report what a company tells you, it is not journalism, it is advertising.
Where do you get your information? R can take advantage of multicore and grid environments through a wide range of options. The multicore SMP package is one example, Rmpi and snow are others, and gputools makes use of Nvidia CUDA processing.
The data limitation is architecture-specific (32-bit), and 64-bit R has no such limitation. In fact, packages exist in the R community to handle large datasets efficiently and quickly (see for instance ff and bigmemory).
The HighPerformanceComputing task view lists the many packages available for parallel processing and large-data handling.
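For instance, a minimal sketch with the multicore package mentioned above (assuming it is installed; mclapply is a parallel drop-in for lapply):

```r
# Run a slow function over 8 inputs, spread across 4 cores.
library(multicore)

slow_sq <- function(i) { Sys.sleep(0.1); i^2 }
res <- mclapply(1:8, slow_sq, mc.cores = 4)
unlist(res)
```

The snow and Rmpi packages offer the same style of apply-based parallelism across cluster nodes rather than local cores, and bigmemory's big.matrix gives file-backed matrices that sidestep RAM limits.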
Revolution Analytics has taken the work of the open-source community, repackaged it, and resold it with support as the value added. This is fine under the GPL, but PLEASE do not claim that they have made enhancements which did not exist in R. That is blatantly false and I would be very sorry if such lies were propagated by Revolution. The least you can do is check your story before publishing.