Apple has embraced Hadoop, the open source distributed-computing platform based on Google's famously proprietary backend infrastructure. According to a recent Apple job listing entitled "Senior Software Engineer - Hadoop", the company is using or planning to use the entire Hadoop stack, from the HDFS file system and the Hadoop …
I thought Apple had abandoned Java
Presumably Apple will be running on Linux the way pretty much everyone else does in production, unless they want to be the ones taking on fielding the bug reports related to running on OS X at scale.
Not abandoned Java (totally...)
Abandoned writing the Mac implementation. Leaving it up to Oracle to support the platform as it does with Windows et al. (http://www.apple.com/pr/library/2010/11/12openjdk.html)
I say 'not totally' as even though they're not 'killing' Java per se they won't be supporting Java apps in the upcoming Mac App Store. Which some would argue is a death of sorts.
Yes, yes, slightly off topic and boring.
I'd be interested to know what they are going to use to serve it. Stacks of mac minis in racks? I think not. I also wonder if it'll be a Mac OS job or running under Linux? If it is Mac OS, will it be running on commodity Intel servers and/or under a virtualisation layer?
Sadly, I suspect it'll be running on Linux, I say sadly because I think the writing is on the wall for Mac OS Server.
Hadoop cluster requirements
Mac Minis are way behind on CPU and storage. You can get 12 HDDs in a 1U chassis with 8-12 cores from a couple of x86 vendors; these are cutting edge in Hadoop hardware, with gigabit Ethernet to the top-of-rack switch and then 10 Gbit/s from there. Having bigger worker nodes increases the likelihood of finding a slot for work next to the data, and with multiple HDDs a task can use one disk for input, one for output and one for intermediate (spill) data.
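That one-disk-per-task layout shows up directly in Hadoop's configuration: HDFS takes a comma-separated list of data directories, ideally one per physical disk. A minimal sketch, assuming the 0.20-era property name and hypothetical /data1../data3 mount points:

```xml
<!-- hdfs-site.xml: spread HDFS block storage across the physical disks.
     The datanode round-robins new blocks across these directories. -->
<property>
  <name>dfs.data.dir</name>
  <value>/data1/hdfs,/data2/hdfs,/data3/hdfs</value>
</property>
```

Each entry should sit on its own spindle; pointing several at the same disk just defeats the purpose.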
Pretty much every production cluster claims to use Linux (usually RHEL or CentOS 5.x) and the Sun JVM, and rarely the latest edition. Filesystem: ext3 with the noatime option. These big clusters try to stay in sync on OS/JVM versions, as everyone wants to avoid finding bugs first. Some people (LinkedIn) use Hadoop on Solaris.
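The ext3-with-noatime setup mentioned above is just a mount option that stops the kernel rewriting access timestamps on every read, which matters when a datanode is hammering the disks. An illustrative /etc/fstab line for one data disk (device name and mount point are made up for the example):

```
# data disk for HDFS: ext3, no access-time updates, fsck after the root fs
/dev/sdb1   /data1   ext3   defaults,noatime   0 2
```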
Hadoop is set up to build on the Mac, and it's easier than on Windows, where you need Cygwin installed. Nobody admits to running Hadoop in production on Windows or Mac OS, because of cost and because you get to find the bugs yourself, and fix them. And of course, even if Apple are secretly doing their own high-end server motherboards with the disks and CPUs to compete with the datacentre-specific kit, Apple would have to port their OS to their own or purchased hardware, with even more debugging fun.
Assume, therefore: Linux on hardware from somebody who can do proper datacentre kit. That is, unless the Apple hardware team have just told the ops team that they need to come up with a plan to mount 1,000 Mac Pro boxes on their side in an earthquake-safe form. Oh, and they need to get 5-11 extra disks into each box.
It amazes me how fast Linux moves in the datacenter.
If the rate stays like this, nobody will use Windows or UNIX in datacenters in a decade.
Apple is just another leech on Java
Apple officially "deprecated" Java on their desktops, making it clear they don't want to spend any money on maintaining it.
Now they will become big-time Java users in their datacenters, likely running the stuff on Linux servers.
I know that desktops are a different business than servers, but seeing this, I would be hard pressed to believe any of Steve Jobs' claims of technological superiority for OS X and Apple hardware in general.
One could say that Apple hardware and software is DEPRECATED, since even they are moving to Java and Linux.
But what do Apple run their servers on currently anyway?
Making it up as you go along?
[quote]We are looking for senior Hadoop engineer to be part of a dynamic team building highly performant and scalable applications.[/quote]
It's a perfectly cromulent word.
A Hadoop cluster like the one needed to manage the sheer volumes iAds might generate surely spans multiple racks.
You'd better have serious networking and decent throughput (network to disk and vice versa) to pump a decent amount of splits and spills around your nodes. Multiple disks are used natively by the shuffle in Hadoop to spread local I/O on each node.
Definitely, this might rule the Minis out :)
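The shuffle's spreading of local I/O is configuration-driven too: map-side spill files land in whichever of the listed local directories has room, so you give it one directory per disk. A sketch, again assuming the 0.20-era property names and illustrative paths:

```xml
<!-- mapred-site.xml: one shuffle/spill directory per physical disk -->
<property>
  <name>mapred.local.dir</name>
  <value>/data1/mapred,/data2/mapred,/data3/mapred</value>
</property>
<property>
  <!-- in-memory sort buffer (MB) filled before a map task spills to disk -->
  <name>io.sort.mb</name>
  <value>200</value>
</property>
```

A bigger sort buffer means fewer spill files to merge, at the cost of heap in each map task.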
Regarding OSX and Hadoop:
The story of "bugs" that might affect OS X and not Linux... pretty weak argument.
Hadoop is mostly Java-based, except for the roughly 10% of code needed for stuff like encryption, compression and streams. Installing from source needs some packages, but the same might apply as soon as you stray from the best-supported Linux distros.
Personal experience is a 15-node cluster on OS X; not the biggest in the world, but it works and is even provisioned by Puppet. We noticed, however, that OS X's native filesystem is sometimes quirky performance-wise and its NFS implementation needs some reworking.
Biggest advantage of OS X, at least for my customer, was that the nodes can be used as workstations when not crunching logs at night!