Some of the UK academic supercomputers run GPFS; it costs a lot for 50 TB of storage, whereas you can get that same usable capacity without needing full-time suits on site by running Apache Hadoop over 200 TB of HDD (assuming 4 TB per server node, with 1 TB of that held back for the OS and MapReduce job temp files). GPFS relies on high-end RAID storage and an InfiniBand backbone, and costs a lot to expand.
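To make the comparison concrete, here is a back-of-the-envelope sketch of the Hadoop side. The per-disk figures and the 3x replication factor are illustrative assumptions (3x is the stock HDFS default), not quotes from any site:

```python
# Rough usable-capacity maths for an HDFS cluster.
# All numbers are assumed round figures for illustration.

def hdfs_usable_tb(nodes, disk_per_node_tb=4, overhead_tb=1, replication=3):
    """Usable HDFS capacity: raw disk minus OS/temp overhead per node,
    divided by the HDFS block replication factor (default 3)."""
    raw_tb = nodes * (disk_per_node_tb - overhead_tb)
    return raw_tb / replication

# 50 nodes x 4 TB = 200 TB of HDD; holding 1 TB/node back for the OS and
# MapReduce temp files leaves 150 TB raw, ~50 TB after 3x replication.
print(hdfs_usable_tb(50))  # → 50.0
```

Which is roughly why 200 TB of commodity HDD ends up in the same league as 50 TB of GPFS: replication eats most of the difference, but the hardware underneath is far cheaper.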
But with InfiniBand it delivers fantastic data rates to any node in the cluster, unlike Google GFS and Hadoop HDFS, which only deliver local disk rates to code running on a single node. With GPFS, your disk bandwidth to a single node scales up the more disks you add.
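That bandwidth difference can be sketched as a toy model: a SAN-style filesystem like GPFS can stripe one client's reads across every disk in the cluster (up to the fabric's limit), while an HDFS-style design gives each task roughly the throughput of its node's local disks. The per-disk and link figures here are assumed round numbers, not benchmarks:

```python
# Toy model of single-client read bandwidth, in MB/s.
# All figures are illustrative assumptions.

DISK_MB_S = 80        # assumed sequential throughput of one HDD
DISKS_PER_NODE = 3    # assumed data disks per worker node

def gpfs_single_client_mb_s(total_disks, fabric_limit_mb_s=2000):
    """Striped across every disk in the cluster, capped by the
    client's InfiniBand link."""
    return min(total_disks * DISK_MB_S, fabric_limit_mb_s)

def hdfs_single_task_mb_s():
    """Only the node's local disks count for a task reading local blocks."""
    return DISKS_PER_NODE * DISK_MB_S

print(gpfs_single_client_mb_s(50 * DISKS_PER_NODE))  # → 2000 (fabric-bound)
print(hdfs_single_task_mb_s())                       # → 240 (disk-bound)
```

The flip side, of course, is that HDFS gets its aggregate bandwidth by running the code next to the data on every node at once, so the comparison only bites when a single consumer needs the whole stream.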
So, tradeoff. Hadoop: free, needs storage near your CPUs, "commodity" x86 and GbE, proven to scale to petabytes. GPFS: top-of-the-line hardware; I doubt anyone could afford to scale it up to many petabytes today. Then there's XIV, which sounds like something in between: more commodity hardware, less scalability. Given the engineering effort being put into Hadoop (with no involvement from IBM in the filesystem, incidentally), it and the layers of code near it will have the mass petabyte datastore market. Someone had better be writing a bridge so that Hadoop can run MapReduce code against XIV storage, the way they have done for GPFS.