Looks like someone took a page out of NetApp's book.
"We don't have it, so you definitely don't need it!"
Cue an announcement in a year when they finally get around to introducing it.
Storage array feeding of server flash caches is not needed for high-performance computing because network latency is negligible - according to parallel storage biz Panasas. Geoffrey Noer, Panasas's senior product marketing director, tells the Reg: "We are not under pressure from our customers to deliver more performance. In …
HPC has always been thus. I/O isn't random access to lots of little bits of data; it is massive broadside access to very, very large lumps of ordered data. Latency of access is swamped by the transmission time. Optimising for access latency simply ignores Amdahl's Law. Most caching strategies don't apply. Data is very often only read once. Optimising data layout, order, prefetch, and matching bandwidths: this is where you win.
Same for deduplication. Science data is inherently not internally correlated, except for the case where finding hidden correlations is the entire point of the computation in the first place. Enterprise-level dedupe doesn't get any traction at all; it just slows things down and costs money.
Just the nature of the game.
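To put rough numbers on the "latency is swamped by transmission time" point, here is a minimal sketch. The 1 ms latency and 1 GB/s bandwidth are illustrative assumptions, not measurements from any Panasas system, and the function names are made up for this example.

```python
def latency_fraction(latency_s: float, size_bytes: float,
                     bandwidth_bytes_per_s: float) -> float:
    """Fraction of one request's total time spent on access latency."""
    transfer_s = size_bytes / bandwidth_bytes_per_s
    return latency_s / (latency_s + transfer_s)


def max_speedup_if_latency_removed(latency_s: float, size_bytes: float,
                                   bandwidth_bytes_per_s: float) -> float:
    """Amdahl-style upper bound: speedup if latency dropped to zero."""
    f = latency_fraction(latency_s, size_bytes, bandwidth_bytes_per_s)
    return 1.0 / (1.0 - f)


if __name__ == "__main__":
    LATENCY_S = 1e-3   # assumed access latency
    BANDWIDTH = 1e9    # assumed 1 GB/s link
    for size in (4096, 1e6, 1e9):  # small block, 1 MB, 1 GB
        frac = latency_fraction(LATENCY_S, size, BANDWIDTH)
        bound = max_speedup_if_latency_removed(LATENCY_S, size, BANDWIDTH)
        print(f"{size:>12.0f} B  latency share {frac:.4f}  max speedup {bound:.3f}x")
```

Under those assumptions the small-block case is dominated by latency, while for the gigabyte-sized read removing latency entirely buys almost nothing, which is the Amdahl's Law argument in one line.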
An interesting challenge! The impact of latency depends on workload and data protection policies. Just tossing in a flashcache isn't the solution. Write-intensive apps need to figure out how to duplicate data on different nodes, so that a flashcache failure doesn't cause data loss. This implies a fast, low-latency network like IB-RDMA. Where the caches sit is immaterial, since IO-Complete is when the remote copy is written. So, in this situation, an array-based solution and a server-based flashcache would be similar in performance, assuming the array can service the writes to disk fast enough.
Read operations are different. If the data is in the flashcache, it will be available much faster. The 'if' is the issue. With huge datasets in HPC, the cache may not hold the needed data. Again, the accessible cache can be expanded by networking between servers, but economics dictate that the cache-to-data ratio is low. So, the contention that, in HPC, flashcaches are of limited incremental value is likely correct.
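The read-side argument can be sketched as a simple expected-latency calculation. This assumes uniformly random reads over the dataset, which is a simplification (read-once streaming would give an even lower hit ratio), and the latency figures in the usage lines are placeholders rather than measured values.

```python
def hit_ratio_uniform(cache_bytes: float, dataset_bytes: float) -> float:
    """Hit ratio if reads are uniformly random over the dataset (assumption)."""
    return min(1.0, cache_bytes / dataset_bytes)


def effective_read_latency(hit_ratio: float, cache_latency_s: float,
                           backend_latency_s: float) -> float:
    """Expected per-read latency for a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_latency_s + (1.0 - hit_ratio) * backend_latency_s


if __name__ == "__main__":
    # Placeholder figures: 1 TB of flash cache in front of a 100 TB dataset.
    h = hit_ratio_uniform(1e12, 100e12)
    print(f"hit ratio {h:.2%}")
    print(f"effective latency {effective_read_latency(h, 1e-4, 1e-2):.6f} s")
```

With a low cache-to-data ratio the effective latency stays close to the backend figure, so the flashcache adds little for this access pattern.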
However, based on work I've done, flashcaching metadata is useful in speeding up create/open/close operations, since the metadata is either on a single server (so that flashcache gives needed speed) or distributed (so that networked flashcaches help solve the coherency problem).
It looks like both viewpoints have some merit, but it does come down to use-case specifics.
Metadata. Ah yes. Good point about the metadata. That is going to be one place where flash might help. However, it may start to be better to just cache it directly in system memory. There might simply not be enough metadata to warrant any optimised secondary storage for it at all. The breakpoint is likely a moving target. Reliability questions make things a little less clear-cut.
I agree. There's a spectrum of use cases for metadata too. With COTS motherboard DRAM capacity expanding to 512 GB or more with Sandy Bridge, the opportunity to provide DRAM metadata caching in small clusters has improved. 512 GB is still not enough for very large stores (and the idea of a single metadata stack of such large size is daunting from a performance viewpoint), so some sort of tiering is likely to be needed.
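As a back-of-the-envelope check on where DRAM stops being enough, here is a small sketch. The 1 KiB of metadata per file is purely an assumed figure for illustration; real per-entry sizes vary by file system.

```python
def entries_that_fit(capacity_bytes: int, bytes_per_entry: int) -> int:
    """Number of metadata entries that fit in a tier of the given capacity."""
    return capacity_bytes // bytes_per_entry


def split_across_tiers(n_entries: int, bytes_per_entry: int,
                       dram_bytes: int) -> tuple[int, int]:
    """Return (entries held in DRAM, entries spilled to a flash tier)."""
    in_dram = min(n_entries, entries_that_fit(dram_bytes, bytes_per_entry))
    return in_dram, n_entries - in_dram


if __name__ == "__main__":
    DRAM_BYTES = 512 * 10**9   # the 512 GB figure from the comment above
    ENTRY_BYTES = 1024         # assumed metadata size per file (hypothetical)
    for n_files in (10**8, 10**9, 10**10):
        dram, flash = split_across_tiers(n_files, ENTRY_BYTES, DRAM_BYTES)
        print(f"{n_files:>14,} files: {dram:,} in DRAM, {flash:,} spill to flash")
```

Under that assumption a few hundred million files fit in DRAM, and anything in the billions needs a second tier, which is roughly where the tiering argument kicks in.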
It would be interesting to apply this debate to small and large cloud setups. Object store metadata grows to large sizes, so a flash architecture tends to make sense. This would certainly be the case in a store approaching Amazon S3 in size. Similarly, large tiered streaming stores require a fast DRAM-based tier for hot sites, and a secondary flashcache tier also makes sense. In both cases, one can argue that the flash could be in either the servers or the arrays, since both can have Internet protocols and connectivity.
I think this debate is a result of x86/Ethernet architectures becoming the norm in storage array front-ends, offering many more connectivity/topology options. Effectively, we are configuring storage to a new set of rules, and this creates great opportunities, and no small amount of heartburn, for the sysadmins and IT architects.
Actually, Panasas is coming out with new director blades that use flash to store more of the metadata. They had this before; I believe they were series 9. The new ones, though, are supposed to put only the metadata on the SSDs, so they can push the number of operations up. So Panasas does have a plan for flash, just not in the traditional way.
However they have performance issues.
1.) dealing with unstructured data
2.) cache coherency issues, caused by anything from the client servers being too busy to the director blades being pegged.
3.) working with very small files.
4.) Ever have a double blade failure? While some processes/jobs can be run again, if they have been running for months and months you don't want to start over. Object RAID 6 (or vertical parity 2) as an option when creating volumes would help.
Things I would like to see:
1. Panasas needs to figure out a way to load balance director blades dynamically.
2. Each volume having its own service ID, so you can manually mix and match the best load for each director blade.
An obvious fix for the performance increase/cache coherency issue is to increase the number of director blades for performance, but give them a dedicated 10GE or IB back-channel link to improve cache operations.
Load-balancing likely involves getting into the switch and routing the load round-robin, if that is an option.
Of course, Panasas topology needs to support these types of approaches.
Hi! Some good comments here. The bottom line is that the requirements for Enterprise IT and HPC are normally very different from each other. So considering an Enterprise IT-focused flash-based storage system with features like compression and de-duplication, it’s pretty unlikely that such a product would be equally applicable to HPC storage. Flash is simply too costly for it to be used for the predominantly large file throughput requirements in HPC where storage is usually measured in a combination of GB/s and $/TB. However, there is a very interesting role to be played for flash memory in accelerating small file IOP-focused workloads in HPC. The same goes for speeding file system metadata performance which is useful for both large file throughput and small file IOP HPC workloads.