* Posts by random_graph

6 posts • joined 19 Nov 2009

Dell's new Compellent will make you break down in tiers... of flash


Re: dedupe block or file based?

The technology is variable block, sliding window plus compression, implemented in the file-system. So it happens at the file layer, but definitely not SIS. And yes developed by the Ocarina team in Santa Clara.
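For readers unfamiliar with the approach, the variable-block, sliding-window idea can be sketched in a few lines. This is a toy illustration of content-defined chunking with a rolling hash, not Dell's or Ocarina's actual algorithm; all constants and names here are invented for the example.

```python
# Content-defined ("variable block, sliding window") chunking sketch.
# A rolling hash over a small window picks chunk boundaries from the data
# itself, so an insert near the start of a file only perturbs nearby chunks
# instead of shifting every fixed-size block.
import hashlib

WINDOW = 16              # bytes in the sliding window
MASK = 0x3FF             # cut when hash & MASK == 0 -> ~1 KiB average chunks
MIN_CHUNK, MAX_CHUNK = 256, 4096
B, MOD = 257, 1 << 32    # toy Rabin-Karp parameters
POW = pow(B, WINDOW, MOD)

def chunks(data: bytes):
    """Yield variable-size chunks with content-defined boundaries."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = (h * B + byte) % MOD
        if i >= WINDOW:                      # slide: drop the oldest byte
            h = (h - data[i - WINDOW] * POW) % MOD
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]

def dedupe(data: bytes):
    """Key each chunk by a strong hash; identical chunks are stored once."""
    store, recipe = {}, []
    for c in chunks(data):
        key = hashlib.sha256(c).hexdigest()
        store.setdefault(key, c)
        recipe.append(key)
    return store, recipe
```

Because boundaries depend only on local content, two copies of the same region resynchronize to the same chunks even after an insertion between them, which is what makes this strictly stronger than fixed-block dedupe.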


Post-proc vs in-band

We disagree with your assessment, Anonymous. As you know, Dell already ships products with in-band dedupe and compression, so this is not a technical barrier. In-band is an appropriate implementation for secondary storage. In primary storage, the demand histogram invariably follows a 10/90 (or even 1/99) rule. So, like tiering and caching strategies, FluidFS implements a more elegant data-reduction design that aligns with the information lifecycle, applying more aggressive savings to the cold 99% while maximizing performance for the hot 1%. And of course your policy settings are adjustable. [Dell storage dev]
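The lifecycle-aligned approach described above can be sketched as a post-process scrub that compresses only data that has gone cold, leaving the hot working set untouched for performance. The threshold and structure here are illustrative assumptions, not FluidFS's actual policy or settings.

```python
# Toy post-process data-reduction policy: aggressive reduction for cold
# data, zero overhead for the hot fraction. Illustrative only.
import time
import zlib
from dataclasses import dataclass
from typing import Optional

COLD_AFTER_SECS = 12 * 3600   # adjustable policy knob (invented value)

@dataclass
class Block:
    data: bytes
    last_access: float
    reduced: Optional[bytes] = None

def scrub(blocks, now=None):
    """Post-process pass: compress blocks idle longer than the policy
    window. Returns the number of bytes saved in this pass."""
    now = time.time() if now is None else now
    saved = 0
    for b in blocks:
        if b.reduced is None and now - b.last_access > COLD_AFTER_SECS:
            packed = zlib.compress(b.data, level=9)
            if len(packed) < len(b.data):    # keep only real savings
                b.reduced = packed
                saved += len(b.data) - len(packed)
    return saved

def read(b: Block) -> bytes:
    """Reads rehydrate transparently; hot data pays no decompress cost."""
    b.last_access = time.time()
    return zlib.decompress(b.reduced) if b.reduced else b.data
```

The point of the design is visible in the read path: the 1% that is hot never touches the decompressor.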


Looking closely at HP's object storage: Questions Answered


What is actually implemented here?

It looks to me like StoreAll is merely the marriage of Autonomy and the iBrix file system, and there's no reason to believe the integration goes any deeper than physical. They are merely running index, search, and the RESTful API on the iBrix segment servers. iBrix is no more object- or metadata-aware than a NetApp filer.

As for what constitutes "object", there's no reason object and NAS need to be distinctly defined. The most popular object storage in the world is Isilon, which uses Reed-Solomon coding and distributed placement algorithms with self-healing. OK, the fact that it's accessed through a hierarchical FS puts it in the NAS camp. But even the file-system interface can be merely an expression of object metadata. For example, Sun's Honeycomb could provide an NFS export whose hierarchy was purely a virtualization of a metadata schema...N file systems could be exported with N arrangements of the schema.
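The Honeycomb-style idea is easy to demonstrate: the directory tree is just one projection of object metadata, so the same object store can present N different hierarchies by reordering the schema. The objects, fields, and paths below are invented for illustration.

```python
# A flat object store: IDs mapped to metadata (example data, not a real schema).
objects = {
    "obj-001": {"artist": "Miles", "year": "1959", "format": "flac"},
    "obj-002": {"artist": "Miles", "year": "1970", "format": "mp3"},
    "obj-003": {"artist": "Nina",  "year": "1959", "format": "flac"},
}

def export_view(schema):
    """Return {virtual_path: object_id} for one arrangement of the schema.
    Each ordering of metadata keys yields a different 'file system'."""
    view = {}
    for oid, meta in objects.items():
        path = "/" + "/".join(meta[k] for k in schema) + "/" + oid
        view[path] = oid
    return view

by_artist = export_view(["artist", "year"])   # /Miles/1959/obj-001, ...
by_year = export_view(["year", "format"])     # /1959/flac/obj-001, ...
```

Same three objects, two different trees: nothing about the hierarchy is stored, it is computed from metadata at export time.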

I fully expect to see future NAS (and SAN) leveraging more and more object capabilities, and the availability of RESTful APIs representing "the new converged" regardless of underlying architecture.


Does anyone really want to embed dedupe code?


Ocarina & Permabit

Great coverage on the respective issues and possible outcomes Chris.

Permabit and Ocarina are both thinking along the right lines; dedupe is a killer technology for primary storage, and it will increasingly be an embedded feature.

Permabit's comments about Ocarina seem a bit out of place, though: "...but our technology is mature and being delivered..."

- Permabit was formed 9 years ago to make a better disk storage system. Dedupe was a feature added onto their object storage system <1 year ago...Now, in the latest redo, they're throwing out the product to try to create success on the feature. Ocarina, on the other hand, started life focused 100% on data reduction for primary storage.

- Permabit's dedupe is <1 year old, and it's never shipped integrated with anything other than Permabit's storage box. Ocarina has been shipping for 2 years, and *every* delivery was integrated with someone else's storage system (Ocarina doesn't store the data).

- Ocarina delivers dedupe *and* compression (with a big portfolio of algorithms at that), with proven results on over 1500 file types, including pre-compressed data and specialized data sets in different industries. Furthermore, the end-to-end strategy that Ocarina is talking about is really a next-generation architecture. Permabit's feature-now-product has a long way to go in technical sophistication to catch up to Ocarina.


Ocarina compresses Flash and MPEG2s


Lossy & Lossless - both available

[I work for Ocarina]

Thanks for everyone's comments.

Just to clarify what we're doing: for web-distributed file types (GIF, JPG, FLV{h.264}) we optionally use lossy techniques. Some lossy opportunities we exploit are reduction of non-visual info (e.g. Huffman table optimization), spatial optimization (aligning DCT quantization with the HVS), better macroblock management, motion compensation, and more intelligent VBR. Our intent is never to introduce visual artifacts, and we even have some portrait studios (making big before/after prints) who have validated the algorithms. With this 'visually lossless' approach, we keep the files on disk in their native format, so customers capture benefits not just in storage savings but also in bandwidth reduction and page-load-time improvements.
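The "non-visual info" point is worth unpacking: a codec's default entropy-coding tables are generic, so rebuilding a Huffman code from a file's actual symbol frequencies shrinks the coded payload without changing a single decoded value. The toy below compares a fixed 8-bit code against a tuned Huffman code; it is a sketch of the general principle, not Ocarina's implementation.

```python
# Huffman table optimization in miniature: same symbols, same decoded
# output, fewer bits — the savings come purely from better tables.
import heapq
from collections import Counter

def huffman_lengths(freqs):
    """Return {symbol: code_length} for a Huffman code over freqs."""
    if len(freqs) == 1:                       # degenerate single-symbol case
        return {next(iter(freqs)): 1}
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    tiebreak = len(heap)                      # keeps tuple comparisons on ints
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:               # merged subtree: one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return lengths

def coded_bits(data: bytes):
    """Bits needed under a generic fixed-width code vs a tuned Huffman code."""
    freqs = Counter(data)
    lengths = huffman_lengths(freqs)
    tuned = sum(freqs[s] * lengths[s] for s in freqs)
    fixed = len(data) * 8
    return fixed, tuned
```

On skewed data (which image coefficient streams are), the tuned table wins decisively, and the decode remains bit-exact.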

For production workflows where loss generally isn't desired, we apply a fully bit-for-bit lossless workflow and use all the proprietary compression we can for maximum reduction. For ingest formats like DV we can get 50% or more. For MPEG2 we're seeing around 20-30% at beta customers...enough to be meaningful for, say, a broadcaster's archival system.

And we definitely don't rig the tests ;-) the results are based on customer data sets only. We work across a thousand file types (so far), and no one here has time to craft a bunch of application-specific data sets from scratch. Results will vary from customer to customer, and someone who is a real codec expert can almost certainly approximate our results on a specific file type. But we find in practice people don't do that, and that still doesn't provide a scalable dedupe & compression platform that also works on the hundred other file types in a given customer's workflow and integrates well with their existing storage system.

I wrote the white paper on Native Format Optimization that talks about the visually lossless approach. I think you have to fill out a form to get it, but you can check it out at www.ocarinanetworks.com


How can the storage industry prevent cloud bursts?


Cloud Storage means everything to everyone

Given the wide variety of definitions for the term, the intersection of their justifications for the technology can easily be made to be zero. Here are the definitions in circulation:

1) Cloud = any web service where a user's data is retained (incl Facebook, Goog docs, etc)

2) Cloud = storage in a utility-based pricing model ($/GB/month)

3) Cloud = a storage technology with 'cloud' attributes: scalable, self-healing, low cost, extra failure-resilient, implemented either within an enterprise or over the Internet.

The last definition is the key one...Whether the storage platform is internally or externally hosted, the customer requires technical due diligence and transparency. If they don't get it, they'll walk away! Thus storage vendors (including SSPs) will learn that transparency and SLAs are key requirements for maintaining market share.

Everything regarding a) where it is hosted and b) how you are charged for it is simply a detail of implementation, and a function of the provider's business model.