Feeds

* Posts by random_graph

10 posts • joined 19 Nov 2009

SAN-free, NAS-free? Scottish PHDs lift kilt on how they'll pull storage out of the aether

random_graph

Oh man, not another one of these

This kind of thing has been tried and has failed in the past. Noobaa.com is another one, fortunately at least with the good sense to go after the consumer space rather than the commercial one.

One thing about P2P is that one must compensate for flaky and non-persistent end-nodes. 4x may sound like a lot of redundancy, but what happens when all the employees go home for the evening? Whoops! Even with things like erasure coding, P2P overprovisioning always needs to be prohibitively high. Try convincing a user that he needs to contribute 100GB to get 25GB of P2P capacity (yeah, but it's FREE!).
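
To make the arithmetic concrete, here's a rough back-of-envelope sketch in Python (all numbers are my own illustrative assumptions, not anyone's published model) of why both the raw-to-usable ratio and availability get ugly when end-nodes are only online part of the day:

from math import comb

def usable_fraction(k, m):
    """Usable/raw capacity for k data + m redundancy fragments."""
    return k / (k + m)

def object_availability(k, m, p_online):
    """Probability that at least k of the k+m fragments are reachable,
    assuming independent node uptime p_online."""
    n = k + m
    return sum(comb(n, i) * p_online**i * (1 - p_online)**(n - i)
               for i in range(k, n + 1))

# 4x replication: contribute 100GB, get ~25GB usable.
print("4x replication usable from 100GB:", 100 * usable_fraction(1, 3), "GB")

# Even a generous 10+6 erasure code suffers once the laptops go home at night.
for p in (0.95, 0.7, 0.5):
    print(f"10+6 EC, node uptime {p:.0%}: availability {object_availability(10, 6, p):.4f}")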

And to use this as a VSAN built out of WAN-connected laptops? Yeah, good luck convincing the CEO whose Exchange server just died that saving $25K by skipping that array purchase was the way to go.


Flash data storage: Knocking TAPE off the archiving top spot

random_graph

Multiple use-cases, multiple architectures

There are 3 drivers for long-term retention of digital content:

1) I want to reuse, repurpose, and re-license (eg movie studio)

2) I want to analyze (eg Amazon and Facebook)

3) I want to preserve because that's what I do (US-LOC) or because I promised (Shutterfly)

Each brings different sensitivities in terms of performance SLA, cost, and data integrity. For use-cases 1 & 3, tape-based latencies are entirely acceptable as long as sequential performance is good enough for bulk data operations. Analytics will almost always need more consistent SLAs.

In all cases however, the placement, maintenance, and performance expectations for *metadata* are much more aggressive than the SLA and placement rules for the asset itself. For these sorts of storage solutions, the query has always dominated as the first step in data IO, although to date it has most often been performed against a host database. In the future, storage solutions optimized for these use-cases should recognize that semantic layer through distributed indexing mechanisms implemented in flash. And although the LTFS file-system may be useful at a component level, it is inadequate for the task at an aggregate solution level.
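
A toy sketch of that query-first pattern (made-up schema, tape IDs, and numbers; nothing vendor-specific): the metadata index lives on fast media and answers the query, and only then does the slow tier get touched, in one sorted sequential pass.

import sqlite3

# In-memory SQLite standing in for a flash-resident metadata index;
# the assets themselves live on high-latency bulk media such as tape.
index = sqlite3.connect(":memory:")
index.execute("""CREATE TABLE assets (
    asset_id TEXT PRIMARY KEY,
    title    TEXT,
    codec    TEXT,
    tape_id  TEXT,
    offset   INTEGER)""")
index.executemany("INSERT INTO assets VALUES (?,?,?,?,?)", [
    ("a1", "nightly_news_2014_06_01", "mpeg2", "LTO6-0042", 0),
    ("a2", "feature_master_reel_3",   "dnxhd", "LTO6-0042", 1 << 30),
])

def find_assets(codec):
    # Millisecond-class metadata query; no tape mount required.
    return index.execute(
        "SELECT asset_id, tape_id, offset FROM assets WHERE codec = ?",
        (codec,)).fetchall()

def bulk_recall(hits):
    # Only now touch the slow tier, sorted by tape and offset so the recall
    # is one long sequential pass per cartridge.
    for asset_id, tape_id, offset in sorted(hits, key=lambda h: (h[1], h[2])):
        print(f"recall {asset_id} from {tape_id} at byte {offset}")

bulk_recall(find_assets("mpeg2"))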


Death by 1,000 cuts: Mainstream storage array suppliers are bleeding

random_graph

It's those dang VMware guys

Diversity in networked storage solutions was always a function of workload diversity. But with virtualization reaching 100%, the Hypervisor is the only client that midmarket storage vendors need to design for these days. Of course there are still differing performance profiles for different guest workloads, but the point is that the host environment (networking, provisioning, protocols...) has gotten much simpler. So these 3 things represent the bulk of all cannibalization of the legacy array market:

1) Practical adoption costs for using public cloud are much more reasonable once you're virtualized

2) Startups can optimize strictly for the virtualized use-case (thus creating differentiation)

3) The hypervisors can handily disintermediate the value chain (EVO-Rail!) and put everyone else out of business (or force them into OEM servitude)

So what's left for the storage vendors? Probably just the vertical workloads (Web2, M&E, HCLS, GIS, analytics, PACS, etc etc).


It's ALIVE: Unstructured data upstart whips out data-AWARE array

random_graph

Interesting, just not sure it's solving a real problem

Some interesting new things here, just not sure there's a large market for the unique capabilities. And without the unique capabilities being leveraged, it looks functionally like Isilon, just 13 years late. It seems like text search (not attribute), native audit trails, filter-based tagging, and data demographics are the unique things here. Every SysAdmin wants better data demographics and solid audit, but how many SysAdmins care about eDiscovery or content search? That would be the LOB owner or General Counsel most likely, people already accustomed to host-side tools. Another surprise is the commitment to proprietary hardware; you'd think the SDS buzz would drive them in the opposite direction. Guess it's the EqualLogic DNA in the founders.

Definitely not the first system to do cool metadata tricks or full-text indexing (see HCP, Caringo, Honeycomb...). The coolest thing I see them doing is that by reverse-engineering the VMDK file format they can extract embedded file metadata and allow individual file restores. The tagging is also cool. All bets are off though if they're not using the file interface. Unless they install a hypervisor plug-in, they can't get any intelligence out of the iSCSI interface.
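
Their actual VMDK handling isn't public, so take this as a minimal sketch of the general idea only: a flat VMDK extent is essentially a raw disk image, and the first step toward guest file metadata is just parsing the on-disk structures (here an MBR partition table; the filename is hypothetical).

import struct

def read_mbr_partitions(image_path):
    """Read the MBR partition table from a raw (flat) disk-image extent.
    Real VMDKs add descriptor files and sparse extents, and a real
    implementation would then walk the guest file system (NTFS MFT,
    ext4 inodes, ...) to reach per-file metadata."""
    with open(image_path, "rb") as f:
        mbr = f.read(512)
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        return []                      # not an MBR-partitioned flat image
    parts = []
    for i in range(4):
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = entry[4]
        lba_start, sectors = struct.unpack_from("<II", entry, 8)
        if ptype:
            parts.append({"type": hex(ptype),
                          "start_byte": lba_start * 512,
                          "size_bytes": sectors * 512})
    return parts

# e.g. read_mbr_partitions("guest-flat.vmdk")   # hypothetical extent file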


Dell's new Compellent will make you break down in tiers... of flash

random_graph

Re: dedupe block or file based?

The technology is variable block, sliding window, plus compression, implemented in the file-system. So it happens at the file layer, but it's definitely not SIS. And yes, it was developed by the Ocarina team in Santa Clara.
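
For anyone unfamiliar with the terms, here's a generic toy sketch in Python of what "variable block, sliding window, plus compression" means (this is not the FluidFS/Ocarina implementation, just the textbook technique): a rolling hash over a sliding window picks chunk boundaries from content, duplicate chunks are stored once, and the survivors get compressed.

import hashlib
import zlib

BASE = 257
MOD = (1 << 31) - 1
WINDOW = 48                   # bytes in the sliding window
MASK = (1 << 12) - 1          # cut when low 12 bits match -> ~4 KiB average chunk
MIN_CHUNK, MAX_CHUNK = 1 << 10, 1 << 16

def chunks(data):
    """Content-defined chunking: cut points depend on local content, so an
    insert near the start of a file shifts bytes without invalidating every
    downstream chunk the way fixed-size blocks would."""
    pow_w = pow(BASE, WINDOW, MOD)
    h, start = 0, 0
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * pow_w) % MOD   # slide the oldest byte out
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == MASK) or size >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]

def dedupe_and_compress(data):
    store, recipe = {}, []
    for c in chunks(data):
        fp = hashlib.sha256(c).hexdigest()
        store.setdefault(fp, zlib.compress(c))   # store each unique chunk once
        recipe.append(fp)                        # ordered fingerprints rebuild the file
    return store, recipe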

random_graph

Post-proc vs in-band

We disagree with your assessment, Anonymous. As you know, Dell already ships products with in-band dedupe and compression, so this is not a technical barrier. In-band is an appropriate implementation for secondary storage. In primary storage, the demand histogram invariably follows a 10/90 (or even 1/99) rule. So, like tiering and caching strategies, FluidFS implements a more elegant data reduction design that aligns with the information lifecycle, applying more aggressive savings to the cold 90-99% while maximizing performance for the hot 1-10%. And of course your policy settings are adjustable. [Dell storage dev]
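
Purely as an illustration of the policy idea (this is a toy, not FluidFS code, and the thresholds are invented): a post-process sweep leaves the hot working set untouched and applies progressively heavier reduction as data goes cold, with the windows exposed as adjustable policy settings.

import time
import zlib

POLICY = {
    "hot_window_days": 14,     # leave alone: recently accessed data
    "cold_window_days": 90,    # light compression between hot and cold
}                              # beyond cold_window: aggressive dedupe + compression

def tier_for(meta, now=None):
    idle_days = ((now or time.time()) - meta["atime"]) / 86400
    if idle_days < POLICY["hot_window_days"]:
        return "leave-alone"
    if idle_days < POLICY["cold_window_days"]:
        return "compress"
    return "dedupe+compress"

def post_process(files):
    for meta in files:
        action = tier_for(meta)
        if action == "compress":
            meta["data"] = zlib.compress(meta["data"], level=1)   # cheap pass
        elif action == "dedupe+compress":
            meta["data"] = zlib.compress(meta["data"], level=9)   # aggressive pass
            # a real system would chunk, fingerprint, and dedupe here as well
        meta["reduction"] = action

# e.g. post_process([{"atime": time.time() - 200 * 86400, "data": b"..."}])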


Looking closely at HP's object storage: Questions Answered

random_graph

What is actually implemented here?

It looks to me like StoreAll is merely the marriage of Autonomy and the iBrix file system. But there's no reason to believe the integration goes any deeper than physical. They are merely running index, search, and the RESTful API on the iBrix segment servers. iBrix is no more object- or metadata-aware than a NetApp filer.

In regards to what constitutes "object", there's no reason object and NAS need to be distinctly defined. The most popular object storage in the world is Isilon, which uses Reed-Solomon coding and distributed placement algorithms with self-healing. OK, the fact that it's accessed through a hierarchical FS puts it in the NAS camp. But even the file system interface can be merely an expression of object metadata. For example, Sun's Honeycomb could provide an NFS export whose hierarchy was purely a virtualization of a metadata schema...N file systems could be exported with N arrangements of the schema.
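
A toy illustration of that last point (metadata keys and object IDs are made up): the same set of objects can be exported as N different directory trees simply by reordering the metadata fields used to build each path.

objects = [
    {"id": "obj-001", "project": "apollo", "year": "2014", "type": "raw"},
    {"id": "obj-002", "project": "apollo", "year": "2013", "type": "proxy"},
    {"id": "obj-003", "project": "gemini", "year": "2014", "type": "raw"},
]

def export_view(objs, schema_order):
    """Return {virtual_path: object_id} for one arrangement of the schema."""
    return {
        "/" + "/".join(o[k] for k in schema_order) + "/" + o["id"]: o["id"]
        for o in objs
    }

print(export_view(objects, ["project", "year", "type"]))
print(export_view(objects, ["type", "year", "project"]))   # same objects, different tree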

I fully expect to see future NAS (and SAN) leveraging more and more object capabilities, and the availability of RESTful APIs representing "the new converged" regardless of underlying architecture.


Does anyone really want to embed dedupe code?

random_graph

Ocarina & Permabit

Great coverage on the respective issues and possible outcomes Chris.

Permabit and Ocarina are both thinking along the right lines; dedupe is a killer technology for primary storage, and it will increasingly be an embedded feature.

Permabit's comments about Ocarina seem a bit out of place though, "...but our technology is mature and being delivered..."

- Permabit was formed 9 years ago to make a better disk storage system. Dedupe was a feature added onto their object storage system <1 year ago...Now, in the latest redo, they're throwing out the product to try to create success on the feature. Ocarina, on the other hand, started life focused 100% on data reduction for primary storage.

- Permabit's dedupe is <1 year old, and it's never shipped integrated with anything other than Permabit's storage box. Ocarina has been shipping for 2 years, and *every* delivery was integrated with someone else's storage system (Ocarina doesn't store the data).

- Ocarina delivers dedupe *and* compression (a big portfolio of algorithms at that), with proven results on over 1500 file types, including pre-compressed data and specialized data sets in different industries. Furthermore, the end-to-end strategy that Ocarina is talking about is really a next-generation architecture. Permabit's feature-now-product has a long way to go in technical sophistication to catch up to Ocarina.


Ocarina compresses Flash and MPEG2s

random_graph

Lossy & Lossless - both available

[I work for Ocarina]

Thanks for everyone's comments.

Just to clarify what we're doing: for web-distributed file types (GIF, JPG, FLV (H.264)) we optionally use lossy techniques. Some lossy opportunities we use are reduction of non-visual info (eg Huffman table optimization), spatial optimization (aligning DCT quantization with the HVS), better macroblock management, motion compensation, and more intelligent VBR. Our intent is to never introduce visual artifacts, and we even have some portrait studios (making big before/after prints) who have validated the algorithms. With this 'visually lossless' approach, we keep the files on disk in their native format so customers can capture benefits not just in storage savings, but also bandwidth reduction and page-load-time improvements.
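
Our pipeline is proprietary, but the "non-visual info" slice of the idea can be shown with stock tooling: jpegtran (from libjpeg/libjpeg-turbo, assumed to be installed) rebuilds optimal Huffman tables without touching the DCT coefficients, so the decoded pixels are bit-identical while the file shrinks.

import subprocess
from pathlib import Path

def optimize_jpeg_huffman(src: Path, dst: Path) -> float:
    """Losslessly shrink a JPEG by re-deriving optimal Huffman tables
    (and dropping non-image metadata); returns the fractional size saving."""
    subprocess.run(
        ["jpegtran", "-optimize", "-copy", "none", "-outfile", str(dst), str(src)],
        check=True,
    )
    return 1 - dst.stat().st_size / src.stat().st_size

# e.g. optimize_jpeg_huffman(Path("portrait.jpg"), Path("portrait_opt.jpg"))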

For production workflows where loss generally isn't desired, we apply a fully bit-for-bit lossless workflow and use all the proprietary compression we can for maximum reduction. For ingest formats like DV we can get 50% or more. For MPEG2 we're seeing around 20-30% at beta customers...enough to be meaningful for, say, a broadcaster's archival system.

And we definitely don't rig the tests ;-) the results are based on customer data-sets only. We work across a thousand file types (so far) and no one here has time to craft a bunch of application-specific data-sets from scratch. Results will vary from customer to customer, and someone who is a real codec expert can almost certainly approximate our results on a specific file type. But we find in practice people don't do that, and that still doesn't provide a scalable dedupe & compression platform that also works on all the other 100 file types in a given customer's workflow, and integrates well with their existing storage system.

I wrote the white paper on Native Format Optimization that talks about the visually lossless approach. I think you have to fill out a form to get it, but you can check it out at www.ocarinanetworks.com


How can the storage industry prevent cloud bursts?

random_graph

Cloud Storage means everything to everyone

Given the wide variety of definitions for the term, the intersection of the justifications for the technology can easily work out to zero. Here are the definitions currently in circulation:

1) Cloud = any web service where a user's data is retained (incl Facebook, Goog docs, etc)

2) Cloud = storage in a utility-based pricing model ($/GB/month)

3) Cloud = a storage technology with 'cloud' attributes: scalable, self-healing, low cost, extra failure-resilient, implemented either within an enterprise or over the Internet.

The last definition is the key one...Whether the storage platform is internally hosted or externally hosted, the customer requires technical due diligence and transparency. If they don't get it, they'll choose to walk away! Thus storage vendors (including SSPs) will learn that transparency and SLAs are key requirements for maintaining market share.

Everything regarding a) where it is hosted and b) how you are charged for it is simply an implementation detail, and a function of the providers' business models.
