We are entering the data-aware infrastructure era

Last week, during SFD8, I got to meet two incredibly interesting storage startups: Cohesity and Coho Data. They do different jobs, targeting two different markets (primary and secondary storage), but both have developed one of the most compelling features on the market today… And I’m sure they’ll be …

The only new thing is...

So what they've done is embed the ability to "program" inside the storage controllers. How is this different from an external device accessing the data and performing miracles?

And, of course, what security is there if it's not part of an AD domain or Unix equivalent? Can anyone just "get at" the data now if they can access the APIs?


Anonymous Coward

Re: The only new thing is...

They'll probably market them under the Bullseye brand....

Silver badge

never used excel

I'm not a storage expert, but I've been managing 3PAR systems for nine years and have never once used Excel (or any other spreadsheet).

I do remember, back in 2004 or 2005, being at a company that had EMC (later HDS, and later NetApp) storage, and I saw the storage admin using Excel a lot for laying out LUNs and the like. I still recall that moment to this day: I saw that and decided I had no interest in storage (I was doing servers, networking, etc. at the time).

From what I read in the article, nothing much excites me. Probably because only one of the companies I have worked for had file data worth running any sort of analytics on. That company, however, was in the analytics business and wrote its own stuff (entirely custom at the time, though they were trying to move to Hadoop). So I don't think they would give up their own tooling for some fancy storage array that is too limited in what it can do (or doesn't offer enough scale, etc.). Some people called them big data at the time (5-6 years ago), but I didn't; they were processing maybe 10-15TB of fresh data a day, generally with a rolling 30-day window.

But when it comes to the vast bulk of my storage, which is entirely block based (whether VMware VMs or MySQL databases), there's really no way for any storage array (that I can think of, anyway) to have insight into that kind of data. What's it going to do, start up a MySQL instance on itself and go through the tables with no idea what the app is or how it works? Mount VMFS volumes and look inside VMDKs? What value would that have?

We have some file data, but in the grand scheme of things it's probably less than 0.5% of our overall data set. Backups excluded, of course: we have 50TB+ of deduped backups (a ~33:1 dedupe ratio) stored on a pair of HP StoreOnce appliances at different data centers. Backups are stored as files (NFS).

I also suspect many "cloud" companies will ignore this kind of trend, because they tend to do things on their own rather than bring in canned solutions.

Anonymous Coward

Not just useful for data centres

I've had similar thoughts about my little home network. I've got a bunch of Raspberry Pi boards, each with its own external USB drive. I use them mainly for archival data, and I've wired an old PC power supply up to some GPIO pins and a little circuit board so that a software trigger can turn the drives on when needed.

It's nice giving the Pis something undemanding to do, like basic NAS duties, but I've wondered what else they could be doing when the power to the drives is switched on. Some ideas:

* periodically run something like shatag to calculate hashes for any changed files

* feed those reports into a central database that can do on-disk file de-duplication or identify files that only exist on one drive (and thus need more replicas)

* examine archived tar, zip, arj, etc. files and generate a table of contents (and perhaps feed it into the hash database)

* re-compress some files so that they take up less space

* full-text indexing on any documents found

* frequency analysis on documents (then ignore all frequent English words to get an idea of the specific terminology used in that doc; collate to try to cluster similar documents together)

* collect and collate metadata from selected file types (e.g., video or audio files) or directory types (e.g., git repositories)
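A rough sketch of the first two ideas, assuming plain SHA-256 over a directory tree (shatag itself stores checksums in extended attributes or a database, which this sketch skips):

```python
import hashlib
import os
from collections import defaultdict

def hash_file(path, chunk_size=65536):
    """Stream a file through SHA-256 so large archives needn't fit in a Pi's RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def scan_tree(root):
    """Map each content hash to the list of paths holding that content."""
    by_hash = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_hash[hash_file(path)].append(path)
    return by_hash

def replication_report(by_hash, min_copies=2):
    """Content that exists in fewer than min_copies places needs more replicas."""
    return {h: paths for h, paths in by_hash.items() if len(paths) < min_copies}
```

Each Pi would run `scan_tree` over its own drive and ship the hash-to-paths mapping to the central database, which can then spot both duplicates (same hash, many paths) and lonely files (same hash, one path).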

I can see where the map-reduce model could be quite useful for several of the above. With Pis you don't have a lot of CPU or RAM to run, e.g., complex databases, but they can definitely do plenty of pre-processing work and have a beefier machine collate the results.
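The frequency-analysis idea splits cleanly along those lines: a map step on each Pi, a reduce step on the collating machine. A toy sketch, with the stop-word list and function names invented for the example:

```python
import re
from collections import Counter

# Tiny stand-in for a real stop-word list; frequent English words carry
# no information about a document's specific terminology.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that", "this"}

def map_terms(text):
    """Runs on each Pi: tokenise one document and count its distinctive terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def reduce_terms(partial_counts):
    """Runs on the beefier machine: merge the per-document counters."""
    total = Counter()
    for c in partial_counts:
        total += c
    return total
```

Comparing two documents' `map_terms` counters (cosine similarity over the term counts, say) is one cheap way to start clustering similar documents together.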



To the new Buzzword. Now 1000% better than the old Buzzword!! With Lasers, Mike... Lasers!!

Silver badge

IT's a Big-Endian Thing and AI Field of Simple Complex Text Endeavour

Hi, Enrico Signoretti.

How do you think the next "big thing" in storage is working out for the likes of Intelligence Communities/NSAs/GCHQs, who have been doing their very own same big-endian thing for decades? Still struggling to pass the first hurdle and failing spectacularly to perceive and conceive?

IT's Sticky and Tricky, BaseData and AIMetaData, isn't it, and Springy Revolutionary too whenever booted and suited and geared for all manner of Quantum Communication in Established MalPractice Enterprise for Exclusive Executive Order Administrative Expert Export and Exploit Import, and a Veritable Venerable Mined Mind Field too, Virile and Viral and Venal and Venereal.


This doesn't make any real sense. I guess it would work for unencrypted file data, but most really useful data is either in a database and likely unreadable (and if sensitive, also encrypted, so definitely unreadable) or, these days, in Hadoop, where there's usually no storage box: it's direct-attached disk on the Hadoop data node.

Seems a database with a trigger on the table, or an event filter for streaming data, would pretty much cover this use case without any of the complexity.
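A minimal illustration of that point, using SQLite from Python; the schema and trigger here are invented for the example, not taken from the article:

```python
import sqlite3

# In-memory database standing in for a real one; table and column names
# are illustrative only.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE audit_log(order_id INTEGER, amount REAL, noted_at TEXT);

    -- The trigger does the 'data-aware' work inline: every insert is
    -- mirrored into a table an analytics job can read later, with no
    -- storage-layer machinery involved.
    CREATE TRIGGER log_orders AFTER INSERT ON orders
    BEGIN
        INSERT INTO audit_log VALUES (NEW.id, NEW.amount, datetime('now'));
    END;
""")

db.execute("INSERT INTO orders(amount) VALUES (42.0)")
rows = db.execute("SELECT order_id, amount FROM audit_log").fetchall()
```

The database already understands its own schema, which is exactly the insight a block-storage array lacks.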


