El Reg has managed to take a peek at an as-yet-unpublished white paper, written by former Sun Microsystems CTO Randall Chalfant, which claims the storage company's deduplication tech has near-zero latency and possibly offers the world's fastest inline deduplication. It works with 4K blocks of data. This is the sequence of events …
In many ways...
Sun was quite ahead of its time. The only sad part is that they didn't seem capable of selling all that, and that's not a healthy situation for a company.
I still recall seeing one of their "black boxes" ("portable" datacentre unit / container) at CeBIT up close a few years back. It was both amazing and impressive. Just get a crane to drop it; hook up electricity, water vents (for cooling) and a network cable and you're ready to go.
Re: In many ways...
Sun managed to pull defeat from the jaws of victory.
They had a range of good building blocks to assemble a storage system from:
- Decent x86 hardware.
- Solaris OS
- ZFS (which already has block checksums so de-dupe is easier)
All they needed to do was wrap that lot up into a managed system. They did, and on paper it was great, as it offered a range of features such as NFS, CIFS, snapshots, etc., without the usurious extra-cost licenses so loved by NetApp users.
Unfortunately the system they came up with (Open Storage) was fatally flawed in a number of ways. The first and most stunningly dumb one is that they did not use mainstream Solaris or ZFS, but oddly modified versions that made support much harder, with bug fixes neither applied automatically nor back-ported with any sense of urgency. Similarly, they decided to have binary logs rather than text-mode syslog-style ones, so you as a user can't grep them for errors, etc.
The other was that the main supervisory process appears to be single-threaded, as it blocks on any error. Yes folks, if you have a fault more or less anywhere in the system, then SNMP and the user interface also block for ages, sometimes minutes, so you can't easily find out what is going wrong!
It should have been world-beating, but it had too little polish and came too late to save Sun. Oracle then managed to screw up the support quality even further and let any opportunity to win in storage sales die, which is an achievement in itself :(
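To illustrate the point above about block checksums making de-dupe easier, here is a minimal sketch of checksum-keyed block storage. It is not how ZFS or any vendor implements it; the `DedupStore` class and its method names are hypothetical, and SHA-256 plus the 4K block size (taken from the article summary) are assumptions chosen for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # 4K blocks, as mentioned in the article summary


class DedupStore:
    """Hypothetical in-memory store that keeps each unique block once."""

    def __init__(self):
        self.blocks = {}    # checksum -> block bytes (stored once)
        self.refcount = {}  # checksum -> number of references to that block

    def write(self, data: bytes) -> list:
        """Split data into blocks and return the list of block checksums."""
        recipe = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # First time this content is seen: store the block itself.
                self.blocks[digest] = block
            # Duplicate or not, record one more reference to the block.
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            recipe.append(digest)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Reassemble the original data from its list of checksums."""
        return b"".join(self.blocks[d] for d in recipe)
```

If the filesystem already computes a checksum for every block, the lookup key comes for free, so the extra work is mostly the table lookup and the reference counting.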
Re: In many ways...
That single-threaded thing reminds me of Nexenta. I haven't upgraded my Nexenta systems in a bit over a year now since I don't really trust them; they're too fragile. They are working, just don't breathe on them wrong. Last week I needed to log in to the management UI of one of the systems to create a new share (first time I've logged into it in months) and for some reason it decided to freeze up entirely (all NFS shares stopped responding). Fortunately I was able to reboot it gracefully using the CLI. I didn't check the logs.
If we had more data that we needed on NFS we'd get a better solution, but the amount of data on NFS is tiny (outside of backups, say less than 100G); everything else is block on HP storage.
StorageTek not Sun
Sent to me so I'm passing it on:
Randall Chalfant was at StorageTek when acquired by Sun. Looking at his dates of service, it would be more accurate to identify him as ST rather than Sun.
Seems unlikely he was deeply connected to any of the ZFS work which was all done in CA at the time.
Judging from the two comments that are about Sun and not GreenBytes, it's probably too bad the article didn't mention that I was the CTO at STK for years, and then acquired into Sun as a CTO. With that said, I found the things mentioned above also frustrating. Cheers.
GreenBytes uses ZFS + OpenSolaris
GreenBytes has written the dedupe engine in ZFS. According to home users, the dedupe engine in ZFS is actually not that good. Its main disadvantage is that it eats RAM; something like 1GB of RAM for each 1TB of disk is recommended. Maybe this is a normal requirement for dedupe, though? Maybe all enterprise storage vendors that offer dedupe always have plenty of RAM in their solutions? Anyway, for a home user, ZFS dedupe is not recommended unless you have plenty of RAM. ZFS itself works fine with 4GB of RAM, but with that little RAM only a very small ARC disk cache will be used, which degrades performance to disk speed, and that is less than optimal.
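For a rough feel of why dedupe eats RAM, here is a back-of-the-envelope sketch. The function name `ddt_ram_bytes` is hypothetical, and the default of 320 bytes per in-memory dedupe table entry is an assumption (a figure often quoted for ZFS); the real per-entry cost depends on the implementation. It also assumes the worst case where every block is unique.

```python
import math


def ddt_ram_bytes(pool_bytes: int, block_size: int, bytes_per_entry: int = 320) -> int:
    """Estimate RAM needed to hold one dedupe table entry per block.

    Assumes every block is unique (worst case) and that each entry costs
    bytes_per_entry bytes in memory, which is an assumed figure.
    """
    entries = math.ceil(pool_bytes / block_size)
    return entries * bytes_per_entry


TIB = 2 ** 40
GIB = 2 ** 30

# 1 TiB of unique data with 128 KiB blocks versus 4 KiB blocks.
print(ddt_ram_bytes(TIB, 128 * 1024) / GIB)  # 2.5
print(ddt_ram_bytes(TIB, 4 * 1024) / GIB)    # 80.0
```

Under these assumptions the table size is driven by block count, so small blocks cost far more RAM per terabyte than large ones, and any per-terabyte rule of thumb only holds for a particular average block size.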