You should have thought of that!
And me, and everyone else.
I'm just imagining blocks of these in 1TB SSD format - oh man :)
Seagate is building hard disk drives with a direct Ethernet interface and object-style API access for scalable object stores, a plan which - if it works - would destroy much of the existing, typical storage stack. Drives would become native key/value stores that manage their own space mapping, with accessing applications simply …
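The drive-native key/value idea the article describes can be sketched roughly like this. The class and method names below are invented for illustration; the real Kinetic drives speak a network protocol over Ethernet rather than exposing a Python object, but the application-facing contract is the same put/get/delete-by-key shape:

```python
# Hypothetical sketch of a drive-native key/value interface, in the spirit of
# the Kinetic idea: the application does put/get/delete on keys, and the drive
# (not a host filesystem) manages its own space mapping internally.
class KineticLikeDrive:
    def __init__(self):
        self._store = {}          # stands in for the drive's internal mapping

    def put(self, key: bytes, value: bytes) -> None:
        # the drive decides where the bytes physically land
        self._store[key] = value

    def get(self, key: bytes) -> bytes:
        return self._store[key]

    def delete(self, key: bytes) -> None:
        self._store.pop(key, None)

drive = KineticLikeDrive()
drive.put(b"photo:2014:0001", b"\x89PNG...")
assert drive.get(b"photo:2014:0001").startswith(b"\x89PNG")
```

The point of the design is what is missing: no block addresses, no partition table, no host-side filesystem between the application and the drive.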
"Before all these fancy interfaces (when disks went though ribbon cables) frame grabber cards would have disks mounted directly on them and write to the disks as purely lien orientated digitised video data."
Well I've just learned something new today. Thanks for the post.
I am guessing they don't consider data objects bigger than a single HDD, then?
Presumably the protection against HDD failure is now based on object duplication, so a 2x storage penalty, rather than something like RAID-5/6 or RAID-Z2, where you get more like a 1.2x penalty?
That was my figure of 1.2 times (or thereabouts): say 6 disks for 5 disks' capacity = 6/5 = 1.2, or with double parity and more disks per stripe, 12/10.
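The overhead figures in this exchange work out as below; `raid_overhead` is just the ratio of raw capacity to usable capacity, and the replication figure is a flat copy count:

```python
# Storage-overhead arithmetic: parity RAID costs (data + parity) / data raw
# bytes per usable byte; plain object replication costs the full copy count.
def raid_overhead(data_disks: int, parity_disks: int) -> float:
    return (data_disks + parity_disks) / data_disks

assert raid_overhead(5, 1) == 1.2      # 6 disks for 5 disks' capacity
assert raid_overhead(10, 2) == 1.2     # double parity, wider stripe: 12/10
replication_overhead = 2.0             # two full copies of every object
assert replication_overhead > raid_overhead(10, 2)
```

So duplication costs roughly 67% more raw capacity than double-parity RAID at these stripe widths, which is the trade-off being questioned above.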
Always go double parity if you can, and scrub periodically, as an HDD-failure RAID rebuild is when the trouble starts!
I just lost(*) 2 drives on a 6-drive raid6 system simultaneously (or as near as dammit). Are you sure you want to bet on simple double parity?
For my home system I've been using RAIDZ-3 for a couple of years.
(*) 250GB WD RE drives. Deader than dead things in Deadville.
Parity & RAID is a bet, based on the probability of multiple failures occurring at once. The quoted figures you get for availability are based on the assumption of statistically independent failures.
We all know that is bollocks, of course. As HDDs are often from the same batch, they may suffer common manufacturing defects, and failure can be provoked by events such as fan failure, PSU surges, etc., that are common to the array.
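To see how much work the independence assumption is doing, here is a small sketch that computes the probability of multiple failures under independence; the 1% per-drive figure is an invented example, and real, correlated arrays will do worse than this:

```python
# Under independent failures, P(>= k of n drives fail) is a binomial tail.
# Correlated failures (same batch, shared PSU/fan) break this model badly.
from math import comb

def p_at_least_k_failures(n: int, k: int, p: float) -> float:
    """Probability that at least k of n drives fail, each independently with prob p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# e.g. 6 drives, assumed 1% failure probability over a rebuild window:
p2 = p_at_least_k_failures(6, 2, 0.01)
assert 0.001 < p2 < 0.002   # ~0.15%: reassuring, IF failures are independent
```

The availability figures quoted on datasheets are built from exactly this kind of calculation, which is why correlated failure modes make them optimistic.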
So RAID != Backup and never forget that!
The trade-off in going to triple parity depends on your workload and the CPU/controller, etc., but it often demands larger stripes to be efficient, which in turn hammers the IOPS capability. You can get a lot of that back with SSDs for journal/ZFS Intent Log use, though.
In most cases you get one failure and then others croak when the load of a rebuild kicks in; in that case double parity is a great help.
But you also get an array being powered off after years of use and a number of HDDs just giving up the ghost and not spinning up; at that point you really are looking at a new array and restoring from backup :(
a) Another SAN. This time it is all Ethernet, but that won't be routed through our regular switches; ergo, another new SAN.
b) No filesystems. Filesystems offer benefits that KV stores can't: snapshots, being able to browse, permissions, end-to-end checksumming...
c) No real difference over using iSCSI or AoE and giving a raw block device to the KV store of your choice as backing storage.
d) Loss of bandwidth, or ridiculously expensive adapters. Running this on GigE, you won't hit more than 125MB/s. Run on 10GigE and that goes up to about 1250MB/s, and I'm not even including overheads here. Presuming you have one, possibly two, aggregated connections to the SAN switch, you're going to be hard pushed to even match the bandwidth of a single MiniSAS cable.
e) No mention of redundancy, which normally means if you want redundancy, store multiple copies on different disks. Eww.
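The arithmetic behind point (d) checks out as a quick sketch; the MiniSAS figure below assumes a 4-lane SAS-2 link at 6 Gbit/s per lane, which is one common configuration, not the only one:

```python
# Theoretical line rates (ignoring protocol overhead), decimal megabytes/s.
def gbps_to_mbytes(gbps: float) -> float:
    return gbps * 1000 / 8

assert gbps_to_mbytes(1) == 125.0        # GigE ceiling, as quoted above
assert gbps_to_mbytes(10) == 1250.0      # 10GigE ceiling
minisas_4lane = gbps_to_mbytes(6) * 4    # 4 lanes of SAS-2: ~3000 MB/s raw
assert minisas_4lane > gbps_to_mbytes(10)
```

Even before encoding and protocol overheads, a single 10GigE port is well under half the raw bandwidth of one 4-lane SAS-2 cable.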
"You don;t back up very large data stores (ie petabyte and larger sized collections of data). You keep two copies of it."
That only works until some twat types "rm -rf /" on a replicated system.
There is NO substitute for offline backups when it comes to disaster recovery.
Decent backup policies always keep at LEAST 2 copies offline, even when a new backup is being made.
No, I'm not backing up multiple petabytes, but I'm not far off breaking the 1PB mark. The first hint is "don't limit yourself to a single tape drive"; the second is "synthetic full backups are a good thing".
> That only works until some twat types "rm -rf /" on a replicated system
Yes, but that sort of twit doesn't usually know how to remove the snapshots and probably doesn't know they exist. Snapshots will save you from the twits and data corruption, but not fire, flood or anything that trashes your hardware, so I agree wholeheartedly with:
> There is NO substitute for offline backups when it comes to disaster recovery.
ZFS + LTO FTW
With the cost of "secondary media" dropping so much, and its density increasing as well, the industry may soon approach the condition where everything is written out only once and it stays forever. You just buy more and more drives as you write more and more data, keeping all the "old" data as you go.
Sure, there might be problems when moving to higher-density drives, where you copy from the physically larger storage media to the smaller one that has more room, but it comes close to the "what, me worry?" type of storage. You get a history of the data for free.
Of course keeping it all sorted out may be another matter, but that is just "programming".
"Also, if I'm not wrong, the UK research councols say scientific data is to be kept for ten years."
Which some goits translate as "in their original data format and media" - which means I have ~2000 first generation Exabyte cartridges around that said Goits won't let me throw away, despite not having an Exabyte tape drive to read the damned things (The data all fits on one LTO5 tape, of which there are several copies)
"Which some goits translate as "in their original data format and media" - which means I have ~2000 first generation Exabyte cartridges around that said Goits won't let me throw away, despite not having an Exabyte tape drive to read the damned things (The data all fits on one LTO5 tape, of which there are several copies)"
Agreed. I have been through a transition from LTO3 media/drives to brand-new LTO5 about two years ago - happily, remarkably easy on our particular HSM setup (SGI DMF).
But I do agree - it's the lack of functioning read devices which will render data unusable.
One would hope (yeah, I am laughing hollowly too) that this new generation of object stores would support transparent migration onto new technology, whether it is spinning drives, moving tapes, solid state or whatever. After all, you just have to move the object, right?
And yeah, I don't believe it either, but you can hope.
They DO care about the issues of silent data corruption.
A single drive (even at the "enterprise" unrecoverable error rate of 1 in 10^15 bits read vs 1 in 10^14 for "normal") simply isn't reliable enough to trust critical data to. Parity, checksums and a healthy dose of integrity-checking paranoia are worthwhile.
"Data Integrity—Unfortunately, silent data corruption is a fact of life. With Kinetic Storage, data can be stored with comprehensive end-to-end integrity checks that ensure the data was received at the drive correctly, allowing the drive and the ultimate recipient to be able to guarantee that the data is still correct."
Nicked from http://www.seagate.com/tech-insights/kinetic-vision-how-seagate-new-developer-tools-meets-the-needs-of-cloud-storage-platforms-master-ti/
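The end-to-end integrity idea in that quote can be sketched as below. SHA-256 and the dict-as-drive are stand-ins chosen for illustration; the actual Kinetic protocol's checksum mechanism may differ:

```python
# End-to-end integrity sketch: the client computes a checksum before sending,
# the drive stores it alongside the value, and the eventual reader verifies.
import hashlib

def put_with_checksum(store: dict, key: bytes, value: bytes) -> None:
    digest = hashlib.sha256(value).digest()   # computed at the client end
    store[key] = (value, digest)              # "drive" keeps value + checksum

def get_verified(store: dict, key: bytes) -> bytes:
    value, digest = store[key]
    if hashlib.sha256(value).digest() != digest:
        raise IOError(f"silent corruption detected for key {key!r}")
    return value

store = {}
put_with_checksum(store, b"k1", b"important bytes")
assert get_verified(store, b"k1") == b"important bytes"
```

The point is that the checksum travels with the data end to end, so corruption anywhere along the path (wire, controller, platter) surfaces as an error at read time rather than silently.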
> Scientists and engineers care about their DATA.
Engineers have enough sense to let IT types do their thing.
> They do not care a jot about IT types rattling on about LUNs and SANs and choice of RAID levels
They will when their DATA goes bye bye.
Scientists are arrogant gits that treat non-scientists like sh*t. They will be the ones to whine when their data becomes a casualty of ignoring "irrelevant technical details".
As somebody who has a first degree in Physics AND Computer Science (Joint Honours), I can assure you that not ALL scientists are arrogant gits, nor do we treat non-scientists like sh*t.
Having worked at RAL and various MOD establishments, my fellow scientists and I worked very closely with our IT infrastructure techies - and we worked very well together, for the benefit of all concerned.
Sorry that your own experiences have given you such a broad and very tar laden brush to wield.
Doesn't Coraid still use... RAID? And controllers, etc. - file systems too, though the file systems would reside on the servers that have the Coraid system mounted (like most any other block storage medium)? The gist I get is that Seagate wants to make the individual drives directly addressable over the network, with some sort of peer-to-peer replication.
I dunno if it is a good idea or not; quite a bit of work goes into large scalable object platforms, and I'd be surprised if they could make a truly competitive product in the space.
No need for backups, 3 copies on a global cluster is all you need.
First really new idea in disk storage I've seen in a while; Swift will need a far lower server:disk ratio.
The API might be open, but I bet there's a fair few patents in the process of producing the disks.
No need for backups, 3 copies on a global cluster is all you need.
Really? And when a minor glitch in an application accidentally deletes last month's sales figures, obediently performed on all three copies by your cluster, where are you then?
RAID != backup. Get it tattooed somewhere obvious.
"RAID != backup. Get it tattooed somewhere obvious."
Who said anything about RAID????
3 copies is your working copy plus two backups, but ones that you can access concurrently, making your storage faster.
Like I said, Lots of OLD THINKING here.
Try tattooing that on your face so you see it every day.
Seagate are full of /S.it/
1. It will, ironically, be slower, because of the speed limitation of the physical connection, the network protocol overhead, and, I expect, the difficulty of adding transparent caching.
2. NoSQL is over-hyped and completely ignores why transactional and relational storage are so important.
3. A single drive is a massive fail for data security; RAID is a lot more secure, faster for keeping multiple live data copies, and, I expect, a lot faster to replicate to one or more mirror RAIDs! ZFS RAID is even better because it is a concurrent, transactional file system.
4. The limited capacity of a single drive will inevitably require messy capacity balancing and sharding schemes, which wreck all the so-called all-in-one benefits, and the limited capacity could even lead to wasted space for data which doesn't fit the remaining space.
5. What Key and Value sizes will be supported and how the devil can these be properly sized and optimised?
Not that stupid if all your criticisms are addressed?
1) Max sequential transfer is probably limited by the link, but real-life transfer speeds are way slower anyway.
Typical IOPS is around 300 for a fast drive - at 4k per IO that's only 1.2MB/s so IP is no bottleneck.
2) Not a criticism of Kinetic drive concept? Why are "transactional and relational storage" so important?
3) You may implement any fault-tolerant scheme you choose on top of Kinetic, and/or use Kinetic's data mover.
4) What if multiple drives appeared at a single virtual IP?
5) In most use cases the object is much smaller than the capacity of a drive. If not - see 4)
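The arithmetic in point 1) checks out; a quick sketch:

```python
# A drive doing random I/O is IOPS-bound, not link-bound: even GigE is
# nowhere near the bottleneck for small random operations.
def random_io_throughput_mb(iops: int, io_size_kb: int) -> float:
    return iops * io_size_kb / 1000   # MB/s, decimal units

assert random_io_throughput_mb(300, 4) == 1.2   # 300 IOPS at 4KB, as quoted
gige_ceiling_mb = 125.0                          # 1 Gbit/s / 8
assert random_io_throughput_mb(300, 4) < gige_ceiling_mb / 100
```

Random 4KB I/O uses barely 1% of a GigE link; only large sequential transfers would notice the Ethernet ceiling at all.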
In 5 quarters Seagate has spent more than $1.2 billion on R&D.
In 30-ish years they have increased the capacity of a disk drive about 1,000,000x and interacted with the design departments of every storage company we've ever heard of.
In working on Kinetic they have worked with some of the world's greatest minds in Storage.
I think Seagate is not stupid and that Kinetic is absolutely awesome!!!!!
No, it's not even a NAS - not really. After all, a NAS generally exposes a filesystem onto which you place named files via SMB/NFS/etc. A NAS is ... useful! These don't even have that - unless you have no directories on your filesystem and you name all your files like this:
In which case yes, it's close to being a single-drive NAS.
It's more like a single-drive SAN, incompatible with the rest of the world. With at most 1GBps (that's 10Gbps!!) per device, the network ports will cost more than the drive itself. It's about 50-50 with a 1Gbps NIC on the device, but then a drive will deliver 100MBps max (less than SATA). And you'd better hope the switch never fails; you'll temporarily lose access to data but not know what until you try to access it.
I'm sure this is a solution. But it seems someone forgot to investigate whether there was a problem in the first place.
"I'm sure this is a solution. But it seems someone forgot to investigate whether there was a problem in the first place."
It isn't solving a user problem.
A company that makes disk drives figures that if it can stuff more of what it takes to make a storage system into the disk drive, it can sell drives at a higher price and take business away from the companies who make all the other stuff.
I can see ways a bunch of small self-contained storage systems can't work as well as one big one. Ways it could work better than a big one are harder to think of, except that it could work out cheaper.
" NAS is ... useful! These don't even have that - unless you have no directories on your filesystem and you name all your files like this:
Yes, but it is an object store. Those are unique IDs which identify - objects, such as digital photographs (or whatever). The metadata about the photographs is kept in a database.
Why should we have meaningful filenames and a meaningful directory structure in this day and age?
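One common way object stores answer that question can be sketched as follows (names here are invented for illustration): name each object by a content hash and keep the human-meaningful metadata in a separate database.

```python
# Content-addressed naming: the object ID is a hash of the bytes, opaque and
# unique; filenames, dates and tags live in a metadata database instead.
import hashlib

objects = {}    # stands in for the object store
metadata = {}   # stands in for the metadata database

def store_photo(content: bytes, original_name: str, taken: str) -> str:
    oid = hashlib.sha256(content).hexdigest()   # opaque, unique object ID
    objects[oid] = content
    metadata[oid] = {"name": original_name, "taken": taken}
    return oid

oid = store_photo(b"\xff\xd8jpeg-bytes", "holiday.jpg", "2014-06-01")
assert objects[oid] == b"\xff\xd8jpeg-bytes"
assert metadata[oid]["name"] == "holiday.jpg"
```

Searching and browsing then happen against the database, not the store, which is why the store itself never needs meaningful names or directories.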
For instance, iRODS:
Some folks seem to be getting confused about copies vs backups. A backup always means changed data to me. You keep 'backing up' your data consecutively (keeping a snapshot of the previous data). That way you can go back and grab any previous data if needed. This should always be an ongoing thing that is preferably done to more than one location. Data corruption can creep in, which is why you should always keep old data from the very first time you backed up. Once you are satisfied the old data is not required any more, you can dump it if more space is needed. This could prove expensive, but it's better than losing your business over it. This is also why RAID is not a backup!
Biting the hand that feeds IT © 1998–2019