If you think 3TB drives are big, think on this: 4TB ones are coming. Isilon's marketing head, Sam Grocott, was asked today about customers moving old data to tape and said: "With 4TB drives around the corner, will people bother?" They may not indeed, as an Isilon clustered system built on 2TB drives could theoretically double its …
I'll be glad to see the back of tape backup
It seems such an extremely dated, slow and expensive technology. With products like Microsoft DPM making restores so much easier, and the abundance of cheap discs, we should be backing up to hard drives and putting them in fire safes or off site instead. Can't happen soon enough, IMO.
....prefer AMANDA or Clonezilla myself!
Well, I use Clonezilla to make an initial image of the hard disk right after I'm done setting things up, but I wouldn't use it for hard disk backups. The lack of incremental backups, coupled with the fact that performance is piss-poor on certain setups, counts against it. When it comes to periodic backups, I tend to revert to good, old-fashioned DVD-Rs.
Except that I/O doesn't always scale with spindles...
It's important to remember that more spindles doesn't always mean faster I/O, especially where low latency is concerned. Many of today's RAID-5/6 controllers wait for all spindles to respond before reassembling the data after an I/O request. More spindles means more wait time and higher latency. While overall transfer rates are quick, things like web servers and databases will do well to steer toward SSDs (or, in the case of web servers, start running a large memcache).
"Many of today's RAID-5/6 controllers wait for all spindles to respond back before reassembling the data packets after an I/O request."
If D disks require at most time T to return a sector, then D*X disks can necessarily return X sectors within that same time T. All that would be required is a RAID controller that can handle the combined bandwidth.
For sequential access, one could theoretically scale performance linearly with the number of disks.
For random access of individual sectors, the performance depends on whether or not the RAID controller allows each disk to seek a different sector in parallel and whether random requests are queued synchronously or asynchronously (i.e. an app performing random synchronous read calls could not be accelerated). Assuming a full random request queue for evenly distributed data, I'd still expect the performance to scale linearly with the number of disks.
"At the enterprise level today's arrays with 2TB drives could just double their capacity to seemingly fantastic heights. A downside is that disk I/O isn't getting any faster and the ability to stripe data across spindles to increase I/O rates will become more important."
Striping I/O allows you to balance the I/O better, but it doesn't solve the I/O access density problem, which is caused by simple geometry and the fact that we've got pretty close to the practical limits of disk rotational and seek speeds. Many of us already stripe storage arrays to within an inch of their life and use ever more sophisticated caching software and hardware, but it is rapidly running into the law of diminishing returns.
Simply put, areal density goes up with the square of linear bit density, whilst sequential read and write speeds only go up with linear bit density. Random access is essentially fixed. What this means is that if you quadruple areal density, the achievable sequential read/write access density on a per-GB basis is halved, whilst the random access density is reduced by a factor of four. To read a 4TB drive from start to end is going to take over 12 hours. If you have to rebuild a RAID set containing several of these, then it could take twice that long given that this is normally done "hot".
Given that multiple read/write heads aren't viable on HDDs (unlike tapes), the only way of fixing this is more, and smaller, drives (short of SSDs of course, which are hardly in the same price/GB space as 4TB drives). The random access storage area is in dire need of something fundamental to break this I/O bottleneck issue (which applies to optical disk storage even more than it does to magnetic HDDs).
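A back-of-envelope check of the "over 12 hours" figure above (a Python sketch; the ~90MB/s average sustained throughput is an assumption for illustration, and real rates fall off towards the inner tracks):

```python
# Rough sanity check of the "over 12 hours" full-read claim.
capacity_bytes = 4 * 10**12   # 4TB, decimal as marketed
avg_throughput = 90 * 10**6   # bytes/second -- assumed average rate
hours = capacity_bytes / avg_throughput / 3600
print(f"Full sequential read: {hours:.1f} hours")  # ~12.3 hours
```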
Parallelism is key
"Given that multiple read/write heads aren't viable on HDDs (unlike tapes), then the only way of fixing this is more, and smaller drives"
(I'll assume you meant multiple heads seeking different sectors simultaneously, since hard drives do have multiple heads already)
Why not have multiple heads on each side of the disk that seek independently of one another? Alternatively, mirrored drives could spin in a phase locked loop. Either way, you would halve the seek access time. However I suspect the aggregate performance is what people really care about.
"What this means is that if you quadruple density the achievable sequential read/write access density on a per GB basis is halved whilst the random access density is reduced by a factor of four. To read a 4TB drive from start to end is going to take over 12 hours."
I don't understand what the problem is. A hypothetical 10TB drive will always match or beat the performance of a 1TB drive. The random seek times should be identical, the sequential read will be 10x faster.
If one needs to improve the aggregate random seek times, one could use 10 * 1TB drives, the sequential read will still be 10x faster, and the aggregate seek time will be 1/10th for parallel asynchronous loads.
Consider a database server with high load on large datasets in a striped setup across X disks. The db is not limited to sequential synchronous access, therefore it can issue numerous asynchronous requests in parallel. Each disk has probability 1/X of serving a given request (depending on striping and record allocation), therefore, with enough requests, each disk could be kept busy in parallel such that there are linear gains over an individual disk.
I welcome any rebuttals, but I'd like those to address why parallel asynchronous requests cannot scale across more spindles in the same way that they can across a cluster of computers.
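To illustrate the argument, here's a toy Python model (all the numbers are assumptions: a fixed 10ms service time per random I/O, uniform striping, and an asynchronous queue deep enough to keep every spindle busy):

```python
def random_iops(n_disks, service_time_ms=10.0, queue_depth=64):
    """Aggregate random IOPS under a simple parallel-seek model.

    With a queue deeper than the disk count, every spindle stays
    busy, so throughput scales linearly with spindles; with
    synchronous single requests only one disk works at a time.
    """
    busy_disks = min(n_disks, queue_depth)
    return busy_disks * 1000.0 / service_time_ms

print(random_iops(1))                  # 100.0  -> one disk
print(random_iops(10))                 # 1000.0 -> linear gain, deep queue
print(random_iops(10, queue_depth=1))  # 100.0  -> sync access, no gain
```

The model glosses over controller overhead and skewed access patterns, but it captures why a deep asynchronous queue is the precondition for linear scaling.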
Hilarious pandering to an advertiser?
A guy from NetApp says that TVs are getting bigger and thinner next year, and HDS claims that Intel will be rolling out faster CPUs "in the future".
Is this really news?
...are all these >2TB drives going to computing sectors that are not reliant on old BIOS and LBA systems, seeing as how these systems are still architecturally limited to 2TB drives? Perhaps the increasing push of these big drives would encourage consumer motherboard makers to consider switching over to EFI (which enables GPT partitions that surpass the 2TB limit).
PS. I know there are supposedly ways to get around this, such as not using the drive as a boot device and so on, but I'm trying to keep things simple for the sake of argument.
Even more important
Even more important is the increasing difference between advertised capacity in TB and the real capacity in TiB. A 4TB disk is in reality only a 3.6TiB disk, or 90% of the advertised "capacity".
Either drive manufacturers need to start advertising capacities with base-2 prefixes, or computers need to start reporting them in base-10 prefixes, but this nonsense has got to stop.
The world is gradually moving towards the use of base 10
File a bug report against any program that doesn't offer base 10 prefixes at least as an option.
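For the record, the arithmetic behind the roughly 90% figure (a Python sketch: decimal capacity as marketed vs. binary units as most operating systems report):

```python
# Marketed (decimal) capacity vs what the OS typically reports (binary).
marketed_tb = 4
capacity_bytes = marketed_tb * 10**12   # 4 TB as advertised
tib = capacity_bytes / 2**40            # same bytes in TiB
print(f"{marketed_tb} TB = {tib:.2f} TiB "
      f"({tib / marketed_tb:.1%} of the label)")  # ~3.64 TiB, ~91%
```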
No one could need more than 640Tb! ;)
The funny irony of that Bill Gates quote just keeps on giving. :)
I'm wondering if 20 years from now, that'll be, no one could need more than 640Eb! ;)
wow, that's a mind-numbing amount of data to lose when the drive heads crash into the platters.
Oh, wait, I already said that in 1996 when the 4 gigabyte drives hit the market. (Hey, back then we programmed in Assembly and every byte counted, and four billion (American billion) of them was a staggering number.)
Silly me, I thought I was in a time warp again...
Dropping hard drives.
Aren't the platters made of glass? Not the most shock-friendly material in the world. Okay, tapes are magnetically sensitive and slower than pigshit rolling uphill in Winter, but at least they can be shielded.
What's next? Caddies with a foot of shock absorber at all sides?
You don't have any of those really, really shiny, thin circles of steel around the office? From old hard disks that people took to pieces?
Hitachi uses glass platters for its drives, a technology it inherited from IBM when it bought IBM's hard disk division. Unsurprisingly, that's why smarter people avoid those drives like the plague, particularly those who remember the entire IBM Deathstar hoo-hah.
A 4Tb single drive seems to be an awful lot of eggs in one basket.
Same old problems as 4GB drives...
> A 4Tb single drive seems to be an awful lot of eggs in one basket.
Then buy another basket.
"Single" mustn't translate well.
You forgot to add...
"Not that big of a problem".
I don't think something as important as eggs will be kept in this basket; smaller baskets are fine for eggs, and no one ever buys 1000000x more eggs than they need.
Other than that, a single 4TB seems to be an awful lot of porn in one basket. Then again, I'd just consider a crash as time for a refresh.
Is it just me, or are we reaching a point where the average individual simply cannot keep up with the amount of data they are storing?
It's all very well for companies and organisations to keep huge amounts of data, but for individuals?
It reaches a point where you have so much data, it's impossible to digest/sort/verify all of it.
I'm more than guilty of hoarding vast amounts of 'stuff' I intend to 'sort out later' - I've got about 30GB of personal photography stuff to sort through alone - thousands upon thousands of photos I've taken that I need to sort.
Without the abundance of storage space available to me, I'd have no choice but to be selective about what I store - instead, I take the lazy option and just keep on adding more data to a pile I have no hope in hell of ever digesting.
That's probably just my bad housekeeping, but it does seem we're burying ourselves in absurd amounts of information. Worse still, software gets more and more bloated, file types get larger and larger, dare I say it, programmers get lazier and lazier, relying on seemingly endless amounts of bandwidth and storage space - "How could we live without it?"
Heck, we used to.
We're being spoiled by this abundance of storage, hoarding data like a veritable treasure store, data we'll never use, probably don't need, but will worry over.
Erm, I think I need another drink... now where's those 250 episodes of Lost I need to get through...
Some stuff is just inherently big.
> Is it just me, or are we reaching a point where the average individual simply
> cannot keep up with the amount of data they are storing?
Actually, short of my video collection, most of my stuff is very small. I can back up all of my important stuff on the unused disk space on the Revos and Mac Minis being used for media.
Virtual machines can also get quite big too.
Dunno what everyone else keeps on their multi-terabyte drives.
Re: Data insanity...
Agreed. Part of the problem is that the drives are getting bigger faster than the speed at which we can scan through or copy the data.
And do you have that problem that you look at the size of a file or a directory and it just looks like a big long telephone number and you have to squint at it to count the digits and then you still get confused about whether that's X hundred million kilobytes or just X hundred million bytes?
And when you get a new system you just copy the old system into a directory called /old, which of course itself already contains a directory called "old", and the chances of you ever getting round to sorting out the contents of /old/old/old are low low low but you know there are probably some crucial items in there that you never backed up ...
Large amounts of data...
I too keep large amounts of data, and nearly every drive has an "UNSORTED" folder, although I move these contents around a lot as and when I need space, and sort them from time to time.
I also have a lot of photographs I take, BUT... I am a bit more strict in sorting them out, using a specific folder layout.
Which is then archived per set and copied to "Photos_Backup", I also do preparation to help with sizing for DVD archival.
Of course though, over time I have "Photos_Raw\Unsorted" to deal with too ;)
As for hoarding, I keep a "Pending Deletion" folder, these types of folders I keep on my larger drives just in case the drive dies :p
One of the most annoying things about having a drive fail, is trying to remember what was actually on it! I have Locate32 running which will give me a "missing" search on it, would be nice if I could have just a backup of the DIR structure so I can read it :p
What's annoying me is that whilst you can now buy large-format drives for very little, we still see NAS boxes selling at exorbitant prices. I mean, come on. You can buy a 1TB drive for a little over £50, but if you want to put four of them in a RAID box it will cost you £200 upwards (for a decent one). I know you can do soft RAID and use any old server, but then you have the issues of power, heat, noise and bulk.
When are we going to see affordable NAS's for 4-5 drives?
With the more expensive units you are paying for processing power; look at the high-end ones and they quote processor speed, amount of RAM etc. Also, if you looked at buying a high-end RAID card for your own machine, you would see why it's pretty expensive.
And for RAID you either want a high end processor or a high end hardware solution. Though generally you still want processing power.
So basically at the high end you are looking at what is pretty much a "PC", with a custom "embedded" OS.
Try a cheapish NAS vs an oldish, say, 2GHz P4 with 512MB RAM running the FreeNAS distro (or others) - FreeNAS will wipe the floor with your cheap NASes.
Look at "JBOD" if you want cheap...
What about reliability?
It seems to me that hard drive reliability is way down lately. I have had 2 dead Seagate 1TBs and 1 dead WD 320GB in the last year, and 1 WD 320GB that keeps producing bad clusters.
If I had a wish it would NOT be for bigger and bigger hard drives, but more reliable ones. What are the manufacturers doing about that, I wonder?
Consumer RAID lagging
Consumer RAIDs rebuild at 1 to 20 MB per second. That makes rebuilding a 4194304 MB disk not so fun.
With that much data you need something with a bit more welly than RAID6.
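A quick Python check on those rebuild rates (taking the 4,194,304MB figure above at face value):

```python
# Rebuild time for a 4,194,304 MiB (4 TiB) disk at consumer
# RAID rebuild rates of 1-20 MiB/s, as quoted above.
size_mib = 4 * 1024 * 1024
for rate in (1, 20):  # MiB/s
    days = size_mib / rate / 86400
    print(f"{rate} MiB/s -> {days:.1f} days")
# 1 MiB/s  -> ~48.5 days
# 20 MiB/s -> ~2.4 days
```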
Dell have 4TB drives road-mapped for Q4 2010, so either they know something you don't, or they've just got a fantasy roadmap.
anonymous because I had to sign a NDA to read the roadmap...
You signed an NDA, which is a legal document, and then posted information from the subject of the NDA on a public website.
Assuming you are telling the truth (no reason to assume otherwise) you could be in bother if Dell decided to apply to the Register for the IP address and username details of your post.
Perhaps it isn't a big deal, but who knows how the legal types will view it; bit of a risk just to throw your 2p into the ring, isn't it?
Bigger hard drives next year than this year
More news at 11.
If a 4TB disc breaks and you have to rebuild your RAID-5 with a new disc, it is going to take a lot longer than 12 hours. Maybe a week or so. During that time, if another disc breaks, you have lost all your data. Therefore you need at least RAID-6 with such big drives.
However, if you look at the data sheet for any drive, it says "1 unrecoverable error in 10^15 bits". On such a large disc, there are lots of bits. This means there will be corrupt data. CERN has a study which shows their data on Linux hardware RAID had lots of silently corrupted data. Therefore, you need something better than RAID, something that also checks for corrupt data - which RAID never does. But ZFS does this. Use ZFS if your data is as valuable as CERN's. CERN is now migrating to ZFS, to keep its data safe from silent corruption. Just google a bit on this, and you will see the CERN studies on silent corruption of data.
With larger drives, you have more bits. Which means you have more corrupt data. With 3-4TB discs, this can be a real problem. There are studies showing that JFS, ext4, ReiserFS, XFS, RAID-5, RAID-6 etc. do not protect against silent corruption of data, because they cannot even _detect_ data corruption. They are not designed for such large 4TB drives and the new problems they face. But studies show that ZFS detects all such errors and repairs them; it is a modern file system.
Just ask me and I will post links / studies / etc on every claim I made.
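One of those claims is easy to check with arithmetic alone. Taking the datasheet figure of one unrecoverable error per 10^15 bits read, a single full pass over a 4TB disc gives (a Python sketch):

```python
# Expected unrecoverable read errors in one full pass over a 4TB
# disc, given a datasheet rate of 1 error per 10**15 bits read.
bits_read = 4 * 10**12 * 8    # 4 TB in bits
ure_rate = 1 / 10**15         # errors per bit, from the datasheet
expected_errors = bits_read * ure_rate
print(f"{expected_errors:.3f} expected errors per full read")  # 0.032
```

In other words, roughly a 3% chance of hitting an unreadable sector on every full scan or rebuild, which is why end-to-end checksumming starts to matter at these capacities.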
Please post links / studies / etc. on every claim you've made.
Makes 3TB look like
Backup and RAM
OK, so you have 1TB of disk space. You are now free to create files which are GBs in size. Er... you want to read them afterwards? Computer says No. RAM is so far behind it's sad.
Recently on our Sun servers, Tomcat crashed and created a dump file - 4GB. Trouble is, I can't read this on my Windows XP box. What's the point?!
Backup. The more you can store, the more you're risking. 500GB of family photos. Even if you backup to another hard disk, you've got to keep that hard disk safe, and perhaps even replace every few years.
What backup medium is there that will guarantee that what's on it will stay on it? Holes punched into steel?
> Backup. The more you can store, the more you're risking.
> 500GB of family photos. Even if you backup to another hard disk,
> you've got to keep that hard disk safe, and perhaps even replace
> every few years.
You run Sun servers? So clearly you should know that a single backup is not reliable enough. In order to be sure that you've got a good backup you need to have a few successful attempts. Beyond strange untrapped errors during the backup itself, your tapes might be mauled by a gorilla while they're in the hands of Iron Mountain.
So take multiple backups.
Store some of them offsite.
Don't exceed the limits of the media (like don't reuse cheap tape).
It works for companies. It can work for your family photos too.
Size of the backup really doesn't change the issues.