Data is big business. These days they've even started calling it “Big Data”, just in case its potential for unbridled magnitude had escaped anyone. Of course, if you have Big Data you need somewhere to put it. Hence storage is also big business. On the one hand this is a good thing, but that's just because several of my …
Stop crying about the cost of big data.
I don't know how many times I hear the crying at work about how a 2 terabyte server costs too much money. Hell, I have 2.5 terabytes on my server at home, and if I can afford it, I don't think some multi-million-dollar corporation that makes money off that data should be complaining.
Re: Stop crying.
"I have 2.5 terabytes on my server at home, and if I can afford it"
Out of interest, is that available space or capacity?
But which is more expensive
The hardware for extra storage, or the staff time required to reduce the hardware requirement?
No arguments when it comes to 275 almost identical Windows Server images and the like, but when it comes to the other stuff...
I, for instance, keep all non-trivial email, and it's saved an awful lot of effort on numerous occasions when I've been able to go back and find out the what, why or who of decisions made previously... If those who will not learn from history are condemned to repeat it, then it makes sense to have that history available.
Re: Stop crying.
And what is the IOPS of that setup, and is that RAID10 or just RAID5?
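To put rough numbers on why the RAID level matters here, a back-of-envelope sketch follows. The disk count, the 1 TB disk size and the 75 IOPS per SATA spindle are illustrative assumptions, not figures from the post above, and the write penalties (4 for RAID5, 2 for RAID10) are the usual rules of thumb rather than measurements from any particular array.

```python
# Rule-of-thumb figures only; real arrays vary with cache, controller and workload.
WRITE_PENALTY = {"raid5": 4, "raid10": 2}  # back-end I/Os per front-end write


def usable_tb(level, disks, disk_tb):
    """Usable capacity before filesystem overhead."""
    if level == "raid5":
        return (disks - 1) * disk_tb   # one disk's worth of space goes to parity
    if level == "raid10":
        return disks // 2 * disk_tb    # every disk has a mirror partner
    raise ValueError(f"unknown RAID level: {level}")


def effective_iops(level, disks, iops_per_disk, write_fraction):
    """Front-end IOPS the set can sustain for a given read/write mix."""
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[level]
    return raw / ((1 - write_fraction) + write_fraction * penalty)


if __name__ == "__main__":
    # Assumed example: four 1 TB SATA disks at ~75 IOPS each, 50% writes.
    for level in ("raid5", "raid10"):
        cap = usable_tb(level, disks=4, disk_tb=1.0)
        iops = effective_iops(level, disks=4, iops_per_disk=75, write_fraction=0.5)
        print(f"{level}: {cap:.1f} TB usable, ~{iops:.0f} IOPS at 50% writes")
```

On those assumptions the same four disks give more space but fewer write-heavy IOPS as RAID5, and the reverse as RAID10, which is why "how many terabytes" on its own says little about cost.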
Re: Stop crying.
Yeah, and I have about 9TB at home, but it's commodity SATA. I certainly couldn't afford 9TB of storage at the costs that a proper disk array such as a VMAX would incur, even the SATA disks in a VMAX.
That said, the fact that you specify the server as having disk suggests that you don't do SANs. You've probably not considered port costs, switch costs, ISL costs, redundancy, duplication of the whole SAN fabric, RAID/Mirroring, etc. etc.
Big data is big money, and multi-million-dollar operations become multi-million-dollar operations not by chucking money all over the place on a whim, but by questioning their costs.
Re: Stop crying.
Exactly right - show me a properly costed business case every time.
No compelling business case, no money for big data!!
Big Data Article?
I would hope that an article on the Register talking about 'Big Data' would be jumping all over the industry for their recent hype machine and latest fad. 'Cloud' is so last year, 2013 has obviously been designated 'Big Data' year.
It's amazing that that amazing breakthrough happened in Big Data late last year...oh it didn't.
Well maybe no-one has had massive amounts of data before...oh they have.
Well maybe there wasn't a way to store and manipulate it before...oh there was.
Data warehousing has been around for as long as I can remember. If a company has only just realised it has large quantities of data because of a flashing Intel advert, then it must have been hiding out somewhere dark.
'Big Data' the worst of the buzzwords so far...
Re: Big Data
Big Data has very little to do with data warehousing; it's about the processing of unstructured data (as opposed to nicely defined tabular data).
You try storing billions of ad hoc images, raw web log files and signals from remote devices in your relational database and you might start to understand what Big Data is about.
"What's interesting is that even today it's rare to see a software product's data sheet cite the IOPS (per-second storage operation capacity) requirement of the product"
How would this really be possible for most products? There are so many variables that the figure would be largely meaningless.
If you think deduplication is a no-brainer, you've just never tried to implement it. Like Fat Data itself, it's a tool that can be used well (reducing storage cost) or poorly (killing system performance), and users deserve to be educated about the difference.
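For anyone who hasn't looked under the hood, here is a minimal sketch of fixed-size block deduplication, assuming 4 KB blocks and SHA-256 fingerprints held in an in-memory dictionary. The function names are made up for illustration and this is not how any particular product does it; real implementations often use variable-size chunking and an on-disk index.

```python
import hashlib


def dedupe(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep each distinct block once."""
    store = {}    # digest -> block contents (the unique blocks actually kept)
    recipe = []   # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # a hash plus an index lookup on every write
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original data; every read goes through the index too."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    # Three identical blocks plus one distinct block.
    image = b"A" * 4096 * 3 + b"B" * 4096
    store, recipe = dedupe(image)
    stored = sum(len(block) for block in store.values())
    print(f"logical {len(image)} bytes, stored {stored} bytes")
    assert rebuild(store, recipe) == image
```

The saving is obvious; so is the catch. Every write now costs a hash and an index lookup, and every read is an indirection through the recipe, which is where the performance goes if the index doesn't fit in fast memory.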
What I've found interesting as a solution is to create large thin provisioning pools, use a system such as EMC's FAST to move hot tracks up and down disk tiers, from flash drives at the top to SATA at the bottom, and provision 1:1. This spreads the data across a large number of spindles, which gives you speed and IOPS. It works quite well, and you also don't need to worry about running out of space in your thin provisioning pool, which is always a worry for me.
...is a term that makes me think of trendy teenagers verbally masturbating over SSD benchmarks in their gaming PCs. It doesn't mean anything real; the performance you get out of a SAN is going to depend more on the workload you give it, on top of variables like how you partition it, which filesystem you use, the size of the controller caches, the underlying network media and so on. The big name vendors all have their own proprietary technologies that dictate, on top of these variables, how well they map on to the underlying technology. Raw hardware capabilities mean very little in real world environments and that's why vendors are reluctant to harp on about them. EMC, Dell/EqualLogic and NetApp might quote similar figures if pushed, but the experience you'll get with each platform will be markedly different in a fair comparison.
Don't simply blindly expand the capacity of disk on your SAN.
If you are dealing with big datasets (and I do) you should look at a Hierarchical Storage Management system.
You can select a secondary tier of cheaper SATA disks, or a tier of MAID (disks which automatically idle down when not used) and a tier of tape in an automated library.
Less often used data will be pushed to tape automatically.
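As a rough illustration of the idea, an HSM placement policy can be sketched as a lookup from "days since last access" to a tier. The thresholds and tier names below are hypothetical, picked only for the example; a real HSM product has its own policy engine and also has to handle recalls from tape.

```python
# Hypothetical policy: (maximum idle days, tier name), fastest tier first.
POLICY = [(30, "fast disk"), (180, "SATA"), (365, "MAID")]
ARCHIVE_TIER = "tape"   # anything idle for longer than the last threshold


def target_tier(idle_days: int) -> str:
    """Pick the cheapest tier that the file's idle time qualifies it for."""
    for max_idle, tier in POLICY:
        if idle_days <= max_idle:
            return tier
    return ARCHIVE_TIER


def plan_migration(files):
    """files is a list of (path, idle_days); returns {path: tier}."""
    return {path: target_tier(idle_days) for path, idle_days in files}


if __name__ == "__main__":
    # Made-up file names and ages, purely for illustration.
    sample = [("run_today.dat", 1), ("last_quarter.dat", 120), ("survey_old.dat", 900)]
    for path, tier in plan_migration(sample).items():
        print(f"{path} -> {tier}")
```

The point is that the expensive disk only ever holds what is actually being used, so growth in the total dataset mostly lands on the cheap tiers.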