At a previous job we had a similar - but not identical - problem with a machine in Boulder. In our case there was one more step. Because of the thinner air, we got less cooling. The warmer temps made the PSU less efficient, causing brownouts which manifested as transient errors on our internal communications links. The fix turned out to be a slight adjustment to the ratio between temperature and fan speed. I was the guy on-site, but kudos to the hardware folks back on the east coast for figuring it out.
Are they also asking for an investigation into White House staffers using Confide? Of course not, because this isn't about infosec or policy. It's purely a matter of attacking the other team and defending your own.
So all of those accusations against Hillary, or the claim that there were millions of illegal immigrants voting, should also be ignored until proven, right? Ditto with your accusation of lying. But you're missing one important thing: some information is dangerous to disclose. The evidence has been given to those whose need to know exceeded the risk of that disclosure, which does not include you. It takes a tremendous ego for someone to believe they are the sole arbiter of truth, and that they personally must be convinced of a statement's truth before others are allowed to consider it. Nobody's being thrown in jail based on rumor. It's OK for people to claim and believe what a preponderance of evidence suggests - both the evidence that's public and the evidence that's been vetted, but not disclosed, by our elected representatives.
As far as I can tell, this is just EC2 with features removed to enable a simpler pricing model. The fact that many of these features become available again through VPC peering suggests that it's a separate (someone else's?) data center. But the price isn't really going to destroy Digital Ocean etc. Looking at the 2GB level, which is the lowest they all have in common and is what really constitutes a starter system:
* Digital Ocean - $20/month for two cores and 40GB SSD
* Linode - $20/month for one core and 24GB SSD
* Vultr - $20/month for two cores and 45GB SSD
* Lightsail - $20/month for one core and 40GB SSD
Lightsail is below median for cores, at median for storage, all for exactly the same price. Without benchmarks - especially storage benchmarks which IMX have shown a 2-3x difference between providers or even instances within one provider - it's hard to know which is really the better deal. The real take-away here seems to be that Amazon was feeling pressure at the low end.
The overcommit at issue on a storage server is probably not VM overcommit (or oversubscription) but process-memory overcommit. If you allow memory overcommit what you're saying is that the system can allocate more virtual pages to processes than it can actually back with physical memory plus swap. It's kind of like fractional-reserve banking, and we've all seen what happens when that goes too far. Everything works great until there's a "run on the bank" and every process actually tries to touch the pages allocated to it. Since it's not actually possible to satisfy all of those requests, the kernel picks a victim, kills it, and reaps its pages to pay other debts. It's just as evil as it sounds. It works to a degree and/or in some cases, but IMO it's an irresponsible default made worse by the fact that the Linux implementation has always tended to make the absolute worst choices of which process to scavenge.
In a virtual environment, things get even more interesting. You can allow memory overcommit either within VMs or on the host, or both, and that's all orthogonal to how you size your VMs. Where most people get in trouble is that they oversubscribe/overcommit at multiple levels. Each ratio might seem fine in isolation, but the sum adds up to disaster. The OOM killer within a VM might take down a process, the OOM killer within the host might take down a VM, you can get page storms either within a VM or on the host, etc. It's much safer to overcommit in only one or two places, and then only modestly, but those aren't the defaults.
Way to play the false-dichotomy and appeal-to-authority cards, Nate. I've been a Linux user just as long as you claim, and a UNIX user for a decade before that. There are other options besides a crash or hang. I even mentioned one already: don't overcommit. If there's no swap (really paging space BTW but I don't expect you to know the difference since you don't even seem to realize that allowing overcommit increases the page/swap pressure you so abhor) then memory allocation fails. The "victim" is statistically likely to be the same process that's hogging memory, to a far greater degree of accuracy than any OOM-killer heuristic Linux has ever implemented. If you want to avoid paging, limit your applications' memory usage and don't run them where the sum exceeds memory by more than a tiny amount (to absorb some of the random fluctuations, not steady-state usage). If you fail to follow that rule, adding overcommit will just push the problem around but not solve it.
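For anyone who wants to try the no-overcommit behavior I keep recommending, these are the Linux knobs involved. The percentage here is illustrative, not a one-size-fits-all recommendation - tune it to your own RAM/swap mix:

```shell
# Mode 2 = don't overcommit: allocations beyond
# swap + (overcommit_ratio% of RAM) fail with ENOMEM
# instead of succeeding and inviting the OOM killer later.
# Mode 0 is the heuristic default; mode 1 always overcommits.
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80    # illustrative percentage

# To persist across reboots, in /etc/sysctl.conf:
#   vm.overcommit_memory = 2
#   vm.overcommit_ratio = 80
```

With mode 2, the "victim" is the process whose malloc fails - which, as I said, is statistically likely to be the hog itself.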
There are cases where overcommit makes sense. At my last job we had users who'd run various scientific applications that would allocate huge sparse arrays. Since these arrays were guaranteed to be very thinly populated, overcommit was safe and useful. However, for general-purpose workloads overcommit makes a lot less sense. For the semi-embedded use case of a storage server, which is most relevant to this discussion, it makes absolutely no sense at all. Unconstrained memory use is the bane of predictable performance. Turning performance jitter into something that's easier to recognize and address is actually pretty desirable in that environment, and that's what disabling overcommit will do.
I feel bad for everyone involved. For the customer, the reasons are obvious. For Maxta, this is all too reminiscent of experiences I had working at small companies, and especially in storage. One of the main culprits seems to have been bad controller firmware. Even companies that control the hardware sometimes have trouble with that one. When you ship software to run on hardware the customer controls, the situation becomes impossible. The second issue sounds like the good old Linux "OOM Killer" which was an incredibly stupid idea from the day it was conceived. At both of my last two startups, we ended up having to disable memory overcommit because of the havoc that would result when the OOM Killer started running around like a deranged madman shooting random processes in the head. To be sure, Maxta probably could have done a better job controlling/minimizing resource use, but I know that's a difficult beast to fight so I'll cut them some slack. Put both of these problems in a context of confused business relationships and expectations, and it's no surprise that a disaster ensued. The lesson I take away from this is that vendors need to keep the list of Things To Avoid complete and up to date, while customers need to be clear and open about what they're doing to make sure they don't fall afoul of that list. Amateurs and secret-keepers have no place in production storage deployments.
NT, or XP?
I think the proper analogy is XP, not NT. NT was a new architecture, separate from the legacy 3.x/95/98 codebase. XP was the reunification of these divergent streams. Similarly, Android represented a bit of an architectural departure with its unique JVM-based userspace. Andromeda will represent the reunification of that with the more traditional architecture of ChromeOS (so traditional that I'm running full Ubuntu in another window on this Chromebook right now).
Still trying to get a handle on what Andromeda will mean for us Chromebook users, BTW. *That* would be an interesting story to delve into.
"The Realm Platform works with Java, Objective-C, and Swift"
So of course the image shows PHP. Yeah, I know, it doesn't matter and only a geek-pedant would notice or comment on it. Still, perhaps not the best design choice.
Re: I never quite got containers...
Containers are pretty useful, but the idea that they should all be stateless has always been STUPID. Any non-trivial application has state that has to be stored somewhere. Making it "somebody else's problem" only creates a new problem of how to coordinate between the containers and whatever kind of persistent storage you're using. If one provisioning system with one view is responsible for both, subject to the constraint that the actual disks etc. physically exist in one place, then it actually does simplify quite a bit of code.
It's misleading to say Red Hat Gluster Storage will be available this summer, or to imply that it's just now competing with Portworx et al. RHGS has been available for years, since before some of those others issued their first press release - let alone wrote their first line of code. It's just the new version that's coming.
So you finally admit that there's such a thing as open-source storage, but only to shoot it down with more mentions of proprietary competitors. :sigh:
Re: Regulation is sensible the article is not
Why do you insist on comparing vaping only to cigarette smoking? That's pure cherry-picking. Nobody has disputed that the all-vaping world is better than the all-cigarette world, but neither is the world we actually live in. Vaping needs to be considered *on its own merits* and not just in comparison to something we all know is bad. Doing X and vaping carries some risks that doing X alone does not, for all X. Those risks, which are and are likely to remain better known/understood or controlled by vendors than by consumers, are a legitimate subject of legal/regulatory interest. If you think these particular regulations are too draconian, the constructive response would be to suggest alternatives. Trying to dismiss all possible regulation makes you seem like an ideologue, and trying to suggest that vaping is a net public-health positive makes you look delusional.
I'm not going to disagree with you, there. Centralized trust doesn't work any better than centralized anything else. The only thing I'll say is that the browser makers have made the whole thing even less secure than the design allows by shipping certs for all these shady companies - many of which are clearly just arms of equally shady governments in various forsaken parts of the world. A chain of trust can still be strong if the links are all strong. It's a problem that this becomes hard to guarantee as the chains get longer, but it's also a problem that the browser vendors *knowingly* include weak links in the bags they provide.
Thanks for clarifying that.
The one nugget of truth in the article is that the list of CAs built in to browsers etc. is ridiculous. I had occasion to look recently. I'll bet at least half of those organizations are corrupt or compromised enough that I wouldn't even trust them to hold my hat - let alone information I actually value. Anybody who wants a signing cert for MITM can surely get one. That really does cast doubt on whether HTTPS is really doing us all that much good, but it's important to understand exactly where the weak link in that chain is.
Looks like Dunning/Kruger to me
As with many things, the first level is easy but then things get much harder. Can I build a simple database? Sure I can. Can I build a fully SQL-compliant database with a sophisticated query planner and good benchmark numbers? Not without some help. Can I build an interpreter for a simple language? No problem. Can I build a 99.9% gcc-compatible compiler that spits out correct high-performing code for dozens of CPU architectures? Um, no. Similarly, building a very simple storage system is within reach for a lot of people and is a great learning exercise. Then you add replication/failover, try to make it perform decently, test against a realistic variety of hardware and failure conditions, make the whole thing maintainable by someone besides yourself . . . this is still a simple system, no laundry list of features to match (let alone differentiate from) competitors, but it's a lot harder than a "one time slowly along the happy path" hobby project.
I'm not saying that the storage vendors deserve every dollar they charge. I'm pretty involved with changing those economics, because the EMCs and the NetApps of the world have been gouging too much for too long. What I'm saying is that "build it yourself" is a bit of an illusion except at the very smallest of scales and most modest of expectations. "Build it with others" is a better answer. Everyone contributes, everyone gets to benefit. If you really want to help speed those dinosaurs toward their extinction, there are any number of open-source projects that are already engaged in doing just that and could benefit from your help.
Am I crazy for thinking that "Motion Picture Ass" had nothing to do with saving headline space?
Re: Nothing new @pPPPP re MP3 players.
Once you have files, objects are easy. The difficulty lies in going the other way.
Not so fast ;)
Enrico, the problem with the idea of high-performance object storage is that the S3-style APIs are not well suited to it. Whole-object GET and PUT are insufficient. Most have added reading from the middle of an object; writing likewise has been claimed/promised for a long time, but is still not something developers can count on being able to do. The stateless HTTP protocol is also inherently less efficient than what you get with file descriptors and a better pipelining model. Frankly, a lot of the object-store implementations aren't up for a performance game either. The most charitable way to put it is that the developers were prioritizing other features such as storage efficiency. I'll be a bit less charitable and say the whole reason most of them got into object stores was because they're easy, so they wrote their code with inefficient algorithms and languages/frameworks. That lets them get to market earlier, but the downside is darn-near-unfixable performance issues. The main exception is Ceph's RADOS, which has an API more like NASD/T10 than S3 and which was designed from day one to support upper-layer protocols that demand higher performance.
Throwing flash at an object store won't let it catch up with block or file storage that's also flash based. It might be higher performance than it is now, but it will still be slower than contemporaries. It's going to be really hard for anyone in that mire to get beyond the tertiary role.
"If I have more resources than required"
Might as well stop there. That never happens for long. Where there's capability to spare, new workloads will be added until that's no longer the case. It happens with CPU, it happens with memory, and it happens with storage. Always has and always will. The real question is how to maximize the value of the IOPS you're providing when you're providing as many as you can. That means letting higher-value IOPS (e.g. for higher-value apps or tenants) take priority over lower-value IOPS, and that's QoS.
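A toy sketch of what "higher-value IOPS take priority" can mean in practice. The tenant names and 3:1 weights are made up for illustration; real QoS engines use token buckets, deadlines, and the like, but weighted round-robin shows the core idea:

```python
from collections import deque

# Two tenants, each with a backlog of queued IOs.
queues = {"gold": deque(f"g{i}" for i in range(10)),
          "bronze": deque(f"b{i}" for i in range(10))}
# Hypothetical weights: gold IOs are worth 3x bronze IOs.
weights = {"gold": 3, "bronze": 1}

served = []
while any(queues.values()):
    # Each scheduling round dispatches IOs in proportion to weight.
    for tenant, weight in weights.items():
        for _ in range(weight):
            if queues[tenant]:
                served.append(queues[tenant].popleft())

# While both tenants have work queued, gold gets 3 IOs per bronze IO.
assert served[:4] == ["g0", "g1", "g2", "b0"]
```

When the system is saturated - which, as above, it always eventually is - this is the difference between the high-value app missing its SLA and the batch job waiting a bit longer.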
Besides the fact that what they're doing is no different than what GlusterFS (which I work on) and Ceph have done for years, they start off with two lies.
(1) Their FAQ claims that GlusterFS uses a centralized server, which is not true.
(2) They claim to be open-source, but when you follow the link a big fat "coming soon" is all you'll get.
Outfits like this come along every damn month. And they disappear every month too, when they find out that gaining and retaining users is harder than getting a few mentions in the trade press. There's no reason so far to suspect this one will rise above that vile crowd.
The application containers themselves might be stateless, but they almost always need access to shared persistent data somewhere - web pages, customer records, calculation inputs and outputs. That can be a whole separate island of specialized hardware or bare-metal servers, but why not use the storage already within the container infrastructure? That gives your storage servers the same benefits as your application containers, and allows seamless sharing/balancing of resources between them.
BTW, Gluster (on which I work) has been able to do this since approximately forever, and we have many enterprise customers using this approach. Some of them have even presented publicly about their experience. Nice to see Portworx following our lead.
Storage has always been a hard place to make a living
Especially for startups. It's one of the first places that enterprises look to cut costs, and one of the last places they're willing to experiment. And it has become a crowded space. The folks at Coho are great, but I could say the same about a dozen other startups of the same vintage. They can't all succeed. In a way, this is a side effect of lowering the barrier to entry. Now that scale-out software on top of commodity hardware (even if it has a fancy faceplate) is more competitive with specialized hardware, it seems like everybody and their brother has a storage startup with a new take on where the "real" storage problems are and how to solve them. Some of those ideas are truly new, and truly great. Some aren't. The problem is that it's hard to tell which is which, so when the lifeboat's too crowded and companies start getting thrown overboard it's not always the ones who should have been. Sadly, technical merit and business value don't usually count as much as cozy relationships with investors, analysts, journalists, and (just once in a while) "whale" customers.
Data ONFIRE? Heh. Good one.
Why do these articles only ever seem to compare against *proprietary* solutions? Another basis of comparison for semi-open-source RozoFS would be truly-open-source Gluster (on which I work) or truly-open-source Ceph, both of which already have erasure coding too. Based on experience with that, I'd say *it doesn't matter* which erasure-coding algorithm involves more addition or multiplication because those calculations are only a minor factor in overall performance. The amount of data that must be transferred, either during normal I/O or during repair, matters far more. The coordination overhead matters even more than that. If you have two clients trying to write overlapping blocks, and they don't coordinate properly, then half of the servers get erasure-coded pieces of one write and half get erasure-coded pieces of the other. This isn't even "last writer wins"; anyone who tries to read that data subsequently gets *garbage* back. The #1 determinant of performance in such systems is how they avoid this issue for every kind of operation (including both data and metadata with all of the atomicity/durability guarantees that must be met to keep users from screaming).
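To make that "garbage" failure mode concrete, here's a toy sketch. The 2+1 XOR stripe is my stand-in, not Rozo's actual Mojette transform, but the coordination problem is the same for any erasure code:

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data):
    # 2+1 stripe: two data chunks plus one XOR parity chunk,
    # stored on three different servers.
    half = len(data) // 2
    d1, d2 = data[:half], data[half:]
    return d1, d2, xor(d1, d2)

write_a = b"AAAAAAAA"            # client A's write
write_b = b"BBBBBBBB"            # client B's overlapping write
a1, a2, pa = encode(write_a)
b1, b2, pb = encode(write_b)

# Uncoordinated race: some servers kept A's pieces, some kept B's.
stored = (a1, b2, pa)

# A plain read returns a mix that neither client ever wrote...
assert stored[0] + stored[1] == b"AAAABBBB"

# ...and repair after losing chunk 2 "recovers" A's data instead,
# silently contradicting what the read just returned.
assert xor(stored[0], stored[2]) == a2
```

Preventing this interleaving - for every data and metadata operation, without serializing everything through one lock server - is where the real engineering time goes.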
If the Rozo folks want to brag about their erasure-coding efficiency, let's see some actual performance data. While we're at it, let's talk about the scale at which things have really been tested. Anybody can claim hundreds of nodes and multiple exabytes but AFAIK no project in this space has ever successfully run at that scale on the first try. They *always* run into new failure modes and performance anomalies that never appeared at smaller scale and that often require substantial new subsystems to address. Then they find out that customers at this scale are going to want tons of other features as well. Some of these are still only on Rozo's roadmap, after having been shipped years ago by competitors. Others, especially related to multi-tenancy, are still missing entirely.
I think what Rozo is doing is very cool, and I wish them all the success in the world, but let's not lose sight of the fact that there's a *long* row to hoe before even the best ideas turn into a competitive storage solution. They sound a lot like the Ceph folks did *five years ago*, but Ceph (with far more resources at hand) is just now making the transition from bleeding-edge to enterprise-ready. It's not because they lack talent, I can assure you of that. It's just that these problems are *hard*, and solving them takes a lot longer than Evenou and Courtoy seem to think. I'd love to hear from the RozoFS developers about when *they* think RozoFS will be competitive with what's already out there.
Let's not overgeneralize. *This time* he didn't name names. On the other hand, he still did make some pretty strong insinuations about "whoever" wrote the code, and "whoever" isn't hard to discover. That's well beyond just criticizing the code.
On the other other hand, I've been on too many projects that *didn't* lay down the law this firmly. Developers are a sneaky lot, and they tend to have their own agendas. They'll keep sneaking in code that they know is crap, if it lets them mark more of their personal tasks complete. If nobody is watching, or nobody responds strongly enough to put the fear of God into them, the result is a codebase that slowly rots into irrelevance. I do think Linus and (even more so) certain other Linux kernel developers behave in some pretty toxic ways sometimes, but as we try to improve that situation we still need to remember that bringing the hammer down once in a while is strictly necessary to maintain any kind of quality. It's all in how it's done, not whether it should be done at all.
A welcome development
This particular case involves siblings (as of last week), but I suspect we'll be seeing a lot more of this kind of thing even among non-siblings - yes, even among current rivals - in the next few years. Among other things, it means folks like Isilon will be forced to compete on the basis of software quality instead of relying on custom-tuned hardware to give them an edge in performance comparisons. Bring it.
(Disclaimer: I'm a Gluster developer)
"Amazon S3 is designed for 99.999999999% durability" (i.e. every put has 11 9s durability)
That's really about availability. It says nothing at all about when data is guaranteed to hit stable storage. You do know what "durability" means in data storage, don't you?
"few year old Beta level Ceph benchmarks are not a good measure,"
Ah yes, that's no true Scotsman all right. You asked for citations, I provided them, now you demand different ones. At least those actually compared Ceph to Gluster, on the same hardware. The document you cite only compares Ceph to itself. Why would you assume Gluster has been standing still, and wouldn't also perform better? That's convenient, I suppose, but hardly realistic. Making comparisons across disparate versions and disparate hardware tells us absolutely nothing.
"as the Gluster architect you are not clean from bias"
And I disclosed that association right at the beginning, because I believe in being honest with people. You're still moving the goalposts, citing "evidence" that's unrelated to the actual topic at hand, ducking the issue of how NFS overhead *plus* impedance-mismatch overhead can be less than NFS overhead alone. You haven't even begun to address the problems inherent in trying to provide true file system semantics on top of a system that has only GET and PUT, different metadata and permissions models, etc. This isn't personal, but misleading claims often lead to wasting a lot of people's time if they're not challenged. If you think object-store based file systems are such a great idea then you need to grapple with the issues and provide some facts instead of just slinging mud.
Re: Not so fast
"Object even S3 provides Atomicity & Durability as base attributes"
Simply untrue. You were talking about making the file store sync *on every write*. Object stores provide no guarantees on every write, because they don't even have a concept of every write. That's the flip side of any API based on PUT instead of OPEN+WRITE. At the very worst, an apples to apples comparison would require only an fsync *per file*, and even that would be requiring more of the file store than the object store. Can you actually cite the API description or SLA for any S3-like object store that makes *any claims at all* about immediate durability at the end of a PUT? Amazon's certainly don't, and that's the API that most others in this category implement.
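To make concrete what "an fsync per file" looks like versus "sync on every write", here's a sketch using nothing but the stdlib (the function name is mine, purely illustrative):

```python
import os
import tempfile

def durable_put(path, chunks):
    # File-store analogue of ONE durable PUT: many buffered writes,
    # then a single fsync before close -- not an fsync per write.
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)       # lands in the page cache, not yet durable
        f.flush()
        os.fsync(f.fileno())     # one durability point, per file

path = os.path.join(tempfile.mkdtemp(), "obj")
durable_put(path, [b"many ", b"small ", b"writes"])
with open(path, "rb") as f:
    assert f.read() == b"many small writes"
```

Forcing the file store to fsync on *every* write() in the loop, while letting the object store acknowledge the PUT with no durability promise at all, is exactly the skew I'm objecting to.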
"Would be happy if you can point me to a benchmark to back your thesis which can shows Gluster significantly knocks out Ceph"
"not fare to pick on a cloud archiving product like S3 to make perf claims."
Except that such "archiving products" are the subject of the article we're discussing. What's unfair is comparing a file system to an object store alone, on a clearly object-favoring workload, when the subject is file systems *layered on top of* object stores. All of those protocol-level pathologies you mention for NFS will still exist for an NFS server layered on top of an object store, *plus* all of the inefficiencies resulting from the impedance mismatch between the two APIs. If the client does an OPEN + STAT + many small WRITEs, the server has to do an OPEN + STAT + many small WRITEs. The question is not how a file system implemented on top of an object store performs when it has freedom to collapse those, because it doesn't. The question is how it performs when it executes each of those individual operations according to applicable standards and user expectations, which set definite requirements for things like durability.
The only "religion" here is faith in the assumptions that support your startup's business model. It's not my fault if those assumptions run contrary to fact. I'm just pointing out that they do.
Re: Not so fast
"if you disable the client cache or sync() on every IO to be on par with object atomicity/durability (required for micro-services)"
S3-style object stores make *no* guarantee about consistency or durability. There's a word for the kind of tuning you speak of, hamstringing one side to meet a requirement for which the other is held exempt. It's called cheating. It's a way of *massively* skewing the results to favor one side, and it's why methodological disclosure is so important. Please compare apples to apples, then get back to us.
Re: Not so fast
The issue of implementing file semantics on top of weak (S3-style) object semantics is not just an implementation choice. It introduces an architectural need for an extra level of coordination, which any implementation will have to address. There are richer object APIs that offer better performance (Ceph's RADOS is one), but that's very much not what most object-store advocates (like Enrico and now Trevor) are peddling.
As for Ceph being faster than Gluster, I'll take that with a *big* grain of salt. I've seen many such comparisons, and even made a few myself. Anyone can cherry-pick a configuration or workload that favors one over the other. That's why disclosing such things is important. I have literally *never* seen such a comparison that made such disclosures and didn't contain blatant methodological flaws, and which favored Ceph. Not even from the Ceph folks themselves. Maybe if someone who worked for one of the RDMA-hardware companies (and who should have disclosed that fact before making claims) had done special tuning, and was comparing RADOS to Gluster+loopback, they could come up with such a result, but it wouldn't mean anything. Without details, I'm inclined to call BS on that one.
Lastly, yes, one get can be more efficient than lookup, open (note the order), read, etc. That's great for file-at-a-time access patterns. A few people care about those. On the other hand, that difference pales in comparison to the difference between writing a single byte in the middle of a multi-gigabyte file vs. having to do a get/modify/put on the whole thing. Chunk up the files into multiple objects and you're back at multiple requests for the whole-file case, plus a metadata-maintenance problem that starts to look like the one file systems already solve.
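Here's a sketch of that asymmetry, scaled down to 1 MiB so it actually runs (the dict is a toy stand-in for an object store, and the byte counts in the comments are the point):

```python
import os
import tempfile

SIZE = 1 << 20   # 1 MiB stand-in for a "multi-gigabyte" file

# File semantics: seek and overwrite one byte in place.
path = os.path.join(tempfile.mkdtemp(), "bigfile")
with open(path, "wb") as f:
    f.truncate(SIZE)             # sparse file full of zeroes
with open(path, "r+b") as f:
    f.seek(SIZE // 2)
    f.write(b"X")                # ONE byte crosses the wire
assert os.path.getsize(path) == SIZE

# Whole-object GET/PUT semantics: move the entire object. Twice.
store = {"obj": bytes(SIZE)}     # toy stand-in for an object store
body = bytearray(store["obj"])   # GET: SIZE bytes transferred
body[SIZE // 2] = ord("X")
store["obj"] = bytes(body)       # PUT: SIZE bytes transferred again
assert store["obj"][SIZE // 2] == ord("X")
# Net result: 2 * SIZE bytes moved to change a single byte.
```

At real sizes that's gigabytes of traffic per one-byte update, which is why "just chunk the object" becomes the next move - and the metadata problem that comes with it.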
Layering files on top of semantically-poor objects always leads to problems. Solving those problems destroys any potential performance or scalability advantages you might have started with. That's why most such systems have gateway SPOFs and bottlenecks. In fact they look a lot like distributed file systems fifteen years ago, before we figured out how to solve exactly those problems in a reasonably elegant and efficient way. Those who do not know the lessons of history, etc.
Not so fast
The elephant in the room is performance. Object storage pushers try very hard to avoid even measuring it (as shown by the near-total lack of benchmarks). If you layer a file system on top, it gets even worse. Part of that is due to the overhead of pushing your bits through an HTTP-based protocol, losing and having to recreate half of your state at every request. Even more comes from the extra work you have to do to implement stronger file system durability/consistency semantics on top of weaker object store semantics. Of course, you can always cheat by not actually meeting all standards or expectations applicable to file systems, and most object store pushers do, but it still puts them in a poor position relative to systems that implement those semantics and protocols natively. Object stores aren't going to displace NAS until they can at least get into the same ballpark on performance, and I'm not sure that will *ever* happen.
Disclaimer: I'm a Gluster developer. We took the saner approach of implementing files natively and objects on top of that.
Re: a couple easy predictions
"Cisco will make an acquisition to get into data storage."
I'd be very surprised if they only made one. They'll probably mess up a couple before they get one to drive any real revenue.
Interesting piece, Chris. If you don't mind, I'll try to add on a bit based on my perspective as a developer in this area.
Traditional big-box on-premise storage vendors also face another pair of closely related threats: open source and roll-your-own. The relationship between something like Isilon and something like Gluster (which I work on) is obvious, so I won't dwell on it. The relationship between something like Isilon and something like AWS is also obvious: more AWS usage means less Isilon sales. The relationship between Gluster and AWS, or any of several similar things on either side, is more nuanced. Sometimes people abandon their own open-source scale-out storage in favor of AWS services. Sometimes they deploy that same software within EC2. It's both a threat *and* an opportunity.
That brings us to roll-your-own. If you were to look under the covers at Amazon's storage offerings, I'm sure they'd look an awful lot like what's out there in open source. Ditto for Google. Ditto for Facebook. And Twitter, and LinkedIn, and so on. The fact is that the techniques for doing a lot of this are now pretty well known. Many of those techniques were developed and refined at the aforementioned companies, each of which has rolled their own not once but several times to address various needs and tradeoffs. I've seen a public presentation from GoDaddy - not generally regarded as a company in the vanguard of storage research - about their own home-grown object store. I know of many more that I can't talk about. Perhaps the biggest threat to both traditional storage vendors and someone like me (or my employer) is not any one new product or project but the general idea that scale-out storage software can be assembled rather than developed. That doesn't mean there'll be no place for people who know this stuff and can assemble those parts into a smoothly functioning stack, but we'll be providing less of a product and more of a service. As in so many other areas, increasing levels of automation might put customization in the hands of more than the elite.
"Good morning, madam. What kind of storage system would you like me to build for you today?"
"We're committed to working with 'independent' third parties who will accept (explicit or covert) remuneration to run whichever benchmarks we want however we want them to ensure that our products prevail in 'objective' tests."
I've been in the storage game a while. I have (to my shame) worked at companies where I got to see just how 'independent' most test labs and analysts are. Good will and integrity didn't pay for those Porsches I saw in the parking lot, folks. This is just a new player, not a new game. I can't help but wonder whether some of the anger is because this new player is overdoing it so much that they've brought unwelcome attention to everyone else hiding under that same rock.
Yes, marketing terminology is dumb ... and that's all "server SAN" is. "Lash together storage from multiple servers and present it to the cluster" is a concept that has existed for quite a while. Why claim victory for "server SAN" instead of the broader category, except as a marketing move?
Server SANs aren't going to be the answer unless/until they deal with the issue of server-resident storage being lost on server failure. As soon as you start replicating to avoid that, you're in the same territory as the existing scale-out and "hyper-converged" vendors which can implement the exact same data flow to/from the exact same devices. (Disclaimer: I work on GlusterFS, which is in this category.) If all you need is the speed of local storage without availability, you don't need a server SAN; you just need plain old local storage managed however you see fit. If all you need is availability without the speed, you're back to traditional SAN or NAS. The whole point is that sometimes people need both, and server SANs are hardly alone at that intersection. In fact they're the new arrivals struggling to piece together a real story.
TBH, I think "server SAN" is just a marketing term for something that was already possible (and often done) technically. Maybe that marketing allows the virt team to take ownership instead of working with people who actually understand storage, but I guarantee that will end in tears when data gets lost. Server SANs are the "peace in our time" of the storage-infrastructure wars.
Re: unlawful testing
Destructive testing can be forbidden by a lease/loan contract, as can merely opening the case. Reverse engineering can be forbidden by a purchase contract as well, as can resale to a specific third party (or at all). If a front company acquired a unit, they might well have violated their contract with Pure by allowing EMC to put that unit through the wringer. I'm not saying that's what happened, but "unlawful testing" isn't as absurd as it sounds.
If you think deduplication is a no-brainer, you've just never tried to implement it. Like Fat Data itself, it's a tool that can be used well (reducing storage cost) or poorly (killing system performance), and users deserve to be educated about the difference.
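To make the point concrete, here's a toy sketch of the *easy* half of block-level deduplication: hash each fixed-size block and store only unique content. Everything that makes dedup hard in a real system - keeping the hash index in memory at scale, hash lookups on the write path killing latency, reference counting on delete, collision policy - is exactly what this toy omits. The names here are mine, not from any product.

```python
import hashlib

def dedup_blocks(blocks):
    """Naive content-addressed dedup: map each block to its SHA-256 digest.

    Returns (store, refs) where store holds one copy of each unique block
    keyed by digest, and refs is the per-block list of digests needed to
    reconstruct the original stream.
    """
    store = {}
    refs = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy
        refs.append(digest)
    return store, refs

def reconstruct(store, refs):
    """Rebuild the original block stream from the dedup store."""
    return [store[digest] for digest in refs]
```

In a real implementation the `store` index is the performance problem: it must be consulted on every write, which is where the "killing system performance" failure mode comes from.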
Re: Yes, package up other people's efforts made for free
I'm not going to get into the general philosophy or practice of open-source business models, but I will point out that "other people's efforts" doesn't really apply here. All of the projects that have any chance of meeting Chris's description are funded by someone. Red Hat is spending millions on GlusterFS development. Ceph has Inktank, Lustre has Intel (plus DDN and Xyratex), OrangeFS has Omnibond, etc. Nexenta might be the exception, as most of what they're selling is code developed at Sun, but at least it seems like Oracle isn't pursuing that market themselves. No authors are getting ripped off here.
Do you know who most of those projects *are* taking advantage of? The US taxpayer, whose money has been used to provide development resources and/or publicity for all of Ceph/Lustre/OrangeFS. Without that unwitting and unwilling support, none of those projects would be where they are now. I for one am glad that they are, even though I compete with them, and I consider it a good use of government research dollars, but someone with more of a small-government attitude than I have might find cause for complaint there. The transition from public sector to private is tricky, and one could well argue that too much government money has gone into Lustre's pockets in particular.
Re: Gluster is not bad but...
There are many better places to discuss that, John - ideally a bug report, but also the mailing list, IRC, etc. The point *here* is that, even if some people misuse it or even if it actually is technically deficient in some way, GlusterFS has proven useful enough to enough people that it belongs in this conversation. It's not as though other storage products lack bugs and missing features that someone might say preclude serious consideration. How does single-digit IOPS sound to you? Or corrupting data? I've hit both of those in other projects, without even trying, but I know those other projects can fix their bugs just as we can fix ours. The question is not which project *deserves* to become the Red Hat of open-source storage based on its current state (which we can discuss elsewhere), but which *is likely to* as it progresses over the next few years, and in that context it seems remiss not to mention Red Hat themselves.
I used to joke about this with AB Periasamy, founder of Gluster. He was using the line about Gluster (the company) becoming the "Red Hat of storage". I disagreed, saying that Red Hat should be the Red Hat of storage. Turns out we were both kind of right. ;)
But seriously, folks, it is kind of weird that you got through this article without even mentioning GlusterFS a.k.a. Red Hat Storage. Whatever you might think of our ability to "cause the major vendors a headache", that's clearly the intent, and there are a lot of resources behind it. You even mention the company, but not the product. If I were only a tiny bit more cynical, I might think it was a deliberate snub posted for the sole purpose of giving a rival more exposure.
Disclaimer: in case it's not clear from the context, I'm a GlusterFS developer.
Don't replace the king, replace monarchy
Completely agree, Matt. Fragmentation might have its drawbacks, but diversity - the other side of the same coin - is absolutely essential during the disruptive phase. Just yesterday, I saw yet another post about how a particular technology area (in this case storage) lacked a dominant open-source technology. I've bemoaned the lack of any such alternative myself many times, but I disagree with the author about the desirability of having a *dominant* open-source alternative. I think there should be *many* open-source alternatives, none dominating the others. They should be sharing knowledge and pushing each other to improve, giving users a choice among complex tradeoffs, not declaring themselves the new "de facto standard" before the revolution has even begun in earnest. We don't need another Apache or gcc stagnating in their market/mindshare dominance until someone comes along to push them out of their comfort zones. Being open source is not sufficient to gain the benefits of meaningful competition. One must be open in more ways than that.
P.S. wowfood, I've just started switching my own sites from nginx to Hiawatha, also mostly because of security. While I don't have any specific tips to offer (except perhaps one about rewrite rules that I'll blog about soon) you might be pleased to know that it's going quite well so far.
What you say is not true, googoobaby. There is a stripe translator, not enabled by default but only a CLI command away, that will stripe across multiple bricks (which can be on multiple servers).
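For illustration only, the basic idea behind striping - not the stripe translator's actual code, just a sketch of the concept: fixed-size chunks of a file are distributed round-robin across the bricks, so large sequential I/O spreads across servers.

```python
def stripe(data, n_bricks, chunk_size=128 * 1024):
    """Distribute fixed-size chunks of data round-robin across n_bricks.

    Chunk i goes to brick (i mod n_bricks); returns one byte buffer per
    brick. This is a conceptual sketch, not GlusterFS's implementation.
    """
    stripes = [bytearray() for _ in range(n_bricks)]
    for offset in range(0, len(data), chunk_size):
        brick = (offset // chunk_size) % n_bricks
        stripes[brick].extend(data[offset:offset + chunk_size])
    return stripes
```

The payoff is that a single large file is no longer limited by one brick's spindle or capacity; the cost, as with any striping, is that losing one brick loses part of every striped file unless you replicate as well.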
Cost Benefit Analysis
First, I agree with AC#3 who pointed out that a Symantec marketing person might be a less than reliable source of information or insight on this issue.
Second, I think it's just as unreasonable to assume that encryption is too expensive as it is to assume that it's free. People should weigh both the costs *and* the benefits of using more vs. less secure storage, and measure those against realistic requirements. Most people and businesses should "default to secure" with respect not only to encryption but also to authentication, allowable locations and mandated retention/destruction of data, etc. because the cost/likelihood of compromise is just too high. If data has to traverse someone else's network or sit on someone else's storage, and performance goals etc. can be met with encryption, then encryption should probably be used even if the system would be "more efficient" without it.
Third, I'm hardly a disinterested party myself here. I'm the project leader for CloudFS (http://cloudfs.org/cloudfs-overview/) which addresses exactly these kinds of issues - not only at-rest and in-flight encryption which are both optional, but also other aspects of multi-tenant isolation and management for "unstructured" (file system) data. Of course, I'm not alone. The "senior partner" when it comes to storage security/privacy has to be Tahoe-LAFS (http://tahoe-lafs.org/trac/tahoe-lafs) which provides extremely strong guarantees in those areas at the cost of modest sacrifices in performance and functionality. Other entries in this area range from corporate-appliance players such as Nasuni and Cleversafe down to personal-software players such as SpiderOak and AeroFS. Enabling different tradeoffs between security, performance and usability is an active area of research and commercial competition, and we should all be wary of "this is the one answer" FUD.
Disclaimer: I'm an "associate" at Red Hat, but not speaking for Red Hat, yadda yadda.
EMC has nothing in the scale-out NAS space? Look, I worked on MPFS at EMC and developed a more-than-healthy loathing for the Celerra group. The Celerra might not scale out as much as the Isilon stuff, but it's architecturally not that dissimilar and it scales out plenty far for most folks. To say that EMC has *nothing* is simply inaccurate.
That said, I hope this rumor is not true. I had the privilege of working with Isilon gear and Isilon people some at my last job. I came away impressed, and it would be a serious shame if Isilon fell into the hands of the Celerra thugs. The likely outcome is that they'd pick over the technology for the few nuggets of IP that will solve their current self-inflicted problems, claim that the problems never existed and that they invented the IP themselves, then throw the rest along with all of the people in the trash. It would be an ignominious fate for such fine folks as Isilon has.
The claim of $100K for the entire system is not credible. A petabyte using the very cheapest commodity drives would cost approximately $85K, and that's just the storage. Unless Intel literally gives them 500 Atom processors for free, plus they get great deals on everything from that storage to memory and power supplies, plus they sell the thing for zero profit, $100K isn't achievable. My guess is that the reporter got things wrong, quoting a price for one system and a capability for another. Either that, or they're just snake-oil salesmen. Personally I think Smooth Stone - also linked from TFA - has a much more credible story.
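The back-of-envelope arithmetic behind that $85K figure, assuming roughly 2010-era pricing of about $170 for the cheapest 2 TB commodity drive (my assumed numbers, not from the article):

```python
# Hypothetical 2010-era figures: cheapest commodity drives were ~2 TB
# at roughly $170 each. One petabyte = 1000 TB.
PETABYTE_TB = 1000
drive_capacity_tb = 2
drive_price_usd = 170

n_drives = PETABYTE_TB // drive_capacity_tb      # 500 drives
storage_cost = n_drives * drive_price_usd        # $85,000 for drives alone
```

That $85K buys only the raw drives - before CPUs, memory, power supplies, chassis, assembly, or any profit margin - which is why a $100K all-in price for the whole system doesn't add up.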
Disclosure: I used to work for SiCortex, which was in a similar space. As far as I know there's no relationship (positive or negative) to either of these other companies, but I figured I'd mention it anyway.
Just what we needed
Thanks a lot, SwissDisk guys, for making sure that users will run away from a fundamentally sound and useful idea because of your lousy implementation/operations. You've screwed up not only your own business but other people's. Users, too, will now be so fearful that some of them will cobble together their own ad-hoc solutions instead and probably lose more data total than ever would have happened with cloud storage implemented and operated by competent people. In a just world, after all the damage you've caused, you'd be prohibited from ever offering services like this again.
The real nonsense is...
...the idea that because something is a cluster it doesn't have any intrinsic scalability issues. What bollocks. Lots of clusters have serious scaling issues because their communication protocols are poorly designed, leading either to a bottleneck at one "master" node for some critical facility or to N^2 (or worse) communication complexity among peers. It's not at all uncommon to find clusters that fall apart at a mere 32 nodes. Yes, it's also possible to design a cluster that scales better, but the difficulty is domain-specific. In a storage cluster, consistency/durability requirements are higher than in a compute cluster supporting applications already based on explicit message passing with application-level recovery, and the coupling associated with those requirements makes the problem harder. It's *possible* that XIV has solved these problems well enough to scale higher, but only an idiot would *assume* that they can or have.
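The link-count arithmetic behind that claim, as a quick sketch: a master-based design funnels everything through one node, while a full mesh of peers grows quadratically.

```python
def master_links(n):
    """Star topology: every node talks only to one master - a bottleneck."""
    return n - 1

def full_mesh_links(n):
    """All-pairs peer communication: n*(n-1)/2 links, i.e. O(n^2)."""
    return n * (n - 1) // 2
```

At 32 nodes a full mesh is already 496 links; at 1024 nodes it's over half a million, which is why protocols that seemed fine on a small cluster fall apart as you scale.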
As I already pointed out, it's a moot point in this particular case because they don't need to, but in other situations the gulf between theoretical possibility and practical reality can loom quite large.
Pretty simple, really
Having worked with several parallel filesystems in the past, it never even occurred to me that there would be only one XIV. Shared-storage filesystems just aren't very common nowadays, and GPFS was never one. There will be many XIVs, connected to many servers, perhaps in slightly-interesting ways to facilitate failover and such but generally not much different than if the storage were entirely private to each server - the base model for most of the parallel filesystems in common use. Scaling XIV up or out was never necessary to support this announcement.
Now, for anonymous in #2: thanks for the IBM ad copy, but your claim of uniqueness for GPFS wrt knowing about multiple kinds of storage is simply not true. I'm no fan of Lustre generally, but it has long given users the ability to stripe files within a directory tree across a particular subset of OSTs. As of 1.8, they also added OST pools which give users even more control in this regard. PVFS2 and Gluster also offer some control in this area. Ceph is conceptually ahead of the whole pack (including GPFS), though it's still in development so maybe it doesn't count. In a slightly different but related space, EMC's Atmos offers even more policy-based control over placement. It's an area where GPFS does well, and it's a legitimate selling point - not that this is the place for "selling points" - but it's far from *unique*.