EMC's Pat Gelsinger is proposing unified storage/server systems that span the globe and function as a single virtual resource pool, using YottaYotta technology. The details, such as they are, are in an EMC webcast presented at an analyst briefing event last week. It asserts that EMC will be able to overcome limitations of …
The dream never dies.
It never ceases to amaze me the twists and turns these corporate types will go through to get their hands on your data. This isn't a scheme to distribute data globally, this is attempted blackmail. I run my business on my own systems, I own the data just as I do the copyrights etc. There is no way that I will allow my business's future to be held hostage by people such as Amazon, Microsoft or now EMC. PCs are called PCs for a reason: "personal computers".
Anyway, how can you trust anyone who takes liberties with the English language such as "and we architect our solutions around these traditional obstacles"? What I think he is trying to say is that they will design the architecture of their product to avoid these problems. Just a guess though!
If you had read the article, you would have noticed that this is designed for public or private "cloud" computing. EMC are not making a data grab attempt, it's a product for other companies to buy. This is the sort of system that *very* large corporates would like, not pissant cloud service suppliers who don't really give a kack about downtime, latency or file sharing.
Information is Power .....hence Third Party Control Grabs.
"No one would have to wait too long for the first sight of the data or to access its full content. The caching system controls would ensure that the overall system never lost track of which version of which piece of block data was correct." .... Chris, some would recognise that blocked data was adding to information latency problems.
And InterNetworking Thought Propagation works at speeds far in excess of Visible Light because if you Imagine Intelligence posted on a Page and millions/billions of viewers accessing that page, does the Information travel a fantastic total cumulative distance in seconds ... and certainly a lot further than light which will have traveled only 186,000 miles in a second.
The Power and Speed of Good Ideas Shared is QuITe Biblical in Relative IT Terms ...... and knocks Einstein's Equation for six and right out of the Field.
"Anyway how can you trust anyone who takes liberties with the English language such as "and we architect our solutions around these traditional obstacles."" ..... nematoad Posted Monday 15th March 2010 12:31 GMT
It could well be the case, nematoad, although it would likely be plausibly vehemently denied by conspiring parties, that the nasty electronic truth about perverted purveyors of subverting information, which delivers/delivered unfair and exclusive naked short selling advantage is "we architect our solutions for these traditional obstacles."
And I wholeheartedly concur with ..... "It never ceases to amaze me the twists and turns these corporate types will go through to get their hands on your data. This isn't a scheme to distribute data globally, this is attempted blackmail. I run my business on my own systems, I own the data just as I do the copyrights etc. There is no way that I will allow my business's future to be held hostage by people such as Amazon, Microsoft or now EMC. PCs are called PCs for a reason: "personal computers"."
Large block storage only
I worked for YottaYotta for several years, but retired before the EMC buyout, so I'm not fully familiar with how the various aspects have been integrated.
The article correctly states that this is all being done with block storage - the YY back-end that does the long distance coherency is only at the raw data block level - it is not aware of things like files and directories. I don't know the details of how things are being configured, but the YY code can do distributed RAID-1's, so access to data does not need to all go to a single site - multiple copies of blocks can exist at multiple sites.
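As a rough illustration of the "multiple copies of blocks at multiple sites" idea, here is a minimal Python sketch of a RAID-1 style mirror spread across sites. The names `Site` and `MirroredVolume` are hypothetical and this is not the YY code; it deliberately ignores the hard part (keeping copies coherent when several sites write at once) and only shows why a read need not travel to a single home site.

```python
class Site:
    """One storage site holding a full copy of every block (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block number -> raw bytes; no notion of files


class MirroredVolume:
    """RAID-1 style mirror across sites: write everywhere, read locally."""

    def __init__(self, sites):
        self.sites = list(sites)

    def write(self, block_no, data):
        # Every site receives its own copy of the block.
        for site in self.sites:
            site.blocks[block_no] = data

    def read(self, block_no, local_site):
        # Served from the caller's own site; no long-haul trip needed.
        return local_site.blocks[block_no]


london, tokyo = Site("london"), Site("tokyo")
volume = MirroredVolume([london, tokyo])
volume.write(7, b"payload")
print(volume.read(7, tokyo))
```

The mirror works purely on block numbers, which matches the point that the back end knows nothing about files or directories.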
I also happen to agree with "nematoad" - I do not want large corporations getting their grubby hands on my personal data. However, I have no fear that this EMC/YY project is intended to do that. The YY stuff includes expensive hardware to do its work - it does not run on PCs. So, rest easy on that aspect. This stuff will only see your personal data if you give that data to some corporation that happens to use this stuff on their storage network.
Multiple masters = worse!
".....but the YY code can do distributed RAID-1's, so access to data does not need to all go to a single site - multiple copies of blocks can exist at multiple sites....." So, then that's multiple master copies that all need to be kept in sync. Having multiple copies does not get round the problem that signals can only travel at the speed of light; in fact it makes it worse, as you now have lots more signals to keep all the masters up-to-date. And if all it does is RAID1 copies, how is it better than any of the current crop of block copying software such as EMC's old SRDF? Whilst I can see this working in a city-wide or continental cluster, any greater distance is going to mean massive lag on commits. Imagine sitting in front of your Word doc, pressing "save" and then having to wait ten minutes for the commit to complete, and then having a message come up saying "save failed" because another user on the other side of the World has beaten you to the save by a few milliseconds. Strange, I never thought of Gelsinger as the type to start thinking the laws of physics just don't apply to him, but maybe I was wrong.
Not multiple masters
This is a very strange place to be arguing about this kind of technical thing, but the point has been raised.
You are assuming multiple masters. That is incorrect. There are multiple copies, and, depending on the configuration, multiple sites can write, and that can indeed introduce latency to sort it out. But, it is typically not much more than the round-trip time between the two writing sites. I don't think even worst-case was ever more than a second or so, and typically much, much less.
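To put a rough number on "the round-trip time between the two writing sites", here is a back-of-the-envelope Python sketch of the propagation delay alone. The distances and the two-thirds-of-c figure for light in fibre are assumptions for illustration, and real links add routing, queuing and protocol overhead on top, so treat the results as a lower bound only.

```python
# Back-of-the-envelope propagation delay; every figure here is an assumption.
SPEED_OF_LIGHT_KM_S = 299_792.458  # in a vacuum
FIBRE_FACTOR = 2 / 3               # assumed: light in glass moves at about 2/3 c


def round_trip_ms(distance_km, fibre_factor=FIBRE_FACTOR):
    """Minimum round-trip time in milliseconds over a straight fibre run."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * fibre_factor)
    return 2 * one_way_s * 1000


# Hypothetical site separations, not taken from any EMC or YY material.
for label, km in [("city-wide", 50), ("continental", 4_000), ("antipodal", 20_000)]:
    print(f"{label:12s} {km:>6} km  >= {round_trip_ms(km):7.2f} ms per round trip")
```

Even for sites on opposite sides of the planet the floor set by physics is on the order of a fifth of a second per round trip. A write that needs several round trips pays that cost several times over.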
For a while, YY did consider SRDF as a competitor, but it later ended up as more of a partnership thing - each has their advantages, and situations where they are best. Since I left before the joining, I don't really know more on that.
If you want to get interesting, think about allowing multiple writers when the sites are out of communication. Then bring the sites back together. Fun stuff happens, and that's when there can be considerable time to resync things. Depending on how you have configured things, even that doesn't require all reads or writes to be blocked, however (e.g. if you have manually or automatically chosen a master for that situation, it and sites that did not lose communication with it can go ahead as normal, if memory serves me correctly.)
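One way to picture the resync step, purely as a sketch: assume each side keeps a set of the block numbers it wrote while the link was down. That bookkeeping scheme and the function name `plan_resync` are hypothetical, not a description of how the YY code works. Blocks written on only one side can simply be copied across; blocks written on both sides are the ones that need a chosen master or some other policy to settle.

```python
def plan_resync(written_at_a, written_at_b):
    """Work out what to do when two partitioned sites reconnect.

    Each argument is the set of block numbers that site wrote while the
    sites were out of communication (an assumed bookkeeping scheme).
    Returns (copy A -> B, copy B -> A, conflicting blocks).
    """
    conflicts = written_at_a & written_at_b   # both sides changed these
    copy_a_to_b = written_at_a - conflicts    # only A changed these
    copy_b_to_a = written_at_b - conflicts    # only B changed these
    return copy_a_to_b, copy_b_to_a, conflicts


a_to_b, b_to_a, conflicts = plan_resync({1, 2, 3}, {3, 4})
print("copy A->B:", sorted(a_to_b))
print("copy B->A:", sorted(b_to_a))
print("needs a decision:", sorted(conflicts))
```

If a master was picked for the outage, the conflict set can be resolved by taking the master's copy, which is one reason the master side can carry on as normal.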
"....You are assuming multiple masters. That is incorrect. There are multiple copies...." Multiple masters or multiple copies, it matters not - both cases require lots of communication to keep every copy up-to-date with changes otherwise you could have two copies with different data both being offered to users that could affect a decision or calculation. For example, suppose we have an online ordering system which allows companies to order off one global account from multiple countries (think something like DHL courier services). If Customer A's global account has an order limit of $1m, and they hit that limit in their Brazil branch, you need to get all the local copies of Customer A's accounts updated fast otherwise an order could also be processed in the UK which could take them over their limit. Or, worse, consider an ordering system where you are offering a finite resource - not having all the copies updated could lead to an over-commitment. In either case, as each transaction came in, you would want to lock the account so other orders couldn't be made until a commit had completed on the first order, which means not just update traffic (which would mean latency) but also lock messages to all copies (more latency). Multiple masters or multiple copies, the speed of light rule still applies.
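The order-limit example can be sketched in a few lines of Python. The class `GlobalAccount` is hypothetical, and a local `threading.Lock` merely stands in for the lock messages that would have to reach every copy over the wire; in a real distributed system each acquire and release would cost at least one round trip, which is exactly the latency being argued about.

```python
import threading


class GlobalAccount:
    """One customer's global account with a spending limit (hypothetical)."""

    def __init__(self, limit):
        self.limit = limit
        self.committed = 0
        # Stand-in for a distributed lock; a real one costs network round trips.
        self._lock = threading.Lock()

    def place_order(self, amount):
        """Accept the order only if it keeps the account within its limit."""
        with self._lock:  # nobody else may order until this commit completes
            if self.committed + amount > self.limit:
                return False
            self.committed += amount
            return True


account = GlobalAccount(limit=1_000_000)
print("Brazil order accepted:", account.place_order(1_000_000))
print("UK order accepted:", account.place_order(1))
```

Without the lock, the Brazil and UK orders could both pass the limit check against a stale total and take the account over its limit.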
".....If you want to get interesting, think about allowing multiple writers when the sites are out of communication. Then bring the sites back together....." Sounds like what I would refer to as a "split brain" problem, where you don't want a cluster which normally shares a single database instance splitting and both sides thinking they have the active copy of a database - both sides could commit conflicting changes and would not be able to be synched back together when the cluster recovers. Which brings us to the question of whether you would use such a distributed system for a high-speed, low-response-time database system such as a billing platform. Probably not. But if you had a simpler task where a higher response time is not an issue, like a records store for a CRM system, then you probably could afford to use file locking to make sure the central record is only being updated by one user at a time, and most users wouldn't notice the lag on a query. I suppose it all boils down to what applications EMC see this being used for.
As for hardware, I can see EMC using Cisco's UCS, seeing as they have already said nice things about it; then they could offer a pre-built appliance. They could offer it as software to sit on anyone's blades or racked boxes if customers are resistant to the idea of Cisco hardware, but I can't see that being an issue (the EMC badge buys a lot of trust). No, the bigger problem is that I don't see this as particularly unique, especially if it is SVC-based, as both IBM and HP could cobble together competing devices pretty fast, possibly even Soreacle. If there proves to be a big enough market to make it attractive, that is.