Microsoft added single instance storage (SIS) to Exchange 2005 and has now removed it from Exchange 2010, ensuring that duplicated attachments will be stored in all their redundant, space-gobbling glory on Exchange servers' disks. Why has Microsoft made this apparently retrograde step? A Microsoft Social Technet entry reads: …
Colour me unimpressed
People using micros~1 software for internet email deserve everything they get.
They've got a point
I can't see this ever running into terabytes of wasted space. Microsoft have got a point - the single instance ratios have been declining for years; I rarely see this above 1:12 and sometimes it's worse than 1:3, which is next to useless. It's becoming fairly clear that SIS isn't as useful as it was; Exchange can handle up to something like 50 databases per server. Managing SIS probably costs far more in computing resources than just adding extra storage. Besides which, SIS is of very little use for inbound Internet mail, which is nearly all unique to users. It's probably a change for the better, overall.
"A Microsoft Social Technet entry reads: <snipped>"
That makes no sense at all. It doesn't matter whether you have multiple tables or just one: if you have multiple copies of a mail you can't consider them as one, because they are probably different (if you have the same mail twice with exactly the same headers then you've got a problem).
I sorta agree with the assessment that CPU cycles for compression/decompression matter more than the few KB saved on an attachment, even when the tendency for morons to send huge Word documents just to say that they will be at the meeting is taken into account. And I shall also point out that the smallest hard disk you can buy now is over 500 Gb (!)
Huge corporations with tens of thousands of users shouldn't run an Exponge server on an el-cheapo Dell PC with a few megabytes of disk; even with the duplication factored in, the server should have enough space to store all that crap.
hard disk size
"And I shall also point out that the smallest hard disk you can buy now is over 500 Gb (!)"
It's ironic that you talk about not basing Exchange on desktop PCs in one breath, then talk about low end / desktop storage as if it's the only option out there in the next. (And you'd still be wrong, I think).
500Gb is actually quite a large size when you're talking about high performance/reliability enterprise disk storage based on interfaces other than SATA.
I just took delivery of a whole crate of 300GB 2.5" 10K RPM SAS drives, so yes, you can get smaller. (We use them in our SAN... )
And the multiple email problem is connected to people sending the same mail to several recipients at the same time. (You are aware that you can write more than one address in the TO: Copy: and BCC: Fields, right? )
Maybe I've not kept sufficiently up to date on mail technology, but this doesn't sound good. This could make for substantial increases in the size of archive stores (e.g. Enterprise Vault), as they simply support the methods used by Exchange, such as SIS. Upgrading from 2007 to 2010 will result in a substantial increase in DB size, surely? Also, you'd need to configure more aggressive archiving to prevent an increase in the rate of consumption of storage. Naturally, this means more files will be subsequently retrieved from archive, resulting in higher utilisation of the archive. So your archive needs to be online-speed as well. No DVD archiving or anything. Do you think Netapp bosses are planning new Ferraris already?!
(Happy to be corrected by an Exchange specialist somewhere?)
What you are looking for here is a proper combination. Do not get more aggressive with your archiving, since archives should not be looked at as a replacement for mail servers (more complementary). I think just about every archiving vendor out there does SIS independently, so look at it as a way to keep your more active data where it ought to be, and move the dormant stuff out to lower TCO storage. Take advantage of the larger mailbox size to set a more realistic, larger storage limit. This will have the effect of eliminating the end users' need to create PSTs, but you certainly should not look at this as a way to get all the PSTs back in. You could do that with your archive vendor, or just set an organizational policy to freeze them.
Realistically, you just want to be able to hold your Exchange servers to the size of their storage tank, so to speak. Plan your DAGs with some realism, and get some help here if you are unsure. Depending on the mail profile of your organization, the DAS story might not be exactly what you are looking for. Balance this against the cost of managing the storage, the risk of not noticing individual failures, and the current workload of staff. I commented lower on this topic.
Exchange Server 2005
There's an Exchange Server 2003 & an Exchange 2007, but no 2005. (There's an SQL 2005 though) - this Microsoft Blog post says SIS was added to Exchange 4.0 - http://msexchangeteam.com/archive/2008/02/08/448095.aspx
SIS has been de-emphasized in Exchange since 2007..
What's the news here? Also there is no such product as Exchange 2005. I presume you mean either 2003 or the aforementioned 2007.
Seeing as JBOD is now an option for Exchange with the release of 2010, and actually recommended in certain configurations, I wouldn't be worrying too much about SIS.
Single Instance Store
The value of this decreased when disk space became cheap. Most of the issues SIS helped with can be addressed by either sensible hardware specs or sensible Exchange policies.
As for enterprises, they probably get little value out of SIS because:
a) they're running multiple servers and databases per server and SIS is per database so multiple copies probably already exist.
b) users get moved between servers due to staff relocation/grooming which results in loss of single instance
c) I understand that SIS is the root cause of some issues with the Entourage client, so moving away from this means better "cross-platform" support for very small values of cross-platform..
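Point (a) can be sketched in a few lines. This is only a toy model, assuming SIS keeps one copy of a message per database that holds at least one recipient, and that mailboxes are spread across databases at random; the function names are made up for illustration and are not part of Exchange.

```python
import random


def stored_copies(recipient_dbs, single_instance):
    """Count stored copies of one message.

    recipient_dbs: the database id of each recipient's mailbox.
    single_instance: if True, keep one copy per distinct database
    (the per-database behaviour assumed here); otherwise keep one
    copy per recipient.
    """
    if single_instance:
        return len(set(recipient_dbs))
    return len(recipient_dbs)


def average_sis_ratio(recipients, databases, trials=2000, seed=0):
    """Average recipients-per-stored-copy for randomly placed mailboxes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        dbs = [rng.randrange(databases) for _ in range(recipients)]
        total += recipients / stored_copies(dbs, single_instance=True)
    return total / trials


if __name__ == "__main__":
    # Same 10-recipient message, spread over more and more databases.
    for db_count in (1, 2, 10, 50):
        print(db_count, round(average_sis_ratio(10, db_count), 2))
```

With a single database the ratio is simply the recipient count; as the database count grows, the ratio in this model drifts towards 1:1, which is the decline described above.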
Question of use case
Well surely, for a general purpose mail server SIS sounds like a bad idea. However, Exchange is mostly used by office workers and that's a whole different use case. Essentially they will send a gigabyte file to 500 recipients and each one of them will send it to other people just because it contains pictures of cute puppies.
Single Instance Storage has been around in Exchange for considerably longer than Exchange 2005 would have existed were it a real product and not something you made up.
I must admit that my first impression when I read that they dropped it was that it was a bad idea, but on considering, Microsoft do have a lot of data about how Exchange is used and what is and what isn't efficient. I'll be following comments about storage space from early adopters with interest, but I can only presume they wouldn't drop a feature unless they had lots of data to support the decision.
There were Server, Site, and Organizational models for SIS in Exchange 5.5. Cannot remember 5.0. Obviously Organizational was barely (read: not) usable if there were a larger number of servers in a distributed topology. Server at least had the benefit of being able to have same server SIS. This disappeared with 2000, but the significant advantage of multiple databases per server was well worth it in terms of centralization. This was during a time when the recommended strategy was to add another Exchange Server to improve branch office performance. At that time I ran into customers with huge numbers of Exchange servers and star hub topology from hell.
Here we go again with the ebb and flow of central versus decentral.
Guess we will see how this all shakes out when the Marketing hits the market.
SIS there since the beginning
"Single Instance Storage has been around in Exchange for considerably longer than Exchange 2005 would have existed were it a real product and not something you made up."
Single-instance store existed in the beta versions of Exchange, such as the beta test release I used prior to its first commercial release. It was touted as one of the key differentiators to Lotus Notes. (The other key was how much less overall traffic it generated compared to Notes. Personally I was impressed by how much easier it was to administer.)
The value of SIS depends on the size of the groups you're spamming^H^H^H^H^H^H^H^H connecting with. Culturally people are becoming spam-averse, even within companies.
And since Active Directory and Exchange user lists are now aligned, I can see how any streamlining of Exchange Server may have a likely flow-on effect with server AD. Might have been a good move overall.
The author seems to forget the diminishing effect of deduplication, because Single Instance Storage (SIS) only works within the same mailbox database. With the (acceptance of) growing size of mailboxes you already have fewer mailboxes per database, which means less of a SIS bonus. Since compression works on all databases regardless of content and - as tests indicate - with zero performance penalty, things will improve.
Exchange Footprint is Too Big
The cost of hosting Exchange is rising exponentially. SIS was at least a step in shrinking storage requirements for Exchange. Exchange storage requirements are becoming huge and the processing requirements are having to increase to cope.
The trouble is that the space saved with SIS is nothing compared to the huge emails created by Exchange/Outlook/Word. Have you ever seen the amount of wasted space in typical Exchange based emails? Couple that with the way users tend to just reply to emails - the size of the typical Exchange/Outlook email is becoming huge. If I check on a typical folder I can easily spot 20 Meg messages containing the one word response "Thanks".
Because the typical cheap SATA disk is now 500GB does not mean that you can just grow your storage indefinitely - the cost of storage is still high because now you have to make the SANs that much faster in order to be able to search the huge amounts of data - increasing the processing requirements and making the whole thing slower.
Storage is cheap.
Deal with it by buying more.
Backup is cheap.
Deal with it by buying more. Oh, oops.
Time for a mail server with a git repo as its backend. SIS for free, and highly efficient replication.
Put each user in their own branch, and they can clone just their own mailstore too.
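The "SIS for free" part comes from content addressing: objects are stored under a hash of their contents, so identical messages collapse to one object. Below is a minimal sketch of that idea only. It is not git's actual object format or a real mail server; `ContentStore` and its methods are hypothetical names invented for this example.

```python
import hashlib


class ContentStore:
    """Toy content-addressed mail store: one object per distinct message."""

    def __init__(self):
        self.objects = {}    # hex digest -> message bytes
        self.mailboxes = {}  # user -> list of digests (like refs into the store)

    def deliver(self, user, message: bytes) -> str:
        """Store the message once and add a reference to the user's mailbox."""
        digest = hashlib.sha1(message).hexdigest()
        self.objects.setdefault(digest, message)
        self.mailboxes.setdefault(user, []).append(digest)
        return digest

    def read(self, user):
        """Return the user's messages by following their references."""
        return [self.objects[d] for d in self.mailboxes.get(user, [])]

    def stored_bytes(self) -> int:
        """Bytes actually held, counting each distinct message once."""
        return sum(len(body) for body in self.objects.values())


if __name__ == "__main__":
    store = ContentStore()
    memo = b"Subject: all hands\r\n\r\nMeeting at 10."
    for user in ("alice", "bob", "carol"):
        store.deliver(user, memo)
    print(len(store.objects), store.stored_bytes())
```

Each user's mailbox is just a list of references, so cloning one user's mail only needs the objects that user points at.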
it's a different architecture...
As far as I'm aware SIS was gone in 2007 apart from for attachments anyway. There's no such thing as exchange 2005 as other posters have commented.
SCC (single copy clusters) have gone in 2010, replaced with DAGs (database availability groups), which are the evolution of 2007's CCR (cluster continuous replication). This means realistically you are going to go for 3 copies of a database.
Now you could put those copies on expensive SAN, but since the IOPS requirements of Exchange 2010 are much lower you can use SATA disks in DAS. HUGE cost savings here despite needing 3 copies of a database. Now for the SATA disks you're looking at 1 or 2 TB of storage available per disk, each disk has a set amount of IOPS available and each user has an IOPS requirement based on their mail profile (sent / received / size). Mail just sitting in their mailbox doesn't really put additional load on the server.
So you end up with a user count per disk. Let's say that's somewhere in the 100-200 user region, and you plan 1 DB per disk. The DB can then grow to 1TB or so (not the 100GB recommendation in 2003), and the extra space users are getting (25GB mailbox anyone?) is basically free, as there's no IOPS requirement for data just sitting there.
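The sizing logic above can be written down as a rough calculation. The figures used here (75 IOPS for a SATA disk, 0.5 IOPS per user, 20% free-space headroom) are illustrative assumptions and not Microsoft guidance, and the function names are hypothetical.

```python
import math


def users_per_disk(disk_iops, iops_per_user):
    """IOPS-bound user count for one disk holding one database."""
    return math.floor(disk_iops / iops_per_user)


def mailbox_quota_gb(disk_gb, users, headroom=0.2):
    """Capacity available per user once some free-space headroom is kept back."""
    return disk_gb * (1 - headroom) / users


if __name__ == "__main__":
    # Assumed figures, for illustration only.
    users = users_per_disk(disk_iops=75, iops_per_user=0.5)
    quota = mailbox_quota_gb(disk_gb=1000, users=users)
    print(users, round(quota, 1))
```

With those assumed figures the user count lands in the 100-200 region mentioned above, and the per-mailbox quota falls out of whatever capacity is left over, since the disk is IOPS-bound rather than capacity-bound.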
The 1TB example above assumes no RAID, but with 3 copies of a DB you effectively have RAID database managed at the exchange server level not the disk level. Much more flexible :)
The requirements for 2010, disk wise are therefore SIGNIFICANTLY less than for 2007 or 2003, to the extent it's got storage vendors shaking in their boots because they're not going to be selling as much high end SAN with big arrays and expensive, fast disks.
The "falling" price of storage meets the immovable object of managing that storage. As to the lower IOPS, how are you planning on combining on-site with off-site replication, and how are you going to deal with it when it runs interminably instead of ever completing successfully? (just discarding the log file at the end and starting another) Then toss a failure into the mix, so now the disks are getting hammered into longer and longer response times (thereby further increasing the risk of failure)...and they are JBOD which scattered data all over the drive in the first place. Then suddenly a physical failure at data center 1 requires a resynch from data center 2, and the log files are not complete due to aforementioned synchronization delay. It might be a Database Availability Group, but they still have to maintain database consistency.
While we are at it perhaps we should mention if/when errors start occurring on the disks. How about you wait until the edb's are corrupt since you did not know there was a problem? I am sure we will get a notification in the event logs or SNMP traps about the DAS...right? We knew to look for that right? No worries, we always can resynch from DC2 right? Ooops, larger mailboxes mean more data to send back over the wire.
None of the historical challenges are going away.
The higher GB/disk of SATA means the physical footprint (disk count) of a lot of installations will go down. There should be longer MTBF on 7.2k drives vs 10k or 15k drives, so you end up with better reliability at the physical level.
Replication is all handled automatically within Exchange at the HTs by replaying messages, and a failure of a DB (bringing it up on another node) should result in minimum impact to end users, even if it's in a different DC and the sync is delayed.
If we're talking different DCs then I believe the MS recommendation is at least 2 copies per DC, so any re-seed required should be local and fast (generally). If it is going to take a while, who cares? There's another couple of working copies in another DC that users are connected to.
Any problems with the database get flagged up in EMC, so no worries there unless you never go in EMC, in which case you probably shouldn't be hosting Exchange in-house. Again, you are right about more data over the wire on a cross-DC re-seed, but net speeds have been growing along with disk sizes, even if you need to upgrade the link.
MS have / will have millions of mailboxes they're hosting themselves going onto E2010, (BPOS / Hotmail / Dedicated) and you can bet your life they've plugged numbers into various calculations on multi copy DAS vs less copies SAN, with the SATA solutions coming out a clear winner.
So when we upgrade our exchange server (admittedly that could take years) all our users who send copies of documents to each other instead of using shared folders are going to fill our storage in no time..
And as for the "storage is cheap" answer, enough storage for legacy hardware isn't!
Make it optional dummy
"Most servers support multiple databases, so the efficiency gained from SIS is less and less as time goes on."
The obvious solution is to make SIS optional. Not to remove it completely. Let admins choose whether or not they want SIS enabled on a database. Exchange server admins should know better than Microsoft whether or not SIS is appropriate for their environment.
Have to agree with MS
Large enterprise implementations limit users' ability to add attachments anyway, plus large installs will have multiple databases. Storage is cheap anyway.
SIS makes sense to me
the majority of email data will be internal, and if you have set things up correctly you will have set up groups of users within the same database on the same server (within geographical limitations) anyway.
Storage may be cheap(er) but storage management isn't, and even if it were why should I have to buy more disks, tapes and the like just because MS have decided to change the way they do things, without any logical reasoning for doing so.
MS like to look after their friends
IE the storage companies.
As for its explanation of this behaviour, surely *how* it keeps track of duplicate attachments is an issue internal to the Exchange system. It's not like MS won't completely change a system's internal architecture if it suits them.
A question for you MS Exchange sysops out there.
The CEO of a Very Big Corporation (Lockheed Martin, NBC, BBC etc) sends a memo to *all* staff.
It's got the text.
His contact numbers, as an attachment.
The corporate logo, as an attachment.
So *how* much of this *needs* to be duplicated, and how much duplication *can* you eliminate?
Now what about incoming emails with these sorts of blocks. If it's the same block from the same source (with the same contents) why should it be stored *again*? Note I believe that this sort of processing should be selective, as some sort of filter process. But what do I know?
How do you figure that MS look after their storage company mates? Yes, this might sell a few more disks, but where previous versions of Exchange potentially needed fairly high performance SANs, with advanced function software to handle replication etc. now that email is mission critical, the recommendation for Exchange 2010 is large, slow, dumb disks - there's no revenue for the salesman or profit for the company in them, so hardly a good thing. Exchange 2010 has moved more of the storage function into the application in terms of creating multiple copies, and the IOPS requirement is down, so the default implementation for Exchange 2010 will be DAS with 1TB drives. Cheap and cheerful!
Why was SiS removed?
Personally, I don't care about needing 20% more GBs, since these database changes allow me to have 50% more users on a single server. The author of this article obviously fails to understand why this decision was made.
In somewhat larger deployments, traditionally one of the first bottlenecks is disk I/O. The solution is to add more servers and move mailboxes to the new servers. Exchange 2007 had changes to the database structure which resulted in 70% fewer IOPS than Exchange 2003. Now Exchange 2010 reduces this again by up to 50%, which allows you to have a lot more mailboxes per server. At the cost of losing SiS.
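Taking the two figures quoted above at face value (the second is an "up to" number), the reductions compound rather than add. A quick sketch of the arithmetic, with a helper name made up for this example:

```python
def remaining_fraction(*reductions):
    """Fraction of the original IOPS left after successive relative reductions."""
    fraction = 1.0
    for reduction in reductions:
        fraction *= 1.0 - reduction
    return fraction


if __name__ == "__main__":
    # 70% fewer IOPS in 2007 vs 2003, then up to 50% fewer again in 2010.
    left = remaining_fraction(0.70, 0.50)
    print(round(left, 2))
```

On those numbers a 2010 mailbox would need roughly 15% of the disk I/O of a 2003 mailbox in the best case, which is why an I/O-bound server could hold many more of them.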
So what about SMB customers? Well, typically the amount of maildata is not a major concern for these companies. So when they choose to buy Exchange 2010, does it matter if they have 300 or 400 GB of data? Don't think so.
Larger companies? They understand that they need fewer spindles or can even switch to cheaper disks and still have great performance for their users.
It's Microsoft, Remember?
Microsoft has always done a "bait and switch" with email.
First they introduced MS Mail using other vendors' file services so they could get in the market, and lied to their customers about how many "post offices" could run on a single server. Then they told everyone to go to Exchange to fix everything. Any installation above a trivial size resorted to storing mail on the end users' workstations and/or implemented retention policies because they could not handle the volume of stored mail on "legacy" email services.
There will undoubtedly be an update later to do some kind of de-duplication, or a recommendation to use some de-duplication storage specific to Exchange (I believe there already is, actually).
Frying pans and fires? no just fried braincells.
So to cure a speed problem that can be lived with, they re-introduce a far worse administrative space problem!
When are they going to start curing the problems rather than just curing the symptoms?
Microsoft did the right thing
Let the storage hardware vendors take care of SIS and dedupe. BTW DrDedupe reported on this in Sept-
Storm in a teacup
Exchange 2010 has been designed to maximise I/O performance by using sequential I/O rather than random I/O wherever possible. To enable that they have had to do away with SIS.
However, a large number of storage vendors are now adding dedupe to their products. Block level dedupe is likely to be much more efficient than SIS. If you really want the storage efficiency of SIS you're probably much better off doing it at the hardware layer.
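To illustrate why block level dedupe can catch things SIS cannot, here is a minimal sketch using fixed-size blocks and SHA-256 hashes. Real storage arrays use their own block sizes and schemes (some use variable-size chunking), so treat the function name and the 4 KB default as assumptions for illustration.

```python
import hashlib


def dedupe_blocks(files, block_size=4096):
    """Return (logical_bytes, stored_bytes) under fixed-size block dedupe.

    Each file is cut into block_size chunks and a chunk is stored only
    the first time its hash is seen.
    """
    seen = {}  # block hash -> block length
    logical = 0
    for data in files:
        logical += len(data)
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            seen.setdefault(hashlib.sha256(block).digest(), len(block))
    return logical, sum(seen.values())


if __name__ == "__main__":
    # Two attachments that differ only in their final block.
    first = b"A" * 8192 + b"B" * 4096
    second = b"A" * 8192 + b"C" * 4096
    print(dedupe_blocks([first, second]))
```

The two attachments are different messages as far as whole-item SIS is concerned, so both would be stored in full, while the block store keeps the shared blocks only once.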