For compliance, provisioning, risk assessment and auditing of users' access and storage, have you considered an IDM solution?
Compliance. Was there ever a word to strike such terror into the heart of the average techie? (OK, “Audit”. But don’t blame us, we didn’t want to say it…) Juggling the often conflicting requirements of your budget and compliance is enough to give anyone a headache. So help us out with a question, if you would be so good. Email …
Wikipedia gives several suggestions; which one are you talking about?
Compliance storage? Centera by EMC.
Storage space issues? De-dup, from any of the top storage vendors.
Policy/culture issues? Deploy vitality training highlighting the cost savings of weeding out old data.
There must be huge stockpiles of unused floppy disks somewhere that nobody can sell. That could help with costs.
Backup and restore wouldn't be much slower than BackupExec, I feel.
Seriously though, better, more efficient mailbox storage is the way forward. Among our ~1,500 staff, quotas are always being hit, and it is down to emails being sent out willy-nilly to big distribution lists whose members usually aren't relevant recipients. A more efficient database would cover this. One mail sent out to thousands of people in the same domain shouldn't mean thousands of copies of that mail. It should be one copy of the original email with an index referring to the recipients and storing each recipient's message state, i.e. read, unread, deleted and so on. Once an email is forwarded, yes, it becomes a new email and we go around again.
I thought MS Exchange worked this way, but I have some doubts, at least as to the precise implementation.
Just a suggestion.
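A minimal sketch of the layout proposed above: one stored body per message plus a small per-recipient index row holding its state. It assumes messages are keyed by Message-ID and that unread/read/deleted are the only states; `MailStore` and its methods are hypothetical names, not how Exchange or any other product actually stores mail.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class State(Enum):
    UNREAD = "unread"
    READ = "read"
    DELETED = "deleted"


@dataclass
class MailStore:
    # Message-ID -> raw message, stored exactly once however many recipients it has.
    bodies: Dict[str, bytes] = field(default_factory=dict)
    # (recipient, Message-ID) -> that recipient's view of the message.
    index: Dict[Tuple[str, str], State] = field(default_factory=dict)

    def deliver(self, message_id: str, body: bytes, recipients: List[str]) -> None:
        """Keep one copy of the body and add one small index row per recipient."""
        self.bodies.setdefault(message_id, body)
        for rcpt in recipients:
            self.index[(rcpt, message_id)] = State.UNREAD

    def mark(self, recipient: str, message_id: str, state: State) -> None:
        """Change one recipient's state without touching anyone else's."""
        self.index[(recipient, message_id)] = state

    def mailbox(self, recipient: str) -> List[str]:
        """Message-IDs this recipient can still see."""
        return [mid for (rcpt, mid), state in self.index.items()
                if rcpt == recipient and state is not State.DELETED]
```

Delivering one all-staff mail to 1,500 recipients this way leaves a single stored body and 1,500 small index rows; deleting it for one user only changes that user's row.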
Exchange did work that way (single-instanced storage), with a couple of caveats that I won't bother going into here.
Microsoft moved away from it, mostly due to performance issues. Single instancing means you have to "garbage collect" your mail stores, to remove the mails that have now been deleted by all owners. It also means creating new copies transparently (from the user's POV) when a user edits the email somehow, which they often manage to do accidentally.
Basically, it costs in performance terms and makes the DB messy, and fast large disks are now cheap. So Microsoft decided to ditch shared storage and go all out for speed. I think that was with Exchange 2010, so any recent Exchange implementation/upgrade will lack single-instance storage (SIS).
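The two costs described above, cleaning up bodies nobody owns any more and quietly copying a shared message when one user edits it, can be sketched with simple reference counting. This is an illustration under assumptions (eager clean-up on the last delete, integer body ids, a hypothetical class name); it is not Exchange's actual mechanism, which ran garbage collection as a separate pass.

```python
from typing import Dict


class SingleInstanceStore:
    """Hypothetical single-instance store: one shared body, many owners."""

    def __init__(self) -> None:
        self._bodies: Dict[int, bytes] = {}
        self._owners: Dict[int, int] = {}
        self._next_id = 0

    def add(self, body: bytes, owners: int) -> int:
        """Store a body once and record how many mailboxes reference it."""
        body_id = self._next_id
        self._next_id += 1
        self._bodies[body_id] = body
        self._owners[body_id] = owners
        return body_id

    def release(self, body_id: int) -> None:
        """One owner deleted the message; free the body when nobody holds it."""
        self._owners[body_id] -= 1
        if self._owners[body_id] == 0:
            # The "garbage collection" step: the shared copy can only go
            # once the last owner has deleted it.
            del self._bodies[body_id]
            del self._owners[body_id]

    def edit(self, body_id: int, new_body: bytes) -> int:
        """Copy-on-write: the editing user gets a private copy, others keep the original."""
        self.release(body_id)
        return self.add(new_body, owners=1)

    def __len__(self) -> int:
        """Number of bodies physically stored."""
        return len(self._bodies)
```

Every delete and every accidental edit now costs bookkeeping, which is the performance trade-off the comment describes.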
I know cc:Mail used to do this, and do it well. Attachments were single-instanced too, held once even when they appeared to be duplicated in replies and forwards, or were dragged into completely new threads.
I've seen no evidence that Exchange does any of this.
The term "compliance" is bandied about in data lifecycle management briefings and product notes like confetti at a wedding. However... The first question organisations (or individuals) need to ask is a completely non-technical one:
With *what* are we expected to be compliant?
Industry "Best practice"?
Also there's a trade-off, particularly for the public sector: the need to ensure that Data Protection laws are followed (that is, only keeping what is necessary, particularly data pertaining to individuals) while at the same time ensuring that this does not make Freedom of Information requests impossible to fulfil. You might find that statutory law means some piece of data has to be kept for 7 years, but if it isn't going to be used (and pertains to an individual), it should not be retained. What a dilemma.
And when that FOI request arrives, all this data has to be produced, at which point the individual is likely to ask *why* you were keeping it.
So, before thinking about the technicalities, think about the reality - what do we need to keep, why, and for how long?
........then the storage ain't your problem anymore.
Of course you'll find yourself at the mercy of the Roomba then.....
Getting users to understand what the retention policy is, is a good start, despite their annual attestation that they know the difference between a relevant and a non-relevant record!
Where I work there is a default of "Keep everything, it's the only way to be sure", and sure enough it's led to capacity issues and stability issues (PST files total around 33TB, not counting what is stored in the eVault mail archive).
Training, understanding and enforcement of the retention policy are key, and for that you need proper tooling to understand the age of your data as well as its content. Then you can work out whether it needs keeping or not.
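As a starting point for the age side of that tooling, here is a minimal sketch that totals bytes by age in whole years. It assumes last-modified time is a fair proxy for age and that the data of interest are PST files on a mounted share; the `*.pst` pattern, the 365-day year and the function names are all assumptions, and content analysis is out of scope.

```python
import time
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator, Optional, Tuple

SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap days; fine for a rough profile


def scan(root: Path, pattern: str = "*.pst") -> Iterator[Tuple[float, int]]:
    """Yield (last-modified time, size in bytes) for each matching file under root."""
    # Glob matching may be case-sensitive depending on the platform (.pst vs .PST).
    for path in root.rglob(pattern):
        if path.is_file():
            st = path.stat()
            yield st.st_mtime, st.st_size


def bytes_by_age(items: Iterable[Tuple[float, int]], now: Optional[float] = None) -> Counter:
    """Sum sizes into buckets of whole years since last modification."""
    now = time.time() if now is None else now
    profile: Counter = Counter()
    for mtime, size in items:
        profile[int((now - mtime) // SECONDS_PER_YEAR)] += size
    return profile
```

For example, `bytes_by_age(scan(Path("/mnt/mailshare")))` would profile a (hypothetical) share; the buckets beyond your retention period are the candidates for weeding.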
Work out the maximum fine you could be hit with for non-compliance, then see how much it would cost to comply. If the cost of sticking two fingers up to the regulators is lower than the cost of complying, then choose the former.
Users don't understand storage costs when they're wrapped in a service, because world+dog knows 1TB costs £60 from PC World. They don't think about the additional costs that make up the service, like high availability, DR, backup hardware, manpower and paying Iron Mountain to look after your tapes.
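A back-of-the-envelope version of that argument, with every figure except the £60 a hypothetical placeholder: multiply the raw disk price by the number of full copies the service keeps and add the other costs apportioned per TB.

```python
def service_cost_per_tb(raw_disk_cost: float, copies: int, other_costs_per_tb: float) -> float:
    """Rough cost of 1TB delivered as a service rather than as a bare disk.

    raw_disk_cost: price of 1TB of disk (the consumer figure; enterprise disk costs more).
    copies: full copies kept, e.g. primary + HA mirror + DR site + backup.
    other_costs_per_tb: backup hardware, staff time, off-site tape storage etc., per TB.
    """
    return raw_disk_cost * copies + other_costs_per_tb


# Hypothetical inputs: four copies and £500/TB of other costs.
print(service_cost_per_tb(60, copies=4, other_costs_per_tb=500))  # prints 740
```

Even with made-up overheads, the disk itself ends up a small fraction of what the user is actually paying for.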
Define an initial retention policy:
What are the legal requirements?
What are the corporate requirements?
Now look at what you already have and analyze its usage:
Online in mailboxes
Near-line - perhaps tier n, or e.g. PST archives scattered around
Off-line on punched cards or CDs perhaps.
Analyze its distribution by age and size/number of items
Look at what resources you have available:
Disc, tape, optical etc etc
Consider DR/BC and restore from backup times from each tier of the store
Can your resources accommodate your policy?
If yes - great
If not - change your policy, increase resources or transmute the data: can you make use of dedup or compression, or change how you store data, i.e. move more of it offline?
If what you have will not do what you need and you are not allowed to change what you need then there is a conflict that no amount of hand waving will make go away.
It's a simple engineering problem, not rocket science.
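The "can your resources accommodate your policy?" step is mostly arithmetic. A minimal sketch, assuming a constant annual growth rate, purging exactly at the retention boundary and a single reduction ratio standing in for dedup/compression; tiers are not modelled, and all names and inputs are hypothetical.

```python
def retained_tb(first_year_intake_tb: float, growth_rate: float,
                retention_years: int, at_end_of_year: int) -> float:
    """Volume held at the end of a given year if intake grows by growth_rate a year
    and everything older than retention_years is purged (before dedup/compression)."""
    oldest_kept = max(1, at_end_of_year - retention_years + 1)
    return sum(first_year_intake_tb * (1 + growth_rate) ** (year - 1)
               for year in range(oldest_kept, at_end_of_year + 1))


def fits(capacity_tb: float, required_tb: float, reduction_ratio: float = 1.0) -> bool:
    """True if the data fits once dedup/compression shrinks it by reduction_ratio (2.0 = half)."""
    return required_tb / reduction_ratio <= capacity_tb
```

If `fits` comes back False for the years you care about, you are in the "change your policy, increase resources or transmute the data" branch above.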
Golden rule is to know what you need. It does sound a little 'bleedin obvious', but it is far better to play this game from a position of knowing the rules than from ignorance, as getting it wrong can wipe out any savings you may have made and then some.
The higher end compliance needs (SOX etc) are rare outside of financial companies and a clear policy saying what you will/will not keep covers most lower end compliance requirements.
Once you have defined what you need to keep, think about how likely you are to need to access it. Unless you need daily access, just stick it on tape using a setup within your control; passing it to a third party to look after can raise issues of audit trails etc. and will almost certainly raise prices. Yes, it may be a real pain to pull it down onto a PC for those nice people in Legal to have a look at, but it is a DAMN sight cheaper than even the most basic 24/7 disc system, and it allows you to have one tape (set) per day. It also reduces the chances of losing data when some numpty pulls the wrong disc to fix a RAID.
If you are pushing files to tape on a daily basis and have emails set to auto-flush after a set period of inactivity (another no-brainer that somehow seems to get missed), then you may have a mountain of tape, but you will have a slimline email system and users will be impressed with the speed it all runs at.
OK I made that last bit up, users are NEVER impressed with IT.
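The auto-flush rule mentioned above amounts to a cutoff on inactivity. A minimal sketch, assuming each message carries a last-activity timestamp and that anything flushed is already on the daily tape; the `Message` type and function name are hypothetical, and what counts as "activity" is a local policy decision.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Message:
    message_id: str
    last_activity: datetime  # last read, reply or move; the definition is site policy


def split_for_flush(messages: List[Message], now: datetime,
                    max_idle: timedelta) -> Tuple[List[Message], List[Message]]:
    """Return (keep, flush): messages idle for longer than max_idle are flushed."""
    keep: List[Message] = []
    flush: List[Message] = []
    for msg in messages:
        (flush if now - msg.last_activity > max_idle else keep).append(msg)
    return keep, flush
```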
This is a cultural issue, not a technical one.
Basically, if you're doing things properly, you shouldn't need the emails.
Most of the documents people are looking for should be stored somewhere else other than email. A document management system, a network share - whatever works for that team/group/organisation.
But email is attached to a person. Just because Bob closed the Acme sale, should you have to keep Bob's email forever? Even after he's left? No, the documentation for that sale - the terms, the contract, etc. - should be somewhere that ISN'T BOB'S MAILBOX.
However, people are lazy feckless ****holes who just don't get this, so we end up having to rummage through their crazy personal filing system to find a document that should have been stored somewhere properly.
The best, easiest way to reduce your email storage costs - for both compliance purposes and otherwise - is to follow three simple steps:
1. Make it easy to get stuff out of email and to somewhere secure, shared and useful.
2. Have low quotas.
3. Single-instance on commit to the compliance archive, to reduce storage costs.
If, for legal purposes, you have to keep every message for n years, your storage costs are always going to be high - because you'll be grabbing everything at the router, rather than relying upon backups from the user mailbox.
From-the-router style compliance archiving, even with single instancing, is ruinously expensive - and you should probably look at systems like Centera for the belt and braces you'll need.
For anything less, the solution is simple - get the stuff out of email. Make it gross misconduct (a sackable offence) for employees not to be doing that.
Your problem will then be document management / information management, but that's a nicer problem to have than "there's an email we need, might be one of these eight people, we think it was sent in 2008, why would you need an exact date?".
Make the cultural change, it'll save you loads of money.
Have all of your staff use the free mail services, then you can delete it all...
Compliance? What compliance? It was never in our possession.
... is to observe that most email doesn't need to be big, and most doesn't need to be sent at all. I'm assuming that spam doesn't need to be saved at all, so what follows deals with stuff you'd like to save. Not saving at all is very effective, and not sending in the first place gets there most expediently, so reducing the volume starts at the very beginning.
On a technical level, each email comes with a (supposedly and usually) unique identifier that other emails can refer to, and proper email clients can use to sort out what refers to what, then arrange the email display to make this clear. That means you no longer have to put all the emails that went before under the few lines you're sending back as your contribution. I mean, you're *replying*, meaning that they have the original filed, or not, but that's not your problem.
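A simplified sketch of how a client can rebuild threads from those identifiers, using only Python's standard `email` package: each message's parent is taken from In-Reply-To, falling back to the last entry in References. Real clients use fuller algorithms (for example the REFERENCES threading described in RFC 5256) that also cope with missing parents; the function names here are illustrative.

```python
from email import message_from_string
from email.message import Message
from typing import Dict, List, Optional


def parent_id(msg: Message) -> Optional[str]:
    """Message-ID this message replies to, or None if it starts a thread."""
    in_reply_to = msg.get("In-Reply-To")
    if in_reply_to:
        return in_reply_to.strip()
    refs = msg.get("References")
    if refs:
        return refs.split()[-1]  # the last reference is the immediate parent
    return None


def build_threads(raw_messages: List[str]) -> Dict[Optional[str], List[str]]:
    """Map each parent Message-ID (None for thread starters) to its direct replies."""
    children: Dict[Optional[str], List[str]] = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        children.setdefault(parent_id(msg), []).append(msg["Message-ID"])
    return children
```

Because the reply points at its parent by ID, the client can show the earlier message itself; the reply does not need to carry a quoted copy of it.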
On a usage level, stick to text/plain instead of resending the same stuff yet again as HTML. And refrain from using email as a general-purpose file transfer service. That really buggers up the storage.
On a procedure level, be careful what automated stuff you send around, what it looks like and whether the recipient(s) really need all that (deleting takes time, so it might not happen, and then there's another little bit more to store until eternity), and of course how you structure your internal mailing lists and such. Some companies get so many "to everyone" emails supposedly written by humans that they have no need for automatic report generators to flood them with non-spam crud.
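To see what the HTML habit costs, this sketch builds the same message twice with Python's standard `email` package, once as text/plain only and once with an HTML alternative of the same words, and compares the serialized sizes. Subject, addresses and body are made-up placeholders.

```python
from email.message import EmailMessage


def build(body: str, with_html: bool) -> bytes:
    """Serialize a message, optionally adding an HTML copy of the same text."""
    msg = EmailMessage()
    msg["Subject"] = "Quarterly figures"
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg.set_content(body)  # the text/plain part
    if with_html:
        # The same words again, wrapped in markup, as a text/html alternative;
        # the message becomes multipart/alternative carrying both copies.
        msg.add_alternative(f"<html><body><p>{body}</p></body></html>", subtype="html")
    return msg.as_bytes()


body = "The figures are in the usual place on the share.\n" * 20
plain, both = build(body, with_html=False), build(body, with_html=True)
print(len(plain), len(both))
```

The HTML version carries the whole text twice plus markup and MIME boundaries, and every stored copy of it pays that again.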
On a policy level, teach the users proper email quoting and maybe a bit of "think before writing". A little thought early on could easily save many, many reiterations of increasingly frustrated exchanges of emails that, read between the lines, really only ask "but what did you mean, really?"
Put another way, write *for* the readership, not *at* them. It's been said that most "communication" inside companies is about how bad we communicate. That's low-hanging fruit right there.
On a taking-it-further level, a course in writing (and English, or whatever the lingua franca of the company is) could improve the readability of emails even more, further reducing the volume and leaving more time for each individual email, cutting down the back-and-forth and leaving more time to do actual work, possibly even upping productivity.
Personally, I never had a storage problem with email, even before I got out of that job, because the job was "being the BOFH". Like I said, not getting sent crap means not having to store it. The one guy hogging most of the storage was the one insisting on using outhouse express (the last user of any mail client from that vendor in the company, too) for his 30k-odd-message, >>>>2GB mailbox (a single folder; anything else was too hard), which micros~1 had explicitly documented as "don't do that", yet he would not hear reason.
Instead, he twice tried to force me to drop everything there and then, rip out the (functioning just fine, thank you) FOSS IMAP server and replace it with sexchange, after which the CEO finally told me he'd tell him (the CFO, naturally) to stuff it and stop interfering with IT. This extreme case pointed me to something that, upon further observation and reflection, I think is all too true: the more of a nitwit you are, the worse you use the technology. Being an utter luser with your web browser or some such usually doesn't bother the rest of the company, but email is a problem because you share the goodness with other people.
And yes, that probably means that if you have an email storage problem, the company is likely full of nitwits. Colour me hopelessly optimistic and a believer in the fundamental goodness of mankind, but I suggest educating the lusers as part of the strategy to tone the flood of crap down a bit.
 You never had to, and it was more commonplace to not do that when email was so new people still received training in how to use it, or were techies and could figure this stuff out without being told and in fact expected such features as natural arising from how email works.
 Yes, awful business-speak. To illustrate we could do with less of that, too.
 What it says about the schooling system that its graduates don't do well there to the point that professors suggest no longer marking down university work for poor language I'll leave for some other discussion.
 Or pulling awful tricks like the one Thunderbird+Enigmail pulls when forwarding a multipart/alternative (plain, HTML) email and then encrypting it: it destroys the HTML part, encrypts the text part, then sends that "ASCII-armoured" thing twice, once as quoted-printable plain text and once surrounded by HTML tags, even with a cutesy little bit of CSS to decorate it. The ticket documenting this behaviour was marked "worksforme" and "invalid". The redmondian scourge isn't the only clueless thing posing as an email client around, though it does have the largest following of clueless devotees.
Stop using HTML for email. That will solve quite a few storage/retrieval issues right there.
If you want proper auditing, then the documentation should come through a verified channel requiring password/certificate and date-time stamps.
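A minimal sketch of the date-time-stamp part, assuming a shared secret key held by the receiving system: each record binds a SHA-256 hash of the document, the submitter and a UTC timestamp under an HMAC, so later tampering is detectable. This swaps in HMAC for the certificate the comment mentions; a shared key cannot prove *which* party signed, so real non-repudiation would need public-key signatures and a trusted timestamping service such as RFC 3161. Function names are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Dict


def stamp(document: bytes, submitter: str, key: bytes, when: datetime) -> Dict[str, str]:
    """Build an audit record binding document hash, submitter and UTC time."""
    record = {
        "sha256": hashlib.sha256(document).hexdigest(),
        "submitter": submitter,
        # A naive datetime would be treated as local time by astimezone().
        "timestamp": when.astimezone(timezone.utc).isoformat(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["mac"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify(document: bytes, record: Dict[str, str], key: bytes) -> bool:
    """Recompute the MAC; any change to the document or a record field fails."""
    fields = {k: v for k, v in record.items() if k != "mac"}
    if fields.get("sha256") != hashlib.sha256(document).hexdigest():
        return False
    payload = json.dumps(fields, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("mac", ""))
```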
$MEGACORP enforces a 5GB limit for network profiles, including shadows of 'My Documents' and of Outlook email.
It then tells us to use RTF-format emails. Most company-wide emails are HTML format.
Then there is a retention policy that, as far as I can see, means we have to keep everything except birthday-cake-at-the-watercooler announcements. For >6 years. The retention policy includes rules on email naming conventions that I have /never/ seen the management respect, and on trimming quotation of older emails that I have /never/ seen the management respect either.
My .pst files are around 10GB each, and I have been given a 500GB LaCie to shadow them on. In five years no-one has once asked if I am doing that.
According to the policy failure to retain emails is my fault.
I've worked for idiot corporations like this. All piss & wind, and blame the lowest paid if it all goes wahoonie-shaped.
Leave them. Get a job in a small company.
There were no automatic, essentially no-maintenance mail archives available when I got serious about mail storage, so I wrote my own. It's portable, being written entirely in Java and using PostgreSQL for storage. It has equally portable enquiry and database weeding tools. The benefits of this approach are:
Automatic: an 'always_bcc' Postfix directive sends copies of all incoming and outgoing mail to the archive. PostgreSQL needs no maintenance, since all its tuning and index maintenance is automatic.
Fast: I can find almost any e-mail in under 30 seconds in a 120,000 message archive. The search tool can search on address, subject, body text, date range or any combination of these.
Convenient: regular mail folders can be kept small, since it's generally faster to retrieve a message from the archive than it is to hunt for it in my mail reader's folders.
Automatic whitelisting: I also run SpamAssassin. A plugin for it whitelists anybody I've sent mail to who is in the archive.
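The author's tool (Java plus PostgreSQL) isn't published here, so this is only a sketch of the "any combination of criteria" search, using Python's built-in `sqlite3` as a stand-in database and a hypothetical `messages(id, sent, sender, recipients, subject, body)` table with ISO-format dates. Each supplied criterion adds one parameterized clause and unused ones are left out. On PostgreSQL, full-text search (`tsvector`) would scale better than `LIKE` for body text.

```python
import sqlite3
from typing import List, Optional


def search(conn: sqlite3.Connection, address: Optional[str] = None,
           subject: Optional[str] = None, body: Optional[str] = None,
           since: Optional[str] = None, until: Optional[str] = None) -> List[int]:
    """Return ids of archived messages matching every criterion supplied.

    Dates are ISO strings (YYYY-MM-DD), so string comparison orders them correctly.
    """
    clauses: List[str] = []
    params: List[str] = []
    if address:
        clauses.append("(sender LIKE ? OR recipients LIKE ?)")
        params += [f"%{address}%", f"%{address}%"]
    if subject:
        clauses.append("subject LIKE ?")
        params.append(f"%{subject}%")
    if body:
        clauses.append("body LIKE ?")
        params.append(f"%{body}%")
    if since:
        clauses.append("sent >= ?")
        params.append(since)
    if until:
        clauses.append("sent < ?")
        params.append(until)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    # Only fixed clause text is interpolated; user input always goes through parameters.
    sql = f"SELECT id FROM messages WHERE {where} ORDER BY sent"
    return [row[0] for row in conn.execute(sql, params)]
```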
If you keep mail for more than a month or two, this is the way to go.
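The whitelisting idea reduces to collecting every address you have written to and checking incoming senders against that set. SpamAssassin plugins are written in Perl and the author's plugin isn't shown, so this Python sketch only illustrates the logic; it assumes outgoing messages are available as simple header dictionaries, and the function names are hypothetical.

```python
from email.utils import getaddresses, parseaddr
from typing import Dict, Iterable, Set


def correspondents(sent_mail: Iterable[Dict[str, str]], my_address: str) -> Set[str]:
    """Lower-cased To/Cc addresses of every archived message sent from my_address."""
    me = my_address.lower()
    known: Set[str] = set()
    for headers in sent_mail:
        if parseaddr(headers.get("From", ""))[1].lower() != me:
            continue  # incoming mail; only outgoing mail builds the whitelist
        fields = [headers[name] for name in ("To", "Cc") if headers.get(name)]
        known.update(addr.lower() for _, addr in getaddresses(fields) if addr)
    return known


def is_whitelisted(from_header: str, known: Set[str]) -> bool:
    """True if the sender's address is one I have written to."""
    return parseaddr(from_header)[1].lower() in known
```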
What do the cubicle slaves do?
We use Symantec's Enterprise Vault solution.
It allows you to "vault" emails to the EV server, so they take up only a minuscule 1KB of space in your Inbox but can be retrieved in full (incl. attachments) on the fly when required.
It also performs de-dupe, and we've found up to 70% HDD space savings thanks to this.
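Whole-object de-duplication of the kind described can be sketched by hashing each stored item and keeping one copy per distinct hash. This is an illustration, not Enterprise Vault's actual algorithm; the 70% figure above is the commenter's own, and the savings you get depend entirely on how much duplication your mail contains.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def dedup_savings(blobs: Iterable[bytes]) -> Tuple[int, int, float]:
    """Return (logical bytes, stored bytes, fraction saved) when identical blobs,
    e.g. the same attachment in many mailboxes, are stored only once."""
    logical = 0
    unique: Dict[bytes, int] = {}  # SHA-256 digest -> size of the single stored copy
    for blob in blobs:
        logical += len(blob)
        unique.setdefault(hashlib.sha256(blob).digest(), len(blob))
    stored = sum(unique.values())
    saved = 1 - stored / logical if logical else 0.0
    return logical, stored, saved
```

Ten mailboxes holding the same 1,000-byte attachment plus one different attachment store 2,000 bytes instead of 11,000, a saving of about 82%.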
Send and accept only plain text, RFC 5322-compliant messages.
Store attachments separately and delete them from the message (base64-encoded files take up a third more space).
Don't top post.
Quote only the text required to put your reply into context.
Don't quote the whole of the preceding dialogue including signatures, disclaimers etc.
Ban wordy disclaimers, privacy statements, etc. Max. 4 line sig files.
Plain text only (as I previously posted).
Don't send Word attachments when plain text or PDF would do.
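The "a third more" figure for attachments comes from base64 turning every 3 bytes into 4 characters; MIME also breaks the output into lines of at most 76 characters, so the real overhead is a little higher. A quick check with Python's standard `base64` module:

```python
import base64


def encoded_overhead(raw: bytes) -> float:
    """Fractional growth when raw bytes are base64-encoded MIME-style."""
    # encodebytes() emits lines of at most 76 characters, each ending in "\n".
    encoded = base64.encodebytes(raw)
    return len(encoded) / len(raw) - 1


attachment = bytes(57_000)  # 57,000 zero bytes = exactly 1,000 lines of 57 input bytes
print(len(base64.b64encode(attachment)))  # 76000: the pure 4/3 expansion
print(f"{encoded_overhead(attachment):.1%}")  # 35.1% including the line breaks
```

On the wire MIME uses CRLF line endings, so each 57-byte chunk becomes 78 bytes and the overhead is closer to 37%.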
...Susan in Accounts really needs all her email in pink Comic Sans with a 100KB animated GIF in the sig. She insists life as we know it would end.
More oddly, BIGCORP usually insists on branding everything and using 32-line signatures with logos.
There will always be a need to archive messages for compliance purposes: FSA, SOX and SEC rules all require email to be stored in an easily accessible and searchable archive.
With email volumes continuing to grow, moving the storage to the cloud/SaaS makes a lot of sense.
We recently moved to Mimecast, and its additional business-continuity functionality makes it worthwhile.
before it can be used against you.