NHS reply-all meltdown swamped system with half a billion emails

The NHS reply-all email fail last year involved 500 million emails being sent across the health service's network in just 75 minutes. A test message sent on 14 November to what an unfortunate "senior associate ICT delivery facilitator" thought was a local distribution list she had created instead went to all 850,000 people …

  1. Anonymous Coward

    From what I've seen, it's just Outlook Web Access; these "Accenture" people must've built an exchange server and charged, ooh, who knows... 50 million?

    ...and then failed to configure it to "limit the impact of one email", etc.

    1. Test Man

      No, NHSMail2 is Windows Server and Exchange Server (2013, possibly), with Outlook for client access on Windows 7 PCs (people off-site can use Outlook up to 2016 to access mail securely). Exchange Server comes with Outlook Web Access for web access, if configured to do so.

      1. paulf
        Coat

        Perhaps it was intended? FTA: "The report revealed that half a billion emails crossed the NHS network between 0829 and 0945 that day, against the usual traffic volume of three to five million emails per day. It also claimed that the service "did not crash at any point", though it confessed to "significant service delays for the majority of the day"."

        Underpaid tech: <Coughs> Sir, the stress test on the new email system is complete.

        PHB: Did it pass?

        UT: It took more than 100x normal traffic and did not crash at any point.

        PHB: Great. Sign it off so we can send the invoice to HMG and get paid.

    2. TheVogon

      "these "Accenture" people must've built an exchange server "

      Presumably a very large one for 850,000 users...

  2. hatti
    Facepalm

    Email snafu

    Back in the days of dial-up, I once had an issue where a client sent me a 72Mb attachment. I phoned the ISP and they managed to kill off the offending message after some discussion.

    After this I phoned the client to advise that I had not received her message. Whilst we were on the phone, just before the conversation got to the bit about file size and dial-up, she hit re-send and I had to repeat step 1 with the ISP. FEK!

    1. heyrick Silver badge

      Re: Email snafu

      ? Couldn't you just have telnetted into the POP3 server and done a DELE on the offending message? That's what I used to do, back in the dialup days, for much the same reason.
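
For anyone who would rather script the telnet trick than type it, here is a minimal sketch using Python's standard `poplib`. The host, credentials and the 5 MB threshold are placeholders, not anything from the thread; the helper names `oversized` and `purge_large_messages` are made up for illustration. It reads the sizes from LIST and issues DELE for anything too big, without downloading a byte of the message body.

```python
import poplib


def oversized(list_lines, limit_bytes):
    """Return message numbers whose size in a POP3 LIST response exceeds limit_bytes."""
    numbers = []
    for raw in list_lines:
        # Each LIST line looks like b"<message-number> <size-in-octets>".
        number, size = raw.decode("ascii").split()[:2]
        if int(size) > limit_bytes:
            numbers.append(int(number))
    return numbers


def purge_large_messages(host, user, password, limit_bytes=5_000_000):
    """Mark oversized messages for deletion without retrieving them."""
    box = poplib.POP3(host)  # plain POP3 on port 110, as in the dial-up era
    try:
        box.user(user)
        box.pass_(password)
        _response, lines, _octets = box.list()
        doomed = oversized(lines, limit_bytes)
        for number in doomed:
            box.dele(number)  # the same DELE you would type over telnet
    finally:
        box.quit()  # QUIT is what actually commits the deletions
    return doomed
```

Only `oversized` is pure logic; `purge_large_messages` needs a real server, so try it on a throwaway mailbox first.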

      1. hatti

        Re: Email snafu

        This was outside the remit of my knowledge back then, but thumbs up for the tip. All I need now is a flux capacitor and a car that tops out over 80mph so I can pass this on to the younger me. Cheers

        1. The IT Ghost

          Re: Email snafu

          Years ago, had a contractor (soon to be ex-contractor) who fell for the "we'll give you one share of stock for every two people you send this email to", and sent the email to everybody in the company, individually, not to any "all" group...he selected each person by name, apparently to ensure he got "credit" for each one. My sites (I ran systems for six divisions) happened to be still running Microsoft Mail. The servers were all fine, but the email crashed the client software. This was the days before Windows 95, when you still had to contend with the limited "conventional memory", and MS Mail client had a smaller memory footprint than Outlook. As LAN admin, I had full Outlook, so I had the "pleasure" of cracking every mailbox one by one and removing the offending email so the users' client software would work again. I printed one copy of the email...the "to" line filled five full pages of printout, WITHOUT the headers.

    2. Mr_Pitiful
      Mushroom

      Re: Email snafu

      I'm impressed, 72Mb over dialup! must have taken a week to upload

      I'd have just taken off and nuked it from orbit!

    3. Anonymous Coward

      Re: Email snafu

      Back in the 90s I worked for a firm where the windows/email admin had made himself SERIOUSLY unpopular through a complete lack of social niceties and utter inflexibility plus dobbing people in to management at what he perceived as the slightest infringement of company email policy.

      Anyway, his chickens came home to roost when an email macro virus showed up. It slowly started making its way around the company followed by a worried sounding message from Mr Admin saying please do not forward these emails. And in a stroke he signed his own demise - for that day and half the night anyway. Naturally we forwarded it around the entire company as many times as we could. By the time we got bored the email server was on its knees and so was he. Childish? Probably (we were all in our early to mid 20s), but this guy really had it coming and no sympathy came his way from anyone , including management who considered his warning email a red rag to a bull considering how he was disliked.

    4. Scorchio!!

      Re: Email snafu

      Speaking of email size.. ..in 1997 I received my first big spam. It was from a fool in Earthlink who put recipients on the CC: line rather than the BCC: line. It was fecking huge. I had a 56k modem by that time, but it took a long time to DL. More than an hour. I later found a POP3 scanner after that and took to deleting mail first. These were the days when Demon insisted users put their REAL email address in Usenet From: lines. Unbefeckinlievable.

    5. Anonymous Coward

      Re: Email snafu

      Mwah.

      You haven't lived until you have screwed up an M4 macro and set up a sendmail.cf which creates a mail loop without counter.

      Hell hath no fury but two misconfigured MTAs firing messages at each other - as a newbie email admin I was glad I could just rip the ethernet cord out of the back. At the 10Mb speeds we had in those days, Linux would run out of disk space FAR quicker than it would run out of resources, and with a mail loop that doesn't take long..

      Anyway, those were the lessons we learned before we were allowed to go near production :).
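
The "counter" a mail loop needs is usually a hop count: each MTA adds a `Received:` header, so counting them tells you how many times a message has been relayed. A rough sketch of that check follows, assuming a limit of 25 hops (an illustrative value, and `hop_count` / `should_bounce` are hypothetical helper names, not part of any MTA):

```python
from email import message_from_string

MAX_HOPS = 25  # assumed limit for illustration


def hop_count(raw_message):
    """Count Received: headers, one of which is added per relay."""
    msg = message_from_string(raw_message)
    return len(msg.get_all("Received", []))


def should_bounce(raw_message, max_hops=MAX_HOPS):
    """True when the message has been relayed more often than max_hops allows."""
    return hop_count(raw_message) > max_hops


looping = "Received: from a\nReceived: from b\nReceived: from c\nSubject: hi\n\nbody\n"
print(hop_count(looping), should_bounce(looping, max_hops=2))
```

Two MTAs bouncing a message between them add a header on every pass, so a check like this stops the loop after a bounded number of trips rather than when the disk fills.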

  3. Alister

    "strict controls must be in place to limit the volume of any one email sent by an individual user or local administrator".

    One of these requirements that looks so easy on paper.

    "A requirement to put in place compensating controls to limit the impact of fucking idiots who respond with Reply-All to a test message" would be good too.

    Where's the tick box in Exchange for that then?

    1. wolfetone Silver badge

      ""A requirement to put in place compensating controls to limit the impact of fucking idiots who respond with Reply-All to a test message" "

      Those "fucking idiots" are too busy saving your life pal to worry about clicking the wrong button on an email.

      1. Ol' Grumpy
        Coat

        One imagines they had to resuscitate the "senior associate ICT delivery facilitator" when she found out what she'd done! :)

      2. HmmmYes

        The person was an ICT bod, not a doctor.

      3. Alister

        Those "fucking idiots" are too busy saving your life pal to worry about clicking the wrong button on an email.

        Gimme a break, they were probably all admin staff.

        Oh and FYI, I used to be one of those that does the life saving at the sharp end, and I still know not to click reply-all to a group email.

        1. wolfetone Silver badge

          "Gimme a break, they were probably all admin staff."

          The fiancee's mother is a nurse who spends her days visiting those who are housebound or bone idle lazy. She got the email.

      4. Anonymous Coward

        Saving your life out of the goodness of their hearts.

        Just as the people who make the metal frames for the hospital beds are doing out of the goodness of their hearts, or the people mining coal in China which is used to supply the electricity to the hospital.

        Grow the fuck up.

      5. The IT Ghost

        So every single person who has an NHS email is a life-saving medical professional? There are no clerks, payroll people, people who handle buying more tongue depressors, nobody who...well, you get the idea. For every doctor and nurse there are probably 5 support staff who barely know one end of a stethoscope from the other, and certainly don't spend all day "saving your life".

    2. Anonymous Coward

      That particular tickbox...

      Is next to the one labelled "Tick Here to not be ripped off by (insert out-sourcing company name here)"

  4. Anonymous Coward

    I seem to recall when we moved from CCMail to MS Exchange many moons ago, something similar happening to our 2,000 users. We deliberately changed the names of the big mail groups so they're slightly trickier to accidentally select, and by default there is a catch-all email group for anyone clicking on the "To:" button and hitting Send without thinking!

    1. Trixr

      Are you serious?

      In Exchange, you can lock down a list to only allow specified senders. It takes about 5 sec, via the GUI or PowerShell. We generally control these via another group, e.g. $list-senders are the only ones allowed to send to the specified list. Even if one of the permitted senders screws up, the Reply-Alls don't go far.

      If you have any smarts, and you're actually allowing end-users to set up email lists, you'd run some kind of script on a schedule to check for email-enabled groups with (recursive!) members > $number and verify that all of those have sender restrictions on them.

      For the NHS, the fact the storm went on for that long is appalling - it should have taken approx. 2 mins to lock the list (assuming someone had to log on to a box to set the restriction). Give it 15 mins for someone to verbally raise the alarm... (although, again, if end-users can set up the lists, you'd expect some pretty gnarly monitoring to be in place to actually raise an alert itself, even just seeing if the queues are filling up.)
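
The scheduled check described above could look something like the sketch below. It is not Exchange code: the `groups` dictionary is a hypothetical stand-in for whatever your directory export gives you, with each group listing its members (users or other groups) and its permitted senders. It expands nested groups recursively and flags any list above a size limit that has no sender restriction.

```python
def expand(group, groups, seen=None):
    """Recursively collect the mailboxes in a group, tolerating nested cycles."""
    seen = set() if seen is None else seen
    if group in seen:
        return set()
    seen.add(group)
    members = set()
    for member in groups[group]["members"]:
        if member in groups:  # a nested group: recurse into it
            members |= expand(member, groups, seen)
        else:  # an ordinary mailbox
            members.add(member)
    return members


def unrestricted_large_groups(groups, limit):
    """Names of groups with more than `limit` recursive members and no sender restriction."""
    return sorted(
        name
        for name, info in groups.items()
        if not info["allowed_senders"] and len(expand(name, groups)) > limit
    )


groups = {
    "all-england": {"members": ["clinic-a", "clinic-b"], "allowed_senders": []},
    "clinic-a": {"members": ["u1", "u2", "u3"], "allowed_senders": []},
    "clinic-b": {"members": ["u4", "u5", "clinic-a"], "allowed_senders": ["admin"]},
}
print(unrestricted_large_groups(groups, limit=4))
```

Counting members recursively matters: "all-england" has only two direct members here, yet reaches five mailboxes once the nested groups are expanded.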

  5. chivo243 Silver badge

    this beats a pssh

    https://forums.theregister.co.uk/forum/1/2017/01/27/netapp_creates_reply_allpocalypse/

    pssh

    300 messages is nothing.. Wait until you get 30,000 from a reply all - out of office - mail bomb...

    By a long margin!

    1. Prst. V.Jeltz Silver badge

      Re: this beats a pssh

      "Wait until you get 30,000 from a reply all - out of office - mail bomb..."

      Didn't you post that elsewhere on Reg a couple days ago?

  6. batfastad

    All England rule

    "A software configuration error meant that the system applied an 'All England' rule"

    Tories/UKIP are probably gunning for just that!

    1. Aladdin Sane

      Re: All England rule

      'All England' rule - no fancy clothing on court, just tennis whites?

  7. Dan 55 Silver badge
    Devil

    "This functionality is still to be delivered by Accenture,"

    Yep, sounds like a typical agile fuck-up.

  8. Anonymous Coward

    Not Accenture

    The problem was caused by people hitting "Reply All" and not by Accenture. The system could cope with many emails sent to everyone. It was those who felt they needed to tell 850K people that they didn't think they should have received that email who caused the problem. Accenture's share of the blame may be the same as the manager - 850K/500Million or 0.17%
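
The 0.17% figure checks out, and the same numbers give a feel for the scale. The sketch below uses only the figures quoted from the article (850,000 recipients, 500 million emails, 75 minutes); the "list-wide sends" line additionally assumes every message in the storm fanned out to all 850,000 mailboxes, which the report does not actually state.

```python
def storm_figures(recipients=850_000, total_emails=500_000_000, window_minutes=75):
    """Back-of-envelope numbers for the reply-all storm."""
    return {
        # Share of all traffic accounted for by the single original test message.
        "initial_share_pct": 100 * recipients / total_emails,
        # How many list-wide sends it takes to reach the total (assumes full fan-out).
        "list_wide_sends": total_emails / recipients,
        # Average delivery rate over the window.
        "emails_per_second": total_emails / (window_minutes * 60),
    }


for name, value in storm_figures().items():
    print(f"{name}: {value:,.2f}")
```

On those assumptions, roughly 588 people hitting Reply All would be enough to produce half a billion deliveries, at an average of about 111,000 emails a second.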

    1. Freddie

      Re: Not Accenture

      If it was part of the brief to prevent mails being sent to large numbers of people, and they failed to deliver the brief, then that would suggest a cock-up on their part, no? I know reply-all is an enraging habit, but usually it's a rare mistake - just not rare enough in 500 million rolls of the dice.

    2. Strahd Ivarius Silver badge

      Re: Not Accenture

      It was Accidenture that misconfigured "my organisation" to mean "the whole world", no?

    3. gw0udm

      Re: Not Accenture

      The issue here was that it was not obvious what had happened and definitely not obvious that it had gone to the whole NHS. All you saw when you received it was a handful of addresses, maybe 7 or 8. So if you did Reply All to that group you would not have expected the maelstrom that followed!

      It was only after a couple of hours that the true extent of what had happened became apparent, by which time it was much too late.

    4. Paul Crawford Silver badge
      Facepalm

      Re: Not Accenture

      The problem was caused by people hitting "Reply All"

      And would that happen to be the default choice by any chance?

      As an aside, I have seen email groups where reply address is set to be the list, so even if you hit "reply" and not "reply list"/"reply all" you still end up spamming everyone and you have to manually copy/paste the sender's email address if you simply want to reply to them.
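
The difference between the two cases comes down to which headers a client reads. The sketch below is a simplified model of common client behaviour, not a description of what Outlook actually does: a plain reply goes to `Reply-To` if present, otherwise `From`; reply-all adds everyone on `To` and `Cc`. The addresses and the function name `reply_recipients` are invented for the example.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def reply_recipients(msg, reply_all=False):
    """Addresses a reply would go to under the simplified rules described above."""
    primary = msg.get_all("Reply-To") or msg.get_all("From", [])
    targets = [addr for _name, addr in getaddresses([str(h) for h in primary])]
    if reply_all:
        others = msg.get_all("To", []) + msg.get_all("Cc", [])
        for _name, addr in getaddresses([str(h) for h in others]):
            if addr and addr not in targets:
                targets.append(addr)
    return targets


msg = EmailMessage()
msg["From"] = "admin@example.org"
msg["To"] = "biglist@example.org"
print(reply_recipients(msg))                  # plain reply: sender only
print(reply_recipients(msg, reply_all=True))  # reply-all: sender plus the list

msg["Reply-To"] = "biglist@example.org"       # list configured as the reply address
print(reply_recipients(msg))                  # now even a plain reply hits the list
```

With the list as the reply address, the user never gets the chance to choose the safe option.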

    5. Fonant

      Re: Not Accenture

      The problem was caused by people hitting "Reply All"

      Nope, the problem was that people replied to the single email address of the Dynamic Distribution List, which was supposed to be configured to only include a few people but in fact included everyone. Most certainly a problem caused by the system, not the users.

      1. Mike Cresswell

        Re: Not Accenture

        Umm, the article clearly says that the system admin sent the message *to* the problem distribution list which means the sender would have been the system admin and not the problem distribution list. In every implementation I've seen, a "reply" only goes to the sender of the message and it takes a "reply to all" to go to the entire distribution list.

        So it seems to me that the issue *was* people doing "reply to all", a problem which is as old as email.

        To use an old joke, "We try to make our systems idiot proof but they just keep making better idiots"

        1. Anonymous Coward

          You have not seen

          all the implementations I have seen.

          (It's reply-to-list that would be needed, and that would require the client to know it was a list. I'm fortunate not to have to know whether that is implemented in Exchange/Outlook/Web, but it seems to me it would require cleverness to build that in.)

    6. allthecoolshortnamesweretaken

      Re: Not Accenture

      You are Peter Thomas, and I claim my £5!

  9. Anonymous Coward

    'Ye Old Exchange 5.5 Bombing of 16GB limits

    I remember in my early admin days that one user ALWAYS bombed the e-mail server at the private school I worked at. Guess who? The PR lady for the school who was also a student there (so we had to tread carefully when giving a telling off). Every single end-of-term, there'd be a huge circle e-mail of attached raw images (straight from the DSLR) with a copy of the newsletter text to the "All Staff" group (roughly 200 at the time). Then all the OOF replies and externals who always copied back in the image attachments on reply. This was in the age of battling the 16GB limit of the Exchange 5.5 Standard server.

    My boss always fumed, rang me (or vice versa), and we'd both go storm the office where the PR lady worked as soon as it happened. She never learnt until we got our shiny new Exchange 2007 server.

    Anyway, in light of this piece of the story... "The local admin selected the "only in my organisation" rule, which she thought would restrict the distribution list to her South London clinical commissioning group.

    "A software configuration error meant that the system applied an 'All England' rule rather than one including only the administrator's organisation," continued the report on the snafu. "The administrator would not have known that this had occurred."

    What happened to competent sys-admins that would test the result first before acknowledging the task is complete to the user? Shouldn't have to put blame up the chain to a more senior admin if they messed up the config. Those things should be caught and dealt with to stop this stuff from happening. Shame NHS won't move on from e-mail either and use chat-based apps or something similar to conduct communications.

    Die e-mail DIE! It's a curse to all in IT.

    1. hmv

      Re: 'Ye Old Exchange 5.5 Bombing of 16GB limits

      You'll get me replacing email with chat applications when my manager agrees that I don't have to do any work.

      Making it easier to interrupt me at _my_ work to help out with _their_ work is not something I intend doing.

      1. Robert Baker

        Re: 'Ye Old Exchange 5.5 Bombing of 16GB limits

        "You'll get me replacing email with chat applications when my manager agrees that I don't have to do any work.

        Making it easier to interrupt me at _my_ work to help out with _their_ work is not something I intend doing."

        You'll get me to replace e-mail with chat when flying pigs land on the frozen plains of Hell. Chat is a text version of telephoning; e-mail has many advantages over both, such as being able to compose my response and not having to reply immediately.

    2. Anonymous Coward

      Re: 'Ye Old Exchange 5.5 Bombing of 16GB limits

      Do you work for Atos?

      http://uk.atos.net/en-uk/home/we-are/zero-email.html

      If not you could slide into a job easily.

    3. heyrick Silver badge

      Re: 'Ye Old Exchange 5.5 Bombing of 16GB limits

      "Shame NHS won't move on from e-mail either and use chat-based apps"

      Hmm. Is there one that works on all sorts of equipment, is easy to use, reliable, and doesn't store loads of shit off-site (and quite likely off country where it can be "analysed for marketing purposes")?

      1. salamamba too

        Re: 'Ye Old Exchange 5.5 Bombing of 16GB limits

        and can reliably archive all messages for years in case of lawsuits, change management etc

    4. A Non e-mouse Silver badge

      @A/C Re: 'Ye Old Exchange 5.5 Bombing of 16GB limits

      What happened to competent sys-admins that would test the result first before acknowledging the task is complete to the user

      From the original article:

      The administrator would not have known that this had occurred.

      i.e. the administrator was testing what they had set up before handing it off to the users. But the system was so broken, not even the local administrator knew what was going to happen.

    5. Dan 55 Silver badge

      Re: 'Ye Old Exchange 5.5 Bombing of 16GB limits

      Why are you ripping Exchange a new one then ending with "Die e-mail DIE! It's a curse to all in IT".

      No, your problem is not e-mail, your problem is Exchange.

      1. Anonymous Coward

        Re: 'Ye Old Exchange 5.5 Bombing of 16GB limits

        Both Exchange and e-mail are equal problems. Both need a lot of expertise to set up in big organisations and to keep on top of. Of course processes stop messy setups, but is there any need to spend big bucks on huge e-mail server farms anymore?

        E-mail, for all its compatibility, is awful for inter-company productivity. I'd personally not like to have to deal with them in my line of work. Apart from a few good clients, no-one has written any intelligence around them or organised e-mails better. Outlook is a pile of poop too, for its very small set of good features. I'd rather not sit behind e-mail either (unlike some who hint at it) as an excuse for not moving my butt to get something done because I've genuinely let the email item rot with a lack of system to come back to it. With better chat/helpdesk/organisation tools out there now, why rely on e-mail?

        @heyrick - Zulip, OneTeam, Rocket.Chat - there are plenty of open-source ones for self-hosting that don't need reliance on Atlassian/Slack. Although both are very good products.

        I personally don't get the dogged determination to defend e-mail. There are ways of stopping this sort of issue happening (although I appreciate mistakes get made - I've made them myself in the early days).

        1. Dan 55 Silver badge

          Re: 'Ye Old Exchange 5.5 Bombing of 16GB limits

          As well as an open protocol and storage standard, e-mail is better than chat in that more thought has probably been put into the original e-mail and you're not expected to reply right at the moment you receive it. IM just derails your thought train and you're expected to drop what you're doing to answer (which could take half-an-hour of back and forth).

          1. Anonymous Coward

            Re: 'Ye Old Exchange 5.5 Bombing of 16GB limits

            Depends how you use the chat utility. There's no requirement to respond straight away (you can log off/go to an away status). I love the fluidity of a chat/ticket-system utility over e-mail. For slow inter-3rd-party comms, it's fine. Inside a business? Not so sure now. For a business/org like the NHS, there are much better ways to distribute global comms and segregate team communication.

            I don't care about the thumbs down, guys. It's almost a survey of how many of the old-school like e-mail, which is fine. As a comms tool for a business, it's abused so much through lack of training on how to use it, and it's cumbersome/clunky. Seen it, managed it, done it.

            Gutted there aren't many or any who take an opposition to e-mail.

            1. Jamie Jones Silver badge

              Re: 'Ye Old Exchange 5.5 Bombing of 16GB limits

              If there are issues with your implementation, fix your implementation. Don't go all ALSA on it.

              There's a reason I refused to take phone calls: like IM, they're a jump-to-the-front-of-the-queue-i'm-more-important-than-what-you're-doing-now system

  10. An nonymous Cowerd

    stuff happens

    Some well-wisher recently sent me a mail message with a small xml file pretending to be a "meeting.ics"; this small file contained a list of 19k+ addresses, all in rsvp-mode

    24 Apr 2015 17:26:01 +0200\n

    From: remote.participation@xxx\n

    To: an_onymous_cowerd@xxx\n

    Reply-to:\n

    Subject: Adobe Connect - Meeting Invitation to "Meeting Room L2"\n

    MIME-version: 1.0\n

    . . .

    BEGIN:VCALENDAR

    removed much stuff that is evil

    . . . 19 thousand respondents

    PRIORITY:5\n

    X-MICROSOFT-CDO-IMPORTANCE:1\n

    CLASS:PUBLIC\n

    BEGIN:VALARM\n

    TRIGGER:-PT15M\n. . .etc

    Good to see that it might have only disrupted services for a few hours. I post this info here as I've already described the event widely & openly and I've left the field of crypto/internet-security/balance-of-privacy-vs-security/ for something more peaceful!
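
A gateway could spot an invitation like this before delivering it by counting the attendees flagged for RSVP. The sketch below is a deliberately naive parser, assuming a plain iCalendar body: it unfolds continuation lines (those starting with a space or tab) and counts `ATTENDEE` properties carrying `RSVP=TRUE`. It does not handle quoted parameter values containing colons, and the function names are made up for illustration.

```python
def unfold(ics_text):
    """Join iCalendar continuation lines (leading space or tab) onto the previous line."""
    lines = []
    for raw in ics_text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def rsvp_attendees(ics_text):
    """Count ATTENDEE properties that ask for a reply."""
    count = 0
    for line in unfold(ics_text):
        name_and_params, _sep, _value = line.partition(":")
        parts = name_and_params.upper().split(";")
        if parts[0] == "ATTENDEE" and "RSVP=TRUE" in parts[1:]:
            count += 1
    return count


sample = (
    "BEGIN:VCALENDAR\r\nBEGIN:VEVENT\r\n"
    "ATTENDEE;RSVP=TRUE;CN=A:mailto:a@example.org\r\n"
    "ATTENDEE;CN=B;RSVP=FALSE:mailto:b@example.org\r\n"
    "ATTENDEE;ROLE=REQ-PARTICIPANT;RSVP=TRUE:mailto:c@exam\r\n ple.org\r\n"
    "END:VEVENT\r\nEND:VCALENDAR\r\n"
)
print(rsvp_attendees(sample))
```

Anything returning a count in the thousands is a good candidate for quarantine rather than delivery.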
