HPE 3PAR storage SNAFU takes Australian Tax Office offline

The Australian Taxation Office (ATO) has confirmed that its outage, which has run for more than two days and still has customer-facing sites and systems down, was caused by the failure of new HPE 3PAR storage units. Sources close to the blunder told Vulture South the ATO's disaster recovery plan may have failed, with petabytes of data at risk after …

  1. Paul Crawford Silver badge

    What was that Skippy? Was it 3PAR kit you say?

    Were they taking a leaf from King's College London on this? Unlike KCL they probably will want users to keep their own records:

    http://www.theregister.co.uk/2016/11/15/after_kcl_kills_uniwide_backups_staff_get_order_to_never_make_their_own/

  2. Mark 65

    Update

    Tax is replacing its end of life EMC kit with HP gear as part of a 2010 A$1.26 billion contract

    Should have probably just updated to newer EMC under the threat of moving to HPE. Actually moving doesn't seem to have worked too well.

    1. Down not across Silver badge

      Re: Update

      Or perhaps some other vendor may have been a better choice. 3PARs seem to have issues and, more importantly, HPE seems to struggle to work out what is wrong, from what I've heard from other enterprises as well. EMC, HDS etc. just seem to work (or at least their support can work out what is wrong without entering into a "let's switch bits until it works" game).

  3. AnAnonymousCanuck

    EMC Just Works?

    Hahahahahaha!

    Thank you for my morning pick-me-up.

    I love EMCs, they're wonderful for billable hours!

    AAC

  4. Aristotles slow and dimwitted horse Silver badge

    In my experience...

    The HPE kit in itself is ok. The problem as always with any facet of dealing with HP (now specifically HPE) is with their own internal *cough* professional *cough* services and management people, and the multitude of various generally uninformed and unempowered subcontractors that they use to fill their own resource gaps with.

    Taking that on board, then at first glance I'm not surprised at this story. But considering who the customer is - I would then also be looking hard at the client side IT management and digging further into those risk assessments, migration plans, dress rehearsals and backout tests that should have been completed and signed off.

    Again, all that said... IT's complex 'innit and things do go wrong.

    1. Scuby

      Re: In my experience...

      The trick is to make sure you deal with original pre-HP(E) 3PAR Techs.

      I've been running 3PAR Arrays since 2007 through 5 Generations of the product and have never had any issues with them (in part due to having the right people available to field my calls and make sure they get escalated to the correct team on the rare occasions there has been a problem.)

  5. Anonymous Coward

    let me guess - who answered the phone ?

    They called HPE local support who called HPE global support who called HPE USA support in Calif who called engineering who called some poor slob in India to ask if they knew how to recover it.

    That is life in high tech nowadays.

    1. Slackness

      Re: let me guess - who answered the phone ?

      Very true wot she said ^----<<< After nearly 12 years at HP & HPE I am afraid that is the truth. Pass to India and the support workers are so silo'd in their specific capability, it's about getting the call passed or closed to fit measured resolution / pass the buck metrics.

      1. Anonymous Coward

        Re: let me guess - who answered the phone ?

        "Level 12 Helpdesk"

        Now with more foreign speech patterns!

        1. Anonymous Coward

          Re: let me guess - who answered the phone ?

          The ATO randomly called numbers on their list of infrastructure vendors.

          HP picked up first. Congratulations!

          CIO: "Today we have a storage problem."

  6. Anonymous Coward

    Are you trying to tell me they didn't summon the vast powers of ITIL processes

    1. Anonymous Coward

      Make fun of ITIL as much as you like. ITIL ensures that technical half wits can still get a job as a member of the Change Advisory Board. They can sit at a round table and yap about how things were better back then. They can also dig deep in their repertoire of previous stuff ups to remind the current system admins of all the things that could possibly go wrong.

      As the risk-averse types they get to play devil's advocate - all the while not carrying any accountability, responsibility or the skill to actually do something useful.

      Most importantly - the change request form must be completed properly - or it will be rejected.

      The annual ITIL refresher course has been booked 3 years in advance.

    2. Richard 26

      They probably have less formal processes in Oz, instead of a full CAB, they just need to get Bruce, Bruce, Bruce and Bruce to sign it off.

      1. Wallsy
        Stop

        Bruce CABs

        They're no less painful down here, even if they are all called Bruce. A colleague of mine had his change request rejected due to lack of a back out process, which would be fine except that the change was to reboot a server.

        1. Anonymous Coward

          Re: Bruce CABs

          Meanwhile, the vendor sales team and the CIO are out on the piss to smooth things over. Hold on, ATO is government - they are out on a "technology briefing"...

          All the while - people spend time in the office putting an acceptable level of self-flagellation into writing, subtly attributing blame to anybody but the ATO.

          There will be words such as "frameworks, process improvement, monitoring, best practice, putting strategies in place, technology alignment......" yawn.

          The Bruces are happy, because they "told you so" during the last CAB.

  7. CheesyTheClown

    Problem with SAN in general

    I was recently told by a colleague of mine his company was about to upgrade firmware on their SAN controllers due to performance problems on a nearly exabyte SAN. I asked "Do you have a mirror?" And he said they have backup but not a mirror. I asked how long it would take to restore the backup and the number was nearly a month. I asked whether they have fully verified the contents of their backup and he said not recently because it would take a month just to stream the data from the backup.

    The problem with SAN is that it centralizes all problems. It's a single point of failure. The performance of even the fastest NVMe SANs are very very slow compared to distributed file systems.

    They managed to do the upgrade; it will now take about 6 weeks to run the rebuild on the array. The rebuild is destructive and they will have no idea whether the problem is fixed until it is done. They also don't know what caveats will be introduced from the upgraded firmware.

    I don't experience these problems because I run two distributed file systems. One for performance and one for transaction oriented journaling. I have about 1Tb/s bandwidth between the two systems which can easily be saturated during transfer operations. What's best is that my system cost less than a 10th of what his system cost per byte and instead of adding new disk shelves, I add disk, bandwidth and performance for each expansion. Instead of replacing SANs, I simply remove obsolete nodes and add newer and more efficient ones.

    Trick one: Don't use VMware. Linux based GlusterFS systems only work with iSCSI or Fibre Channel, which is slow and doesn't scale. VAAI NAS isn't available in Linux because of VMware's stupid policy of locking out open source developers.

    Trick two: If you absolutely must run VMware, use Oracle Solaris for storage. Unlike EMC, NetApp, 3PAR, etc... it can actually do proper scalability for performance and capacity. Consider Oracle InfiniBand for the storage interconnect. Take classes on ZFS. Use Oracle servers. If you can afford $15,000 per blade for VMware, you can afford Oracle servers for storage. Oh... and don't use InfiniBand for networking VMware or NSX. The CPU cost is too high.
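
The restore-time arithmetic in the first paragraph of the comment above can be sketched roughly as follows. This is only a back-of-the-envelope illustration: the comment gives "nearly exabyte" and "nearly a month", so the exact capacity (1 decimal exabyte) and window (30 days) are assumptions, and the function names are hypothetical.

```python
# Back-of-the-envelope restore-time estimate.
# Assumed figures: the comment only says "nearly exabyte" and "nearly a month".

SECONDS_PER_DAY = 86_400


def restore_days(capacity_bytes: float, throughput_bytes_per_s: float) -> float:
    """Days needed to stream capacity_bytes at a sustained throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return capacity_bytes / throughput_bytes_per_s / SECONDS_PER_DAY


def required_throughput(capacity_bytes: float, days: float) -> float:
    """Sustained bytes per second needed to restore capacity_bytes within days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return capacity_bytes / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    one_exabyte = 1e18  # assumption: decimal exabyte
    rate = required_throughput(one_exabyte, days=30)
    print(f"{rate / 1e9:.0f} GB/s sustained to restore 1 EB in 30 days")
```

Under those assumptions the required rate comes out at about 386 GB/s sustained for the whole month, which gives a sense of why a full verification read of the backup is rarely attempted at that scale.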

    1. Anonymous Coward

      Re: Problem with SAN in general

      I think you need to learn the difference between a SAN and an Array.

    2. Gerhard Mack

      Re: Problem with SAN in general

      "The performance of even the fastest NVMe SANs are very very slow compared to distributed file systems."

      Not according to any of my measurements. With several of our servers our Compellent SAN + 8 gbps FCAL link outran the local disks in some of our older servers. Meanwhile, GlusterFS on 3 nodes with local storage actually cost me a contract when it was outrun by a single NFS server.

    3. egar

      Re: Problem with SAN in general

      What did you use in your two distributed file systems?

  8. Pompous Git Silver badge

    We understand this is the first time this problem has been encountered anywhere in the world...
    Nope. I never heard of an untested backup failing either? Ever...

    1. Adam 1

      Trick question

      There's no such thing as an untested backup.
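
The joke has a practical core: a backup only counts once a restore of it has been checked. One minimal way to check, sketched here under the assumption that both the source tree and a test restore are reachable as local directories, is to compare SHA-256 digests file by file. The function names `sha256_of` and `verify_restore` are made up for this illustration and do not come from any backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing or differ in the restored copy."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        # A file fails verification if it is absent or its content hash differs.
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(rel.as_posix())
    return problems
```

An empty list from `verify_restore(Path("/data"), Path("/restore-test"))` (both paths hypothetical) would mean every source file was restored byte for byte. The sketch does not detect extra files in the restored tree, and it assumes the source has not changed since the backup was taken.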

  9. Matt Bryant Silver badge
    WTF?

    Hmmmmmm.

    "....."Our primary back-up systems, that should have kicked in immediately, were also affected," acting chief information officer Steve Hamilton said in the written statement...." So, was that badly configured backup servers then? Didn't they do a failover test as part of the sign-off? Having seen 3PARS easily handle failover for many years, I'm more than a bit doubtful this was actually a 3PAR problem rather than a corners-cut-to-save-money problem resulting in a bad and untested design.

    1. Anonymous Coward

      Re: Hmmmmmm.

      The replication link was on the NBN? </joke>

    2. Anonymous Coward

      Re: Hmmmmmm.

      They probably had a backup system, but it was still sitting in a box.... the vendor won't mention it because they want to maintain the customer relationship. The ATO won't mention it because they don't want to look stupid.

      Well, maybe not the ATO, but AGFA Australia pulled off that stunt in 2010 when the Impax Medical Imaging system went down.

  10. Anonymous Coward

    Coincidence or pass the tin foil hat?

    http://www.theaustralian.com.au/business/economics/ato-circles-firms-in-2bn-fight-over-avoidance-profit-shifting/news-story/1a969fef137f23571b994758a530cf83

    http://mobile.abc.net.au/news/2016-12-14/federal-government-to-crackdown-on-cash-economy/8118844?pfmredir=sm

  11. Anonymous Coward

    I'm familiar with this deal.

    NetApp was lowballed by HP, and EMC didn't bother bidding at all.

    Something or other about cheapest bidders, anyone?

    1. Anonymous Coward

      Re: I'm familiar with this deal.

      "NetApp was lowballed ..." ??? Really???

      A few years ago somebody died in police custody, so the AFP installed CCTV cameras in cells in rural Australia. NetApp provided the storage. They sold 20+ systems without hotspares and told the AFP that's ok, because NetApp uses RAID6.

      That's how low NetApp will go to win a deal. It's not just about winning a deal. It's all about "how" you win the deal.

      1. bitpushr

        Re: I'm familiar with this deal.

        NetApp deploys its E-Series platform for video surveillance. It uses RAID-DDP, not RAID-6, and DDP sets aside a portion of each disk as spare space. Therefore disks are not dedicated for spares, and rebuild speeds are dramatically improved.

        http://www.netapp.com/us/technology/dynamic-disk-pools.aspx

        Disclaimer: NetApp employee.

        1. Anonymous Coward

          Re: I'm familiar with this deal.

          DDP is RAID6, just with faster rebuild times.

          https://kb.netapp.com/support/s/article/how-do-volume-groups-differ-from-dynamic-disk-pools-in-e-series-storage?language=en_US

          It's not much different from what the HP EVA did years ago.

          Still, DDP does not justify selling Next Business Day Hardware Replacement into a non-service region.

          You can get as technical as you like. That AFP deal was dodgy as.

  12. Anonymous Coward

    I wonder

    with petabytes of data at risk

    Petabytes... I wonder how many tax returns that is... Better yet. I wonder if mine is among them. I wasn't looking forward to a tax bill at Christmas...


Biting the hand that feeds IT © 1998–2020