Something Not Quite Right with the ATO Report
EDS won the initial ATO outsourcing contract in 1996/7 and was bought by HP, a hardware vendor, in 2008, eventually becoming part of HPE. This year's merger of HPE's services business with CSC created yet another entity, DXC.
Now we have a hardware vendor acting as an I.T. services provider, which raises questions about its preferences whenever new hardware is chosen.
1. I've never seen one byte of data stored on a SAN: SANs are NETWORKS only. So why do the ATO and HPE claim their "SAN" stores data and has drives?
Either HPE and the ATO don't know the difference between a Network and a Storage Array or Device (!), or they are deliberately misusing terms for a reason. This is not a rookie mistake.
2. Sure, cables fail, but your cables don't all suddenly need replacing after 6-12 months, nor do they suddenly become "stressed" if properly installed and maintained.
From the scale of the task, we can infer that a large number of cables, perhaps most or all, were replaced.
Why is the ATO (or HPE/DXC) insisting on replacing the 3PAR hardware, and on a full forensic tear-down and investigation including all the "fibre optic cables", if there has been no unusual event or physical damage?
That action wouldn't be justified on technical grounds alone; it would result from a serious legal dispute between client and vendor over liability. EDS was known for its
If the suspected root cause is physical damage, whoever ordered the action that caused it will be held responsible.
Yet nothing explaining this unprecedented forensic examination is contained in the ATO report.
3. If the ATO is talking about "Cloud", that's quite bizarre.
The 3PAR 20850 named in the report is a low-latency, high-performance All-Flash Storage Array that must be locally connected to hosts to be usable.
The ATO already uses VMware extensively to manage its workload and to move running instances between hosts, even between Datacentres.
They already run a "Cloud", so is this code for outsourcing these operations to another supplier? That would mean breaking an existing contract with years still to run.
I've not seen the EDS / HPE / DXC contracts, but I'd expect breaking them would cost the ATO dearly in time and money.
The ATO refers to the 3PAR Storage Array as a "20850 SAN" - if you look up the device, it's an All-Flash Array, not a Storage Area Network.
This misuse of the term "SAN" is consistent: they report replacing "an EMC SAN" with the 3PAR.
EMC has always been primarily a maker of Storage Arrays, not switches and Fibre Channel network gear.
Mentioned in passing, as a label in "Figure 1", is another device, "XP7". There is an HP storage array with that product designator, a petabyte-scale array typically built on HDDs rather than Flash.
How does this relate to the system, given it gets mentioned just once? Why is the XP7 in the diagram at all if it's not part of the functional system?
Which devices & cables make up the SAN (Network), and which are the Storage Devices?
Why do the ATO & HPE conflate these specific terms? "Ignorance" is as damning as "Obscuring".
What about the HBAs, switches, routers & "directors" in the network? We hear nothing of them.
Are they using Fibre Channel (16Gbps?) or FCoE with 10Gbps Ethernet interfaces (or faster)?
If it's FCoE, are they using HP or Cisco gear for their fabric? Or someone else entirely?
Between which devices were the fibre problems reported by SNMP?
Cables aren't active devices; they don't log errors themselves. Only the devices attached to them can detect & report errors.
We hear "cable", but is that a permanent cable between patch panels, or one or more patch leads (panel to panel, panel to Array, panel to Host)?
So what devices were logging the SNMP errors over the 6 months prior to the first outage?
Where do they sit within the environment?
If 3PAR HBAs were logging errors on the internal, non-SAN links to its drives, that's very different from a host connecting via the Network to the 3PAR controller.
The ATO report takes special care to mention "data paths", disk drives and "SAS" (Serial Attached SCSI), but never explains their relevance or provides a diagram of the connections.
That's not restricted information, not in a simplified document. There's no reason to suppress that level of detail: they've disclosed other, very specific technical details.
In decades of work, I've never seen an optically connected drive; the HDDs & SSDs I've seen have only ever had _copper_ connectors. SAS is physically compatible with SATA (SATA drives plug into SAS backplanes), but adds dual ports and expanders that allow drive shelves to be daisy-chained.
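To make the "where were the errors?" question concrete, here is a minimal Python sketch of attributing link errors to network segments. Everything in it is hypothetical: the port names, the two segment labels and the error counts are invented for illustration, not taken from the ATO report. It only encodes the point made above: a cable keeps no counters of its own, so its errors can only be inferred from the ports at each end, and whether it belongs to the SAN or to the array's internal back-end depends on what those ports are.

```python
from dataclasses import dataclass
from enum import Enum


class Segment(Enum):
    FRONT_END_SAN = "host to array, via the SAN fabric"
    BACK_END = "array controller to drive shelves (internal to the array)"


@dataclass(frozen=True)
class Link:
    """A passive cable: no counters of its own, just the two ports it joins."""
    a_end: str
    b_end: str
    segment: Segment


# Hypothetical topology; every name here is illustrative, not from the ATO report.
LINKS = [
    Link("host1-hba0", "fabA-sw-p1", Segment.FRONT_END_SAN),
    Link("fabA-sw-p2", "array-ctrl0-fe0", Segment.FRONT_END_SAN),
    Link("array-ctrl0-be0", "shelf1-iom0", Segment.BACK_END),
    Link("array-ctrl0-be1", "shelf2-iom0", Segment.BACK_END),
]

# Hypothetical per-port error counters, as each port's SNMP agent might report them.
PORT_ERRORS = {
    "host1-hba0": 0, "fabA-sw-p1": 0, "fabA-sw-p2": 0, "array-ctrl0-fe0": 0,
    "array-ctrl0-be0": 42, "shelf1-iom0": 17,
    "array-ctrl0-be1": 0, "shelf2-iom0": 0,
}


def link_errors(link, port_errors):
    """A cable's error count can only be inferred from the ports at each end."""
    return port_errors.get(link.a_end, 0) + port_errors.get(link.b_end, 0)


def errors_by_segment(links, port_errors):
    """Sum the inferred link errors per segment (SAN fabric vs. array back-end)."""
    totals = {seg: 0 for seg in Segment}
    for link in links:
        totals[link.segment] += link_errors(link, port_errors)
    return totals


def suspect_links(links, port_errors):
    """Links where at least one attached port has logged errors."""
    return [link for link in links if link_errors(link, port_errors) > 0]


if __name__ == "__main__":
    for seg, count in errors_by_segment(LINKS, PORT_ERRORS).items():
        print(f"{seg.value}: {count} errors")
    for link in suspect_links(LINKS, PORT_ERRORS):
        print(f"suspect: {link.a_end} <-> {link.b_end} ({link.segment.name})")
```

With these made-up numbers every error lands on the back-end segment. With real data, the answer depends entirely on which ports logged the errors, which is exactly the detail the report leaves out.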
The "state of the art" is fibre connections between backplanes, into which the devices are plugged with copper connectors.
That would apply to both SSD and HDD drives in either the 3PAR 20850 or the XP7.
It's where you'd expect "data paths" to run: from Array Controllers to drives/shelves. That makes these "stressed" cables internal to the 3PAR Array, not part of the Network.
Hosts connect to the SAN via at least dual connections, while on the other side, the Array controller has multiple connections to the SAN router / director for performance & reliability.
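The redundancy described above can be sketched by enumerating host-to-array paths. This is a minimal Python sketch assuming a dual-fabric layout (two independent fabrics, "A" and "B"), which is common practice but is not stated in the ATO report; the port names are hypothetical. It shows why, in a design like this, one failed cable in the SAN should reduce the number of usable paths rather than cut a host off from its storage.

```python
from itertools import product

# Hypothetical dual-fabric layout: each host HBA port is cabled to one fabric,
# and each array controller has one front-end port on each fabric.
HOST_PORTS = {"hba0": "A", "hba1": "B"}
ARRAY_PORTS = {"ctrl0-fe0": "A", "ctrl0-fe1": "B", "ctrl1-fe0": "A", "ctrl1-fe1": "B"}


def paths(host_ports, array_ports, failed=frozenset()):
    """Return (host port, array port) pairs on the same fabric with both ports up.

    `failed` holds the names of ports whose cable has failed; a failed cable
    takes its port, and every path through that port, out of service.
    """
    return [
        (h, a)
        for (h, h_fabric), (a, a_fabric) in product(host_ports.items(), array_ports.items())
        if h_fabric == a_fabric and h not in failed and a not in failed
    ]


if __name__ == "__main__":
    print(len(paths(HOST_PORTS, ARRAY_PORTS)))                   # all cables healthy
    print(len(paths(HOST_PORTS, ARRAY_PORTS, {"hba0"})))         # one host-side cable down
    print(len(paths(HOST_PORTS, ARRAY_PORTS, {"ctrl0-fe0"})))    # one array-side cable down
```

Under these assumptions the host starts with four paths, and losing any single cable still leaves at least two, one of them through each controller in the array-side case.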
Were the fibre cables that caused errors on "data paths" internal to the 3PAR Flash Storage Array, or within the SAN (network)? HPE & the ATO keep that obscured.
The whole point of the 3PAR, in fact any Array, is to hide the individual devices from the hosts and create virtual error-free devices of any size.
With Network-attached Storage Arrays, there are no direct "data paths" from a host to a drive, which is where the errors described occurred.
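The virtualisation point can be illustrated with a toy model. This Python sketch is hypothetical throughout: the class names are invented, and simple two-way mirroring (RAID 1) stands in for whatever layout a real array such as the 3PAR actually uses. What it shows is the separation argued above: the host only ever talks to the controller's virtual volume, while drive errors are handled and logged on the back-end, out of the host's sight.

```python
class Drive:
    """A back-end drive (hypothetical). Only the array controller ever talks to it."""

    def __init__(self, name: str, size: int):
        self.name = name
        self.blocks = [b""] * size
        self.failed = False

    def read(self, lba: int) -> bytes:
        if self.failed:
            raise IOError(f"{self.name}: read error at block {lba}")
        return self.blocks[lba]

    def write(self, lba: int, data: bytes) -> None:
        if not self.failed:
            self.blocks[lba] = data


class ArrayController:
    """Presents one virtual volume to hosts and hides which drives hold the data.

    Simplified two-way mirroring stands in for a real array's data layout.
    """

    def __init__(self, primary: Drive, mirror: Drive):
        self._pair = (primary, mirror)
        self.backend_errors = []  # logged inside the array, never surfaced to the host

    def write(self, lba: int, data: bytes) -> None:
        for drive in self._pair:
            drive.write(lba, data)

    def read(self, lba: int) -> bytes:
        for drive in self._pair:
            try:
                return drive.read(lba)
            except IOError as err:
                self.backend_errors.append(str(err))
        raise IOError(f"volume read failed at block {lba}")


if __name__ == "__main__":
    d0, d1 = Drive("drive0", 8), Drive("drive1", 8)
    volume = ArrayController(d0, d1)  # the host only ever holds this object
    volume.write(3, b"tax return")
    d0.failed = True                  # a back-end fault the host cannot see directly
    print(volume.read(3))             # served from drive1
    print(volume.backend_errors)      # the error is recorded inside the array
```

In this toy model the host's read succeeds even though a drive has failed; the only record of the fault is in the controller's own log, which is why knowing which device logged an error matters so much.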