
NetApp and TMS involved in Virgin Blue outage

Virgin Blue has fingered Texas Memory Systems as the cause of the 21-hour crash of its airline reservation system in Australia. The system went down at 8am on Monday. The cause was a hardware failure in the computer set-up running the New Skies Navitaire software, which was hosted by Navitaire, an …

COMMENTS

This topic is closed for new posts.

Navitaire has a solid reputation ...

in the smaller airline business, and its popularity is borne out by the number of carriers using it.

Surprisingly, some of the larger res systems use banks of relatively small computers to deliver what they call service. Mind you, travel agents still have to wait weeks for some back office services. One of the three larger res systems still diddles carrier charges and forgets to credit agencies with their full commissions.


The problem is most likely the "Accenture" bit

Seeing as Accenture was involved, that's going to be the real "root cause" rather than any specific implementation of tech.

Seen them screw too many pooches, then finger point elsewhere. Which they're doing again here.


Accenture?

That is all.


Mushrooms

Non-stop!!!

Could someone explain to him what enterprise escalation means?


Outsourced? Tested?

The most obvious thing to wonder about is who manages the outsourced system, and how: do they have a sufficient calibre of staff?

But more fundamentally, from our experience: did anyone actually test/simulate hardware failures on the system before deployment, to find out (a) whether the system properly detected and handled the errors, and (b) whether the procedures to recover the system both exist and actually work?

The fact a vendor tells you they are fault-tolerant, can fail-over to a backup/cluster member, blah-blah-blah, counts for SFA - test each step yourself!
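The commenter's "test each step yourself" advice amounts to running a failure drill against a time budget. A minimal sketch of that idea, using a made-up in-memory primary/standby pair (the `Node`, `Cluster`, and `drill` names are all hypothetical, not any vendor's API), might look like:

```python
import time

class Node:
    """Hypothetical cluster member; just tracks liveness for the drill."""
    def __init__(self, name):
        self.name = name
        self.alive = True

class Cluster:
    """Toy primary/standby pair, standing in for a real HA setup."""
    def __init__(self):
        self.primary = Node("primary")
        self.standby = Node("standby")

    def kill_primary(self):
        # Simulate the hardware failure you are drilling for.
        self.primary.alive = False

    def failover(self):
        # Promote the standby and time the cutover.
        start = time.monotonic()
        if not self.standby.alive:
            raise RuntimeError("standby is down too - failover impossible")
        self.primary, self.standby = self.standby, self.primary
        return time.monotonic() - start

def drill(cluster, budget_seconds):
    """Kill the primary, cut over, and check the cutover fit the budget."""
    cluster.kill_primary()
    elapsed = cluster.failover()
    return elapsed <= budget_seconds and cluster.primary.alive

# A 90-minute budget, matching the failover time quoted in the article.
print(drill(Cluster(), budget_seconds=90 * 60))
```

The point of the sketch is that the drill both exercises the detection/promotion path and measures the cutover time, so "90 minutes" is a verified number rather than a vendor claim.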


failover

[quote]

Navitaire isolated the failure to the device in question quite quickly, but the decision to repair the device proved "less than fruitful and [it] also contributed to the delay in initiating a cutover to a contingency hardware platform." A failover process that should take around 90 minutes took the best part of a day.

[/quote]

There's your problem right there. Okay, hardware failures happen, we know this; that's why you build a failover / warm standby / whatever environment, so that WHEN the completely unexpected, unplannable-for failure happens it's not a complete disaster.

Given that you've already got the failover environment, when the dreaded happens, USE IT! No point having it if you don't use it. Muppets!

(speaking from experience when our live storage decided to wipe all its config... 2 hours in we decided that a repair would take too long and switched all services to the DR site).
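The "2 hours in we decided a repair would take too long" call above is really a budget check: once elapsed repair time plus a known cutover time would bust the outage budget, you stop repairing. A sketch of that rule of thumb (the `decide` function and all the numbers are illustrative assumptions, not anything from the article):

```python
def decide(elapsed_repair_minutes, failover_cost_minutes, sla_minutes):
    """Abandon repair once repair-so-far plus a cutover would bust the SLA.

    A real runbook would use a measured failover time, not a guess.
    """
    if elapsed_repair_minutes + failover_cost_minutes >= sla_minutes:
        return "failover"
    return "keep repairing"

# With the article's 90-minute cutover and, say, a 4-hour outage budget,
# repair attempts should be abandoned after about 2.5 hours:
print(decide(150, 90, 240))   # -> "failover"
print(decide(30, 90, 240))    # early on, repair is still worth trying
```

Writing the threshold down in advance is the whole trick: it takes the decision away from the optimism of whoever is mid-repair at the time.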


Run by a subsidiary of Accenture

Hmmm.

In the commercial world this is likely to lose them the contract.

Good thing governments are *so* forgiving of these little glitches.
