A bungled upgrade at UK payment processing firm Protx left thousands of online merchants unable to take payments on Wednesday. Reg readers reported that they've been unable to process payments since 0600 BST as a result of the SNAFU. Two report being unable to reach the firm by either phone or email in an attempt to resolve …
Just the half of it
Today's outage is just half of the issue with ProtX.
Over the past two months we've tried contacting ProtX to learn about how their new payment system is going to work with all the "3D Secure" stuff Maestro are rolling out along with changes ProtX themselves are making to their recurring billing system (whereby a client signs up and we bill them monthly, quarterly etc without the client re-entering card details), but no one at ProtX seems to know what actually is changing.
ProtX is cheap compared to alternatives, namely the likes of WorldPay who charge an awful lot in transaction fees, but with 10,000 businesses using their system, ProtX need to get their act together. If they need to employ more qualified staff then for God's sake do so; there's nothing worse than a company sticking its head in the sand and ignoring its clients, and they shouldn't be afraid to increase prices if we see real results. The next cheapest payment gateway we could use would cost almost five times what we currently pay ProtX.
3D secure doesn't work either!
I have been running and testing the 3D secure integration with both Datacash and Protx and have finally switched it all off!
There are too many bugs at the banks' end of the process and they pass 'unknown' error messages back to the providers, who pass them back to me. I am supposed to get my bank to fix it - are they kidding me?
The Protx upgrade is just part of the problem - their email the day before insisted you enable 3D secure after the new upgrade, but they denied there was anything wrong with it before the release. So how can they fix it if they do not acknowledge it?
So far I've lost £2k and counting.
To be fair...
It was fixed within 12h, it was a major upgrade and, looking at their history of uptime, this accounts for a minuscule proportion. I actually spent half the day waiting for them to come back online as I was carrying out an upgrade of a client's website and couldn't do any live testing until then.
The servers were actually up all day, it's just that payments would take up to 5 minutes or simply die. If you were lucky you could get maybe 1% of payments through.
They were keeping everyone updated via their status website, which has had 3 postings today: the first saying they were down, a repeat saying still down but working on it and then the fix notice early in the afternoon.
All in all, it's a pain in the arse and it cost a lot of people a lot of money. At the end of the day, should a customer expect 100% uptime from anything? Does anyone use multiple payment gateways for guaranteed uptime?
Re: To be fair...
“via their status website”
Where, O oracle, lies this status website? It's not linked to from anywhere obvious on their site and Google is unable to locate it, nor have they mentioned it (to my knowledge) in any of their newsletters.
Contrary to your belief, a 12-hour outage of a payment system is a *major* deal. An hour? That's a bad day. Two is pushing it. Anything more than that is beyond a joke.
Re: To be fair...
Mo, Protx homepage, left hand side - VSP Monitor. OK, so it doesn't say "Status page" but it's not exactly hidden either.
Otherwise, agree with you entirely. This was a massive f up of the highest order.
Several years ago, HP pushed out the concept of five nines and "the industry" embraced it. Well, other than Microsoft. Financial institutions really have to keep uptime high, and have to do it 24x7.
To be down 12 hours is totally unacceptable. A small shopkeeper might do a couple of hundred (name your currency) an hour throughout the day. That's still 2400 that he/she is out on sales. And a customer who may go to another store the next time. Imagine if you ran a bar and were unable to process transactions for even half that downtime. That is real money. Even the least expensive processor is still pretty expensive. We pay $10 a month, $.25 a transaction and 3.2%, and this is for a non profit, which gets a discounted rate.
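For anyone who wants to run the numbers for their own shop, here is a rough Python sketch of the arithmetic in the comment above. The 200-an-hour takings, the 12 hours, and the fee schedule ($10 a month, $0.25 a transaction, 3.2%) come from the comment; the function names and the example of 100 transactions are made up for illustration.

```python
def lost_sales(hourly_takings: float, outage_hours: float) -> float:
    """Sales missed while the gateway is down, assuming steady takings."""
    return hourly_takings * outage_hours


def monthly_processing_cost(
    transactions: int,
    total_value: float,
    monthly_fee: float = 10.00,    # flat monthly charge quoted in the comment
    per_txn_fee: float = 0.25,     # fixed charge per transaction
    percent_rate: float = 0.032,   # 3.2% of the value processed
) -> float:
    """Total processor charges for one month under the quoted fee schedule."""
    return monthly_fee + transactions * per_txn_fee + total_value * percent_rate


if __name__ == "__main__":
    # 200 an hour for 12 hours of outage.
    print(lost_sales(200, 12))
    # Hypothetical month: 100 transactions worth 2400 in total.
    print(round(monthly_processing_cost(100, 2400.0), 2))
```

On those assumed figures, a single 12-hour outage costs far more in missed sales than a month of processing fees.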
I wonder if there will be any litigious fall out from this.
I've had the odd problem with SecureHosting, but nothing lasting more than half an hour (if that). 12 hours is ridiculous, especially after being warned of the problem after testing. If any of my suppliers refused to answer support calls I'd find an alternative immediately, regardless of whether it cost 5 times the price.
Never mind, I know of one major bank of the big four Down Under in Oz that took well over 9 months to figure out a simple problem with the payment of telephone accounts by credit card for some 6-odd million customers! During this outage the Telco was never out of pocket, by the way!
Imagine the delight of many customers seeing 3 past phone bills being posted to their credit card accounts almost a year later, after most had thought they had paid their bills in full!
Mind you, you could imagine the volume of complaint letters this generated to all the banks within Oz from customers who were not amused by this simple mistake! It was so high the Oz Post Office had to employ a number of extras just to carry the big bags of snail mail!
It's all about managing risk
You have to anticipate that such an upgrade could cause an outage and balance the effect of that with the benefits of upgrading. Once you have done this the course of action is clear.
Murder as many IT project managers as possible as quickly as possible. It's what I call PRINCE3 Risk & Issue Management.
"...and we’re unable to process any payments, along with lots of other people."
The first thing that came to mind after reading that line is a picture of someone running down the street shouting "Protx Green is People!!"
I've decided to run multiple gateways so I can overcome any problems with downtime. When ProtX went down I was immediately flagged by one of my customers, so I immediately switched them to another payment gateway, Secure Hosting. It's the only way forward, people: it's not just money our customers are losing, it's their reputation as well as ours.
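A minimal sketch of what switching gateways could look like in code, for those asking. Everything here is hypothetical: `GatewayError`, `charge_with_failover` and the two stub gateways are invented names, and no real Protx or Secure Hosting API is being called. Each gateway is just a function that takes an amount in pence and returns a transaction reference.

```python
from typing import Callable, List, Sequence, Tuple


class GatewayError(Exception):
    """Raised by a gateway adapter when a payment cannot be processed."""


# A gateway is any callable: amount in pence -> transaction reference.
Gateway = Callable[[int], str]


def charge_with_failover(
    amount_pence: int, gateways: Sequence[Tuple[str, Gateway]]
) -> Tuple[str, str]:
    """Try each gateway in order; return (gateway name, reference) on first success."""
    errors: List[str] = []
    for name, gateway in gateways:
        try:
            return name, gateway(amount_pence)
        except GatewayError as exc:
            # Remember why this gateway failed, then fall through to the next one.
            errors.append(f"{name}: {exc}")
    raise GatewayError("all gateways failed: " + "; ".join(errors))


# Hypothetical stand-ins for real gateway adapters.
def primary_down(amount_pence: int) -> str:
    raise GatewayError("timed out")


def backup_ok(amount_pence: int) -> str:
    return "BACKUP-0001"


if __name__ == "__main__":
    print(charge_with_failover(1999, [("primary", primary_down), ("backup", backup_ok)]))
```

One caveat: a payment that times out may still have gone through at the first gateway, so anything real would need to check for that before retrying elsewhere, or risk charging the customer twice.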
ProtX runs an upgrade. Upgrade fails. Where was the failsafe architecture? Where was the capability to roll back immediately to the previous version of the transactional software when problems became apparent and where was the real-life test to ensure everything worked?
From my background I find it somewhat surprising that a financial services provider made such a fundamental mistake without having any backup. Are you insane? Completely meshugge? ProtX KNOWS that organisations rely on their system for their livelihood. To play it fast and dirty like that and then go down for twelve hours to fix it is irresponsible.
Yes, so they fixed it in the end, but it should NOT have happened. ProtX is not alone though! HSBC and other financial institutions have made the same mistake with ATM and banking upgrades (Abbey only recently, hello?). They shut down and fixed the systems, but they've never had that problem with their merchant systems because they know they have to keep those running or the rest of the country grinds to a halt.
ProtX, my advice:
1. Test your systems rigorously!
2. Double up on your live servers. Upgrade the one set, wait for problems, keep the others in reserve (and not connected to the rest of the world).
3. Switch to the second set if the upgrade does cause problems (when you get 50 calls about the same problem, you SWITCH. You don't ask questions, you just do it!!!)
4. If the upgrade is successful, upgrade the second set, fixing any problems you may have encountered in the first set.
You NEVER EVER EVER EVER upgrade the whole lot at the same time and cause endless hell for customers! Where were your project managers on this? Fire the lot of them for screwing this up!
Jeez. I'm glad I'm not a ProtX customer this time round, because I would've sent the lawyers round to you by now.
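The four steps above can be sketched roughly as follows. This is only an illustration of the idea, not how ProtX or anyone else actually deploys: `Deployment`, `staged_upgrade` and the `looks_healthy` check are invented names, and a "server set" is reduced to the version string it is running.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Deployment:
    live_version: str      # the set taking real traffic
    reserve_version: str   # the second set, kept offline on the known-good version


def staged_upgrade(
    current: Deployment, new_version: str, looks_healthy: Callable[[str], bool]
) -> Deployment:
    """Upgrade one set, watch it, and only then touch the second set."""
    # Step 2: upgrade only the live set; the reserve stays on the old version.
    upgraded_live = new_version

    if not looks_healthy(upgraded_live):
        # Step 3: problems reported, so switch traffic to the untouched reserve.
        # The broken set goes offline (still on the new version) to be fixed.
        return Deployment(live_version=current.reserve_version,
                          reserve_version=new_version)

    # Step 4: the upgrade held up, so bring the second set to the new version too.
    return Deployment(live_version=new_version, reserve_version=new_version)


if __name__ == "__main__":
    start = Deployment(live_version="v1", reserve_version="v1")
    print(staged_upgrade(start, "v2", lambda version: False))  # failed upgrade
    print(staged_upgrade(start, "v2", lambda version: True))   # clean upgrade
```

In the failed case customers end up back on the old version within one switch, which is the whole point of never upgrading both sets at once.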
For such an institution having a “test server” is an absolute necessity.
Clone the “live” server onto the test server, install [whatever] and test it out. Does it work? – clone it to the live server. Did it f up? – restore the server backup. All the data should (must) be stored on other serverS – so it makes no difference what server reads/writes the data.
This is SO basic “critical system” setup; it is very surprising that they did not think of it.
If you cannot afford multiple servers (I am guessing that this is not the case with this company) just run VMs on one server: one VM for SQL, one VM for transactions, another VM as a clone of the transactional server for testing purposes, and so on.
The actual problem is the cockiness of the (obviously) inexperienced IT team. “We know best” type thing. Seen it before, and it is always extremely annoying when trying to suggest something to such people, and see them sniggering, thinking “yea, sure, you have an accent, learn to “speak” before you teach us”
How About Designing Your Site Better
People may complain that the outage cost people money due to lost orders?
My company uses Protx, we ran a huge promotion yesterday and didn't lose a single order.
I think people should design their sites better so they can cope with this kind of problem.
Protx -not the first time!
As one of the merchants let down by protx yesterday, it's worth pointing out that this type of service is now the norm. Even when they aren't upgrading, I've had instances where I cannot even get an email through to support!
Yesterday was no exception. I sent 60 emails and every one bounced back. If you try the other departments in the company no-one will help. I think it's a classic case of taking on more business than they can handle.
Agree with Steve, I've had constant problems trying to contact ProTx support and not getting sufficient responses. One time it took me 6 months (!?!?!) and several escalations to get a simple answer about how part of their system works. Problem is, as other people have mentioned, that they're cheap, and our profit margins are so small that a couple of pence per transaction makes a BIG difference.
@21:28: "Several years ago, HP pushed out the concept of five nines and "the industry" embraced it. "
It predates HP by a long way; the telecoms business has been using the term for many years. It is just recently that manufacturers of general-purpose COTS computer equipment (HP, Sun, IBM, etc.) have reached levels of reliability (software and hardware) such that they can start to offer mainstream equipment for use in 5-nines systems.
The computer industry in general always thinks that it's on the bleeding edge, but when it comes to reliability engineering it has a long way to go to catch up with the phone carriers, as will become increasingly evident as more people change from conventional phone service to VoIP. There's a reason that VoIP is cheaper than POTS, and you get what you pay for...
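For a sense of scale on "five nines", a quick back-of-the-envelope sketch in Python. The only inputs are the 99.999% figure and the 12-hour outage discussed in this thread; the function names are made up, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Downtime budget per year for a given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_percent(downtime_minutes: float) -> float:
    """Availability over a year given the total downtime in that year."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    # Five nines leaves a little over five minutes of downtime a year.
    print(round(allowed_downtime_minutes(99.999), 3))
    # One 12-hour outage, with no other downtime all year.
    print(round(availability_percent(12 * 60), 3))
```

A single 12-hour outage on its own uses up well over a century's worth of five-nines downtime budget (720 minutes against roughly 5.26 a year).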
The Protx Saga
Protx good. Protx taken over by Saga. Protx now shit. Coincidence?
from my experience...
...in IT at a financial institution, any update at all is done outside of business hours (unless it is a fix for a major problem that's already happening).
Every change I have ever done has been implemented at 3 o'clock on a Sunday morning (PS I do not work at IF), when nothing is running apart from batch jobs. We then get people in to use the system before business hours and if it's not working at 08:00 it gets pulled and rolled back to the previous version.
We do have a planned nightmare scenario which does involve 12 hours of downtime, but it hasn't happened yet. The idea is that if something does slip through and causes a problem during business hours, and only one part of the business is affected but fixing it would affect all of the others, then the one affected system is shut down until business hours are over and the fix is done then.
I'm not sure if Protx have the same issue, only one service out of a number failed and so they didn't want to risk the rest by rolling back until the end of the day?
Re: The Protx Saga
"Protx good. Protx taken over by Saga. Protx now shit. Coincidence?"
ProtX was taken over by SAGA...? That explains why there was a problem - ProtX was bought out by a bunch of OAPs. It probably took them 12 hours to navigate the corridors of ProtX Towers with their zimmer frames to get back to their server suite to uninstall their borked update.
Still problems, 36 hours later...
Despite Protx's insistence, there are still huge numbers of people having problems with the new system. It's not down to user error either - we have several websites running exactly the same code, but one of them just simply doesn't work with the new Protx system. If you point anyone else's account at their site, it does work though - AND it works on their test system.
Someone has dropped the ball big time with this, but all the people responsible are hiding behind 1st line support and insisting that there was no problem.
Just a quick look at the most popular posts on their support forum shows that there are loads of "features" on live that never appeared on test (different validation rules for dates, for example) and lots of bugs that were previously reported, but still released onto the live servers.
Re: Still problems, 36 hours later...
I completely agree. There should have been no other changes, other than forcing everyone to use 3D Secure. But our existing ProtX integrations were completely broken by other changes, which were NOT communicated by ProtX, no information at all was provided about these changes. Their document states a load of stuff as "optional", but in the end it is actually required!
Also, they were not purchased by SAGA, it was Sage - Unless you're being sarcastic and I'm being pedantic by refusing to accept the sarcasm and correcting you :p
Backup payment service provider
Obviously losing your PSP service in this way is a major problem - and all PSPs have their issues from time to time, no matter how much investment is thrown at infrastructure.
It really makes sense to set up a second account with another PSP, that way you can switch to the other PSP should your main one have any problems. OK, so it's going to cost about another hundred quid a year but surely that is insignificant compared to the financial and goodwill cost of losing your payment provider.
Load of Rubbish
I am amazed at the comments I am reading re Protx. This upgrade was a complete mess and has not been fixed yet. Nobody answers phones, no emails are answered and their documentation for the required changes was incomplete and just plain wrong. We have lost tens of thousands of pounds on this upgrade. If you want to get the real story check out the forum below