300,000 UK T-Mobile customers had a quiet morning as they were unable to make or receive calls thanks to a database snafu that forced the operator to restore from backups - a process which is still in progress. The problems started at around 10am this morning, and meant that 300,000 customers couldn't be verified by the network …
Couldn't happen to a nicer network
So, things haven't changed since they were One2One and cut me off whilst abroad for exceeding my credit limit, which they hadn't told me about, without any prior notification. So I gave them one more chance and told them so, and guess what? They did it again. Again without contacting me first to give me a chance to add funds to my account.
I will never, ever use One2One or its successors-in-interest, thanks to their total incompetence.
"Network operators work hard to reduce critical points in the network, replicating key servers and infrastructure, but such procedures are much harder to implement where security and authentication are involved, leaving customers at the mercy of software bugs or hardware failure."
Sorry but I question the validity of this statement in today's environments. I expect this glib type of reporting on ITN but not on The Register 8-)
Security and authentication systems can be easily replicated, both for high availability and for fast disaster recovery. RADIUS, Active Directory, LDAP, Certificate Authorities, firewalls, anti-DDoS... you name it, most of these are straightforward to replicate. For these types of always-online systems, DR by recovery from backup is just a cheap cop-out IMHO 8-)
Sure, if there is a software bug (or more likely admin level "PEB[CK]A[SK]" screwups) then faults can spread and bring whole systems down but authentication systems can be run with appropriate breaks/checkpoints to limit the scope of any damage.
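The "breaks/checkpoints to limit the scope of any damage" idea above can be sketched. One common shape is a delayed replica: the standby applies the primary's change log only after a hold-off window, so an admin-level screwup (a dropped table, a bad bulk update) can be caught and discarded before it propagates. This is a minimal illustrative sketch, not any operator's actual design; all class and method names here are invented.

```python
import time
from collections import deque

class DelayedReplica:
    """Illustrative sketch: a replica that applies logged changes only
    after a hold-off delay, giving operators a window to stop a bad
    change before it reaches the standby copy."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self.pending = deque()  # queue of (timestamp, key, value) changes
        self.data = {}          # the replica's applied state

    def log_change(self, key, value, now=None):
        """Primary side: record a change; it is NOT applied yet.
        value=None represents a deletion."""
        self.pending.append((now if now is not None else time.time(), key, value))

    def apply_due(self, now=None):
        """Replica side: apply only changes older than the hold-off window."""
        now = now if now is not None else time.time()
        while self.pending and now - self.pending[0][0] >= self.delay:
            _, key, value = self.pending.popleft()
            if value is None:
                self.data.pop(key, None)  # deletions replicate too, eventually
            else:
                self.data[key] = value

    def discard_pending(self):
        """The 'checkpoint/break': drop not-yet-applied (possibly bad) changes."""
        self.pending.clear()
```

A destructive change logged less than `delay_seconds` ago can still be discarded, which is exactly the scope-limiting the comment describes; the trade-off is that the replica lags the primary by the same window.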
So now you just wander the internets, moaning about being mistreated by a company that ceased to exist at least a decade ago?
In the interests of balance, I've been with T-Mobile ever since 1998 when they were one2one, and I've never had a problem with them - in fact the service has been excellent.
Does the "T" stand for...
Certainly looks like it.
The only place in my house where I could get a connection via T Mobile was sat in the bath!!
Crap service, crap coverage.
Paris, 'cos she has T-Mobile clothing (regular big holes in coverage)
A title shouldn't be required
Nice to hear that a company's backups are actually working (presumably) as planned for once.
Get over it
@Mike, I would think that an outage like this is just a little different from them cutting you off due to you going over some predefined limit.
Should ask Jacqui Smiff and Nu Liars
She knows all about databases and corruption
Sounds like HLR
As my number originates with T-Mobile (ONO, originating network operator), I was also impacted by this issue... sigh. Should really think about getting a new number; just too damn lazy to learn another.
Their Ex Customers Migrating Out Are Having Issues Too
I moved to Voda; the Vodafone line went live last Friday, and both phones remained working until yesterday, after which the T-Mobile phone went offline.
Unfortunately, I can only call out from the Voda and cannot receive calls or send/receive texts. This is due to a cock-up at T-Mobile in the porting process, which they have admitted to.
I have asked Vodafone to forward me the evidence that they have on T-Mobile over this so that I can forward it to Ofcom. T-Mobile are well outside their 5-day remit for completing a port.
T-Mobile will not give Vodafone a resolution date for this. They keep saying 'by 4PM'; it's been 'by 4PM' for 5 days now. T-Mobile have called on the services of a third party to get this fixed.
I'll forward any evidence I should get to the reg in case they want to publish it.
The difference, personally: they're crap all the time.
T-Mobile leaves 300,000 disconnected
First off, Mike Pellatt, you accepted this limit as it is in the T&Cs of service. Second, the layout and set-up of a mobile network is different to a computer network: authentication doesn't just rely on the above standards and protocols. Some data and keys are required from the SIM; these are replaced when a new SIM is issued, and they must be identical, otherwise no service. If these have been deleted or overwritten, then you are up a creek without a paddle.
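The "keys required from the SIM" point above refers to GSM's challenge-response: the network sends a random challenge (RAND), the SIM derives a signed response (SRES) from its secret key Ki using the A3 algorithm, and the HLR/AuC computes the same thing from its own copy of Ki. If the two Ki copies diverge (e.g. overwritten during a migration), authentication fails and the phone gets no service. The sketch below uses HMAC-SHA256 purely as a stand-in for A3 (real SIMs use operator-chosen algorithms such as COMP128), just to show the shape of the exchange.

```python
import hashlib
import hmac

def a3_stand_in(ki: bytes, rand: bytes) -> bytes:
    """Stand-in for GSM's A3 algorithm: derive a 32-bit SRES from the
    secret Ki and the network's random challenge. Illustrative only."""
    return hmac.new(ki, rand, hashlib.sha256).digest()[:4]

def network_authenticates(sim_ki: bytes, auc_ki: bytes, rand: bytes) -> bool:
    """Ki never travels over the air: both sides compute SRES locally
    and only the responses are compared."""
    sres_from_sim = a3_stand_in(sim_ki, rand)   # computed on the handset's SIM
    sres_expected = a3_stand_in(auc_ki, rand)   # computed in the HLR/AuC
    return hmac.compare_digest(sres_from_sim, sres_expected)
```

With matching keys the responses agree; with a corrupted or overwritten Ki on either side, every challenge fails, which is why a damaged subscriber database locks customers out entirely.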
HLR, yeah, but for the wrong reason. Your phone is identified on the network by an IMSI; the subscriber number is irrelevant. I'd say your current carrier has local roaming set up with T, and it's a break between the VLR and HLR, which would result in your phone not being able to get back to your home carrier via T's systems. I'd be interested to see if an OS user roaming on T had a problem or not.
End of the day, big infrastructure is a bastard to maintain, at least they got onto it as quick as you can expect.
Can I just say...
That I have had a T-Mobile contract for about 6 years or so. I have never ever ever had a problem with them. I've even called them when my direct debit bounced (woops!) to apologise and they simply said "Oh, that's no problem, thanks for letting us know, we'll try again in 14 days". Great service.
I too was affected by the outage yesterday. So effin what? To be honest, I just assumed that the local cell was down, possibly for upgrade/maintenance, and filed the whole issue away in my head under the category 'shit happens'.
I mean, why do people get so peed off if their poxy mobile doesn't work for a few hours? What's the big deal? Use a freakin' land line FFS.
That's what we did 20 years ago. Used landlines. We didn't have palpitations if we left the house 'without me mobile', because we didn't have one.
Get a life you sad whinging gits.
Mine's the one with no mobile 'phone in it.
That's OK, 'cause you can kick in your slower backup SQL Server that you replicate to, right? Right?
Paris, she regularly has her back up.
Ha, more evidence T-Mobile are useless: they do my work's phone, and at home the reception is cack, whereas my O2 reception is perfect.
Not complaining though; not my fault if people can't get through when I'm on call.
It's the HLR
I happen to know that T-Mobile are upgrading their HLRs around about now, and it sounds like this is what caused the problem. Having a backup close to hand sounds a lot like -- we migrated, it was fecked, we rolled back.
There's not really much you can do in these instances. HLRs are a bit old skool (at least the ones T-Mobile were replacing) and are designed to be highly available, highly resilient in their own right. But, if you're swapping from one to another -- there's always a chance things can go wrong.
To be honest, it seems like they did a pretty good job of containing the issue.
@Danny: SQL Server? Are you having a laugh? That's _certainly_ not carrier grade. If the HLRs were SQL Server based, you'd never connect a call!
@Yorkshirepudding: that's really just physics, and not much T-Mobile can do about it ("ya cannae change tha laws of physics, cap'n", etc. etc.). T-Mobile runs at 1800MHz, while O2 (and Voda) are on 900MHz. The lower frequency has greater penetration, hence O2 and Voda customers can use their phones where Orange and T-Mobile customers can't. As far as I'm aware, all networks are using 2100MHz 3G, so are all as fecked as one another in that area.
Paris, 'cos we all know when she goes down.
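The frequency point above can be put in numbers with the standard free-space path loss formula, FSPL(dB) = 20·log10(d_km) + 20·log10(f_MHz) + 32.44. This is a back-of-envelope sketch only: real indoor coverage also depends on wall attenuation and base-station density, but the frequency term alone shows why doubling the frequency costs signal.

```python
import math

def fspl_db(distance_km: float, freq_mhz: float) -> float:
    """Free-space path loss in dB for a given distance and frequency.
    FSPL(dB) = 20*log10(d_km) + 20*log10(f_MHz) + 32.44"""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44

# Extra loss at 1800MHz (T-Mobile/Orange) vs 900MHz (O2/Voda), same 1km path:
extra_loss = fspl_db(1.0, 1800.0) - fspl_db(1.0, 900.0)
print(round(extra_loss, 2))  # → 6.02  (doubling frequency costs ~6 dB)
```

Six decibels is roughly a factor of four in power, before you even count the poorer wall penetration at the higher frequency.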
@TMS9900, totally agree mate, but hey, the vast majority of the UK population are so emotionally weak that they go to pieces if someone is not texting (or, to some, "texing") or phoning every two minutes, and these are the same bunch that are on Facebook day in, day out.
I carry a mobile yeah but wouldn't be devastated if O2 went offline for a bit.
Oh, and btw, O2 have better coverage than T-Mobile because of the lower frequency and more base stations.
Oh, and to the NO2ID crowd: a mobile can be used to track your almost precise location, so why do you carry one?
@anyone wondering why they didn't just switch to another server
... they may well have had another DB server to switch to.
Usually these things are set up with replication. If the replication target server accepts the same corruptions as prod (e.g. some muppet dropping a table), then it's feck all use: it's in the same condition as the original production one. The only solution is to have something which really does keep copies of the data somehow, or a way to roll back.
I'm pretty surprised they didn't have a few-hours-old snapshot somewhere for a DB of this criticality... or the ability to just pull the real DB off tape or disk fast enough for it not to make much difference either way. That's how I'd have done it (with some sort of snapshot, flashcopy or whatever).
Paris 'cause maybe they'll think before acting in future.
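The "few-hours-old snapshot" approach the comment above describes can be sketched in a few lines: periodically freeze a copy of the live state, and on corruption restore the most recent snapshot taken before the bad change. A minimal illustration follows (all names invented); real systems would use filesystem or SAN snapshots plus a change log to replay the gap, rather than in-memory deep copies.

```python
import copy

class SnapshottedStore:
    """Illustrative sketch: a key-value store with periodic point-in-time
    snapshots and rollback to the last snapshot before a known-bad time."""

    def __init__(self):
        self.data = {}
        self.snapshots = []  # list of (timestamp, frozen copy of data)

    def snapshot(self, now):
        """Freeze a full copy of the current state."""
        self.snapshots.append((now, copy.deepcopy(self.data)))

    def rollback_before(self, bad_time):
        """Restore the latest snapshot taken strictly before bad_time."""
        candidates = [(t, s) for t, s in self.snapshots if t < bad_time]
        if not candidates:
            raise RuntimeError("no snapshot predates the corruption")
        _, state = max(candidates, key=lambda ts: ts[0])
        self.data = copy.deepcopy(state)
```

Note the trade-off the outage illustrates: anything written between the chosen snapshot and the corruption is lost unless a separate change log can replay it, which is why snapshot frequency matters for a database of this criticality.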