Leap second bug cripples Linux servers at airlines, Reddit, LinkedIn

The leap second inserted at the weekend crippled Linux-powered servers running one of the world’s largest airline reservation systems - delaying and cancelling flights. Machines running the mighty Amadeus Altea system were brought down soon after an extra second was added to Coordinated Universal Time (UTC) at midnight on …

COMMENTS

This topic is closed for new posts.

jai
Silver badge

simple solution

can't we just trigger all the volcanoes on one side of the world to erupt at the same time, thereby adding some extra spin to the planet, negating the slow-down, and negating the need to keep on adjusting the time by a second? c'mon scientists! why haven't you sorted this out yet??

4
0
Anonymous Coward

Re: simple solution

Just string a row of Bussard ramjets along a meridian...

2
0
Megaphone

Re: simple solution

Took us out as well. This plus cycling the app servers fixed us:

sudo date -s "`date -u`"

1
0
Silver badge

It's a kernel bug

that only affects Java?

3
0
Silver badge

Re: It's a kernel bug

Don't know the details here, but it's not impossible. Kernel documented to do X, actually does Y which is subtly different. Java is the only widespread app which does something noticeably bad as a consequence. Everything else "just works" much the same under X or Y.

0
0
(Written by Reg staff) Silver badge

Re: It's a kernel bug

As I understand it, the bug ideally needs a multicore machine on a 2.6.32-3.3 production kernel: one core grabs a high-resolution kernel timer lock while setting the time via the adjtime() system call, and another core holds two other related timer locks. This livelocks the kernel, which shows up as services burning all available CPU time.

I'm not entirely sure so I'd rather not say for certain at this stage; last time I looked there was a lot of discussion on exactly how the crash happens. You have to be fairly unlucky to encounter it, but if you've got a server farm, you will see machines dropping off.

C.

0
0
Silver badge
Unhappy

Re: It's a kernel bug

Fairly unlucky ... at 66% hitrate.

0
0
Holmes

Re: It's a kernel bug

Heh. I have one where a combination of DTrace and the JVM results in spiralling startup times for short-lived Java processes on Solaris 10u7. Any suitably complex systems can interact in unfathomably chaotic ways...

2
0
Stop

Re: It's a kernel bug

"As I understand it, the system ideally should be a multicore machine on a 2.6.32-3.3 production kernel for the bug to occur"

The RHEL advisory says it affects only RHEL 4 and RHEL 5, though, which are running *far* older kernels (like, 2.6.9 and 2.6.18 or something; ancient history). It specifically says the RHEL 6 kernel - which is newer - isn't affected. That suggests you're wrong.

Disclaimer: I work for RH but on Fedora, not RHEL, and I know nothing about this issue beyond a vague impression that it only affects really old kernels, which is why it's older enterprise installs that are having trouble.

0
0
(Written by Reg staff) Silver badge

Re: AdamWill

"That suggests you're wrong"

With respect to version numbers, probably - it was a range some had suggested through experience rather than a triaged range. The reason why is more or less there - see the LKML.

C.

1
0
Holmes

@diodesign

BTW, Instagram also went down yesterday, and this was probably caused by the same bug.

P.S. Will we see anyone from the Linux community falling on their sword?

0
0
Anonymous Coward

Indian's again?

I remember back in the day Amadeus was one of my first experiences of outsourcing to India.

Speaking to there support team was great fun... "Oh what's that? Internet Explorer isn't working with Amadeus?... Now sir, turn off all security in Internet Explorer this will make it work"...

Ok, that was for the end user app.. But I guess the developers are the same.

0
11
Anonymous Coward

Re: Indian's again?

I have met some Indians with very good English though.

They know that Indian's again should be spelt Indians again and there support should be their support.

20
0
Anonymous Coward

Re: Indian's again?

I think you just got trolled.

0
0
Linux

Linux bug? Really?

We have Linux boxes of various ages (some more than 5 years old) in some fairly critical places running messaging software with NTP direct from GPS hardware. They all simply carried on working perfectly happily. Mind you, all the apps are written in boringly old-fashioned C.

Java? I've heard of it.

13
3

Re: Linux bug? Really?

Has anyone figured out why the Java VM running on Linux is so adversely affected?

0
0
Silver badge
Boffin

Re: Linux bug? Really?

From what I've read elsewhere, the problem has to do with how it handles threading. Java was apparently not the only thing affected, just one of the most common things. I've seen reports of moderately to heavily loaded MySQL instances being affected as well.

Java just happens to be widely deployed, and very likely to create the right conditions for this bug to manifest, but Java is not doing anything outside of what the system says should work (ergo, it's not a Java bug). Furthermore, this problem did not manifest on non-Linux systems running Java.

1
0

This post has been deleted by its author

Anonymous Coward

Paging Bob Vistakin

We now need 4 months worth of shit jokes about Linux being a massive turd because some systems crashed.

Sauce for the goose....

8
1
Silver badge

Re: Paging Bob Vistakin

I have vague recollection of an OS/2 time bug that crashed ATMs across the world at the same time. If you're going to screw up, you might as well do it in a high-profile manner and make it worthwhile.

Anyway, highly-polished turd, if you please.

0
0
Anonymous Coward

Re: Paging Bob Vistakin

I also notice that we've not heard from eulampios... When MS had their certificate/Leap year problem they were really vocal, maybe they're both on holiday...

2
1
Anonymous Coward

Phew...

...looks like I somehow engineered a miraculous escape. Three Linux-based systems at my house (nettop, netbook and a Raspberry Pi), and I didn't find a single one of them on its back on Sunday morning, having joined the choir invisible.

Maybe I should've lent my Pi to these folks...

2
1
Anonymous Coward

Re: Phew...

Well, clearly you are running skanky hardware like me, nothing too fancy, so you'll be fine ;)

0
0
Silver badge

Re: Phew...

According to various write-ups, you needed a multi-CPU system for the problem to show itself.

0
0
Silver badge

Re: Phew...

Like that multi-CPU desktop I've got at home. Dreading booting it up tonight now.

0
0
Go

Re: Phew...

Fear not - the bug only manifested if your machine was (a) running at midnight (b) using more than one core AND had a really old kernel.

2
0
Vic
Silver badge

Re: Phew...

> you needed a multi-CPU system for the problem to show itself.

I'm running quite a lot of those, across multiple sites. I didn't have any machines lock up.

Of course, those machines run very little Java. That's probably unrelated.

Vic.

1
0
WTF?

Leap seconds: not a one-off unique event

How did this ever happen anyway? There's a leap second about once every 18 months IIRC, so it's not like Y2K where no-one had ever experienced the event before it happened!

0
1
Silver badge
Coat

Re: Leap seconds: not a one-off unique event

The last one was 31st December 2008.

(Amusingly, my copy of Bulletin B arrived while I was reading this.)

1
0
Anonymous Coward

Re: Leap seconds: not a one-off unique event

There hadn't been one since 2009, which is longer than some of those big-name websites have been running on their current infrastructure.

I support a major industrial / scientific application and have been shitting myself for the last couple of weeks, when I found out accidentally about the leap second. I really should read those NANUs more often :-P

It also hit a major GPS vendor in a similar way as us: by the time they found there was a problem in their firmware, there was not enough time to fix it and push out the patches, so they had to email users with work-around instructions instead. Our problem was slightly different in that we didn't realise about the leap second until two weeks ago (they are published at least six months in advance), so we didn't have time to test--in the end, none of our users have reported anything amiss that I'm aware of, so all's well.

The point here is that, while not unique, it's infrequent enough that a lot of us don't give it the attention it deserves, particularly since high-precision timing has a lot more users nowadays than it had only a few years ago.

Be interesting to see what will happen when/if the first negative leap second is introduced.

1
1
Anonymous Coward

Re: Leap seconds: not a one-off unique event

Incidentally, that Google blog post mentioned in the article makes for an interesting read. A proper engineering team, their 7Ps figured out, and learned from previous mistakes.

URL: http://googleblog.blogspot.co.uk/2011/09/time-technology-and-leaping-seconds.html

2
0
Coat

Re: Leap seconds: not a one-off unique event

Why not add 2 seconds so we can skip the next one?

1
0
Silver badge
Thumb Down

Re: Leap seconds: not a one-off unique event

Easy... because the next big mega thrust earthquake will speed up the earth's rotation slightly meaning you'll be 3 seconds out

Or the difference between hitting the atmosphere of Mars at the right angle or missing completely

0
0
Silver badge
Linux

Re: Leap seconds: not a one-off unique event

I don't know why they hung. All of our Linux machines ran quite happily (as they have done for years before, including this event) using NTP and Trimble GPS for precise time-keeping.

It is not like the folk behind NTP don't know about this, it has been supported and documented for a long time:

http://www.eecis.udel.edu/~mills/leap.html

Sounds like something that was not tested during the last leap second event, but still, I fail to understand why it would take the system down for more than 1 second?

2
1
Anonymous Coward

Re: Leap seconds: not a one-off unique event

Looks like almost all Linux servers/desktops/embedded systems were NOT affected by this. Shows the strength of a heterogeneous system

1
0
Silver badge
Mushroom

Yeah...

Sod this. Red Hat 6 servers going haywire *simultaneously* from 02:00 CEST (Red Hat 5 holding though), with load average at around 150 or so. JBoss was gummed up, restart was not helping, everything dying a crawling slow death, everything waiting for some futex. The only thing I could think of was that mcelogd is triggered from cron at that time. Bad lead.

That was not a good night.

Who injects leap seconds on weekends, at night, during the Euro football??!?

2
2
Silver badge
Meh

Who injects leap seconds on weekends, at night, during the Euro football??!?

People who realize that doing it during the week is likely to cause even more business disruption.

As for your other two time references, it's always "at night" somewhere, and there's pretty much always a popular sporting event going on. So those references are moot.

2
1
Silver badge
Facepalm

Re: Who injects leap seconds on weekends, at night, during the Euro football??!?

> People who realize that doing it during the week is likely to cause even more business disruption.

Really I would like to see those people. Probably managers. Or freshmen.

Definitely not people who have to pay for the sysops to be called in pronto.

1
0
Silver badge

Re: Yeah...

OK, maybe it did bite me. I woke up on Sunday morning to find my main machine with all four cores at 100%, hitting 80C and tripping alarms. The main machine runs Mint 12 but it's running a VirtualBox instance of CentOS 6. I wonder if the time bug caused it? I had to reboot the whole shebang; just restarting the VM didn't fix it.

0
0
Silver badge
Headmaster

Re: Yeah...

"Who injects leap seconds on weekends, at night, during the Euro football??!?"

Boffins who wear socks and sandals, have no concept of day, night or weekend (they work when it interests them), and who avoided the Euro football by more than 14400 seconds so you should count yourself lucky.

3
0
Bronze badge
FAIL

Re: Yeah...

"Who injects leap seconds on weekends, at night, during the Euro football??!?"

By definition, leap seconds are always added (or removed) at the end of the last day of December or the end of the last day of June.

0
0
Anonymous Coward

Re: Who injects leap seconds on weekends, at night, during the Euro football??!?

Conversely, who schedules a football championship or a weekend, during a leap second event?

The time of the day issue can be solved by temporarily moving to a different time zone.

Btw, am I the only sad git who looked for this line in his /var/log/messages?

Jun 30 23:59:59 refaim kernel: [215399.847017] Clock: inserting leap second 23:59:60 UTC
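For anyone wanting to run the same check, here is a minimal sketch. The function name `find_leap_second_lines` is made up for illustration, and the default path `/var/log/messages` is an assumption: some distributions log kernel messages elsewhere, and the exact message wording may differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that record a leap second insertion.
# The default log path is an assumption; pass another file as the first argument.
find_leap_second_lines() {
    log_file="${1:-/var/log/messages}"
    # Case-insensitive match, in case the capitalisation differs on your kernel.
    grep -i 'inserting leap second' "$log_file"
}
```

Calling `find_leap_second_lines /var/log/messages` prints any matching lines, and returns a non-zero status if the file contains none.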

2
0
Silver badge
Thumb Down

Re: Who injects leap seconds on weekends, at night, during the Euro football??!?

Sysops aren't users or customers. Inconvenience is why they're paid. If you don't want to work weekend or late hours, IT support is not the field for you.

2
0
Silver badge
Joke

One second

When Linus was presenting the finger to nVidia, he was not being rude. He was just showing them the 1 second that he forgot to account for in the kernel ........

2
0
Anonymous Coward

My Windows servers never missed a beat :-P

Oh come on, the Penguin fanbois never usually miss a chance to poke fun at MS stuffware! :-)

7
0
Silver badge
Trollface

Re: My Windows servers never missed a beat :-P

Were they too busy remote-installing Skype across your network? :-p

4
0
Anonymous Coward

Re: My Windows servers never missed a beat :-P

No, you see, I didn't leave the out of the box settings without changing them.

Also, I've had updates from RedHat knacker bits of my systems, I can't think of any software company who is perfect in this record.

0
0

Not just java...

MySQL was also hit by this bug. Specifically MySQL on Debian-based servers (and Slackware).

Here's a fix (which would have been handy to know yesterday):

/etc/init.d/ntp stop;

date;

date `date +"%m%d%H%M%C%y.%S"`;

date;

/etc/init.d/ntp start

This solves the problem.
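The same sequence can be wrapped in a function that defaults to a dry run, so the commands can be inspected before anything touches the clock. This is only a sketch: `reset_clock_after_leap`, `run_step` and the `DRY_RUN` variable are names invented here, the init script path is taken from the post above and may not exist on other systems, and the real run needs root.

```shell
#!/bin/sh
# Sketch of the workaround above: stop ntpd, set the clock to its own
# current value, then start ntpd again.
# DRY_RUN, run_step and reset_clock_after_leap are hypothetical names.

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reset_clock_after_leap() {
    run_step /etc/init.d/ntp stop
    # Same MMDDhhmmCCYY.ss format as the one-liner in the post.
    run_step date "$(date +"%m%d%H%M%C%y.%S")"
    run_step /etc/init.d/ntp start
}
```

Running `reset_clock_after_leap` prints the three commands; `DRY_RUN=0 reset_clock_after_leap` as root would execute them.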

1
0
Anonymous Coward

Re: Not just java...

What problem?

My Linux server running MySQL sailed through the weekend with no problems at all..

2
1
Silver badge

Re: Not just java && Kernel bug?

OK I'll change that to:

Does it affect only machines running Oracle software...

1
1
