# The time on Microsoft Azure will be: Different by a second, everywhere

Servers running Microsoft’s cloud will be briefly out of sync with each other and official time standards on June 30, as they implement the leap second. Microsoft has determined that clocks on tens of thousands of servers globally running Azure should switch to the leap second at midnight in the time zone where they are based …

1. #### There's 15 minutes difference between Nepal time and India time

Also, is there any difference at all between the way Amazon and Google handle the leap second? The descriptions look pretty much the same to me.

1. #### Re: There's 15 minutes difference between Nepal time and India time

The description of the Google method is confusing. It says they're slowing down their clocks and adding the second at the end. But surely, if they're just adding the second, they don't need to slow the clocks down first?

AWS are lengthening their seconds, so the leap second is added in tiny proportions, second by second, until the clock is back in sync with where it should be.

1. #### Re: There's 15 minutes difference between Nepal time and India time

The description is confusing - Google don't add the second at the end; the rest of the world do. Google don't add it at all. They do this because some operating systems don't support the extra second (23:59:60) and will either ignore it or repeat 23:59:59, which can cause problems.

Google's solution is to lengthen every second over a 20-hour period, at the end of which their clocks will be one second behind the rest of the world following real time. When the rest of the world adds the leap second, Google is back in sync.

That is, while most people will have 86401 seconds in their leap day, Google's day will only have 86400 seconds (like a normal day), but the seconds in the 20-hour smear window will each be about 1.000014 real SI seconds long.

World: 00:00:00, 00:00:01 ... 23:59:58, 23:59:59, 23:59:60, 00:00:00, 00:00:01

Google: 00:00:00, 00:00:01 ... 23:59:57, 23:59:58, 23:59:59, 00:00:00, 00:00:01

Amazon's solution is the same, except they do it over a 24-hour period starting at midday. So Google will be back in sync at midnight; Amazon will be back in sync at midday on July 1st.
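The arithmetic of both smears fits in a few lines of Python. This is a sketch based only on the window lengths described above (Google ~20 hours, Amazon ~24 hours), not on either provider's actual implementation:

```python
# Sketch of the "leap smear" arithmetic: instead of inserting 23:59:60,
# stretch each clock second slightly over a smear window so the extra SI
# second is absorbed gradually. Window lengths follow the comment above,
# not any official spec.

def smear_factor(window_seconds: int) -> float:
    """SI-second length of each smeared clock second."""
    return (window_seconds + 1) / window_seconds

google = smear_factor(20 * 3600)   # ~1.0000139 (the 1.000014 above)
amazon = smear_factor(24 * 3600)   # ~1.0000116 (gentler: longer window)

# Over the whole window the stretched seconds absorb exactly one SI second:
absorbed = 20 * 3600 * (google - 1.0)   # -> 1.0 (within float rounding)
```

The longer the window, the smaller the per-second stretch, which is presumably why a 24-hour window was chosen over a shorter one.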

2. #### Re: There's 15 minutes difference between Nepal time and India time

So Microsoft are the only ones doing it correctly and adding the leap second at midnight - which is when it is supposed to happen.

2. #### Obvious

"[T]he firm doesn’t foresee availability or reliability problems hitting Azure."

Well, of course they don't foresee any problems. You run experiments to determine if there are problems that you didn't think of. Some experiments are small. Some are somewhat larger--like using an international cloud service to check if you've done everything correctly.

Which is why Micro\$oft is Micro\$oft.

1. #### Re: Obvious

This is one of the best descriptions of why M\$ is such a low-quality company - they don't do the testing internally, they just test on the customer.

/Zane

1. #### Re: Obvious

They are thinking of the shareholders and profit. They don't need expensive testing teams, just let the customers do the beta test... as they have always done. Will there be a Service Pack 1 after this happens to fix everything?

2. #### Re: Obvious

How do you know that the reason they don't foresee any problems is because they have tested it?

The same comment could be made about Google or AWS; do we know whether or not they have tested their own ways of handling the leap second?

3. #### Re: Obvious

Well, it never has foreseen any problems, now has it?

Yet, problems have cropped up.

Maybe nothing will happen - entirely possible that this one second thing is not actually an issue. After all, managing time synchronization across a WAN is an old problem which, I'm sure, has ample documentation. Then again, Apple somehow forgot the existence of time zones not so long ago.

So maybe they need to dust off their crystal ball?

1. #### Re: Obvious

Foreseen problems? Sure there are - always, and sometimes they're big ones; it's just that they don't go around shouting about them.

I get the feeling it's a fingers-crossed culture.

4. #### Re: Obvious

"Well, of course they don't foresee any problems. You run experiments to determine if there are problems that you didn't think of."

Like, oh I don't know, setting the clock on a Windows network so that it is a second out. I bet that's never happened before. Oh wait...

This is FUD. The algorithms for NTP are published and almost certainly the same ones as used by Windows Time service. They've been used to synchronise all sorts of rubbish clocks for several decades. (No PC has a decent real-time clock. You used to be able to spend a few hundred quid on an add-in card that did this but the market never took off because it cost so much less to buy a network adapter.) NTP works. If it didn't, networks and servers would collapse because of time sync problems on a daily basis. All sorts of other protocols (like Kerberos) depend on it working over the long term.

1. #### Re: Obvious

> The algorithms for NTP are published and almost certainly the same ones as used by Windows Time service.

Care to expound, please, what do you mean by "almost certainly"? In what aspects do the two systems differ and in what aspects are they the same?

1. #### Re: Obvious

" what do you mean by "almost certainly"? "

Easter Sailor gives a link below (https://support.microsoft.com/en-us/kb/939322) where MS explain that Windows Time uses SNTP and NTP but don't go the whole hog on the latter and so only manage synchronisation to within a second or two, which is sufficient to make Kerberos (and so Active Directory) work.

I was unaware of the link when I posted, but I'm familiar with the problem domain and NTP's solution. Since Windows has been doing time sync for a couple of decades, the NTP RFCs have been published and revised 4 or 5 times over the same period, and the Tier 1 clocks are maintained by national laboratories all around the planet, it seemed vanishingly unlikely that MS would ignore something that is not rocket science, is known to work, is published in immense detail, and is supported by an existing world-wide infrastructure.

1. #### Re: Obvious

> Easter Sailor gives a link [....]

Thanks for the clarification, Ken.

2. #### Re: Obvious

"The algorithms for NTP are published and almost certainly the same ones as used by Windows Time service.

Care to expound, please, what do you mean by "almost certainly"?"

The Windows Time service integrates NTP version 3 with algorithmic enhancements from NTP version 4.

2. #### Re: Obvious

Ken,

as I mentioned earlier and as explained in the Microsoft Knowledge Base, Windows does not have a full NTP implementation. The Windows Time Service is OK for a client PC, but it is not suitable for use in an NTP hierarchy with full NTP implementations at lower levels of the hierarchy. A full NTP implementation will adjust its clock frequency to bring itself in step with the higher-strata time servers, and will monitor the stability of those servers. An SNTP implementation just jumps to match the higher-strata server and does not adjust its own clock frequency. The Windows Time Service is somewhere between the two protocols, but was only designed to be good enough for Kerberos.

https://support.microsoft.com/en-us/kb/939322

In my last company we had to ship proper NTP software with our products, because the standard Windows software was not good enough for our needs.
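The step-versus-slew distinction described above can be shown with a toy model. This is purely illustrative - the function names and the adjustment rate are made up, not taken from any real NTP or SNTP implementation:

```python
# Toy model: an SNTP-style client "steps" (jumps) straight to the server's
# time, while a full NTP daemon "slews" - it limits the correction rate so
# the clock stays monotonic and timestamps never repeat.

def step(local: float, server: float) -> float:
    """SNTP-style correction: jump to the server's time, even backwards."""
    return server

def slew(local: float, server: float, max_adj: float = 0.0005) -> float:
    """NTP-style correction: move at most max_adj seconds per tick,
    so the clock never jumps or runs backwards."""
    offset = server - local
    return local + max(-max_adj, min(max_adj, offset))

# A clock 2 s fast: stepping jumps backwards at once (timestamps repeat);
# slewing trims 0.5 ms per tick and converges over roughly an hour.
```

The monotonicity is the point: software that records event order by timestamp can be badly confused by a backwards step, which is one reason a full NTP daemon prefers slewing for small offsets.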

3. #### Does this mean I get an extra second in bed?

That would be a result!

1. #### Re: Does this mean I get an extra second in bed?

NO YOU LOSE A SECOND

4. NTP handles this correctly.

Most OSes can handle it correctly as well, but from time to time (groan!) someone changes the time-handling code and then fails to test it on leap seconds, and you get problems, like the Linux glitch a year or so ago.

You can get GPS simulators and create your own NTP servers that push out this sort of thing for testing, so it's quite possible to do, but people don't. And the results are predictable. Of course, you also get programmers doing dumb things to implement delays, etc., rather than using the proper OS calls, leading to more bugs.

I personally think they should step the second backwards and forwards every Wed for a couple of months - then we would get OS and application software tested and fixed. One can hope they would fix it...

1. Time and dates are tricky things to code. Everyone assumes they will be easy because, as humans, we regularly handle calculations involving time and dates without thinking about them; but once you have to think about them, they are hard. Which doesn't excuse not properly testing edge cases like leap seconds, 29th Feb, etc., but does explain why there are so many bugs.

I like your idea of a fluid time Wednesday. It would separate the programmers from the brogrammers and blow stoners' minds everywhere. :)

2. > NTP handles this correctly.

How can you assert that any of the NTP ways (you do not say which specific scenario you are considering) is the correct way? Correct under what criteria?

I would only go as far as saying that a lot of thought has been given, by a lot of very clever people, to the way leap seconds (and other timing aspects) are handled in NTP, and that there are a number of well-documented pros and cons to the strategies used. I might even suggest that to my knowledge so far no other team have done better. But I would not affirm that the way it is done is "correct"--partly because I lack the expertise to make such a claim.

1. NTP handles leap seconds in the "correct way" as far as it is defined, in that it makes UTC follow its defined values. The problem in the more general sense is that you have two concepts of time:

(1) The UTC/civil definition of days being 24 hours of 60 minutes of 60 seconds, always, along with a formula for dates that makes up the Gregorian calendar (let's keep quiet for now about other calendars).

(2) Solar time, which for various reasons you want to stay in approximate synchronisation with - i.e. so that at, say, 0° longitude, 12:00 local is, on a yearly average, the time the sun is overhead.

Now the second is defined these days with extreme precision, but the Earth's rotation is variable and, worst of all, not quite predictable, due to stuff moving around inside as well as tidal friction, etc.

The correct way to do all of this, of course, is already known and implemented in some systems that really matter: have your clock keep "atomic" time that has no discontinuities, and then apply a leap-second correction to get "civil" time. See:

http://en.wikipedia.org/wiki/International_Atomic_Time

That is exactly how the GPS satellites do it, and their own GPS time was in sync with UTC in 1980 and is now 16 seconds different.
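The GPS scheme amounts to a continuous count plus a lookup table of offsets. A minimal sketch (the offset values below are real historical GPS−UTC figures, but the table is deliberately partial and the function name is made up):

```python
# Keep a continuous atomic timescale; apply a leap-second offset only when
# converting to UTC. Real historical GPS-UTC offsets, partial table only.

GPS_MINUS_UTC = [
    # ((year, month, day) offset took effect, GPS - UTC in seconds)
    ((2006, 1, 1), 14),
    ((2009, 1, 1), 15),
    ((2012, 7, 1), 16),
    ((2015, 7, 1), 17),   # the leap second this article is about
]

def gps_utc_offset(date: tuple) -> int:
    """GPS - UTC in whole seconds on the given (y, m, d) date."""
    offset = 0
    for effective, value in GPS_MINUS_UTC:
        if date >= effective:   # tuples compare lexicographically
            offset = value
    return offset

gps_utc_offset((2015, 6, 30))   # 16, as the comment says
gps_utc_offset((2015, 7, 1))    # 17, after the leap second
```

Note the table has to be updated whenever a new leap second is announced - which is exactly the distribution problem point (b) below complains about.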

What is a problem for most software, when it comes down to one-second "accuracy", is that most computer libraries are based on (1), and:

a) They don't quite know how to deal with the 59-second or 61-second minutes that happen when you get a second removed/added.

b) To perform the conversion to/from atomic time you need the offset values, and as these have to be updated as the Earth's motion is observed, it is hard to do correctly on anything stand-alone. You would then need internet access, with the security problems that brings, and the grief caused when, in a few years, some web developer stupidly changes the URLs of important data for no obvious reason while tarting up a site.

Finally, there is a project (which I have not checked/tried yet) to give you a local NTP "fluid time Wednesday" effect here:

https://support.ntp.org/bin/view/Dev/LeapSecondTest

5. "The world is divided into 24 time zones"

No, I count 40 of them. One would expect more than 24, since some are separated by less than an hour.

1. Yes, plus there are several different ways to do daylight savings.

6. Square in the sky keeps on turnin'

I don't know where I'll be tomorrow

7. #### Root of the problem

Within a synchronized (using that word literally) system, there is no problem even if \$LOCAL_SECOND takes 3.14 UTC ISO validated kosher seconds. The problem comes, surely, when one is comparing timestamps between systems which think they're synchronized, but aren't. If that is so, then programming constructs such as IF (T1 >= T2) THEN GOSUB sumfin , where T1 and T2 are expressed in high-precision time units, but (for whatever reason) are referred to different time standards, are the ones that are at risk. This must, I imagine, represent a class of bug vulnerabilities right up there with memory allocation errors and buffer overruns. I'm sure there must be cloudy applications which will reference more than one of AWS-time, Azure-time, and UTC. Should be interesting.
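A defensive version of that comparison would refuse to order timestamps that sit closer together than the worst-case skew between the two clocks. This is a sketch; the function name and the one-second bound are illustrative, the bound being the maximum disagreement you might see between a smeared and a non-smeared clock around the leap second:

```python
# Around a leap second, two "synchronised" clocks can disagree by up to a
# second, so any ordering decision inside that window is untrustworthy.

def happened_before(t1: float, t2: float, skew_bound: float = 1.0) -> bool:
    """Order two timestamps from different systems, but refuse when the
    gap is within the worst-case clock skew between them."""
    if abs(t1 - t2) <= skew_bound:
        raise ValueError("order indeterminate: gap within clock skew bound")
    return t1 < t2

happened_before(100.0, 105.0)   # True: the gap safely exceeds the bound
# happened_before(100.0, 100.5) would raise: the two clocks could
# legitimately disagree about which event came first.
```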

1. #### Re: Root of the problem

" If that is so, then programming constructs such as IF (T1 >= T2) THEN GOSUB sumfin , where T1 and T2 are expressed in high-precision time units, "

It depends on the system call that was used to create T1 and T2. If you are serious about time handling then you will have to trace your usage all the way down to the bottom.

If you are writing in C on Linux and getting as close to the kernel as possible you'll only have to worry about something like a few dozen libc calls to the, at least, five different types of clock available (I'm only exaggerating a little). Then there is the hardware - a Raspberry Pi, for example, has no battery-backed real-time clock, which is why Raspbian has ntpd set up from the outset.

Mathematicians famously start with a spherical cow, God only knows what a time obsessive assumes - perhaps a spherical second for all the sense the options allow.

1. #### Re: Root of the problem

Time-obsessives have two things:

1) Atomic time, which is precise and monotonic (the spherical cow).

2) Human/civil/UTC time that follows the Earth's rotation (upon which our concept of time and its units were based). And there are differing degrees of this (look up UT1 & UT2 if you want to know more). This is your real cow, with an equivalent choice of Friesian, Aberdeen Angus, etc...

2. #### Real "Root of the problem"

A more general problem with programmers is that they use "clock time" as a substitute for "order of events".

This works well if all events are being recorded with consistent time stamps - say, for conditional compilation on a local machine, where you can check whether the .o file in one location is older than your .c file in another, or than when you pressed the "build" button in the GUI, etc.

Things break due to time faults: for example, the same conditional process on a network file system, where the time stamps of some files come from the server's clock while others, local, come from the client's clock, which is different; or where the file system's time resolution (e.g. 2 seconds on FAT32, as a worst case) is greater than the interval between steps, etc.

Then we get into all sorts of debates about keeping leap seconds to work around dumb programming. But really, what the programmers & software architects should be asking comes down to the ACID database question: how do you guarantee the correct order of events in a process if the local clocks are not fully in sync?
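One classic answer to that question is to stop using wall clocks for ordering altogether. A Lamport logical clock, sketched minimally below, orders events by message causality, so it is immune to leap seconds and clock skew entirely:

```python
# Minimal Lamport logical clock: every event bumps a counter, and every
# received message advances the local counter past the sender's stamp, so
# a receive is always ordered after its send - no wall clock involved.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self) -> int:            # local event (including a send)
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_send = a.tick()           # a's clock: 1, stamped on the message
t_recv = b.receive(t_send)  # b's clock: 2 - causally after the send
```

Real systems build on this idea (vector clocks, sequence numbers) precisely because "my timestamp is earlier than yours" is unreliable across machines.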

8. Years ago, Windows systems had a problem with their clocks based (IIRC) on their tracking local time rather than GMT and applying the proper delta. Is this still the case?

1. I think so. I don't think there is an official reason, but two web links that seem pertinent are:

http://www.cl.cam.ac.uk/~mgk25/mswish/ut-rtc.html (especially the comments at the end)

http://blogs.msdn.com/b/oldnewthing/archive/2004/09/02/224672.aspx

My own opinion is that devices should speak TAI (i.e., SI seconds since an epoch) to each other and convert to local time (i.e., some text format, subject to a slowly-changing mixture of cultural, geographical and political influences) as best they can, only when interfacing to the wretched human beings who insist on using it.
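That approach - store one machine count, localise only at the edge - looks like this in practice. The sketch uses POSIX/UTC seconds rather than true TAI, since standard libraries ship no leap-second table; the timestamp chosen is the last POSIX second before the leap second the article discusses:

```python
# Store events as seconds since an epoch; convert to a human time zone
# only at display time. 1435708799 is 2015-06-30T23:59:59Z.

from datetime import datetime, timezone, timedelta

event = 1435708799
utc = datetime.fromtimestamp(event, tz=timezone.utc)
nepal = utc.astimezone(timezone(timedelta(hours=5, minutes=45)))  # UTC+5:45

utc.isoformat()    # '2015-06-30T23:59:59+00:00'
nepal.isoformat()  # '2015-07-01T05:44:59+05:45' - Nepal's odd offset
```

The stored integer never changes; only the human rendering does - which is exactly the property you want when time zones and DST rules shift under you.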

9. #### Why?

Why invite trouble by changing the time reference to match civil time? Surely the reference can continue to advance steadily at some agreed pace? Then all you need is a simple algorithm to derive civil time (for display) from the reference (with appropriate "jumps" at the leap seconds). I don't understand.

10. This was one of the reasons 64-bit Linux went with nanosecond resolution... easy to adjust and remain in sync...

11. #### Windows doesn't care about the odd second anyway

Windows doesn't really care about the odd second, as it does not have a full NTP implementation to achieve the required level of accuracy. If you want proper time you have to use a 3rd-party port of the standard NTP software. This Knowledge Base article explains: https://support.microsoft.com/en-us/kb/939322

"The W32Time service cannot reliably maintain sync time to the range of one to two seconds. Such tolerances are outside the design specification of the W32Time service."

1. #### Re: Windows doesn't care about the odd second anyway

" you want to have proper time you have to use a 3rd party port of the standard NTP software"

NTP isn't that much better. For accurate time keeping you really need a PTPv2 (IEEE 1588-2008) implementation such as Domain Time II.

1. #### Re: NTP isn't that much better

NTP is better than Windows SNTP by many orders of magnitude.

A typical Windows installation (thinking desktop here) has, by default, its time set once per week - so it can be out by minutes at times. Even if you set the frequency to once per hour (a registry setting) you are lucky to get better than 1 second.

NTP on a WAN typically gives you accuracies of 10ms or better (so around a 100-times improvement).

NTP on a LAN with decent time servers (e.g. a machine with a very good hardware clock, or a local GPS reference) gives you accuracies of the order of 0.1ms or better, so around 10,000 times better.

Often the question programmers should be asking is why am I using time, and is that actually the best way of determining order and sequence?

1. #### Re: NTP isn't that much better

"NTP on a WAN typically give you accuracies of 10ms or better (so around 100 times improvement)"

With NTP, asymmetric routes and network congestion can cause errors of 100 ms or more. As stated above, for accurate time you need PTP. PTP eliminates Ethernet latency and jitter issues through hardware time stamping, cancelling out the measured delay between nodes at the physical layer of the network. Accuracy in the range of 10 to 100 nanoseconds can be achieved.

1. #### Re: NTP isn't that much better

"PTP eliminates Ethernet latency and jitter issues through hardware time stamping"

So you have an irrelevant comparison: PTP can't work on a WAN, and on your LAN (barring WiFi use or congestion so woeful you should upgrade your routers) you get sub-millisecond accuracy, which is smaller than the time-slice for most software/OS task scheduling.

Also, having asymmetric delays of 100ms or so is quite poor; you really ought to be using NTP sources that are 'closer' to your machine (in a network sense).

But returning to my main point made elsewhere: using time stamps which are *assumed* accurate to re-order data over a wide system is simple but also prone to clock error. Should programmers not be looking at other handshake and event-counting methods to synchronise the *order* of events, instead of trusting that everyone's clock is always sufficiently close in its time-keeping?

12. #### Cloud computing and time zones

Isn't the whole selling point of "cloud computing" that it doesn't matter where in the world you are and where in the world your data is? Except now your local and cloud clocks might not be synchronised properly for up to 24 hours, depending on your time zone and wherever your data/cloud computer happens to be.

Maybe the UN should set up a committee to define a standard for adding leap seconds? ;-)

13. #### Feb 29th

Azure crashed because of a leap day three years ago? Drats, must have missed that one. Good for a laugh. I expect Microsoft to be confused by having a leap day in 2000 and not in 2100, but the one in 2012 was quite normal.

And yes, if you're using Cloud Computing (if you're using anything other than one time zone!) then use UTC.

1. #### Re: Feb 29th

I'm very thankful that America has timezones otherwise consider the mess that Microsoft would have created. Choosing local time over UTC was a bad enough mistake.

2. #### Re: Feb 29th

Yep - Azure crashed because internally the nodes communicate using internally-issued SSL certificates with a one year validity - so on Feb 29th, any node that got rebooted requested a certificate for itself with an expiry date of Feb 29th 2013. Of course, that doesn't exist, so the request failed. That meant the new VM failed to communicate with its host in time, so got rebooted; after a few cycles of that, their systems decided the hosts were faulty and tried resetting those. Which, of course, then tried to get themselves new SSL certificates to connect to the controller, which failed ...
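The core of that failure fits in a few lines: adding one calendar year to Feb 29 produces a date that doesn't exist, so naive same-date-next-year expiry logic breaks. The clamp-to-Feb-28 fallback here is one illustrative fix, not what Azure actually did:

```python
# Feb 29 plus "one year" is not a valid date - exactly the trap that bit
# the certificate-issuing code described above.

from datetime import date

issued = date(2012, 2, 29)
try:
    expiry = issued.replace(year=issued.year + 1)   # 2013-02-29: no such day
except ValueError:
    expiry = date(issued.year + 1, 2, 28)           # clamp as a fallback

# expiry is now 2013-02-28
```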

14. #### Obviously the sensible solution would be...

... to have computers switch to "atomic time", which just moves forward smoothly without any leap seconds. The difference from local time would then be accounted for in the same way as it is for time zones.

1. #### Re: Obviously the sensible solution would be...

Congratulations, you have just re-invented the UNIX epoch.

1. #### Re: Obviously the sensible solution would be...

No, the UNIX time_t follows UTC and so is not able to perform correct time-duration calculations over the leap-second period, as there are discontinuities at those points.
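The discontinuity is easy to show with concrete numbers. POSIX time pretends every day has 86400 seconds, so the interval spanning the June 30 leap second looks one second shorter than it really is:

```python
# From 2015-06-30T23:59:59Z to 2015-07-01T00:00:00Z, POSIX arithmetic sees
# 1 second - but with 23:59:60 inserted, 2 SI seconds actually elapse.

t1 = 1435708799   # 2015-06-30T23:59:59Z as a POSIX timestamp
t2 = 1435708800   # 2015-07-01T00:00:00Z

posix_elapsed = t2 - t1            # 1 - what time_t subtraction reports
true_elapsed = posix_elapsed + 1   # 2 - once the leap second is counted
```

A timescale like TAI has no such gap, which is why duration calculations should be done there and converted to UTC only afterwards.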

1. #### Re: Obviously the sensible solution would be...

> No, the UNIX time_t follows UTC

You are right, Paul. I should know better than post at that late an hour. Thanks for the correction.

15. #### A typical engineering problem

Put very simply, "time" is used in two ways: to refer to the duration of an event (or an interval between two events), and to situate an event (an infinitesimal part thereof) on an ever-increasing, though not uniformly increasing, scale.

"Time" may also refer to various physical manifestations which are not necessarily related to each other but which are all useful in one way or another. E.g., societies with a concept of civil time (all of us save for a handful of people in remote places) expect noon to be somewhere around the middle of the day, and midnight to be roughly what the name says, which conventions are tied to astronomical definitions which must somehow be reconciled with physical definitions relating to atomic energy level transitions, even though the two are totally unrelated. The interesting part is that getting one "right" means getting the other "wrong" to a certain extent. Striking an acceptable balance between the two within a certain environment and constraints is what I describe as an interesting engineering problem.

1. #### Re: A typical engineering problem

Ah, the ancient Chinese engineer's curse: "May you work on interesting problems.".

16. #### Look before you leap!

I was dealing with this stuff over 30 years ago, and every time there was something like this, many systems (not mine, fortunately) had serious problems! It isn't the system time that is the problem; it is how applications are programmed to deal with these changes. They need to be designed so that they continue to do the right thing. If a leap second issue requires a clock to fall back to midnight from 12:01, then applications that have critical events to process at midnight need to know that they already did them, and not do them again when the clock falls back, etc, etc. These are the extreme edge cases that catch a lot of programmers flat-footed.
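The midnight-job trap described above has a standard defence: make the job idempotent per calendar day, so however the clock moves, it runs at most once. A minimal sketch with made-up names:

```python
# Record which days the critical job has already run, so a clock that
# falls back past midnight doesn't trigger it a second time.

ran_on: set = set()

def run_midnight_job_once(today: str, job) -> bool:
    """Run job at most once per calendar day, however the clock behaves."""
    if today in ran_on:
        return False
    ran_on.add(today)
    job()
    return True

runs = []
run_midnight_job_once("2015-06-30", lambda: runs.append("done"))  # runs
run_midnight_job_once("2015-06-30", lambda: runs.append("done"))  # skipped
# runs == ["done"]
```

In production the "already ran" record would live in durable storage rather than memory, but the principle - key the work on the calendar day, not on the clock crossing midnight - is the same.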
