"There were no differences in initial conditions or in processing methods. The same starting conditions and same processing methods should lead to the same results"
Errrrrr, no. The initial conditions may be the same, but a look at the (not yet peer-reviewed) paper shows that the model was run using a number of different compilers at a number of different optimisation levels with a number of different MPI implementations. All of these can lead to differences in operations as simple as adding up a list of numbers, and thus over time the results of the calculations may diverge. Hell, depending on how the MPI (and possibly OpenMP) is implemented, it's perfectly possible to get differing results from two runs on the same machine with the same executable.
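To see why "adding up a list of numbers" is order-dependent at all, here's a minimal sketch in Python (double precision, nothing to do with the paper's actual code):

```python
# IEEE 754 double-precision addition is not associative: the order in
# which a reduction is performed can change the rounded result.
xs = [1e16, 1.0, -1e16]

left_to_right = (xs[0] + xs[1]) + xs[2]  # the 1.0 is absorbed into 1e16, giving 0.0
reordered     = (xs[0] + xs[2]) + xs[1]  # the big terms cancel first, giving 1.0

print(left_to_right, reordered)  # 0.0 1.0
```

Different compilers, optimisation levels, and MPI reduction trees all amount to choosing a different bracketing of the same sum, so differences like this are exactly what you'd expect.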
And that's even before I consider whether the machine itself is strictly IEEE 754 compliant (maybe, probably not if you look at the grubby details), whether the code is being compiled to exploit the machine in an IEEE 754 manner (probably not in all cases), and whether all the compiler/library combinations used by the code can be expected to give identical answers in every single case (probably not). And then there's comparing runs done on differing numbers of cores. Et cetera, et cetera, et cetera ...
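The core-count point can be sketched the same way. A chunked reduction, a crude stand-in for what a parallel sum over p ranks does (real MPI implementations vary, this is just an illustration), gives different answers for different p on the very same data:

```python
def chunked_sum(xs, p):
    # Split into p contiguous chunks, sum each chunk, then sum the
    # partial results -- a rough model of a p-rank parallel reduction.
    k = -(-len(xs) // p)  # ceiling division: chunk size
    partials = [sum(xs[i:i + k]) for i in range(0, len(xs), k)]
    return sum(partials)

xs = [1.0, 1e16, -1e16, 1.0]
print(chunked_sum(xs, 1))  # 1.0  (serial, left-to-right order)
print(chunked_sum(xs, 2))  # 0.0  (two "ranks": each partial absorbs a 1.0)
```

Same numbers, same arithmetic, different decomposition, different answer. Scale that up to millions of grid points over thousands of timesteps and divergence between runs is unavoidable.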
This is entirely usual behaviour. And what is being presented looks like (I haven't read the paper in detail) a reasonable attempt to quantify the observed divergences.