GIGO
The up/down times of a lot of comms links are measured from a log of SNMP-polled events. When we looked closely at some anomalies, we began to realise that market-leading products mistakenly presented these SNMP events in aggregated SLA reports as unambiguous up/down states.
For example: a link up or down state was often only represented at SNMP level by a "changed" indicator - from which the reporting application deduced an up or down event. Repeated change events within a few minutes led to an incorrect deduction about the state of the link at that point.
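To illustrate how that kind of deduction can go wrong, here is a minimal Python sketch. It is not the vendor's code, which we never saw. The function name, the timestamps, and the assumption that the application simply flipped its assumed state on every "changed" event are all hypothetical.

```python
def infer_states_by_toggling(initial_up, change_times):
    """Naive deduction (assumed): each 'changed' event flips the assumed state."""
    state = initial_up
    timeline = []
    for t in change_times:
        state = not state
        timeline.append((t, state))
    return timeline


# Hypothetical flapping link: down at 100, up at 130, down at 160, up at 400.
true_changes = [100, 130, 160, 400]

# Suppose the change at t=130 never makes it into the log
# (for example, it falls between two polls).
logged_changes = [100, 160, 400]

deduced_from_truth = infer_states_by_toggling(True, true_changes)
deduced_from_log = infer_states_by_toggling(True, logged_changes)

# With every change logged, the link ends up "up" (True).
# With one change missing, every later deduction is inverted,
# so the link is reported "down" from t=400 onwards.
print(deduced_from_truth[-1], deduced_from_log[-1])
```

A single lost or merged change event is enough to invert everything deduced afterwards, which is why a bare "changed" indicator cannot be treated as an unambiguous up/down state.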
When there were multiple resilient links the matter became even more complicated. Contractual outages were being reported when in fact the end-to-end path was unbroken.
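As a rough sketch of the resilient case, assume two parallel links where either one alone keeps the end-to-end path up. The path is then down only while both links are down at the same time, so the contractual outage is the overlap of the per-link outages, not their sum. The function name and the intervals below are made up for illustration, and this is not the algorithm we actually wrote.

```python
def path_outages(link_a, link_b):
    """Overlap of two sorted, non-overlapping lists of (start, end) outages."""
    result = []
    i = j = 0
    while i < len(link_a) and j < len(link_b):
        start = max(link_a[i][0], link_b[j][0])
        end = min(link_a[i][1], link_b[j][1])
        if start < end:
            # Both links are down during this window.
            result.append((start, end))
        # Advance whichever outage finishes first.
        if link_a[i][1] < link_b[j][1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical outage windows, in seconds from the start of the period.
primary = [(100, 200), (500, 560)]
backup = [(180, 220)]

per_link_total = sum(end - start for start, end in primary + backup)
overlap = path_outages(primary, backup)
path_total = sum(end - start for start, end in overlap)

# Per-link outages add up to 200 s, but the path was only broken for 20 s.
print(per_link_total, overlap, path_total)
```

A report that counts each link's outage separately would show 200 seconds of downtime here, while the end-to-end path was actually broken for 20 seconds.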
It took us some thought and effort to write our own link break interpretation algorithm. Some unstable line conditions required a human to make an educated assessment of what had probably happened. The reporting application supplier wasn't really interested - apparently on the basis of "no one else is complaining".
We tried to be scrupulously honest about the interpretations so that the reported SLA for an end-to-end link was as near to the probable truth as possible.
This was only a few years ago - it would be interesting to know if the standard SNMP signalling and interpretation have overcome these deficiencies.