Anonymous Coward

Re: Dates

"What you and I call "January" is "gennaio" in Italy, and "enero" in Spain [...]"

I have a bit of code that analyses the text of web page postings for dates and times. At the last count there were about twenty different analysis functions - each of which handles several variants on a particular theme.

It also tries to recognise month names - and their common abbreviations - in about twenty languages. It tries to avoid red herrings with words like "may" in the narrative part of the text. It limits the languages to English plus the native language(s) of the performer in order to narrow the choices on their pages.
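A rough illustration of that kind of lookup (not the actual code; the table, the function name and the "only next to a day number" rule are assumptions made for the example). It only accepts a month name that sits beside a one- or two-digit number, which is one simple way of skipping "may" in ordinary narrative, and it only loads the languages asked for:

```python
import re

# Tiny hypothetical table; a real one would cover all twelve months
# and many more languages and abbreviations.
MONTH_NAMES = {
    "en": {"january": 1, "jan": 1, "may": 5},
    "it": {"gennaio": 1, "gen": 1, "maggio": 5},
    "es": {"enero": 1, "ene": 1, "mayo": 5},
}

def find_day_months(text, languages):
    """Return (day, month) pairs for month names adjacent to a day number."""
    names = {}
    for lang in languages:
        names.update(MONTH_NAMES.get(lang, {}))
    if not names:
        return []
    # Longest names first so "gennaio" is tried before "gen".
    alternation = "|".join(
        re.escape(n) for n in sorted(names, key=len, reverse=True)
    )
    pattern = re.compile(
        rf"\b(?:(\d{{1,2}})\.?\s+({alternation})\b"
        rf"|({alternation})\.?\s+(\d{{1,2}})\b)",
        re.IGNORECASE,
    )
    found = []
    for m in pattern.finditer(text):
        day = m.group(1) or m.group(4)
        name = m.group(2) or m.group(3)
        found.append((int(day), names[name.lower()]))
    return found

text = "We may play on 5 May and again on 12 gennaio."
print(find_day_months(text, ["en", "it"]))
print(find_day_months(text, ["en"]))
```

With only English loaded, "12 gennaio" is simply not seen, which is the point of narrowing the language list per performer.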

It also recognises days of the week by name - and their common abbreviations - in the different languages. The day name is used to work out the year when the posting does not state one.
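To show the year trick in miniature, here is a sketch (an assumed helper, not the actual code) that takes a day, a month and a weekday and lists every year in a plausible range where the three agree. If exactly one year comes back, that is the year; more than one and it is still ambiguous.

```python
from datetime import date

def infer_years(day, month, weekday, start_year, end_year):
    """Years in [start_year, end_year] where day/month falls on weekday.

    weekday follows date.weekday(): Monday is 0, Sunday is 6.
    """
    matches = []
    for year in range(start_year, end_year + 1):
        try:
            candidate = date(year, month, day)
        except ValueError:
            # e.g. 29 February in a non-leap year
            continue
        if candidate.weekday() == weekday:
            matches.append(year)
    return matches

# "Friday 13 March", searching a five-year window
print(infer_years(13, 3, 4, 2018, 2022))
```

The window matters: make it wide enough and the same day/month/weekday combination will eventually repeat.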

The day of week name is also used to try to differentiate mm/dd and dd/mm when they are both less than 13. It still ends up flagging ambiguities when it turns out that the two dates - within scope - fall on the same day of the week. You can't apply a "which country" filter - as a European may list USA tour dates in US format.
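A sketch of that check (hypothetical function, assuming the year is already known): try the two numbers both ways round and keep whichever reading lands on the stated weekday. Two results back means the ambiguity has to be flagged.

```python
from datetime import date

def resolve_numeric_date(a, b, year, weekday):
    """Read 'a/b' as both dd/mm and mm/dd; keep readings matching weekday.

    weekday follows date.weekday(): Monday is 0, Sunday is 6.
    """
    readings = []
    # A set, so that something like 5/5 is only tried once.
    for day, month in {(a, b), (b, a)}:
        try:
            candidate = date(year, month, day)
        except ValueError:
            # One of the numbers is not a valid month (or day) this way round
            continue
        if candidate.weekday() == weekday:
            readings.append(candidate)
    return sorted(readings)

# "Fri 3/4" in 2020: only 3 April is a Friday
print(resolve_numeric_date(3, 4, 2020, 4))
# "Thu 1/7" in 2021: 7 January and 1 July are both Thursdays
print(resolve_numeric_date(1, 7, 2021, 3))
```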

Times are equally tricky - especially when something like "8:45 uhr" can be either morning or evening. Then there is the "7h" type of notation for 19:00.

It all seemed so simple when I started - and then the variations started to crawl out of the woodwork. Some people manage to use several different variants in different postings.

The tricky ones are when the dated entries omit the year. You may then be expected to assume that there is a chronological order that will span year ends. The order can, of course, be latest first or earliest first.
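For a list that really is in order, the year-end rollover can be sketched like this (an assumed helper, not the actual code; it needs to be told the year of the earliest entry and which way round the list runs). Whenever an entry is earlier in the calendar than the one before it, the year is bumped.

```python
from datetime import date

def assign_years(day_months, first_year, newest_first=False):
    """Attach years to (day, month) pairs assumed to be in date order.

    first_year is the year of the chronologically earliest entry.
    A 29 February landing in a non-leap year raises ValueError.
    """
    pairs = list(reversed(day_months)) if newest_first else list(day_months)
    year = first_year
    previous = None
    dated = []
    for day, month in pairs:
        # Going backwards in the calendar means a year end was crossed.
        if previous is not None and (month, day) < previous:
            year += 1
        dated.append(date(year, month, day))
        previous = (month, day)
    return list(reversed(dated)) if newest_first else dated

gigs = [(28, 11), (15, 12), (3, 1), (20, 2)]
print(assign_years(gigs, 2019))
```

It falls over, of course, as soon as the list is not actually in order, or when there is a gap of more than a year between two entries.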

One person published their list of performance dates in the random order they received the bookings - without any year indicator. They also didn't remove old entries.

