Shares in United Airlines crashed last month, and trading in the stock was suspended, after a glitch in the Google Matrix. The FT noted that "a false report that the carrier had returned to bankruptcy court surfaced on the internet. A six-year-old Chicago Tribune story on United’s 2002 bankruptcy filing, spotted on a Google …
No Evil, Just Carelessness
I've found this before and it's really annoying. While it's not Google's fault that everyone is stupid, they could timestamp pages or display search dates on them, so that people don't make this mistake, or so that if they do, it's not Google's fault. It's one of the things that annoys me in searches.
Although, admittedly, I checked the dates on the articles when I was researching for my Hons project, and that's what these people should do.
Google can't fix this problem
Because the newspaper changed the URL, Google rescanned the page. If it had put the scan date on the results, it would simply have encouraged people to believe that the page was new. The proper solution is for all news reports to carry a prominent date and time that stays visible even when a long page is scrolled.
Beaten to it
I was going to point out that Google can only go by the date reported to it, and in the quoted situation the content publisher had modified the document (by moving it), so it was reporting a more recent date.
But then, someone else beat me to it.
I agree that it would have made the original story more credible if Google had said "this information was correct at x/y/z aa:bb" - as it was, it was down to the journo to check the date on the actual story, which I believe was missing... not Google's fault, and Google don't have the information to correct the problem.
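The point made in the last two comments can be sketched in a few lines of Python. This is a toy model, not a description of how Google's index actually works: it assumes an index keyed purely by URL that remembers when each URL was first crawled. The URLs, dates and the `record_crawl` helper are all hypothetical.

```python
from datetime import date

# Hypothetical crawler index keyed by URL: each URL maps to the date the
# spider first saw it. Nothing here knows when the story was actually written.
first_seen: dict[str, date] = {}


def record_crawl(url: str, crawl_date: date) -> None:
    """Record a crawl, keeping the earliest date already stored for this URL."""
    first_seen.setdefault(url, crawl_date)


# Illustrative URLs and dates, not the real ones from the United story.
OLD_URL = "http://news.example.com/2002/12/united-bankruptcy"
NEW_URL = "http://news.example.com/archive/united-bankruptcy"

record_crawl(OLD_URL, date(2002, 12, 10))
record_crawl(OLD_URL, date(2003, 1, 15))  # a recrawl keeps the first-seen date
record_crawl(NEW_URL, date(2008, 9, 7))   # same story, moved: a "new" page

for url, seen in first_seen.items():
    print(seen.isoformat(), url)
```

Under this assumption the moved story is indistinguishable from a page first published in 2008, which is exactly the behaviour described above.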
Everyone's fault except Google!
I see the Google fanboys have found this.
@Kevin - try RTFA.
If Altavista could offer search-by-date in its main search index 12 years ago, why can't Google offer search-by-date today?
They can, obviously - but they don't want to.
Pardon me, SJ, but I don't even use Google.
So Alta Vista offered search-by-date. Date of what? The date that Alta Vista scanned the page? All the dates on which that URL was scanned? Did they read the page looking for things that looked like dates and report those? That sounds like some serious AI was involved, which I doubt they had. So Alta Vista would have been able to report just the information that the ignorant journalist thought he had, the apparent date of the article, just as Google could have but didn't. But of course it would simply be the date that the spider scanned the page, which, because the newspaper changed the URL, was recent.
As for RTFA, I did read it and noted that it was an incomplete and partial rendering of a news report that I had already read. The original had the vital information that the newspaper had altered the URL causing the spider to think it had found a new page.
I think it was elsewhere on El Reg (perhaps Google can find it for you).
Kevin Whitehead - Do you really think "serious AI" is required to find a date in a text file in the format "dd Month yyyy" and for the searchbot to record the time of indexing? Really?
Can I interest you in some attractive Mortgage Backed Securities?
Most of this article seems to have gone right over your head - my guess is you were speed reading, not thinking, and got confused, but decided to comment anyway. Instead of trying to break the world speed record for typing, why not have a look?
There is no way ...
... to tell the precise date of a story without guessing, unless that date was provided together with the story. And by story, I mean not only a newspaper article, but also recent updates, formatting changes, blog entries, comments underneath, etc. There is a whole lot of stuff published in the form of HTML, and even more sent over HTTP (XML, images and the like). And very few, if any, of these protocols or formats cater for such a detail as "date published".
similar problem on BBC website
The New Scientist recently brought to light a method of gaming the BBC's "most mailed" sidebar, which could, conceivably, bring about a similar cascade of panic selling. It seems that not many people use the "email this story" feature, and the reader who brought it to their [NS's] attention reported that they only had to email themselves a story 5 times before seeing it climb to #4 in the "most mailed" rankings. Of course, the Beeb dates all stories, so pushing an old story about losses in a company should be spotted as "old news is old", but you never can tell.
I'm not sure if the Beeb has done anything to fix this since it came to light, but it does show that even the most trusted websites are vulnerable to the odd cock-up (Google's case) and even long-standing vulnerability to being gamed (BBC).
Speaking of such matters, this reminds me to ask about what's the story with The Register's "most commented" panel. On any given day I can find many more stories with more comments than those listed in the side panel. Is this behaviour "broken as designed?"
@Dr Stephen Jones
'Do you really think "serious AI" is required to find a date in a text file in the format "dd Month yyyy" and for the searchbot to record the time of indexing? Really?'
You would have to work out what the date(s) in the text actually are. Since a given article can reference more than one date, and indeed reference dates in the past, present and future, it will be impossible to determine what any given date in the article refers to.
And that's before you start talking about different date formats (not everyone will use the same format, let alone language).
Let alone the fact that (gasp) different news services will lay out the information in a different way.
It is impossible to intuit this sort of information from what is essentially a free-format file. All they really have to go on is the timestamp information.
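To illustrate both sides of this exchange: matching the "dd Month yyyy" layout is indeed easy, but a match says nothing about which date, if any, is the publication date. A minimal Python sketch, with a hypothetical `find_dates` helper and invented article text, assuming English month names and only that one layout:

```python
import re
from datetime import datetime

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Matches only the "dd Month yyyy" layout, e.g. "9 December 2002".
DATE_RE = re.compile(rf"\b(\d{{1,2}}) ({MONTHS}) (\d{{4}})\b")


def find_dates(text: str) -> list[datetime]:
    """Return every "dd Month yyyy" date in the text, in order of appearance."""
    return [
        datetime.strptime(" ".join(m.groups()), "%d %B %Y")
        for m in DATE_RE.finditer(text)
    ]


# Hypothetical article text: three dates, and nothing in the text says
# which of them (if any) is the date the article was published.
article = ("Example Air filed for bankruptcy on 9 December 2002 and hopes to "
           "emerge by 30 June 2003. Its previous filing was on 1 March 1995.")
print(find_dates(article))

# The same date in US layout is not matched at all.
print(find_dates("Example Air filed for bankruptcy on December 9, 2002."))
```

Matching is the easy part; choosing among the matches, and coping with other layouts and languages, is where the difficulty described above lies.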
Put out the date it was first indexed... They could also do a time-machine-style interface, where you can push into the search based on time, maybe with something as simple as a slider control or timeframe links down the right-hand side, between the content and the ads.
I was after something like this only a few weeks ago when I was looking for information on some .NET controls... info from years ago kept popping up in the search, very frustrating.
Thank you AC
Saved me the trouble of replying. Mind you, I see that SJ is referring to someone with a quite different name, so perhaps I didn't need to reply after all.
Fanboys apologise for Google
@AC: "It is impossible to intuit this sort of information from what is essentially a free-format file."
It's quite evidently NOT impossible, because somehow AltaVista manages to do this for both general web pages and news stories. Google doesn't. Got it yet?
Even for pages with no date or time stamp, because search engines crawl the news sites so frequently, they have the time of publication to within a few hours (or however often the robots visit).
It's as well that you chose to remain anonymous, because I don't think anyone could recommend hiring you for any software development work if you find simple exercises so difficult. Or are you just trying to talk up your hourly rate?
"Ooh, dunno. I might need AI to work out where the date field is on this page! That's another £1000 a day, mate"
Good article, by the way - even if it flies over the fanboys' heads.
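The claim about crawl frequency can be made concrete. Assuming a hypothetical crawl log that records which article URLs a spider found on each visit to a news site, the publication time of a newly seen URL is bounded only by the previous visit and the visit that found it. The `publication_windows` helper, the URLs and the times below are all invented for illustration. The sketch also shows the catch raised earlier in this thread: an old story moved to a new URL gets a fresh, recent window.

```python
from datetime import datetime
from typing import Optional

# One visit by the spider: when it came, and which article URLs it found.
Crawl = tuple[datetime, set[str]]


def publication_windows(
    crawls: list[Crawl],
) -> dict[str, tuple[Optional[datetime], datetime]]:
    """For each URL, return (last visit without it, first visit with it).

    A lower bound of None means the URL was already present on the first
    visit, so nothing is known about how old it is.
    """
    windows: dict[str, tuple[Optional[datetime], datetime]] = {}
    seen: set[str] = set()
    previous: Optional[datetime] = None
    for visit_time, urls in sorted(crawls, key=lambda c: c[0]):
        for url in urls - seen:
            windows[url] = (previous, visit_time)
        seen |= urls
        previous = visit_time
    return windows


# Hypothetical crawl log for one news site, visited every four hours.
crawls = [
    (datetime(2008, 9, 7, 8, 0), {"/business/a"}),
    (datetime(2008, 9, 7, 12, 0), {"/business/a", "/business/b"}),
    # An old 2002 story reappears under a new URL: it gets a 2008 window too.
    (datetime(2008, 9, 7, 16, 0),
     {"/business/a", "/business/b", "/archive/united-2002"}),
]
for url, (earliest, latest) in publication_windows(crawls).items():
    print(url, earliest, latest)
```

The window is only as narrow as the gap between visits, and it records when a URL appeared, not when the text behind it was written.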
So I went to altavista.com and searched for "Man Stabbed" to bring up news articles. Most of the results don't have any date. Two did, out of the 30 I looked at, and in both cases the date was part of the extract from the page, so it would likely also appear in the similar extract that Google shows.
It wasn't stamped by Alta Vista.
What are you talking about?
I then did the same search on Google and most of the first page had dates in the extract.
You sound like an Alta Vista fan boy.
@Fanboys apologise for Google
Alta Vista does NOT do what you say it does. The search option (see below for a quote from the documentation) looks for pages created OR MODIFIED within the given date range. It also adds a warning about the accuracy of dates.
Perhaps it is you who should RTFM?
So, the only place to get data is XML tags in the feed or the raw data of the article. As detailed above, neither of these will work.
Oh, and I post as AC to avoid trouble at work, and far from finding "simple exercises so difficult", I am too experienced to assume that the underlying problem is simple just because it sounds simple.
From the Alta Vista documentation:
This feature enables you to find Web pages that were created or modified in a time range that you can specify. Please keep in mind that some Web servers do not provide accurate modification dates for the pages they host.
- Select a time range from the menu or
- Type a date range into the from/to boxes.
Why not date search?
SJ thanks for the link. I knew AV could do date searches but had forgotten it was still around.
For fanboys and fangirls try this: http://tinyurl.com/47epcr
It returns every web page with "Russia" and "Sarah Palin" before August 28. No, it's not perfect (it's Altavista FFS) - but much better than a regular search which is swamped with eight billion articles on "I can see Russia from my window."
Even a poor date-delimited search is better than nothing. Why won't Google do this?
First the Google fanboys say date search is "impossible" and/or requires Artificial Intelligence - quite hilarious. Then the fanboys admit that it's possible, but isn't very good, so why bother?
Maybe they will eventually get round to RTFA and asking why Google omits this useful service. But I'm not holding my breath.
No, Google should NOT be Pretending to know this detail
It is the newspaper company which provided the article that needs to provide the date. It is the user who bears responsibility for looking at the date. Google cannot attempt to parse the article for the date. Misrepresenting the date, for whatever reason, would be worse than leaving the responsibility where it belongs: with the reader and with the provider.
This is a non-story from Google's perspective. The issue is with irresponsible actions on the part of the reader.
Simple solution (well, fairly)
Google tells all news sites to include a specific date tag in the text, or they don't get indexed.
The thought of losing revenue will scare them into submission!
@Dr Steven James
Keith Whitehand and the other AC are correct. You are wrong.
If Google did display the "published" date of the article, the mistake would have been *compounded*.
As Kenneth Blackfoot rightly pointed out:
The. URL. Changed.
Just in case you don't get it, this means that the story (from 2002) first appeared at this URL *in 2008*. So as far as Google is concerned it's a story published *in 2008*.
Altavista's search by date gets this equally wrong, which is why they have opted to keep it but add a disclaimer, and why Google has opted to leave it out instead of having to add a disclaimer.
In case you still don't get it, try searching Altavista for "United Airlines files for bankruptcy", limit your search to articles from 2008, and then tell us all when the top result was published.
for such a lively debate, boys. And, er, don't you have any work to get on with? I know I do - and I'm retired.
work takes a back seat
when someone is Wrong on the Internet...