I'm always a little nervous about the idea of a search engine as the solution to the tide of "unstructured data" we're all drowning in. For a start, most of it isn't really unstructured - show me an unstructured email invoice and I'll show you something that is useless because you aren't sure who it came from and what it applies …
"Je ne cherche pas, je trouve"
"Je ne cherche pas, je trouve." The foregoing quote by Pablo Picasso goes a long way toward taking the measure of what underpins any search. When we search, we search by definition for what we can find. While searching for what we recognize might seem trivial, it's really very powerful. Very nearly every question is informed and literally begs its answer. Students who learn to read exam questions carefully discover a Royal Road to good grades, because they give the examiner what the examiner is looking for. Simple, n'est-ce pas?
The first key to a successful search is to refine as much as possible what is being sought. The more knowledgeable the searcher, the more rewarding the results. Even a negative search result can be more informative than a positive one when the search parameters are well defined and the results are read by an informed reader.
The tricksy part is searching for we know not what. When a broad, uninformed search is undertaken, the full power of the associational cortex comes into play, along with everything else, including what our neighbours tell us worked for them. For example, if I'm searching for results in an area I'm weak in, like the one I'm commenting on, I throw in a .pdf extension parameter because I know PDF files are widely used in academia. Otherwise a weak search is a pot-luck affair and as likely to snag on your wife's maiden name as anything else.
Informed searches by informed searchers make us all look like Picassos, and Picasso wouldn't need no damn search engine, or even a computer, given his view that computers aren't much good for anything, as all they can give is answers. Uninformed searches by uninformed searchers can't be helped much by any software solution, except where the software is "taught" the searcher's preferences. So the searcher searches the search software searching for searching solutions for .... well, you know how it goes.
It's great you are discussing the concept of enterprise search
Companies like Fast Search and Transfer (fastsearch.com) have been selling turn-key and custom solutions (because meta-data can be complicated) for years. Their search technology scales well across clusters, so it supports billions of documents. They have access controls to securely separate internal corporate groups, dozens of database and document management connectors, navigators, customized meta-data/field extraction and processing, real-time indexing of new documents, and data cleansing. On top of that, they have a rather complex query language that goes a long way toward fixing search constraint problems.
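To make a few of those features concrete, here is a toy sketch in Python of an index that combines keyword matching, a meta-data field filter, and group-based access control. It is not how FAST or any other vendor implements these things; the class `TinyIndex`, its method names, the `doc_type` field, and the sample documents are all made up for illustration.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index with per-document group ACLs and meta-data fields.

    Hypothetical illustration only; not any vendor's actual design.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.docs = {}                    # document id -> (fields, allowed groups)

    def add(self, doc_id, text, fields, groups):
        # Index at insert time, so a new document is searchable immediately.
        self.docs[doc_id] = (dict(fields), frozenset(groups))
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query, user_groups, **field_filters):
        terms = query.lower().split()
        if not terms:
            return []
        # AND semantics: a document must contain every query term.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        results = []
        for doc_id in sorted(hits):
            fields, groups = self.docs[doc_id]
            # Access control: skip documents the user's groups cannot see.
            if groups.isdisjoint(user_groups):
                continue
            # Meta-data constraint: every requested field must match exactly.
            if all(fields.get(k) == v for k, v in field_filters.items()):
                results.append(doc_id)
        return results


index = TinyIndex()
index.add("inv-1", "invoice from acme for widgets",
          {"doc_type": "invoice"}, {"finance"})
index.add("memo-7", "acme widgets launch memo",
          {"doc_type": "memo"}, {"finance", "sales"})

sales_view = index.search("acme widgets", {"sales"})
finance_invoices = index.search("acme widgets", {"finance"}, doc_type="invoice")
```

The same query returns different documents depending on who is asking and which fields are constrained, which is roughly the gap between this kind of product and a plain keyword box.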
Autonomy, I have heard, also sells similarly 'enterprise complete' stuff, but I am less familiar with their products.
And I agree with you, it has limited application. It isn't like SQL, with direct/hard field relations, and it likely isn't useful as the base for an accounting system. However, it provides quick, complex searches and answers that are more human, and that is something the humans in a company need.
Admittedly, many companies, like the ones you discuss, don't have complete solutions. But it is already being done, even if Google's product isn't anywhere near an enterprise/government solution (no large document counts, no custom meta-data or document type extraction/parsing).
In addition, most of the core search features you discuss here have all been embedded in various enterprise content management systems such as those produced by EMC.