Nice thoughts, but already implemented in InfoCodex.
Thanks for the interesting article. Once again IBM is giving us a great vision of the future and of how unstructured information can be searched.
InfoCodex already does all this today with the help of a linguistic database and synonym and/or similarity search across 5 languages (German, French, Italian, English and Spanish). With InfoCodex you can search for a block of text in one language and it will find all the similar documents in the other languages as well. All of this is done without a single minute of training, because the linguistic database contains 2.9 million words and terms (e.g. "European Court of Justice" or "The President of the United States" are terms and are recognized as such).
See the following links: