What you call "model" I call "profile".
1. By "training" your data on dictionary definitions, you create in each profile a large number of truly significant patterns that very accurately determine its semantic orientation. For this, the "training" must remove from the profile what I call "lexical noise", right at the stage of parsing (preliminary preparation) of the data. This removal ensures that each profile can be found by, and can itself find, only a narrow circle of other profiles.
(I wrote in my patent: "Such lexical noise is typically superfluous predicative definitions that do not explain the central themes contained within the digital textual information and, accordingly, removal of such noise often results in an improvement in the quality of the structured data." In another one I wrote: "If Compatibility=100% - most likely only absolutely identical paragraphs/passages can be found. If Compatibility=0% - all paragraphs/passages that have even one same word and/or predicative definition are found.")
The presence or absence of any particular address then matters only in combination with many other patterns, since together they determine whether the compatibility threshold required for receiving or transmitting information is reached.
Therefore, for compatibility to really work, it is necessary to remove all lexical noise, which is impossible without a high-quality dictionary.
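To make the compatibility threshold concrete, here is a minimal sketch under loudly stated assumptions: a profile is modeled as a plain set of pattern strings, the noise list is hand-written and hypothetical (the post derives noise from dictionary definitions, which is not modeled), and "compatibility" is computed as Jaccard overlap in percent, a stand-in metric chosen for illustration rather than the patented formula. It does reproduce the two boundary cases quoted above: at 100% only identical profiles match, and at 0% a single shared pattern is enough.

```python
# Hypothetical sketch: a "profile" is modeled as a set of pattern strings.
# LEXICAL_NOISE is an assumed, hand-written list; the post derives noise from
# dictionary definitions, which is not modeled here.
LEXICAL_NOISE = {"is very", "it is"}


def build_profile(patterns):
    """Drop every pattern listed as lexical noise."""
    return {p for p in patterns if p not in LEXICAL_NOISE}


def compatibility(a, b):
    """Shared patterns as a percentage of all patterns in either profile (Jaccard)."""
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


def is_match(a, b, threshold):
    """Require at least one shared pattern AND a score at or above the threshold.

    threshold=100 -> only identical profiles match;
    threshold=0   -> a single shared pattern is enough.
    """
    return bool(a & b) and compatibility(a, b) >= threshold


raw_a = {"alice swims", "greg swims", "it is"}
raw_b = {"alice swims", "greg runs", "it is"}
a, b = build_profile(raw_a), build_profile(raw_b)

print(compatibility(raw_a, raw_b))           # 50.0: the noise pattern counts as shared
print(round(compatibility(a, b), 2))         # 33.33
print(is_match(raw_a, raw_b, threshold=50))  # True
print(is_match(a, b, threshold=50))          # False
```

In this toy example the shared noise pattern "it is" lifts raw compatibility to 50%, enough to pass a 50% threshold; once it is removed, the same pair falls to about 33% and no longer matches.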
2. If you train your data on "other data" rather than on a high-quality dictionary, you will most likely not be able to remove its lexical noise. In effect, this "other data" plays the role of a dictionary, defining the parts of speech and the meanings of the words in your data. You must therefore be sure that the "other data" can do this adequately.
Now please explain to me: why spend time and a huge amount of resources creating a new dictionary when an old, proven, high-quality one already exists? Only because you don't want to pay me?
3. Take the sentence "Alice and Greg swim with joy." If a system does not see each word's part of speech, the word "joy" can be taken as the proper noun (name) "Joy", producing erroneous patterns when the sentence is parsed.
For instance, if "joy" is read as a name, these patterns appear:
- Alice swims
- Greg swims
- Joy swims.
If "joy" is read as a common noun (part of the adverbial phrase "with joy"), these appear instead:
- Alice swims with joy
- Greg swims with joy.
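A toy sketch of the two readings, assuming the part-of-speech tags are already supplied (here by hand; in the approach described above they would come from the dictionary). The tag names and the `extract_patterns` helper are hypothetical, and the "grammar" covers only what this one sentence needs: names before the verb are subjects, "with" + name adds a companion who also swims, and "with" + common noun is attached to the verb as a manner phrase.

```python
# Hypothetical sketch: the same sentence yields different patterns depending on
# which part of speech is assigned to "joy". Tags (NAME, CONJ, VERB, PREP, NOUN)
# are assumed labels, supplied by hand rather than by a real dictionary.

def extract_patterns(tagged):
    # Locate the verb; the names before it form the subject group.
    verb_at = next(i for i, (_, tag) in enumerate(tagged) if tag == "VERB")
    verb = tagged[verb_at][0] + "s"  # naive 3rd-person singular: swim -> swims
    subjects = [w for w, tag in tagged[:verb_at] if tag == "NAME"]

    companions, manner = [], []
    rest = tagged[verb_at + 1:]
    for (w1, t1), (w2, t2) in zip(rest, rest[1:]):
        if t1 == "PREP" and t2 == "NAME":
            companions.append(w2)        # "with Joy": Joy swims too
        elif t1 == "PREP" and t2 == "NOUN":
            manner.append(f"{w1} {w2}")  # "with joy": how they swim

    suffix = "".join(" " + m for m in manner)
    return [f"{s} {verb}{suffix}" for s in subjects + companions]


as_name = [("Alice", "NAME"), ("and", "CONJ"), ("Greg", "NAME"),
           ("swim", "VERB"), ("with", "PREP"), ("Joy", "NAME")]
as_noun = [("Alice", "NAME"), ("and", "CONJ"), ("Greg", "NAME"),
           ("swim", "VERB"), ("with", "PREP"), ("joy", "NOUN")]

print(extract_patterns(as_name))  # ['Alice swims', 'Greg swims', 'Joy swims']
print(extract_patterns(as_noun))  # ['Alice swims with joy', 'Greg swims with joy']
```

The only difference between the two inputs is one tag, yet the output patterns share nothing, which is why a wrong part of speech propagates straight into the profile.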
You see what may or may not happen: some systems treat the words "joy" and "Joy" as the same. Try typing "ilya geller" and "Ilya Geller" into Google.
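The case problem can be shown with a two-entry lookup table, a hypothetical stand-in for a real dictionary: when the lookup folds case, "joy" returns both readings and the system cannot choose between them; when it respects case, each form keeps its own part of speech. This sketch does not model how Google or any other search engine actually normalizes queries.

```python
# Hypothetical sketch: a lookup that folds case cannot tell the name "Joy"
# from the common noun "joy"; a case-sensitive lookup can.
DICTIONARY = {
    "Joy": "NAME",  # proper noun (a person's name)
    "joy": "NOUN",  # common noun (the feeling)
}


def lookup(word, fold_case):
    """Return the sorted list of tags the dictionary offers for `word`."""
    if fold_case:
        # Every entry whose case-folded form matches is a candidate.
        return sorted({tag for key, tag in DICTIONARY.items()
                       if key.casefold() == word.casefold()})
    return [DICTIONARY[word]] if word in DICTIONARY else []


print(lookup("joy", fold_case=True))   # ['NAME', 'NOUN']: ambiguous
print(lookup("joy", fold_case=False))  # ['NOUN']
print(lookup("Joy", fold_case=False))  # ['NAME']
```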