The old ones are the good ones!
Boffins have put together a new computer system that attempts to translate protolanguages, the ancient "parent" tongues from which modern languages evolved. The sophisticated Rosetta Stone-like system can quickly reconstruct the languages of yore from today's vocabularies with 85 per cent accuracy, we're told. The system's …
Yeah, but where's the comment about spoken language being composed of simple sounds like ooo, ahhh, mmmm and others....
I'm so glad that the article wasn't about a really clever bunch of pygmies. Thank Heaven for small mercies, I say.
That's pretty much the entire vocabulary of Dorset..
Oooh Aaah, Tra'or Comboin 'Aaarster
Really? like C'thulhu?
I thought the trick was with "cunning stunt".
I'm sure they will have based their computer model on known protolanguages - is Latin really the only one? - and known daughter languages, and assumed that the process is the same for all languages. Languages do reliably change over time and linguists can categorise languages rather as biologists can categorise a particular species.
Try also Hebrew (up to 4500 years), a couple of types of Cuneiform/clay tablet stuff (Babylonian and pre-Sumerian, up to maybe 7,000 for earliest), two sorts of Egyptian (one of which in a different script turns out to be preserved by the Copts), Greek/Linear B (up to 3,200 years ago?), and maybe something Scandinavian? Loads of Linear A clay tablets (3,800 years ago) but no-one has any idea what they say.
I think this tool is a good development. If it helps crack Linear A I'll be impressed.
There are also Chinese, Sanskrit (and other Indus valley/Indian?) Korean and Thai related stuff, no idea how old but the Chinese I believe had paper, gunpowder, Pasta and a Monetary system based on Government guarantee rather than rare shiny gems and metal when Europe was in Wattle and Mud huts using physical cows as currency, if at all.
South & Central America?
There is more really old stuff preserved because sacking a city or library that used clay tablets makes the tablets more durable. Parchment, Vellum, Paper and Papyrus were rather lacking in archival qualities in comparison (but likely can beat DVD & Tape backups by a thousand years or two). Doing the Library of Alexandria without a "backup" plan was stunningly bad. They WOULD have known the need for Off Site backups, but if they existed/exist it's a well kept secret.
Mine's the one with the "Story of Writing" in the pocket.
> (Babylonian and pre-Sumerian, up to maybe 7,000 for earliest),
But the earth was only created 6000 years ago.
Knowing what the markings on clay tablets mean is not the same as knowing how they're spoken. Languages tend to evolve by being spoken. Latin is used as the example because it is one of the few languages where we have vast collections of material detailing its evolution and we still know how to speak it.
I told you to stop reading fairy tales!!
I thought it was 4,500.
I'm sure that's what I was taught 50 years ago.
Don't forget the "Click languages" of the Bushmen.
They came here on the Giant Space Ark.
> The sophisticated Rosetta Stone-like system can quickly reconstruct the languages of yore from today's vocabularies with 85 per cent accuracy, beating human linguists' painstaking manual reconstruction from the words we all know and use.
I would be interested to see the actual paper, when it is published if only to see the breakdown and analysis on which this statement is based. I cannot imagine that linguists do too much "manual" work in this area as there are software tools for almost every intellectual pursuit. This looks to be useful as a way to speed research along, but not to replace the human element in much the same way translation apps can get you pointed in the right direction, but still produce occasional howlers. I would guess that it will still be necessary to review 100% of the output to weed out and correct the 15% wrong, and to verify the 85% correct. It is unlikely to speed the process by 85%.
> I cannot imagine that linguists do too much "manual" work in this area as there are software tools for almost every intellectual pursuit.
Reconstructing ancient languages requires linguists to compare vocabulary items (including related words, homonyms etc.) for multiple languages and multiple time periods, so there's actually a huge amount of "manual" effort involved. Also, consider that people tend to specialise in selected languages or language families and might not have the resources to research multiple other languages, so it seems like a natural use for computers. As long as they manage to digitise and properly process all the required data, retrieving the entire historical evolution of more or less any word in any documented language (which you don't even need to be familiar with), presumably with cross-linking and related items, will be a huge aid to dictionary-wielding researchers.
Also, bear in mind that all such reconstruction results are ultimately hypothetical, so the more languages can be referenced, the more plausible the end result. In any case, exciting stuff for the cunning ones.
" study changes in pronunciation, among other techniques"
Genuine question - how can they know anything at all about how words were pronounced hundred+ years ago (any time before development of phonograph) simply from the written text?
I'm not sure, but the article references "proto-languages," so I suspect they are making links between languages where we have no evidence. So, language X has some similarities to language Y so we'll see if we can interpolate a language between them (on the assumption they had a common ancestor).
One way that is used sometimes is through poetry - if you look at pieces of Shakespeare where there is a strict rhyming scheme you'll notice that some words just don't rhyme. Given that Shakespeare knew what he was doing when he was writing poetry, there's a good chance those words did rhyme as they were pronounced at the time, though which way the pronunciation has changed may be ambiguous.
Similarly a pune ( or play on words ) will tend to work with homonyms, so when writers of the past are playful about language they can hand useful titbits to linguists of the future.
There are certain astonishingly accurate and universal laws that govern the changes in pronunciation through time. My flabber was well and truly gasted when I first stumbled upon this concept, but they do seem to work well. Google Grimm's law, and I'm sure you'll be amazed.
Another method is to compare isolated pockets of a given language. As the main population moves on, the remote group tends to remain somewhat locked in place, at least for a while. See, for example, the many discussions concerning "American" versus "English" languages. I can think of several apt comparisons to biological evolution, including the way this tool might work in reconstructing a language from modern remnants.
We can know how Latin was pronounced by reading descriptions in Latin of how it's pronounced, and can also get useful information from poetry - some pronunciations work better than others to fit the metre, for example.
The paper was published in the Proceedings of the National Academy of Sciences. The Register does not allow URLs, so replace the obvious words below with the appropriate punctuation marks:
http colon slash slash www.pnas.org slash content slash early slash 2013/02/05 slash 1204678110
for the abstract and a pointer to the complete PDF.
The word "protolanguage" is a term of art in historical linguistics, in use in English for more than a century. It has a precise definition:
A protolanguage is the reconstruction of a prior stage of a group of related languages, providing prototype forms from which cognates in the different languages of the group can be derived by the application of regular rules of change. The protolanguage is constructed recursively, with more rules being added to the set of known changes as more data is added (from further investigations among the known languages, or from the addition of previously unrecognized or unknown members of the group.)
To provide a concrete example: It was recognized in the 18th Century (codified in a 1786 statement by Sir William Jones) that Sanskrit was clearly related to Greek and Latin; Jones noted the probability that Celtic, Germanic and Persian were also related. Early in the 19th Century, Jacob Grimm noted the set of regular correspondences that confirm the Germanic relationship. (We call this set "Grimm's Law".)
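A minimal sketch of what "regular rules of change" means in practice, using a toy version of Grimm's Law. It only models the three classic stop series and only looks at word-initial sounds. The names `GRIMM_SHIFT`, `grimm_shift` and `EXAMPLES` are made up for illustration, and Latin is used merely as a convenient stand-in because it kept the older initial stops in these particular words; it is not the protolanguage itself.

```python
# Toy model of Grimm's Law: each older stop maps regularly to a Germanic sound.
# Assumption: a deliberately simplified table, not a full account of the sound law.
GRIMM_SHIFT = {
    "p": "f", "t": "th", "k": "h",    # voiceless stops -> fricatives
    "b": "p", "d": "t", "g": "k",     # voiced stops -> voiceless stops
    "bh": "b", "dh": "d", "gh": "g",  # voiced aspirates -> plain voiced stops
}


def grimm_shift(sound: str) -> str:
    """Return the expected Germanic reflex of an older stop (hypothetical helper).

    Sounds outside the table are returned unchanged.
    """
    return GRIMM_SHIFT.get(sound, sound)


# (Latin word, its initial sound, English cognate)
EXAMPLES = [
    ("pater", "p", "father"),
    ("tres", "t", "three"),
    ("cornu", "k", "horn"),   # Latin <c> is pronounced /k/
    ("decem", "d", "ten"),
    ("genu", "g", "knee"),    # the English <k> was once pronounced
]

for latin, initial, english in EXAMPLES:
    expected = grimm_shift(initial)
    matches = english.startswith(expected)
    print(f"{latin:6} {initial} -> {expected:2}  {english:7} matches: {matches}")
```

The point of the exercise is that one small table accounts for every pair in the list, which is why such correspondences count as evidence of a common ancestor rather than coincidence.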
In the 19th Century, Baltic and Slavic were added to the list early, with the recognition that Albanian and Armenian were also part of the group coming later. In 1876, Danish linguist Karl Verner explained some apparent exceptions to Grimm's Law as correlating with the Sanskrit accent. In 1879, a Swiss linguist named Ferdinand de Saussure posited some consonants in the reconstructed protolanguage based only on indirect evidence in the attested daughters.
In the very early 20th Century, 2 languages of Chinese Turkestan (as the area was then known) were discovered in Buddhist scriptures in caves. They were written in a script derived from an old Indian source, and were determined to be members of the Indo-European group we have been discussing.
In 1917, a Czech linguist, Friedrich (Bedřich) Hrozný, published a monograph demonstrating that a language from central Turkey, written in cuneiform and intermixed with Sumerian and Akkadian signs (not unlike the use of Chinese characters to write Japanese), was another unknown Indo-European language which we call Hittite. This claim was confirmed in 1927 by a Polish linguist, Jerzy Kuryłowicz, who pointed out that certain h-like consonants in the Hittite data occupied the places that Saussure had hypothesized in 1879, nearly 50 years earlier.
The last 80 years in Indo-European studies have, to a large extent, been a reassessment of the previous 150 years' research based on these discoveries, with deniers as well as accepters of the changes required by the data. The same techniques have been applied to dozens of language families, large (Afroasiatic, Austronesian) and small (Muskogean, Miwokan). One experiment was to take the data of the modern Romance languages and reconstruct a protolanguage for them, which turns out to be very similar to but more importantly not exactly the same as Latin, a proof that there are limits to as well as benefits from our reconstruction techniques.
.....my heartiest contrafribularites at the completion of this noble work.
I am anaspeptic, frasmotic and compunctuous at your wonderful reply.
Damn! My pandigestory interlude just evacuated my nose. You owe me a new keyboard, sir!
...this usable on the Winter Queen?
I fear few will get your reference to the Oglaf Snow Queen :)
(for those looking this up, be warned that it is very much Not Suitable For Work but worth it :) ).
Deth'nitely wuth it.
If it 'knows' the processes and patterns by which spoken language changes, can that process be applied to modern English (compared with older and ancient forms) in order to predict how it will be spoken in, say, 300 years time?
I realise that the modern world has a far greater level of cultural exchange and pressure for language change than any previous natural process. It would be interesting to try it though.
My expectation was that it doesn't "know" so much as apply statistical analysis to large amounts of data- the Big Data version of knowledge. In that case it may not be as much use for projecting into the future, especially as many changes to the way language is used come from the description or use of new social, technological or environmental conditions. When we can predict those, things will have reached a very curious place.
Yes, it can.
With the same level of accuracy, which is to say "fairly low".
perhaps you are looking at version 0.01A (alpha release) of the all singing universal translator ... Check the ingredients list for 'just add Google live translate' and you can speak to Mrs Miggins at Ye Olde Pie Shoppe ...
Many scholars have tried and failed to crack Etruscan. There's no Rosetta stone. If this program can crack it, then it's a major advance. If not ....
Have they tested it on Basque, absent any help from a speaker of that language? Again that would be an acid-test. Basque is one of the world's anomalous languages, not related to any other in any known way.
You're missing the point. This program is not meant to decode the meaning of another language, it's supposed (as far as I can tell, anyway) to simulate the evolution of languages in reverse. This won't work on Basque or Etruscan or other language isolates, because they have no common ancestor, or at least none that is attested.
It can, on the other hand, simulate the common ancestor of, e.g., the English hound and the German Hund, but do so far more rapidly, if less accurately, than humans, basically helping us to develop a phylogenetic tree for all related languages and even tentatively reconstruct words from related languages that are now extinct.
> [Basque] is one of the world's anomalous languages, not related to any other in any known way.
Euskara, the Basque language, may be related to other extinct languages in that part of Europe, such as Iberian; and it may be related to some Northern Caucasian languages such as Chechen. (There's a theory that Euskara and the Northern Caucasian languages are both related to the Na-Dené languages, but that is controversial, to put it mildly. Under this theory, as I understand it, a lost protolanguage in Asia would be a common ancestor for Euskara, the Caucasian members, and the Na-Dené members of the superfamily, via waves of migration at least six thousand years ago.)
It's not quite accurate to say it's "not related to any other in any known way". It could be a true isolate, but there are various possibilities for distant relatives.
As others noted, though, Euskara isn't a candidate for the kind of work being discussed in this article. Presumably you could try to use this model to reconstruct a hypothetical protolanguage using Euskara and some of the candidate relatives, but the connections are very tenuous and it's hard to see how the results would be useful. Typically, you want to reconstruct a protolanguage either to understand the development of and relationships among its descendants, or to help in translating another descendant. Neither would apply with a hypothetical protolanguage created from such distant relations.
But could it transcribe a conversation between a person from Norfolk, a person from Newcastle upon Tyne and a Glaswegian?
Pardon me my good man, I can't understand you.
Pardon me my good man, I can't understand you.
Pardon me my good man, I can't understand you.
Said at increased volume and reduced speed each time to assist Johnny Foreigner's understanding
Mine's the one with a History of Empire in the pocket
Thought the comments section would be the usual witless avalanche of rage at dumb boffins who obviously don't know what they're doing, but not at all. What a nice surprise.
@James Micallef: I'm no linguist but I've read up on it a long while ago. There's been a lot of work done <http://books.google.co.uk/books?id=ONY_EVd9zNYC>.
Didn't we already see this in Prometheus? Not that it did poor David much good, as I recall.
whatever info you feed the program, the answer either comes out:-
One ring to rule them all, one ring to bind them
... just wondered... ok... leaving now :(
Who cares what ancient languages sounded like? If they were as sonorous or all-round nifty as English we'd still be hearing them. Clearly irrelevant.