Sunday, February 15, 2004

In Search of the Ur-Language

As surprising as it may seem, it is a fact that the great majority of the languages in existence today belong only to a very few families: Afro-Asiatic, Austronesian, Indo-European, Niger-Congo, Nilo-Saharan and Sino-Tibetan being the most important of these (for a more extensive list, see this page). The spectacular success linguists have had in classifying the vast numbers of languages in existence into only a few clusters has prompted many to speculate that even these large groupings may yet be collected further into super-families, or, more ambitiously still, that a single ur-language spoken by the first men, and ancestral to all those in use today, might one day be recoverable, at least to a certain extent. I for one find such speculations implausible.

Languages are to a great extent like species in the way they fission and diverge from each other, and just as with biological species, it is often difficult to tell precisely when a single language has really given birth to two truly different entities. Again as with species, two communities that speak a single language may become geographically isolated, and as time goes by and the two communities undergo random drift, whether in genes or in vocabulary and grammar, a point may be reached at which enough differentiation has occurred to set up an impregnable barrier to any future interchange, even if the original obstacles that separated the two communities were to disappear. Finally, as with living species, one tends to find with languages that the more central a feature is, the less likely it is to undergo drift, which explains why it is precisely those words whose usage is central to the lives of the speakers, like mother (Mutter (German), mater (Latin), matr (Sanskrit)) or two (zwei (German), deux (French), duo (Latin), dva (Sanskrit)), that are best conserved with the passage of time. In light of the close resemblance between the process of speciation and linguistic differentiation, one might expect that the tools employed in tracing ancestry in one field would prove useful in the other - as indeed has turned out to be the case. The great classification successes enjoyed by linguists, including controversial figures like Joseph Greenberg, have rested heavily on techniques that would be familiar to any disciple of cladistics.

Now, we know that biologists have enjoyed breathtaking successes in classifying all the life-forms on our planet into a single tree, with only a few major points of dissension left outstanding, and we also have some reason to believe that human language, whenever it might have arisen, did so only once, rather than on several independent occasions. As such, it might seem reasonable to expect that linguistics will meet with the same sort of success that biology already has, but this is where it pays to keep sight of the ways in which the analogy between biological and linguistic evolution breaks down. There are considerations that would lead one to expect the task to be simpler for the linguists - for one thing, languages, unlike many living creatures, don't undergo sexual reproduction. At the same time, there are difficulties that outweigh such advantages, not the least of which are the rapidity with which languages change by comparison with most species of multicellular animals, and the sheer amount of cross-family borrowing that occurs between languages - English, with more than half of its vocabulary of Romance origin, being one egregious example, with the heavily sinicized Japanese and the many English pidgins of the world providing further instances of this phenomenon.

What all of this amounts to is that as we go back further in time, it gets ever harder to tell genuine similarities between languages apart from resemblances that have arisen purely by coincidence. Even when we confine ourselves to languages we have good reason to believe are related, the difficulties remain very great. Consider, for instance, the following two translations of the first two articles of the UN Declaration of Human Rights, the first into Edo, the language of the people of the old kingdom of Benin, and the second into Yorùbá, spoken by peoples who have lived right next to the Edo, and with whom there has been a tremendous amount of cultural interchange. First the Edo version:

Emwen Nokaro.
Emwan ne agbon hia ne a biere, a bie iran noyan-egbe iran kevbe wee, umwon-mwen o ree etin hia ne o kheke iran khin. A ye ewaen kevbe ekhoe ne o maa wu iran, ne iran gha yin da egbe vbe orhion oghe eten-okpa.

Emwen nogieva
Dowande –omwan ore o mween etin ne o khekee kevbe a yaan-egbe omwan ovbehe. Iyen emwen na, vbe ne alughaen ke-alughaen i na rro, ne o dekaen ovbi evbo ne omwan khin, omwan fuofua ra okhui-khui okpia ra okhuo, ra ovbi evbo na ze, ugamwen, otu aze, ra otu eghaevbo, uhunmwun evbo ra oto evbo ne omwan ke rre, ukhu ne omwan mween, ubiemwe, ra emwin ovbehe hia. Levba-sevba, alughaen-ke alughaen ni khian gha rro vbekpa otu aze ne omwan ye, ototo evbo ne omwan ke rre, ra arrioba evbo ne o yaan egbe ere, arrioba edayi ra arrioba ne i mween ekhae ne egbe ere, ra evbo ne o rre ototo arrioba ovbehe.

And now the Yorùbá version:

Abala kìíní.
Gbogbo ènìyàn ni a bí ní òmìnira; iyì àti è̟tó̟ kò̟ò̟kan sì dó̟gba. Wó̟n ní è̟bùn ti làákàyè àti ti è̟rí-o̟kàn, ó sì ye̟ kí wo̟n ó máa hùwà sí ara wo̟n gé̟gé̟ bí o̟mo̟ ìyá.

Abala kejì.
E̟nì kò̟ò̟kan ló ní àǹfàní sí gbogbo è̟tó̟ àti òmìnira tí a ti gbé kalè̟ nínú ìkéde yìí láìfi ti ò̟rò̟ ìyàtò̟ è̟yà kankan s̟e; ìyàtò̟ bí i ti è̟yà ènìyàn, àwò̟̟̟, ako̟-n̅-bábo, èdè, è̟sìn, ètò ìs̟èlú tàbí ìyàtò̟ nípa èrò e̟ni, orílè̟-èdè e̟ni, orírun e̟ni, ohun ìní e̟ni, ìbí e̟ni tàbí ìyàtò̟̟ mìíràn yòówù kó jé̟. Síwájú sí i, a kò gbo̟dò̟ ya e̟nìké̟ni só̟tò̟ nítorí irú ìjo̟ba orílè̟-èdè rè̟ ní àwùjo̟ àwo̟n orílè̟-èdè tàbí nítorí ètò-ìs̟èlú tàbí ètò-ìdájó̟ orílè̟-èdè rè̟; orílè̟-èdè náà ìbáà wà ní òmìnira tàbí kí ó wà lábé̟ ìs̟àkóso ilè̟ mìíràn, wo̟n ìbáà má dàá ìjo̟ba ara wo̟n s̟e tàbí kí wó̟n wà lábé̟ ìkáni-lápá-kò yòówù tí ìbáà fé̟ dí òmìnira wo̟n ló̟wó̟ gé̟gé̟ bí orílè̟-èdè.

Both Edo and Yorùbá are classified as Benue-Congo languages, and the two have been side by side for at least the last 1,500 years, with a considerable amount of well-attested back and forth between their speakers; the best estimate is that the two tongues diverged only 5,000 years ago; yet despite all this, what stands out most, at least to my eyes, is how little the two languages actually share in terms of common vocabulary [1]. That it has even been possible to classify the two into the same family owes more to certain grammatical features they share than to the sort of systematic comparisons that Indo-European scholars relied on in the 19th century. If we are so hard-pressed to find resemblances between languages lying in the same subfamily of Niger-Congo, what right have we to expect any better when comparing conjectural reconstructions of the long-dead ancestors of today's big families?

The difficulties linguists have faced in ascertaining the relationships between languages like Korean and Japanese are, if anything, greater than those with which students of African languages must wrestle. Here are two languages in close geographic proximity to each other, spoken by peoples that are clearly closely related by blood, but what little vocabulary they share consists essentially of the borrowings Korean and Japanese have both made from Chinese. If we accept the hypothesis that Japanese stems from a now extinct and highly divergent form of Korean spoken in the ancient kingdom of Baekje (백제), we ought to be even more discouraged about the prospects for reconstructing language superfamilies, for the lesson to be drawn from the example of Japanese would be that linguistic divergence can be so rapid that a mere 2,000 years is sometimes enough to efface all similarities in vocabulary. Our great success with Indo-European owes as much to luck as to anything innate to the process of linguistic change.

If one were trying to make the case for optimism, one might reason as follows: it is true that languages diverge so rapidly that it is foolish to hope to recover what was spoken by the men who first decided to venture out of Africa, but we needn't reach that far back into the past to obtain the ancestral language. Might it not be the case that the common ancestor of all the languages in existence today lies close enough in time for something of value to be recoverable? Just as in population genetics it can happen that the common ancestor [2] of all of a given population lived much more recently than the origin of the species itself, it could well be that Arabic, English, Japanese and Yorùbá are all descendants of just one of many languages spoken long ago, with the descendants of all the others having since gone extinct. This is not a possibility to be dismissed out of hand, but it still strikes me as extremely unlikely, as, for one thing, the extremely ancient date of settlement of those areas in which Austronesian and Na-Dene speakers are to be found argues against any possible ancestral language being less than 25,000 years old. Even if we restrict our attention to less comprehensive groupings like the hypothetical Nostratic, the time scales involved remain so great that any such proposal seems an exercise in futility. [3]
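For readers who want to see the population-genetic point in numbers, here is a minimal sketch. It assumes the standard neutral coalescent for a diploid population of constant effective size, an assumption of mine that the argument above does not depend on. Under that model the expected time back to the most recent common ancestor of n sampled gene copies is 4N(1 - 1/n) generations, a figure set by the population size rather than by the age of the species. The function name and the sample figures are purely illustrative, and the calculation concerns genes, not languages.

```python
def expected_tmrca_generations(sample_size: int, effective_size: float) -> float:
    """Expected generations back to the most recent common ancestor.

    Assumes the standard neutral coalescent for a diploid population of
    constant effective size (an illustrative assumption, not a claim
    about any real population or about languages).
    """
    if sample_size < 2:
        raise ValueError("need at least two sampled lineages")
    total = 0.0
    # While k lineages remain, the expected wait until two of them
    # coalesce is 4N / (k * (k - 1)) generations.
    for k in range(sample_size, 1, -1):
        total += 4.0 * effective_size / (k * (k - 1))
    return total


if __name__ == "__main__":
    # Hypothetical figures chosen only for illustration.
    for n in (2, 10, 1000):
        print(n, expected_tmrca_generations(n, effective_size=10_000))
```

The sum never exceeds 4N generations however many lineages are sampled, which is the sense in which the common ancestor can be far younger than the species.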

1 - The same phenomenon can be seen when comparing numerals in Igbo, another Benue-Congo language, with those in Yorùbá; numerals are usually amongst the most conservative of language elements, but there is essentially no resemblance to be found in the words the two languages use to denote numbers.

2 - The knowledgeable reader will be aware that I am referring, of course, to the coalescent theory of population genetics.

3 - We can pretty much rule out from the start any Nostratic proposal that includes Afro-Asiatic within a supposedly Eurasian superfamily, as the number and divergence of the various branches of Afro-Asiatic point very strongly to an African origin, most likely within that portion of North Africa that has given way to desert over the last 5,000 years.