10 Dec 2015 11:38

Things I want to learn more about: statistical language processing; pragmatics; semantics; "functional grammar".

Agent-based models of language change warrant their own notebook.

Query on the reliability of historical linguistics. A large part of historical linguistics consists of reconstructing languages which have left no written records, working from their extant or recorded descendants. The paradigm, as it were, is the reconstruction of proto-Indo-European from the recorded Indo-European languages. Accompanying such reconstructions, historical linguists also postulate regular rules for how the sounds in words in the ancestral language changed into different sounds in corresponding words in the descendant languages; similarly for other features of the language, like grammatical rules, conjugations, etc. (You could simply think of these as correspondence rules between the extant languages, without necessarily invoking an ancestor, if you liked, though the ancestor is a very natural hypothesis.) Now, obviously, I'm not competent to critique any of this, but I would like to know whether the reliability of linguists at performing such reconstructions, and discovering correspondences, has ever been systematically tested. One test would be to give linguists corpora from related languages whose common ancestor is well known, and see how well they could reconstruct that ancestor. (E.g., give them the modern Romance languages, and see how close they get to Latin.) Alternatively, we could give them samples from languages which are actually unrelated, but tell them they are all connected, and see if they nonetheless come up with regular sound-change patterns and so forth. Has anyone ever done anything like these tests?
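
To make "regular correspondence" a bit more concrete, here is a toy sketch in Python, not anything a working historical linguist would actually use. It tallies which segments line up with which across a handful of familiar Italian/Spanish cognates (where Italian -tt- matches Spanish -ch-), and scores each Italian segment by how consistently it maps to a single Spanish counterpart. The segmentation into orthographic chunks, the hand alignment, and the function names `correspondence_counts` and `regularity` are all my own assumptions for illustration; real work would use phonetic transcriptions and a proper alignment procedure, and four words are far too few to support any conclusion.

```python
from collections import Counter, defaultdict

# Hand-segmented, hand-aligned orthographic forms. This is an assumption of
# the sketch: real data would be phonetic and would need to be aligned.
COGNATES = [
    (["n", "o", "tt", "e"], ["n", "o", "ch", "e"]),  # Italian notte / Spanish noche
    (["l", "a", "tt", "e"], ["l", "e", "ch", "e"]),  # latte / leche
    (["o", "tt", "o"], ["o", "ch", "o"]),            # otto / ocho
    (["f", "a", "tt", "o"], ["h", "e", "ch", "o"]),  # fatto / hecho
]


def correspondence_counts(pairs):
    """Count how often each (left segment, right segment) pair lines up."""
    counts = Counter()
    for left, right in pairs:
        if len(left) != len(right):
            raise ValueError("words must be pre-aligned to equal length")
        counts.update(zip(left, right))
    return counts


def regularity(counts):
    """For each left-hand segment, the share of its occurrences that map to
    its most common right-hand counterpart (1.0 means perfectly regular)."""
    by_source = defaultdict(Counter)
    for (left_seg, right_seg), n in counts.items():
        by_source[left_seg][right_seg] += n
    return {
        seg: max(targets.values()) / sum(targets.values())
        for seg, targets in by_source.items()
    }


counts = correspondence_counts(COGNATES)
scores = regularity(counts)
print(counts[("tt", "ch")], scores["tt"])
```

The second test described above would amount to asking how high such regularity scores get when the word lists come from unrelated languages, or when the pairings are scrambled; this sketch does not attempt that comparison.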

Update, 29 March 2005: John O'Neil writes to tell me that both the tests I describe above are, in fact, common exercises in graduate classes in historical and comparative linguistics! He doesn't know of any statistical studies on this kind of thing, however. Also, I am ashamed to learn that the immediate ancestor of the extant Romance languages was not, in fact, literary Latin but "proto-Romance", which had already, e.g., lost noun declensions. (Ashamed, because I should have known that.) I also should take this opportunity to stress that I am not skeptical about the reliability of mainstream historical linguistics in general, just curious if we can quantify that reliability, and about how general ideas about error and the growth of knowledge apply here.

Update, 20 September 2007: Brendan Shean points me to a very neat project on doing actual statistical inference for sound-change rules, and ultimately for linguistic phylogenetic trees. See Bouchard-Cote et al. below.

