Languages evolve over time according to a process in which reproduction, mutation and extinction are all possible. This is very similar to haploid evolution for asexual organisms and for the mitochondrial DNA of complex ones. Exploiting this similarity, it is possible, in principle, to verify hypotheses concerning the relationship among languages and to reconstruct their family tree. The key point is the definition of the distances among pairs of languages in analogy with the genetic distances among pairs of organisms. Distances can be evaluated by comparing grammar and/ or vocabulary, but while it is difficult, if not impossible, to quantify grammar distance, it is possible to measure a distance from vocabulary differences. The method used by glottochronology computes distances from the percentage of shared ' cognates', which are words with a common historical origin. The weak point of this method is that subjective judgment plays a significant role. Here we de. ne the distance of two languages by considering a renormalized edit distance among words with the same meaning and averaging over the two hundred words contained in a Swadesh list. In our approach the vocabulary of a language is the analogue of DNA for organisms. The advantage is that we avoid subjectivity and, furthermore, reproducibility of results is guaranteed. We apply our method to the Indo- European and the Austronesian groups, considering, in both cases,fifty different languages. The two trees obtained are, in many respects, similar to those found by glottochronologists, with some important differences as regards the positions of a few languages. In order to support these different results we separately analyze the structure of the distances of these languages with respect to all the others.

Language distance and tree reconstruction

SERVA, Maurizio
2008

Abstract

Languages evolve over time according to a process in which reproduction, mutation and extinction are all possible. This is very similar to haploid evolution for asexual organisms and for the mitochondrial DNA of complex ones. Exploiting this similarity, it is possible, in principle, to verify hypotheses concerning the relationship among languages and to reconstruct their family tree. The key point is the definition of the distances among pairs of languages in analogy with the genetic distances among pairs of organisms. Distances can be evaluated by comparing grammar and/ or vocabulary, but while it is difficult, if not impossible, to quantify grammar distance, it is possible to measure a distance from vocabulary differences. The method used by glottochronology computes distances from the percentage of shared ' cognates', which are words with a common historical origin. The weak point of this method is that subjective judgment plays a significant role. Here we de. ne the distance of two languages by considering a renormalized edit distance among words with the same meaning and averaging over the two hundred words contained in a Swadesh list. In our approach the vocabulary of a language is the analogue of DNA for organisms. The advantage is that we avoid subjectivity and, furthermore, reproducibility of results is guaranteed. We apply our method to the Indo- European and the Austronesian groups, considering, in both cases,fifty different languages. The two trees obtained are, in many respects, similar to those found by glottochronologists, with some important differences as regards the positions of a few languages. In order to support these different results we separately analyze the structure of the distances of these languages with respect to all the others.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11697/18173
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 28
  • ???jsp.display-item.citation.isi??? 28
social impact