Family trees: languages and genetics

SERVA, Maurizio
2009-01-01

Abstract

We consider a large population that evolves according to neutral haploid reproduction. The genealogical tree is very complex, and genealogical distances are distributed according to a probability density that remains random even in the limit of a large population. This density varies from one population to another, and within the same population over time; we derive its distribution. The evolution of languages closely resembles the evolution of haploid organisms or mtDNA, and this similarity allows for the construction of language trees. The key point is the definition of a distance between pairs of languages. Here we use a renormalized Levenshtein distance between words with the same meaning, averaged over all the words contained in a list. Assuming a constant rate of mutation, these lexical distances are, on average, logarithmically proportional to genealogical distances. The relation between lexical and genealogical distances is then investigated further in order to account for the intrinsic randomness of lexical evolution. We test our method by constructing the trees of the Indo-European and Austronesian language groups.
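The lexical distance described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the renormalization is assumed to divide each word's edit distance by the length of the longer word, the two word lists are assumed to be aligned by meaning (entry k in one list has the same meaning as entry k in the other), and the helper `divergence_time` uses an assumed logarithmic form T = -alpha * ln(1 - D) with a hypothetical calibration constant `alpha`. All function names are hypothetical.

```python
from math import log


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """ASSUMPTION: renormalize by the longer word so the result lies in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def lexical_distance(words_a: list[str], words_b: list[str]) -> float:
    """Average normalized distance over meaning-aligned word lists."""
    if not words_a or len(words_a) != len(words_b):
        raise ValueError("word lists must be non-empty and of equal length")
    total = sum(normalized_distance(x, y) for x, y in zip(words_a, words_b))
    return total / len(words_a)


def divergence_time(d: float, alpha: float = 1.0) -> float:
    """ASSUMED logarithmic map from lexical distance d to a genealogical
    distance; alpha is a hypothetical calibration constant."""
    if not 0.0 <= d < 1.0:
        raise ValueError("d must lie in [0, 1)")
    return -alpha * log(1.0 - d)
```

For example, with the hypothetical two-word lists `["mano", "notte"]` and `["mano", "noche"]`, the first pair contributes 0 and the second 2/5 (two substitutions over five letters), so `lexical_distance` returns 0.2. Normalizing by the longer word keeps long and short words on a comparable scale before averaging.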
Use this identifier to cite or link to this document: https://hdl.handle.net/11697/33105