Indo-European languages tree by Levenshtein distance

SERVA, Maurizio;
2008-01-01

Abstract

The evolution of languages closely resembles the evolution of haploid organisms. This similarity has recently been exploited (Gray R. D. and Atkinson Q. D., Nature, 426 (2003) 435; Gray R. D. and Jordan F. M., Nature, 405 (2000) 1052) to construct language trees. The key point is the definition of a distance between all pairs of languages that is the analogue of a genetic distance. Many methods have been proposed to define these distances; one of them, used by glottochronology, computes the distance from the percentage of shared "cognates". Cognates are words inferred to have a common historical origin, and subjective judgment plays a significant role in the identification process. Here we push the analogy with evolutionary biology further and introduce a genetic distance between language pairs by considering a renormalized Levenshtein distance between words with the same meaning and averaging over all words contained in a Swadesh list (Swadesh M., Proc. Am. Philos. Soc., 96 (1952) 452). The subjectivity of the process is considerably reduced and reproducibility is greatly facilitated. We test our method on the Indo-European group, considering fifty different languages and the two hundred words of the Swadesh list for each of them. We obtain a tree that closely resembles the one published in Gray and Atkinson (2003), with some significant differences. Copyright (c) EPLA, 2008.
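
The distance described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the Levenshtein distance is renormalized; the sketch assumes division by the length of the longer word, which keeps each word-pair distance between 0 and 1. The function names (`levenshtein`, `normalized_distance`, `language_distance`) and the dictionary layout, which maps each meaning to one word per language, are illustrative choices and not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer word's length.
    ASSUMPTION: this normalization is not specified in the abstract."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest


def language_distance(words_a: dict, words_b: dict) -> float:
    """Average normalized distance over the meanings present in both
    word lists (hypothetical layout: meaning -> word)."""
    shared = [meaning for meaning in words_a if meaning in words_b]
    if not shared:
        raise ValueError("the two word lists share no meanings")
    total = sum(normalized_distance(words_a[m], words_b[m]) for m in shared)
    return total / len(shared)
```

Computing `language_distance` for every pair of languages would give the kind of distance matrix from which a tree can then be built; the tree-construction step itself is not described in the abstract and is not sketched here.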

Use this identifier to cite or link to this document: https://hdl.handle.net/11697/11460