Files in this item



Boga_2020_ILLS12proceedings.pdf (application/pdf, 2MB)


Title:What is a Language? What is a Dialect?
Author(s):Boga, Hizniye Isabella
Subject(s):Italian Romance languages
Needleman-Wunsch algorithm
Levenshtein Distances
Geographic Coverage:Italy
Abstract:The current study shows how dialects can be distinguished from languages. The distinction was established with the Needleman-Wunsch algorithm, using a weighted scoring system based on PMI distances, and with the Levenshtein distance. The study focuses on the Romance language family, especially the languages of Italy. Mixture models and k-means clustering were used to identify groupings in the data. The results support the hypothesis that two thresholds divide language-language pairs, language-dialect pairs and dialect-dialect pairs into three distinct clusters. These clusters were found with the Needleman-Wunsch algorithm with normalised and divided (NWND) scores and an additional scoring system of PMI distances. For comparison, I also used Levenshtein Distances Normalised and Divided (LDND). The suggested thresholds differed between the two methods. For the NWND method, the thresholds are 4.49 for distinguishing dialect-dialect pairs from language-dialect pairs and 2.54 for distinguishing dialect-language pairs from language-language pairs. For the LDND method, the cut-off points are 0.37 to distinguish dialect-dialect pairs from dialect-language pairs, 0.58 to distinguish close dialect-language varieties from distant dialect-language varieties, and 0.7 to distinguish distant dialect-language varieties from language-language pairs.
Issue Date:2020
Publisher:Studies in the Linguistic Sciences: Illinois Working Papers
Citation Info:Studies in the Linguistic Sciences: Illinois Working Papers 43: 1-31.
Date Available in IDEALS:2020-12-01
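
The LDND measure and cut-off points named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes the commonly used definition of LDND (the mean length-normalised Levenshtein distance over same-meaning word pairs, divided by the mean over different-meaning word pairs), and it assumes that a score exactly on a cut-off point falls into the more distant category; the abstract states neither detail. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """Levenshtein distance normalised by the length of the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def ldnd(words_a: list[str], words_b: list[str]) -> float:
    """LDND for two aligned word lists (assumed definition, see lead-in).

    words_a[i] and words_b[i] are assumed to express the same meaning.
    Requires at least two meanings and a non-zero baseline distance.
    """
    n = len(words_a)
    same = sum(ldn(words_a[i], words_b[i]) for i in range(n)) / n
    off_diagonal = [ldn(words_a[i], words_b[j])
                    for i in range(n) for j in range(n) if i != j]
    baseline = sum(off_diagonal) / len(off_diagonal)
    return same / baseline


def classify_ldnd(score: float) -> str:
    """Apply the LDND cut-off points reported in the abstract.

    Treating the boundaries as inclusive on the upper side is an assumption.
    """
    if score < 0.37:
        return "dialect-dialect"
    if score < 0.58:
        return "close dialect-language"
    if score < 0.7:
        return "distant dialect-language"
    return "language-language"
```

Calling `classify_ldnd(ldnd(list_a, list_b))` on two aligned word lists would label the variety pair. The NWND scores would need a PMI-weighted Needleman-Wunsch scorer, which the abstract does not specify in enough detail to sketch here.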
