Files in this item
| Files | Description | Format |
|---|---|---|
| (PDF) | (no description provided) | application/pdf |
Description
Title: | Ethnea -- an instance-based ethnicity classifier based on geo-coded author names in a large-scale bibliographic database |
Author(s): | Torvik, Vetle I.; Agarwal, Sneha |
Subject(s): | bibliometrics; ethnicity classification; machine learning |
Abstract: | We present a nearest-neighbor approach to ethnicity classification. Given an author name, all of its instances (or the most similar ones) in PubMed are identified and coupled with their respective country of affiliation, and then probabilistically mapped to a set of 26 predefined ethnicities. The dominant ethnicity (or pair of ethnicities) is assigned as the class. The predictions are also used to upgrade Genni (Smith, Singh, and Torvik, 2013) to provide ethnicity-specific gender predictions for cases such as Italian vs. English Andrea, Turkish vs. Korean Bora, Israeli vs. Nordic Eli, and Slavic vs. Japanese Renko. Ethnea and Genni 2.0 are available at http://abel.lis.illinois.edu |
Issue Date: | 2016-03 |
Citation Info: | Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geo-coded author names in a large-scale bibliographic database. International Symposium on Science of Science, March 22-23, 2016, Library of Congress, Washington, DC, USA. |
Genre: | Conference Paper / Presentation |
Type: | Text |
Language: | English |
URI: | http://hdl.handle.net/2142/88927 |
Sponsor: | NIH P01AG039347; NSF 1348742 |
Date Available in IDEALS: | 2016-03-01 |
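The abstract's aggregation step (each name instance carries a country of affiliation, countries map probabilistically to ethnicities, and the dominant ethnicity or pair wins) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not Ethnea's actual implementation: the `COUNTRY_ETHNICITY` table is a hypothetical toy stand-in for the real mapping onto 26 ethnicities, `pair_threshold` is a hypothetical cutoff for choosing a single ethnicity over a pair, and the `"A-B"` pair label format is invented. Retrieving the instances (or nearest neighbors) from PubMed is out of scope; the function simply receives their affiliation countries.

```python
from collections import Counter

# Hypothetical country -> ethnicity probabilities (toy values, NOT Ethnea's table).
COUNTRY_ETHNICITY = {
    "Italy": {"Italian": 0.9, "English": 0.1},
    "United Kingdom": {"English": 0.8, "Indian": 0.1, "Chinese": 0.1},
    "Japan": {"Japanese": 1.0},
    "Korea": {"Korean": 1.0},
    "Turkey": {"Turkish": 1.0},
}


def classify(instance_countries, pair_threshold=0.6):
    """Return the dominant ethnicity, a hypothetical "A-B" pair label, or None.

    instance_countries: affiliation countries of the name's instances
    (assumed to be already retrieved). pair_threshold is a hypothetical cutoff.
    """
    totals = Counter()
    n = 0
    for country in instance_countries:
        dist = COUNTRY_ETHNICITY.get(country)
        if dist is None:
            continue  # skip instances whose country has no mapping
        n += 1
        for ethnicity, p in dist.items():
            totals[ethnicity] += p  # sum probability mass over instances
    if n == 0:
        return None  # no usable evidence for this name
    # Average share per ethnicity, highest first.
    ranked = [(eth, mass / n) for eth, mass in totals.most_common()]
    top_eth, top_share = ranked[0]
    if top_share >= pair_threshold or len(ranked) == 1:
        return top_eth
    # No single ethnicity dominates: report the top two as a pair (sorted for stability).
    return "-".join(sorted([top_eth, ranked[1][0]]))


print(classify(["Japan", "Japan", "Korea"]))  # Japanese
print(classify(["Japan", "Korea"]))           # Japanese-Korean
```

Averaging per-instance distributions lets a name seen mostly in one country outweigh occasional affiliations elsewhere. An ethnicity-specific gender lookup in the spirit of Genni 2.0 could then be keyed on the (first name, predicted ethnicity) pair, though the abstract does not specify how that is done.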
This item appears in the following Collection(s)
- Faculty and Staff Research and Scholarship - Information Sciences
  Articles, papers, and other research and scholarship from iSchool faculty and staff