Implementing pre-trained language modeling approaches for author name disambiguation
- Kim, Jenna
- author name disambiguation, pre-trained language model, deep learning
- Distinguishing who is who has been a lingering problem for researchers analyzing bibliographic data, in which different authors share the same name or different names refer to the same author. Various machine learning (ML) methods have been proposed to resolve author name ambiguity, but the technical details of the ML workflows needed to compare and improve these methods have been insufficiently shared. In addition, although a few studies have shown that neural network models can outperform conventional ML models in author name disambiguation (AND), deep learning (DL) using pre-trained language models such as BERT has not yet been applied to AND tasks. This study investigates how pre-trained language models can be adopted for AND tasks and aims to make novel contributions in several ways. First, state-of-the-art pre-trained language models are applied to AND tasks and compared with conventional ML techniques. Second, the scope of features used in existing studies is extended by adding abstract texts to metadata records. Third, the workflows of several high-performing ML and DL methods for classification and clustering in AND are integrated into an open-source framework, making the implementation steps of different AND methods transparent and thus allowing easy comparison across methods. The code will be made publicly available as a benchmark framework upon which AND researchers can build new models, reducing the errors and delays that arise when developing code from scratch. By helping researchers better understand the similarities and differences between various ML- and DL-based AND approaches, this study can enhance the robustness of research findings that resolve author name ambiguity in bibliographic data.
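The abstract describes a common AND workflow: embed each bibliographic record, score pairwise similarity, and cluster records so that each cluster ideally corresponds to one real author. A minimal sketch of that pipeline is shown below. Note the `embed` function here is a deliberately simple bag-of-words stand-in so the sketch is self-contained; the study itself uses pre-trained language models (e.g. BERT), which would replace `embed` with a model encoder. All function names and the threshold value are illustrative assumptions, not details from the study.

```python
# Illustrative AND pipeline: embed records, score pairs, cluster.
# `embed` is a bag-of-words stand-in; a BERT-style encoder would
# replace it in a real pre-trained-language-model implementation.
from collections import Counter
import math


def embed(record: str) -> Counter:
    """Stand-in embedding: token counts of a metadata record."""
    return Counter(record.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(records, threshold=0.5):
    """Greedy single-link clustering over pairwise similarities.

    Each returned cluster is a list of record indices intended to
    correspond to one real author.
    """
    vecs = [embed(r) for r in records]
    clusters = []
    for i, vec in enumerate(vecs):
        for c in clusters:
            # Join the first cluster containing a similar-enough record.
            if any(cosine(vec, vecs[j]) >= threshold for j in c):
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters


records = [
    "J Kim deep learning author disambiguation",
    "J Kim author name disambiguation deep learning",
    "J Kim quantum chromodynamics lattice",
]
print(cluster(records))  # the two DL records group; the physics record stays apart
```

Adding abstract texts to the record string, as the study proposes, would enrich the embeddings and help separate same-name authors working in different fields, as in the toy example above.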