|Abstract:||Recently, music complexity has drawn attention from researchers in the Music Information Retrieval (MIR) area. In particular, computational methods for measuring music complexity have been studied in order to provide better music services in large-scale digital music libraries. However, the majority of music complexity research has focused on audio-related facets of music, while song lyrics have rarely been considered. Based on the observation that most popular songs contain lyrics, whose varying levels of complexity contribute to overall music complexity, this dissertation research investigates song lyric complexity and how it might be measured computationally.
In a broad sense, lyric complexity comes from two complementary aspects of text complexity: quantitative and qualitative dimensions. For a comprehensive understanding of lyric complexity, this study explores both. First, the quantitative dimensions, such as word frequency and word length, are those that can be measured efficiently using computer programs. Among them, this study examines the concreteness of song lyrics using trend analysis. Second, in contrast to the quantitative dimensions, the qualitative dimensions refer to a deeper level of lyric complexity that requires attentive readers' comprehension and external knowledge. However, it is challenging to conduct a large-scale qualitative analysis of lyric complexity due to resource constraints. To address this, this dissertation introduces user-generated interpretations of song lyrics, which are abundant on the web, as a proxy for assessing the qualitative dimensions of lyric complexity. Specifically, this study first examines whether the user-generated data provide quality topic information, and then proposes a Lyric Topic Diversity Score (LTDS), a lyric complexity metric based on the diversity of the topics found in users' interpretations. The assumption behind this approach is that complex song lyrics tend to provoke diverse user interpretations due to properties such as ambiguous meanings, historical context, and the author's intention.
The findings of this study are as follows. First, the concreteness of popular song lyrics fell from the middle of the 1960s until the 1990s and rose after that. The advent of Hip-Hop/Rap and the number of words in song lyrics are highly correlated with the rise in concreteness after the early 1990s. Second, interpretations are a good input source for automatic topic detection algorithms. Third, the interpretation-based lyric complexity metric looks promising because it is correlated with Lexical Novelty Scores (LNS), the only previously developed lyric complexity measure. Overall, this work expands the scope of music complexity by focusing on relatively unexplored data, song lyrics. Moreover, these findings suggest that analyses and applications involving other kinds of objects may also benefit from this kind of auxiliary data, which takes the form of user comments.