Files in this item

Hu_Xiao.pdf (PDF, 1MB)

Title: Improving music mood classification using lyrics, audio and social tags
Author(s): Hu, Xiao
Director of Research: Downie, J. Stephen
Doctoral Committee Chair(s): Smith, Linda C.
Doctoral Committee Member(s): Downie, J. Stephen; Zhai, ChengXiang; Heidorn, P. Bryan
Department / Program: Library & Information Science
Discipline: Library & Information Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Music mood classification
Social tags
Affect analysis
Multimodal classification
Emotion theories
Music mood categories
Music information retrieval
Abstract: The affective aspect of music (popularly known as music mood) is a newly emerging metadata type and access point to music information, but it has not been well studied in information science. A suitable set of mood categories that reflects the reality of music listening and can be widely adopted in the Music Information Retrieval (MIR) community has yet to be developed. As music repositories have grown to an unprecedented scale, there is growing demand for automatic tools for music classification and recommendation. However, only a few music mood classification systems exist, their performance is suboptimal, and most of them rely solely on the audio content of the music. Lyric text and social tags are resources independent of and complementary to audio content, but they have yet to be fully exploited. This dissertation research takes up these problems and aims to 1) summarize fundamental insights in music psychology that can help information scientists interpret music mood; 2) identify mood categories that are frequently used by real-world music listeners, through an empirical investigation of real-life social tags applied to music; and 3) advance the technology of automatic music mood classification through a thorough investigation of lyric text analysis and of the combination of lyrics and audio. Using linguistic resources and human expertise, 36 mood categories were identified from the most popular social tags collected from a major Western music tagging site. A ground truth dataset of 5,296 songs in 18 mood categories was built with mood labels given by a number of real-life users. Both commonly used text features and advanced linguistic features were investigated, as well as different feature representation models and feature combinations. The best-performing lyric feature sets were then compared to a leading audio-based system.
In combining lyric and audio sources, two methods were examined and compared: feature concatenation and late fusion (linear interpolation) of classifiers. Finally, system performance was compared across various numbers of training examples and different audio lengths. The results indicate that: 1) social tags can help identify mood categories suitable for a real-world music listening environment; 2) the most useful lyric features are linguistic features combined with text stylistic features; 3) lyric features outperform audio features in terms of accuracy averaged across all considered mood categories; 4) systems combining lyrics and audio outperform both audio-only and lyric-only systems; and 5) combining lyrics and audio can reduce the amount of training data required, both in the number of examples and in audio length. The contributions of this research are threefold. On methodology, it improves the state of the art in music mood classification and in text affect analysis in the music domain. The mood categories identified from empirical social tags can complement those in theoretical psychology models. In addition, many of the lyric text features examined in this study have never been formally studied in the context of music mood classification, nor been compared with each other on a common dataset. On evaluation, the ground truth dataset built in this research is large and unique in that it provides three types of information: audio, lyrics and social tags. Part of the dataset has been made available to the MIR community through the Music Information Retrieval Evaluation eXchange (MIREX) 2009 and 2010, the community-based evaluation framework. The proposed method of deriving ground truth from social tags provides an effective alternative to expensive human assessment of music and thus clears the way for large-scale experiments.
On application, the findings of this research help build effective and efficient music mood classification and recommendation systems by optimizing the interaction of music audio and lyrics. A prototype of such a system can be accessed at
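The abstract contrasts two ways of combining lyric and audio sources: feature concatenation, where both feature vectors are joined and fed to one classifier, and late fusion, where separately trained lyric and audio classifiers are combined by linear interpolation of their outputs. The following is a minimal Python sketch of both ideas, not the dissertation's code. It assumes that each mood category is handled as a binary (in category / not in category) decision and that each classifier outputs a probability-like score in [0, 1]. The function names, the 0.5 decision threshold, and the grid search used to pick the interpolation weight `alpha` on held-out songs are hypothetical illustrations.

```python
from typing import List, Sequence, Tuple


def concatenate_features(lyric_features: Sequence[float],
                         audio_features: Sequence[float]) -> List[float]:
    """Feature concatenation: join both vectors so that a single classifier
    is trained on the combined representation."""
    return list(lyric_features) + list(audio_features)


def late_fusion(p_lyrics: float, p_audio: float, alpha: float) -> float:
    """Late fusion by linear interpolation of two classifiers' scores:
    alpha * p_lyrics + (1 - alpha) * p_audio."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_lyrics + (1.0 - alpha) * p_audio


def choose_alpha(p_lyrics: Sequence[float], p_audio: Sequence[float],
                 labels: Sequence[bool], steps: int = 10,
                 threshold: float = 0.5) -> Tuple[float, float]:
    """Hypothetical tuning step for one mood category: try alpha values
    0, 1/steps, ..., 1 on held-out songs and return
    (best_alpha, accuracy_at_best_alpha)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if not labels or not (len(p_lyrics) == len(p_audio) == len(labels)):
        raise ValueError("inputs must be non-empty and of equal length")
    best_alpha, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        alpha = i / steps
        # A song is predicted to carry the mood if its fused score
        # reaches the decision threshold.
        predictions = [late_fusion(pl, pa, alpha) >= threshold
                       for pl, pa in zip(p_lyrics, p_audio)]
        correct = sum(pred == y for pred, y in zip(predictions, labels))
        acc = correct / len(labels)
        if acc > best_acc:  # strict '>' keeps the smallest alpha among ties
            best_alpha, best_acc = alpha, acc
    return best_alpha, best_acc
```

On a toy validation set where the lyric scores are correct and the audio scores are inverted, `choose_alpha([0.9, 0.1], [0.1, 0.9], [True, False])` selects a lyric-weighted `alpha` of 0.6 with accuracy 1.0. One design consideration behind the comparison: concatenation lets a single model see both sources at once, while late fusion keeps a separate model per source and exposes an explicit weight for how much each source is trusted.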
Issue Date: 2011-01-14
Rights Information: Copyright 2010 Xiao Hu
Date Available in IDEALS: 2011-01-14
Date Deposited: December 2
