Study on speech emotion recognition based on deep learning
Guan, Haozhong
Permalink
https://hdl.handle.net/2142/117682
Description
- Title
- Study on speech emotion recognition based on deep learning
- Author(s)
- Guan, Haozhong
- Issue Date
- 2022-12-05
- Director of Research (if dissertation) or Advisor (if thesis)
- Hasegawa-Johnson, Mark
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Speech Emotion Recognition
- Convolutional Neural Network
- ResNet50
- Language
- eng
- Abstract
- Speech emotion recognition (SER) is closely related to everyday life and has the potential to bring substantial improvements to people's lives. As artificial intelligence and SER continue to develop, they are expected to bring new breakthroughs to human-machine interaction, which makes SER a topic of considerable theoretical and practical interest. This thesis reviews the current state of SER research and identifies its open problems and challenges. Building on a summary of the key SER technologies, it constructs an SER model based on a ResNet50 convolutional neural network (CNN) and carries out recognition experiments and analysis. The main work is as follows. The speech emotion description model, the SER pipeline, speech-signal preprocessing, and methods for extracting emotional feature parameters are summarized. The time-domain waveforms and spectrograms of speech in different emotions are analyzed, and an SER scheme that combines spectrogram extraction with a CNN is adopted. The CNN is built on a residual network: it uses the ResNet50 architecture with bottleneck blocks and consists of 49 convolutional layers and one fully connected layer. Through the residual network's “shortcut connections,” each block's output is expressed as the sum of its input and a nonlinear transformation of that input, which mitigates vanishing and exploding gradients during backpropagation and allows the deep network to be trained more effectively. Experiments are conducted on the IEMOCAP and Emo-DB datasets. The constructed ResNet50 CNN model achieves recognition accuracies of 69.12% on IEMOCAP and 85.92% on Emo-DB. Compared with other deep learning models, the proposed ResNet50 CNN model is simple and efficient.
- Graduation Semester
- 2022-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/117682
- Copyright and License Information
- Copyright 2022 Haozhong Guan
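The abstract's point that shortcut connections ease gradient problems can be illustrated with a toy chain-rule calculation. This is a minimal sketch, not the thesis code: it assumes hypothetical scalar layers F(x) = w·tanh(x) with w = 0.5, and compares the input gradient through 49 plain layers (y = F(x)) with 49 residual layers (y = F(x) + x). A residual layer's derivative is 1 + F'(x), which never falls below 1 when w ≥ 0, whereas the plain stack multiplies factors no larger than w and so the gradient vanishes. The depth of 49 only echoes ResNet50's convolutional layer count; in ResNet50 itself each shortcut spans a three-layer bottleneck block, and the scalar derivative generalizes to the Jacobian I + ∂F/∂x.

```python
"""Toy illustration (hypothetical, not from the thesis) of gradient flow
through a deep stack with and without shortcut connections."""
import math
from typing import Tuple


def layer_forward_and_grad(x: float, w: float, residual: bool) -> Tuple[float, float]:
    """Return (output, d_output/d_input) for one hypothetical scalar layer."""
    f = w * math.tanh(x)                      # F(x) = w * tanh(x)
    df = w * (1.0 - math.tanh(x) ** 2)        # F'(x) = w * (1 - tanh(x)^2)
    if residual:
        return x + f, 1.0 + df                # y = F(x) + x  ->  dy/dx = F'(x) + 1
    return f, df                              # y = F(x)      ->  dy/dx = F'(x)


def input_gradient(depth: int, w: float, residual: bool, x0: float = 0.5) -> float:
    """Chain rule: multiply the per-layer derivatives from input to output."""
    x, grad = x0, 1.0
    for _ in range(depth):
        x, d = layer_forward_and_grad(x, w, residual)
        grad *= d
    return grad


if __name__ == "__main__":
    plain = input_gradient(49, 0.5, residual=False)
    res = input_gradient(49, 0.5, residual=True)
    print(f"plain stack gradient:    {plain:.3e}")
    print(f"residual stack gradient: {res:.3e}")
```

With these assumed settings the plain stack's gradient is bounded above by 0.5^49 (on the order of 1e-15), while the residual stack's gradient stays at or above 1, because every factor contains the identity term contributed by the shortcut.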
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)
Graduate Theses and Dissertations at Illinois › Dissertations and Theses - Electrical and Computer Engineering