Multimodal emotion recognition and speaker identification in financial conversations
Kaikaus, Jamshed
Description
- Title
- Multimodal emotion recognition and speaker identification in financial conversations
- Author(s)
- Kaikaus, Jamshed
- Issue Date
- 2025-07-17
- Director of Research
- Brunner, Robert J
- Doctoral Committee Chair(s)
- Brunner, Robert J
- Committee Member(s)
- Mendoza, Kimberly
- Carrasco Kind, Matias
- Zhu, Wei
- Department of Study
- Illinois Informatics Institute
- Discipline
- Informatics
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Emotion recognition
- Speaker identification
- Multimodal learning
- Multi-task learning
- Machine learning
- Deep learning
- Affective computing
- Representation learning
- Large language models
- Natural language processing
- Speech processing
- Financial artificial intelligence
- Financial communication
- Earnings calls
- Data annotation
- Dataset curation
- Abstract
- This dissertation explores the use of multimodal, speaker-, and emotion-aware modeling for financial discourse, with a primary focus on Multimodal Emotion Recognition in Conversation and its downstream impact on key financial inference tasks. Motivated by the limitations of unimodal sentiment analysis and the strategic language regulation present in corporate communication, we introduce a large-scale, Large Language Model-based annotation framework for labeling emotion and emotion intensity in quarterly earnings call transcripts and audio recordings. We conduct a rigorous evaluation of annotation quality under varying prompt configurations and deterministic settings, uncovering key trade-offs between diversity and reliability. The resulting novel corpus, Multimodal Financial Emotion, serves as the foundation for developing MERSI, a multimodal, multi-task model that jointly predicts emotional state and speaker identity by leveraging contextual and acoustic cues. We show that this joint formulation improves performance over unimodal and context-agnostic baselines, particularly in capturing the nuanced structure of multi-label emotion recognition. Building on these insights, we evaluate MERSI's learned representations on two downstream financial applications: Financial Restatement Prediction and Market Movement Prediction. Moreover, we introduce a Label Distribution Learning-based variant of our proposed model, which offers superior generalization on minority-class predictions, highlighting its utility in capturing subtle emotional and narrative cues. Notably, both proposed models exhibit stronger performance when predicting negative outcomes, potentially reflecting the greater emotional salience of adverse events. These findings underscore the potential of multimodal, speaker-aware modeling as a scalable, generalizable framework not only for general emotion recognition but also for high-stakes financial decision-making and regulatory insight.
- Graduation Semester
- 2025-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129951
- Copyright and License Information
- Copyright 2025 Jamshed Kaikaus
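
The abstract describes MERSI as a multimodal, multi-task model that jointly predicts emotional state and speaker identity, but this record does not specify its architecture. As a hedged illustration only, the sketch below shows one common way such a joint formulation can be structured in PyTorch: precomputed text and audio features are projected and fused, and two heads are trained together with a multi-label emotion loss and a single-label speaker loss. All class names, dimensions, and label counts here are illustrative assumptions, not the dissertation's actual design.

```python
import torch
import torch.nn as nn

class JointEmotionSpeakerModel(nn.Module):
    """Minimal sketch of a multimodal multi-task model: fuses precomputed
    text and audio features, then jointly predicts multi-label emotions
    and speaker identity. Dimensions and names are illustrative only."""

    def __init__(self, text_dim=768, audio_dim=512, hidden_dim=256,
                 num_emotions=8, num_speakers=50):
        super().__init__()
        # Project each modality into a shared space, then fuse by concatenation.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.fusion = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
        )
        # Two task heads share the fused representation.
        self.emotion_head = nn.Linear(hidden_dim, num_emotions)  # multi-label
        self.speaker_head = nn.Linear(hidden_dim, num_speakers)  # single-label

    def forward(self, text_feats, audio_feats):
        fused = self.fusion(torch.cat(
            [self.text_proj(text_feats), self.audio_proj(audio_feats)], dim=-1))
        return self.emotion_head(fused), self.speaker_head(fused)

# Joint training objective: BCE over multi-hot emotion labels + CE over speaker IDs.
model = JointEmotionSpeakerModel()
emotion_logits, speaker_logits = model(torch.randn(4, 768), torch.randn(4, 512))
emotion_targets = torch.randint(0, 2, (4, 8)).float()  # multi-hot emotion labels
speaker_targets = torch.randint(0, 50, (4,))           # speaker class indices
loss = (nn.functional.binary_cross_entropy_with_logits(emotion_logits, emotion_targets)
        + nn.functional.cross_entropy(speaker_logits, speaker_targets))
loss.backward()
```

Summing the two task losses is the simplest joint objective; a weighted combination is a common alternative when one task dominates training.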
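The abstract also mentions a Label Distribution Learning-based variant. Again as an assumption-laden sketch rather than the dissertation's method: LDL typically replaces multi-hot emotion targets with a probability distribution over labels (for instance, derived from annotated emotion intensities) and trains against a divergence such as KL. The function name and shapes below are hypothetical.

```python
import torch
import torch.nn.functional as F

# Hypothetical Label Distribution Learning objective: each utterance carries
# a probability distribution over emotions rather than a multi-hot vector,
# and the model minimizes KL divergence between its predicted distribution
# and the target distribution.
def ldl_loss(emotion_logits, target_distribution):
    log_pred = F.log_softmax(emotion_logits, dim=-1)
    return F.kl_div(log_pred, target_distribution, reduction="batchmean")

logits = torch.randn(4, 8)                     # illustrative batch of 4, 8 emotions
target = F.softmax(torch.randn(4, 8), dim=-1)  # illustrative soft label distributions
print(ldl_loss(logits, target))
```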
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois