Enhancing Automatic Emotion Recognition for Clinical Applications: A Multimodal, Personalized Approach and Quantification of Emotional Reaction Intensity with Transformers

Date
2023
Authors
Qian, Yang
Advisor
Washington, Peter
Department
Computer Science
Abstract
In the realm of artificial intelligence, Automated Emotion Recognition (AER) has emerged as a pivotal research area, intersecting computer vision, natural language processing, and human-computer interaction. This research is particularly relevant to fields such as healthcare, education, and entertainment. This thesis is primarily concerned with enhancing Facial Expression Recognition (FER), a crucial aspect of AER. In contrast to typical multimodal and Transformer-based methodologies, this work explores the potential of personalization and the quantification of emotional reaction intensity as routes to improving AER.

This research draws upon encouraging advances in deep learning techniques, such as Convolutional Neural Networks (CNNs) and attention mechanisms, to improve FER. A comprehensive review of the literature is presented, encompassing emotion recognition, affective computing, face detection, and the role of deep learning in multimodal communication. The research methodology and experimental design, which involve the use of emotion recognition datasets and integrated network methodologies such as CNN-LSTM (Long Short-Term Memory) and CNN-Transformer models, are delineated in subsequent chapters.

In the methodology chapter, a suite of independent experiments is designed to probe different facets of emotion recognition. The first experiment investigates model architecture, feature extraction, and data preprocessing techniques for FER: a comparative analysis of a traditional CNN, transfer learning with ImageNet pre-trained models, and Vision Transformers (ViT) on FER tasks, with the goal of explaining the causes of their performance differences. The thesis further investigates the potential of personalization to enhance AER performance, demonstrated by developing a personalized CNN-LSTM emotion recognition model trained on individual-specific data. Additionally, an Emotional Reaction Intensity Estimation experiment is performed using CNN-Transformer approaches.

The results reveal that a focus on personalization and on quantifying emotional reaction intensity yields significant improvements in emotion recognition: gains of 1.5% and 1% in accuracy, and an 8% improvement in the Pearson Correlation Coefficient, respectively. These findings underscore the relevance of personalization and emotional reaction intensity quantification in AER, and highlight the need for more precise and robust emotion recognition systems.
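
To make the transfer-learning baseline in the comparative experiment concrete, the following is a minimal PyTorch sketch, not the thesis code: it swaps the classification head of an ImageNet pre-trained backbone for an FER head and fine-tunes it. The ResNet-18 backbone and the 7-class emotion label set are illustrative assumptions; the abstract does not specify which pre-trained models or label sets were used.

import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet pre-trained backbone and replace its 1000-way
# classification head with a 7-way FER head (7 classes assumed).
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Linear(backbone.fc.in_features, 7)

# Fine-tune on FER images. The batch below is dummy data; real inputs
# would be 224x224 face crops normalized with ImageNet statistics.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 7, (8,))
loss = nn.CrossEntropyLoss()(backbone(images), labels)
loss.backward()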
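
The personalized CNN-LSTM experiment follows a common pattern: a CNN extracts spatial features from each frame, an LSTM aggregates those features over time, and a linear head classifies the sequence; personalization then amounts to training, or fine-tuning, such a model on clips from a single individual. The sketch below illustrates that pattern under assumed shapes (16-frame clips of 48x48 grayscale face crops, 7 classes) and is not the author's architecture.

import torch
import torch.nn as nn

class CNNLSTMEmotion(nn.Module):
    def __init__(self, num_classes: int = 7, hidden: int = 128):
        super().__init__()
        # Small per-frame CNN feature extractor (layer sizes illustrative).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (batch, 64)
        )
        # LSTM aggregates per-frame features across time.
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, 1, H, W); fold time into batch for the CNN.
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])  # classify from the last hidden state

# Personalization: fit on one individual's clips (dummy data here).
model = CNNLSTMEmotion()
clips = torch.randn(4, 16, 1, 48, 48)  # 4 clips, 16 frames each
labels = torch.randint(0, 7, (4,))
loss = nn.CrossEntropyLoss()(model(clips), labels)
loss.backward()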
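
The 8% figure refers to the Pearson Correlation Coefficient (PCC), the usual metric for continuous intensity estimation: PCC = cov(p, t) / (std(p) * std(t)) for predictions p and targets t. A minimal sketch of the metric, averaged over intensity dimensions, follows; the seven reaction dimensions and array shapes are assumptions for illustration.

import numpy as np

def mean_pcc(pred: np.ndarray, target: np.ndarray) -> float:
    # pred, target: (num_samples, num_dims) continuous intensity scores.
    pccs = []
    for d in range(pred.shape[1]):
        # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
        # entry is cov(p, t) / (std(p) * std(t)).
        pccs.append(np.corrcoef(pred[:, d], target[:, d])[0, 1])
    return float(np.mean(pccs))

pred = np.random.rand(100, 7)    # e.g., 7 emotional-reaction dimensions
target = np.random.rand(100, 7)
print(mean_pcc(pred, target))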
Keywords
Computer science, Deep learning, Emotion Recognition, Facial Expression Recognition, Multimodal Learning
Extent
50 pages
Rights
All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.