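Abstract:
Speech emotion recognition (SER) helps computers understand human emotion in speech and plays a key role in advancing human-computer interaction and mental health diagnostics. The primary objective of this study is to improve the accuracy and generalization of SER through new deep learning models. Although SER matters in many fields, accurately identifying emotion from speech remains challenging because of differences in speakers, accents, and background noise. This work proposes two deep learning models to improve SER accuracy: a CNN-LSTM model and an attention-enhanced CNN-LSTM model. The models were tested on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), collected between 2015 and 2018, which comprises 1440 audio files of male and female actors expressing eight emotions. Both models achieved accuracy above 96% in classifying emotions into eight categories. By comparing the CNN-LSTM and attention-enhanced CNN-LSTM models, this study offers comparative insight into modeling techniques, contributes to the development of more effective emotion recognition systems, and has practical implications for real-time applications in healthcare and customer service.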
Keywords:
speech emotion recognition; CNN-LSTM; attention mechanism; deep learning; audio processing; RAVDESS
1. Introduction
2. Literature Review
Study Ref. | Technique | Data Used | Key Findings | Accuracy |
---|---|---|---|---|
[9,10] | End-to-end SER | Various | Swift information extraction; no manual features | - |
[11,12,13] | CNN-LSTM, others | Various | Enhanced SER performance | - |
[14] | CNN | Turkish tweets | Excelling in text analysis | 87% |
[15] | LSTM, CNN | Various | Improved sentiment analysis | 94% |
[16] | CNN | Audio | Novel approach; modest success | 63% |
[17] | ResNets | Various | Promising in SER tasks | 70+% |
[19] | CNN-LSTM | RAVDESS | High accuracy in sentiment analysis | 90% |
[20] | BERT | Text emotion | Excellent in text-based detection | 92+% |
[21] | DNN, GNN | Audio, multimodal | Innovative in audio signal recognition | 70–88% |
[22] | NN | RAVDESS | Real-time emotion recognition | 80% |
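3. Methodology
This section describes the methods used in this study, including data collection, pre-processing, and model development. Each step is described in detail to ensure the reliability of the findings.
3.1. Dataset
This study uses the publicly available Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) because it provides a wide range of emotional expressions in both audio and video formats [3]. Collected between 2015 and 2018, RAVDESS includes 1440 audio files from 24 professional actors (12 male and 12 female) expressing eight emotions: neutral, calm, happy, sad, angry, fear, disgust, and surprise. Every emotion except neutral is recorded at two intensity levels (normal and strong), which enhances the dataset's suitability for SER. Its strengths include high-quality recordings and balanced gender representation, although it is limited to English and to controlled settings, which may reduce real-world variability. Only the audio data were used in this study; they were extracted by separating the speech component from the multimodal dataset and ignoring the video elements. Figure 1 shows an example of each class.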
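The figure of 1440 files can be cross-checked with a little arithmetic. The sketch below assumes, based on the public RAVDESS description rather than on anything stated in this section, that each actor records two statements with two repetitions each; all names are illustrative.

```python
# Cross-check of the RAVDESS speech file count.
# Assumption (from the public dataset description, not from this section):
# every actor speaks 2 statements, each repeated 2 times.
ACTORS = 24
EMOTIONS = ["neutral", "calm", "happy", "sad", "angry", "fear", "disgust", "surprise"]
STATEMENTS = 2
REPETITIONS = 2


def intensity_levels(emotion: str) -> int:
    """Neutral is recorded at one intensity; every other emotion at two."""
    return 1 if emotion == "neutral" else 2


def files_for(emotion: str) -> int:
    """Number of speech files for one emotion across all actors."""
    return ACTORS * intensity_levels(emotion) * STATEMENTS * REPETITIONS


files_per_actor = sum(intensity_levels(e) for e in EMOTIONS) * STATEMENTS * REPETITIONS
total_files = ACTORS * files_per_actor

print(files_per_actor, total_files)
print({e: files_for(e) for e in EMOTIONS})
```

Under these assumptions the neutral class ends up with half as many files as each of the other classes, which is worth keeping in mind when reading per-class results.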
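3.2. Pre-Processing
The pre-processing stage optimizes the data for the SER models by adjusting the sampling rate and applying noise modulation techniques (detailed in Section 3.3). This exposes the proposed models to a realistic and challenging range of audio inputs, which improves their ability to generalize and to recognize emotions accurately under varied conditions such as noisy real-world environments.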
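3.3. Models Description
The proposed models integrate convolutional and recurrent neural networks to process and classify emotions in audio data. The data were split into 80% training, 10% validation, and 10% test sets, with temporal separation (earlier recordings for training, later ones for testing) to avoid overfitting. The following subsections explain how the two models work, supported by the mathematical background and specific parameters; shared configurations ensure reproducibility.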
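As a rough illustration of the 80/10/10 partition with temporal separation, the sketch below splits an ordered list of recordings without shuffling. The function name and the assumption that the list is already sorted from earliest to latest are hypothetical; the section does not say how recording order was determined.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def temporal_split(
    items: Sequence[T], train_frac: float = 0.8, val_frac: float = 0.1
) -> Tuple[List[T], List[T], List[T]]:
    """Split items (assumed sorted earliest-first) into train/val/test.

    No shuffling is done, so the test set holds the latest recordings.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test share")
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = list(items[:n_train])
    val = list(items[n_train:n_train + n_val])
    test = list(items[n_train + n_val:])
    return train, val, test


# Example with 1440 placeholder indices standing in for audio files.
train, val, test = temporal_split(list(range(1440)))
print(len(train), len(val), len(test))
```

Keeping the order fixed means no recording in the test set precedes one in the training set, which is the property the temporal separation relies on.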
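3.3.1. Time Distributed 2D CNN-LSTM Model
The following steps describe the structure of the first model. Stacked time refers to the sequential processing of time-distributed data (for example, Mel spectrogram chunks) to capture temporal dependencies [13].
We designed the model shown in Figure 3. It begins with six time-distributed 2D CNN blocks that process segmented Mel spectrogram chunks, i.e., spectra of the speech audio files obtained after pre-processing. The first two convolutional blocks apply 5 × 5 kernels with a stride of 1 and padding of 2, while the remaining four use 3 × 3 kernels with padding of 1. This configuration preserves the spatial dimensions across most convolution operations. The initial convolutional layer in each block extracts local features from the input spectrogram. Batch normalization follows each convolution to stabilize and accelerate training, as described in Equation (4).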
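The claim that these kernel and padding choices preserve spatial dimensions follows from the standard convolution output-size formula, out = floor((in + 2p - k) / s) + 1. The sketch below evaluates it for both kernel settings; the 128 × 128 chunk size and the stride of 1 for the 3 × 3 kernels are assumptions made for illustration, since the section states neither.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one axis of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical Mel spectrogram chunk size; the section does not state one.
height, width = 128, 128

# 5 x 5 kernel, stride 1, padding 2 (first two blocks).
h5 = conv_output_size(height, kernel=5, stride=1, padding=2)
w5 = conv_output_size(width, kernel=5, stride=1, padding=2)

# 3 x 3 kernel, padding 1 (remaining blocks); stride 1 is assumed here.
h3 = conv_output_size(height, kernel=3, stride=1, padding=1)
w3 = conv_output_size(width, kernel=3, stride=1, padding=1)

print((h5, w5), (h3, w3))
```

In both cases 2p equals k - 1, so with a stride of 1 the output size equals the input size regardless of the chunk dimensions chosen.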

Speech Emotion Recognition: Comparative Analysis of CNN-LSTM and Attention-Enhanced CNN-LSTM Models
3.3.2. Stacked Time Distributed 2D CNN - Bidirectional LSTM with Attention
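This process is repeated several times to create multiple augmented versions of each original signal, unlike conventional approaches, which may not address noise robustness. The augmented signals are then added to the training dataset, increasing its size and diversity. This helps prevent overfitting and exposes the model to a broader range of data during training, which improves generalization from RAVDESS to potential real-world noise conditions.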
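3.4. Ablation Study
To validate the contribution of the key architectural components, we conducted ablation experiments on both models. In Model 2, removing the LSTM layers caused accuracy to drop markedly from 98.1% to 95%, highlighting their role in capturing temporal dependencies. In addition, omitting the dropout layers increased overfitting in both models, as evidenced by a 15% rise in validation loss. Furthermore, excluding the bidirectional LSTM reduced accuracy to 89%, confirming its value in enhancing contextual understanding. These results validate the necessity of each component for achieving high performance and strong generalization.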
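4. Results and Discussion
This section evaluates the models qualitatively and quantitatively using loss, accuracy, precision, recall, and F1 score. Confusion matrices are used to compare the classification ability of the two models.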
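For reference, the per-class metrics named above can all be derived from a confusion matrix. The following is a minimal sketch with a made-up three-class matrix; the counts are illustrative and are not results from this study.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """Precision, recall and F1 per class from a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


# Toy 3-class matrix with made-up counts.
toy_cm = [
    [18, 2, 0],
    [1, 19, 0],
    [0, 0, 20],
]
metrics = per_class_metrics(toy_cm)
print(round(accuracy(toy_cm), 3), [round(m["f1"], 3) for m in metrics])
```

Precision penalizes false alarms for a class, recall penalizes misses, and F1 is their harmonic mean, so a class that is often confused with a neighbor shows up in both of the affected rows.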
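In this study, noise augmentation was applied to the two models using two different techniques. Model 1 was trained with a custom uniform noise injection method that adds uniformly distributed noise to the signal, whereas Model 2 used additive white Gaussian noise (AWGN), as described in the provided implementation. Both augmentation methods improved generalization and reduced overfitting without any change to the underlying model architecture.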
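The two noise types can be sketched as follows. This is not the authors' implementation: the noise amplitude, the 15 dB target SNR, the number of copies, and every function name are assumptions chosen for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
import math
import random
from typing import Callable, List

Signal = List[float]


def add_uniform_noise(signal: Signal, amplitude: float, rng: random.Random) -> Signal:
    """Add noise drawn uniformly from [-amplitude, +amplitude] to each sample."""
    return [x + rng.uniform(-amplitude, amplitude) for x in signal]


def add_awgn(signal: Signal, snr_db: float, rng: random.Random) -> Signal:
    """Add white Gaussian noise scaled to a target signal-to-noise ratio in dB."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def augment(signal: Signal, n_copies: int, noise_fn: Callable[[Signal], Signal]) -> List[Signal]:
    """Return n_copies independently noised versions of one signal."""
    return [noise_fn(signal) for _ in range(n_copies)]


def measured_snr_db(clean: Signal, noisy: Signal) -> float:
    """Empirical SNR of a noisy copy relative to the clean signal."""
    p_signal = sum(c * c for c in clean) / len(clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


rng = random.Random(0)
# Synthetic 440 Hz tone (0.1 s at 16 kHz) standing in for a speech clip.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]

uniform_copies = augment(tone, 2, lambda s: add_uniform_noise(s, 0.05, rng))
awgn_copies = augment(tone, 2, lambda s: add_awgn(s, 15.0, rng))

print(len(uniform_copies), len(awgn_copies))
print(round(measured_snr_db(tone, awgn_copies[0]), 1))
```

Uniform noise has a hard amplitude bound, whereas Gaussian noise is unbounded but is usually specified through an SNR, which ties the noise level to the loudness of each clip.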
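To ensure a fair evaluation and avoid timeout issues, training was carried out on a GPU (NVIDIA RTX 3080) with a batch size of 32, completing 564 epochs within the reported time (Table 2). Several techniques can improve generalization, such as regularization, early stopping, and simplifying the model by removing layers or blocks. In this study, however, we chose to leave the models unchanged and rely on simple noise augmentation to keep the overall approach simple. Both uniform noise and AWGN helped reduce the validation loss significantly, indicating improved generalization.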
Table 2. A comparison of the training epochs, time, and accuracy of both models.
Models | Epochs | Training Time (Minutes) | Accuracy |
---|---|---|---|
Model 1 | 60 | 20 | 60% |
Model 1 | 130 | 43 | 80% |
Model 1 | 200 | 71 | 96.5% |
Model 2 | 60 | 31 | 67% |
Model 2 | 130 | 68 | 85% |
Model 2 | 200 | 103 | 98.1% |
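Confusion matrices are essential for evaluating classification tasks. Figure 5 provides a comprehensive comparison of the two models before and after noise augmentation, along with their final test performance. Figure 5a,b show the validation results of Model 1 before and after adding uniform noise, respectively. Initially, Model 1 struggled to distinguish emotionally similar classes such as sad and calm, as shown in Figure 5a. After uniform noise augmentation, Figure 5b shows a marked improvement for most emotions, especially fewer misclassifications of neutral, the class with fewer training samples.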

Table 3. Comparing precision, recall, and F1 score for both models, along with overall accuracy.
Class | Precision (Model 1) | Precision (Model 2) | Recall (Model 1) | Recall (Model 2) | F1-Score (Model 1) | F1-Score (Model 2) |
---|---|---|---|---|---|---|
Surprise | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 0.97 |
Neutral | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Calm | 0.90 | 0.95 | 0.95 | 1.00 | 0.92 | 0.97 |
Happy | 0.95 | 1.00 | 0.95 | 1.00 | 0.95 | 1.00 |
Sad | 0.95 | 1.00 | 0.95 | 1.00 | 0.95 | 1.00 |
Angry | 0.90 | 0.95 | 0.95 | 1.00 | 0.92 | 0.97 |
Fear | 1.00 | 0.95 | 1.00 | 1.00 | 1.00 | 0.97 |
Disgust | 1.00 | 0.95 | 0.95 | 0.90 | 0.97 | 0.92 |
Overall accuracy: 96.5% (Model 1), 98.1% (Model 2).
5. Conclusions
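# Imports needed by the model definition (TensorFlow's bundled Keras is assumed).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    BatchNormalization, Conv1D, Dense, Dropout, LSTM, MaxPooling1D,
)

# X is the feature array built earlier (not shown); X.shape[1] is the per-sample feature length.
model = Sequential()
# Three Conv1D blocks: convolution -> max pooling (halves the length) -> batch norm -> dropout.
model.add(Conv1D(1024, kernel_size=5, strides=1, padding='same', activation='relu', input_shape=(X.shape[1], 1)))
model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Conv1D(512, kernel_size=5, strides=1, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Conv1D(256, kernel_size=5, strides=1, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
# Three stacked LSTM layers; only the last one collapses the sequence into a single vector.
model.add(LSTM(128, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(128, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(128))
model.add(Dropout(0.3))
# Fully connected head ending in an 8-way softmax, one unit per emotion class.
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(8, activation='softmax'))
model.summary()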
Model: "sequential_1"
Layer (type) | Output Shape | Param # |
---|---|---|
conv1d_3 (Conv1D) | (None, 162, 1024) | 6,144 |
max_pooling1d_3 (MaxPooling1D) | (None, 81, 1024) | 0 |
batch_normalization_3 (BatchNormalization) | (None, 81, 1024) | 4,096 |
dropout_9 (Dropout) | (None, 81, 1024) | 0 |
conv1d_4 (Conv1D) | (None, 81, 512) | 2,621,952 |
max_pooling1d_4 (MaxPooling1D) | (None, 41, 512) | 0 |
batch_normalization_4 (BatchNormalization) | (None, 41, 512) | 2,048 |
dropout_10 (Dropout) | (None, 41, 512) | 0 |
conv1d_5 (Conv1D) | (None, 41, 256) | 655,616 |
max_pooling1d_5 (MaxPooling1D) | (None, 21, 256) | 0 |
batch_normalization_5 (BatchNormalization) | (None, 21, 256) | 1,024 |
dropout_11 (Dropout) | (None, 21, 256) | 0 |
lstm_3 (LSTM) | (None, 21, 128) | 197,120 |
dropout_12 (Dropout) | (None, 21, 128) | 0 |
lstm_4 (LSTM) | (None, 21, 128) | 131,584 |
dropout_13 (Dropout) | (None, 21, 128) | 0 |
lstm_5 (LSTM) | (None, 128) | 131,584 |
dropout_14 (Dropout) | (None, 128) | 0 |
dense_4 (Dense) | (None, 128) | 16,512 |
dropout_15 (Dropout) | (None, 128) | 0 |
dense_5 (Dense) | (None, 64) | 8,256 |
dropout_16 (Dropout) | (None, 64) | 0 |
dense_6 (Dense) | (None, 32) | 2,080 |
dropout_17 (Dropout) | (None, 32) | 0 |
dense_7 (Dense) | (None, 8) | 264 |
Total params: 3,778,280 (14.41 MB)
Trainable params: 3,774,696 (14.40 MB)
Non-trainable params: 3,584 (14.00 KB)
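The shapes and parameter totals in the summary can be reproduced by hand from the layer definitions. The sketch below uses the standard parameter-count formulas for Conv1D, batch normalization, LSTM, and dense layers; the helper names are illustrative and not part of Keras.

```python
import math


def conv1d_params(kernel: int, in_channels: int, filters: int) -> int:
    """Kernel weights plus one bias per filter."""
    return kernel * in_channels * filters + filters


def batchnorm_params(channels: int) -> int:
    """Gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * channels


def lstm_params(input_dim: int, units: int) -> int:
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


def pooled_length(length: int, stride: int = 2) -> int:
    """Output length of pooling with padding='same'."""
    return math.ceil(length / stride)


# Sequence length after each of the three pooling layers, starting from 162.
lengths = []
length = 162
for _ in range(3):
    length = pooled_length(length)
    lengths.append(length)

conv_filters = [1024, 512, 256]
total = 0
in_channels = 1
for filters in conv_filters:
    total += conv1d_params(5, in_channels, filters) + batchnorm_params(filters)
    in_channels = filters

total += lstm_params(256, 128) + 2 * lstm_params(128, 128)
total += dense_params(128, 128) + dense_params(128, 64)
total += dense_params(64, 32) + dense_params(32, 8)

# Only the batch-norm moving statistics are excluded from training.
non_trainable = sum(2 * f for f in conv_filters)
trainable = total - non_trainable

print(lengths, total, trainable, non_trainable)
```

The second Conv1D layer alone accounts for roughly 2.6 million of the 3.8 million parameters, because its cost scales with the product of the 1024 input channels and 512 filters.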
Training log (100 epochs, 317 steps per epoch, learning_rate 0.0010 throughout):
Epoch | Time (s) | ms/step | accuracy | loss | val_accuracy | val_loss |
---|---|---|---|---|---|---|
1 | 17 | 37 | 0.2153 | 1.9636 | 0.2426 | 1.9693 |
2 | 20 | 36 | 0.3108 | 1.6652 | 0.3463 | 1.5517 |
3 | 20 | 35 | 0.3883 | 1.5169 | 0.3664 | 1.6439 |
4 | 11 | 36 | 0.4566 | 1.4205 | 0.5945 | 1.1129 |
5 | 21 | 39 | 0.5555 | 1.2191 | 0.6353 | 0.9806 |
6 | 20 | 37 | 0.6087 | 1.1061 | 0.5578 | 1.2841 |
7 | 20 | 36 | 0.6493 | 0.9911 | 0.6758 | 0.8649 |
8 | 21 | 37 | 0.6640 | 0.9323 | 0.6800 | 0.8433 |
9 | 11 | 36 | 0.6765 | 0.8867 | 0.6717 | 1.0084 |
10 | 22 | 39 | 0.6944 | 0.8469 | 0.6675 | 0.8587 |
11 | 20 | 37 | 0.6899 | 0.8328 | 0.6892 | 0.7895 |
12 | 12 | 37 | 0.6997 | 0.8093 | 0.7035 | 0.7804 |
13 | 20 | 37 | 0.7010 | 0.8087 | 0.7247 | 0.7322 |
14 | 21 | 37 | 0.7118 | 0.7662 | 0.7046 | 0.7973 |
15 | 12 | 37 | 0.7338 | 0.7147 | 0.7078 | 0.7720 |
16 | 12 | 36 | 0.7229 | 0.7290 | 0.6889 | 0.8257 |
17 | 20 | 36 | 0.7225 | 0.7494 | 0.6949 | 0.8535 |
18 | 21 | 36 | 0.7355 | 0.7058 | 0.7194 | 0.7390 |
19 | 12 | 37 | 0.7411 | 0.6920 | 0.7376 | 0.7078 |
20 | 11 | 36 | 0.7406 | 0.6911 | 0.7141 | 0.7635 |
21 | 21 | 36 | 0.7450 | 0.6977 | 0.7265 | 0.7643 |
22 | 12 | 39 | 0.7431 | 0.6786 | 0.7362 | 0.7558 |
23 | 19 | 35 | 0.7346 | 0.7241 | 0.7348 | 0.7062 |
24 | 21 | 37 | 0.7594 | 0.6490 | 0.7380 | 0.6892 |
25 | 21 | 40 | 0.7486 | 0.6795 | 0.7286 | 0.6955 |
26 | 12 | 37 | 0.7423 | 0.6799 | 0.7389 | 0.7198 |
27 | 20 | 37 | 0.7559 | 0.6395 | 0.7249 | 0.7698 |
28 | 12 | 38 | 0.7583 | 0.6320 | 0.7429 | 0.7185 |
29 | 20 | 35 | 0.7527 | 0.6426 | 0.7422 | 0.6955 |
30 | 12 | 37 | 0.7676 | 0.6296 | 0.7493 | 0.6471 |
31 | 12 | 37 | 0.7662 | 0.6177 | 0.7571 | 0.6693 |
32 | 12 | 37 | 0.7819 | 0.6026 | 0.7555 | 0.6534 |
33 | 12 | 37 | 0.7768 | 0.6018 | 0.7544 | 0.6464 |
34 | 20 | 35 | 0.7747 | 0.6004 | 0.7203 | 0.7747 |
35 | 11 | 36 | 0.7768 | 0.6081 | 0.7461 | 0.7599 |
36 | 21 | 37 | 0.7825 | 0.5860 | 0.7613 | 0.6666 |
37 | 20 | 37 | 0.7836 | 0.5870 | 0.7622 | 0.6769 |
38 | 20 | 36 | 0.7964 | 0.5549 | 0.7424 | 0.6943 |
39 | 20 | 36 | 0.7884 | 0.5751 | 0.7611 | 0.6916 |
40 | 21 | 36 | 0.7961 | 0.5651 | 0.7594 | 0.7202 |
41 | 20 | 36 | 0.7930 | 0.5558 | 0.7574 | 0.6860 |
42 | 11 | 35 | 0.7949 | 0.5488 | 0.7532 | 0.6702 |
43 | 21 | 36 | 0.7946 | 0.5478 | 0.7491 | 0.7065 |
44 | 11 | 36 | 0.7908 | 0.5802 | 0.7551 | 0.7272 |
45 | 20 | 36 | 0.8036 | 0.5504 | 0.7652 | 0.6707 |
46 | 12 | 36 | 0.8009 | 0.5405 | 0.7689 | 0.6954 |
47 | 21 | 37 | 0.8067 | 0.5342 | 0.7774 | 0.6802 |
48 | 12 | 37 | 0.8144 | 0.5044 | 0.7712 | 0.6441 |
49 | 20 | 36 | 0.8135 | 0.5123 | 0.7666 | 0.7112 |
50 | 21 | 36 | 0.7987 | 0.5400 | 0.7661 | 0.6949 |
51 | 11 | 36 | 0.8163 | 0.5119 | 0.7696 | 0.6756 |
52 | 21 | 36 | 0.8204 | 0.5084 | 0.7578 | 0.7212 |
53 | 12 | 36 | 0.8135 | 0.5052 | 0.7668 | 0.6683 |
54 | 21 | 36 | 0.8164 | 0.5022 | 0.7781 | 0.6305 |
55 | 22 | 41 | 0.8148 | 0.5150 | 0.7737 | 0.6815 |
56 | 12 | 37 | 0.8253 | 0.4878 | 0.7200 | 0.9112 |
57 | 20 | 35 | 0.8114 | 0.5258 | 0.7703 | 0.7413 |
58 | 21 | 37 | 0.8100 | 0.5143 | 0.7832 | 0.6766 |
59 | 12 | 36 | 0.8171 | 0.5114 | 0.7790 | 0.6732 |
60 | 11 | 36 | 0.8290 | 0.4781 | 0.7802 | 0.6485 |
61 | 20 | 36 | 0.8261 | 0.4743 | 0.7813 | 0.6789 |
62 | 11 | 36 | 0.8325 | 0.4594 | 0.7813 | 0.6707 |
63 | 20 | 36 | 0.8395 | 0.4399 | 0.7793 | 0.6661 |
64 | 21 | 37 | 0.8384 | 0.4407 | 0.7892 | 0.6566 |
65 | 20 | 37 | 0.8408 | 0.4469 | 0.7657 | 0.7264 |
66 | 20 | 36 | 0.8357 | 0.4654 | 0.7717 | 0.6659 |
67 | 21 | 36 | 0.8366 | 0.4634 | 0.7823 | 0.6298 |
68 | 20 | 35 | 0.8428 | 0.4283 | 0.7816 | 0.6530 |
69 | 21 | 36 | 0.8521 | 0.4189 | 0.7770 | 0.6688 |
70 | 20 | 36 | 0.8420 | 0.4489 | 0.7857 | 0.6584 |
71 | 21 | 36 | 0.8424 | 0.4449 | 0.7666 | 0.7331 |
72 | 12 | 36 | 0.8456 | 0.4312 | 0.7878 | 0.6981 |
73 | 21 | 37 | 0.8492 | 0.4324 | 0.7878 | 0.6661 |
74 | 20 | 36 | 0.8401 | 0.4341 | 0.7855 | 0.7077 |
75 | 11 | 36 | 0.8461 | 0.4273 | 0.7857 | 0.6850 |
76 | 20 | 36 | 0.8463 | 0.4318 | 0.7906 | 0.6968 |
77 | 21 | 37 | 0.8548 | 0.4133 | 0.7882 | 0.7185 |
78 | 20 | 36 | 0.8409 | 0.4635 | 0.7825 | 0.6842 |
79 | 11 | 35 | 0.8570 | 0.4260 | 0.7834 | 0.6868 |
80 | 21 | 36 | 0.8472 | 0.4231 | 0.7873 | 0.6727 |
81 | 21 | 37 | 0.8564 | 0.4117 | 0.7942 | 0.6683 |
82 | 21 | 37 | 0.8605 | 0.3988 | 0.7984 | 0.7252 |
83 | 12 | 37 | 0.8571 | 0.4182 | 0.7968 | 0.6476 |
84 | 11 | 36 | 0.8532 | 0.4352 | 0.7982 | 0.6890 |
85 | 20 | 36 | 0.8636 | 0.3951 | 0.7813 | 0.7375 |
86 | 21 | 36 | 0.8647 | 0.3963 | 0.7919 | 0.6307 |
87 | 12 | 37 | 0.8661 | 0.3894 | 0.7995 | 0.6687 |
88 | 20 | 36 | 0.8705 | 0.3767 | 0.7956 | 0.6626 |
89 | 20 | 36 | 0.8655 | 0.3786 | 0.7938 | 0.7064 |
90 | 12 | 39 | 0.8708 | 0.3785 | 0.8000 | 0.7013 |
91 | 11 | 35 | 0.8668 | 0.3851 | 0.7917 | 0.6687 |
92 | 21 | 36 | 0.8640 | 0.3867 | 0.7988 | 0.6336 |
93 | 11 | 36 | 0.8602 | 0.3998 | 0.7795 | 0.9010 |
94 | 20 | 36 | 0.8661 | 0.3733 | 0.7984 | 0.6607 |
95 | 11 | 36 | 0.8762 | 0.3688 | 0.7880 | 0.7836 |
96 | 21 | 37 | 0.8844 | 0.3356 | 0.8092 | 0.6792 |
97 | 12 | 38 | 0.8718 | 0.3705 | 0.8104 | 0.6307 |
98 | 20 | 36 | 0.8714 | 0.3702 | 0.7763 | 0.7733 |
99 | 22 | 41 | 0.8728 | 0.3559 | 0.7982 | 0.6486 |
100 | 19 | 35 | 0.8761 | 0.3585 | 0.8005 | 0.6756 |
53
Class | precision | recall | f1-score | support |
---|---|---|---|---|
angry | 0.92 | 0.85 | 0.88 | 634 |
calm | 0.62 | 0.89 | 0.73 | 93 |
disgust | 0.84 | 0.70 | 0.76 | 625 |
fear | 0.85 | 0.71 | 0.77 | 619 |
happy | 0.68 | 0.82 | 0.74 | 643 |
neutral | 0.80 | 0.81 | 0.80 | 623 |
sad | 0.78 | 0.81 | 0.79 | 611 |
surprise | 0.84 | 0.91 | 0.87 | 492 |
accuracy | | | 0.80 | 4340 |
macro avg | 0.79 | 0.81 | 0.80 | 4340 |
weighted avg | 0.81 | 0.80 | 0.80 | 4340 |
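The macro and weighted averages in this report differ only in how the per-class values are combined. The sketch below recomputes them from the rounded per-class figures; because the inputs are already rounded to two decimals, the recomputed values can differ from the reported ones in the last digit. The variable names are illustrative.

```python
# (precision, recall, support) per class, copied from the report above.
report = {
    "angry": (0.92, 0.85, 634),
    "calm": (0.62, 0.89, 93),
    "disgust": (0.84, 0.70, 625),
    "fear": (0.85, 0.71, 619),
    "happy": (0.68, 0.82, 643),
    "neutral": (0.80, 0.81, 623),
    "sad": (0.78, 0.81, 611),
    "surprise": (0.84, 0.91, 492),
}

total_support = sum(s for _, _, s in report.values())

# Macro average: every class counts equally, whatever its size.
macro_precision = sum(p for p, _, _ in report.values()) / len(report)
macro_recall = sum(r for _, r, _ in report.values()) / len(report)

# Weighted average: each class contributes in proportion to its support.
weighted_precision = sum(p * s for p, _, s in report.values()) / total_support
weighted_recall = sum(r * s for _, r, s in report.values()) / total_support

print(total_support)
print(round(macro_precision, 2), round(macro_recall, 2))
print(round(weighted_precision, 2), round(weighted_recall, 2))
```

Support-weighted recall is the same quantity as overall accuracy, which is why those two entries agree in the report. The small calm class (93 samples) pulls the macro precision down more than the weighted one.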