1.1 The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset
1.2 The Toronto Emotional Speech Set (TESS) dataset
Number of samples in the dataset: 4240
01 = neutral,
02 = calm,
03 = happy,
04 = sad,
05 = angry,
06 = fearful,
07 = disgust,
08 = surprised
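The code-to-label table above can be written as a small lookup. The following is a minimal sketch, not the author's code: the helper name `emotion_from_filename` is hypothetical, and it assumes RAVDESS-style filenames in which the third hyphen-separated field is the emotion code (for example `03-01-06-01-02-01-12.wav`). TESS files are named differently and would need their own parsing.

```python
# Emotion codes exactly as listed above.
EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def emotion_from_filename(filename):
    """Return the emotion label for a RAVDESS-style filename.

    Assumption: the third hyphen-separated field of the file stem
    is the two-digit emotion code.
    """
    # Drop any directory part and the file extension.
    stem = filename.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    fields = stem.split("-")
    return EMOTIONS[fields[2]]


label = emotion_from_filename("03-01-06-01-02-01-12.wav")
print(label)
```

Keeping the codes as strings avoids losing the leading zero when they are read from a filename.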
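Audio features are extracted as MFCCs (Mel-frequency cepstral coefficients), giving a one-dimensional vector of 40 values.
Extraction pipeline: continuous speech -> pre-emphasis -> framing and windowing -> FFT -> Mel filter bank -> logarithm -> DCT
The recognition model is a convolutional neural network (CNN) that takes the MFCC features as input.
Model structure: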
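The extraction steps listed above can be sketched in plain NumPy to show what each stage does. This is an illustration under stated assumptions, not the implementation used in the post: the function names, the 16 kHz sample rate, the 25 ms frame / 10 ms hop, the 0.97 pre-emphasis coefficient and the 40 Mel filters are all assumed values, and averaging the coefficients over frames is only one plausible way to arrive at a single 40-value vector.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc_mean(signal, sample_rate=16000, n_mfcc=40, n_mels=40,
              frame_len=400, hop=160, n_fft=512, pre_emph=0.97):
    """Return n_mfcc coefficients averaged over all frames (assumed pooling)."""
    signal = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis: boost high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Framing and windowing (Hamming window).
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. FFT -> power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Mel filter bank energies.
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    # 5. Logarithm (small offset avoids log(0)).
    log_mel = np.log(mel_energy + 1e-10)
    # 6. DCT-II (unnormalized) along the Mel axis.
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    mfcc = log_mel @ basis.T               # shape: (n_frames, n_mfcc)
    return mfcc.mean(axis=0)


# Usage: one second of a synthetic 440 Hz tone stands in for real speech.
t = np.arange(16000) / 16000.0
features = mfcc_mean(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)
```

In practice an audio library would normally handle these steps. The hand-written version is only meant to make the order of operations concrete, and its output will not match any particular library value for value.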
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
==========