Speech-Emotion-Recognition-using-Deep-Learning-with-Librosa-Library/Datasets_Description.txt at main · Ramyadeveloper59/Speech-Emotion-Recognition-using-Deep-Learning-with-Librosa-Library · GitHub

We define speech emotion recognition (SER) systems as a group of techniques that analyse and classify speech signals in order to identify the emotions they convey.

Four of the best-known English datasets are Crema, Ravdess, Savee, and Tess. Each of them provides audio in .wav files along with a few key labels.

Here is the Ravdess dataset:

Here are the filename identifiers, as per the official RAVDESS website:

Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
Vocal channel (01 = speech, 02 = song).
Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
Repetition (01 = 1st repetition, 02 = 2nd repetition).
Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).
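The identifiers above can be decoded with a few lines of Python. This is a minimal sketch, assuming the seven identifiers appear in the order listed, as two-digit fields separated by hyphens (for example 03-01-06-01-02-01-12.wav). The function name parse_ravdess_filename and the "gender" key are illustrative, not part of any official tool.

```python
from pathlib import Path

# Field order is assumed to follow the list of identifiers above.
RAVDESS_FIELDS = (
    "modality", "vocal_channel", "emotion", "intensity",
    "statement", "repetition", "actor",
)

RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def parse_ravdess_filename(filename):
    """Split a RAVDESS-style filename into its seven identifiers."""
    parts = Path(filename).stem.split("-")
    if len(parts) != len(RAVDESS_FIELDS):
        raise ValueError(f"expected 7 hyphen-separated fields: {filename!r}")
    info = dict(zip(RAVDESS_FIELDS, parts))
    # Replace the numeric emotion code with its label.
    info["emotion"] = RAVDESS_EMOTIONS[info["emotion"]]
    # Odd-numbered actors are male, even-numbered actors are female.
    info["gender"] = "male" if int(info["actor"]) % 2 == 1 else "female"
    return info


if __name__ == "__main__":
    print(parse_ravdess_filename("03-01-06-01-02-01-12.wav"))
```

Keeping the raw codes for the other fields makes it easy to filter files later, for example to keep only audio-only speech recordings (modality 03, vocal channel 01).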

Here is the Crema dataset.

The third component of each filename carries the emotion label:

SAD - sadness;
ANG - angry;
DIS - disgust;
FEA - fear;
HAP - happy;
NEU - neutral.
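The mapping above can be applied to a filename as follows. This is a minimal sketch, assuming the components of a Crema filename are separated by underscores (as in a name of the form 1001_DFA_ANG_XX.wav). The function name crema_emotion is illustrative.

```python
from pathlib import Path

# Emotion codes as listed above.
CREMA_EMOTIONS = {
    "SAD": "sadness",
    "ANG": "angry",
    "DIS": "disgust",
    "FEA": "fear",
    "HAP": "happy",
    "NEU": "neutral",
}


def crema_emotion(filename):
    """Return the emotion label encoded in the third filename component."""
    parts = Path(filename).stem.split("_")
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 components: {filename!r}")
    code = parts[2]
    if code not in CREMA_EMOTIONS:
        raise ValueError(f"unknown emotion code {code!r} in {filename!r}")
    return CREMA_EMOTIONS[code]


if __name__ == "__main__":
    print(crema_emotion("1001_DFA_ANG_XX.wav"))
```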

Here is the Tess dataset.

As with Crema, each filename includes the emotion it represents.
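A short sketch of how the emotion might be read from a Tess filename. It assumes the emotion is the last underscore-separated token of the name (as in a name of the form OAF_back_angry.wav); this layout is an assumption, not something stated above, so check it against the actual files. The function name tess_emotion is illustrative.

```python
from pathlib import Path


def tess_emotion(filename):
    """Return the last underscore-separated token of the name, lowercased.

    Assumption: the emotion is the final token, e.g. 'angry' in
    'OAF_back_angry.wav'.
    """
    stem = Path(filename).stem
    if "_" not in stem:
        raise ValueError(f"no underscore-separated tokens in {filename!r}")
    return stem.rsplit("_", 1)[1].lower()


if __name__ == "__main__":
    print(tess_emotion("OAF_back_angry.wav"))
```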

Here is the Savee dataset.

The audio files in this collection are named with prefix letters that identify the emotion class:

'a' = 'anger'
'd' = 'disgust'
'f' = 'fear'
'h' = 'happiness'
'n' = 'neutral'
'sa' = 'sadness'
'su' = 'surprise'
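The prefix table above can be turned into a small lookup. This is a minimal sketch, assuming each filename consists of the emotion prefix followed by a number (for example sa03.wav), optionally preceded by a speaker code and an underscore (for example DC_a01.wav). Both layouts are assumptions, and the function name savee_emotion is illustrative.

```python
import re
from pathlib import Path

# Prefix letters as listed above.
SAVEE_EMOTIONS = {
    "a": "anger",
    "d": "disgust",
    "f": "fear",
    "h": "happiness",
    "n": "neutral",
    "sa": "sadness",
    "su": "surprise",
}

# Optional "<speaker>_" part, then the emotion prefix, then digits.
_SAVEE_PATTERN = re.compile(r"(?:[A-Za-z]+_)?([a-z]+)\d+")


def savee_emotion(filename):
    """Return the emotion class encoded by the filename's prefix letters."""
    match = _SAVEE_PATTERN.fullmatch(Path(filename).stem)
    if match is None or match.group(1) not in SAVEE_EMOTIONS:
        raise ValueError(f"unrecognised Savee filename: {filename!r}")
    return SAVEE_EMOTIONS[match.group(1)]


if __name__ == "__main__":
    print(savee_emotion("sa03.wav"))
    print(savee_emotion("DC_a01.wav"))
```

Matching the whole prefix before the digits, rather than only the first letter, is what keeps 'sa' (sadness) and 'su' (surprise) distinct.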