-
Install libraries from requirements.txt
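A standard way to do this (assuming requirements.txt sits at the repository root):
pip install -r requirements.txt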
-
Datasets can be downloaded from the sources mentioned in the paper. Language-to-dataset mapping: Italian: EMOVO, Persian: ShEMO, Urdu: Urdu SER.
-
Install PyTorch, torchvision and torchaudio (CUDA 11.3 builds):
pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
-
For the hyperbolic implementation we use the code from Hazy Research: https://github.com/HazyResearch/hgcn. Clone the hgcn repository and change into the directory:
git clone https://github.com/HazyResearch/hgcn.git
cd hgcn
-
Place the downloaded datasets in the folders listed below; this is the layout the (dataset)_train.csv files expect when reading the speech files (example folder-creation commands follow the list).
- URDU: create a folder named 'urdu-language-speech-dataset', load data here.
- ShEMO: create a folder named 'shemo-persian-speech-emotion-detection-database', load data here.
- EMOVO: create a folder named 'emovo', load data here.
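For example, the three dataset folders can be created in one step (assuming they live at the repository root):
mkdir urdu-language-speech-dataset shemo-persian-speech-emotion-detection-database emovo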
-
For data loading, create a folder named 'data' and copy the dataset-specific files from the 'data_files' folder into it. Similarly, vocab.json can be replaced with the dataset-specific file. Example command: !cp /data_files/train_shemo.csv /data/train.csv
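The full set of copies for one dataset (ShEMO here) would look roughly like the following; the test and vocab file names are assumptions, so check the 'data_files' folder for the exact names:
!cp /data_files/train_shemo.csv /data/train.csv
!cp /data_files/test_shemo.csv /data/test.csv      # file name assumed
!cp /data_files/vocab_shemo.json /data/vocab.json  # file name assumed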
- model_run.py contains the data-processing and training functions used by the model.
- To try the different hyperbolic variants (HVIB, HVIB-C and ADAPT-VIB), change self.c in Wav2Vec2ClassificationHeadViBERTHyperbolic() (lines 575-582) accordingly; the dataset-specific hyperbolicity can also be set here (a rough sketch follows this list).
- To run the base model or VIB, replace the call to trainer_hyp_vib.train() with trainer_base.train() for the base model, or with the corresponding VIB trainer's train() call for VIB.
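As referenced above, a minimal Python sketch of how the curvature self.c could be configured for the three variants; the class name comes from this README, but the constructor arguments and the assumption that ADAPT-VIB learns the curvature are illustrative, not the repository's exact code:

import torch
import torch.nn as nn

class CurvatureConfigSketch(nn.Module):
    # Hypothetical stand-in for the curvature setup inside
    # Wav2Vec2ClassificationHeadViBERTHyperbolic (lines 575-582).
    def __init__(self, variant="HVIB", dataset_hyperbolicity=1.0):
        super().__init__()
        if variant == "ADAPT-VIB":
            # curvature treated as a learnable parameter (assumed behaviour)
            self.c = nn.Parameter(torch.tensor(dataset_hyperbolicity))
        elif variant == "HVIB-C":
            # fixed, dataset-specific hyperbolicity value
            self.c = torch.tensor(dataset_hyperbolicity)
        else:
            # HVIB: fixed default curvature
            self.c = torch.tensor(1.0)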