This folder contains a pre-trained VLAAI model in three different formats:
- TensorFlow SavedModel format (`pretrained_model/vlaai`)
- HDF5 format (`pretrained_model/vlaai.h5`)
- ONNX format (`pretrained_model/vlaai.onnx`)
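As a quick reference, a minimal loading sketch for each format: `compile=False` sidesteps the custom loss for the Keras formats (`custom_objects` may additionally be needed), `onnxruntime` is an extra dependency, and the input shape used below is an assumption, not read from the model files.

```python
import numpy as np
import tensorflow as tf
import onnxruntime as ort  # extra dependency for the ONNX format

# SavedModel directory and HDF5 file; compile=False skips the custom loss,
# and custom_objects may be needed depending on how the model was defined
model = tf.keras.models.load_model("pretrained_model/vlaai", compile=False)
model_h5 = tf.keras.models.load_model("pretrained_model/vlaai.h5", compile=False)

# ONNX: run inference through onnxruntime
session = ort.InferenceSession("pretrained_model/vlaai.onnx")
input_name = session.get_inputs()[0].name
# Dummy EEG batch of shape (batch, time, channels); 64 channels and
# 5 s at 64 Hz (320 samples) are assumptions about the expected input
dummy_eeg = np.zeros((1, 320, 64), dtype=np.float32)
prediction = session.run(None, {input_name: dummy_eeg})[0]
```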
This model was trained on the single-speaker stories dataset, comprising 80 subjects who each listened to 1 hour and 46 minutes of speech on average (approximately 15 minutes per recording), for a total of 144 hours of EEG data.
The preprocessing used in this notebook is the same as proposed in the paper:
For EEG:
- High-pass filtering using a 1st-order Butterworth filter with a cutoff frequency of 0.5 Hz (applied with filtfilt, i.e. zero-phase)
- Downsampling to 1024 Hz
- Eyeblink artefact removal using a Multichannel Wiener filter
- Common average re-referencing
- Downsampling to 64 Hz
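For concreteness, a rough sketch of these EEG steps with SciPy. The original sampling rate of 8192 Hz is an assumption, and the multichannel Wiener filter is a separate algorithm, left as a placeholder rather than reimplemented here.

```python
import numpy as np
from scipy.signal import butter, filtfilt, resample_poly

def preprocess_eeg(eeg, fs=8192):
    """Sketch of the EEG pipeline above. `eeg` has shape (time, channels);
    the original sampling rate of 8192 Hz is an assumption."""
    # 1. High-pass: 1st-order Butterworth, 0.5 Hz cutoff, zero-phase (filtfilt)
    b, a = butter(1, 0.5, btype="highpass", fs=fs)
    eeg = filtfilt(b, a, eeg, axis=0)

    # 2. Downsample to 1024 Hz
    eeg = resample_poly(eeg, up=1024, down=fs, axis=0)

    # 3. Eyeblink artefact removal with a multichannel Wiener filter:
    #    a separate algorithm, left as a placeholder in this sketch
    # eeg = multichannel_wiener_filter(eeg)  # hypothetical helper

    # 4. Common average re-referencing: subtract the channel mean per sample
    eeg = eeg - eeg.mean(axis=1, keepdims=True)

    # 5. Downsample to 64 Hz
    eeg = resample_poly(eeg, up=64, down=1024, axis=0)
    return eeg
```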
For Speech:
- Envelope extraction using a gammatone filterbank
- Downsampling to 1024 Hz
- Downsampling to 64 Hz
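A comparable sketch of the envelope extraction. The audio sampling rate, number of subbands, their spacing, and the power-law exponent below are all assumptions (following common practice for gammatone-based envelopes), not values read from this repository.

```python
import numpy as np
from scipy.signal import gammatone, lfilter, resample_poly

def speech_envelope(audio, fs=48000):
    """Sketch of the envelope extraction above. `audio` is a mono signal;
    fs, the subband layout and the power-law exponent are assumptions."""
    # Gammatone filterbank: 28 subbands between 50 Hz and 5 kHz
    # (linearly spaced here for brevity; ERB spacing would be more typical)
    subband_envs = []
    for fc in np.linspace(50, 5000, 28):
        b, a = gammatone(fc, "iir", fs=fs)
        subband = lfilter(b, a, audio)
        # Subband envelope: absolute value with power-law compression (0.6)
        subband_envs.append(np.abs(subband) ** 0.6)
    envelope = np.mean(subband_envs, axis=0)

    # Downsample to 1024 Hz, then to 64 Hz
    envelope = resample_poly(envelope, up=1024, down=fs)
    envelope = resample_poly(envelope, up=64, down=1024)
    return envelope
```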
Finally, the data was split per recording into training, validation and test sets, following an 80/10/10 split. The validation and test sets were extracted from the middle of each recording to avoid edge effects. Data was standardized per recording, using the mean and standard deviation of the training set.
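A minimal sketch of this split and standardization; placing the validation segment before the test segment within the middle 20% is an assumption.

```python
import numpy as np

def split_recording(eeg, envelope):
    """Sketch of the 80/10/10 split. `eeg` and `envelope` are time-aligned
    arrays of shape (time, ...). Validation and test are taken from the
    middle 20% of the recording; their ordering there is an assumption."""
    n = len(eeg)
    val_start, test_start, test_end = int(0.4 * n), int(0.5 * n), int(0.6 * n)

    def cut(x):
        train = np.concatenate([x[:val_start], x[test_end:]], axis=0)
        return train, x[val_start:test_start], x[test_start:test_end]

    def standardize(train, val, test):
        # Use the mean/std of the *training* portion for every split
        mu, sigma = train.mean(axis=0), train.std(axis=0)
        return [(s - mu) / sigma for s in (train, val, test)]

    return standardize(*cut(eeg)), standardize(*cut(envelope))
```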
The model was trained for at most 1000 epochs with a batch size of 64, using the Adam optimizer with a learning rate of 0.001 and the negative Pearson correlation as the loss function, computed on segments of 5 seconds.
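For illustration, a sketch of such a negative Pearson loss and the training configuration in TensorFlow. The repository may define its own implementation; early stopping, implied by "at most 1000 epochs", is only hinted at in the commented `fit` call.

```python
import tensorflow as tf

def pearson_loss(y_true, y_pred, axis=1):
    """Negative Pearson correlation along the time axis, averaged over
    the batch (and channels, if present)."""
    y_true_c = y_true - tf.reduce_mean(y_true, axis=axis, keepdims=True)
    y_pred_c = y_pred - tf.reduce_mean(y_pred, axis=axis, keepdims=True)
    num = tf.reduce_sum(y_true_c * y_pred_c, axis=axis)
    den = tf.sqrt(tf.reduce_sum(tf.square(y_true_c), axis=axis)
                  * tf.reduce_sum(tf.square(y_pred_c), axis=axis))
    return -tf.reduce_mean(num / den)

# `model` is the loaded VLAAI model from above.
# 5-second segments at 64 Hz = 320 samples per segment, batch size 64.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=pearson_loss)
# model.fit(train_set, validation_data=val_set, epochs=1000,
#           callbacks=[tf.keras.callbacks.EarlyStopping(patience=5)])
```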