Francesca Ronchini1, Luca Comanducci1, and Fabio Antonacci1
1 Dipartimento di Elettronica, Informazione e Bioingegneria - Politecnico di Milano
Paper accepted @ DCASE Workshop 2024
In the past few years, text-to-audio models have emerged as a significant advancement in automatic audio generation. Although they represent impressive technological progress, the effectiveness of their use in the development of audio applications remains uncertain. This paper investigates these aspects, focusing on the task of environmental sound classification. Specifically, it analyzes the performance of two different environmental sound classification systems when data generated by text-to-audio models is used for training. Two cases are considered: a) the training dataset is augmented with data coming from two different text-to-audio models; and b) the training dataset consists solely of synthetic audio generated by the text-to-audio models. In both cases, the performance of the classification task is tested on real data. Results indicate that text-to-audio models are effective for dataset augmentation, whereas performance drops when relying solely on generated audio.
To generate the data, we used AudioLDM2 and AudioGen.
Please refer to the AudioLDM2 GitHub repo and follow the installation instructions. For this study, we used the official checkpoints available in the Hugging Face 🧨 Diffusers library and the audioldm checkpoint.
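For reference, generating a single clip with the Diffusers checkpoint looks roughly like the sketch below. This is a minimal illustration, assuming the cvssp/audioldm2 weights and illustrative generation parameters; it is not the exact code of class_generation_audioldm.py:

```python
# Minimal sketch: generating one clip with AudioLDM2 via Hugging Face Diffusers.
# Checkpoint and generation parameters are assumptions, not necessarily the
# exact settings used in class_generation_audioldm.py.
import scipy.io.wavfile
import torch
from diffusers import AudioLDM2Pipeline

pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

audio = pipe(
    "a dog barking",
    num_inference_steps=200,
    audio_length_in_s=10.0,
).audios[0]

# AudioLDM2 outputs audio at 16 kHz.
scipy.io.wavfile.write("dog_bark_0.wav", rate=16000, data=audio)
```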
When AudioLDM2 has been installed, you can generate the audio files by running the script audio_generation/class_generation_audioldm.py. Before running the script, you need to specify the path to the output folder, the audio class to generate, the prompt used to generate the files, and the number of files to generate directly in audio_generation/class_generation_audioldm.py.
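The exact variable names in the script may differ; the following is a hypothetical illustration of the parameters to set:

```python
# Hypothetical parameter block for class_generation_audioldm.py;
# variable names are illustrative, not necessarily the script's actual ones.
output_folder = "data/generated/audioldm/dog_bark"  # where the generated .wav files are saved
audio_class = "dog_bark"                            # environmental sound class to generate
prompt = "a dog barking"                            # text prompt passed to the model
num_files = 100                                     # number of clips to generate
```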
After that, you can run the script with the command:
```sh
cd audio_generation
python class_generation_audioldm.py
```
Please refer to the AudioGen GitHub repo and follow the installation instructions.
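Once installed, generating a clip with the AudioCraft API looks roughly like the sketch below; the checkpoint name and duration are assumptions and not necessarily the settings used in class_generation_audiogen.py:

```python
# Minimal sketch: generating one clip with AudioGen via the AudioCraft API.
# Checkpoint and duration are assumptions, not necessarily the settings
# used in class_generation_audiogen.py.
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write

model = AudioGen.get_pretrained("facebook/audiogen-medium")
model.set_generation_params(duration=10)  # length of each generated clip, in seconds

wavs = model.generate(["a dog barking"])  # one waveform per text description

# Save with loudness normalization; audio_write appends the .wav extension.
audio_write("dog_bark_0", wavs[0].cpu(), model.sample_rate, strategy="loudness")
```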
The audio files are generated with the script audio_generation/class_generation_audiogen.py. As for AudioLDM2, before running the script you need to specify the path to the output folder, the audio class to generate, the prompt used to generate the files, and the number of files to generate directly in audio_generation/class_generation_audiogen.py. After that, you can run the script with the command:
```sh
cd audio_generation
python class_generation_audiogen.py
```
When all the data have been generated, you can reproduce the experiments.
First, install all the required packages by running the following command in your terminal:
```sh
pip install -r requirements.txt
```
When all packages have been installed, you need to specify which dataset to use, following the instructions in the config/default.yaml file.
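As an illustration, a configuration entry could look like the sketch below; the keys and values here are hypothetical, and the actual options are documented in config/default.yaml itself:

```yaml
# Hypothetical excerpt of config/default.yaml; key names are illustrative,
# the actual options are documented in the file itself.
dataset:
  name: audioldm_augmented      # e.g. real, augmented, or synthetic-only training set
  data_path: data/generated/audioldm
training:
  batch_size: 32
  epochs: 100
```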
After all the parameters have been defined, you can run the code with the following command:
```sh
python main.py
```
Additional material and audio samples are available on the companion website.
For more details, see: Francesca Ronchini, Luca Comanducci, and Fabio Antonacci, "Synthetic training set generation using text-to-audio models for environmental sound classification", in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024), Tokyo, Japan, October 2024, pp. 126–130.
If you use code or material from this work, please cite our paper:
```bibtex
@inproceedings{Ronchini2024,
    author    = "Ronchini, Francesca and Comanducci, Luca and Antonacci, Fabio",
    title     = "Synthetic Training Set Generation using Text-To-Audio Models for Environmental Sound Classification",
    booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024)",
    address   = "Tokyo, Japan",
    month     = "October",
    year      = "2024",
    pages     = "126--130",
}
```