Synthetic Training Set Generation using Text-To-Audio Models for Environmental Sound Classification

Francesca Ronchini, Luca Comanducci, and Fabio Antonacci

Dipartimento di Elettronica, Informazione e Bioingegneria - Politecnico di Milano
Paper accepted @ DCASE Workshop 2024

arXiv

Abstract

In the past few years, text-to-audio models have emerged as a significant advancement in automatic audio generation. Although they represent impressive technological progress, the effectiveness of their use in the development of audio applications remains uncertain. This paper aims to investigate these aspects, specifically focusing on the task of classification of environmental sounds. This study analyzes the performance of two different environmental sound classification systems when data generated from text-to-audio models is used for training. Two cases are considered: a) when the training dataset is augmented by data coming from two different text-to-audio models; and b) when the training dataset consists solely of synthetically generated audio. In both cases, the performance of the classification task is tested on real data. Results indicate that text-to-audio models are effective for dataset augmentation, whereas performance drops when relying solely on generated audio.

Install & Usage

For generating the data, we used AudioLDM2 and AudioGen.

Installing AudioLDM2

Please refer to the AudioLDM2 GitHub repo and follow the installation instructions. For this study, we used the official checkpoints available in the Hugging Face 🧨 Diffusers library, specifically the audioldm checkpoint.

When AudioLDM2 has been installed, you can generate the audio files by running the script audio_generation/class_generation_audioldm.py. Before running the script, you need to specify, inside audio_generation/class_generation_audioldm.py, the path to the output folder, the audio class to generate, the prompt to use to generate the files, and the number of files to generate.

After that, you can run the script with the command:

cd audio_generation
python class_generation_audioldm.py
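
For reference, here is a minimal sketch of what such a generation loop could look like when using the AudioLDM2 checkpoints through Hugging Face 🧨 Diffusers. The output folder, prompt, and number of files below are illustrative placeholders, not the exact values used by the repository script:

```python
# Minimal sketch, not the repository script: batch generation with the
# Diffusers AudioLDM2Pipeline. Paths, prompt, and counts are placeholders.
import os

import scipy.io.wavfile
import torch
from diffusers import AudioLDM2Pipeline

output_dir = "generated/dog_bark"  # path to the output folder (placeholder)
prompt = "a dog barking"           # prompt describing the audio class (placeholder)
num_files = 10                     # number of files to generate (placeholder)

pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2")
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

os.makedirs(output_dir, exist_ok=True)
for i in range(num_files):
    # The pipeline returns waveforms sampled at 16 kHz.
    audio = pipe(prompt, num_inference_steps=200, audio_length_in_s=10.0).audios[0]
    scipy.io.wavfile.write(
        os.path.join(output_dir, f"dog_bark_{i}.wav"), rate=16000, data=audio
    )
```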

Installing AudioGen

Please refer to the AudioGen GitHub repo and follow the installation instructions.

When AudioGen has been installed, you can generate the audio files by running the script audio_generation/class_generation_audiogen.py. As for AudioLDM2, before running the script you need to specify, inside audio_generation/class_generation_audiogen.py, the path to the output folder, the audio class to generate, the prompt to use to generate the files, and the number of files to generate.

After that, you can run the script with the command:

cd audio_generation
python class_generation_audiogen.py
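
As with AudioLDM2, here is a minimal sketch of such a loop using AudioCraft's AudioGen API; the model size, clip duration, prompt, and file names are illustrative placeholders:

```python
# Minimal sketch, not the repository script: batch generation with
# AudioCraft's AudioGen. Prompt, duration, and names are placeholders.
from audiocraft.data.audio import audio_write
from audiocraft.models import AudioGen

prompt = "a dog barking"  # prompt describing the audio class (placeholder)
num_files = 10            # number of files to generate (placeholder)

model = AudioGen.get_pretrained("facebook/audiogen-medium")
model.set_generation_params(duration=5)  # length of each generated clip, in seconds

for i in range(num_files):
    wav = model.generate([prompt])  # returns a batch of waveforms, one per prompt
    # audio_write saves <name>.wav and handles loudness normalization.
    audio_write(f"dog_bark_{i}", wav[0].cpu(), model.sample_rate, strategy="loudness")
```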

Run the code

When all the data have been generated, you can reproduce the experiments.

First, install all the packages required by the system by running the following command in your terminal:

pip install -r requirements.txt

When all packages have been installed, you need to specify which dataset to use, following the instructions in the config/default.yaml file.
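
The exact parameter names are documented in config/default.yaml itself; purely as a hypothetical illustration of the kind of choice involved (real data, real data augmented with generated audio, or generated audio only, as described in the abstract), such an entry might look like:

```yaml
# Hypothetical illustration only; see config/default.yaml for the real keys.
dataset:
  training_data: real_plus_audiogen  # e.g. real recordings augmented with AudioGen clips
```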

After all the parameters have been defined, you can run the code with the following command:

python main.py

Link to additional material

Additional material and audio samples are available on the companion website.

Additional information

For more details: Francesca Ronchini, Luca Comanducci, and Fabio Antonacci, "Synthetic training set generation using text-to-audio models for environmental sound classification", in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024), Tokyo, Japan, October 2024.

If you use code or comments from this work, please cite our paper:

@inproceedings{Ronchini2024,
    author = "Ronchini, Francesca and Comanducci, Luca and Antonacci, Fabio",
    title = "Synthetic Training Set Generation using Text-To-Audio Models for Environmental Sound Classification",
    booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024)",
    address = "Tokyo, Japan",
    month = "October",
    year = "2024",
    pages = "126--130",
}
