Make That Sound More Metallic: Towards a Perceptually Relevant Control of the Timbre of Synthesizer Sounds Using a Variational Autoencoder
Journal article. Transactions of the International Society for Music Information Retrieval (TISMIR). Year: 2021

Abstract

In this article, we propose a new method of sound transformation based on control parameters that are intuitive and relevant for musicians. This method uses a variational autoencoder (VAE) model that is first trained in an unsupervised manner on a large dataset of synthesizer sounds. Then, a perceptual regularization term is added to the loss function to be optimized, and a supervised fine-tuning of the model is carried out using a small subset of perceptually labeled sounds. The labels were obtained from a perceptual test of Verbal Attribute Magnitude Estimation in which listeners rated this training sound dataset along eight perceptual dimensions (French equivalents of metallic, warm, breathy, vibrating, percussive, resonating, evolving, aggressive). These dimensions were identified as relevant for the description of synthesizer sounds in a first Free Verbalization test. The resulting VAE model was evaluated by objective reconstruction measures and a perceptual test. Both showed that the model was able, to a certain extent, to capture the acoustic properties of most of the perceptual dimensions and to transform sound timbre along at least two of them (aggressive and vibrating) in a perceptually relevant manner. Moreover, it was able to generalize to unseen samples even though a small set of labeled sounds was used.
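As an illustration of the two-stage training described in the abstract, the following is a minimal sketch of a VAE loss with an optional perceptual regularization term. The abstract does not state the form of that term, so everything specific here is an assumption made for illustration: the squared-error reconstruction term, the choice to tie the first eight latent means to the eight perceptual ratings, and the hypothetical names `vae_loss`, `alpha` and `beta`. It should not be read as the authors' implementation.

```python
import numpy as np


def kl_to_standard_normal(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to N(0, I), per sample."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)


def vae_loss(x, x_hat, mu, logvar, ratings=None, alpha=1.0, beta=1.0):
    """Mean loss over a batch.

    x, x_hat : (batch, features) input and reconstruction
    mu, logvar : (batch, latent_dim) encoder outputs
    ratings : optional (batch, k) perceptual ratings (assumed pre-scaled);
              only available for the small labeled subset
    alpha, beta : hypothetical weights for the perceptual and KL terms
    """
    # Unsupervised part: reconstruction error plus KL regularization.
    recon = np.sum((x - x_hat) ** 2, axis=1)
    loss = recon + beta * kl_to_standard_normal(mu, logvar)

    # Supervised fine-tuning part (ASSUMED form): pull the first k latent
    # means towards the k perceptual ratings of each labeled sound.
    if ratings is not None:
        k = ratings.shape[1]
        loss = loss + alpha * np.sum((mu[:, :k] - ratings) ** 2, axis=1)

    return float(np.mean(loss))
```

Under these assumptions, calling `vae_loss` without `ratings` corresponds to the unsupervised pre-training stage, and passing `ratings` for the labeled subset corresponds to the fine-tuning stage.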
Main file: Roche-TISMIR-21.pdf (2.78 MB)
Origin: publisher files authorized on an open archive

Dates and versions

hal-03247371, version 1 (03-06-2021)

Cite

Fanny Roche, Thomas Hueber, Maëva Garnier, Samuel Limier, Laurent Girin. Make That Sound More Metallic: Towards a Perceptually Relevant Control of the Timbre of Synthesizer Sounds Using a Variational Autoencoder. Transactions of the International Society for Music Information Retrieval (TISMIR), 2021, 4, pp. 52-66. ⟨10.5334/tismir.76⟩. ⟨hal-03247371⟩