Journal article. IEEE/ACM Transactions on Audio, Speech and Language Processing. Year: 2021

Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation

Abstract

The goal of singing voice separation is to recover the vocal signal from music mixtures. State-of-the-art performance is achieved by deep neural networks trained in a supervised fashion. Since training data are scarce and music signals are extremely diverse, it remains challenging to achieve high separation quality across various recording and mixing conditions as well as music styles. In this paper, we investigate to what extent the separation can be improved when lyrics transcripts are used as additional information. To this end, we propose a joint approach to phoneme level lyrics alignment and text-informed singing voice separation. It is based on DTW-attention, a new monotonic attention mechanism including a differentiable approximation of dynamic time warping. Experimental results show that the method can align phonemes with mixed singing voice with high precision given accurate transcripts. It also achieves competitive results on challenging word level alignment test sets using less training data than state-of-the-art methods. Sequential alignment and informed separation lead to improved separation quality according to objective measures. Text information helps preserve spectral phoneme properties in the separated voice signals.
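
The abstract mentions a differentiable approximation of dynamic time warping (DTW) without giving its form. As a rough illustration only, the sketch below shows soft-DTW, one well-known relaxation in which the hard minimum of the DTW recursion is replaced by a smooth minimum. Whether DTW-attention uses exactly this formulation is an assumption, and the names `soft_min`, `soft_dtw`, `hard_dtw`, the parameter `gamma`, and the toy cost matrix are hypothetical, not taken from the paper.

```python
import math


def soft_min(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))).

    Approaches min(values) as gamma -> 0. Infinite entries are skipped.
    """
    finite = [v for v in values if v != math.inf]
    if not finite:
        return math.inf
    m = min(finite)  # subtract the minimum for numerical stability
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in finite))


def soft_dtw(cost, gamma=1.0):
    """Soft-DTW value of a cost matrix (rows: e.g. phonemes, columns: e.g. audio frames)."""
    n, m = len(cost), len(cost[0])
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = [acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]]
            acc[i][j] = cost[i - 1][j - 1] + soft_min(prev, gamma)
    return acc[n][m]


def hard_dtw(cost):
    """Classic DTW with a hard minimum, for comparison (not differentiable)."""
    n, m = len(cost), len(cost[0])
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]
            )
    return acc[n][m]


# Hypothetical toy cost matrix: cheapest along the diagonal.
toy_cost = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
```

With a small `gamma` the soft value is close to the classic DTW cost, while larger values of `gamma` smooth over competing alignment paths, which is what makes the quantity usable with gradient-based training.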
Main file
2021_Phoneme_level_lyrics_alignment_and_text-informed_singing_voice_separation.pdf (2.18 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03255334, version 1 (03-08-2021)

Cite

Kilian Schulze-Forster, Clement S J Doire, Gael Richard, Roland Badeau. Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation. IEEE/ACM Transactions on Audio, Speech and Language Processing, in press, ⟨10.1109/TASLP.2021.3091817⟩. ⟨hal-03255334⟩