StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Lemercier, Jean-Marie; Welker, Simon; Richter, Julius; Gerkmann, Timo

doi:10.1109/TASLP.2023.3294692

Journal Article

PUBDB-2024-00134

Lemercier, J.-M. ; Richter, J. ; Welker, S. (CFEL, DESY) ; Gerkmann, T. (Corresponding author)

2023
IEEE, New York, NY

IEEE/ACM Transactions on Audio, Speech, and Language Processing 31, 2724-2737 (2023) [10.1109/TASLP.2023.3294692]

Please use a persistent id in citations: doi:10.1109/TASLP.2023.3294692 doi:10.3204/PUBDB-2024-00134

Report No.: arXiv:2212.11851

Abstract: Diffusion models have shown a great ability to bridge the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive corruption types or when they are evaluated on mismatched conditions. However, diffusion models suffer from a high computational burden, mainly because they require running a neural network for each reverse diffusion step, whereas predictive approaches require only one pass. As diffusion models are generative approaches, they may also produce vocalizing and breathing artifacts in adverse conditions. In comparison, in such difficult scenarios, predictive models typically do not produce such artifacts but tend to distort the target speech instead, thereby degrading the speech quality. In this work, we present a stochastic regeneration approach where an estimate given by a predictive model is provided as a guide for further diffusion. We show that the proposed approach uses the predictive model to remove the vocalizing and breathing artifacts while producing very high quality samples thanks to the diffusion model, even in adverse conditions. We further show that this approach enables the use of lighter sampling schemes with fewer diffusion steps without sacrificing quality, thus reducing the computational burden by an order of magnitude. Source code and audio examples are available online.
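
Illustrative sketch (not part of the original record): the two-stage idea in the abstract, a single predictive pass followed by a short reverse diffusion guided by that estimate, could be laid out roughly as below. Everything here is an assumption made for illustration and is not the authors' implementation: `predictive_estimate` is a moving-average stand-in for a trained predictive network, `toy_score` is an analytic Gaussian score standing in for a learned score network, and the noise schedule values are arbitrary. The sampler is a generic variance-exploding ancestral step, which may differ from the scheme used in the paper.

```python
import numpy as np


def predictive_estimate(noisy, kernel=5):
    """Hypothetical stand-in for a predictive model: one smoothing pass."""
    window = np.ones(kernel) / kernel
    return np.convolve(noisy, window, mode="same")


def toy_score(x, guide, sigma):
    """Hypothetical stand-in for a score network: score of N(guide, sigma^2 I)."""
    return (guide - x) / sigma**2


def stochastic_regeneration(noisy, num_steps=5, sigma_start=0.5,
                            sigma_end=0.01, seed=0):
    """Toy two-stage pipeline: predict once, re-noise, then denoise in few steps."""
    rng = np.random.default_rng(seed)
    calls = {"predictive": 0, "score": 0}

    # Stage 1: a single pass of the (stand-in) predictive model.
    guide = predictive_estimate(noisy)
    calls["predictive"] += 1

    # Stage 2: start the reverse process from the re-noised estimate,
    # not from pure noise, so that a short schedule is enough.
    sigmas = np.geomspace(sigma_start, sigma_end, num_steps + 1)
    x = guide + sigmas[0] * rng.standard_normal(guide.shape)

    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        step = s_cur**2 - s_next**2
        # One score evaluation per reverse step: this is the cost that
        # grows with the number of diffusion steps.
        score = toy_score(x, guide, s_cur)
        calls["score"] += 1
        mean = x + step * score
        noise_std = np.sqrt(step * s_next**2 / s_cur**2)
        x = mean + noise_std * rng.standard_normal(x.shape)

    return x, guide, calls


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 200)
    clean = np.sin(2 * np.pi * 5 * t)
    noisy = clean + 0.3 * np.random.default_rng(1).standard_normal(t.shape)
    enhanced, guide, calls = stochastic_regeneration(noisy, num_steps=5)
    print("network calls:", calls)
```

Because the toy score simply pulls samples back toward the predictive estimate, the output here is that estimate plus a small residual; the sketch only shows the control flow and the per-step cost, not any quality gain from a trained diffusion model.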

Classification:

ddc:400

Note: ISSN 2329-9304 not unique: 2 hits.

Experiment(s):

No specific instrument

Appears in the scientific report 2023

Database coverage: Medline ; Current Contents - Engineering, Computing and Technology ; IF < 5 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Thomson Reuters Master Journal List ; Web of Science Core Collection

The record appears in these collections:
Private Collections > CFEL > FS-CFEL > FS-CFEL-1-CFEL
Private Collections > CFEL > FS-CFEL > CFEL-I
Document types > Articles > Journal Article
Public records
Publications database
OpenAccess

Linked articles:

Preprint: Lemercier, J.-M. ; Richter, J. ; Welker, S. (Corresponding author) (CFEL, DESY) ; Gerkmann, T.
StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Record created 2024-01-10, last modified 2025-07-24
