StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Lemercier, Jean-Marie; Richter, Julius; Welker, Simon; Gerkmann, Timo
doi:10.1109/TASLP.2023.3294692
000601120 001__ 601120
000601120 005__ 20250724132703.0
000601120 0247_ $$2doi$$a10.1109/TASLP.2023.3294692
000601120 0247_ $$2ISSN$$a2329-9290
000601120 0247_ $$2ISSN$$a2329-9304
000601120 0247_ $$2datacite_doi$$a10.3204/PUBDB-2024-00134
000601120 0247_ $$2arXiv$$aarXiv:2212.11851
000601120 0247_ $$2altmetric$$aaltmetric:140463270
000601120 0247_ $$2WOS$$aWOS:001037791600002
000601120 0247_ $$2openalex$$aopenalex:W4384080510
000601120 037__ $$aPUBDB-2024-00134
000601120 041__ $$aEnglish
000601120 082__ $$a400
000601120 088__ $$2arXiv$$aarXiv:2212.11851
000601120 1001_ $$00000-0002-8704-7658$$aLemercier, Jean-Marie$$b0
000601120 245__ $$aStoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
000601120 260__ $$aNew York, NY$$bIEEE$$c2023
000601120 3367_ $$2DRIVER$$aarticle
000601120 3367_ $$2DataCite$$aOutput Types/Journal article
000601120 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1719573399_2382625
000601120 3367_ $$2BibTeX$$aARTICLE
000601120 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000601120 3367_ $$00$$2EndNote$$aJournal Article
000601120 500__ $$aISSN 2329-9304 not unique: **2 hits**.
000601120 520__ $$aDiffusion models have shown great ability in bridging the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive corruption types or when they are evaluated on mismatched conditions. However, diffusion models suffer from a high computational burden, mainly because they require running a neural network for each reverse diffusion step, whereas predictive approaches require only one pass. As diffusion models are generative approaches, they may also produce vocalizing and breathing artifacts in adverse conditions. In comparison, in such difficult scenarios, predictive models typically do not produce such artifacts but tend to distort the target speech instead, thereby degrading the speech quality. In this work, we present a stochastic regeneration approach where an estimate given by a predictive model is provided as a guide for further diffusion. We show that the proposed approach uses the predictive model to remove the vocalizing and breathing artifacts while producing very high-quality samples thanks to the diffusion model, even in adverse conditions. We further show that this approach enables the use of lighter sampling schemes with fewer diffusion steps without sacrificing quality, thus reducing the computational burden by an order of magnitude. Source code and audio examples are available online.
000601120 536__ $$0G:(DE-HGF)POF4-633$$a633 - Life Sciences – Building Blocks of Life: Structure and Function (POF4-633)$$cPOF4-633$$fPOF IV$$x0
000601120 536__ $$0G:(DE-HGF)2019_IVF-HIDSS-0002$$aHIDSS-0002 - DASHH: Data Science in Hamburg - Helmholtz Graduate School for the Structure of Matter (2019_IVF-HIDSS-0002)$$c2019_IVF-HIDSS-0002$$x1
000601120 588__ $$aDataset connected to CrossRef, Journals: bib-pubdb1.desy.de
000601120 693__ $$0EXP:(DE-MLZ)NOSPEC-20140101$$5EXP:(DE-MLZ)NOSPEC-20140101$$eNo specific instrument$$x0
000601120 7001_ $$00000-0002-7870-4839$$aRichter, Julius$$b1
000601120 7001_ $$0P:(DE-H253)PIP1088388$$aWelker, Simon$$b2
000601120 7001_ $$00000-0002-8678-4699$$aGerkmann, Timo$$b3$$eCorresponding author
000601120 773__ $$0PERI:(DE-600)2737756-8$$a10.1109/TASLP.2023.3294692$$gVol. 31, p. 2724 - 2737$$p2724 - 2737$$tIEEE ACM transactions on audio, speech, and language processing$$v31$$x2329-9290$$y2023
000601120 7870_ $$0PUBDB-2024-04907$$aLemercier, Jean-Marie et al.$$d2023$$iIsParent$$rarXiv:2212.11851$$tStoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
000601120 8564_ $$uhttps://bib-pubdb1.desy.de/record/601120/files/2212.11851v1.pdf$$yOpenAccess
000601120 8564_ $$uhttps://bib-pubdb1.desy.de/record/601120/files/StoRM_A_Diffusion-Based_Stochastic_Regeneration_Model_for_Speech_Enhancement_and_Dereverberation.pdf$$yRestricted
000601120 8564_ $$uhttps://bib-pubdb1.desy.de/record/601120/files/2212.11851v1.pdf?subformat=pdfa$$xpdfa$$yOpenAccess
000601120 8564_ $$uhttps://bib-pubdb1.desy.de/record/601120/files/StoRM_A_Diffusion-Based_Stochastic_Regeneration_Model_for_Speech_Enhancement_and_Dereverberation.pdf?subformat=pdfa$$xpdfa$$yRestricted
000601120 909CO $$ooai:bib-pubdb1.desy.de:601120$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
000601120 9101_ $$0I:(DE-H253)_CFEL-20120731$$6P:(DE-H253)PIP1088388$$aCentre for Free-Electron Laser Science$$b2$$kCFEL
000601120 9101_ $$0I:(DE-588b)2008985-5$$6P:(DE-H253)PIP1088388$$aDeutsches Elektronen-Synchrotron$$b2$$kDESY
000601120 9131_ $$0G:(DE-HGF)POF4-633$$1G:(DE-HGF)POF4-630$$2G:(DE-HGF)POF4-600$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bForschungsbereich Materie$$lVon Materie zu Materialien und Leben$$vLife Sciences – Building Blocks of Life: Structure and Function$$x0
000601120 9141_ $$y2023
000601120 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000601120 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS
000601120 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline
000601120 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bIEEE-ACM T AUDIO SPE : 2015
000601120 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bThomson Reuters Master Journal List
000601120 915__ $$0StatID:(DE-HGF)0111$$2StatID$$aWoS$$bScience Citation Index Expanded
000601120 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection
000601120 915__ $$0StatID:(DE-HGF)1160$$2StatID$$aDBCoverage$$bCurrent Contents - Engineering, Computing and Technology
000601120 915__ $$0StatID:(DE-HGF)9900$$2StatID$$aIF < 5
000601120 9201_ $$0I:(DE-H253)CFEL-I-20161114$$kCFEL-I$$lFS-CFEL-1 (Group Leader: Henry Chapman)$$x0
000601120 9201_ $$0I:(DE-H253)FS-CFEL-1-CFEL-20210408$$kFS-CFEL-1-CFEL$$lFS-CFEL-1 Fachgruppe CFEL-Infrastrukt.$$x1
000601120 980__ $$ajournal
000601120 980__ $$aVDB
000601120 980__ $$aI:(DE-H253)CFEL-I-20161114
000601120 980__ $$aI:(DE-H253)FS-CFEL-1-CFEL-20210408
000601120 980__ $$aUNRESTRICTED
000601120 9801_ $$aFullTexts