StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Lemercier, Jean-Marie; Richter, Julius; Welker, Simon; Gerkmann, Timo
000611421 001__ 611421
000611421 005__ 20240630062350.0
000611421 0247_ $$2arXiv$$aarXiv:2212.11851
000611421 0247_ $$2altmetric$$aaltmetric:140463270
000611421 037__ $$aPUBDB-2024-04907
000611421 041__ $$aEnglish
000611421 088__ $$2arXiv$$aarXiv:2212.11851
000611421 082__ $$a400
000611421 1001_ $$00000-0002-8704-7658$$aLemercier, Jean-Marie$$b0
000611421 245__ $$aStoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
000611421 260__ $$c2023
000611421 3367_ $$0PUB:(DE-HGF)25$$2PUB:(DE-HGF)$$aPreprint$$bpreprint$$mpreprint$$s1719572962_2382625
000611421 3367_ $$2ORCID$$aWORKING_PAPER
000611421 3367_ $$028$$2EndNote$$aElectronic Article
000611421 3367_ $$2DRIVER$$apreprint
000611421 3367_ $$2BibTeX$$aARTICLE
000611421 3367_ $$2DataCite$$aOutput Types/Working Paper
000611421 500__ $$aISSN 2329-9304 not unique: **2 hits**. Published in IEEE/ACM Transactions on Audio, Speech and Language Processing, 2023
000611421 520__ $$aDiffusion models have shown great ability in bridging the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive corruption types or when they are evaluated on mismatched conditions. However, diffusion models suffer from a high computational burden, mainly because they require running a neural network for each reverse diffusion step, whereas predictive approaches require only one pass. As diffusion models are generative approaches, they may also produce vocalizing and breathing artifacts in adverse conditions. In comparison, in such difficult scenarios, predictive models typically do not produce such artifacts but tend to distort the target speech instead, thereby degrading the speech quality. In this work, we present a stochastic regeneration approach where an estimate given by a predictive model is provided as a guide for further diffusion. We show that the proposed approach uses the predictive model to remove the vocalizing and breathing artifacts while producing very high quality samples thanks to the diffusion model, even in adverse conditions. We further show that this approach enables the use of lighter sampling schemes with fewer diffusion steps without sacrificing quality, thus lifting the computational burden by an order of magnitude. Source code and audio examples are available online (https://uhh.de/inf-sp-storm).
000611421 536__ $$0G:(DE-HGF)POF4-633$$a633 - Life Sciences – Building Blocks of Life: Structure and Function (POF4-633)$$cPOF4-633$$fPOF IV$$x0
000611421 536__ $$0G:(DE-HGF)2019_IVF-HIDSS-0002$$aHIDSS-0002 - DASHH: Data Science in Hamburg - Helmholtz Graduate School for the Structure of Matter (2019_IVF-HIDSS-0002)$$c2019_IVF-HIDSS-0002$$x1
000611421 588__ $$aDataset connected to arXiv, CrossRef, Journals: bib-pubdb1.desy.de
000611421 693__ $$0EXP:(DE-MLZ)NOSPEC-20140101$$5EXP:(DE-MLZ)NOSPEC-20140101$$eNo specific instrument$$x0
000611421 7001_ $$00000-0002-7870-4839$$aRichter, Julius$$b1
000611421 7001_ $$0P:(DE-H253)PIP1088388$$aWelker, Simon$$b2$$eCorresponding author
000611421 7001_ $$00000-0002-8678-4699$$aGerkmann, Timo$$b3
000611421 8564_ $$uhttps://bib-pubdb1.desy.de/record/611421/files/2212.11851v2.pdf$$yRestricted
000611421 8564_ $$uhttps://bib-pubdb1.desy.de/record/611421/files/2212.11851v2.pdf?subformat=pdfa$$xpdfa$$yRestricted
000611421 909CO $$ooai:bib-pubdb1.desy.de:611421$$pVDB
000611421 9101_ $$0I:(DE-H253)_CFEL-20120731$$6P:(DE-H253)PIP1088388$$aCentre for Free-Electron Laser Science$$b2$$kCFEL
000611421 9101_ $$0I:(DE-588b)2008985-5$$6P:(DE-H253)PIP1088388$$aDeutsches Elektronen-Synchrotron$$b2$$kDESY
000611421 9131_ $$0G:(DE-HGF)POF4-633$$1G:(DE-HGF)POF4-630$$2G:(DE-HGF)POF4-600$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bForschungsbereich Materie$$lVon Materie zu Materialien und Leben$$vLife Sciences – Building Blocks of Life: Structure and Function$$x0
000611421 915__ $$0StatID:(DE-HGF)0580$$2StatID$$aPublished
000611421 9201_ $$0I:(DE-H253)CFEL-I-20161114$$kCFEL-I$$lFS-CFEL-1 (Group Leader: Henry Chapman)$$x0
000611421 980__ $$apreprint
000611421 980__ $$aVDB
000611421 980__ $$aI:(DE-H253)CFEL-I-20161114
000611421 980__ $$aUNRESTRICTED