StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Lemercier, Jean-Marie; Welker, Simon; Richter, Julius; Gerkmann, Timo

doi:10.1109/TASLP.2023.3294692

Marc 21

001			601120
005			20250724132703.0
024	7	_	\|a 10.1109/TASLP.2023.3294692 \|2 doi
024	7	_	\|a 2329-9290 \|2 ISSN
024	7	_	\|a 2329-9304 \|2 ISSN
024	7	_	\|a 10.3204/PUBDB-2024-00134 \|2 datacite_doi
024	7	_	\|a arXiv:2212.11851 \|2 arXiv
024	7	_	\|a altmetric:140463270 \|2 altmetric
024	7	_	\|a WOS:001037791600002 \|2 WOS
024	7	_	\|a openalex:W4384080510 \|2 openalex
037	_	_	\|a PUBDB-2024-00134
041	_	_	\|a English
082	_	_	\|a 400
088	_	_	\|a arXiv:2212.11851 \|2 arXiv
100	1	_	\|a Lemercier, Jean-Marie \|0 0000-0002-8704-7658 \|b 0
245	_	_	\|a StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
260	_	_	\|a New York, NY \|c 2023 \|b IEEE
336	7	_	\|a article \|2 DRIVER
336	7	_	\|a Output Types/Journal article \|2 DataCite
336	7	_	\|a Journal Article \|b journal \|m journal \|0 PUB:(DE-HGF)16 \|s 1719573399_2382625 \|2 PUB:(DE-HGF)
336	7	_	\|a ARTICLE \|2 BibTeX
336	7	_	\|a JOURNAL_ARTICLE \|2 ORCID
336	7	_	\|a Journal Article \|0 0 \|2 EndNote
500	_	_	\|a ISSN 2329-9304 not unique: 2 hits.
520	_	_	\|a Diffusion models have shown a great ability to bridge the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive corruption types or when they are evaluated on mismatched conditions. However, diffusion models suffer from a high computational burden, mainly because they require running a neural network for each reverse diffusion step, whereas predictive approaches only require one pass. As diffusion models are generative approaches, they may also produce vocalizing and breathing artifacts in adverse conditions. In comparison, in such difficult scenarios, predictive models typically do not produce such artifacts but tend to distort the target speech instead, thereby degrading the speech quality. In this work, we present a stochastic regeneration approach where an estimate given by a predictive model is provided as a guide for further diffusion. We show that the proposed approach uses the predictive model to remove the vocalizing and breathing artifacts while producing very high-quality samples thanks to the diffusion model, even in adverse conditions. We further show that this approach enables the use of lighter sampling schemes with fewer diffusion steps without sacrificing quality, thus lifting the computational burden by an order of magnitude. Source code and audio examples are available online.
536	_	_	\|a 633 - Life Sciences – Building Blocks of Life: Structure and Function (POF4-633) \|0 G:(DE-HGF)POF4-633 \|c POF4-633 \|f POF IV \|x 0
536	_	_	\|a HIDSS-0002 - DASHH: Data Science in Hamburg - Helmholtz Graduate School for the Structure of Matter (2019_IVF-HIDSS-0002) \|0 G:(DE-HGF)2019_IVF-HIDSS-0002 \|c 2019_IVF-HIDSS-0002 \|x 1
588	_	_	\|a Dataset connected to CrossRef, Journals: bib-pubdb1.desy.de
693	_	_	\|0 EXP:(DE-MLZ)NOSPEC-20140101 \|5 EXP:(DE-MLZ)NOSPEC-20140101 \|e No specific instrument \|x 0
700	1	_	\|a Richter, Julius \|0 0000-0002-7870-4839 \|b 1
700	1	_	\|a Welker, Simon \|0 P:(DE-H253)PIP1088388 \|b 2
700	1	_	\|a Gerkmann, Timo \|0 0000-0002-8678-4699 \|b 3 \|e Corresponding author
773	_	_	\|a 10.1109/TASLP.2023.3294692 \|g Vol. 31, p. 2724 - 2737 \|0 PERI:(DE-600)2737756-8 \|p 2724 - 2737 \|t IEEE ACM transactions on audio, speech, and language processing \|v 31 \|y 2023 \|x 2329-9290
787	0	_	\|a Lemercier, Jean-Marie et al. \|d 2023 \|i IsParent \|0 PUBDB-2024-04907 \|r arXiv:2212.11851 \|t StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
856	4	_	\|u https://bib-pubdb1.desy.de/record/601120/files/2212.11851v1.pdf \|y OpenAccess
856	4	_	\|u https://bib-pubdb1.desy.de/record/601120/files/StoRM_A_Diffusion-Based_Stochastic_Regeneration_Model_for_Speech_Enhancement_and_Dereverberation.pdf \|y Restricted
856	4	_	\|u https://bib-pubdb1.desy.de/record/601120/files/2212.11851v1.pdf?subformat=pdfa \|x pdfa \|y OpenAccess
856	4	_	\|u https://bib-pubdb1.desy.de/record/601120/files/StoRM_A_Diffusion-Based_Stochastic_Regeneration_Model_for_Speech_Enhancement_and_Dereverberation.pdf?subformat=pdfa \|x pdfa \|y Restricted
909	C	O	\|o oai:bib-pubdb1.desy.de:601120 \|p openaire \|p open_access \|p VDB \|p driver \|p dnbdelivery
910	1	_	\|a Centre for Free-Electron Laser Science \|0 I:(DE-H253)_CFEL-20120731 \|k CFEL \|b 2 \|6 P:(DE-H253)PIP1088388
910	1	_	\|a Deutsches Elektronen-Synchrotron \|0 I:(DE-588b)2008985-5 \|k DESY \|b 2 \|6 P:(DE-H253)PIP1088388
913	1	_	\|a DE-HGF \|b Forschungsbereich Materie \|l Von Materie zu Materialien und Leben \|1 G:(DE-HGF)POF4-630 \|0 G:(DE-HGF)POF4-633 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-600 \|4 G:(DE-HGF)POF \|v Life Sciences – Building Blocks of Life: Structure and Function \|x 0
914	1	_	\|y 2023
915	_	_	\|a OpenAccess \|0 StatID:(DE-HGF)0510 \|2 StatID
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0200 \|2 StatID \|b SCOPUS
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0300 \|2 StatID \|b Medline
915	_	_	\|a JCR \|0 StatID:(DE-HGF)0100 \|2 StatID \|b IEEE-ACM T AUDIO SPE : 2015
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0199 \|2 StatID \|b Thomson Reuters Master Journal List
915	_	_	\|a WoS \|0 StatID:(DE-HGF)0111 \|2 StatID \|b Science Citation Index Expanded
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0150 \|2 StatID \|b Web of Science Core Collection
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1160 \|2 StatID \|b Current Contents - Engineering, Computing and Technology
915	_	_	\|a IF < 5 \|0 StatID:(DE-HGF)9900 \|2 StatID
920	1	_	\|0 I:(DE-H253)CFEL-I-20161114 \|k CFEL-I \|l FS-CFEL-1 (Group Leader: Henry Chapman) \|x 0
920	1	_	\|0 I:(DE-H253)FS-CFEL-1-CFEL-20210408 \|k FS-CFEL-1-CFEL \|l FS-CFEL-1 Fachgruppe CFEL-Infrastrukt. \|x 1
980	_	_	\|a journal
980	_	_	\|a VDB
980	_	_	\|a I:(DE-H253)CFEL-I-20161114
980	_	_	\|a I:(DE-H253)FS-CFEL-1-CFEL-20210408
980	_	_	\|a UNRESTRICTED
980	1	_	\|a FullTexts
