Causal Diffusion Models for Generalized Speech Enhancement

Richter, Julius; Gerkmann, Timo; Welker, Simon; Peer, Tal; Lemercier, Jean-Marie; Lay, Bunlong

doi:10.1109/OJSP.2024.3379070

Journal Article

PUBDB-2024-06733

Richter, J. (Corresponding author) ; Welker, S. (CFEL, DESY) ; Lemercier, J.-M. ; Lay, B. ; Peer, T. ; Gerkmann, T. (Corresponding author)

2024
IEEE [New York, NY]

IEEE Open Journal of Signal Processing 5, 780-789 (2024) [10.1109/OJSP.2024.3379070]

Please use a persistent id in citations: doi:10.1109/OJSP.2024.3379070 doi:10.3204/PUBDB-2024-06733

Abstract: In this work, we present a causal speech enhancement system that is designed to handle different types of corruptions. This paper is an extended version of our contribution to the “ICASSP 2023 Speech Signal Improvement Challenge”. The method is based on a generative diffusion model which has been shown to work well in scenarios beyond speech-in-noise, such as missing data and non-additive corruptions. We guarantee causal processing with an algorithmic latency of 20 ms by modifying the network architecture and removing non-causal normalization techniques. To train and test our model, we generate a new corrupted speech dataset which includes additive background noise, reverberation, clipping, packet loss, bandwidth reduction, and codec artifacts. We compare the causal and non-causal versions of our method to investigate the impact of causal processing and we assess the gap between specialized models trained on a particular corruption type and the generalized model trained on all corruptions. Although specialized models and non-causal models have a small advantage, we show that the generalized causal approach does not suffer from a significant performance penalty, while it can be flexibly employed for real-world applications where different types of distortions may occur.
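To make the listed corruption types concrete, here is a minimal NumPy sketch of four of them. It is not the authors' data-generation pipeline. The function names (`add_noise`, `clip`, `packet_loss`, `bandlimit`) are hypothetical, and several modelling choices are assumptions: SNR is defined as a mean-power ratio, packet loss means zeroing whole frames at random, and bandwidth reduction is an FFT brick-wall low-pass. The 16 kHz sample rate and 20 ms loss frames in the usage lines are also illustrative. Reverberation and codec artifacts are left out because they need room impulse responses and a real codec.

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Scale `noise` so that mean(speech^2) / mean(scaled_noise^2) equals snr_db (in dB).
    Assumes `noise` is at least as long as `speech`."""
    noise = noise[: len(speech)]
    p_s = np.mean(speech ** 2)
    p_n = np.mean(noise ** 2)
    scale = np.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return speech + scale * noise

def clip(x, threshold):
    """Hard-clip the waveform to the range [-threshold, threshold]."""
    return np.clip(x, -threshold, threshold)

def packet_loss(x, frame_len, loss_prob, rng):
    """Assumed loss model: each frame of `frame_len` samples is zeroed
    independently with probability `loss_prob`."""
    y = x.copy()
    for start in range(0, len(y), frame_len):
        if rng.random() < loss_prob:
            y[start:start + frame_len] = 0.0
    return y

def bandlimit(x, fs, cutoff_hz):
    """Assumed bandwidth reduction: zero all FFT bins above `cutoff_hz`."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spec, n=len(x))

# Illustrative chain on a synthetic 1 s tone standing in for speech.
rng = np.random.default_rng(0)
fs = 16000                                  # assumed sample rate
t = np.arange(fs) / fs
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(fs)
corrupted = add_noise(clean, noise, 5.0)
corrupted = clip(corrupted, 0.4)
corrupted = packet_loss(corrupted, 320, 0.1, rng)   # 320 samples = 20 ms at 16 kHz
corrupted = bandlimit(corrupted, fs, 4000.0)
```

Applying the functions in sequence shows how a generalized training set can mix several distortions in a single example, whereas a specialized model's data would apply only one of them.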

Classification:

ddc:621.3

Experiment(s):

No specific instrument

Appears in the scientific report 2024

Database coverage:
Medline ; Creative Commons Attribution-NonCommercial-NoDerivs CC BY-NC-ND 4.0 ; Article Processing Charges ; Clarivate Analytics Master Journal List ; DOAJ Seal ; Emerging Sources Citation Index ; Fees ; IF < 5 ; JCR ; SCOPUS ; Web of Science Core Collection

Record created 2024-11-15, last modified 2025-07-23