DeFTAN: Dense Frequency-Time Attentive Network for
Multichannel Speech Enhancement and Separation

Demo Videos

Demo Video 1: Real-world speech separation

Demo Video 2: Unified Auditory Scene Analysis and Separation

DeFTAN Series

The DeFTAN (Dense Frequency-Time Attentive Network) series comprises deep learning models for multichannel speech enhancement and separation that improve speech quality in noisy, reverberant environments by exploiting spatial, spectral, and temporal information. Each model addresses a specific challenge, such as real-time processing, array-geometry-agnostic enhancement, or universal source separation, applying state-of-the-art techniques to achieve high performance. This work extends Smart Sound System Lab's research to applications in multichannel sound source separation.

DeFT-AN
DeFT-AN: Dense frequency-time attentive network for multichannel speech enhancement
Authors: Dongheon Lee and Jung-Woo Choi | IEEE SPL & ICASSP 2023
DeFTAN-RT
DeFTAN-II
DeFTAN-II: Efficient multichannel speech enhancement with subgroup processing
Authors: Dongheon Lee and Jung-Woo Choi | IEEE/ACM TASLP 2024
DeFTAN-AA
DeFTAN-AA: Array Geometry Agnostic Multichannel Speech Enhancement
Authors: Dongheon Lee and Jung-Woo Choi | Interspeech 2024
DeFT-Mamba
DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification
Authors: Dongheon Lee and Jung-Woo Choi | Submitted to ICASSP 2025