DeFTAN-AA: Array geometry agnostic
multichannel speech enhancement
[ Paper | Video ]

Dongheon Lee¹ and Jung-Woo Choi¹
¹School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea


Abstract: We propose an array-geometry-agnostic multichannel speech enhancement model that is trained on a single microphone array but can enhance speech recorded by arrays with different shapes and numbers of microphones. To enable array-agnostic processing, the model employs a gated split dense block (GSDB) that separates foreground speech from background noise regardless of array geometry. Furthermore, to design an array-agnostic encoder compatible with different numbers of microphones, we introduce the spatial transformer (ST), which aggregates spatial information through channel-wise self-attention. The proposed space-object cross-attention (SOCA) block alleviates overfitting to a specific array configuration through cross-attention between spatial features and object features. Experimental results demonstrate the efficacy of the proposed model across various array geometries on both simulated and real-world datasets.
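
To illustrate why channel-wise self-attention suits a variable number of microphones, the following is a minimal NumPy sketch, not the DeFTAN-AA implementation. It assumes one feature vector per microphone, a single attention head with random projection weights, and mean pooling over the channel axis as the aggregation step; the function name `channel_self_attention` and all shapes are hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_self_attention(x, w_q, w_k, w_v):
    """Attend across microphones, then pool (hypothetical sketch).

    x: (M, D) array with one D-dimensional feature vector per microphone.
    Returns a (D,) vector whose size does not depend on M.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # each (M, D)
    scores = q @ k.T / np.sqrt(k.shape[-1])    # (M, M) microphone-to-microphone scores
    attended = softmax(scores, axis=-1) @ v    # (M, D)
    # Assumption: mean pooling over microphones as the aggregation step.
    return attended.mean(axis=0)               # (D,)

rng = np.random.default_rng(0)
D = 16
# Random projection weights stand in for learned parameters.
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

# The same weights process a 4-microphone and a 6-microphone input.
out_4 = channel_self_attention(rng.standard_normal((4, D)), w_q, w_k, w_v)
out_6 = channel_self_attention(rng.standard_normal((6, D)), w_q, w_k, w_v)
print(out_4.shape, out_6.shape)
```

Because the attention weights are computed between microphones and no positional encoding over the channel axis is used in this sketch, the pooled output has a fixed size for any microphone count and does not change when the microphones are reordered.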


Overall architecture



Experimental results on the simulated dataset



Real-world experimental results


Real-world demo


Measured speech | Enhanced speech (Circular array) | Enhanced speech (Rectangular array)

Audio samples


Sample | Spectrogram and audio clips
1 | Noisy, Reference, Circular, Rectangular, Linear, Tetrahedron, Random, Clean
2 | Noisy, Reference, Circular, Rectangular, Linear, Tetrahedron, Random, Clean
3 | Noisy, Reference, Circular, Rectangular, Linear, Tetrahedron, Random, Clean