DeFTAN-AA: Array geometry agnostic
multichannel speech enhancement
[ Paper | Video ]

Dongheon Lee¹ and Jung-Woo Choi¹
¹School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea

Abstract: We propose an array-geometry-agnostic multichannel speech enhancement model that is trained on a single microphone array yet can enhance speech recorded by arrays with different shapes and numbers of microphones. To enable array-agnostic processing, the model employs a gated split dense block (GSDB) that separates foreground speech from background noise regardless of array geometry. To make the encoder compatible with different numbers of microphones, we introduce the spatial transformer (ST), which aggregates spatial information through channel-wise self-attention. The proposed space-object cross-attention (SOCA) block alleviates overfitting to a specific array configuration by applying cross-attention between spatial features and object features. Experimental results demonstrate the efficacy of the proposed model across various array geometries on both simulated and real-world datasets.
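To give a feel for why attention across the microphone axis can handle any number of microphones, here is a minimal NumPy sketch. It is not the DeFTAN-AA implementation: the function name `channel_self_attention`, the single-head formulation, the random projection matrices, and the mean pooling over microphones are all assumptions made for illustration. The point is only that the projection weights are shared by every microphone, so the same weights apply to a 4-microphone and a 6-microphone array, and the pooled result does not depend on microphone order.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def channel_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention across the microphone (channel) axis.

    x: (num_mics, feat_dim) features for one time-frequency position.
    w_q, w_k, w_v: (feat_dim, feat_dim) projections shared by all microphones.
    Returns a (feat_dim,) vector regardless of num_mics.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(x.shape[1])   # (num_mics, num_mics)
    attended = softmax(scores, axis=-1) @ v  # (num_mics, feat_dim)
    # Assumption: mean pooling over microphones, chosen only for illustration.
    return attended.mean(axis=0)


rng = np.random.default_rng(0)
feat_dim = 8
# Hypothetical weights; a trained model would learn these.
w_q, w_k, w_v = (rng.standard_normal((feat_dim, feat_dim)) for _ in range(3))

four_mics = rng.standard_normal((4, feat_dim))
six_mics = rng.standard_normal((6, feat_dim))

out_four = channel_self_attention(four_mics, w_q, w_k, w_v)
out_six = channel_self_attention(six_mics, w_q, w_k, w_v)
# Reversing the microphone order should not change the pooled output.
out_shuffled = channel_self_attention(four_mics[::-1], w_q, w_k, w_v)

print(out_four.shape, out_six.shape)
print(np.allclose(out_four, out_shuffled))
```

Both calls return a vector of length `feat_dim`, so downstream layers never see the microphone count. Because the attention weights are computed from the features themselves, no per-microphone parameter ties the model to one array layout.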

Overall architecture

Experimental results on the simulated dataset

Real-world experimental results

Real-world demo

Measured speech	Enhanced speech (Circular array)	Enhanced speech (Rectangular array)

Audio samples

#	Spectrogram and audio clips
1	Noisy	Reference	Circular	Rectangular
	Linear	Tetrahedron	Random	Clean
2	Noisy	Reference	Circular	Rectangular
	Linear	Tetrahedron	Random	Clean
3	Noisy	Reference	Circular	Rectangular
	Linear	Tetrahedron	Random	Clean