DeFTMamba: Multichannel universal sound separation and polyphonic audio classification

Dongheon Lee¹ and Jung-Woo Choi¹
¹School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea

[Figure: overview]

Abstract: This paper presents a framework for universal sound separation and polyphonic audio classification that addresses the challenge of separating and classifying individual sound sources in a multichannel mixture. The proposed framework, DeFT-Mamba, combines the dense frequency-time attentive network (DeFTAN) with Mamba to extract sound objects, capturing local time-frequency relations with a gated convolution block and global time-frequency relations with a position-wise Hybrid Mamba. DeFT-Mamba surpasses existing separation and classification networks by a large margin, particularly in complex scenarios with in-class polyphony. Additionally, a classification-based source counting method is introduced to identify the presence of multiple sources, and it outperforms conventional threshold-based approaches. A separation refinement tuning step is also proposed to further improve performance. The proposed framework is trained and tested on a multichannel universal sound separation dataset developed in this work to mimic realistic environments with moving sources and varying onsets and offsets of polyphonic events.
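To make the contrast between threshold-based and classification-based source counting concrete, here is a minimal sketch. It is not the authors' implementation: the function names, the -40 dB energy threshold, and the idea of a dedicated "silence" class at index 0 are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math

def count_by_threshold(outputs, energy_db_threshold=-40.0):
    """Count separated tracks whose mean power in dB exceeds a fixed threshold.

    The threshold value is a hypothetical choice for illustration only.
    """
    count = 0
    for track in outputs:
        power = sum(x * x for x in track) / max(len(track), 1)
        power_db = 10.0 * math.log10(power + 1e-12)  # small offset avoids log(0)
        if power_db > energy_db_threshold:
            count += 1
    return count

def count_by_classification(class_probs, silence_index=0):
    """Count separated tracks whose most likely class is not 'silence'.

    Assumes a classifier that outputs per-track class probabilities and
    reserves one index for inactive outputs (an assumption of this sketch).
    """
    count = 0
    for probs in class_probs:
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != silence_index:
            count += 1
    return count

# Three separated output tracks: one real source, one low-level leakage, one empty.
outputs = [
    [0.5, -0.5, 0.5, -0.5],
    [0.02, -0.02, 0.02, -0.02],
    [0.0, 0.0, 0.0, 0.0],
]
# Hypothetical classifier probabilities per track: [silence, class A, class B].
class_probs = [
    [0.05, 0.90, 0.05],
    [0.80, 0.10, 0.10],
    [0.95, 0.03, 0.02],
]

threshold_count = count_by_threshold(outputs)
classification_count = count_by_classification(class_probs)
```

In this toy example the leakage track sits above the energy threshold and is counted as a source, while the classifier labels it as silence. This illustrates one way a fixed threshold can miscount; the actual behavior of either approach depends on the model and data.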


Dataset explanation




Universal sound separation performance comparison



Polyphonic audio classification performance comparison



Source counting performance comparison

Number of sound sources = 2
Mixed sound Ground truth 1 Ground truth 2
SpatialNet
DeFTMamba
Number of sound sources = 3
Mixed sound Ground truth 1 Ground truth 2 Ground truth 3
SpatialNet
DeFTMamba
Number of sound sources = 4
Mixed sound Ground truth 1 Ground truth 2 Ground truth 3 Ground truth 4
SpatialNet
DeFTMamba
In-class polyphony audio (source 1 & 2)
Mixed sound Ground truth 1 Ground truth 2 Ground truth 3
SpatialNet
DeFTMamba