ENRICH

IEEE Letter Demo: Fully recurrent feature extraction for single-channel speech enhancement

Mr. Muhammed Shifas PV
Speech Signal Processing Lab (SSPL)
University of Crete (UoC), Greece

Email: shifaspv@csd.uoc.gr


Abstract: Convolutional neural network (CNN) modules are widely used to build high-end speech enhancement neural models. However, the feature extraction power of vanilla CNN modules is limited by the dimensionality constraint on the convolutional kernels that can be integrated, so they fail to adequately model noise context information at the feature extraction stage. To this end, by adding a recurrency factor into the feature-extracting CNN layers, we introduce a robust context-aware feature extraction strategy for single-channel speech enhancement. Being robust in capturing the local statistics of noise attributes in the speech spectra, the suggested model is highly effective at differentiating speech cues, even in very noisy conditions. When evaluated against enhancement models using vanilla CNN modules under unseen noise conditions, the suggested model with recurrency in the feature extraction layers produced a segmental SNR (SSNR) gain of up to 1.5 dB, while the number of parameters to be optimized is reduced by 25%.
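
For illustration only, the following is a minimal PyTorch-style sketch of one way to add GRU-style recurrence inside a convolutional feature-extraction layer, as the abstract describes. The class name ConvGRUCell, the 1-D convolution layout, and all parameter names are assumptions made for this sketch; they are not the implementation behind the gruCNN-SE model.

    import torch
    import torch.nn as nn

    class ConvGRUCell(nn.Module):
        # One feature-extraction layer with GRU-style gating: the
        # convolutions see the current spectral frame together with a
        # hidden state carried over from previous frames, so noise
        # context accumulates across time while kernels stay small.
        def __init__(self, in_channels, hidden_channels, kernel_size=3):
            super().__init__()
            padding = kernel_size // 2
            self.update_gate = nn.Conv1d(in_channels + hidden_channels,
                                         hidden_channels, kernel_size,
                                         padding=padding)
            self.reset_gate = nn.Conv1d(in_channels + hidden_channels,
                                        hidden_channels, kernel_size,
                                        padding=padding)
            self.candidate = nn.Conv1d(in_channels + hidden_channels,
                                       hidden_channels, kernel_size,
                                       padding=padding)

        def forward(self, x, h):
            # x: (batch, in_channels, freq_bins)      current time frame
            # h: (batch, hidden_channels, freq_bins)  previous hidden state
            xh = torch.cat([x, h], dim=1)
            z = torch.sigmoid(self.update_gate(xh))  # update gate
            r = torch.sigmoid(self.reset_gate(xh))   # reset gate
            h_cand = torch.tanh(self.candidate(torch.cat([x, r * h], dim=1)))
            return (1 - z) * h + z * h_cand

    # Usage: run the cell frame by frame over a spectrogram.
    # spec: (batch, in_channels, freq_bins, time_frames)
    # h = torch.zeros(batch, hidden_channels, freq_bins)
    # for t in range(time_frames):
    #     h = cell(spec[..., t], h)

In a sketch like this, the recurrence lets each feature-extraction layer condition on the noise statistics of earlier frames without enlarging the convolutional kernels themselves.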


A few samples from the trained models are presented below:


[Audio samples, in order: noisy speech, CNN_FC-SE, CNN_LSTM-SE, gruCNN-SE, clean speech]

Acknowledgment: This work was funded by the E.U. Horizon 2020 Grant Agreement No. 675324, Marie Skłodowska-Curie Innovative Training Network, ENRICH.