ENRICH

IS2018 Demo: Speech intelligibility enhancement based on a non-causal Wavenet-like model

Mr. Muhammed Shifas PV
Speech Signal Processing Lab (SSPL)
University of Crete (UoC), Greece

Email: shifaspv@csd.uoc.gr


Abstract: Low speech intelligibility in noisy listening conditions makes communication with others more difficult. Various strategies have been suggested to modify a speech signal before it is presented in a noisy listening environment, with the goal of increasing its intelligibility. A state-of-the-art approach, referred to as Spectral Shaping and Dynamic Range Compression (SSDRC), relies on modifying the spectral and temporal structure of clean speech and has been shown to considerably improve the intelligibility of speech in noisy listening conditions. In this paper, we present a non-causal Wavenet-like model for mapping clean speech samples to samples generated by SSDRC. A successful non-linear mapping function has the potential to be used a) to improve the intelligibility of noisy speech and b) in Wavenet-based speech synthesizers as a model-based intelligibility improvement layer. Objective and subjective results show that the Wavenet-based mapping function reproduces the intelligibility gains of SSDRC while substantially improving the quality of the modified signal compared with the quality obtained by SSDRC.
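
To illustrate what "non-causal" means for a Wavenet-like model, here is a minimal sketch of a dilated convolution whose output at each sample depends on both past and future input samples. It is not the model described in the abstract: the function names, the kernel size of 3, and the dilation values are assumptions chosen for illustration only.

```python
import numpy as np


def noncausal_dilated_conv(x, weights, dilation):
    """Apply a 1-D dilated filter centred on each sample (illustrative only).

    x        : 1-D array of speech samples
    weights  : odd-length sequence of filter taps
    dilation : spacing, in samples, between neighbouring taps
    Zero padding on both sides keeps the output the same length as x.
    """
    x = np.asarray(x, dtype=float)
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("an odd kernel size keeps the filter centred in time")
    half = (k // 2) * dilation
    padded = np.pad(x, (half, half))  # symmetric padding: past and future
    out = np.zeros(len(x))
    for i, w in enumerate(weights):
        offset = i * dilation
        # Tap i reads the input at position n + (i - k // 2) * dilation.
        out += w * padded[offset:offset + len(x)]
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by a stack of dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    impulse = np.zeros(17)
    impulse[8] = 1.0
    response = noncausal_dilated_conv(impulse, [1.0, 1.0, 1.0], dilation=2)
    # Non-zero outputs appear both before and after the impulse position.
    print(np.nonzero(response)[0])
    # Hypothetical stack of three layers with dilations 1, 2 and 4.
    print(receptive_field(3, [1, 2, 4]))
```

A causal layer would pad only on the left, so no output sample would depend on later input. Padding symmetrically lets the filter look ahead, which is acceptable when the complete clean utterance is available before processing.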


A few samples from the trained model are presented below:


Plain speech    SSDRC    wSSDRC

Acknowledgment: This work was funded by the E.U. Horizon2020 Grant Agreement 675324, Marie Sklodowska-Curie Innovative Training Network, ENRICH.