Completed Ph.D. theses

The intellectual property of the theses is jointly owned by the University of Crete and ICS-FORTH or by the University of Crete and Orange Labs, where indicated.

2022

Muhammed Shifas P.V., Neural Networks for the Quality and Intelligibility Enhancement of Speech [PDF] - funded by Horizon2020

Abstract: Speech is the most effective way to communicate ideas generated in human minds. However, spoken communication in real life is often affected by noise in the surroundings, which can substantially reduce the intelligibility and perceived quality of the signal. Techniques to enhance the communication have been proposed in the past and successfully tested in modern engines like Amazon Alexa, allowing it to operate in adverse conditions. The ambient noise can disrupt both signal acquisition by a device and speech perception by the listener. Speech enhancement (SE) techniques are developed to restore speech from its disrupted observations, and listening enhancement (LE) techniques are designed to improve the perceived intelligibility by altering the speech before its presentation in noise, as naturally produced speech is not always very intelligible. Often SE and LE systems are operated as two independent modules in modern devices, which limits their performance. The effort in this thesis is to combine the SE and LE techniques into an end-to-end system for communication applications. We approach the problem from a neural network perspective. As such, multiple novel architectures for SE and LE were developed, and the concepts from those models have been used to build the final end-to-end system.

2015

Maria C. Koutsogiannaki, Intelligibility enhancement of Casual speech based on Clear speech properties [PDF] - funded by ICS-FORTH

Abstract: In adverse listening conditions (e.g., presence of noise, a hearing-impaired listener, etc.) people adjust their speech in order to overcome the communication difficulty and successfully deliver their message. This remarkable adjustment produces speaking styles that differ from unobstructed speech (casual speech) and vary among speakers and conditions, but share a common characteristic: high intelligibility. Developing algorithms that exploit acoustic features of intelligible human speech could be beneficial for speech technology applications that seek methods to enhance the intelligibility of “speaking-devices”. Besides the commercial orientation of these applications (e.g., mobile telephony, GPS, customer service systems), most important is their medical context, providing assistive communication to people with speech or hearing deficits. However, current speech technology is deaf, meaning that it cannot adjust, as humans do, to dynamically changing real environments or to the listener’s specificity.

2014

George P. Kafentzis, Adaptive Sinusoidal Models for Speech with Applications in Speech Modifications and Audio Analysis [PDF] - funded by Orange Labs

Abstract: Sinusoidal modeling is one of the most widely used parametric methods for speech and audio signal processing. The accurate estimation of sinusoidal parameters (amplitudes, frequencies, and phases) is critical for a close representation of the analyzed signal. In this thesis, based on recent advances in sinusoidal analysis, we propose high-resolution adaptive sinusoidal models for analysis, synthesis, and modification systems for speech. Our goal is to provide systems that represent speech in a highly accurate and compact way.

2011

Maria Markaki, Selection of Relevant Features for Audio Classification tasks [PDF] - funded by ICS-FORTH

Abstract: Advances in time-frequency distributions and spectral analysis techniques (i.e., for the estimation of amplitude and/or frequency modulations) allow a better representation of non-stationary signals like speech, highlighting their fine structure and dynamics. Although such representations are very useful for analysis purposes, they complicate the classification tasks due to the large number of parameters extracted from the signal (“curse of dimensionality”). For such tasks, a significant dimensionality reduction is required.

2010

Andre Holzapfel, Similarity methods for computational ethnomusicology [PDF] - funded by ICS-FORTH

Abstract: The field of computational ethnomusicology has drawn growing attention from researchers in the music information retrieval community. In general, it considers subjects related to the processing of traditional forms of music, often with the goal of supporting studies in the field of musicology with computational means. Tools have been proposed that make access to large digital collections of traditional music easier, for example by automatically detecting a specific kind of similarity between pieces or by automatically segmenting data into partitions that are either relevant or irrelevant for further investigation. In this thesis, the focus lies on music of the Eastern Mediterranean, and specifically on traditional music of Greece and Turkey. At the beginning of the thesis work, the task was defined, which directed the necessary research activities.

Yannis Pantazis, Decomposition of AM-FM signals with applications in speech processing [PDF] - funded by ICS-FORTH/Orange Labs

Abstract: During the last decades, the sinusoidal model has gained a lot of popularity since it is able to represent non-stationary signals very accurately. The estimation of the instantaneous components (i.e., instantaneous amplitude, instantaneous frequency, and instantaneous phase) is an active area of research. In this thesis, we develop and test models and algorithms for the estimation of the instantaneous components of the sinusoidal representation. Our goal is to reduce the estimation error due to the non-stationary character of the analyzed signals by taking advantage of time-domain information. Thus, we re-introduce a time-varying model referred to as QHM, which is able to adjust its frequency values closer to the true frequency values. We further show that an iterative scheme based on QHM produces statistically efficient sinusoidal parameter estimation. Moreover, we extend QHM to chirp QHM (cQHM), which is able to capture linear evolution of the instantaneous frequency quite satisfactorily.

2007

Yannis Agiomyrgiannakis, Sinusoidal Coding of Speech for Voice over IP [PDF] - funded by ICS-FORTH

Abstract: It is widely accepted that Voice-over-Internet-Protocol (VoIP) will dominate wireless and wireline voice communications in the near future. Traditionally, a minimum level of Quality-of-Service is achieved by careful traffic monitoring and network fine-tuning. However, this solution is not feasible when there is no possibility of controlling or monitoring the parameters of the network. For example, when speech traffic is routed through the Internet, there are increased packet losses due to network delays and the strict end-to-end delay requirements for voice communication. Most of today’s speech codecs were not initially designed to cope with such conditions. One solution is to introduce channel coding at the expense of end-to-end delay. Another solution is to perform joint source/channel coding of speech by designing speech codecs which are natively robust to increased packet losses. This thesis proposes a framework for developing speech codecs which are robust to packet losses.