ENRICH: EU Project 675324, Marie Skłodowska-Curie (MSCA) Innovative Training Network, 2016-2020
Greek Harvard Sentences
A Harvard-based corpus for speech technology and audiology
The current material consists of 720 sentences of variable syntactic structure, designed according to the
following criteria. Each sentence comprises exactly 5 keywords, which are almost always content words, and
optionally 1 to 4 non-keywords, which are pronouns and other function words; hence total sentence length
ranges strictly from 5 to 9 words. All words contain one, two or at most three syllables, and have been
selected so that the sentences are meaningful and resemble everyday language. Keywords have been combined so
that the sentences are semi-predictable. Although a number of the original Harvard sentences have been
translated into Greek, the majority of the sentences in the present corpus were composed originally in Greek.
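The design criteria above can be checked mechanically once each word is annotated. The following is a minimal sketch, assuming a hypothetical annotation in which every word carries a keyword flag and a syllable count (for instance derived from the phonetic transcription in the Excel file); the `Word` class and `check_sentence` function are illustrative names, not part of the corpus distribution.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One word of a sentence, annotated by hand (hypothetical format)."""
    text: str
    is_keyword: bool
    syllables: int


def check_sentence(words: list[Word]) -> list[str]:
    """Return the corpus criteria a sentence violates (empty list = conforms)."""
    problems = []
    n_keywords = sum(w.is_keyword for w in words)
    n_function = len(words) - n_keywords
    # Exactly 5 keywords per sentence.
    if n_keywords != 5:
        problems.append(f"expected 5 keywords, found {n_keywords}")
    # At most 4 non-keywords; together with the rule above this bounds length to 5-9 words.
    if n_function > 4:
        problems.append(f"expected at most 4 non-keywords, found {n_function}")
    # Every word has 1 to 3 syllables.
    for w in words:
        if not 1 <= w.syllables <= 3:
            problems.append(f"{w.text!r} has {w.syllables} syllables")
    return problems


# Placeholder tokens stand in for real Greek words.
sentence = [Word("f1", False, 1)] + [Word(f"k{i}", True, 2) for i in range(5)]
print(check_sentence(sentence))  # → []
```

Returning a list of violations rather than a single boolean makes it easier to see why a candidate sentence fails when screening many at once.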
This is still a work in progress; please check for updates and additional recordings.
You can find below:
A Word file containing the 720 sentences as they should appear to listeners.
An Excel file containing the Greek orthography, the phonetic transcription and important metadata such
as the number of words, syllables and phonemes per sentence, as well as which words are keywords and how
many syllables each keyword contains.
As the material is not yet balanced, this information may be useful when deciding which sentences to select
for intelligibility tests.
A ZIP file containing recordings of the material by one female and one male speaker (raw data in .wav format).
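Because the corpus is not yet balanced, the per-sentence metadata in the Excel file can guide selection. The sketch below shows one possible approach, not the author's procedure: filter sentences by word and syllable limits, then split them into test lists with similar total phoneme counts using a serpentine (snake-order) assignment. The `SentenceInfo` fields are hypothetical stand-ins for the spreadsheet columns; the rows could be loaded with, e.g., `pandas.read_excel` once the actual column headers are known.

```python
from dataclasses import dataclass


@dataclass
class SentenceInfo:
    """Per-sentence metadata; field names are hypothetical, not the Excel headers."""
    sid: int
    n_words: int
    n_syllables: int
    n_phonemes: int


def select(rows, max_words=9, max_syllables=None):
    """Keep sentences within a word limit and, optionally, a syllable limit."""
    return [
        r for r in rows
        if r.n_words <= max_words
        and (max_syllables is None or r.n_syllables <= max_syllables)
    ]


def make_lists(rows, n_lists):
    """Split sentences into n_lists test lists with similar total phoneme counts.

    Serpentine assignment: sort by phoneme count (descending), then deal rows to
    lists 0..k-1, then k-1..0, and so on, so no list receives all the long sentences.
    """
    ordered = sorted(rows, key=lambda r: r.n_phonemes, reverse=True)
    lists = [[] for _ in range(n_lists)]
    for i, row in enumerate(ordered):
        block, pos = divmod(i, n_lists)
        idx = pos if block % 2 == 0 else n_lists - 1 - pos
        lists[idx].append(row)
    return lists


# Synthetic rows for illustration only; real values come from the Excel file.
rows = [SentenceInfo(sid=i, n_words=5 + i % 5, n_syllables=10 + i, n_phonemes=20 + i)
        for i in range(10)]
for lst in make_lists(select(rows), n_lists=2):
    print([r.sid for r in lst], sum(r.n_phonemes for r in lst))
```

Serpentine dealing is a simple heuristic; it keeps list totals close but does not guarantee an optimal split, so other variables (e.g. keyword syllable counts) may also need to be checked per list.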
To cite this work:
Sfakianaki, A. "Designing a Modern Greek sentence corpus for audiological and speech technology research".
TO APPEAR: In Proc. of the 14th International Conference on Greek Linguistics (ICGL14), September 5-8, 2019, University
of Patras, Greece. [Unpublished PDF]