Central and Eastern European Survey
Department of Intelligent Systems
Jozef Stefan Institute Resources
NL and Speech Resources available at the organisation: Textual, Software, Lexical resources.
Name: MULTEXT-East aligned corpus
Nature: manually validated PoS-tagged text
Language: Multilingual: English + 6 CEE languages
Size: 7 x 100k words
Format: TEI
Coverage: Orwell's 1984
Medium: Internet
Availability: free for research purposes

Name: IJS-ELAN parallel corpus
Nature: sentence-aligned and automatically PoS-tagged text
Language: Slovene + English
Size: 2 x 500k words
Format: TEI
Coverage: 15 terminology-rich texts: economy, computers, etc.
Medium: Internet
Availability: free

Name: Slovene MULTEXT-East Lexicon
Nature: lexical
Language: Slovene
Size: 15,000 lemmas, full inflectional paradigms
Format: ASCII, tabular list of wordform / lemma / morphosyntactic description
Coverage: MULTEXT-East Slovene corpus
Medium: Internet
Availability: free for research purposes

Name: Slovene Diphone Database
Nature: Speech
Language: Slovene
Size: 1224 diphones, ca. 5 MB
Format: Binary, RAW format, 16 kHz sampling rate
Coverage: full
Medium: Machine-readable form
Availability: by arrangement

Name: Slovene Readings
Nature: Speech
Language: Slovene
Size: 1000 utterances, ca. 40 MB
Format: Binary, 19.8 kHz sampling rate
Coverage: mainly declarative, also questions and imperative sentences
Medium: Machine-readable form
Availability: free for research purposes
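
The Slovene MULTEXT-East Lexicon entry above describes its format as a tabular list of wordform / lemma / morphosyntactic description. A minimal Python sketch of reading such a list is given here. It assumes three tab-separated columns per line, which the entry does not state explicitly; the sample rows and the names LexiconEntry, parse_lexicon and group_by_lemma are illustrative only and are not taken from the lexicon or its tools.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class LexiconEntry(NamedTuple):
    wordform: str
    lemma: str
    msd: str  # morphosyntactic description


def parse_lexicon(lines: Iterable[str], sep: str = "\t") -> List[LexiconEntry]:
    """Parse lines of the form wordform<sep>lemma<sep>MSD.

    The separator is an assumption; adjust it to the actual file.
    Blank lines are skipped, malformed lines raise ValueError.
    """
    entries = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split(sep)
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(fields)}")
        entries.append(LexiconEntry(*(field.strip() for field in fields)))
    return entries


def group_by_lemma(entries: Iterable[LexiconEntry]) -> Dict[str, List[LexiconEntry]]:
    """Collect all wordforms of each lemma, i.e. its inflectional paradigm."""
    paradigms = defaultdict(list)
    for entry in entries:
        paradigms[entry.lemma].append(entry)
    return dict(paradigms)


# Illustrative rows only, not copied from the lexicon.
SAMPLE = [
    "miza\tmiza\tNcfsn\n",
    "mize\tmiza\tNcfsg\n",
    "\n",
    "mizo\tmiza\tNcfsa\n",
]

if __name__ == "__main__":
    for lemma, forms in group_by_lemma(parse_lexicon(SAMPLE)).items():
        print(lemma, [(f.wordform, f.msd) for f in forms])
```

Grouping by lemma mirrors the "full inflectional paradigms" mentioned in the Size field: every wordform sharing a lemma ends up in one list.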
Software description:
Slovene TTS system: includes a grapheme-to-phoneme module performing
direct grapheme-to-phoneme translation, a diphone database, a module
for micro- and macro-prosody determination, and a module for
concatenation of speech units. The system is free for research purposes.
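
To illustrate the last stage listed for the TTS system, the following Python sketch shows one generic way of turning a phoneme sequence into diphone names and joining the corresponding waveforms. It is not the institute's implementation: the boundary symbol "_", the "a-b" naming scheme, the linear crossfade and all function names are assumptions made for this example, and prosody determination is left out entirely.

```python
from typing import Dict, List, Sequence

SILENCE = "_"  # hypothetical symbol for the utterance boundary


def phonemes_to_diphones(phonemes: Sequence[str]) -> List[str]:
    """Map a phoneme sequence to names of adjacent-pair units."""
    padded = [SILENCE, *phonemes, SILENCE]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def concatenate_units(units: Sequence[Sequence[float]], overlap: int = 0) -> List[float]:
    """Join sample sequences, crossfading `overlap` samples at each joint."""
    out: List[float] = []
    for unit in units:
        samples = list(unit)
        n = min(overlap, len(out), len(samples))
        for i in range(n):
            weight = (i + 1) / (n + 1)  # rises linearly towards the new unit
            idx = len(out) - n + i
            out[idx] = (1 - weight) * out[idx] + weight * samples[i]
        out.extend(samples[n:])
    return out


def synthesize(phonemes: Sequence[str],
               database: Dict[str, Sequence[float]],
               overlap: int = 0) -> List[float]:
    """Look up each diphone in `database` and concatenate the waveforms."""
    names = phonemes_to_diphones(phonemes)
    missing = [name for name in names if name not in database]
    if missing:
        raise KeyError(f"diphones not in database: {missing}")
    return concatenate_units([database[name] for name in names], overlap)


if __name__ == "__main__":
    # Toy database with constant-valued "waveforms", for illustration only.
    toy = {"_-m": [0.0, 0.0], "m-i": [1.0, 1.0], "i-_": [0.5, 0.5]}
    print(phonemes_to_diphones(["m", "i"]))
    print(synthesize(["m", "i"], toy, overlap=1))
```

With a real database, each value would hold the decoded samples of one stored diphone instead of the toy constants used here.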
Web concordancer: consists of a Perl CGI script and associated HTML
pages. Uses the IMS CQP system as the corpus processing back-end.
Enables searches on marked-up and parallel text. The interface is
freely available.
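
As a rough illustration of what a concordance search over parallel text returns, here is a small keyword-in-context sketch in Python. It does not use Perl, CGI or IMS CQP and is not the concordancer's code: it works on an in-memory list of aligned sentence pairs, and the sample sentences, the Hit type and the concordance function are invented for this example.

```python
from typing import List, NamedTuple, Sequence, Tuple

PUNCTUATION = ".,;:!?\"'()"


class Hit(NamedTuple):
    left: str         # context before the keyword
    keyword: str      # the matching token as it appears in the text
    right: str        # context after the keyword
    translation: str  # the aligned sentence in the other language


def concordance(pairs: Sequence[Tuple[str, str]], query: str, width: int = 3) -> List[Hit]:
    """Return keyword-in-context hits for `query` in the source sentences.

    Matching is case-insensitive on whitespace-separated tokens with
    surrounding punctuation stripped; `width` is the number of context
    tokens kept on each side.
    """
    wanted = query.lower()
    hits = []
    for source, target in pairs:
        tokens = source.split()
        for i, token in enumerate(tokens):
            if token.strip(PUNCTUATION).lower() == wanted:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append(Hit(left, token, right, target))
    return hits


# Invented English/Slovene sentence pairs, not taken from any corpus.
PAIRS = [
    ("The system reads the file.", "Sistem prebere datoteko."),
    ("The file is large.", "Datoteka je velika."),
]

if __name__ == "__main__":
    for hit in concordance(PAIRS, "file"):
        print(f"{hit.left:>20} [{hit.keyword}] {hit.right:<20} | {hit.translation}")
```

A corpus query engine would additionally allow searching on annotation such as PoS tags, which this token-matching sketch does not attempt.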