Language and Speech Resources
  
     Language and speech resources are of crucial importance for research
      and development in language and speech technology. ELSNET aims to
      create and distribute pilot resources for experimentation purposes,
      and acts as a platform for the exchange of expertise across languages
      and for the discussion of emerging standards. ELSNET collaborates
      closely with the main organisations in the field of language resources.
    
    ELSNET, in close collaboration with the former ENABLER Network, is
      building a map of the Resources Landscape. This map is intended to
      make it easier to identify and access Language Resources: surveys,
      metadata, networks, projects, and so on.
    
    The first release of the landscape is now available at
      http://www.ilc.cnr.it/elsnet4/
     
      - The European Corpus Initiative Multilingual Corpus I
      
 
      - The ECI/MCI CD-ROM contains over 98 million words, covering most
        of the major European languages as well as Turkish, Japanese,
        Russian, Chinese, Malay and others. The primary focus of this effort
        is on textual material of all kinds, including transcriptions of
        spoken material.
 
      - Newspapers on the internet 
 
      - A list of links to electronic versions of newspapers from various
        countries in several languages. The URL is
        http://www.ims.uni-stuttgart.de/info/Newspapers.html
      
      - The HCRC Map Task Corpus 
 
      - The HCRC Map Task Corpus is a set of 8 CD-ROMs containing linked
        audio and transcriptions of about 18 hours of spontaneous speech,
        recorded from 128 two-person conversations according to a detailed
        experimental design.
 
      - The CD-ROMs are available from LDC (no longer from ELSNET). The
        non-member price is approximately $200.
 
      - The project URL is http://www.hcrc.ed.ac.uk/maptask/
      
 
      - The Groningen Speech Corpus 
 
      - The Groningen Speech Corpus was collected by A.M. Sulter, MD, and
        Prof. H.K. Schutte as part of a research project funded by NWO
        (Netherlands Organization for Scientific Research). It is a corpus
        of read speech in Dutch, recorded on PCM tape under fairly good
        conditions; its 4 CD-ROMs contain over 20 hours of speech.
 
      - The CD-ROMs are available from ELRA/ELDA (no longer from ELSNET).
        The non-member price is approximately 800 euros.
 
      - The Syntax/Semantic Annotation Task
 
      - During 2000-2001, ELSNET produced two small sample corpora with
        parallel structure for German and Italian, each containing about
        1000 sentences that illustrate 20 verbs and their syntactic and
        semantic subcategorization. The annotation concentrates on the
        verbal predicates and their subcategorized complements, as well as
        on a few relevant modifiers. A short report is available at
        http://www.elsnet.org/ssa
 
	- ELRA 
        
 -  The European Language Resources
		Association (ELRA) was established as a non-profit organization
		in Luxembourg in February 1995. The overall goal of ELRA is to
		provide a centralized organization for the validation,
		management, and distribution of speech, text, and terminology
		resources and tools, and to promote their use within the
		European telematics R&TD community. The URL is http://www.icp.grenet.fr/ELRA/home.html.
        
	- LDC 
        
 -  The Linguistic Data Consortium
		(LDC) is an open consortium of universities, companies and
		government research laboratories. It creates, collects and
		distributes speech and text databases, lexicons, and other
		resources for research and development purposes. The University
		of Pennsylvania is the LDC's host institution. The LDC was
		founded in 1992 with a grant from the Advanced Research
		Projects Agency (ARPA), and is partly supported by grant
		IRI-9528587 from the Information and Intelligent Systems
		division of the National Science Foundation. The URL is http://www.ldc.upenn.edu
        
 -  ENABLER 
 
 -  The ENABLER Network aims to improve cooperation among
      national activities set up by national authorities to
      provide Language Resources for their languages. Its
      specific aims are: establishing a regular exchange of
      information; identifying and fostering possible synergies
      and cooperation; promoting the compatibility and
      interoperability of their results, thus facilitating the
      transfer of technologies and tools among languages and the
      construction of multilingual Language Resources; increasing
      the visibility and strategic impact of those national
      activities in the field of HLT; and contributing to an
      overall framework in which the public and private sectors,
      national efforts and international coordination can
      cooperate to meet the IST need for Language Resources.
 
 -  URL: http://www.enabler-network.org/
 
 -  NEMLAR 
 
 -  The goal of NEMLAR (Network for Euro-Mediterranean
      LAnguage Resources) is to create a network of qualified
      Euro-Mediterranean partners to specify and support the
      development of high-priority LRs for Arabic and other local
      languages in a systematic, standards-driven, collaborative
      learning context. The project will focus on identifying the
      state of the art of LRs in the region, assessing priority
      requirements through consultations with language industry
      and communication players, and establishing a protocol for
      developing a basic LR kit for the major forms of the
      region's predominant language, Arabic, and for other widely
      spoken local languages where appropriate.
 
 -  URL: http://www.nemlar.org 
 
	- TELRI 
        
 -  The TELRI association aims to collect, promote and make
		available monolingual and multilingual language resources and
		tools for the extraction of language data and linguistic
		knowledge, with a special focus on Central and Eastern European
		languages. The URL is http://www.telri.de.
        