Next: Annotation: Schemata used and
Up: Short summary of the
Previous: Objectives
A comparable Italian/German corpus was created. It is comparable in so far as
both corpus composition and annotation are based on the same principles
for both languages:
- Around 20 Italian verbs were selected, mainly to cover
different semantic fields and a variety of syntactic constructions,
and to provide examples of a medium degree of polysemy. On that basis,
around 20 German verbs were selected that are known to be
(partial) translation equivalents of the Italian verbs. Again,
the emphasis was on a medium degree of polysemy and on rich
documentation in the source corpora. In five cases, two German verbs
were selected that are partial equivalents of a single
Italian item, to facilitate experiments in translation equivalent
selection and word sense disambiguation. A table summarizing the
selected verbs for both languages is appended (cf. appendix).
- The corpora serving as raw material are mainly journalistic. For
Italian, the following local or national sources were used:
Il Corriere della Sera, La Repubblica, La Stampa,
Il Sole 24 Ore, Unione Sarda and Panorama. For
German, the main sources are Frankfurter Rundschau, die Tageszeitung and
Stuttgarter Zeitung, which are again local or national newspapers.
- For each Italian verb, around 50 sentences were selected
from the data. The same number was aimed at for each German verb;
in a few cases, somewhat less material was available, which
was accepted only when two German verbs shared an Italian
equivalent.
- For lexical semantic annotation, it is mainly verbs and their arguments
(noun groups and prepositional groups) that are of interest.
Consequently, emphasis was placed on readings with non-sentential
arguments; readings with sentential complements were used only as auxiliary
information, to complete the syntactic description wherever
necessary.
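
The per-verb selection of around 50 sentences can be illustrated with a small sketch. It is not the procedure actually used for the corpus: the text does not say how the sentences were chosen, so random sampling is an assumption made here, as are the function name `sample_sentences`, the input format (sentence text paired with the set of verb lemmas it contains), and the placeholder verbs in the usage note. The sketch only shows how a target of 50 sentences per verb behaves when less material is available.

```python
import random
from collections import defaultdict


def sample_sentences(tagged_sentences, target_verbs, per_verb=50, seed=0):
    """Pick up to `per_verb` sentences for each target verb lemma.

    tagged_sentences: iterable of (sentence_text, set_of_verb_lemmas) pairs.
    This input format is a hypothetical one, chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible

    # Group candidate sentences by the target verbs they contain.
    by_verb = defaultdict(list)
    for text, lemmas in tagged_sentences:
        for verb in target_verbs:
            if verb in lemmas:
                by_verb[verb].append(text)

    # Draw without replacement; take everything if fewer than per_verb exist.
    sample = {}
    for verb in target_verbs:
        candidates = by_verb[verb]
        k = min(per_verb, len(candidates))
        sample[verb] = rng.sample(candidates, k)
    return sample
```

Called with, say, 80 candidate sentences for one placeholder verb and 30 for another, the function returns 50 sentences for the first and all 30 for the second, which mirrors the case where somewhat less material was available for a German verb.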
Hannah Kermes
2/8/2001