T4.1 - Implementation of LPCs for the project languages

Start date Jul 1, 2011
End date Oct 2, 2011
This task encompasses the implementation of language processing chains (LPCs) for Bulgarian, Croatian, German, Greek, Polish and Romanian, as well as the review of the produced software components. The task requires selecting the best existing tools for each language and implementing the glue between them. The glue will consist of a number of converters from the output of one tool to the input of the next tool in the chain, as well as converters from and to the inter-lingual format adopted in the project. Such conversion will often require an in-depth understanding of the linguistic annotation assumed and produced by the various tools, as well as the implementation of a mapping between these annotations that loses as little information as possible.

Once a particular language processing chain has been completed, a team of qualified personnel from Tetracom will examine the suitability of the chain for its intended use and will identify any discrepancies from specifications and standards. Any change request initiated by Tetracom will be sent directly to the LPC author, who can approve or reject the request. Where technical reasons require it, LPC authors will send change requests related to the ATLAS platform to Tetracom.

Well-established metrics for language tools will be used to assess the different links of the chain: accuracy for taggers; accuracy, precision, recall and F-measure for the identification of noun phrases and named entities; accuracy, fluency, syntactic and semantic correctness, and syntactic and semantic understandability as human evaluation criteria; and precision and recall for the cross-lingual search engine. These metrics will be reported in the language processing chains deliverable.
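To illustrate how precision, recall and F-measure could be computed for noun phrase or named entity identification, the following is a minimal sketch in Python. It is not part of the project's tooling. It assumes that gold-standard and predicted phrases are represented as sets of (start, end) token offsets and that only exact span matches count as correct; the function name and the sample data are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Compute exact-match precision, recall and F1 over two collections of spans.

    Assumption: each span is a hashable (start, end) token-offset pair.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)

    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        f1 = 0.0
    else:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: three gold noun phrases, four predicted ones,
# of which two match a gold span exactly.
gold_spans = {(0, 2), (3, 5), (7, 8)}
predicted_spans = {(0, 2), (3, 4), (7, 8), (9, 10)}

precision, recall, f1 = precision_recall_f1(gold_spans, predicted_spans)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact span matching is the strictest choice; an evaluation that credits partially overlapping phrases would need a different matching rule.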