WP4 - Language Processing Chains: Results

Internal test data results

Internal tests were periodically executed by partners using the WebCASDebugger environment. To facilitate them, test data were gathered representing documents in all project languages, covering the formats and encodings used for each language and a range of document sizes.

The description of data used for internal tests is available online at http://www.atlasproject.eu/wp4/test-data-internal.pdf.

The data can be obtained at http://www.atlasproject.eu/wp4/test-data-internal.zip.

Final test data

Apart from verifying the proper integration of the tools constituting the LPCs, the goal of the testing process was to roughly assess their performance and stability. The figures for different chains are not directly comparable, because the chains are built from different annotators implemented on different platforms (Java, C++, Perl). Nevertheless, this information is important for further optimization of the tools making up the chains and for future integrators of the LPCs into their applications.

To provide a sound testing environment, a parallel corpus of documents based on the EurLEX database ( http://eur-lex.europa.eu/en/index.htm ) was collected. It consists of over 360K documents of various sizes in all project languages. The documents were divided into 9 classes according to their number of tokens.
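
The division into size classes can be illustrated with a minimal sketch. The actual class boundaries and the tokenizer used in the project are not stated here, so the thresholds in CLASS_BOUNDS and the whitespace tokenizer below are hypothetical placeholders chosen only to yield 9 classes.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical boundaries (in tokens); the real thresholds are not given
# in this report. Eight boundaries split the corpus into nine classes.
CLASS_BOUNDS = [100, 250, 500, 1000, 2500, 5000, 10000, 25000]


def count_tokens(text):
    """Count tokens by splitting on whitespace (an assumed stand-in tokenizer)."""
    return len(text.split())


def size_class(token_count, bounds=CLASS_BOUNDS):
    """Return the 1-based size class for a document with the given token count.

    Class 1 holds documents with fewer tokens than bounds[0]; the last class
    holds documents with at least bounds[-1] tokens.
    """
    return bisect_right(bounds, token_count) + 1


def class_histogram(documents):
    """Map each size class to the number of documents that fall into it."""
    return Counter(size_class(count_tokens(doc)) for doc in documents)
```

Grouping documents this way makes it possible to report processing time per size class rather than a single average, which would be dominated by the largest documents.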

The collected corpus was used only for testing the LPCs in all project languages; it cannot and will not be made available.

Final integration and test results

The final integration tests were executed on 28 December 2011 by Tetracom.

The detailed results are available online at http://www.atlasproject.eu/wp4/test-results.pdf.