Tasks

List of tasks
  • WP1
  • WP2
  • WP3
  • WP4
  • WP5
  • WP6
  • WP7
  • WP8
Task details

T5.1 - Implementation of text summarization tools

Start date: Apr 1, 2011
End date: Apr 1, 2012

This task involves the implementation of text summarization tools for the project's target languages, as well as reviewing the produced software components. A summarization system is already implemented for Romanian and English; it can be adapted to the other project languages and serve as a starting point for the work package. Furthermore, the other underlying components necessary for summarization are also available: a tokenizer, a POS tagger, a sentence splitter, a named entity recognition module (reusable from other work packages in the project), anaphora resolution (available for English and Romanian, adaptable to other languages as well), and cue phrases acting as discourse markers (available for English and Romanian, easily configurable for other languages). Our expectations of the produced summarization tools are listed below; a minimal sketch of how such components could combine into an extractive summarizer follows the list.

  • Evaluation of the summarization tools should yield results comparable to or better than those of state-of-the-art summarization systems. To properly evaluate the technology implemented in ATLAS, a consistent collection of parallel corpora is needed for the project languages. At the beginning of the project the consortium will decide on the best evaluation metrics and corpora to be used.
  • Evaluation methods could include ROUGE automatic n-gram matching (Lin and Hovy, 2003), ROSE (Conroy and Dang, 2008), AutoSummENG (Giannakopoulos et al., 2008), or BEwT-E (Tratz and Hovy, 2008); a ROUGE-style n-gram overlap computation is sketched after this list. Evaluation metrics differ from method to method, and so do the best achievable results. Systems that perform automatic evaluation will also be considered (e.g. the recent AESOP track at NIST).
  • Data for evaluating summary systems have been provided at the Document Understanding Conference (DUC) and, lately, at the Text Analysis Conference (TAC).
  • We expect several national teams of the consortium to participate in the next TAC with relevant results (at least one system ranked among the top three). The evaluation will thus be carried out in open competitions, against the best approaches in the world.
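
As a concrete illustration of the extractive approach outlined above, the following is a minimal sketch of cue-phrase and position-based sentence scoring. All names (CUE_PHRASES, score_sentence, summarize) and the scoring weights are hypothetical, chosen only for illustration; the actual tools would additionally draw on the tokenizer, sentence splitter, named entity recognition, and anaphora resolution modules mentioned earlier.

    # Hypothetical sketch of extractive summarization; not the project's actual code.
    CUE_PHRASES = {"in conclusion", "to summarize", "importantly"}  # per-language list assumed

    def score_sentence(sentence, position, total):
        """Score a tokenized sentence by cue-phrase hits and document position."""
        text = " ".join(sentence).lower()
        cue_score = sum(1.0 for cue in CUE_PHRASES if cue in text)
        position_score = 1.0 - position / total  # earlier sentences rank higher
        return cue_score + position_score

    def summarize(sentences, k=3):
        """Pick the k highest-scoring sentences, kept in document order."""
        top = sorted(range(len(sentences)),
                     key=lambda i: score_sentence(sentences[i], i, len(sentences)),
                     reverse=True)[:k]
        return [sentences[i] for i in sorted(top)]

    # Toy usage with pre-tokenized sentences.
    sentences = [s.split() for s in [
        "ATLAS is a multilingual content platform .",
        "The platform integrates several language tools .",
        "In conclusion , summarization is a key component ."]]
    for s in summarize(sentences, k=2):
        print(" ".join(s))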
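
To make the evaluation criterion in the list concrete, the sketch below computes ROUGE-N recall in the sense of Lin and Hovy: the fraction of reference n-grams that also occur in the candidate summary, with clipped counts. The function names and the toy sentences are illustrative only; in practice the consortium would use an established ROUGE implementation and the corpora selected at the start of the project.

    from collections import Counter

    def ngrams(tokens, n):
        """Return the multiset of n-grams in a token sequence."""
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def rouge_n_recall(candidate, reference, n=2):
        """ROUGE-N recall: fraction of reference n-grams found in the
        candidate summary, with per-n-gram counts clipped."""
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        if not ref:
            return 0.0
        overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
        return overlap / sum(ref.values())

    # Toy example: tokenized candidate vs. reference summary.
    candidate = "the system produces short summaries of news articles".split()
    reference = "the system generates short summaries of news articles".split()
    print(rouge_n_recall(candidate, reference, n=2))  # bigram recall, ~0.714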

Once a particular summarization tool has been completed, a team of qualified personnel from Tetracom will examine its suitability for the intended use and will identify any discrepancies from specifications and standards. Any change request initiated by Tetracom will be sent directly to the author of the summarization tool, who can approve or reject the request. If necessary, summarization tool authors will send change requests related to the ATLAS platform to Tetracom.