WP5 - Text Summarization

Start date: Friday, April 1, 2011
End date: Wednesday, August 1, 2012
This work package focuses on adapting and fine-tuning existing software components and on producing summarization tools for the project's target languages. The leader of this work package, UAIC, has achieved good results with a summarization method based on discourse parsing, which supports both general and focused summaries. The summary is obtained after a few processing steps (Cristea and Postolache, 2005): the text is first segmented into elementary discourse units (mainly clauses); then, for each sentence, a discourse tree is built from cue phrases recognized by a parser. Next, the sequence of sentence trees is combined into a single tree for the whole discourse by maximizing a score derived from centering transitions and anaphoric links, as proposed by Veins Theory (Cristea et al., 1998). From this global tree, summaries of any type can be computed easily. The method can be applied to any language in the project, provided that the following resources are available for that language: a sentence splitter, a collection of discourse markers and, optionally, an anaphora resolution tool. Other summarization methods can be applied as well; the consortium will decide on this at the beginning of the project. We may configure the architecture of the summarization tool specifically for each language in the project, depending on the processing tools and resources available.
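The last step described above, reading a summary off the global discourse tree, can be illustrated with a minimal Python sketch. It is not the UAIC implementation: it assumes the tree has already been built, that each node is marked as nuclear or satellite, and that a general summary can be approximated by the head expression of the root (the units reached by following only nuclear children). The names `Node`, `head` and `general_summary`, as well as the example sentences, are hypothetical. Focused summaries, which rely on vein expressions, are not covered here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node of a discourse tree (hypothetical structure for illustration)."""
    nuclear: bool = True                 # True for a nucleus, False for a satellite
    unit: Optional[int] = None           # index of the discourse unit (leaves only)
    children: List["Node"] = field(default_factory=list)


def head(node: Node) -> List[int]:
    """Head expression: a leaf yields its own unit; an inner node yields
    the concatenated heads of its nuclear children."""
    if not node.children:
        return [node.unit]
    result: List[int] = []
    for child in node.children:
        if child.nuclear:
            result.extend(head(child))
    return result


def general_summary(root: Node, units: List[str]) -> List[str]:
    """Assumption: the general summary is the text of the root's head units,
    listed in their original order."""
    return [units[i] for i in sorted(head(root))]


# Invented example: four elementary discourse units and a hand-built tree.
units = [
    "John bought a new car",
    "because his old one broke down.",
    "He drives it every day,",
    "although fuel is expensive.",
]
first = Node(children=[Node(unit=0), Node(nuclear=False, unit=1)])
second = Node(nuclear=False,
              children=[Node(unit=2), Node(nuclear=False, unit=3)])
root = Node(children=[first, second])

summary = general_summary(root, units)
```

In this example only unit 0 is reachable from the root through nuclear children, so the summary contains the single clause "John bought a new car". Marking `second` as nuclear would add unit 2 as well. The sketch assumes every inner node has at least one nuclear child; otherwise its head would be empty.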


