Linköping University
Department of Mathematics
Lars Eldén

November 17, 2004                                                     

 

NGSSC: Data mining and applications in science and technology


Computer Assignment

Summarization of a text: extraction of key words and key sentences 

In the lab directory /mailocal/lab/numt/ngssc/summarize/ there is a newspaper text in two versions: the first is the original; the second has been preprocessed somewhat for the text parser (e.g. the sentences have been numbered). Extract the top ten keywords and the top five sentences from the text using the saliency score method.
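
As an illustration only, the sketch below assumes that the saliency score method means ranking terms and sentences by the leading left and right singular vectors of the term-by-sentence matrix A (terms as rows, sentences as columns). It approximates them by alternating u = A v and v = A^T u with normalization, in plain Python. The function names saliency_scores and top_indices and the toy matrix are hypothetical and are not part of the lab files.

```python
import math


def saliency_scores(A, iterations=200):
    """Approximate the leading singular vectors of A by power iteration.

    A is a list of rows: one row per term, one column per sentence.
    Returns (u, v): u[i] is the score of term i, v[j] the score of sentence j.
    Assumes A is nonnegative and not identically zero.
    """
    m, n = len(A), len(A[0])
    v = [1.0 / math.sqrt(n)] * n  # positive start keeps all scores nonnegative
    u = [0.0] * m
    for _ in range(iterations):
        # Term scores: a term scores high if it occurs in high-scoring sentences.
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0.0:
            raise ValueError("matrix must contain at least one nonzero entry")
        u = [x / norm_u for x in u]
        # Sentence scores: a sentence scores high if it contains high-scoring terms.
        v = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
    return u, v


def top_indices(scores, k):
    """Indices of the k largest scores, largest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy term-by-sentence matrix (4 terms, 3 sentences), made up for illustration.
A = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]
u, v = saliency_scores(A)
print("top terms:", top_indices(u, 2))
print("top sentences:", top_indices(v, 1))
```

For the assignment itself, the same ranking would be applied with k = 10 for the terms and k = 5 for the sentences, using the matrix produced by the parser. A library SVD routine would normally replace the hand-written iteration.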


Use gtp from the text mining assignment (modify the runmedline script); perform stemming and remove common words.
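
To make the two preprocessing steps concrete, here is a small sketch, independent of gtp, that removes common words, applies a crude suffix-stripping rule, and builds a term-by-sentence count matrix. Everything in it is an assumption made for illustration: the stop list is a tiny made-up sample, crude_stem is not the stemmer used by gtp (and is not the Porter algorithm), and the two example sentences are invented.

```python
import re

# Hypothetical, deliberately tiny stop list; a real run would use a full
# common-word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "are", "to", "on"}


def crude_stem(word):
    """Strip a few common English suffixes (illustration only, not Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_sentence_matrix(sentences):
    """Return (terms, A) where A[i][j] counts term i in sentence j."""
    tokenized = []
    for sentence in sentences:
        words = re.findall(r"[a-z]+", sentence.lower())
        tokenized.append([crude_stem(w) for w in words if w not in STOP_WORDS])
    terms = sorted({t for tokens in tokenized for t in tokens})
    row = {t: i for i, t in enumerate(terms)}
    A = [[0] * len(sentences) for _ in terms]
    for j, tokens in enumerate(tokenized):
        for t in tokens:
            A[row[t]][j] += 1
    return terms, A


sentences = ["The cats are sleeping.", "A cat sleeps on the mat."]
terms, A = term_sentence_matrix(sentences)
print(terms)
print(A)
```

After stemming, "cats" and "cat" share one row of the matrix, as do "sleeping" and "sleeps", which is the effect that stemming is meant to have on the term counts.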


The report should include your code and the results.