Automatic summarization is the process by which a computer creates an abridged version of a text of any kind. In an age of information overload, the ability to shorten documents almost instantly has attracted great interest.
There are two main techniques for producing a summary:
Extraction: Based on keywords and keyphrases. The machine estimates the significance of each element, taking into account its frequency, its location in the text and other characteristics such as a bold tag or a numerical value. The strings deemed most important are then selected to form an excerpt of the original document. Most research so far has focused on extracts, because abstracts rely on technology that is still in development.
Abstraction: Applying natural language understanding and natural language generation, the system generates a summary closer to what a human would write. Its aim is to maintain the primary meaning of the source text, so the use of synonyms and paraphrased sentences is not ruled out.
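The extraction technique described above can be illustrated with a minimal Python sketch. It is only one simple way to do it, not a reference implementation: it scores whole sentences by the average frequency of their words and gives the opening sentence a small bonus to stand in for "location in the text". The names `extract_summary`, `STOP_WORDS` and `lead_bonus`, the tiny stop-word list and the naive sentence splitter are all assumptions made for this illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on", "are", "as"}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase word tokens with stop words removed.
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower())
            if w not in STOP_WORDS]


def extract_summary(text, max_sentences=2, lead_bonus=1.2):
    sentences = split_sentences(text)
    # Word frequency over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    scored = []
    for i, s in enumerate(sentences):
        tokens = tokenize(s)
        if not tokens:
            continue
        # Average word frequency, so long sentences are not favored automatically.
        score = sum(freq[w] for w in tokens) / len(tokens)
        # Location feature: slightly favor the opening sentence (assumed weight).
        if i == 0:
            score *= lead_bonus
        scored.append((score, i))

    # Keep the highest-scoring sentences, then restore document order.
    top = sorted(scored, key=lambda t: t[0], reverse=True)[:max_sentences]
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]
```

Calling `extract_summary(document_text, max_sentences=3)` returns up to three sentences copied verbatim from the source, which is what distinguishes an extract from an abstract. Other characteristics mentioned above, such as bold tags or numerical values, could be added as further multipliers on the score.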
User-focused summarization has recently allowed for more targeted results. Given a topic or query specified by the user, the machine filters the information accordingly, producing a personalized summary.
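As a rough illustration of this filtering, the sketch below ranks sentences by how many query words they contain and drops those that match none. It assumes plain word overlap as the relevance measure, which is a simplification chosen for this example; the function names `words` and `query_focused_summary` are hypothetical.

```python
import re


def words(text):
    # Set of lowercase word tokens in a string.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def query_focused_summary(sentences, query, max_sentences=2):
    query_terms = words(query)

    scored = []
    for i, sentence in enumerate(sentences):
        # Relevance here is simply the number of distinct query words present.
        overlap = len(words(sentence) & query_terms)
        if overlap > 0:
            scored.append((overlap, i))

    # Highest overlap first; earlier sentences win ties.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    # Return the selected sentences in their original order.
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]
```

With the same list of sentences, two different queries yield two different summaries, which is the personalization the paragraph above refers to.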