Littera Deusto

Modern Languages, Basque Studies and Humanities

Automatic summarization

March 18th, 2009

Automatic summarization is the process by which a computer creates an abridged version of a text. As our generation faces information overload, the ability to shorten documents in an instant has attracted great interest.

There are two main techniques for producing a summary:

Extraction: Based on keywords and keyphrases. The machine estimates the significance of each element, taking into account its frequency, its location in the text and other characteristics such as a bold tag or a numerical value. The strings deemed most important are then selected to form an excerpt of the original document. Most research so far has focused on extracts, as abstracts require technology that is still in its development stages.

Abstraction: By applying natural language understanding and natural language generation, the system generates a summary closer to what a human would write. Its aim is to maintain the primary meaning of the source text, so the use of synonyms and paraphrased sentences is not ruled out.
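
To make the extraction technique more concrete, here is a minimal sketch in Python. It is only an illustration under simple assumptions, not how any particular summarizer works: sentences are scored by the average frequency of their words, and the first sentence gets a small bonus to stand in for "location in the text". The names extract_summary, lead_bonus and the tiny STOPWORDS list are hypothetical and chosen just for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "on", "for", "as", "are", "was", "be", "at"}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase words, with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def extract_summary(text, max_sentences=2, lead_bonus=1.2):
    sentences = split_sentences(text)
    # Frequency of every non-stopword across the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    scored = []
    for idx, sentence in enumerate(sentences):
        words = tokenize(sentence)
        if not words:
            continue
        # Average word frequency, so long sentences are not favoured.
        score = sum(freq[w] for w in words) / len(words)
        # Location feature: a small bonus for the opening sentence.
        if idx == 0:
            score *= lead_bonus
        scored.append((score, idx))

    # Keep the best-scoring sentences, then restore document order.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:max_sentences]
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]
```

Calling extract_summary(text, max_sentences=2) returns the two highest-scoring sentences in their original order. A real system would weigh further characteristics, such as the formatting cues mentioned above.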

User-focused summarization has recently made more specific summaries possible. Given a topic or query, the machine filters the information accordingly, giving personalized results.
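
The filtering step can be sketched in a few lines of Python. This is a simplified illustration, assuming that relevance is just the number of query words a sentence shares with the query; the function name query_focused_summary and its parameters are hypothetical, and real systems use far richer relevance measures.

```python
import re


def words(text):
    # Set of lowercase words in a piece of text.
    return set(re.findall(r"[a-z']+", text.lower()))


def query_focused_summary(sentences, query, max_sentences=2):
    query_terms = words(query)

    scored = []
    for idx, sentence in enumerate(sentences):
        # Relevance = number of distinct query words found in the sentence.
        overlap = len(query_terms & words(sentence))
        if overlap > 0:
            scored.append((overlap, idx))

    # Keep the most relevant sentences, then restore document order.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:max_sentences]
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]
```

Sentences that share no words with the query are dropped entirely, so two users with different queries get different summaries of the same document.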
