Littera Deusto

Modern Languages, Basque Studies and Humanities

Questionnaire 2: Word Sense Disambiguation & Named Entity Recognition

April 2nd, 2009

In the previous post we talked about Machine Translation (MT), a sub-field of Computational Linguistics, which we are currently studying. Broadly speaking, MT consists of translating text or speech from one natural language into another using translation software. As we said, MT runs into problems that belong to two different sub-areas of Natural Language Processing (NLP), and those are what we are going to look at in today's post.

The Tower of Babel by Pieter Brueghel the Elder. "Come, let us go down and confuse their language so they will not understand each other." (Genesis 11:7)


Word Sense Disambiguation (WSD):

First of all, as you may have guessed by now, Word Sense Disambiguation (WSD) consists of identifying which sense of a word is the most suitable in a given sentence when the word has several distinct senses. There are two possible approaches to this task: deep approaches and shallow approaches.

Deep approaches presume access to a comprehensive body of world knowledge. Shallow approaches, by contrast, do not try to understand the text; they just consider the surrounding words. To tell the truth, deep approaches have not proved very successful in practice, because it is so hard to capture such a body of knowledge in a computer-readable format; shallow approaches are therefore much more widely used today, even though in theory they are not as powerful as deep approaches. Still, thanks to researchers' work, WSD is steadily becoming more accurate.
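To make the idea of a shallow approach more concrete, here is a minimal Python sketch in the spirit of the simplified Lesk algorithm: it picks the sense whose dictionary definition shares the most words with the sentence. The tiny sense inventory, the stopword list and the function names are made up for this example; a real system would rely on a proper dictionary or lexical database.

```python
# Shallow WSD sketch (simplified Lesk idea): choose the sense whose gloss
# overlaps most with the words surrounding the ambiguous word.
# SENSES and STOPWORDS are toy data invented for illustration only.

STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or",
             "that", "for", "i", "we", "my", "at", "is"}

SENSES = {
    "bank": {
        "financial_institution":
            "an institution that accepts deposits of money and makes loans",
        "river_side":
            "the sloping land beside a river or lake",
    }
}


def tokenize(text):
    """Lowercase the text, strip punctuation and drop stopwords."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def disambiguate(word, sentence):
    """Return the sense label whose gloss shares the most words with the context."""
    context = tokenize(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:  # on a tie, the first listed sense wins
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bank", "I went to the bank to deposit money and ask about loans"))
print(disambiguate("bank", "We had a picnic on the bank of the river"))
```

Notice that the program never "understands" either sentence; it only counts shared words, which is exactly why this kind of approach is cheap to build and also easy to fool.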

Named Entity Recognition (NER):

Secondly, we have to highlight the importance of Named Entity Recognition (NER), a subtask of information extraction also known as Entity Identification or Entity Extraction. The main aim of NER systems is to locate the elements of a text that name entities and classify them into predefined categories such as names of persons, organizations, locations, expressions of time, quantities, monetary values, percentages, etc.

There are two main types of NER systems: those that use linguistic grammar-based techniques and those that use statistical models. In practice, hand-crafted grammar-based systems have proved more effective and obtain better results. However, they require the support of well-trained computational linguists, which makes them less competitive in terms of cost. Overall, state-of-the-art NER systems, with average scores around 94%, come close to human performance.
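As a rough illustration of the hand-crafted, grammar-based style, the Python sketch below tags a few of the categories mentioned above using regular expressions. The patterns, labels and example sentence are assumptions written for this post, not rules taken from any real NER system, and they cover only a handful of cases.

```python
import re

# Toy grammar-based NER: each category is described by a hand-written pattern.
# All patterns and labels here are illustrative assumptions.
MONTHS = (r"(?:January|February|March|April|May|June|July|August|"
          r"September|October|November|December)")

PATTERNS = [
    ("PERCENT", re.compile(r"\b\d+(?:\.\d+)?%")),
    ("MONEY", re.compile(r"[$€£]\d+(?:[.,]\d+)*(?:\s(?:million|billion))?")),
    ("DATE", re.compile(r"\b" + MONTHS + r"\s\d{1,2},\s\d{4}\b")),
    ("ORGANIZATION", re.compile(r"\b(?:[A-Z][a-z]+\s)+(?:Inc|Corp|Ltd|University)\b")),
]


def find_entities(text):
    """Return (entity_text, label) pairs in the order they appear in the text."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    return [(entity, label) for _, entity, label in sorted(found)]


sentence = "Acme Corp paid $3.5 million on April 2, 2009, a rise of 12%."
for entity, label in find_entities(sentence):
    print(label, "->", entity)
```

Every new category or exception means another rule written by hand, which gives a feeling for why these systems depend so heavily on the work of experienced linguists.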

References:

(Note: this article may be updated in the future to include more up-to-date information.)
