Questionnaire 2: Word Sense Disambiguation & Named Entity Recognition

April 2nd, 2009

In the previous post we talked about Machine Translation (MT), which is considered a sub-field of Computational Linguistics, the subject we are currently studying. Broadly speaking, MT consists of translating text or speech from one natural language to another using software. As we said, MT has to deal with two problems that belong to two different sub-areas of Natural Language Processing (NLP), and we are going to look at both of them in this post.

The Tower of Babel by Pieter Brueghel the Elder. "Come, let us go down and confuse their language so they will not understand each other." (Genesis 11:7)

Word Sense Disambiguation (WSD):

First of all, as you may have guessed by now, Word Sense Disambiguation (WSD) consists of identifying which sense of a word is the most suitable in a given sentence, when that word has several distinct senses. For example, "bank" refers to a financial institution in "I deposited the money in the bank" but to the edge of a river in "we sat on the bank of the river". There are two broad approaches to this task: deep approaches and shallow approaches.

Deep approaches presume access to a comprehensive body of world knowledge, whereas shallow approaches do not try to understand the text; they only consider the surrounding words. In practice, deep approaches have not proven very successful, because it is not feasible to encode a complete body of world knowledge in a computer-readable format. As a result, shallow approaches are much more widely used today, even though in theory they are less powerful than deep approaches. Nevertheless, thanks to researchers' work, WSD is becoming more accurate all the time.
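To make the idea of a shallow approach concrete, here is a minimal sketch of the simplified Lesk algorithm, a classic dictionary-based technique: it picks the sense whose dictionary definition (gloss) shares the most words with the sentence around the ambiguous word. The sense inventory below, with its two glosses for "bank", is a hypothetical hand-written example; a real system would take glosses from a resource such as WordNet.

```python
import re
from typing import Optional

# Hypothetical sense inventory: two hand-written glosses for "bank".
# A real system would take glosses from a dictionary such as WordNet.
SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits of money and makes loans",
        "bank.river": "the sloping land beside a body of water such as a river",
    },
}

# Very small stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "and", "as", "i", "in", "of", "on", "such", "that", "the", "to", "we"}


def content_words(text: str) -> set:
    """Lowercase the text, split it into words and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word: str, sentence: str) -> Optional[str]:
    """Return the sense whose gloss shares the most words with the sentence.

    Ties (including zero overlap everywhere) go to the first listed sense;
    returns None if the word is not in the inventory.
    """
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES.get(word, {}).items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "I went to the bank to deposit money"))     # bank.financial
print(simplified_lesk("bank", "We sat on the bank of the river fishing"))  # bank.river
```

Notice that the algorithm never "understands" the sentence: it only counts shared words, which is exactly why it is cheap but brittle. Here "deposit" does not match "deposits", so the decision rests on "money" alone; practical versions usually add stemming or lemmatization and richer glosses.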

Named Entity Recognition (NER):

Second, we have to highlight the importance of Named Entity Recognition (NER), a subtask of information extraction also known as Entity Identification or Entity Extraction. The main aim of NER systems is to locate elements in a text and classify them into predefined categories such as names of persons, organizations, locations, expressions of time, quantities, monetary values, percentages, etc.
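As a rough illustration of what "classifying elements of a text into predefined categories" means, here is a toy recognizer built from hand-written patterns, in the spirit of the grammar-based systems mentioned below. The rules and the tiny gazetteer (list of known names) are hypothetical and far simpler than any real grammar; the point is only to show the input and output of the task.

```python
import re

# Hypothetical hand-written rules: (category, regex). Order matters:
# when two matches overlap, the rule listed earlier wins.
RULES = [
    ("MONEY", r"[$€£]\s?\d+(?:[.,]\d+)*(?:\s?(?:million|billion))?"),
    ("PERCENT", r"\d+(?:\.\d+)?\s?%"),
    ("TIME", r"\b\d{1,2}:\d{2}\b"),
    ("LOCATION", r"\b(?:Madrid|Antwerp|London)\b"),     # tiny gazetteer
    ("ORGANIZATION", r"\b(?:Springer|Wikipedia)\b"),    # tiny gazetteer
]


def recognize(text: str) -> list:
    """Return (entity text, category) pairs in order of appearance."""
    found = []  # entries are (start, end, matched text, category)
    for label, pattern in RULES:
        for m in re.finditer(pattern, text):
            # Skip this match if it overlaps an entity found by an earlier rule.
            overlaps = any(m.start() < end and start < m.end() for start, end, _, _ in found)
            if not overlaps:
                found.append((m.start(), m.end(), m.group(), label))
    found.sort()  # sort by start offset
    return [(entity, label) for _, _, entity, label in found]


print(recognize("Springer sold 12% more books in London, earning $3.5 million by 17:25."))
# [('Springer', 'ORGANIZATION'), ('12%', 'PERCENT'), ('London', 'LOCATION'),
#  ('$3.5 million', 'MONEY'), ('17:25', 'TIME')]
```

Even this toy shows why hand-crafted systems are costly: every new currency format, date style or organization name needs another rule or gazetteer entry written by someone who knows the language.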

There are two main types of NER systems: those that use linguistic grammar-based techniques and those that use statistical models. In practice, hand-crafted grammar-based systems have shown better results, especially in terms of precision. Today, state-of-the-art NER systems for English come close to human performance, with scores (F-measure) of around 94%. However, grammar-based systems require the work of well-trained computational linguists, which makes them more expensive to build.
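NER scores like the one above are usually reported as an F-measure: the system's output is compared with entities annotated by hand, precision P is the fraction of predicted entities that are correct, recall R is the fraction of annotated entities that were found, and F = 2PR / (P + R). The sketch below assumes an entity counts as correct only when both its text and its category match the annotation exactly, similar to the exact-match rule used in the CoNLL shared tasks (which compare spans rather than plain strings). The sentence and annotations are a made-up toy example, not real evaluation data.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple:
    """Exact-match scoring: an entity counts only if text and category both match."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: hand annotation (gold) versus a system's output.
gold = {
    ("Springer", "ORGANIZATION"),
    ("12%", "PERCENT"),
    ("London", "LOCATION"),
    ("$3.5 million", "MONEY"),
    ("17:25", "TIME"),
}
predicted = {
    ("Springer", "ORGANIZATION"),
    ("12%", "PERCENT"),
    ("London", "ORGANIZATION"),  # found, but with the wrong category
    ("$3.5 million", "MONEY"),
}                                # "17:25" was missed entirely

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")  # precision=0.75 recall=0.60 F1=0.67
```

Because the harmonic mean is pulled toward the smaller value, a system cannot reach a high F-measure by being very precise but missing many entities, or vice versa.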

References:

Named entity recognition. (2009, March 25). In Wikipedia, The Free Encyclopedia. Retrieved 17:25, March 29, 2009, from http://en.wikipedia.org/w/index.php?title=Named_entity_recognition&oldid=279521544
Agirre, E., & Edmonds, P. (2007). Word Sense Disambiguation: Algorithms and Applications. Springer. Retrieved 18:56, April 2, 2009, from http://www.wsdbook.org/
Word sense disambiguation. (2009, April 2). In Wikipedia, The Free Encyclopedia. Retrieved 19:10, April 2, 2009, from http://en.wikipedia.org/wiki/Word_sense_disambiguation
Named entity recognition. (2005). In Centrum voor Nederlandse Taal en Spraak. Retrieved 18:47, April 1, 2009, from http://www.cnts.ua.ac.be/conll2002/ner/