Littera Deusto

Modern Languages, Basque Studies and Humanities

Word-Sense Disambiguation (Questionnaire 2)

April 28th, 2009

Word-sense disambiguation (commonly known as WSD) takes place in software programs designed to interpret language. Many words and sentences are ambiguous: they have more than one sense and can be understood in different ways, although only one sense is intended. The main goal of disambiguation is to work out the intended meaning.

For example, the written word “bass” has two distinct senses:

  1. bass = a kind of fish
  2. bass = tones of low frequency

When the word “bass” appears in two different sentences, we know which sense is meant:

  1. I went fishing for some sea “bass”.
  2. The “bass” line of the song is too weak.

A human knows that in the first sentence the word “bass” refers to a kind of fish, and that in the second sentence it refers to tones of low frequency.

However, we have to bear in mind that developing algorithms (an algorithm is a precise rule, or set of rules, specifying how to solve some problem) that reproduce this human ability can be a difficult task.

Word-sense disambiguation systems are tested by comparing their results against those of humans. However, we have to take into account that humans do not always agree on which sense is the correct one. It is therefore not reasonable to demand that a machine (a computer, in this case) perform better than a human does: since human judgement is the standard, agreement between humans acts as the ceiling on measured performance. Work on coarse-grained sense distinctions has been the most effective, since humans agree more often on coarse-grained distinctions than on fine-grained ones.
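The evaluation idea above can be sketched in a few lines of Python. This is only an illustration: the sense labels, the two annotators and the system output are invented for the example, and simple percentage agreement is assumed as the measure (real evaluations often use more refined statistics).

```python
def agreement(labels_a, labels_b):
    """Return the fraction of items on which two label sequences coincide."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    if not labels_a:
        raise ValueError("label sequences must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical sense labels for five occurrences of "bass" (invented data).
annotator_1 = ["fish", "music", "music", "fish", "music"]
annotator_2 = ["fish", "music", "fish", "fish", "music"]
system = ["fish", "music", "music", "fish", "fish"]

# Agreement between the two humans: the practical ceiling for a system.
human_ceiling = agreement(annotator_1, annotator_2)

# Agreement between the system and one human annotator.
system_score = agreement(system, annotator_1)

print(f"human-human agreement:  {human_ceiling:.2f}")
print(f"system-human agreement: {system_score:.2f}")
```

If the two annotators themselves agree on only four of the five occurrences, a system that matches one annotator on four of five is already performing at the level the human standard allows.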

Finally, there are four conventional approaches to WSD:

  1. DICTIONARY- AND KNOWLEDGE-BASED METHODS: these methods rely mainly on dictionaries, thesauri (books of synonyms) and lexical knowledge bases.
  2. SUPERVISED METHODS: these methods are based on the assumption that the context of a word provides enough evidence to disambiguate it; they learn from text in which the senses have already been annotated.
  3. SEMI-SUPERVISED METHODS: semi-supervised or minimally supervised methods use a secondary source of knowledge, such as a small annotated corpus used as seed data in a bootstrapping process, or a word-aligned bilingual corpus.
  4. UNSUPERVISED METHODS (or word-sense discrimination): these methods avoid external information almost completely and work directly from raw, unannotated corpora.
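As an illustration of the first approach, here is a minimal sketch of a dictionary-based method in the spirit of the simplified Lesk algorithm: it picks the sense whose dictionary gloss shares the most words with the sentence. The two glosses, the stopword list and the function names are assumptions made for this example; they do not come from a real dictionary or library.

```python
import re

# Hypothetical mini-dictionary: glosses written for this illustration only.
GLOSSES = {
    "fish": "a kind of fish caught in the sea or in rivers by fishing",
    "music": "tones of low frequency in a song or a piece of music",
}

# Small hand-picked stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "in", "or", "by", "for", "is", "i",
             "to", "some", "too"}


def tokens(text):
    """Lowercase the text and return its set of non-stopword words."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}


def simplified_lesk(sentence, glosses):
    """Return the sense whose gloss overlaps most with the sentence."""
    context = tokens(sentence)
    return max(glosses, key=lambda sense: len(context & tokens(glosses[sense])))


print(simplified_lesk("I went fishing for some sea bass.", GLOSSES))
print(simplified_lesk("The bass line of the song is too weak.", GLOSSES))
```

With these glosses, the first sentence shares "fishing" and "sea" with the fish gloss, and the second shares "song" with the music gloss. When no words overlap, or when two senses tie, this sketch simply returns the first sense listed, which is one reason why real systems need richer resources than a two-entry dictionary.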

 

 
