As Jim Cowie and Yorick Wilks put it, Information Extraction (IE) is the name given to a process that selectively structures and combines data found in one or more texts. The final output of the extraction process varies; however, it can always be transformed to populate some type of database. Information analysts who have worked long-term on particular assignments have already been carrying out information extraction manually, with the main aim of creating databases.
The importance of Information Extraction stems from the huge amount of information available only in poorly structured form; the Internet is a good example. Such unstructured information can be made more accessible by transforming it into relational form or by marking it up with XML tags. Information Extraction is what turns unstructured data into something that can be reasoned with.
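As a rough illustration of the two target forms just mentioned, the following sketch marks up named entities in a sentence with XML tags and also lists them as relational (entity, type) rows. It assumes a tiny hand-made gazetteer (the hypothetical `GAZETTEER` dictionary); the tag names and the functions `mark_up` and `to_rows` are illustrative only, not part of any standard or tool.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical toy gazetteer (surface name -> entity type). A real IE system
# would use a trained recognizer instead of a fixed list.
GAZETTEER = {
    "Bill": "PERSON",
    "IBM": "ORGANIZATION",
    "France": "LOCATION",
}

# Longest names first, so a longer name wins over a shorter one it contains.
_NAMES = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, _NAMES)) + r")\b")


def mark_up(text: str) -> str:
    """Return text with each gazetteer match wrapped in an XML tag."""
    parts = []
    pos = 0
    for m in _PATTERN.finditer(text):
        parts.append(escape(text[pos:m.start()]))  # plain text between matches
        tag = GAZETTEER[m.group(1)]
        parts.append(f"<{tag}>{escape(m.group(1))}</{tag}>")
        pos = m.end()
    parts.append(escape(text[pos:]))  # trailing plain text
    return "".join(parts)


def to_rows(text: str) -> list[tuple[str, str]]:
    """Return one relational (entity, type) row per gazetteer match."""
    return [(m.group(1), GAZETTEER[m.group(1)]) for m in _PATTERN.finditer(text)]


print(mark_up("Bill works for IBM."))
# <PERSON>Bill</PERSON> works for <ORGANIZATION>IBM</ORGANIZATION>.
print(to_rows("Bill is in France."))
# [('Bill', 'PERSON'), ('France', 'LOCATION')]
```

The same matches feed both outputs: the XML version keeps the original text and adds structure inline, while the row version discards the text and keeps only the extracted facts, which is the form a database expects.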
Experts have proposed many definitions of Information Extraction:
- Grishman (1997): “The identification of instances of a particular class of events or relationships in a natural language text, and the extraction of the relevant arguments of the event or relationship. It involves the creation of a structured representation (such as a data base) of selected information drawn from the text.”
- Riloff (1999): “A subfield of natural language processing that is concerned with identifying predefined types of information from text.”
- Yangarber (2001): “An emerging NLP technology whose function is to process unstructured, natural language text, to locate specific pieces of information, or facts in the text, and to use these facts to fill a database.”
- Peshkin and Pfeffer (2003): “The task of filling template information from previously unseen text which belongs to a predefined domain.”
- Cunningham (2005): “A technology based on analyzing natural language in order to extract snippets of information.”
One reason for the interest in Information Extraction is its role in evaluating and comparing different Natural Language Processing technologies. The evaluation task is precisely defined and, moreover, it can be performed automatically. This, together with the immediate applications of a successful extraction system, has encouraged research funders to support both evaluations of and research into Information Extraction.
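Because a system's output can be compared slot by slot with a hand-prepared answer key, scoring needs no human judgment once the key exists. Such comparisons are commonly reported as precision, recall, and their harmonic mean, the F-measure. The following is a simplified sketch under stated assumptions: each filled template is flattened into a set of (slot, value) pairs, only exact matches count (no partial credit), and the function `score` and the example data are hypothetical.

```python
def score(response: set, key: set) -> tuple[float, float, float]:
    """Precision, recall and F-measure of extracted (slot, value) pairs.

    Simplified: only exact matches count; no partial credit is given.
    """
    correct = len(response & key)
    precision = correct / len(response) if response else 0.0
    recall = correct / len(key) if key else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical answer key and system output for the sentences
# "Bill works for IBM." and "Bill is in France."; the system got the location wrong.
key = {("person", "Bill"), ("organization", "IBM"), ("location", "France")}
response = {("person", "Bill"), ("organization", "IBM"), ("location", "Paris")}
print(score(response, key))  # each of the three values is approximately 2/3
```

Precision penalizes wrong fills and recall penalizes missed ones; the F-measure combines both into a single number that makes different systems directly comparable.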
Finally, the typical subtasks of Information Extraction are:
- Named Entity Recognition: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.
- Coreference resolution: identification of chains of noun phrases that refer to the same object. For instance, anaphora is a kind of coreference.
- Terminology extraction: finding the relevant terms for a given corpus.
- Relationship Extraction: identification of relations between entities, such as:
  * Person works for Organization (extracted from the sentence “Bill works for IBM.”)
  * Person located in Location (extracted from the sentence “Bill is in France.”)
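In the simplest case, relations like the two examples above can be extracted with hand-written surface patterns. The following sketch assumes one hypothetical regular expression per relation (`works_for`, `located_in`) that only handles the exact sentence shapes of those examples, and it stores the resulting triples in an in-memory SQLite table to mirror the database-population goal of Information Extraction. It does not check entity types; a realistic system would apply such patterns on top of Named Entity Recognition output and cope with far more linguistic variation.

```python
import re
import sqlite3

# Hypothetical surface patterns, one per relation type. They only cover the
# exact sentence shapes of the two examples ("X works for Y.", "X is in Y.").
PATTERNS = [
    ("works_for", re.compile(r"(?P<arg1>[A-Z]\w*) works for (?P<arg2>[A-Z]\w*)\.?")),
    ("located_in", re.compile(r"(?P<arg1>[A-Z]\w*) is in (?P<arg2>[A-Z]\w*)\.?")),
]


def extract_relations(sentence: str) -> list[tuple[str, str, str]]:
    """Return (relation, arg1, arg2) triples for every pattern that matches."""
    triples = []
    for relation, pattern in PATTERNS:
        m = pattern.fullmatch(sentence.strip())
        if m:
            triples.append((relation, m.group("arg1"), m.group("arg2")))
    return triples


def populate(sentences: list[str]) -> sqlite3.Connection:
    """Store all extracted triples in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE relation (name TEXT, arg1 TEXT, arg2 TEXT)")
    for sentence in sentences:
        conn.executemany(
            "INSERT INTO relation VALUES (?, ?, ?)", extract_relations(sentence)
        )
    return conn


conn = populate(["Bill works for IBM.", "Bill is in France."])
print(conn.execute("SELECT name, arg1, arg2 FROM relation ORDER BY name").fetchall())
# [('located_in', 'Bill', 'France'), ('works_for', 'Bill', 'IBM')]
```

Hand-written patterns like these are precise but brittle: "Bill has worked at IBM since 2001." would be missed, which is one reason much IE research has focused on learning such patterns from data.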
REFERENCES:
Information Extraction. In Natural Language Processing Group, The University of Sheffield. Retrieved 11:53, March 18, 2009, from http://gate.ac.uk/ie/
Information extraction. (2007, February 06). In Open Clinical, knowledge management for medical care. Retrieved 13:24, March 28, 2009, from http://www.openclinical.org/informationextraction.html
Information Extraction. Jim Cowie and Yorick Wilks. In Department of Computer Science, University of Sheffield. Retrieved 13:11, March 28, 2009, from http://www.dcs.shef.ac.uk/~yorick/papers/infoext.pdf
Information extraction. (2009, February 14). In Wikipedia, The Free Encyclopedia. Retrieved 11:46, March 18, 2009, from http://en.wikipedia.org/wiki/Information_extraction