In natural language processing, information extraction is a type of information retrieval whose goal is to automatically extract structured information, that is, categorized and contextually and semantically well-defined data from a certain domain, from unstructured machine-readable documents. A further goal is to allow computation on the previously unstructured data and to allow logical reasoning to draw inferences from the content of the input data.
Information Extraction is not Information Retrieval in the usual sense: it does not return the subset of documents in a collection that are relevant to a query. Instead, it extracts from the documents salient facts about prespecified types of events, entities or relationships. These facts are entered automatically into a database, which can then be used to analyse the data and to produce a natural language summary.
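The idea of pulling facts about a prespecified event type out of free text and entering them into a database can be illustrated with a minimal sketch. It is only a toy, not how production systems work: the sample sentences, the `ACQUISITION` pattern, the `acquisition` table and the function names are all hypothetical, and a hand-written regular expression stands in for the machine learning and linguistic techniques a real system would use.

```python
import re
import sqlite3

# Hypothetical input documents, made up for illustration.
DOCUMENTS = [
    "Acme Corp acquired Widget Ltd in 2007.",
    "Analysts were surprised when Globex acquired Initech in 2008.",
    "The weather was mild this spring.",
]

# Toy pattern for one prespecified event type:
# "<buyer> acquired <target> in <year>", where names are capitalised words.
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z]\w*(?: [A-Z]\w*)*) acquired "
    r"(?P<target>[A-Z]\w*(?: [A-Z]\w*)*) in (?P<year>\d{4})"
)


def extract_acquisitions(text):
    """Return (buyer, target, year) tuples found in unstructured text."""
    return [
        (m["buyer"], m["target"], int(m["year"]))
        for m in ACQUISITION.finditer(text)
    ]


def build_database(documents):
    """Enter the extracted facts into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE acquisition (buyer TEXT, target TEXT, year INTEGER)"
    )
    for doc in documents:
        conn.executemany(
            "INSERT INTO acquisition VALUES (?, ?, ?)",
            extract_acquisitions(doc),
        )
    return conn


conn = build_database(DOCUMENTS)
# Once the facts are structured, ordinary queries can be run over them.
rows = conn.execute(
    "SELECT buyer, target, year FROM acquisition ORDER BY year"
).fetchall()
for buyer, target, year in rows:
    print(f"{buyer} bought {target} in {year}")
```

Note that the third sentence contributes nothing to the table: unlike a retrieval system, the extractor does not rank or return documents, it only records the facts that match the event type it was built for.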
Because the problem is very complex, many different communities of researchers bring techniques from machine learning, databases, information retrieval, and computational linguistics to bear on various aspects of information extraction.
Sources:
- Information extraction. (2009, April 29). In Wikipedia, The Free Encyclopedia. Retrieved 16:08, May 19, 2009, from http://en.wikipedia.org/w/index.php?title=Information_extraction&oldid=286789152
- Information extraction. (2009). In the General Architecture for Text Engineering (GATE) website. Retrieved 16:20, May 19, from http://gate.ac.uk/ie/
- Information extraction. (2009). In the National Institute of Standards and Technology (NIST). Retrieved 16:26, May 19, from www-nlpir.nist.gov/related_projects/muc/
- Information extraction. (2009). In the Now Publishers website. Retrieved 16:47, May 19, from www.nowpublishers.com/product.aspx