Littera Deusto

Modern Languages, Basque Studies and Humanities

WHAT IS A CORPUS?

April 22nd, 2009

A corpus (plural: corpora) is a collection of texts, usually stored and processed electronically. This definition is the one most commonly found today, but several other definitions of the concept have also been proposed:

● “On the face of it, a computer corpus is an unexciting phenomenon: a helluva lot of text, stored on a computer.” (Leech 1992)

● […] a corpus is a collection of texts assumed to be representative of a given language, dialect, or other subset of a language to be used for linguistic analysis. (Francis 1982)

● […] a corpus is a collection of naturally-occurring language text, chosen to characterize a state or variety of a language. (Sinclair 1991)

● [a corpus is] a subset of an ETL (Electronic Text Library) built according to explicit design criteria for a specific purpose. (Atkins, Clear and Ostler 1992)

● Corpus: A collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language. (EAGLES, Expert Advisory Group on Language Engineering Standards, 1996)

In short, any collection of more than one text can be called a “corpus”. The word “corpus” is Latin and means “body”, so in the general sense a corpus is simply a body of texts of any kind. When the term is used in modern linguistics, however, it takes on different connotations.

The following list sets out the four most important characteristics of a modern corpus:

● Machine-readable form.
● A standard reference.
● Sampling and representativeness.
● Finite size.
