Littera Deusto

Modern Languages, Basque Studies and Humanities

Getting to know SARA and Xaira better…

May 16th, 2009

‘SARA (SGML Aware Retrieval Application) was developed specifically for access to the BNC in a Microsoft Windows environment. It is freely available to all BNC licensees and also for registered users of the BNC Subscription service hosted by the British Library. A copy of SARA is delivered with every copy of the BNC World corpus. You can also download the latest version here. The SARA webpage offers more information about SARA.

The Xaira program derives from SARA but has been developed further. It can be used on all well-formed corpora in XML. The BNC XML Edition, BNC Baby, and BNC Sampler corpora are delivered with a copy of Xaira. You can also download the latest version of XAIRA from SourceForge.net. More information about Xaira can be found on the Xaira webpage.’ Information retrieved at 10:43, May 16, 2009 from http://www.natcorp.ox.ac.uk/tools/index.xml.

BUT WHAT ARE THEY?

SARA lets you investigate the content and structure of a corpus. The precise searches and enquiries possible on a given corpus will of course depend on the nature and completeness of the markup applied to it. As an indication, though, SARA supports features such as the following:

- Searches on words, truncated words and phrases
- Searches on SGML tags and attributes
- Combinatorial Boolean operations
- Frequency counts
- Lexicon, to allow identification of similar words (e.g. gumboot, gum-boot, gum-boots, etc.)
- Refinement
- Storing searches
- Limiting scope of queries
- Presentation of results:
  - With or without SGML markup
  - Page or concordance format
  - Optional use of colour to enhance display
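
To get a feel for what a truncated-word search, a frequency count and a concordance display mean in practice, here is a minimal Python sketch. It is not SARA's code or interface: the function names, the `*` truncation syntax and the sample sentence are all my own assumptions, chosen only for illustration.

```python
import re
from collections import Counter

# Made-up sample text (an assumption, not taken from the BNC).
SAMPLE = ("He lost a gumboot. Her gum-boot leaked, so she bought "
          "new gum-boots and one Gumboot.")

def concordance(text, pattern, width=20):
    """Return (left context, hit, right context) for each match.

    A trailing '*' is treated as truncation (hypothetical syntax):
    'gum*' matches any word that starts with 'gum'.
    """
    if pattern.endswith("*"):
        regex = r"\b" + re.escape(pattern[:-1]) + r"[\w-]*"
    else:
        regex = r"\b" + re.escape(pattern) + r"\b"
    hits = []
    for m in re.finditer(regex, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits

def form_frequencies(hits):
    """Count how often each matched form occurs, ignoring case."""
    return Counter(hit.lower() for _, hit, _ in hits)

if __name__ == "__main__":
    hits = concordance(SAMPLE, "gum*")
    # Concordance format: one line per hit, keyword in the middle.
    for left, hit, right in hits:
        print(f"{left:>20} [{hit}] {right}")
    # Frequency count of the similar forms that were found.
    print(form_frequencies(hits))
```

The truncated query picks up gumboot, gum-boot and gum-boots in one go, which is the kind of grouping of similar words that the lexicon feature is meant to help with.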

By way of illustration, some of the markup in the BNC relates to the social class of the speaker (in the case of spoken words); markup is also used to signify parts of speech. Thus, in the case of the BNC, SARA can be used to formulate a query equivalent to: How often do speakers of social class C1 use the word “input” as a verb?
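
The same question can be sketched in a few lines of Python over a tiny invented XML corpus. Everything in it is an assumption made for illustration: the element names `u` and `w`, the `class` and `pos` attributes, and the four utterances are hypothetical and much simpler than the real BNC markup, and SARA itself does not work through this code.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy markup (NOT the real BNC scheme):
#   <u> is one utterance, "class" is the speaker's social class,
#   <w> is one word, "pos" is its part of speech.
TOY_CORPUS = """
<corpus>
  <u who="A" class="C1"><w pos="VERB">input</w> <w pos="DET">the</w> <w pos="NOUN">data</w></u>
  <u who="B" class="C1"><w pos="DET">the</w> <w pos="NOUN">input</w> <w pos="VERB">failed</w></u>
  <u who="C" class="AB"><w pos="VERB">input</w> <w pos="PRON">it</w></u>
  <u who="A" class="C1"><w pos="PRON">we</w> <w pos="VERB">Input</w> <w pos="PRON">it</w></u>
</corpus>
"""

def count_word(root, word, pos, social_class):
    """Count uses of `word` tagged `pos` by speakers of `social_class`."""
    count = 0
    for u in root.iter("u"):
        # Limit the scope of the query to one social class.
        if u.get("class") != social_class:
            continue
        for w in u.iter("w"):
            same_word = (w.text or "").lower() == word.lower()
            if same_word and w.get("pos") == pos:
                count += 1
    return count

if __name__ == "__main__":
    root = ET.fromstring(TOY_CORPUS.strip())
    print(count_word(root, "input", "VERB", "C1"))  # prints 2
```

The noun use of "input" and the verb use by the AB speaker are both left out, so the answer depends entirely on the markup being there, just as the description above says.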

Xaira does essentially the same job as SARA but with more features: it is a more recent, modified version of SARA, and it can be used on any well-formed XML corpus rather than only on the BNC.
