Help for the search

You can search the MERLIN corpus with the help of the search and visualisation software ANNIS.

Getting started: Open the search. Choose an example search from Help/Examples to get an impression of how the ANNIS search works. You can then modify the query: choose and copy an annotation (see section 2), or use the Query Builder as described in section 3, to search for a specific L2 feature.

1 Explanation of your search output

[Screenshot: ANNIS help]

  1. Search field displaying your query in the query language
  2. Options: export search results or perform a frequency analysis
  3. Choose L2 corpus 
  4. Number of tokens displayed to the left and right of the keyword
  5. Metadata, i. e. information on the learner and the ratings, as well as statistical information
  6. Detail from the learner text (L2 text) that contains the word or the feature you are looking for 
  7. TH1 = minimally corrected, i. e. orthographically and grammatically acceptable version of the learner texts; TH2 = sociolinguistically acceptable version of the learner text; TH1Diff and TH2Diff = description of the deviation between the L2 text and the target hypothesis
  8. Categorical description of the error or the L2 feature [EA_category] 
  9. Actual manifestation of the feature/error [_type]
  10. Displays the complete learner text

Hint: For automatic annotations displayed under "automatic grid" (POS annotations, lemmas, t-units, sentences), see MERLIN for research.

2 Search MERLIN for L2 features

In the following section, all annotated learner language features are listed according to their categorical description [EA_category] and specific manifestation [_type]. Copy the annotation names (tags) into the ANNIS search window to start a search.

Hint: For concrete examples of annotated learner language features, see MERLIN Annotations. For a detailed description of the annotations, their scope (tag span), and the annotation rules, see the MERLIN Annotation scheme.
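As an illustration, the following line is a complete query taken unchanged from the Grammar list in this section. Pasted into the ANNIS search field, it should match every span annotated with the article category in the selected corpus.

```aql
EA_category=/G_Art/
```

Entries of the form G_Verb_type="tns" are pasted in the same way.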

G_ Grammar

EA_category=/G_Agr/ agreement (subject and verb)
EA_category=/G_Art/ article
EA_category=/G_Clit/ ITA: clitic
EA_category=/G_Conj/ conjunction
EA_category=/G_Inflect_Inexist/ inexistent inflection (nouns, adj, verb)
EA_category=/G_Morphol_Wrong/ wrong inflection (nouns, pronouns, adj)
EA_category=/G_Neg/ negation general
G_Neg_g_neg_type="negdoub" CZE: double negation
EA_category=/G_Pos/ part of speech error
EA_category=/G_Prep/ preposition
EA_category=/G_Refl_pronrefl/ reflexive pronoun
G_Refl_type="pronreflposs" CZE: possessive reflexive pronoun
EA_category=/G_Valency/ verb valency: number of obligatory arguments
EA_category=/G_Verb_compl/ verb formation (morphol.)
EA_category=/G_Verb_main/ main verb
G_Verb_type="asp" verb: aspect (CZE+ITA)
G_Verb_type="md" verb: mood
G_Verb_type="tns" verb: tense
G_Verb_type="vc" verb: voice
EA_category=/G_Wo/ word order general
G_Wo_type="womaincl" word order in main clause
G_Wo_type="wosubcl" word order in subordinate clause

O_ Orthography

EA_category=/O_Abbrev/ abbreviation
EA_category=/O_Apostr/ GER+ITA: apostrophe
EA_category=/O_Capit/ capitalization
EA_category=/O_Graph/ general grapheme error
O_Graph_type="act" CZE+ITA: diacritical marks
O_Graph_type="trans" grapheme transposition
EA_category=/O_Punct/ punctuation
EA_category=/O_Wordbd/ word boundary

H_ Intelligibility**

EA_category=/H_Intelltxt/ intelligibility of text
EA_category=/H_Intelltxt/H_Intellts/ intelligibility of sentence

V_ Vocabulary**

EA_category=/V_FS/ formulaic sequence
V_FS_type="colloc" formulaic sequence: collocation
V_FS_type="idiom" formulaic sequence: idiom
V_FS_type="commphras" formulaic sequence: communicative phraseologism
EA_category=/V_Sequence_lexgrammer_inc/ incomprehensible sequence caused by accumulation of lexical/grammatical error(s)
EA_category=/V_FS_form/ formulaic sequence: form error
V_form_word_fs_nonexist_range non-existing form (word or formulaic sequence)
EA_category=/V_semdenot_word_fs/ semantic error: denotation (word or formulaic sequence)
EA_category=/V_semconn_at_word_fs/ semantic error: connotation (attitude), (word or formulaic sequence)
EA_category=/V_semimprec/ semantic error: precision (word or formulaic sequence)
EA_category=/V_Wordform/ general word formation error
V_Wordform_type="deriv" word formation error: derivation
V_Wordform_type="comp" word formation error: composition

C_ Coherence/Cohesion**

EA_category=/C_Con_accur/ connector accuracy
EA_category=/C_Coh_jump/ content jumps
EA_category=/C_Coh_ref/ reference
EA_category=/C_Coh_txtstruct/ metacommunicative device

S_ Sociolinguistic appropriateness**

S_Txt_type="grfw" salutations/complimentary closes
S_Txt_type="opcl" opening/closing formulae
S_Form_type="gen" inappropriate style (formality)
S_Form_type="addr" inappropriate addressing (formality)
S_Var_type="clit" ITA: lexicalised clitics (verbi procomplementari)
S_Var_type="duppron" ITA: personal pronoun redundancy
S_Var_type="synstr" ITA: marked syntactic structures
S_Var_type="che" ITA: 'che polivalente'
S_Var_type="woweil" GER: main clause word order after 'weil'
S_Var_type="partik" GER: modal particles

P_ Pragmatics**

EA_category=/P_Pol_dir/ politeness: overly direct language form
EA_category=/P_Request/ REQUEST general
P_Request_type="direct" direct REQUEST
P_Request_type="indirect" indirect REQUEST

** Note: these error categories are only accessible for a subset of MERLIN texts. See MERLIN Annotations / Annotation structure.

Further specification of error categories

add superfluous (added) element
ambig ambiguous: type of error cannot be specified
ch wrong choice of element
merge elements are wrongly merged
o omitted element
pos wrong position
split elements are wrongly split

3 Narrowing the search using metadata

Use ANNIS's Query Builder to search for features or a combination of features while narrowing the query based on specific metadata.

  1. Open the ↘ Query Builder in ANNIS. 
  2. Choose Word sequences and meta information
  3. Select the corresponding feature and its attribute under Linguistic sequence/Initialize/Add.
  4. In the Toolbar, click on ↘ Create AQL Query to paste the query into the search field.

To restrict the query to a specific group of learners (e. g. by L1 or age) or to a specific CEFR level (fair rating), select a metadata category under ↘ Meta information/Add and tick the required attribute before pasting the query into the search field (step 4). Examples of metadata categories:

_rating_fair_cefr CEFR level the test received in the re-rating
_author_L1 Mother tongue of the learner
_task_topic Task preceding the text

Alternatively, you can copy the feature you are searching for from the feature list in section 2 and paste it into the ANNIS search field. To restrict your search to specific texts, then append the metadata using the following scheme:

  • & meta::_rating_fair_cefr="B1"  [A1, A2, B1+, B2]
  • & meta::_author_L1="German"  [English, Russian, Arabic, etc.]
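Put together, a complete query following this scheme might look like the sketch below. It combines the preposition category from section 2 with both restrictions, assuming the two metadata terms can be chained with &, as is usual in the ANNIS query language. The particular combination is chosen purely for illustration.

```aql
EA_category=/G_Prep/ & meta::_rating_fair_cefr="B1" & meta::_author_L1="German"
```

This should return only preposition annotations from texts rated B1 whose authors have German as their L1.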

Hint: The ANNIS User Guide offers a thorough introduction to using the ANNIS interface.

4 Retrieve statistical information

To get an indication of the frequency of certain L2 features, use the ANNIS search.

  1. Search for specific L2 features as described in section 2 or use global error categories.
  2. Then, click on ↘ Frequency Analysis [2] and subsequently on the right on ↘ Perform Frequency Analysis. You will retrieve a statistical analysis of the annotated features within the category in question.
  3. To restrict the search to a certain CEFR level, e. g. B1, append the following to your query: & meta::_rating_fair_cefr="B1".

[Screenshot: Freq Analysis]

Global error categories

EA_category=/G_.*/ phenomena at the grammatical level
EA_category=/O_.*/ phenomena at the orthographical level 
EA_category=/H_.*/ phenomena at the level of intelligibility
EA_category=/C_.*/ phenomena at the level of coherence / cohesion
EA_category=/V_.*/ phenomena at the lexical level
EA_category=/S_.*/ phenomena at the level of sociolinguistic appropriateness
EA_category=/P_.*/ phenomena at the pragmatic level
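For example, a global category can be combined with the CEFR restriction from step 3 before running the frequency analysis. The query below is a sketch of one such combination; any other global category or CEFR level can be substituted.

```aql
EA_category=/G_.*/ & meta::_rating_fair_cefr="B1"
```

Running Perform Frequency Analysis on this query should then list how often each grammatical category occurs in the texts rated B1.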