MERLIN - Multilingual Platform for European Reference Levels: Interlanguage Exploration in Context

MERLIN is a learner corpus that facilitates the systematic exploration of authentic learner productions for Italian, Czech and German. A special feature of MERLIN is its strong relation to the Common European Framework of Reference for Languages (CEFR): The corpus contains reliably CEFR-related texts of learners at the reference levels A1 – C1. The MERLIN platform describes various usage scenarios for the corpus.


L2 corpora are systematically designed digital data collections that lend themselves to researching second language acquisition, but they also have practical implications for the teaching, learning and assessment of foreign and second languages. There are further L2 corpora for German as a foreign language, e.g. FALKO (which shares many design features with MERLIN), DISKO, KanDeL, BeMaTaC or parts of the GeWiss corpus.

For MERLIN, the Common European Framework of Reference for Languages (CEFR) plays a particular role, as it is one of the most important benchmarks for teaching and certification of languages as well as for the development of curricula: language courses, language tests and textbooks are regularly related to the levels of the CEFR. So far, however, empirical evidence is lacking on which learner language characteristics are more or less typical or expected at individual levels.

Project aims and results

MERLIN addresses this demand for Czech, German and Italian and provides an error-annotated corpus of learner texts. All MERLIN texts have been related to the CEFR in a methodologically sophisticated way by professional assessors; the reliability of the assessments was thoroughly verified. Thus, MERLIN helps to illustrate the CEFR levels for Czech, Italian and German. With the help of the web-based search engine and visualization tool ANNIS, learner texts at all levels can be searched for (inter)language features that are relevant from a practitioner's, a research and an intrinsic CEFR perspective.

The project thus addresses a broad target audience: it is relevant to anyone teaching, testing, or learning one of the three target languages in Europe.


All MERLIN data and resources are freely accessible under a Creative Commons licence (CC BY-SA 4.0). They are part of the CLARIN infrastructure (European Research Infrastructure for Language Resources and Technology). To search the MERLIN corpus, you can use this site as well as the ANNIS search engine at the Humboldt University of Berlin.

MERLIN is also available for download in various formats (see MERLIN homepage / Download MERLIN texts and resources). The annotations use a multi-layer standoff format, so MERLIN data can easily be processed and annotated further, e.g. using EXMARaLDA tools.

The MERLIN project was funded from 2012 until 2014 by the EU Lifelong Learning Programme under project number 518989-LLP-1-2011-1-DE-KA2-KA2MP.

Since 2018, the corpus data have been available through the CLARIN network. In 2021, the platform and the search functionalities were revised.