The MERLIN corpus

The MERLIN corpus contains 2,286 texts written by learners of Italian, German and Czech, taken from the written examinations of recognised test institutions. The exams test proficiency across levels A1 to C1 of the Common European Framework of Reference for Languages (CEFR).

Origin of the texts

The corpus comprises written productions from standardized, high-quality language tests from telc Frankfurt (for German and Italian) and the Test centre of the Institute of Language and Preparatory Studies (ÚJOP) of Charles University in Prague (for Czech).

The tasks are systematically related to the Common European Framework of Reference for Languages (CEFR). They were in use until 2013 and are now freely available on this platform.

The relation to the Common European Framework of Reference - the MERLIN rating grid

To ensure a direct relation to the CEFR, specially trained testers re-rated all exam texts using the MERLIN rating grid that was developed within the project.

The reliability of the ratings was checked with rigorous statistical verification procedures in order to correct rating tendencies (e.g. leniency/harshness). As a result, a reliable rating profile has been created for each text in the corpus. The profile reflects both an overall holistic level and the individual rating criteria listed below:

general linguistic range
vocabulary range
vocabulary control
grammatical accuracy
coherence and cohesion
sociolinguistic appropriateness
command of orthography

The page MERLIN for research goes into more detail about the procedure of the re-ratings.

Test tasks

The following overview describes all test tasks that form the basis of the written test productions, i.e. the MERLIN texts. The linked PDF documents contain detailed information about each task, a brief description of the test parameters, and the specific characteristics of the intended text, e.g. topic, register and domain.

Hint: The short names of the tasks, as they appear in the file names of the MERLIN texts, are given in square brackets.

German

A1	[apartment-request] Informal e-mail: ask a friend for help with finding an apartment
	[swimming appointment] Informal e-mail: arrange an appointment with a friend
	[congratulation] Informal letter: congratulate someone on the birth of a child
A2	[housing office-enquiry] Formal letter to a housing office
	[pet sitting-request] Informal letter: ask a friend to take care of a pet
	[ticket-offer] Informal letter: offer an unused ticket to a friend
B1	[New Year-letter] Informal letter for New Year to a friend
	[visit-letter] Informal letter to a friend announcing a visit
	[birthday-letter] Informal letter: birthday congratulations
B2	[Au pair Agency-enquiry] Formal letter: ask an Au pair Agency for information
	[Au pair Agency-complaint] Formal letter: an au pair writes a letter of complaint to the Agency
	[internship-application] Formal letter: apply for an internship in a sales department
C1	[internship-application] Essay: why it's of value to learn German
	[learn German-essay] Online article: sticking to one's traditions and "assimilation" in a new environment
	[integration issues-essay] Report about the housing situation

Italian

A1	[appointment] Informal e-mail: reschedule an appointment
	[job search-advice] Informal e-mail: help a friend who is looking for work
A2	[see a friend] Informal letter: go to see a friend
	[contact a friend] Informal letter: contact a friend after a long time
	[language course-advice] Informal letter: inform friends about a language course
B1	[language course-enquiry] Formal letter: enquire about a language course
	[cook with teacher] Informal letter: cook with a teacher
	[wedding invitation] Informal letter: reply to a wedding invitation
	[job search-advice] Informal letter: help a friend who is looking for work after the school-leaving exam
B2	[chat-advice] Informal letter: help someone who has problems with chats
	[language learning] Formal letter: describe experiences with language learning
	[hotel-complaint] Formal letter: complain about a hotel
	[cooking evening-enquiry] Formal letter: ask for information about International Cooking Evenings
	[aid project-enquiry] Formal letter: enquire about an aid project
	[internship-application] Formal letter: apply for an internship in a company
	[internship fashion-application] Formal letter: apply for an internship in the fashion sector

Czech

[birthday invitation] Informal e-mail: answering a birthday invitation
[swim in the sea-description] Description of a photo: swimming in the sea
[hotel-enquiry] Formal e-mail to a hotel
[playground-description] Description of a photo: playground
[photo with woman-description] Description of a photo: woman at the window

[invitation-letter] Informal e-mail: answer to the email of Alena, a friend
[future plans-letter] Informal e-mail: answer to the email of Jana, a friend
[Tandem agency-enquiry] Informal e-mail: Information request, e-mail to a Tandem agency

[saying doma nejlepe-essay] Essay: East or west, home is best
[proverb kolace-essay] Essay: No pain no gain
[proverb v nouzi-essay] Essay: A friend in need is a friend indeed
[proverb vic hlav-essay] Essay: Two heads are better than one
[saying skola-essay] Essay: School is the basis of life
[proverb saty-essay] Essay: Clothes make the man

General notes on task description

The level of the test may differ from the level that the learner text received in the re-ratings.
The task descriptions are based on a grid that was developed for this purpose by ALTE (Association of Language Testers in Europe). More information on the grid can be found in this document.
The author of the task descriptions is Olaf Bärenfänger. Please cite the task descriptions as: MERLIN project, task description: <name of the task>, 2014, http://merlin-platform.eu
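
The citation pattern above can be filled in mechanically. The following is a minimal sketch only: the function name is hypothetical, and it assumes that the short task name from the square brackets is what goes in place of <name of the task>, which the notes above do not state explicitly.

```python
def cite_task_description(task_name: str) -> str:
    """Fill the citation pattern quoted in the notes above.

    Assumption: `task_name` is the short task name shown in square
    brackets in the task overview (e.g. "hotel-complaint").
    """
    return (
        f"MERLIN project, task description: {task_name}, "
        "2014, http://merlin-platform.eu"
    )


print(cite_task_description("hotel-complaint"))
```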

Available metadata

Each text in the corpus is described with the following metadata. These details can be found in the header of individual text files.

Information about the author: age, gender, mother tongue (L1)
Information about the text: task ID and topic of the test task, CEFR level of the test the written production was extracted from
Overall rating: CEFR level the text received in the re-rating (fair rating)
Single rating criteria: CEFR level achieved for: general linguistic range | vocabulary range | vocabulary control | grammatical accuracy | coherence | sociolinguistic appropriateness | orthography

For a comprehensive overview of the texts and their metadata, you can refer to the table metadata_ratings_indicators.csv. For each corpus text, it also covers numerous indicators targeting L2 features, as well as lexical, morphological, and syntactic complexity measures (for the German corpus).
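
A table of this kind can be used to compare the test level with the level assigned in the re-rating. The sketch below shows one possible way to do this with the Python standard library. The column names (test_level, overall_rating), the comma delimiter and the sample rows are assumptions made for illustration; they are not taken from the real file, so check its header row before adapting the code.

```python
import csv
import io
from collections import Counter

# Hypothetical column names: replace them with the ones in the real header.
TEST_LEVEL_COL = "test_level"
RATED_LEVEL_COL = "overall_rating"


def load_rows(path, delimiter=","):
    """Read the metadata table into a list of dicts (delimiter is an assumption)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def cross_tabulate(rows, test_col=TEST_LEVEL_COL, rated_col=RATED_LEVEL_COL):
    """Count texts for each (test level, rated level) pair."""
    return Counter((row[test_col], row[rated_col]) for row in rows)


# Made-up sample rows standing in for the real table.
SAMPLE = """text_id,test_level,overall_rating
demo_1,B1,B1
demo_2,B1,A2
demo_3,A2,A2
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = cross_tabulate(sample_rows)
for (test_level, rated_level), n in sorted(counts.items()):
    print(f"test {test_level} -> rated {rated_level}: {n}")
```

With the real file, `load_rows` would replace the in-memory sample; pairs in which the two levels differ are the texts whose re-rating departs from the level of the test.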

The MERLIN corpus in figures

The following charts show the total number of texts at each CEFR level and the number of annotations. The overviews also allow for a comparison of the test level with the level actually assigned in the re-rating.