As learner language (L2) is regarded as an evolving language system in its own right, annotations were not merely based on error coding, but also took into account other linguistic characteristics.
In order to determine whether and to what extent a text deviates incorrectly, there must be a clear idea of what a learner presumably intended to write. In a learner text collection (learner corpus), it is important to make this interpretation explicit to make annotations more easily understandable and to avoid problems of reliability. Therefore, the MERLIN team formulated target hypotheses (TH) that are a corrected version of the learner texts. The team followed the rules developed for the FALKO corpus and adapted them to the project needs where necessary (cf. Reznicek/Lüdeling et al. 2012).
Target hypothesis 1 (TH1) = orthographically and grammatically correct version of the learner text
The "minimal target hypothesis" is a solely orthographically and grammatically correct version of the learner text, but might contain deviations from what a native speaker would say on other levels (e.g., lexical). TH1 interferes as little as possible with the learner text. They were written for the whole MERLIN corpus.
Target hypothesis 2 (TH2) = lexically and pragmatically accetable version of the learner text
The "extended target hypothesis" aims at creating an acceptable (for a native speaker) version of the original learner text. TH2 takes into account more language dimensions that often regard context-dependent phenomena like vocabulary and pragmatics. This assessment could only be made for a smaller part of the MERLIN corpus, the core corpus. It consists of a collection of texts which received either A2 or B2 ratings (for Italian: A2 and B1/B1+).
For examples and more details see MERLIN for research.