Challenges in the compilation, annotation, and analysis of learner corpus
data
This chapter highlights and discusses the special
characteristics of learner corpus data and the challenges they may present
for corpus compilation, annotation, and analysis. Because learner corpus and
SLA researchers use their data to study L2 production and development, it is
essential that the data are valid, that is, that they represent “authentic”
L2 production stemming from the studied learners’ own language production.
I discuss challenges
in three areas: (1) multilingual practices and metalinguistic language use,
(2) lexical and constructional bias, often brought about by the wording of
task instructions or writing prompts that learners are asked to respond to,
and (3) learner corpus annotation in view of the “discourse of deficit” in
SLA. For each of these challenges, I offer solutions for how it can be met.
Article outline
- 1. Introduction and general remarks
- 2. Challenges and how to respond to them
- 2.1 Multilingual practices and metalinguistic language use
- 2.2 Task effects
- 2.3 “Discourse of deficit” and learner corpus annotation
- 3. Summary and conclusion
- Notes
- References
This content is being prepared for publication; it may be subject to changes.