Chapter 4
Post-editing neural machine translation in specialised languages
The role of corpora in the translation of phraseological structures
This study focuses on phraseology in specialised texts and on the
difficulties students encounter with phraseology when post-editing
neural machine translation output. It is undertaken within the
corpus-based methodological framework that we have developed for
several purposes, one of which is to assess the impact of corpus use
on translation and post-editing. The objective of the study is to
provide a descriptive analysis of typical student errors related to
phraseology in order to design tailored pedagogical materials. We aim
to show that, with consistent training in querying corpora and in
interpreting the results appropriately, students can improve their
output when translating specialised texts or post-editing machine
translation output.
Article outline
- 1. Introduction
- 2. Theoretical background
- 2.1 Corpora in translation training
- 2.2 Machine translation and post-editing
- 2.3 Phraseology in LSP and NMT
- 3. Context, methods and data
- 3.1 Context
- 3.2 Methods and data
- 4. Analysis of typical student errors in post-editing phraseology
- 4.1 Type 1: Overconfidence in MT (or under-editing of MT output)
- 4.2 Type 2: Underconfidence in MT (or over-editing of MT output)
- 4.3 Type 3: Failure to correct MT output
- 5. Constructing classroom activities
- 6. Towards the analysis of NMT output on MWUs
- 7. Conclusion
- Notes
- References
This content is being prepared for publication; it may be subject to changes.