Chapter 5. Evaluating a bracketing protocol for multiword terms

León-Araúz, Pilar; Cabezas-García, Melania

Part of

Recent Advances in Multiword Units in Machine Translation and Translation Technology
Edited by Johanna Monti, Gloria Corpas Pastor, Ruslan Mitkov and Carlos Manuel Hidalgo-Ternero
[Current Issues in Linguistic Theory 366] 2024
pp. 79–102


Pilar León-Araúz | University of Granada

Melania Cabezas-García | University of Granada

Multiword terms (MWTs) are frequently used to encapsulate and convey meaning in scientific and technical texts. However, they can also make these texts difficult to understand because the relations between their constituents are not transparent. When MWTs have more than two constituents, a dependency analysis (bracketing) is often necessary to interpret them. NLP research has proposed various models to automate bracketing, but none has been entirely satisfactory. This paper presents a protocol that combines various models and applies it to a set of three-constituent MWTs in order to: (i) rank rules by their disambiguation potential, based on their likelihood of retrieving results from any corpus and their ability to resolve bracketing; and (ii) ascertain the influence of corpus size and type on the results obtained.

Keywords: multiword term, bracketing, structural disambiguation, corpus, terminology
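
To make the bracketing task concrete, the sketch below applies a simple adjacency comparison to a three-constituent term: the bigram that is more frequent in a corpus is grouped first. This illustrates one model from the general bracketing literature and is not the protocol or rule set evaluated in the chapter. The function name `bracket_adjacency`, the example term, and the frequency counts are all invented for illustration.

```python
from typing import Mapping, Tuple

Bigram = Tuple[str, str]


def bracket_adjacency(term: Tuple[str, str, str],
                      bigram_counts: Mapping[Bigram, int]) -> str:
    """Bracket a three-constituent term with a simple adjacency comparison.

    The frequency of (w1, w2) is compared with that of (w2, w3).
    The more frequent pair is grouped first; ties are left undecided.
    """
    w1, w2, w3 = term
    left_evidence = bigram_counts.get((w1, w2), 0)
    right_evidence = bigram_counts.get((w2, w3), 0)

    if left_evidence > right_evidence:
        return f"[[{w1} {w2}] {w3}]"   # left bracketing
    if right_evidence > left_evidence:
        return f"[{w1} [{w2} {w3}]]"   # right bracketing
    return "undecided"                 # no evidence, or equal evidence


# Invented counts, for illustration only (not taken from any corpus).
toy_counts = {
    ("wind", "turbine"): 120,
    ("turbine", "blade"): 45,
}

print(bracket_adjacency(("wind", "turbine", "blade"), toy_counts))
```

With these toy counts, "wind turbine" outweighs "turbine blade", so the term is bracketed to the left. A term with no corpus evidence for either pair stays undecided, which is why the likelihood of retrieving results from a corpus matters when rules are compared.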

Article outline

1. Introduction
2. Bracketing models
3. Materials and methods
- 3.1 MWT extraction and manual bracketing
- 3.2 Queries
- 3.3 Bracketing rules
4. Results
- 4.1 Rule comparison
  - 4.1.1 Quantitative performance of the rules
  - 4.1.2 Qualitative performance of the rules
  - 4.1.3 Quantitative and qualitative performance of the rules
- 4.2 Comparison of corpora
- 4.3 Comparison of MWT bracketing
5. Conclusions
References
Appendix

This content is being prepared for publication; it may be subject to changes.

