Chapter 5
Evaluating a bracketing protocol for multiword terms
Multiword terms (MWTs) are frequently used to
encapsulate and convey meaning in scientific and technical texts.
However, they can also make these texts difficult to understand
because the relations between constituents are not transparent. When
MWTs have more than two constituents, a dependency analysis
(bracketing) is often necessary to facilitate their interpretation.
Research in natural language processing (NLP) has proposed various models
to automate bracketing, but none has been entirely satisfactory. This paper presents a
protocol that combines various models and applies it to a set of
three-constituent MWTs in order to: (i) rank the bracketing rules by their
disambiguation potential, based on their likelihood of retrieving
results from any corpus and their ability to resolve bracketing; and
(ii) ascertain the influence of corpus size and type on the results
obtained.
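To make the notion of bracketing concrete, the sketch below shows one simple frequency-based heuristic for a three-constituent term: compare how often the first two constituents co-occur with how often the last two do, and group the stronger pair first. This is only an illustration under stated assumptions, not the protocol or the rules evaluated in this chapter. The function name `bracket_three`, the `pair_counts` mapping, and the counts themselves are hypothetical.

```python
from typing import Mapping, Tuple

Pair = Tuple[str, str]


def bracket_three(term: Tuple[str, str, str], pair_counts: Mapping[Pair, int]) -> str:
    """Return a left or right bracketing for a three-constituent term.

    Assumption: pair_counts maps adjacent word pairs to corpus frequencies.
    The pair with the higher count is grouped first; ties stay undecided.
    """
    w1, w2, w3 = term
    left_evidence = pair_counts.get((w1, w2), 0)
    right_evidence = pair_counts.get((w2, w3), 0)
    if left_evidence > right_evidence:
        return f"[[{w1} {w2}] {w3}]"   # left bracketing
    if right_evidence > left_evidence:
        return f"[{w1} [{w2} {w3}]]"   # right bracketing
    return "undecided"                 # no evidence, or equal evidence


# Invented counts for illustration only; not taken from any corpus.
toy_counts = {("wind", "turbine"): 120, ("turbine", "blade"): 45}
print(bracket_three(("wind", "turbine", "blade"), toy_counts))
```

A rule of this kind can fail in two ways that the abstract alludes to: the corpus may return no hits for either pair, or the hits may not favour either structure. Both cases are returned as "undecided" here.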
Article outline
- 1. Introduction
- 2. Bracketing models
- 3. Materials and methods
- 3.1 MWT extraction and manual bracketing
- 3.2 Queries
- 3.3 Bracketing rules
- 4. Results
- 4.1 Rule comparison
- 4.1.1 Quantitative performance of the rules
- 4.1.2 Qualitative performance of the rules
- 4.1.3 Quantitative and qualitative performance of the rules
- 4.2 Comparison of corpora
- 4.3 Comparison of MWT bracketing
- 5. Conclusions
- References
- Appendix
This content is being prepared for publication; it may be subject to changes.