Subject index
A
- accessibility
5, 30, 73, 75, 77–79, 81, 89, 91–97, 100, 109–110, 112
- annotation
11–12, 16, 28–29, 40, 52, 55–58, 61–64, 75, 83, 90, 94–95, 142–143, 145, 148, 153–155, 161, 164–165
  - contact influence
  148, 153–154, 161–165
  - multilingual language use
  57–58, 64
  - part-of-speech (POS)
  11–17, 28, 75, 83
B
- balance
11, 22–23, 29, 74, 76, 78, 84, 93
- big-data (see very large corpora)
- bilingualism
145, 149, 154, 156, 160–163
- British Library Newspapers database
69–85
- British National Corpus (BNC)
4, 21, 40–43, 45–48, 51, 98, 100, 127, 135
C
- Clean Corpus of Historical American English
51
- codeswitching
56, 58, 158, 160, 165
- collocation
36–37, 48–51, 83, 126, 137
- common nouns
37–38, 40–45, 158
- compilation
2, 4–6, 9–10, 16, 22, 29–30, 56, 59, 63–65, 68–69, 76–79, 85, 93–94, 98–101, 109–110, 131, 135, 137–138
- computational linguistics
39, 51
- Constituent Likelihood Automatic Word-tagging System (CLAWS)
12, 15, 40, 45
- contact-induced semantic shifts
62, 142–165
- Corpus of Contemporary American English (COCA)
14, 21, 29, 45, 94, 127, 134–135
- Corpus of Global Web-based English (GloWbE)
42–44
- Corpus of Historical American English (COHA)
13–14, 18–23, 29, 94, 127, 134
D
- databases
4–5, 10–11, 23, 29–30, 69, 72, 74–75, 85
- diachronic corpora
9, 12, 18, 22, 29, 45, 129, 131, 147
- Digital Humanities
24, 30, 68–73, 75–76, 81, 85, 130
- discourse of deficit
55–56, 61
- dispersion
2, 20–21, 29, 42, 52, 116
E
- Eighteenth Century Collections Online (ECCO)
11, 23–27, 30
- EF Cambridge Open Language Database (EFCAMDAT)
59
F
- false positives
57, 91, 143, 154–155, 157–162, 165
- formality
134, 136, 145, 154, 156, 158, 160
- foreignizing
56, 58, 61, 63
G
- genre
4, 6, 10–11, 22–23, 29, 58, 78–79, 93, 108–110, 115, 121, 126–139
- genre categorization/classification
18, 126–139
- genre evolution
15–16, 22, 132
- God’s truth fallacy
1, 3, 24, 36, 52, 89–90, 93, 99
H
- historical corpora
9–13, 15–16, 22, 27–30, 69, 131
- historical corpus linguistics
4, 9–30, 36, 69–85
- historical lexis / historical spelling
12, 17, 26–27
- historical text databases
10–11, 23–27, 29–30
- homographs
143, 156, 158–162
I
- interlanguage
56–58, 60–63
- International Corpus of English (ICE)
59, 62
- International Corpus of Learner English (ICLE)
59, 62, 127
K
- keyword analysis
74, 126, 137
L
- language contact
56, 143, 145, 148, 155–165
- language variation
10, 142–143, 146, 165
- learner corpus (research)
55–65, 127
- lengthwise analysis
106, 111, 118–120
- lengthwise scaling
118–120
- lexical diversity
108, 114, 122–123
- Lost Generation Corpus
129, 138
- Louvain International Database of Spoken English Interlanguage (LINDSEI)
57
M
- metadata
11, 22–24, 26–27, 29–30, 64–65, 69, 75, 81, 93, 95, 98, 116–117, 146, 156
- multilingualism
56–58, 61–64
- multiple correspondence analysis
120–121
- multi-word units
36–37, 39–41, 45
- Mystery of the Vanishing Reliability
1, 89–90, 94, 99
N
- n-grams
59, 75, 100, 126, 137
- named entity recognition
36, 39, 51, 75, 127
- natural language processing (NLP)
6, 27, 36, 39, 51, 60, 147
- neural networks
27, 147, 152, 164
- neural word embeddings
142–144, 147–148, 152–153
- News on the Web Corpus (NOW)
49, 51
- normalization
26, 107–109, 118, 121
O
- Open American National Corpus
98, 100
- optical character recognition (OCR)
11–12, 24–27, 29, 74, 81–85
P
- Parsed Corpus of Early English Correspondence (PCEEC)
15–17, 28
- Philologist’s dilemma
1, 9–11, 28, 89, 92–93, 99
- POS (part-of-speech) category change (see word-class change)
- precision
2, 25, 29, 35, 39, 82–83
- proper nouns / proper names
12, 17, 36–48, 158–160
Q
- query building
15–16, 25–27, 29, 51
- quotations
36–37, 58, 97–98, 101
R
- reference corpora
93, 96, 128
- regional variation
43, 49, 143, 145–146, 155, 164–165
- register
69, 73, 78–81, 84–85, 112, 136
- register analysis
79, 84, 111, 120
- reliability
1, 11, 24, 36, 51, 90, 94, 115, 120, 146
- replicability
64, 73, 89–92, 99, 101–102
- representativeness
2–3, 10, 22–23, 29, 36, 68, 76–81, 84–85, 90, 92–94, 109, 127
- resampling
93, 120–121, 123
S
- sampling
2, 9–11, 18, 20–22, 24–27, 29, 59, 65, 69, 76–78, 80, 84, 92–93, 121–122, 145
- semantic change
144–145, 147–148, 150–151, 165
(see also contact-induced semantic shifts)
- Spanish Learner Language Oral Corpora (SPLLOC)
58
- (near-)synonyms
41, 45–50, 156
T
- task instruction / task effects
55–56, 59–61, 64
- text categorization (see text type)
- text sampling (see sampling)
- Twitter
110, 117, 146, 149
- Twitter corpora
101, 121, 142–165
V
- vector space models (VSMs)
147–149
- very large corpora
9–11, 18, 27–28, 30, 93, 112, 120
- Vocabulary-Based Discourse Unit
117
W
- word-class change
12–15, 28
- writing prompt
55–56, 59, 64