326025912
03
01
01
JB
John Benjamins Publishing Company
01
JB code
SCL 121 Eb
15
9789027246462
06
10.1075/scl.121
13
2024033458
DG
002
02
01
SCL
02
1388-0373
Studies in Corpus Linguistics
121
01
Investigating Wikipedia
Linguistic corpus building, exploration and analysis
01
scl.121
01
https://benjamins.com
02
https://benjamins.com/catalog/scl.121
1
B01
Céline Poudat
Poudat, Céline
Céline
Poudat
Laboratoire BCL Université Côte d'Azur
2
B01
Harald Lüngen
Lüngen, Harald
Harald
Lüngen
Leibniz-Institut für Deutsche Sprache
3
B01
Laura Herzberg
Herzberg, Laura
Laura
Herzberg
University of Mannheim
01
eng
275
vi
261
+ index
LAN009000
v.2006
CFX
2
24
JB Subject Scheme
LIN.APPL
Applied linguistics
24
JB Subject Scheme
LIN.COMPUT
Computational & corpus linguistics
24
JB Subject Scheme
LIN.CORP
Corpus linguistics
06
01
The present volume is intended as a reference book on Wikipedia corpus studies, from corpus construction to exploration and analysis. Wikipedia is a complex object, difficult to manipulate for linguists and corpus researchers. In addition to the encyclopedic articles consulted by millions of users, it contains vast spaces of written discussions, aka talk pages, where Wikipedia authors negotiate the collaborative editing of articles, make evaluations, or discuss related topics. The proposed volume covers Wikipedia articles, their revision histories, and discussions, with a focus on discussions, which have not been studied extensively so far and have also been neglected in previous corpus building efforts. Wikipedia discussions are instances of computer-mediated communication (CMC), thus constituting a completely different, interaction-oriented linguistic genre. Sophisticated tools and methods of linguistic annotation and corpus exploration are needed to exploit the huge and valuable corpus resources that can be constructed from the Wikipedia discussions. The present volume aims at encouraging and facilitating Wikipedia corpus studies, providing standards, recommendations, and innovative methods to build and explore Wikipedia corpora, and presenting corpus studies that make the most of the peculiarities of Wikipedia.
04
09
01
https://benjamins.com/covers/475/scl.121.png
04
03
01
https://benjamins.com/covers/475_jpg/9789027215963.jpg
04
03
01
https://benjamins.com/covers/475_tif/9789027215963.tif
06
09
01
https://benjamins.com/covers/1200_front/scl.121.hb.png
07
09
01
https://benjamins.com/covers/125/scl.121.png
25
09
01
https://benjamins.com/covers/1200_back/scl.121.hb.png
27
09
01
https://benjamins.com/covers/3d_web/scl.121.hb.png
10
01
JB code
scl.121.toc
v
vi
2
Miscellaneous
1
01
Table of contents
10
01
JB code
scl.121.int
1
10
10
Chapter
2
01
Introduction
1
A01
Céline Poudat
Poudat, Céline
Céline
Poudat
Laboratoire BCL Université Côte d’Azur
2
A01
Harald Lüngen
Lüngen, Harald
Harald
Lüngen
Leibniz-Institut für Deutsche Sprache
3
A01
Laura Herzberg
Herzberg, Laura
Laura
Herzberg
University of Mannheim
4
A01
Lydia-Mai Ho-Dac
Ho-Dac, Lydia-Mai
Lydia-Mai
Ho-Dac
University of Toulouse Jean Jaurès, CLLE
10
01
JB code
scl.121.p1
Section header
3
01
Part I. Building Wikipedia corpora
10
01
JB code
scl.121.01hod
12
44
33
Chapter
4
01
Chapter 1. Building a comparable corpus of online discussions on Wikipedia
The EFG WikiCorpus
1
A01
Lydia-Mai Ho-Dac
Ho-Dac, Lydia-Mai
Lydia-Mai
Ho-Dac
University of Toulouse Jean Jaurès, CLLE
20
corpus building
20
TEI CMC-core
20
Wikipedia talk pages
01
This chapter presents the EFG WikiCorpus, a corpus composed of all the <target target-type="index-entry-marker">talk pages</target> dedicated to (co)writing an article in the English, French and German <target target-type="index-entry-marker">Wikipedia</target>s. This chapter explains the place of talk pages in Wikipedia and describes the basic structure of a talk page before detailing the building process of the EFG WikiCorpus: from the Wikipedia archives to a TEI resource encoded according to the <target target-type="index-entry-marker">TEI CMC-core</target> schema. It concludes with a quantitative overview of the EFG WikiCorpus and the EFG WikiDemoCorpus, a derived subcorpus used for qualitative analyses in various contributions of this volume.
10
01
JB code
scl.121.02kra
45
74
30
Chapter
5
01
Chapter 2. Mining parallel corpora from Wikipedia
1
A01
Olivier Kraif
Kraif, Olivier
Olivier
Kraif
Université Grenoble Alpes, LIDILEM
20
parallel corpora
20
sentence alignment
20
translation
20
Wikipedia as a corpus
01
In this article, we address the issue of Wikipedia as a multilingual resource to extract <target target-type="index-entry-marker">parallel corpora</target> that are useful in multilingual terminology extraction or machine translation. While most previous work in this field assumes that Wikipedia is suitable for mining <target target-type="index-entry-marker">comparable corpora</target>, we concentrate on the actual place of translation in the editorial process of Wikipedia to examine the possibility of extracting parallel corpora, that is, texts where source segments can be linked to their translations. After identifying the different projects, tools and recommendations that allow contributors to enrich Wikipedia by exercising their skills as translators, we conduct an experiment in which we download pairs of articles containing translations. We show the importance of performing a temporal alignment of the versions to be downloaded before launching the actual sentence-level <target target-type="index-entry-marker">alignment</target>. This strategy allows us to obtain a large volume of parallel texts with good-quality sentence-to-sentence <target target-type="index-entry-marker">alignment</target>.
10
01
JB code
scl.121.p2
75
1
Section header
6
01
Part II. Interactions on Wikipedia talk pages
10
01
JB code
scl.121.03tan
76
106
31
Chapter
7
01
Chapter 3. Exploring interactions in Wikipedia talk pages
1
A01
Ludovic Tanguy
Tanguy, Ludovic
Ludovic
Tanguy
University of Toulouse Jean Jaurès, CLLE
2
A01
Céline Poudat
Poudat, Céline
Céline
Poudat
Laboratoire BCL Université Côte d’Azur
3
A01
Lydia-Mai Ho-Dac
Ho-Dac, Lydia-Mai
Lydia-Mai
Ho-Dac
University of Toulouse Jean Jaurès, CLLE
20
interaction patterns
20
monologues
20
multi-party conversations
20
online discussions
20
Wikipedia talk pages
01
In this chapter we analyze how users interact on <target target-type="index-entry-marker">Wikipedia talk pages</target>, focusing on the patterns that emerge from a large corpus of 5 million threads across three languages. These patterns take three simple features into account: who posts, when they post, and after whom they post. We begin with an overview and a closer examination of some extreme behaviours: threads with the highest number of users and posts, or those with the longest duration. We then propose fine-grained typologies to analyze the content of a sample of monologal and trilogal openings. This analysis allows us to identify recurring behaviours, some of which we consider specific to Wikipedia and its contributors, and others more common in online discussions.
10
01
JB code
scl.121.04her
107
133
27
Chapter
8
01
Chapter 4. Investigating reply relations on Wikipedia talk pages to reconstruct interactional strategies of Wikipedia authors
1
A01
Laura Herzberg
Herzberg, Laura
Laura
Herzberg
University of Mannheim
2
A01
Harald Lüngen
Lüngen, Harald
Harald
Lüngen
Leibniz-Institut für Deutsche Sprache
20
CMC
20
computer-mediated communication
20
interactional structures
20
referencing strategies
20
reply relations
20
Wikipedia talk pages
01
This chapter presents the <target target-type="index-entry-marker">annotation</target> and analysis of interpretative <target target-type="index-entry-marker">reply relations</target> on Wikipedia talk pages using data from the WikiDemoCorpus (WDC). Building on an approach of annotating interpretative <target target-type="index-entry-marker">reply relations</target> to analyze these relations in Wikipedia talk page posts, the chapter presents nine reply relation categories found in the German WDC. Additionally, linguistic cues for each category and the Wikipedia discussion pages overall are explained in detail, illustrated through reply relation targets. The results of the linguistic annotation are threefold: First, we provide an annotation scheme that can be used by third parties to produce more data according to their needs. Second, we shed light on and quantify the numerous ways Wikipedia authors reply to each other’s posts on talk pages. Finally, we provide richly annotated data that can be used for further analyses, such as identifying interactional relations on higher levels or training tasks in machine learning algorithms.
10
01
JB code
scl.121.05gre
134
154
21
Chapter
9
01
Chapter 5. <i>Sockpuppets</i>, <i>Wikifants</i>, and <i>Honeypots</i>
Metaphorical patterns in digital discourses on Wikipedia talk pages
1
A01
Eva Gredel
Gredel, Eva
Eva
Gredel
University of Duisburg-Essen
20
collaboration
20
corpus linguistics
20
discourse
20
metaphors
20
Wikipedia
01
This chapter analyses the discursive construction of <target target-type="index-entry-marker">collaborative</target> text production via <target target-type="index-entry-marker">metaphors</target> on <target target-type="index-entry-marker">talk pages</target> of the English, French, and German Wikipedia. The cross-linguistic study makes use of the <target target-type="index-entry-marker">WikiDemoCorpus</target> (WDC) to qualitatively reveal the source domains of these metaphors. Situated within the framework of <target target-type="index-entry-marker">discourse analysis</target> in the tradition of Foucault (1966, 1969, 1971), the study analyses how Wikipedia authors use metaphors to create a certain point of view in their collaborations on talk pages. Additionally, the <target target-type="index-entry-marker">Wikipedia corpora</target> available at the Leibniz Institute for the German Language (IDS) via the Corpus Search, Management and Analysis System II (COSMAS II) are used for exemplary quantification. Finally, the potential of further corpus linguistics methods for metaphor identification and analysis is discussed.
10
01
JB code
scl.121.p3
155
1
Section header
10
01
Part III. Visualizing and exploring cooperation and conflicts in Wikipedia
10
01
JB code
scl.121.06lan
156
177
22
Chapter
11
01
Chapter 6. Exploring the evolution of Wikipedia articles through Contropedia
1
A01
David Laniado
Laniado, David
David
Laniado
Eurecat – Technology Centre of Catalonia
2
A01
Michele Mauri
Mauri, Michele
Michele
Mauri
Politecnico di Milano
3
A01
Erik Borra
Borra, Erik
Erik
Borra
Politecnico di Milano
20
article evolution
20
Contropedia
20
controversy mapping
20
data visualization
20
Wikipedia revision histories
01
Wikipedia is not just a corpus of encyclopedic articles; it also includes an entire edit <target target-type="index-entry-marker">history</target> that details the evolution of each article. However, such records are often unknown to the general public, and too complex for researchers to use. <target target-type="index-entry-marker">Contropedia</target> uses innovative techniques to allow for the visualization, exploration and investigation of the evolution of Wikipedia articles and disputes about their content. This chapter introduces <target target-type="index-entry-marker">Contropedia</target> and provides an in-depth analysis of two articles from the WikiDemoCorpus: the English language article ‘Chiropractic’ and a multilingual comparison of articles on the European migration crisis. These use cases illustrate Contropedia’s visual and analytical modules for analyzing <target target-type="index-entry-marker">controversies</target> within an article and for cross-cultural comparison.
10
01
JB code
scl.121.07flo
178
206
29
Chapter
12
01
Chapter 7. Live exploration of Wikipedia editing dynamics with visual analytics
WhoColor and Interactive Wikipedia Article Analysis Notebooks
1
A01
Fabian Flöck
Flöck, Fabian
Fabian
Flöck
2
A01
Roberto Ulloa
Ulloa, Roberto
Roberto
Ulloa
GESIS – Leibniz Institute for the Social Sciences
3
A01
Maribel Acosta
Acosta, Maribel
Maribel
Acosta
Technical University of Munich
20
IWAAN
20
visual analytics
20
Wikipedia revision history
20
WikiWho
01
The <target target-type="index-entry-marker">revision histories</target> of Wikipedia articles are a rich source of data about the interactions of editors with each other and with the content, yet they are not straightforward to mine or understand. We describe two tools for visual analytics that support this effort: (i) An interactive browser extension to study word authorship, age, and <target target-type="index-entry-marker">conflict</target> dynamics, which provides an overlay on live Wikipedia articles; and (ii) a novel interactive Jupyter Notebook package that allows us to run analyses of editorial dynamics out-of-the-box and is easily modifiable. Both leverage live data for any article on demand from several Web APIs, centering on our own WikiWho service, providing the most accurate mining of live word-level changes currently available. We show how these tools enable the exploration of the survival of content, productivity of editors, conflict dynamics, and other metrics through low-barrier interfaces while providing the opportunity for more quantitative investigations via access to the notebooks’ underlying data structures.
10
01
JB code
scl.121.08pou
207
236
30
Chapter
13
01
Chapter 8. Disagreements and conflicts in Wikipedia talk pages
1
A01
Céline Poudat
Poudat, Céline
Céline
Poudat
Laboratoire BCL Université Côte d’Azur
2
A01
Marie Chandelier
Chandelier, Marie
Marie
Chandelier
Laboratoire BCL Université Côte d’Azur
20
annotation
20
conflicts
20
disagreements
20
pragmatics
20
Wikipedia talk pages
01
Collaborative situations inevitably generate <target target-type="index-entry-marker">disagreements</target> and even <target target-type="index-entry-marker">conflicts</target>. Over the last decade, Wikipedia has been extensively studied, but mostly from the perspective of social sciences. Conflicts have been described in great detail, along with phenomena such as <target target-type="index-entry-marker">quality</target>, coordination, or in relation to maintenance work. However, most of the available studies neither defined nor measured conflicts from a linguistic perspective. The following chapter concentrates on two types of <target target-type="index-entry-marker">speech act</target>s, which we consider to be of particular significance, namely speech acts expressing disagreement on the one hand, and a set of clearly offensive speech acts, such as <target target-type="index-entry-marker">insults</target> or derogatory language, on the other hand. Our approach is <target target-type="index-entry-marker">corpus-based</target>: the two typologies which are presented in our chapter were developed based on an <target target-type="index-entry-marker">annotation of speech acts</target> in a corpus of <target target-type="index-entry-marker">Wikipedia discussions</target> in French, partly extracted from the WikiDemoCorpus. The chapter then expands on the annotation which was carried out, and on the specific features of conflicts and disagreements in Wikipedia.
10
01
JB code
scl.121.09car
237
264
28
Chapter
14
01
Chapter 9. To each their own truth
Epistemic regimes on Wikipedia talk pages
1
A01
Guillaume Carbou
Carbou, Guillaume
Guillaume
Carbou
Université de Bordeaux, SPH
2
A01
Lydia-Mai Ho-Dac
Ho-Dac, Lydia-Mai
Lydia-Mai
Ho-Dac
University of Toulouse Jean Jaurès, CLLE
3
A01
Céline Poudat
Poudat, Céline
Céline
Poudat
Laboratoire BCL Université Côte d’Azur
4
A01
Gilles Sahut
Sahut, Gilles
Gilles
Sahut
University of Toulouse Jean Jaurès, LERASS
20
conception of “truth”
20
epistemic regime
20
Wikipedia rules
20
Wikipedia talk pages
01
This chapter presents five <target target-type="index-entry-marker">epistemic regimes</target> we defined for characterizing the conception of “truth” that underpins <target target-type="index-entry-marker">debates</target> amongst Wikipedian editors on talk pages. The chapter starts by defining each epistemic regime and how it could be understood according to the Wikipedia rules. Then, we propose a method to evaluate the operationality of these regimes from a corpus standpoint and to analyse to what extent these regimes are adopted by Wikipedians. The analysis of the <target target-type="index-entry-marker">annotation</target> of 324 posts extracted from the French WikiDemoCorpus largely supports our epistemic model and provides new insights into the ideological backgrounds of Wikipedians.
10
01
JB code
scl.121.index
265
266
2
Miscellaneous
15
01
Index
02
JBENJAMINS
John Benjamins Publishing Company
01
John Benjamins Publishing Company
Amsterdam/Philadelphia
NL
02
October 2024
20241015
2024
John Benjamins B.V.
02
WORLD
13
15
9789027215963
01
JB
3
John Benjamins e-Platform
03
jbe-platform.com
09
WORLD
10
20241015
01
00
120.00
EUR
R
01
00
101.00
GBP
Z
01
gen
00
156.00
USD
S
525025911
03
01
01
JB
John Benjamins Publishing Company
01
JB code
SCL 121 Hb
15
9789027215963
13
2024033457
BB
01
SCL
02
1388-0373
Studies in Corpus Linguistics
121
01
Investigating Wikipedia
Linguistic corpus building, exploration and analysis
01
scl.121
01
https://benjamins.com
02
https://benjamins.com/catalog/scl.121
1
B01
Céline Poudat
Poudat, Céline
Céline
Poudat
Laboratoire BCL Université Côte d'Azur
2
B01
Harald Lüngen
Lüngen, Harald
Harald
Lüngen
Leibniz-Institut für Deutsche Sprache
3
B01
Laura Herzberg
Herzberg, Laura
Laura
Herzberg
University of Mannheim
01
eng
275
vi
261
+ index
LAN009000
v.2006
CFX
2
24
JB Subject Scheme
LIN.APPL
Applied linguistics
24
JB Subject Scheme
LIN.COMPUT
Computational & corpus linguistics
24
JB Subject Scheme
LIN.CORP
Corpus linguistics
06
01
The present volume is intended as a reference book on Wikipedia corpus studies, from corpus construction to exploration and analysis. Wikipedia is a complex object, difficult to manipulate for linguists and corpus researchers. In addition to the encyclopedic articles consulted by millions of users, it contains vast spaces of written discussions, aka talk pages, where Wikipedia authors negotiate the collaborative editing of articles, make evaluations, or discuss related topics. The proposed volume covers Wikipedia articles, their revision histories, and discussions, with a focus on discussions, which have not been studied extensively so far and have also been neglected in previous corpus building efforts. Wikipedia discussions are instances of computer-mediated communication (CMC), thus constituting a completely different, interaction-oriented linguistic genre. Sophisticated tools and methods of linguistic annotation and corpus exploration are needed to exploit the huge and valuable corpus resources that can be constructed from the Wikipedia discussions. The present volume aims at encouraging and facilitating Wikipedia corpus studies, providing standards, recommendations, and innovative methods to build and explore Wikipedia corpora, and presenting corpus studies that make the most of the peculiarities of Wikipedia.
04
09
01
https://benjamins.com/covers/475/scl.121.png
04
03
01
https://benjamins.com/covers/475_jpg/9789027215963.jpg
04
03
01
https://benjamins.com/covers/475_tif/9789027215963.tif
06
09
01
https://benjamins.com/covers/1200_front/scl.121.hb.png
07
09
01
https://benjamins.com/covers/125/scl.121.png
25
09
01
https://benjamins.com/covers/1200_back/scl.121.hb.png
27
09
01
https://benjamins.com/covers/3d_web/scl.121.hb.png
10
01
JB code
scl.121.toc
v
vi
2
Miscellaneous
1
01
Table of contents
10
01
JB code
scl.121.int
1
10
10
Chapter
2
01
Introduction
1
A01
Céline Poudat
Poudat, Céline
Céline
Poudat
Laboratoire BCL Université Côte d’Azur
2
A01
Harald Lüngen
Lüngen, Harald
Harald
Lüngen
Leibniz-Institut für Deutsche Sprache
3
A01
Laura Herzberg
Herzberg, Laura
Laura
Herzberg
University of Mannheim
4
A01
Lydia-Mai Ho-Dac
Ho-Dac, Lydia-Mai
Lydia-Mai
Ho-Dac
University of Toulouse Jean Jaurès, CLLE
10
01
JB code
scl.121.p1
Section header
3
01
Part I. Building Wikipedia corpora
10
01
JB code
scl.121.01hod
12
44
33
Chapter
4
01
Chapter 1. Building a comparable corpus of online discussions on Wikipedia
The EFG WikiCorpus
1
A01
Lydia-Mai Ho-Dac
Ho-Dac, Lydia-Mai
Lydia-Mai
Ho-Dac
University of Toulouse Jean Jaurès, CLLE
20
corpus building
20
TEI CMC-core
20
Wikipedia talk pages
01
This chapter presents the EFG WikiCorpus, a corpus composed of all the <target target-type="index-entry-marker">talk pages</target> dedicated to (co)writing an article in the English, French and German <target target-type="index-entry-marker">Wikipedia</target>s. This chapter explains the place of talk pages in Wikipedia and describes the basic structure of a talk page before detailing the building process of the EFG WikiCorpus: from the Wikipedia archives to a TEI resource encoded according to the <target target-type="index-entry-marker">TEI CMC-core</target> schema. It concludes with a quantitative overview of the EFG WikiCorpus and the EFG WikiDemoCorpus, a derived subcorpus used for qualitative analyses in various contributions of this volume.
10
01
JB code
scl.121.02kra
45
74
30
Chapter
5
01
Chapter 2. Mining parallel corpora from Wikipedia
1
A01
Olivier Kraif
Kraif, Olivier
Olivier
Kraif
Université Grenoble Alpes, LIDILEM
20
parallel corpora
20
sentence alignment
20
translation
20
Wikipedia as a corpus
01
In this article, we address the issue of Wikipedia as a multilingual resource to extract <target target-type="index-entry-marker">parallel corpora</target> that are useful in multilingual terminology extraction or machine translation. While most previous work in this field assumes that Wikipedia is suitable for mining <target target-type="index-entry-marker">comparable corpora</target>, we concentrate on the actual place of translation in the editorial process of Wikipedia to examine the possibility of extracting parallel corpora, that is, texts where source segments can be linked to their translations. After identifying the different projects, tools and recommendations that allow contributors to enrich Wikipedia by exercising their skills as translators, we conduct an experiment in which we download pairs of articles containing translations. We show the importance of performing a temporal alignment of the versions to be downloaded before launching the actual sentence-level <target target-type="index-entry-marker">alignment</target>. This strategy allows us to obtain a large volume of parallel texts with good-quality sentence-to-sentence <target target-type="index-entry-marker">alignment</target>.
10
01
JB code
scl.121.p2
75
1
Section header
6
01
Part II. Interactions on Wikipedia talk pages
10
01
JB code
scl.121.03tan
76
106
31
Chapter
7
01
Chapter 3. Exploring interactions in Wikipedia talk pages
1
A01
Ludovic Tanguy
Tanguy, Ludovic
Ludovic
Tanguy
University of Toulouse Jean Jaurès, CLLE
2
A01
Céline Poudat
Poudat, Céline
Céline
Poudat
Laboratoire BCL Université Côte d’Azur
3
A01
Lydia-Mai Ho-Dac
Ho-Dac, Lydia-Mai
Lydia-Mai
Ho-Dac
University of Toulouse Jean Jaurès, CLLE
20
interaction patterns
20
monologues
20
multi-party conversations
20
online discussions
20
Wikipedia talk pages
01
In this chapter we analyze how users interact on <target target-type="index-entry-marker">Wikipedia talk pages</target>, focusing on the patterns that emerge from a large corpus of 5 million threads across three languages. These patterns take three simple features into account: who posts, when they post, and after whom they post. We begin with an overview and a closer examination of some extreme behaviours: threads with the highest number of users and posts, or those with the longest duration. We then propose fine-grained typologies to analyze the content of a sample of monologal and trilogal openings. This analysis allows us to identify recurring behaviours, some of which we consider specific to Wikipedia and its contributors, and others more common in online discussions.
10
01
JB code
scl.121.04her
107
133
27
Chapter
8
01
Chapter 4. Investigating reply relations on Wikipedia talk pages to reconstruct interactional strategies of Wikipedia authors
1
A01
Laura Herzberg
Herzberg, Laura
Laura
Herzberg
University of Mannheim
2
A01
Harald Lüngen
Lüngen, Harald
Harald
Lüngen
Leibniz-Institut für Deutsche Sprache
20
CMC
20
computer-mediated communication
20
interactional structures
20
referencing strategies
20
reply relations
20
Wikipedia talk pages
01
This chapter presents the <target target-type="index-entry-marker">annotation</target> and analysis of interpretative <target target-type="index-entry-marker">reply relations</target> on Wikipedia talk pages using data from the WikiDemoCorpus (WDC). Building on an approach of annotating interpretative <target target-type="index-entry-marker">reply relations</target> to analyze these relations in Wikipedia talk page posts, the chapter presents nine reply relation categories found in the German WDC. Additionally, linguistic cues for each category and the Wikipedia discussion pages overall are explained in detail, illustrated through reply relation targets. The results of the linguistic annotation are threefold: First, we provide an annotation scheme that can be used by third parties to produce more data according to their needs. Second, we shed light on and quantify the numerous ways Wikipedia authors reply to each other’s posts on talk pages. Finally, we provide richly annotated data that can be used for further analyses, such as identifying interactional relations on higher levels or training tasks in machine learning algorithms.
10
01
JB code
scl.121.05gre
134
154
21
Chapter
9
01
Chapter 5. <i>Sockpuppets</i>, <i>Wikifants</i>, and <i>Honeypots</i>
Metaphorical patterns in digital discourses on Wikipedia talk pages
1
A01
Eva Gredel
Gredel, Eva
Eva
Gredel
University of Duisburg-Essen
20
collaboration
20
corpus linguistics
20
discourse
20
metaphors
20
Wikipedia
01
This chapter analyses the discursive construction of <target target-type="index-entry-marker">collaborative</target> text production via <target target-type="index-entry-marker">metaphors</target> on <target target-type="index-entry-marker">talk pages</target> of the English, French, and German Wikipedia. The cross-linguistic study makes use of the <target target-type="index-entry-marker">WikiDemoCorpus</target> (WDC) to qualitatively reveal the source domains of these metaphors. Situated within the framework of <target target-type="index-entry-marker">discourse analysis</target> in the tradition of Foucault (1966, 1969, 1971), the study analyses how Wikipedia authors use metaphors to create a certain point of view in their collaborations on talk pages. Additionally, the <target target-type="index-entry-marker">Wikipedia corpora</target> available at the Leibniz Institute for the German Language (IDS) via the Corpus Search, Management and Analysis System II (COSMAS II) are used for exemplary quantification. Finally, the potential of further corpus linguistics methods for metaphor identification and analysis is discussed.
10
01
JB code
scl.121.p3
155
1
Section header
10
01
Part III. Visualizing and exploring cooperation and conflicts in Wikipedia
10
01
JB code
scl.121.06lan
156
177
22
Chapter
11
01
Chapter 6. Exploring the evolution of Wikipedia articles through Contropedia
1
A01
David Laniado
Laniado, David
David
Laniado
Eurecat – Technology Centre of Catalonia
2
A01
Michele Mauri
Mauri, Michele
Michele
Mauri
Politecnico di Milano
3
A01
Erik Borra
Borra, Erik
Erik
Borra
Politecnico di Milano
20
article evolution
20
Contropedia
20
controversy mapping
20
data visualization
20
Wikipedia revision histories
01
Wikipedia is not just a corpus of encyclopedic articles; it also includes an entire edit <target target-type="index-entry-marker">history</target> that details the evolution of each article. However, such records are often unknown to the general public, and too complex for researchers to use. <target target-type="index-entry-marker">Contropedia</target> uses innovative techniques to allow for the visualization, exploration and investigation of the evolution of Wikipedia articles and disputes about their content. This chapter introduces <target target-type="index-entry-marker">Contropedia</target> and provides an in-depth analysis of two articles from the WikiDemoCorpus: the English language article ‘Chiropractic’ and a multilingual comparison of articles on the European migration crisis. These use cases illustrate Contropedia’s visual and analytical modules for analyzing <target target-type="index-entry-marker">controversies</target> within an article and for cross-cultural comparison.
10
01
JB code
scl.121.07flo
178
206
29
Chapter
12
01
Chapter 7. Live exploration of Wikipedia editing dynamics with visual analytics
WhoColor and Interactive Wikipedia Article Analysis Notebooks
1
A01
Fabian Flöck
Flöck, Fabian
Fabian
Flöck
2
A01
Roberto Ulloa
Ulloa, Roberto
Roberto
Ulloa
GESIS – Leibniz Institute for the Social Sciences
3
A01
Maribel Acosta
Acosta, Maribel
Maribel
Acosta
Technical University of Munich
20
IWAAN
20
visual analytics
20
Wikipedia revision history
20
WikiWho
01
The <target target-type="index-entry-marker">revision histories</target> of Wikipedia articles are a rich source of data about the interactions of editors with each other and with the content, yet they are not straightforward to mine or understand. We describe two tools for visual analytics that support this effort: (i) An interactive browser extension to study word authorship, age, and <target target-type="index-entry-marker">conflict</target> dynamics, which provides an overlay on live Wikipedia articles; and (ii) a novel interactive Jupyter Notebook package that allows us to run analyses of editorial dynamics out-of-the-box and is easily modifiable. Both leverage live data for any article on demand from several Web APIs, centering on our own WikiWho service, providing the most accurate mining of live word-level changes currently available. We show how these tools enable the exploration of the survival of content, productivity of editors, conflict dynamics, and other metrics through low-barrier interfaces while providing the opportunity for more quantitative investigations via access to the notebooks’ underlying data structures.
10
01
JB code
scl.121.08pou
207
236
30
Chapter
13
01
Chapter 8. Disagreements and conflicts in Wikipedia talk pages
1
A01
Céline Poudat
Poudat, Céline
Céline
Poudat
Laboratoire BCL Université Côte d’Azur
2
A01
Marie Chandelier
Chandelier, Marie
Marie
Chandelier
Laboratoire BCL Université Côte d’Azur
20
annotation
20
conflicts
20
disagreements
20
pragmatics
20
Wikipedia talk pages
01
Collaborative situations inevitably generate <target target-type="index-entry-marker">disagreements</target> and even <target target-type="index-entry-marker">conflicts</target>. Over the last decade, Wikipedia has been extensively studied, but mostly from the perspective of social sciences. Conflicts have been described in great detail, along with phenomena such as <target target-type="index-entry-marker">quality</target>, coordination, or in relation to maintenance work. However, most of the available studies neither defined nor measured conflicts from a linguistic perspective. The following chapter concentrates on two types of <target target-type="index-entry-marker">speech act</target>s, which we consider to be of particular significance, namely speech acts expressing disagreement on the one hand, and a set of clearly offensive speech acts, such as <target target-type="index-entry-marker">insults</target> or derogatory language, on the other hand. Our approach is <target target-type="index-entry-marker">corpus-based</target>: the two typologies which are presented in our chapter were developed based on an <target target-type="index-entry-marker">annotation of speech acts</target> in a corpus of <target target-type="index-entry-marker">Wikipedia discussions</target> in French, partly extracted from the WikiDemoCorpus. The chapter then expands on the annotation which was carried out, and on the specific features of conflicts and disagreements in Wikipedia.
10
01
JB code
scl.121.09car
237
264
28
Chapter
14
01
Chapter 9. To each their own truth
Epistemic regimes on Wikipedia talk pages
1
A01
Guillaume Carbou
Carbou, Guillaume
Guillaume
Carbou
Université de Bordeaux, SPH
2
A01
Lydia-Mai Ho-Dac
Ho-Dac, Lydia-Mai
Lydia-Mai
Ho-Dac
University of Toulouse Jean Jaurès, CLLE
3
A01
Céline Poudat
Poudat, Céline
Céline
Poudat
Laboratoire BCL Université Côte d’Azur
4
A01
Gilles Sahut
Sahut, Gilles
Gilles
Sahut
University of Toulouse Jean Jaurès, LERASS
20
conception of “truth”
20
epistemic regime
20
Wikipedia rules
20
Wikipedia talk pages
01
This chapter presents five <target target-type="index-entry-marker">epistemic regimes</target> we defined for characterizing the conception of “truth” that underpins <target target-type="index-entry-marker">debates</target> amongst Wikipedian editors on talk pages. The chapter starts by defining each epistemic regime and how it could be understood according to the Wikipedia rules. Then, we propose a method to evaluate the operationality of these regimes from a corpus standpoint and to analyse to what extent these regimes are adopted by Wikipedians. The analysis of the <target target-type="index-entry-marker">annotation</target> of 324 posts extracted from the French WikiDemoCorpus largely supports our epistemic model and provides new insights into the ideological backgrounds of Wikipedians.
10
01
JB code
scl.121.index
265
266
2
Miscellaneous
15
01
Index
02
JBENJAMINS
John Benjamins Publishing Company
01
John Benjamins Publishing Company
Amsterdam/Philadelphia
NL
02
October 2024
20241015
2024
John Benjamins B.V.
02
WORLD
01
JB
1
John Benjamins Publishing Company
+31 20 6304747
+31 20 6739773
bookorder@benjamins.nl
01
https://benjamins.com
01
WORLD
US CA MX
10
20241015
01
02
JB
1
00
120.00
EUR
R
02
02
JB
1
00
127.20
EUR
R
01
JB
10
bebc
+44 1202 712 934
+44 1202 712 913
sales@bebc.co.uk
03
GB
10
20241015
02
02
JB
1
00
101.00
GBP
Z
01
JB
2
John Benjamins North America
+1 800 562-5666
+1 703 661-1501
benjamins@presswarehouse.com
01
https://benjamins.com
01
US CA MX
10
20241015
01
gen
02
JB
1
00
156.00
USD