Replication Data for: CASM: A Deep-Learning Approach for Identifying Collective Action Events with Text and Image Data from Social Media (doi:10.7910/DVN/SS4LNN)

Document Description

Citation

Title:

Replication Data for: CASM: A Deep-Learning Approach for Identifying Collective Action Events with Text and Image Data from Social Media

Identification Number:

doi:10.7910/DVN/SS4LNN

Distributor:

Harvard Dataverse

Date of Distribution:

2019-06-25

Version:

1

Bibliographic Citation:

Pan, Jennifer; Zhang, Han, 2019, "Replication Data for: CASM: A Deep-Learning Approach for Identifying Collective Action Events with Text and Image Data from Social Media", https://doi.org/10.7910/DVN/SS4LNN, Harvard Dataverse, V1, UNF:6:26KE5u8/rqgZAoNS8X8wXg== [fileUNF]

Study Description

Citation

Title:

Replication Data for: CASM: A Deep-Learning Approach for Identifying Collective Action Events with Text and Image Data from Social Media

Identification Number:

doi:10.7910/DVN/SS4LNN

Authoring Entity:

Pan, Jennifer (Stanford University, Department of Communication)

Zhang, Han (Princeton University, Department of Sociology)

Distributor:

Harvard Dataverse

Access Authority:

Pan, Jennifer

Access Authority:

Zhang, Han

Depositor:

Pan, Jennifer

Date of Deposit:

2019-06-25

Holdings Information:

https://doi.org/10.7910/DVN/SS4LNN

Study Scope

Keywords:

Social Sciences, collective action, deep learning, event data, social media, China

Abstract:

Protest event analysis is an important method for the study of collective action and social movements and typically draws on traditional media reports as the data source. We introduce collective action from social media (CASM)—a system that uses convolutional neural networks on image data and recurrent neural networks with long short-term memory on text data in a two-stage classifier to identify social media posts about offline collective action. We implement CASM on Chinese social media data and identify more than 100,000 collective action events from 2010 to 2017 (CASM-China). We evaluate the performance of CASM through cross-validation, out-of-sample validation, and comparisons with other protest data sets. We assess the effect of online censorship and find it does not substantially limit our identification of events. Compared to other protest data sets, CASM-China identifies relatively more rural, land-related protests and relatively few collective action events related to ethnic and religious conflict.
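The two-stage design described in the abstract can be pictured as a cascade: a post reaches the second classifier only if the first one accepts it. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function name `two_stage_filter`, the thresholds `t1` and `t2`, and the keyword-based toy scorers are all hypothetical stand-ins for the trained CNN and LSTM models; this codebook does not specify what each stage separates or how text and image scores are combined.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical scorer type: maps a post to a probability in [0, 1].
Scorer = Callable[[Dict[str, str]], float]


def two_stage_filter(
    posts: Iterable[Dict[str, str]],
    stage1: Scorer,
    stage2: Scorer,
    t1: float = 0.5,  # assumed threshold, not taken from the paper
    t2: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Dict[str, str]]:
    """Keep only posts accepted by stage 1 and then by stage 2."""
    kept = []
    for post in posts:
        if stage1(post) < t1:
            continue  # rejected by the first stage; stage 2 never runs
        if stage2(post) >= t2:
            kept.append(post)
    return kept


# Toy scorers standing in for the trained models (illustration only).
def toy_stage1(post: Dict[str, str]) -> float:
    return 0.9 if "protest" in post["text"] else 0.1


def toy_stage2(post: Dict[str, str]) -> float:
    return 0.8 if "street" in post["text"] else 0.2


posts = [
    {"id": "a", "text": "villagers protest on the street"},
    {"id": "b", "text": "online protest hashtag"},
    {"id": "c", "text": "lunch photo"},
]
selected_ids = [p["id"] for p in two_stage_filter(posts, toy_stage1, toy_stage2)]
```

In this toy run only post "a" passes both stages. A cascade of this kind lets the second stage be trained and evaluated only on posts the first stage already accepted.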

Notes:

We recommend viewing the files in Tree structure and beginning with readme.txt.

Methodology and Processing

Sources Statement

Data Access

Notes:

CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0)

Other Study Description Materials

Related Publications

Citation

Title:

Zhang Han and Jennifer Pan. 2019. “CASM: A Deep-Learning Approach for Identifying Collective Action Events with Text and Image Data from Social Media” Sociological Methodology 49: 1-59.

Identification Number:

10.1177/0081175019860244

Bibliographic Citation:

Zhang Han and Jennifer Pan. 2019. “CASM: A Deep-Learning Approach for Identifying Collective Action Events with Text and Image Data from Social Media” Sociological Methodology 49: 1-59.

File Description--f3455389

File: protest_events.tab

  • Number of cases: 136330

  • No. of variables per record: 1

  • Type of File: text/tab-separated-values

Notes:

UNF:6:wbuMQn7wm4tl11Io3al6oQ==

File Description--f3455432

File: keyword_search_placebo_count.tab

  • Number of cases: 180

  • No. of variables per record: 1

  • Type of File: text/tab-separated-values

Notes:

UNF:6:ZxN0FDr+XlhoWkcNZctu9w==

Also used for plot/SuppFigure7

File Description--f3455413

File: firststage_c1_text_image_precision_recall.tab

  • Number of cases: 6046

  • No. of variables per record: 3

  • Type of File: text/tab-separated-values

Notes:

UNF:6:hxAPly5qKIGA1G6HfLHv1Q==

File Description--f3455404

File: c1c2.tab

  • Number of cases: 2408

  • No. of variables per record: 3

  • Type of File: text/tab-separated-values

Notes:

UNF:6:rL3sAGsTp2RrS1yDdkgrcQ==

File Description--f3455403

File: c1c2_precision_recall_crossvalidation.tab

  • Number of cases: 6413

  • No. of variables per record: 3

  • Type of File: text/tab-separated-values

Notes:

UNF:6:mkyvOgenEhyzR7efni3q8Q==

File Description--f3455402

File: secondstage_c2_text_image_precision_recall.tab

  • Number of cases: 3853

  • No. of variables per record: 4

  • Type of File: text/tab-separated-values

Notes:

UNF:6:NKAiaoqeTkNJgn5gbpy7Cg==

File Description--f3455410

File: keyword_vs_num_event_700000_fig4.tab

  • Number of cases: 4500

  • No. of variables per record: 3

  • Type of File: text/tab-separated-values

Notes:

UNF:6:BOr7j3rkVv5QTcQtfK23Tw==

File Description--f3455426

File: precision_vs_keyword.tab

  • Number of cases: 2560

  • No. of variables per record: 3

  • Type of File: text/tab-separated-values

Notes:

UNF:6:qHEFiyMlqC8j0XwBEChMJQ==

File Description--f3455429

File: firststage_c1_SVM_NB.tab

  • Number of cases: 6038

  • No. of variables per record: 3

  • Type of File: text/tab-separated-values

Notes:

UNF:6:VueXQuSkwhSwntSBqd6gWg==
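The case and variable counts listed for each file can be checked after download. The following is a minimal sketch, assuming the .tab files are tab-separated with a single header row. The helper name `tab_shape` and the inline sample data are illustrative and not part of the replication package.

```python
import csv
import io


def tab_shape(handle):
    """Return (data rows, columns) for a tab-separated file with a header row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # skip completely empty lines
    return n_rows, len(header)


# Inline stand-in data; the values are made up for illustration.
sample = io.StringIO("label\tprecision\trecall\nx\t0.5\t1.0\ny\t0.75\t0.5\n")
shape = tab_shape(sample)

# With a downloaded file (path and encoding are assumptions):
# with open("c1c2.tab", newline="", encoding="utf-8") as fh:
#     print(tab_shape(fh))  # compare with the counts listed for c1c2.tab
```

For c1c2.tab, for example, the result would be compared against the 2408 cases and 3 variables listed above.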

Variable Description

List of Variables:

Variables

event_id forms issues

f3455389 Location:

Variable Format: character

Notes: UNF:6:wbuMQn7wm4tl11Io3al6oQ==

month count_all month_numeric quarter Dataset year

f3455432 Location:

Variable Format: character

Notes: UNF:6:ZxN0FDr+XlhoWkcNZctu9w==

label

f3455413 Location:

Variable Format: character

Notes: UNF:6:/WKB02PwmgYsUaAvUUl4WA==

precision

f3455413 Location:

Summary Statistics: Valid 6046.0; Mean 0.3928149639659638; Min. 0.16902784562774792; Max. 1.0; StDev 0.21272211487785242

Variable Format: numeric

Notes: UNF:6:HMKRgUbpYsQmCO9gH/xqng==

recall

f3455413 Location:

Summary Statistics: Max. 1.0; StDev 0.24434564628902133; Mean 0.8316877924352603; Min. 0.002890173410404624; Valid 6046.0

Variable Format: numeric

Notes: UNF:6:XkzU2089TYovplWPOCq1sg==
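The summary statistics reported for numeric variables (Valid, Min., Max., Mean, StDev) can be recomputed from a downloaded column and compared with the codebook. The following is a minimal sketch. The function name `summarize` is hypothetical, and it assumes that StDev is the sample standard deviation (n - 1 denominator), which the codebook does not state.

```python
import statistics


def summarize(values):
    """Compute the five statistics the codebook reports for a numeric column."""
    return {
        "Valid": float(len(values)),
        "Min.": min(values),
        "Max.": max(values),
        "Mean": statistics.fmean(values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "StDev": statistics.stdev(values),
    }


# Made-up values for illustration; replace with a column read from a .tab file.
stats = summarize([0.25, 0.5, 0.75, 1.0])
```

If the recomputed StDev differs slightly from the listed value, trying `statistics.pstdev` (population form) is a reasonable first check.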

label

f3455404 Location:

Variable Format: character

Notes: UNF:6:VbmLr5BCRwmU99POlnD+WQ==

precision

f3455404 Location:

Summary Statistics: Valid 2408.0; StDev 0.19843002381807667; Mean 0.6561784132954082; Min. 0.3862520458265139; Max. 1.0

Variable Format: numeric

Notes: UNF:6:ePL9lu2FbYZ9gjX7w8j01A==

recall

f3455404 Location:

Summary Statistics: Min. 0.00211864406779661; Valid 2408.0; Max. 1.0; StDev 0.2784397034783831; Mean 0.7100924179289374

Variable Format: numeric

Notes: UNF:6:uiEp/74k1CTmxZPLsd+HmA==

label

f3455403 Location:

Variable Format: character

Notes: UNF:6:fPFJY05MN31qzjjPmzkzpA==

precision

f3455403 Location:

Summary Statistics: Valid 6413.0; Min. 0.35019241341396373; StDev 0.2056319379073097; Mean 0.8141775929693685; Max. 1.0

Variable Format: numeric

Notes: UNF:6:bBv3mtaY3MeaqBDnvSY4VA==

recall

f3455403 Location:

Summary Statistics: Max. 0.9996659242761693; StDev 0.3165098962640183; Min. 0.0015082956259426848; Mean 0.6208728005956752; Valid 6413.0

Variable Format: numeric

Notes: UNF:6:K8kuyfoC2yYRo65HurFLlg==

label

f3455402 Location:

Variable Format: character

Notes: UNF:6:RvEjpk9rT+6KKdRzq5zt9Q==

precision

f3455402 Location:

Summary Statistics: Max. 1.0; Mean 0.6848123751991867; StDev 0.19064082014163491; Valid 3853.0; Min. 0.3965665236051502

Variable Format: numeric

Notes: UNF:6:c3Q63cuB0AIvGzLYCeTvOg==

recall

f3455402 Location:

Summary Statistics: Max. 1.0; Mean 0.7029771444704974; StDev 0.28192710238756225; Min. 0.0021598272138228943; Valid 3853.0

Variable Format: numeric

Notes: UNF:6:4Fcvgiujxg8/SJElDdlYzg==

threshold

f3455402 Location:

Summary Statistics: Max. 1.0919091868846813; Mean 0.7705188097591968; Valid 3853.0; StDev 0.20437443582886783; Min. 0.16134203970432281

Variable Format: numeric

Notes: UNF:6:9T2CstOvsCNfzAarpKYidA==
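File f3455402 pairs each precision and recall value with a threshold, so its rows can be scanned for an operating point. The codebook does not say how a threshold was chosen; maximizing F1 is one common option and is used here purely as an illustration. The function name `best_f1_row` and the example rows are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_f1_row(rows):
    """Return the row (a dict with precision, recall, threshold) with the highest F1."""
    return max(rows, key=lambda row: f1_score(row["precision"], row["recall"]))


# Made-up rows for illustration; real rows would come from the .tab file.
rows = [
    {"threshold": 0.2, "precision": 0.4, "recall": 1.0},
    {"threshold": 0.5, "precision": 0.8, "recall": 0.8},
    {"threshold": 0.9, "precision": 1.0, "recall": 0.2},
]
best = best_f1_row(rows)
```

In this toy input the middle row wins because it balances precision and recall. A different criterion, such as a minimum precision, would need a different key function.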

month

f3455410 Location:

Variable Format: character

Notes: UNF:6:O/XMkrySXj7/stLtO+oGrw==

nunique

f3455410 Location:

Summary Statistics: Max. 3126.0; Mean 1306.645555555568; Valid 4500.0; Min. 2.0; StDev 775.2087042562923

Variable Format: numeric

Notes: UNF:6:IRufL0DUEM+jNmdzQo6MJg==

rank

f3455410 Location:

Summary Statistics: Valid 4500.0; Min. 1.0; StDev 14.432473386915156; Mean 25.5; Max. 50.0

Variable Format: numeric

Notes: UNF:6:ikVCsyzjacSJi7zCXrf62Q==

precision

f3455426 Location:

Summary Statistics: Mean 0.8924112448859461; StDev 0.1222086015892413; Max. 0.9831081081081081; Min. 0.24090203404994343; Valid 2560.0

Variable Format: numeric

Notes: UNF:6:5yicMrkeNYov1JxVvD9UEw==

rank

f3455426 Location:

Summary Statistics: Min. 10.0; StDev 14.14489856907796; Valid 2560.0; Max. 50.0; Mean 30.0

Variable Format: numeric

Notes: UNF:6:19Zex4EAgUSxVumQ8R8b6w==

recall

f3455426 Location:

Summary Statistics: Max. 1.0; StDev 0.21225415939853337; Min. 0.03178590933915893; Valid 2560.0; Mean 0.622310502768809

Variable Format: numeric

Notes: UNF:6:rYMxh1EOHDUGy5TlCF94cQ==

label

f3455429 Location:

Variable Format: character

Notes: UNF:6:TWro7UTlIZoER+ZhoImqnw==

precision

f3455429 Location:

Summary Statistics: Min. 0.0; Max. 1.0; Mean 0.3698565312123263; Valid 6038.0; StDev 0.16754473718512444

Variable Format: numeric

Notes: UNF:6:tBzZODFBTMf0oJ7mvsPsSw==

recall

f3455429 Location:

Summary Statistics: Max. 1.0; Min. 0.0; Mean 0.8309617126216056; StDev 0.26159205941475533; Valid 6038.0

Variable Format: numeric

Notes: UNF:6:I3gaMv8h51OylDenGlcv2g==

Other Study-Related Materials

Label:

readme.txt

Notes:

text/plain

Other Study-Related Materials

Label:

requirements.txt

Notes:

text/plain

Other Study-Related Materials

Label:

CASM_c1_deep_text.py

Notes:

text/x-python-script

Other Study-Related Materials

Label:

CASM_c2_deep_text.py

Notes:

text/x-python-script

Other Study-Related Materials

Label:

CASM_generate_predicted_probability_image.py

Notes:

text/x-python-script

Other Study-Related Materials

Label:

CASM_generate_predicted_probability_text.py

Notes:

text/x-python-script

Other Study-Related Materials

Label:

common_operations.py

Notes:

text/x-python-script

Other Study-Related Materials

Label:

dependency.py

Notes:

text/x-python-script

Other Study-Related Materials

Label:

LSTM_text_dependency.py

Notes:

text/x-python-script

Other Study-Related Materials

Label:

word_preprocessing.py

Notes:

text/x-python-script

Other Study-Related Materials

Label:

image.json

Notes:

application/json

Other Study-Related Materials

Label:

text-stage1.json

Notes:

application/json

Other Study-Related Materials

Label:

text-stage2.json

Notes:

application/json

Other Study-Related Materials

Label:

weights_image.h5

Notes:

application/x-h5

Other Study-Related Materials

Label:

weights_text-stage1.h5

Notes:

application/x-h5

Other Study-Related Materials

Label:

weights_text-stage2.hdf5

Notes:

application/x-hdf5

Other Study-Related Materials

Label:

protest_posts.csv

Notes:

text/csv

Other Study-Related Materials

Label:

protest_region_count_prefecture_log_scale.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

CN-shi-A.dbf

Notes:

application/dbf

Other Study-Related Materials

Label:

CN-shi-A.prj

Notes:

application/prj

Other Study-Related Materials

Label:

CN-shi-A.sbn

Notes:

application/sbn

Other Study-Related Materials

Label:

CN-shi-A.sbx

Notes:

application/sbx

Other Study-Related Materials

Label:

CN-shi-A.shp

Notes:

application/shp

Other Study-Related Materials

Label:

CN-shi-A.shx

Notes:

application/shx

Other Study-Related Materials

Label:

CASM_divide_irrelevant.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

CASM_Wickedonna.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

wicked_events_2018-11-29.csv

Notes:

text/csv

Other Study-Related Materials

Label:

firststage_c1_text_image_precision_recall.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

firststage_c1_text_image_precision_recall.r

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

c1c2_vs_c1_precision_recall.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

c1c2_vs_c1_precision_recall.r

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

c1c2_vs_cross_validation_precision_recall.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

c1c2_vs_cross_validation_precision_recall.r

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

keywordCoverageSet.txt

Notes:

text/plain

Other Study-Related Materials

Label:

keyword_coverage_wickedonna.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

supfig1.r

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

keywordCoverageSet_keyword_year.txt

Notes:

text/plain

Other Study-Related Materials

Label:

keyword_coverage_wickedonna_by_year.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

supfig2.r

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

keywordCoverageSet_keyword_prov.txt

Notes:

text/plain

Other Study-Related Materials

Label:

keyword_coverage_wickedonna_by_province.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

supfig3.r

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

keyword_vs_num_event_700000.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

supfig4.r

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

precision_vs_keyword.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

supfig5.r

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

firststage_c1_SVM_NB_precision_recall.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

supfig6.r

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

CASM_keyword_irrelevant_count.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

supfig7.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

high_frequency_protest_words.txt

Notes:

text/plain

Other Study-Related Materials

Label:

jieba.dict.big.txt

Notes:

text/plain

Other Study-Related Materials

Label:

propaganda_media_words.txt

Notes:

text/plain

Other Study-Related Materials

Label:

stopwords1.txt

Notes:

text/plain

Other Study-Related Materials

Label:

village_level_dict.txt

Notes:

text/plain

Other Study-Related Materials

Label:

village_level_dict_reversed.txt

Notes:

text/plain

Other Study-Related Materials

Label:

vocab_pos_grievance.dict

Notes:

application/octet-stream

Other Study-Related Materials

Label:

vocab_pos_KGP_50000.dict

Notes:

application/octet-stream