Part 1: Document Description
|
|
Citation |
|
|
Title: |
Replication Data for: CASM: A Deep-Learning Approach for Identifying Collective Action Events with Text and Image Data from Social Media |
|
Identification Number: |
doi:10.7910/DVN/SS4LNN |
|
Distributor: |
Harvard Dataverse |
|
Date of Distribution: |
2019-06-25 |
|
Version: |
1 |
|
Bibliographic Citation: |
Pan, Jennifer; Zhang, Han, 2019, "Replication Data for: CASM: A Deep-Learning Approach for Identifying Collective Action Events with Text and Image Data from Social Media", https://doi.org/10.7910/DVN/SS4LNN, Harvard Dataverse, V1, UNF:6:26KE5u8/rqgZAoNS8X8wXg== [fileUNF] |
|
Citation |
|
|
Authoring Entity: |
Pan, Jennifer (Stanford University, Department of Communication) |
|
Zhang, Han (Princeton University, Department of Sociology) |
|
|
Distributor: |
Harvard Dataverse |
|
Access Authority: |
Pan, Jennifer |
|
Access Authority: |
Zhang, Han |
|
Depositor: |
Pan, Jennifer |
|
Date of Deposit: |
2019-06-25 |
|
Holdings Information: |
https://doi.org/10.7910/DVN/SS4LNN |
|
Study Scope |
|
|
Keywords: |
Social Sciences, collective action, deep learning, event data, social media, China |
|
Abstract: |
Protest event analysis is an important method for the study of collective action and social movements and typically draws on traditional media reports as the data source. We introduce collective action from social media (CASM)—a system that uses convolutional neural networks on image data and recurrent neural networks with long short-term memory on text data in a two-stage classifier to identify social media posts about offline collective action. We implement CASM on Chinese social media data and identify more than 100,000 collective action events from 2010 to 2017 (CASM-China). We evaluate the performance of CASM through cross-validation, out-of-sample validation, and comparisons with other protest data sets. We assess the effect of online censorship and find it does not substantially limit our identification of events. Compared to other protest data sets, CASM-China identifies relatively more rural, land-related protests and relatively few collective action events related to ethnic and religious conflict. |
|
Notes: |
We recommend viewing the files in Tree structure and beginning with readme.txt. |
|
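The abstract above describes CASM as a two-stage classifier. The following is a minimal sketch of that general idea only, not the authors' implementation: the `Post` type, the `two_stage_filter` function, the toy scorers, and both thresholds are hypothetical names and values introduced for illustration. In CASM itself the scores come from neural networks trained on text and images (see the Python scripts and weight files listed in this record); here simple keyword checks stand in for them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Post:
    """Hypothetical container for one social media post."""
    text: str


# A scorer maps a post to a probability-like score in [0, 1].
Scorer = Callable[[Post], float]


def two_stage_filter(
    posts: Iterable[Post],
    stage1: Scorer,
    stage2: Scorer,
    t1: float = 0.5,  # illustrative threshold, not taken from the paper
    t2: float = 0.5,  # illustrative threshold, not taken from the paper
) -> List[Post]:
    """Keep posts that clear the stage-1 threshold and then the stage-2 threshold."""
    kept = []
    for post in posts:
        if stage1(post) < t1:
            continue  # rejected at stage 1; stage 2 is never evaluated
        if stage2(post) >= t2:
            kept.append(post)
    return kept


# Toy stand-ins for the trained models (assumptions, for demonstration only).
def toy_stage1(post: Post) -> float:
    return 0.9 if "protest" in post.text else 0.1


def toy_stage2(post: Post) -> float:
    return 0.8 if "villagers" in post.text else 0.2


posts = [
    Post("villagers protest land seizure"),
    Post("protest song lyrics"),
    Post("lunch photo"),
]
selected = two_stage_filter(posts, toy_stage1, toy_stage2)
print([p.text for p in selected])
```

In a cascade like this, the second scorer only runs on posts that survive the first, so a cheap first stage can discard most irrelevant posts before a stricter second stage is applied.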
Methodology and Processing |
|
|
Sources Statement |
|
|
Data Access |
|
|
Notes: |
CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0) |
|
Other Study Description Materials |
|
|
Related Publications |
|
|
Citation |
|
|
Title: |
Zhang, Han, and Jennifer Pan. 2019. “CASM: A Deep-Learning Approach for Identifying Collective Action Events with Text and Image Data from Social Media.” Sociological Methodology 49: 1-59. |
|
Identification Number: |
10.1177/0081175019860244 |
|
Bibliographic Citation: |
Zhang, Han, and Jennifer Pan. 2019. “CASM: A Deep-Learning Approach for Identifying Collective Action Events with Text and Image Data from Social Media.” Sociological Methodology 49: 1-59. |
|
File Description--f3455389 |
|
|
File: protest_events.tab |
|
|
|
|
Notes: |
UNF:6:wbuMQn7wm4tl11Io3al6oQ== |
|
File Description--f3455432 |
|
|
File: keyword_search_placebo_count.tab |
|
|
|
|
Notes: |
UNF:6:ZxN0FDr+XlhoWkcNZctu9w== |
|
Also used for plot/SuppFigure7 |
|
|
File Description--f3455413 |
|
|
File: firststage_c1_text_image_precision_recall.tab |
|
|
|
|
Notes: |
UNF:6:hxAPly5qKIGA1G6HfLHv1Q== |
|
File Description--f3455404 |
|
|
File: c1c2.tab |
|
|
|
|
Notes: |
UNF:6:rL3sAGsTp2RrS1yDdkgrcQ== |
|
File Description--f3455403 |
|
|
File: c1c2_precision_recall_crossvalidation.tab |
|
|
|
|
Notes: |
UNF:6:mkyvOgenEhyzR7efni3q8Q== |
|
File Description--f3455402 |
|
|
File: secondstage_c2_text_image_precision_recall.tab |
|
|
|
|
Notes: |
UNF:6:NKAiaoqeTkNJgn5gbpy7Cg== |
|
File Description--f3455410 |
|
|
File: keyword_vs_num_event_700000_fig4.tab |
|
|
|
|
Notes: |
UNF:6:BOr7j3rkVv5QTcQtfK23Tw== |
|
File Description--f3455426 |
|
|
File: precision_vs_keyword.tab |
|
|
|
|
Notes: |
UNF:6:qHEFiyMlqC8j0XwBEChMJQ== |
|
File Description--f3455429 |
|
|
File: firststage_c1_SVM_NB.tab |
|
|
|
|
Notes: |
UNF:6:VueXQuSkwhSwntSBqd6gWg== |
|
List of Variables: |
|
|
Variables |
|
|
f3455389 Location: |
Variable Format: character Notes: UNF:6:wbuMQn7wm4tl11Io3al6oQ== |
|
f3455432 Location: |
Variable Format: character Notes: UNF:6:ZxN0FDr+XlhoWkcNZctu9w== |
|
f3455413 Location: |
Variable Format: character Notes: UNF:6:/WKB02PwmgYsUaAvUUl4WA== |
|
f3455413 Location: |
Summary Statistics: Valid 6046.0; Min. 0.16902784562774792; Max. 1.0; Mean 0.3928149639659638; StDev 0.21272211487785242 Variable Format: numeric Notes: UNF:6:HMKRgUbpYsQmCO9gH/xqng==
|
f3455413 Location: |
Summary Statistics: Valid 6046.0; Min. 0.002890173410404624; Max. 1.0; Mean 0.8316877924352603; StDev 0.24434564628902133 Variable Format: numeric Notes: UNF:6:XkzU2089TYovplWPOCq1sg==
|
f3455404 Location: |
Variable Format: character Notes: UNF:6:VbmLr5BCRwmU99POlnD+WQ== |
|
f3455404 Location: |
Summary Statistics: Valid 2408.0; Min. 0.3862520458265139; Max. 1.0; Mean 0.6561784132954082; StDev 0.19843002381807667 Variable Format: numeric Notes: UNF:6:ePL9lu2FbYZ9gjX7w8j01A==
|
f3455404 Location: |
Summary Statistics: Valid 2408.0; Min. 0.00211864406779661; Max. 1.0; Mean 0.7100924179289374; StDev 0.2784397034783831 Variable Format: numeric Notes: UNF:6:uiEp/74k1CTmxZPLsd+HmA==
|
f3455403 Location: |
Variable Format: character Notes: UNF:6:fPFJY05MN31qzjjPmzkzpA== |
|
f3455403 Location: |
Summary Statistics: Valid 6413.0; Min. 0.35019241341396373; Max. 1.0; Mean 0.8141775929693685; StDev 0.2056319379073097 Variable Format: numeric Notes: UNF:6:bBv3mtaY3MeaqBDnvSY4VA==
|
f3455403 Location: |
Summary Statistics: Valid 6413.0; Min. 0.0015082956259426848; Max. 0.9996659242761693; Mean 0.6208728005956752; StDev 0.3165098962640183 Variable Format: numeric Notes: UNF:6:K8kuyfoC2yYRo65HurFLlg==
|
f3455402 Location: |
Variable Format: character Notes: UNF:6:RvEjpk9rT+6KKdRzq5zt9Q== |
|
f3455402 Location: |
Summary Statistics: Valid 3853.0; Min. 0.3965665236051502; Max. 1.0; Mean 0.6848123751991867; StDev 0.19064082014163491 Variable Format: numeric Notes: UNF:6:c3Q63cuB0AIvGzLYCeTvOg==
|
f3455402 Location: |
Summary Statistics: Valid 3853.0; Min. 0.0021598272138228943; Max. 1.0; Mean 0.7029771444704974; StDev 0.28192710238756225 Variable Format: numeric Notes: UNF:6:4Fcvgiujxg8/SJElDdlYzg==
|
f3455402 Location: |
Summary Statistics: Valid 3853.0; Min. 0.16134203970432281; Max. 1.0919091868846813; Mean 0.7705188097591968; StDev 0.20437443582886783 Variable Format: numeric Notes: UNF:6:9T2CstOvsCNfzAarpKYidA==
|
f3455410 Location: |
Variable Format: character Notes: UNF:6:O/XMkrySXj7/stLtO+oGrw== |
|
f3455410 Location: |
Summary Statistics: Valid 4500.0; Min. 2.0; Max. 3126.0; Mean 1306.645555555568; StDev 775.2087042562923 Variable Format: numeric Notes: UNF:6:IRufL0DUEM+jNmdzQo6MJg==
|
f3455410 Location: |
Summary Statistics: Valid 4500.0; Min. 1.0; Max. 50.0; Mean 25.5; StDev 14.432473386915156 Variable Format: numeric Notes: UNF:6:ikVCsyzjacSJi7zCXrf62Q==
|
f3455426 Location: |
Summary Statistics: Valid 2560.0; Min. 0.24090203404994343; Max. 0.9831081081081081; Mean 0.8924112448859461; StDev 0.1222086015892413 Variable Format: numeric Notes: UNF:6:5yicMrkeNYov1JxVvD9UEw==
|
f3455426 Location: |
Summary Statistics: Valid 2560.0; Min. 10.0; Max. 50.0; Mean 30.0; StDev 14.14489856907796 Variable Format: numeric Notes: UNF:6:19Zex4EAgUSxVumQ8R8b6w==
|
f3455426 Location: |
Summary Statistics: Valid 2560.0; Min. 0.03178590933915893; Max. 1.0; Mean 0.622310502768809; StDev 0.21225415939853337 Variable Format: numeric Notes: UNF:6:rYMxh1EOHDUGy5TlCF94cQ==
|
f3455429 Location: |
Variable Format: character Notes: UNF:6:TWro7UTlIZoER+ZhoImqnw== |
|
f3455429 Location: |
Summary Statistics: Valid 6038.0; Min. 0.0; Max. 1.0; Mean 0.3698565312123263; StDev 0.16754473718512444 Variable Format: numeric Notes: UNF:6:tBzZODFBTMf0oJ7mvsPsSw==
|
f3455429 Location: |
Summary Statistics: Valid 6038.0; Min. 0.0; Max. 1.0; Mean 0.8309617126216056; StDev 0.26159205941475533 Variable Format: numeric Notes: UNF:6:I3gaMv8h51OylDenGlcv2g==
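
The Summary Statistics fields listed above (Valid, Min., Max., Mean, StDev) can be recomputed from any numeric column. The sketch below shows one way to do so with Python's standard library; the `summarize` helper is a hypothetical name, and the column contents are an assumption rather than data read from the deposit. We assume the integers 1 through 50, each repeated 90 times, which is consistent with the f3455410 variable that reports Valid 4500.0, Min. 1.0, Max. 50.0, and Mean 25.5. Under that assumption, the sample standard deviation (n - 1 denominator) agrees with the reported StDev of about 14.4325, whereas the population formula gives about 14.4309. This suggests, but does not confirm, that the record reports the sample standard deviation.

```python
import statistics
from typing import Dict, Iterable, Optional


def summarize(values: Iterable[Optional[float]]) -> Dict[str, float]:
    """Compute codebook-style summary statistics, skipping missing values."""
    valid = [v for v in values if v is not None]
    return {
        "Valid": float(len(valid)),
        "Min.": float(min(valid)),
        "Max.": float(max(valid)),
        "Mean": float(statistics.mean(valid)),
        # Sample standard deviation (n - 1 denominator); an assumption
        # about how the StDev field is defined.
        "StDev": float(statistics.stdev(valid)),
    }


# Assumed column contents: 1..50, each value repeated 90 times (4500 rows).
column = [k for k in range(1, 51) for _ in range(90)]
stats = summarize(column)
population_sd = statistics.pstdev(column)

for name, value in stats.items():
    print(f"{name} {value}")
print(f"Population StDev {population_sd}")
```

To apply the same check to a real file, replace `column` with values parsed from the corresponding .tab file.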
|
Label: |
readme.txt |
|
Notes: |
text/plain |
|
Label: |
requirements.txt |
|
Notes: |
text/plain |
|
Label: |
CASM_c1_deep_text.py |
|
Notes: |
text/x-python-script |
|
Label: |
CASM_c2_deep_text.py |
|
Notes: |
text/x-python-script |
|
Label: |
CASM_generate_predicted_probability_image.py |
|
Notes: |
text/x-python-script |
|
Label: |
CASM_generate_predicted_probability_text.py |
|
Notes: |
text/x-python-script |
|
Label: |
common_operations.py |
|
Notes: |
text/x-python-script |
|
Label: |
dependency.py |
|
Notes: |
text/x-python-script |
|
Label: |
LSTM_text_dependency.py |
|
Notes: |
text/x-python-script |
|
Label: |
word_preprocessing.py |
|
Notes: |
text/x-python-script |
|
Label: |
image.json |
|
Notes: |
application/json |
|
Label: |
text-stage1.json |
|
Notes: |
application/json |
|
Label: |
text-stage2.json |
|
Notes: |
application/json |
|
Label: |
weights_image.h5 |
|
Notes: |
application/x-h5 |
|
Label: |
weights_text-stage1.h5 |
|
Notes: |
application/x-h5 |
|
Label: |
weights_text-stage2.hdf5 |
|
Notes: |
application/x-hdf5 |
|
Label: |
protest_posts.csv |
|
Notes: |
text/csv |
|
Label: |
protest_region_count_prefecture_log_scale.R |
|
Notes: |
type/x-r-syntax |
|
Label: |
CN-shi-A.dbf |
|
Notes: |
application/dbf |
|
Label: |
CN-shi-A.prj |
|
Notes: |
application/prj |
|
Label: |
CN-shi-A.sbn |
|
Notes: |
application/sbn |
|
Label: |
CN-shi-A.sbx |
|
Notes: |
application/sbx |
|
Label: |
CN-shi-A.shp |
|
Notes: |
application/shp |
|
Label: |
CN-shi-A.shx |
|
Notes: |
application/shx |
|
Label: |
CASM_divide_irrelevant.R |
|
Notes: |
type/x-r-syntax |
|
Label: |
CASM_Wickedonna.R |
|
Notes: |
type/x-r-syntax |
|
Label: |
wicked_events_2018-11-29.csv |
|
Notes: |
text/csv |
|
Label: |
firststage_c1_text_image_precision_recall.pdf |
|
Notes: |
application/pdf |
|
Label: |
firststage_c1_text_image_precision_recall.r |
|
Notes: |
type/x-r-syntax |
|
Label: |
c1c2_vs_c1_precision_recall.pdf |
|
Notes: |
application/pdf |
|
Label: |
c1c2_vs_c1_precision_recall.r |
|
Notes: |
type/x-r-syntax |
|
Label: |
c1c2_vs_cross_validation_precision_recall.pdf |
|
Notes: |
application/pdf |
|
Label: |
c1c2_vs_cross_validation_precision_recall.r |
|
Notes: |
type/x-r-syntax |
|
Label: |
keywordCoverageSet.txt |
|
Notes: |
text/plain |
|
Label: |
keyword_coverage_wickedonna.pdf |
|
Notes: |
application/pdf |
|
Label: |
supfig1.r |
|
Notes: |
type/x-r-syntax |
|
Label: |
keywordCoverageSet_keyword_year.txt |
|
Notes: |
text/plain |
|
Label: |
keyword_coverage_wickedonna_by_year.pdf |
|
Notes: |
application/pdf |
|
Label: |
supfig2.r |
|
Notes: |
type/x-r-syntax |
|
Label: |
keywordCoverageSet_keyword_prov.txt |
|
Notes: |
text/plain |
|
Label: |
keyword_coverage_wickedonna_by_province.pdf |
|
Notes: |
application/pdf |
|
Label: |
supfig3.r |
|
Notes: |
type/x-r-syntax |
|
Label: |
keyword_vs_num_event_700000.pdf |
|
Notes: |
application/pdf |
|
Label: |
supfig4.r |
|
Notes: |
type/x-r-syntax |
|
Label: |
precision_vs_keyword.pdf |
|
Notes: |
application/pdf |
|
Label: |
supfig5.r |
|
Notes: |
type/x-r-syntax |
|
Label: |
firststage_c1_SVM_NB_precision_recall.pdf |
|
Notes: |
application/pdf |
|
Label: |
supfig6.r |
|
Notes: |
type/x-r-syntax |
|
Label: |
CASM_keyword_irrelevant_count.pdf |
|
Notes: |
application/pdf |
|
Label: |
supfig7.R |
|
Notes: |
type/x-r-syntax |
|
Label: |
high_frequency_protest_words.txt |
|
Notes: |
text/plain |
|
Label: |
jieba.dict.big.txt |
|
Notes: |
text/plain |
|
Label: |
propaganda_media_words.txt |
|
Notes: |
text/plain |
|
Label: |
stopwords1.txt |
|
Notes: |
text/plain |
|
Label: |
village_level_dict.txt |
|
Notes: |
text/plain |
|
Label: |
village_level_dict_reversed.txt |
|
Notes: |
text/plain |
|
Label: |
vocab_pos_grievance.dict |
|
Notes: |
application/octet-stream |
|
Label: |
vocab_pos_KGP_50000.dict |
|
Notes: |
application/octet-stream |