Part 1: Document Description
|
|
Citation |
|
|---|---|
|
Title: |
Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches |
|
Identification Number: |
doi:10.7910/DVN/SGI0VK |
|
Distributor: |
Harvard Dataverse |
|
Date of Distribution: |
2026-01-27 |
|
Version: |
1 |
|
Bibliographic Citation: |
Blinova, Daria; Gayathri Emuru; Rakesh Emuru; Kushagradheer Shridheer Srivastava; Rulis, Mina; Sunita Chandrasekaran; Bagozzi, Benjamin, 2026, "Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches", https://doi.org/10.7910/DVN/SGI0VK, Harvard Dataverse, V1 |
|
Citation |
|
|
Authoring Entity: |
Blinova, Daria (University of Delaware) |
|
Gayathri Emuru (University of Delaware) |
|
|
Rakesh Emuru (University of Delaware) |
|
|
Kushagradheer Shridheer Srivastava (University of Delaware) |
|
|
Rulis, Mina (University of Pennsylvania) |
|
|
Sunita Chandrasekaran (University of Delaware) |
|
|
Bagozzi, Benjamin (University of Delaware) |
|
|
Distributor: |
Harvard Dataverse |
|
Access Authority: |
Bagozzi, Benjamin |
|
Depositor: |
Bagozzi, Benjamin |
|
Date of Deposit: |
2026-01-13 |
|
Holdings Information: |
https://doi.org/10.7910/DVN/SGI0VK |
|
Study Scope |
|
|
Keywords: |
Computer and Information Science, Social Sciences |
|
Abstract: |
This Dataverse entry includes a dataset of interlinked multimodal political communications from the Russian government, addressing persistent deficiencies in the availability of social text- and image-based data for authoritarian political contexts. The dataset comprises two large corpora of official speeches delivered by senior actors within the Kremlin and the Russian Ministry of Foreign Affairs over multiple decades. For each speech, we provide Russian- and English-language texts, associated images and captions where available, and harmonized metadata such as dates, speakers, (geo)locations, and official government content tags. Unique identifiers link images to speeches and align Russian and English versions of the same communication texts. We further augment these linked datasets with validated topical annotations for both speech texts and speech images, which are generated via transformer-based multimodal topic modeling and refined by a Russian politics expert. The resulting data resources support multimodal, multilingual, temporal, and/or spatial analyses of (authoritarian) political communication and offer a valuable testbed for social science research and large language model (LLM) applications in political domains. |
|
Methodology and Processing |
|
|
Sources Statement |
|
|
Data Access |
|
|
Notes: |
CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0) |
|
Other Study Description Materials |
|
|
Label: |
kremlin_english_images.zip |
|
Notes: |
application/zip |
|
Label: |
kremlin_mid_en_ru_auxiliary_files.zip |
|
Text: |
Auxiliary outputs from BERTopic topic modeling for Kremlin & MID corpora (EN/RU): interactive HTML topic explorers and long-format topic-probability files for text and images |
|
Notes: |
application/zip |
|
Label: |
kremlin_mid_en_ru_final_csvs.zip |
|
Text: |
Final curated CSVs for all four corpora (Kremlin EN/RU, MID EN/RU), including metadata + curated text/image topic IDs, labels, groups, and topic probabilities. |
|
Notes: |
application/zip |
|
Label: |
kremlin_russian_images.zip |
|
Notes: |
application/zip |
|
Label: |
mid_english_images.zip |
|
Notes: |
application/zip |
|
Label: |
mid_russian_images.zip |
|
Notes: |
application/zip |
|
Label: |
README.txt |
|
Notes: |
text/plain |
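
The abstract states that unique identifiers link images to speeches and align the Russian and English versions of each text. The sketch below illustrates one way such a linkage could be used. It is not the authors' code: the column names (`speech_id`, `image_file`, `title`), the sample rows, and the assumption that English and Russian records share a single identifier are all hypothetical. Consult README.txt and the CSV headers in kremlin_mid_en_ru_final_csvs.zip for the actual schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature tables. The real column names and values are
# defined by the dataset's README.txt and CSV files, not by this sketch.
SPEECHES_EN = """speech_id,date,title
s001,2020-01-15,Example address
s002,2021-06-01,Example press statement
"""

SPEECHES_RU = """speech_id,date,title
s001,2020-01-15,Пример обращения
"""

IMAGES = """image_file,speech_id
img_a.jpg,s001
img_b.jpg,s001
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def link_corpora(speeches_en, speeches_ru, images):
    """Attach the Russian counterpart and image files to each English speech.

    Assumes (hypothetically) that one shared `speech_id` aligns the two
    language versions and keys the image table.
    """
    ru_by_id = {row["speech_id"]: row for row in speeches_ru}

    images_by_id = defaultdict(list)
    for row in images:
        images_by_id[row["speech_id"]].append(row["image_file"])

    linked = []
    for row in speeches_en:
        sid = row["speech_id"]
        ru = ru_by_id.get(sid)
        linked.append({
            "speech_id": sid,
            "title_en": row["title"],
            # None marks a speech with no aligned Russian version.
            "title_ru": ru["title"] if ru else None,
            # Empty list marks a speech with no associated images.
            "images": images_by_id.get(sid, []),
        })
    return linked


linked = link_corpora(
    read_rows(SPEECHES_EN), read_rows(SPEECHES_RU), read_rows(IMAGES)
)
for record in linked:
    print(record["speech_id"], record["title_ru"], len(record["images"]))
```

Keeping unmatched speeches in the output, with `None` or an empty list rather than dropping them, reflects the abstract's note that images and captions are provided only "where available".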