Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches (doi:10.7910/DVN/SGI0VK)

Document Description

Citation

Title:

Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches

Identification Number:

doi:10.7910/DVN/SGI0VK

Distributor:

Harvard Dataverse

Date of Distribution:

2026-01-27

Version:

1

Bibliographic Citation:

Blinova, Daria; Gayathri Emuru; Rakesh Emuru; Kushagradheer Shridheer Srivastava; Rulis, Mina; Sunita Chandrasekaran; Bagozzi, Benjamin, 2026, "Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches", https://doi.org/10.7910/DVN/SGI0VK, Harvard Dataverse, V1

Study Description

Citation

Title:

Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches

Identification Number:

doi:10.7910/DVN/SGI0VK

Authoring Entity:

Blinova, Daria (University of Delaware)

Gayathri Emuru (University of Delaware)

Rakesh Emuru (University of Delaware)

Kushagradheer Shridheer Srivastava (University of Delaware)

Rulis, Mina (University of Pennsylvania)

Sunita Chandrasekaran (University of Delaware)

Bagozzi, Benjamin (University of Delaware)

Distributor:

Harvard Dataverse

Access Authority:

Bagozzi, Benjamin

Depositor:

Bagozzi, Benjamin

Date of Deposit:

2026-01-13

Holdings Information:

https://doi.org/10.7910/DVN/SGI0VK

Study Scope

Keywords:

Computer and Information Science, Social Sciences

Abstract:

This Dataverse entry includes a dataset of interlinked multimodal political communications from the Russian government, addressing persistent deficiencies in the availability of social text- and image-based data for authoritarian political contexts. The dataset comprises two large corpora of official speeches delivered by senior actors within the Kremlin and the Russian Ministry of Foreign Affairs over multiple decades. For each speech, we provide Russian- and English-language texts, associated images and captions where available, and harmonized metadata such as dates, speakers, (geo)locations, and official government content tags. Unique identifiers link images to speeches and align the Russian and English versions of the same communication. We further augment these linked datasets with validated topical annotations for both speech texts and speech images, which are generated via transformer-based multimodal topic modeling and refined by a Russian politics expert. The resulting data resources support multimodal, multilingual, temporal, and/or spatial analyses of (authoritarian) political communication and offer a valuable testbed for social science research and large language model (LLM) applications in political domains.
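
The linking described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and it does not reflect the dataset's actual schema: the field names (speech_id, lang, text, image_file, caption) and all record values are hypothetical placeholders chosen only to show how a shared identifier could group images under a speech and pair English and Russian versions. Consult README.txt for the real column names.

```python
from collections import defaultdict

# Toy records. Every field name and value here is a hypothetical
# placeholder, not the dataset's actual schema or content.
speeches_en = [
    {"speech_id": "k-001", "lang": "en", "text": "Example English text"},
    {"speech_id": "k-002", "lang": "en", "text": "Another English text"},
]
speeches_ru = [
    {"speech_id": "k-001", "lang": "ru", "text": "Пример русского текста"},
]
images = [
    {"speech_id": "k-001", "image_file": "k-001_1.jpg", "caption": "Example caption"},
    {"speech_id": "k-001", "image_file": "k-001_2.jpg", "caption": ""},
]


def group_images(image_rows):
    """Map each speech identifier to the list of its image file names."""
    grouped = defaultdict(list)
    for row in image_rows:
        grouped[row["speech_id"]].append(row["image_file"])
    return dict(grouped)


def align_versions(en_rows, ru_rows):
    """Pair each English record with the Russian record sharing its
    identifier, or with None when no Russian version is present."""
    ru_by_id = {row["speech_id"]: row for row in ru_rows}
    return [(en, ru_by_id.get(en["speech_id"])) for en in en_rows]


images_by_speech = group_images(images)
pairs = align_versions(speeches_en, speeches_ru)
```

Keeping unmatched English records (paired with None) rather than dropping them makes gaps between the two language versions visible, which matters if some speeches were published in only one language.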

Methodology and Processing

Sources Statement

Data Access

Notes:

CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0)

Other Study Description Materials

Other Study-Related Materials

Label:

kremlin_english_images.zip

Notes:

application/zip

Other Study-Related Materials

Label:

kremlin_mid_en_ru_auxiliary_files.zip

Text:

Auxiliary outputs from BERTopic topic modeling for Kremlin & MID corpora (EN/RU): interactive HTML topic explorers and long-format topic-probability files for text and images

Notes:

application/zip
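
The long-format topic-probability files mentioned for kremlin_mid_en_ru_auxiliary_files.zip presumably hold one row per document-topic pair. Under that assumption, the sketch below shows one way to reduce such a file to the most probable topic per document. The column names (doc_id, topic_id, probability) and the inline sample rows are hypothetical, so they should be replaced with the names documented in README.txt.

```python
import csv
import io

# Inline stand-in for a long-format file. The column names and values
# are hypothetical placeholders, not taken from the deposited files.
LONG_CSV = """doc_id,topic_id,probability
k-001,0,0.70
k-001,1,0.30
k-002,0,0.20
k-002,1,0.80
"""


def top_topic_per_doc(handle):
    """Return {doc_id: (topic_id, probability)} for the highest-probability
    topic of each document in a long-format CSV."""
    best = {}
    for row in csv.DictReader(handle):
        prob = float(row["probability"])
        doc = row["doc_id"]
        # Keep the row only if it beats the best probability seen so far.
        if doc not in best or prob > best[doc][1]:
            best[doc] = (row["topic_id"], prob)
    return best


result = top_topic_per_doc(io.StringIO(LONG_CSV))
```

To run this against a real file, pass an open file handle (for example, open(path, newline="", encoding="utf-8")) in place of the io.StringIO object.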

Other Study-Related Materials

Label:

kremlin_mid_en_ru_final_csvs.zip

Text:

Final curated CSVs for all four corpora (Kremlin EN/RU, MID EN/RU), including metadata + curated text/image topic IDs, labels, groups, and topic probabilities.

Notes:

application/zip

Other Study-Related Materials

Label:

kremlin_russian_images.zip

Notes:

application/zip

Other Study-Related Materials

Label:

mid_english_images.zip

Notes:

application/zip

Other Study-Related Materials

Label:

mid_russian_images.zip

Notes:

application/zip

Other Study-Related Materials

Label:

README.txt

Notes:

text/plain