Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches (doi:10.7910/DVN/SGI0VK)

Document Description
Citation
Title:	Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches
Identification Number:	doi:10.7910/DVN/SGI0VK
Distributor:	Harvard Dataverse
Date of Distribution:	2026-01-27
Version:	1
Bibliographic Citation:	Blinova, Daria; Gayathri Emuru; Rakesh Emuru; Kushagradheer Shridheer Srivastava; Rulis, Mina; Sunita Chandrasekaran; Bagozzi, Benjamin, 2026, "Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches", https://doi.org/10.7910/DVN/SGI0VK, Harvard Dataverse, V1
Study Description
Citation
Title:	Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches
Identification Number:	doi:10.7910/DVN/SGI0VK
Authoring Entity:	Blinova, Daria (University of Delaware)
	Gayathri Emuru (University of Delaware)
	Rakesh Emuru (University of Delaware)
	Kushagradheer Shridheer Srivastava (University of Delaware)
	Rulis, Mina (University of Pennsylvania)
	Sunita Chandrasekaran (University of Delaware)
	Bagozzi, Benjamin (University of Delaware)
Distributor:	Harvard Dataverse
Access Authority:	Bagozzi, Benjamin
Depositor:	Bagozzi, Benjamin
Date of Deposit:	2026-01-13
Holdings Information:	https://doi.org/10.7910/DVN/SGI0VK
Study Scope
Keywords:	Computer and Information Science, Social Sciences
Abstract:	This Dataverse entry includes a dataset of interlinked multimodal political communications from the Russian government, addressing persistent deficiencies in the availability of social text- and image-based data for authoritarian political contexts. The dataset comprises two large corpora of official speeches delivered by senior actors within the Kremlin and the Russian Ministry of Foreign Affairs over multiple decades. For each speech, we provide Russian- and English-language texts, associated images and captions where available, and harmonized metadata such as dates, speakers, (geo)locations, and official government content tags. Unique identifiers link images to speeches and align the Russian and English versions of the same communication. We further augment these linked datasets with validated topical annotations for both speech texts and speech images, generated via transformer-based multimodal topic modeling and refined by a Russian politics expert. The resulting data resources support multimodal, multilingual, temporal, and/or spatial analyses of (authoritarian) political communication and offer a valuable testbed for social science research and large language model (LLM) applications in political domains.
Methodology and Processing
Sources Statement
Data Access
Notes:	CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0)
Other Study Description Materials
Other Study-Related Materials
Label:	kremlin_english_images.zip
Notes:	application/zip
Other Study-Related Materials
Label:	kremlin_mid_en_ru_auxiliary_files.zip
Text:	Auxiliary outputs from BERTopic topic modeling for the Kremlin and MID (Ministry of Foreign Affairs) corpora (EN/RU): interactive HTML topic explorers and long-format topic-probability files for text and images.
Notes:	application/zip
Other Study-Related Materials
Label:	kremlin_mid_en_ru_final_csvs.zip
Text:	Final curated CSVs for all four corpora (Kremlin EN/RU, MID EN/RU), including metadata + curated text/image topic IDs, labels, groups, and topic probabilities.
Notes:	application/zip
Other Study-Related Materials
Label:	kremlin_russian_images.zip
Notes:	application/zip
Other Study-Related Materials
Label:	mid_english_images.zip
Notes:	application/zip
Other Study-Related Materials
Label:	mid_russian_images.zip
Notes:	application/zip
Other Study-Related Materials
Label:	README.txt
Notes:	text/plain