United Nations General Debate Corpus 1946-2025 (doi:10.7910/DVN/0TJX8Y)

Document Description

Citation

Title:

United Nations General Debate Corpus 1946-2025

Identification Number:

doi:10.7910/DVN/0TJX8Y

Distributor:

Harvard Dataverse

Date of Distribution:

2017-05-12

Version:

14

Bibliographic Citation:

Jankin, Slava; Baturo, Alexander; Dasandi, Niheer, 2017, "United Nations General Debate Corpus 1946-2025", https://doi.org/10.7910/DVN/0TJX8Y, Harvard Dataverse, V14

Study Description

Citation

Title:

United Nations General Debate Corpus 1946-2025

Identification Number:

doi:10.7910/DVN/0TJX8Y

Authoring Entity:

Jankin, Slava (University of Birmingham)

Baturo, Alexander (Dublin City University)

Dasandi, Niheer (University of Birmingham)

Producer:

Jankin, Slava

Date of Production:

2026

Distributor:

Harvard Dataverse

Distributor:

Jankin, Slava

Access Authority:

Jankin, Slava

Depositor:

Jankin, Slava

Date of Deposit:

2026

Holdings Information:

https://doi.org/10.7910/DVN/0TJX8Y

Study Scope

Keywords:

Arts and Humanities, Computer and Information Science, Social Sciences, United Nations, UN General Assembly General Debate, Global Affairs, text as data, natural language processing

Topic Classification:

International Relations, Political Science, Natural Language Processing, Quantitative Text Analysis

Abstract:

Every year since 1946, the General Debate has taken place at the beginning of the UN General Assembly session. Representatives from all UN member states deliver an address, discussing the issues they consider most important in global politics, revealing their governments’ positions, and seeking to persuade other states of their perspectives. The annual UN General Debate statements provide invaluable information for scholars of international relations – comparable globally and over time. However, these texts are stored as poor quality images without relevant metadata, preventing researchers from applying data science methods. This paper introduces the complete UN General Debate Corpus (UNGDC). Building on a previous incomplete release of UNGDC, we have extended the corpus to cover the entire 1946-present period, included additional data on all speakers, and provided advanced search and data visualisation tools on a new website (https://www.ungdc.bham.ac.uk). The complete corpus contains 11,141 speeches from 202 countries, including historical countries – making it the most comprehensive, unique, and accessible collection of global political speeches.

Time Period:

1946-2025

Kind of Data:

text corpus

Kind of Data:

collection of speeches

Notes:

This release contains UN General Assembly General Debate speeches 1946-2025. Additional information is available at https://www.ungdc.bham.ac.uk.

Methodology and Processing

Sources Statement

Data Sources:

United Nations Library

Data Access

Notes:

CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0)

Other Study Description Materials

Related Publications

Citation

Title:

Words to unite nations: The complete United Nations General Debate Corpus, 1946–present

Identification Number:

10.1177/00223433241275335

Bibliographic Citation:

Jankin, S., Baturo, A., & Dasandi, N. (2025). Words to unite nations: The complete United Nations General Debate Corpus, 1946–present. Journal of Peace Research, 62(4), 1339-1351.

Other Study-Related Materials

Label:

README.txt

Notes:

text/plain

Other Study-Related Materials

Label:

Speakers_by_session.xlsx

Notes:

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Other Study-Related Materials

Label:

UNGDC_1946-2025.tar.gz

Text:

UN General Debate Corpus

Notes:

application/x-gzip
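
Usage sketch:

The corpus file is listed above as a gzip-compressed tar archive (application/x-gzip), so it can be inspected without unpacking it to disk. The following is a minimal Python sketch using only the standard library. It assumes the archive has been downloaded manually from the Dataverse record to the working directory. This record does not describe the archive's internal layout, so the ".txt" suffix filter, the UTF-8 encoding, and the helper names list_text_members and read_member are illustrative assumptions; README.txt should be consulted for the actual file naming scheme.

```python
import tarfile
from pathlib import Path


def list_text_members(archive_path):
    """Return the sorted names of regular files ending in .txt in a .tar.gz archive.

    Assumption: speeches are stored as .txt files; adjust the filter if
    README.txt describes a different layout.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.lower().endswith(".txt")
        )


def read_member(archive_path, member_name, encoding="utf-8"):
    """Read one archive member as text without extracting the whole archive.

    Assumption: files are UTF-8 encoded; undecodable bytes are replaced
    rather than raising an error.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        handle = archive.extractfile(member_name)
        if handle is None:
            raise ValueError(f"{member_name} is not a regular file")
        with handle:
            return handle.read().decode(encoding, errors="replace")


if __name__ == "__main__":
    # File name as listed in this record; the local path is an assumption.
    archive = Path("UNGDC_1946-2025.tar.gz")
    if archive.exists():
        names = list_text_members(archive)
        print(f"{len(names)} text files found")
        if names:
            print(read_member(archive, names[0])[:200])
```

Reading members through extractfile avoids writing thousands of small files to disk, which is convenient for a first look at the corpus. For repeated analysis, extracting the archive once is likely faster.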