Global Artificial Intelligence Indicator Database (GAID), 1998–2025 (Version 2) (doi:10.7910/DVN/PUMGYU)

Document Description

Citation

Title:

Global Artificial Intelligence Indicator Database (GAID), 1998–2025 (Version 2)

Identification Number:

doi:10.7910/DVN/PUMGYU

Distributor:

Harvard Dataverse

Date of Distribution:

2026-01-13

Version:

1

Bibliographic Citation:

Hung, Jason, 2026, "Global Artificial Intelligence Indicator Database (GAID), 1998–2025 (Version 2)", https://doi.org/10.7910/DVN/PUMGYU, Harvard Dataverse, V1

Study Description

Citation

Title:

Global Artificial Intelligence Indicator Database (GAID), 1998–2025 (Version 2)

Identification Number:

doi:10.7910/DVN/PUMGYU

Authoring Entity:

Hung, Jason (University of Cambridge)

Distributor:

Harvard Dataverse

Access Authority:

Hung, Jason

Depositor:

Hung, Jason

Date of Deposit:

2026-01-13

Holdings Information:

https://doi.org/10.7910/DVN/PUMGYU

Study Scope

Keywords:

Computer and Information Science, Social Sciences

Abstract:

Overview: The Global Artificial Intelligence Indicator Database (GAID) Version 2.0 substantially expands the longitudinal panel dataset into a comprehensive, harmonized overview of the global AI landscape. Spanning 1998 to 2025, GAID Version 2.0 integrates, standardizes, and cleans indicators from eight additional AI monitoring authorities, including Epoch AI, the UNESCO Global AI Ethics and Governance Observatory, MacroPolo, the IEA, WIPO, and the World Bank.

Surgical Data Quality & Integrity: Unlike raw index exports, GAID Version 2.0 has passed through a multi-stage "surgical cleaning" and metadata-healing pipeline intended to ensure 100% data integrity. Key technical enhancements include:

Harmonized Longitudinal Structure: multi-source data consolidated into a single long-format ("tidy") table, optimized for R, Stata, Python, and SPSS.

Universal Geographic Standardization: 259,546 observations across 227 countries and territories (up from 214 in Version 1.0), mapped to standardized ISO 3166-1 alpha-3 (ISO3) codes.

Advanced Metadata Healing: 100% completeness across the metadata fields Source_File, Source_Type, and Source_Year, ensuring full replicability.

Unit Harmonization: economic indicators standardized into legacy-aligned units (e.g., USD billions, USD thousands), and bounded indicators expressed on their theoretical ranges ([0, 1] for ratios; [0, 100] for scores).

Expanded Dataset Scope (Version 2.0):
Temporal Range: 1998–2025
Geographic Scope: 227 countries/territories
Indicator Density: 24,453 unique metrics
Observation Count: 259,546 verified rows

New Thematic Domains Include:
Technical Trends & Benchmarks (Epoch AI): state-of-the-art AI model performance across 39 benchmarks (e.g., MMLU, GSM8K), total training compute (FLOPs), and model parameter counts.
AI Infrastructure & Energy (IEA & Epoch AI): national AI cluster power capacity (MW), data center hub capacity (operating vs. planned), and compute stock (H100 equivalents).
Global AI Talent (MacroPolo): flows of top-tier researchers, tracking undergraduate origin versus the locations of graduate study and current work.
Real-World Usage Polling (Epoch AI): granular survey data on AI service adoption (e.g., ChatGPT, Claude), use-case frequency, and workplace tool provision.
Ethics & Governance (UNESCO & World Bank): AI readiness assessment scores, GovTech maturity indices, and digital citizen engagement frameworks.
Innovation (WIPO): national intensity of AI-related patent publications.

Technical Usage Note: Researchers must consult the accompanying codebook (w1_v2_CODEBOOK_MASTER_AI_DATA.pdf) for the Categorical Metric Dictionary. It is organized as domain-based tables that give, for each of the 24,000+ metrics, the precise data range (theoretical or unbounded), the unit of measure, and the original definition.

Compilation Pipeline: The dataset was produced by a sequential four-stage pipeline: (1) master_compiler_FINAL.py, (2) master_compiler_v2.py, (3) fix_micronesia_country_names.py, and (4) heal_source_file_metadata.py.
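The long-format layout and the integrity properties described above (ISO3 codes, complete metadata fields, theoretical ranges) lend themselves to a quick check on load. The following Python sketch is a hypothetical illustration, not part of the GAID pipeline: only Source_File, Source_Type, and Source_Year are documented column names, while the ISO3, Year, Metric, and Value columns, the example metric names, and the ratio/score classification passed to check_row are assumptions (the real ranges per metric come from the codebook).

```python
import csv
import io
import re
from collections import defaultdict

# ASSUMPTION: ISO3, Year, Metric and Value are hypothetical column names;
# only the three metadata fields below are named in the GAID description.
METADATA_FIELDS = ("Source_File", "Source_Type", "Source_Year")
ISO3_PATTERN = re.compile(r"[A-Z]{3}")

# Theoretical ranges stated for bounded indicators; unbounded metrics are skipped.
RANGES = {"ratio": (0.0, 1.0), "score": (0.0, 100.0)}


def check_row(row, metric_kinds):
    """Return a list of problems found in one long-format row."""
    problems = []
    if not ISO3_PATTERN.fullmatch(row["ISO3"]):
        problems.append("bad ISO3 code")
    for field in METADATA_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"missing {field}")
    kind = metric_kinds.get(row["Metric"])  # None: treat the metric as unbounded
    if kind is not None:
        low, high = RANGES[kind]
        if not low <= float(row["Value"]) <= high:
            problems.append(f"{row['Metric']} outside [{low}, {high}]")
    return problems


def to_wide(rows):
    """Pivot long rows into a {(ISO3, Year): {Metric: Value}} panel."""
    panel = defaultdict(dict)
    for row in rows:
        panel[(row["ISO3"], int(row["Year"]))][row["Metric"]] = float(row["Value"])
    return dict(panel)


# Made-up sample rows; real metric names and classes come from the codebook.
SAMPLE = """ISO3,Year,Metric,Value,Source_File,Source_Type,Source_Year
GBR,2024,example_score,71.5,example.csv,index,2024
GBR,2024,example_ratio,1.3,example.csv,index,2024
FSM,2023,example_score,40.0,,index,2023
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kinds = {"example_score": "score", "example_ratio": "ratio"}
for row in rows:
    print(row["ISO3"], row["Metric"], check_row(row, kinds))
panel = to_wide(rows)
```

Running it flags the out-of-range ratio in the second row and the empty Source_File in the third. A plain dictionary keyed by (ISO3, Year) keeps the sketch dependency-free; with pandas, DataFrame.pivot_table would perform the same long-to-wide reshaping.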

Methodology and Processing

Sources Statement

Data Access

Notes:

CC BY-NC 4.0 (http://creativecommons.org/licenses/by-nc/4.0)

Other Study Description Materials

Other Study-Related Materials

Label:

1_macropolo_global_talent.zip

Text:

This compressed folder contains the complete replication package for the MacroPolo Global AI Talent Tracker ingestion module: the Python scripts for automated data extraction and cleaning, and the standardized output CSV used in the GAID Wave 1 Version 2 compilation. Together, these provide full transparency and replicability for the "Human Capital" domain of the GAID.

Notes:

application/zip

Other Study-Related Materials

Label:

2_unesco_ai_ethics_governance.zip

Text:

This compressed folder contains the complete replication package for the UNESCO Global AI Ethics and Governance Observatory ingestion module: the original source data, the Python scripts for automated data extraction and cleaning, and the standardized output CSV used in the GAID Wave 1 Version 2 compilation. These supply the underlying data and code for the "Legal/Regulatory" and "Governance/Digital Infrastructure" domains of the GAID, with a specific focus on the UNESCO Readiness Assessment Methodology (RAM).

Notes:

application/zip

Other Study-Related Materials

Label:

3_iea_energy_ai.zip

Text:

This compressed folder contains the complete replication package for the International Energy Agency (IEA) Energy and AI Observatory ingestion module: the Python scripts for automated data extraction and cleaning, and the standardized output CSV used in the GAID Wave 1 Version 2 compilation. These provide the empirical foundation for the "Technological/Infrastructural" domain of the GAID, specifically the power capacity and physical hub footprint required for national AI deployment.

Notes:

application/zip

Other Study-Related Materials

Label:

4_epoch_ai_technical_trends.zip

Text:

This compressed folder contains the complete replication package for the Epoch AI ingestion module: the original source data, the Python scripts for automated data extraction and cleaning, and the standardized output CSV used in the GAID Wave 1 Version 2 compilation. These provide full replicability for the "Technological/Infrastructural," "Economic/Technological," and "Social/Usage" domains of the GAID. The module specifically tracks the evolution of frontier compute, model performance, and national adoption trends.

Notes:

application/zip

Other Study-Related Materials

Label:

5. tortoise_media_ai_index.zip

Text:

This compressed folder contains the complete replication package for the Tortoise Media Global AI Index ingestion module: the Python scripts for automated data extraction and cleaning, and the standardized output CSV used in the GAID Wave 1 Version 2 compilation. These provide the underlying data and code for the "Economic/Strategic" and "Governance/Digital Infrastructure" domains of the GAID. This module tracks relative global competitiveness through composite scores.

Notes:

application/zip

Other Study-Related Materials

Label:

6. wipo_ai_patent.zip

Text:

This compressed folder contains the complete replication package for the World Intellectual Property Organization (WIPO) AI Patent Landscapes ingestion module: the original source data, the Python scripts for automated data extraction and cleaning, and the standardized output CSV used in the GAID Wave 1 Version 2 compilation. These provide the empirical foundation for the "Innovation/Intellectual Property" domain of the GAID. This module quantifies national output in the global AI R&D ecosystem through formal patent activity.

Notes:

application/zip

Other Study-Related Materials

Label:

7. coursera_ai_skills.zip

Text:

This compressed folder contains the complete replication package for the Coursera Global Skills Report (AI & Digital Skills) ingestion module: the original source data, the Python scripts for automated data extraction and cleaning, and the standardized output CSV used in the GAID Wave 1 Version 2 compilation. These provide the empirical foundation for the "Human Capital/Education" domain of the GAID. This module tracks the "supply side" of the global AI ecosystem by measuring how effectively national workforces are acquiring the skills needed for AI-driven economic growth.

Notes:

application/zip

Other Study-Related Materials

Label:

8. world_bank_govtech.zip

Text:

This compressed folder contains the complete replication package for the World Bank GovTech Maturity Index (GTMI) ingestion module: the original source data, the Python scripts for automated data extraction and cleaning, and the standardized output CSV used in the GAID Wave 1 Version 2 compilation. These provide the underlying data and code for the "Legal/Regulatory" and "Governance/Digital Infrastructure" domains of the GAID. This module tracks the degree to which national governments have established the digital foundations and institutional frameworks needed for effective AI adoption.

Notes:

application/zip

Other Study-Related Materials

Label:

gaid_w1_v2.zip

Text:

This compressed folder is the Core Replication and Documentation Package for GAID Wave 1, Version 2 (w1, v2). It contains the complete technical pipeline required to reproduce the dataset, from the raw ingestion files to the final longitudinal panel. Its contents are:

Core output: GAID_MASTER_V2_COMPILATION_FINAL.csv, the finalized database of 259,546 verified observations across 227 countries and territories (1998–2025).

Compilation pipeline: a suite of specialized Python scripts, including master_compiler_v2.py, heal_source_file_metadata.py, and fix_micronesia_country_names.py, which execute the surgical cleaning, deduplication, and geographic standardization protocols.

Primary documentation: w1_v2_CODEBOOK_MASTER_AI_DATA.pdf (Markdown-integrated version), the exhaustive Categorical Metric Dictionary giving indicator types, units, theoretical ranges, and precise definitions for all metrics.
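The stacking, geographic standardization, and deduplication steps performed by the compilation scripts might be sketched as follows. This is a hypothetical illustration, not the code of master_compiler_v2.py or fix_micronesia_country_names.py: the Country column, the alias table (NAME_TO_ISO3), and the deduplication key (DEDUP_KEY) are assumptions, and module outputs are read from in-memory strings rather than from the module CSV files to keep the example self-contained.

```python
import csv
import io

# HYPOTHETICAL alias table in the spirit of fix_micronesia_country_names.py;
# the script's real mapping rules are not documented here.
NAME_TO_ISO3 = {
    "Micronesia (Federated States of)": "FSM",
    "Micronesia, Fed. Sts.": "FSM",
    "United Kingdom": "GBR",
}

# HYPOTHETICAL deduplication key: one row per country, year, metric and source file.
DEDUP_KEY = ("ISO3", "Year", "Metric", "Source_File")


def compile_modules(module_csv_texts):
    """Stack module CSVs, map country names to ISO3 and drop exact-key duplicates."""
    seen = set()
    master = []
    unmapped = set()
    for text in module_csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            name = row["Country"].strip()
            iso3 = NAME_TO_ISO3.get(name)
            if iso3 is None:
                unmapped.add(name)  # flag for manual review instead of guessing
                continue
            row["ISO3"] = iso3
            key = tuple(row[k] for k in DEDUP_KEY)
            if key in seen:
                continue  # keep only the first occurrence of each key
            seen.add(key)
            master.append(row)
    return master, unmapped


# Made-up module outputs for illustration only.
module_a = """Country,Year,Metric,Value,Source_File
Micronesia (Federated States of),2023,example_metric,3,module_a.csv
United Kingdom,2023,example_metric,5,module_a.csv
United Kingdom,2023,example_metric,5,module_a.csv
"""
module_b = """Country,Year,Metric,Value,Source_File
"Micronesia, Fed. Sts.",2023,other_metric,0.4,module_b.csv
Atlantis,2023,other_metric,0.9,module_b.csv
"""
master, unmapped = compile_modules([module_a, module_b])
print([(r["ISO3"], r["Metric"]) for r in master])
print(unmapped)
```

Both Micronesia spellings resolve to FSM, the duplicated United Kingdom row is dropped, and the unknown name is reported rather than silently coded. Collecting unmapped names into a set, instead of guessing a code, reflects the kind of ambiguity (for example, "Micronesia" as a country versus a region) that a dedicated fix-up step would exist to resolve.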

Notes:

application/zip