<codeBook xmlns="ddi:codebook:2_5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ddi:codebook:2_5 https://ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" version="2.5"><docDscr><citation><titlStmt><titl>Global Artificial Intelligence News Headlines (GAIN-H) Corpus</titl><IDNo agency="DOI">doi:10.7910/DVN/7C6FNO</IDNo></titlStmt><distStmt><distrbtr source="archive">Harvard Dataverse</distrbtr><distDate>2026-06-05</distDate></distStmt><verStmt source="archive"><version date="2026-06-09" type="RELEASED">1</version></verStmt><biblCit>Samuel, Jim; Siritha Chidipothu; Khanna, Tanya; Lakra, Ashish; Vidhi Gala, 2026, "Global Artificial Intelligence News Headlines (GAIN-H) Corpus", https://doi.org/10.7910/DVN/7C6FNO, Harvard Dataverse, V1, UNF:6:Axjbvx2xbD9mrqy6cDc5Jw== [fileUNF]</biblCit></citation></docDscr><stdyDscr><citation><titlStmt><titl>Global Artificial Intelligence News Headlines (GAIN-H) Corpus</titl><IDNo agency="DOI">doi:10.7910/DVN/7C6FNO</IDNo></titlStmt><rspStmt><AuthEnty>Samuel, Jim</AuthEnty><AuthEnty>Siritha Chidipothu</AuthEnty><AuthEnty>Khanna, Tanya</AuthEnty><AuthEnty>Lakra, Ashish</AuthEnty><AuthEnty>Vidhi Gala</AuthEnty><othId>Jim Samuel</othId><othId>Tanya Khanna</othId><othId>Ashish Lakra</othId><othId>Vidhi Gala</othId></rspStmt><prodStmt/><distStmt><distrbtr source="archive">Harvard Dataverse</distrbtr><depDate>2026-06-03</depDate></distStmt><holdings URI="https://doi.org/10.7910/DVN/7C6FNO"/></citation><stdyInfo><subject><keyword xml:lang="en">Computer and Information Science</keyword><keyword xml:lang="en">Social Sciences</keyword></subject><abstract>The Global Artificial Intelligence News Headlines (GAIN-H) Corpus is an open-access public informatics collection of three complementary datasets containing over 2.5 million artificial intelligence-related news headlines gathered from global news sources across multiple languages, countries, and time periods.
The repository was created to support interdisciplinary research on how artificial intelligence is represented, framed, and discussed within the public sphere.

The collection includes: (1) a metadata-rich corpus with temporal, linguistic, and URL-structural features; (2) a large-scale longitudinal corpus optimized for temporal analysis; and (3) an extended multilingual corpus containing search-term metadata that enables keyword-stratified analysis of AI discourse. Together, these datasets span more than two decades of AI-related news coverage. They support research on media framing, sentiment, public discourse, and AI governance, as well as work in communication, computational social science, and natural language processing.

The repository is intended for researchers, policymakers, educators, journalists, and practitioners seeking to examine trends in AI-related media coverage across time, geography, language, and thematic domains. The datasets are released to promote transparency, reproducibility, and evidence-based research on the societal implications of artificial intelligence.

The datasets were developed as part of two efforts: the RAISE (Rethinking AI for Shared Empowerment) initiative at the MPI Program, Bloustein School, Rutgers University, and AIXosphere AI behavioral trends research.</abstract><sumDscr/></stdyInfo><method><dataColl><sources/></dataColl><anlyInfo/></method><dataAccs><setAvail/><useStmt/><notes type="DVN:TOU" level="dv">&lt;a href="http://creativecommons.org/publicdomain/zero/1.0">CC0 1.0&lt;/a></notes></dataAccs><othrStdyMat/></stdyDscr><fileDscr ID="f13987192" URI="https://dataverse.harvard.edu/api/access/datafile/13987192"><fileTxt><fileName>Global Artificial Intelligence News Headlines (GAIN-H) Corpus Dataset 1.tab</fileName><dimensns><caseQnty>60168</caseQnty><varQnty>19</varQnty></dimensns><fileType>text/tab-separated-values</fileType></fileTxt><notes level="file" type="VDC:UNF" subject="Universal Numeric Fingerprint">UNF:6:QQyeZ85+ugQm8lFmZK+2kg==</notes></fileDscr><fileDscr ID="f13989194" URI="https://dataverse.harvard.edu/api/access/datafile/13989194"><fileTxt><fileName>Global Artificial Intelligence News Headlines (GAIN-H) Corpus Dataset 2.tab</fileName><dimensns><caseQnty>277428</caseQnty><varQnty>5</varQnty></dimensns><fileType>text/tab-separated-values</fileType></fileTxt><notes level="file" type="VDC:UNF" subject="Universal Numeric Fingerprint">UNF:6:XXazCN8Yef4e/4Y+gOXORw==</notes></fileDscr><dataDscr><var ID="v40558435" name="title" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">title</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:26wk+LUGYCDgWILd4tnztQ==</notes></var><var ID="v40558423" name="link" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">link</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:5E7XNebf6GEquZbtPkNlFA==</notes></var><var ID="v40558425" name="date" intrvl="discrete"><location 
fileid="f13987192"/><labl level="variable">date</labl><varFormat type="character" formatname="yyyy-MM-dd" category="date"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:5gQnKs9SP/zTsWmgTdGMmA==</notes></var><var ID="v40558437" name="source" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">source</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:TLn38oRKPcjCl9kYC2AZ+Q==</notes></var><var ID="v40558428" name="country" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">country</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:gBn/G7CYcefEuaCSgz3pPQ==</notes></var><var ID="v40558422" name="language" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">language</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:WgrKGkA/mUfMd4rwECAHRQ==</notes></var><var ID="v40558431" name="translated_title" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">translated_title</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:TfbqA5k7fZYhb5AfoNnAig==</notes></var><var ID="v40558424" name="Day_of_Week" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">Day_of_Week</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:o0kOWNKx6gXOfOlxvuhvEA==</notes></var><var ID="v40558420" name="Month" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">Month</labl><sumStat type="stdev">3.338992393720864</sumStat><sumStat type="max">12.0</sumStat><sumStat type="mean">6.914589150378958</sumStat><sumStat type="min">1.0</sumStat><sumStat type="vald">60168.0</sumStat><sumStat 
type="invd">0.0</sumStat><sumStat type="medn">7.0</sumStat><sumStat type="mode">.</sumStat><varFormat type="numeric"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:RLP/1zR+OsX6M6o1nsthgg==</notes></var><var ID="v40558426" name="Year" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">Year</labl><sumStat type="mode">.</sumStat><sumStat type="max">2023.0</sumStat><sumStat type="invd">0.0</sumStat><sumStat type="medn">2023.0</sumStat><sumStat type="stdev">0.8736259495169177</sumStat><sumStat type="min">2020.0</sumStat><sumStat type="vald">60168.0</sumStat><sumStat type="mean">2022.3023201701901</sumStat><varFormat type="numeric"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:uBmEEb+IS6DlmzWqc4lQaQ==</notes></var><var ID="v40558432" name="Quarter" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">Quarter</labl><sumStat type="stdev">1.116907303049801</sumStat><sumStat type="mean">2.6526891370828385</sumStat><sumStat type="min">1.0</sumStat><sumStat type="max">4.0</sumStat><sumStat type="mode">.</sumStat><sumStat type="vald">60168.0</sumStat><sumStat type="medn">3.0</sumStat><sumStat type="invd">0.0</sumStat><varFormat type="numeric"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:U1ZY07K/Ifzc776gTdFGCA==</notes></var><var ID="v40558434" name="Is_Weekend" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">Is_Weekend</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:D3hbJiTgOjm4/3+os4DJvw==</notes></var><var ID="v40558427" name="Is_Holiday" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">Is_Holiday</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:H45u+iRQBnz6QycMimIhzQ==</notes></var><var ID="v40558430" 
name="Final_URL" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">Final_URL</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:YTRLSKdM3xzD9VXNSEah3Q==</notes></var><var ID="v40558421" name="Domain" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">Domain</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:a9YZ9zXdzoSaLYveliccPQ==</notes></var><var ID="v40558433" name="Subdomain" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">Subdomain</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:QIRgzLt9nXawZodYUuKUdA==</notes></var><var ID="v40558438" name="URL_Depth" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">URL_Depth</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:UHlDOvSxIPqidhBKqU0iBw==</notes></var><var ID="v40558436" name="TLD" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">TLD</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:nMqIqkm849U3F8xPRFzewg==</notes></var><var ID="v40558429" name="URL_Length" intrvl="discrete"><location fileid="f13987192"/><labl level="variable">URL_Length</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:YHnZD3W3tnsO3eFxvxjysA==</notes></var><var ID="v40580005" name="No" intrvl="discrete"><location fileid="f13989194"/><labl level="variable">No</labl><sumStat type="stdev">80086.70957780698</sumStat><sumStat type="vald">277428.0</sumStat><sumStat type="max">277427.0</sumStat><sumStat type="invd">0.0</sumStat><sumStat type="mean">138713.5</sumStat><sumStat 
type="min">0.0</sumStat><sumStat type="medn">138713.5</sumStat><sumStat type="mode">.</sumStat><varFormat type="numeric"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:rGhDWMhiBGv3ZHaAaoThqg==</notes></var><var ID="v40580006" name="date" intrvl="discrete"><location fileid="f13989194"/><labl level="variable">date</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:jQlztzPc0mMwwxHlm8Puvw==</notes></var><var ID="v40580003" name="title" intrvl="discrete"><location fileid="f13989194"/><labl level="variable">title</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:+uLI5FQyWW0pa866Tv8pWA==</notes></var><var ID="v40580004" name="source" intrvl="discrete"><location fileid="f13989194"/><labl level="variable">source</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:dg6f0KiurzvTSZwOvxOdLg==</notes></var><var ID="v40580002" name="language" intrvl="discrete"><location fileid="f13989194"/><labl level="variable">language</labl><varFormat type="character"/><notes subject="Universal Numeric Fingerprint" level="variable" type="Dataverse:UNF">UNF:6:0epXYrTBdWLrDj790OJRCw==</notes></var></dataDscr><otherMat ID="f13987190" URI="https://dataverse.harvard.edu/api/access/datafile/13987190" level="datafile"><labl>Global Artificial Intelligence News Headlines (GAIN-H) Corpus Dataset 3.csv</labl><notes level="file" type="DATAVERSE:CONTENTTYPE" subject="Content/MIME Type">text/comma-separated-values</notes></otherMat></codeBook>