Government Microblog Comment Popularity Prediction dataset (doi:10.7910/DVN/FGJCTT)

Document Description

Citation

Title:

Government Microblog Comment Popularity Prediction dataset

Identification Number:

doi:10.7910/DVN/FGJCTT

Distributor:

Harvard Dataverse

Date of Distribution:

2026-01-08

Version:

1

Bibliographic Citation:

Hou, Jingrui, 2026, "Government Microblog Comment Popularity Prediction dataset", https://doi.org/10.7910/DVN/FGJCTT, Harvard Dataverse, V1, UNF:6:iGxVtoaBItz1CbXVgWVvAg== [fileUNF]

Study Description

Citation

Title:

Government Microblog Comment Popularity Prediction dataset

Identification Number:

doi:10.7910/DVN/FGJCTT

Authoring Entity:

Hou, Jingrui (https://ror.org/033vjfk17)

Distributor:

Harvard Dataverse

Access Authority:

Hou, Jingrui

Depositor:

Hou, Jingrui

Date of Deposit:

2025-08-16

Holdings Information:

https://doi.org/10.7910/DVN/FGJCTT

Study Scope

Keywords:

Social Sciences

Abstract:

Dataset utilized in Government Microblog Comment Popularity Prediction

Methodology and Processing

Sources Statement

Data Access

Notes:

CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0)

Other Study Description Materials

File Description--f13322428

File: feature.GB18030.v10.tab

  • Number of cases: 15728

  • No. of variables per record: 50

  • Type of File: text/tab-separated-values

Notes:

UNF:6:iGxVtoaBItz1CbXVgWVvAg==
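
The file description above (tab-separated values, 15728 cases, 50 variables per record) can be checked after download with a short script. The following is a minimal sketch rather than part of the deposited materials: `load_tab` and `shape_matches` are hypothetical helper names, and the encoding is an assumption. The file name suggests GB18030, but the copy served by the repository may be UTF-8, so the encoding is left as a parameter. The demonstration runs on a tiny stand-in file, not on the real data.

```python
import csv
import os
import tempfile

EXPECTED_CASES = 15728      # "Number of cases" in the file description
EXPECTED_VARIABLES = 50     # "No. of variables per record" in the file description


def load_tab(path, encoding="utf-8"):
    """Read a tab-separated file and return (header, rows) as lists of strings."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        rows = [row for row in reader]
    return header, rows


def shape_matches(header, rows, n_cases, n_vars):
    """True if the header and every row have n_vars fields and there are n_cases rows."""
    return (
        len(header) == n_vars
        and len(rows) == n_cases
        and all(len(row) == n_vars for row in rows)
    )


# Stand-in data: three of the codebook's variable names, two made-up rows.
sample = "ID\tcomment_like_num\tplain_text\n0\t3\t支持\n1\t0\t加油\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.tab")
    # Write the stand-in file as GB18030 to exercise the encoding parameter.
    with open(path, "w", encoding="gb18030", newline="") as handle:
        handle.write(sample)
    header, rows = load_tab(path, encoding="gb18030")

print(header)
print(shape_matches(header, rows, n_cases=2, n_vars=3))
```

For the real file, the call would be `shape_matches(header, rows, EXPECTED_CASES, EXPECTED_VARIABLES)`. If decoding fails or the text columns look garbled, trying the other encoding is a reasonable first step.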

Variable Description

List of Variables:

Variables

ID

f13322428 Location:

Summary Statistics: Min. 0.0; Mean 7863.5; StDev 4540.426852180311; Max. 15727.0; Valid 15728.0

Variable Format: numeric

Notes: UNF:6:GKzl3DvgPhZDjGaAPDaKOw==
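
The statistics reported for ID are what a zero-based row index 0, 1, ..., 15727 would produce, which suggests (but does not prove) that ID is a running row number rather than a platform identifier. The sketch below checks the arithmetic, assuming the reported StDev is the sample standard deviation (divisor n - 1). For the integers 0 to n - 1, the mean is (n - 1) / 2 and the sample variance is n(n + 1) / 12.

```python
import math
import statistics

N_CASES = 15728  # "Valid" count reported for ID

# Values copied from the codebook entry for ID.
REPORTED_MEAN = 7863.5
REPORTED_STDEV = 4540.426852180311

# Closed-form statistics of the sequence 0, 1, ..., N_CASES - 1.
index_mean = (N_CASES - 1) / 2
index_stdev = math.sqrt(N_CASES * (N_CASES + 1) / 12)

# Cross-check the closed form against a direct computation.
brute_stdev = statistics.stdev(range(N_CASES))

print(index_mean == REPORTED_MEAN)
print(math.isclose(index_stdev, REPORTED_STDEV, rel_tol=1e-9))
print(math.isclose(brute_stdev, index_stdev, rel_tol=1e-9))
```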

comment_id

f13322428 Location:

Variable Format: character

Notes: UNF:6:LzlFu1SEbTieI0XGbnwwOA==

blog_id

f13322428 Location:

Summary Statistics: Valid 15728.0; Min. 4.71E15; Max. 4.72E15; StDev 4.806390564516931E12; Mean 4.716378433367243E15

Variable Format: numeric

Notes: UNF:6:hQtBb/ZD/T3bOQb66Dp7Yg==

user_name

f13322428 Location:

Variable Format: character

Notes: UNF:6:nj0f10aMSqCXRiKqE6pesw==

comment_like_num

f13322428 Location:

Summary Statistics: Mean 15.879577822991108; Min. 0.0; StDev 227.14951913943545; Max. 16712.0; Valid 15728.0

Variable Format: numeric

Notes: UNF:6:Z1aDbTnLi53tTJGDLbZOFQ==

child_comment_num

f13322428 Location:

Summary Statistics: Mean 0.4926246185147513; Valid 15728.0; StDev 6.8148078797835465; Min. 0.0; Max. 558.0

Variable Format: numeric

Notes: UNF:6:jeVe6G+m94Re2Zi7irDx3g==

num_char

f13322428 Location:

Summary Statistics: StDev 21.850596348845258; Min. 0.0; Mean 16.847978128179147; Valid 15728.0; Max. 161.0

Variable Format: numeric

Notes: UNF:6:M6M4ysyLdMPxlEXHHRgn5w==

num_word

f13322428 Location:

Summary Statistics: Mean 10.498156154628823; Valid 15728.0; StDev 13.647833722937365; Min. 0.0; Max. 106.0

Variable Format: numeric

Notes: UNF:6:Zj2z8rU0XZqbDeR3HG6Hpw==

num_sentence

f13322428 Location:

Summary Statistics: Mean 1.221706510681577; Valid 15728.0; Min. 1.0; Max. 12.0; StDev 0.6763313808038529

Variable Format: numeric

Notes: UNF:6:BrgYLvQaqhJ5xbqnhVqTng==

avg_sentence_len

f13322428 Location:

Summary Statistics: Valid 15728.0; StDev 14.219361227106424; Max. 146.0; Mean 13.096991892933964; Min. 0.0

Variable Format: numeric

Notes: UNF:6:PIv048tNiypZFidV+kpFag==

num_noun

f13322428 Location:

Summary Statistics: StDev 3.4390666736344406; Mean 2.4382629704984575; Valid 15728.0; Min. 0.0; Max. 34.0

Variable Format: numeric

Notes: UNF:6:mdDX/xaZqSFvpXx2siGIDg==

num_verb

f13322428 Location:

Summary Statistics: Max. 31.0; Min. 0.0; Mean 2.0200915564598136; Valid 15728.0; StDev 2.8238108996456845

Variable Format: numeric

Notes: UNF:6:UhfNNSIljclh+JkGQfeAfQ==

num_modifiers

f13322428 Location:

Summary Statistics: Valid 15728.0; Min. 0.0; Max. 17.0; Mean 1.0235885045778377; StDev 1.5709062561528078

Variable Format: numeric

Notes: UNF:6:DzJJWgN1zM4CzKJ3TUT51g==

num_1st_pronoun

f13322428 Location:

Summary Statistics: Min. 0.0; Max. 8.0; Valid 15728.0; Mean 0.14992370295015287; StDev 0.48989794261501857

Variable Format: numeric

Notes: UNF:6:iJdp0sw8JeTyiq6up+BR/Q==

num_2nd_pronoun

f13322428 Location:

Summary Statistics: Max. 7.0; Min. 0.0; Valid 15728.0; Mean 0.07152848423194061; StDev 0.3200085098576281

Variable Format: numeric

Notes: UNF:6:MjKqDVTZhGUiTGV+KN41ug==

num_3rd_pronoun

f13322428 Location:

Summary Statistics: Valid 15728.0; Mean 0.058748728382499345; StDev 0.3273929201735113; Max. 9.0; Min. 0.0

Variable Format: numeric

Notes: UNF:6:Oy3nliMJug57r78M3tKSOg==

num_exclamation_mark

f13322428 Location:

Summary Statistics: Max. 11.0; Mean 0.22603001017293106; Min. 0.0; StDev 0.6660791471504666; Valid 15728.0

Variable Format: numeric

Notes: UNF:6:k1IcsdqgjvffrXpEul/1Gw==

num_question_mark

f13322428 Location:

Summary Statistics: Valid 15728.0; StDev 0.5180579522350306; Mean 0.1657553407934902; Min. 0.0; Max. 12.0

Variable Format: numeric

Notes: UNF:6:AC3/mK58yHQHpZwTXF23vQ==

num_punctuation

f13322428 Location:

Summary Statistics: Max. 30.0; Valid 15728.0; StDev 2.46506423502341; Mean 1.3550991861647965; Min. 0.0

Variable Format: numeric

Notes: UNF:6:Qbo6hjcXOvLcJFs8UAXT9Q==

num_topics

f13322428 Location:

Summary Statistics: StDev 0.3480319294557233; Min. 0.0; Valid 15728.0; Mean 0.08589776195320359; Max. 8.0

Variable Format: numeric

Notes: UNF:6:hTVzfImS8sNpNMUYQgADmA==

topics

f13322428 Location:

Variable Format: character

Notes: UNF:6:0/t91mLL5AUVWINfNawZ1w==

cosine_sim

f13322428 Location:

Summary Statistics: Mean 0.15540667194627372; Min. 0.0; Max. 0.995085965; Valid 15728.0; StDev 0.17182652239394924

Variable Format: numeric

Notes: UNF:6:1KNc14TUQ+epWmcjgiqs+A==

Jaccard_sim

f13322428 Location:

Summary Statistics: Min. 0.0; Valid 15728.0; Mean 0.04126616259562547; Max. 0.982758621; StDev 0.06648724691785261

Variable Format: numeric

Notes: UNF:6:ddPwq5bSvgmlCngl5xF+ag==

simUP

f13322428 Location:

Summary Statistics: Mean 0.2548726945394203; Max. 0.971600681; StDev 0.2122691014347592; Min. 0.0; Valid 15728.0

Variable Format: numeric

Notes: UNF:6:GdqhKzu1iHSM8ZUKcAStCg==

polarity_score

f13322428 Location:

Summary Statistics: Mean 0.5010501591115216; StDev 0.5455257844180756; Max. 1.0; Min. -1.0; Valid 15728.0

Variable Format: numeric

Notes: UNF:6:WmZ+IM3CYynFQxM4LwkV8A==

num_sensitive

f13322428 Location:

Summary Statistics: StDev 0.10889483843520667; Min. 0.0; Max. 4.0; Valid 15728.0; Mean 0.00979145473041686

Variable Format: numeric

Notes: UNF:6:Wc8EnIOyfSAAj9yhWu6wJw==

sensitive

f13322428 Location:

Variable Format: character

Notes: UNF:6:VXZL47xwUN8WQ8M5nY7QlQ==

num_emoji

f13322428 Location:

Summary Statistics: Max. 18.0; StDev 1.1539692191031878; Mean 0.6640386571719201; Valid 15728.0; Min. 0.0

Variable Format: numeric

Notes: UNF:6:RmvSW9RCQiauBSAcU8Ymrw==

num_mentioned_users

f13322428 Location:

Summary Statistics: StDev 2.631721929033332; Min. 0.0; Max. 80.0; Mean 0.4852492370294966; Valid 15728.0

Variable Format: numeric

Notes: UNF:6:Htc3mdO/h/X1i4+n0Jto+g==

is_responded_by_gov

f13322428 Location:

Summary Statistics: Valid 15728.0; StDev 0.0; Mean 0.0; Max. 0.0; Min. 0.0

Variable Format: numeric

Notes: UNF:6:8jWH3flV8iOWP1tFzeofNw==
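
The statistics for is_responded_by_gov (Min., Max., Mean and StDev all 0.0) show that the variable takes the single value 0 in every case, so it cannot discriminate between comments. The sketch below flags such constant variables from reported statistics. The helper name `constant_columns` is hypothetical, only three variables are transcribed for illustration, and whether to drop a constant variable is a modelling choice that this codebook does not document.

```python
# Min., Max. and StDev transcribed from the codebook for three variables.
REPORTED_STATS = {
    "num_emoji": {"min": 0.0, "max": 18.0, "stdev": 1.1539692191031878},
    "num_mentioned_users": {"min": 0.0, "max": 80.0, "stdev": 2.631721929033332},
    "is_responded_by_gov": {"min": 0.0, "max": 0.0, "stdev": 0.0},
}


def constant_columns(stats):
    """Return the names of variables whose minimum equals their maximum."""
    return [name for name, s in stats.items() if s["min"] == s["max"]]


print(constant_columns(REPORTED_STATS))
```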

comment_hour

f13322428 Location:

Summary Statistics: StDev 5.787875712946738; Min. 0.0; Max. 23.0; Valid 15364.0; Mean 14.016336891434499

Variable Format: numeric

Notes: UNF:6:xWrSUTJY8NsE1gBfJUsRBA==

time_lag

f13322428 Location:

Summary Statistics: Valid 15364.0; Max. 86399.0; Min. 7.0; Mean 20198.710426972677; StDev 23341.74183000986

Variable Format: numeric

Notes: UNF:6:EjaOkiIrBL45ARCsdD0mEw==

verified_type

f13322428 Location:

Summary Statistics: Max. 220.0; Valid 15356.0; StDev 42.55180531348363; Mean 7.7634800729356845; Min. -1.0

Variable Format: numeric

Notes: UNF:6:LweUv+sZmEqRv++eLoKKqQ==

is_gov_official

f13322428 Location:

Variable Format: character

Notes: UNF:6:5Jz0md5H/C6Zj8B/9iXRfw==

user_age

f13322428 Location:

Summary Statistics: Mean 2808.2142671179395; Valid 15364.0; Min. 342.0; Max. 4873.0; StDev 1228.2109560721024

Variable Format: numeric

Notes: UNF:6:S0tcs8p2pmYbMxqC/P67Jw==

status_cnt

f13322428 Location:

Summary Statistics: Max. 554549.0; Min. 0.0; Mean 6221.771164365681; Valid 15356.0; StDev 15876.412382730688

Variable Format: numeric

Notes: UNF:6:beL/5bb9nuTyXJ5fXP6T5Q==

followers_cnt

f13322428 Location:

Summary Statistics: Valid 15363.0; StDev 0.05981656561529631; Mean 0.0014971034303197471; Min. 0.0; Max. 6.0

Variable Format: numeric

Notes: UNF:6:5lwftZ1WX6alOzcpE1AcDQ==

following_cnt

f13322428 Location:

Summary Statistics: Mean 532.4088304245847; StDev 922.9489621875982; Valid 15356.0; Max. 20000.0; Min. 0.0

Variable Format: numeric

Notes: UNF:6:isuSwU9SvINA7v1Q5FIOFQ==
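
Several numeric variables from comment_hour through following_cnt report a Valid count below the 15728 cases in the file, which implies missing values. The sketch below derives the number and share of missing cases per variable from the Valid counts listed above. `missing_summary` is a hypothetical helper, and the codebook does not say how missing values are encoded in the file or handled by the accompanying scripts. Character variables are left out because the codebook gives no Valid count for them.

```python
N_CASES = 15728  # "Number of cases" in the file description

# "Valid" counts transcribed from the codebook entries above.
VALID_COUNTS = {
    "comment_hour": 15364,
    "time_lag": 15364,
    "verified_type": 15356,
    "user_age": 15364,
    "status_cnt": 15356,
    "followers_cnt": 15363,
    "following_cnt": 15356,
}


def missing_summary(valid_counts, n_cases):
    """Map each variable to (missing count, missing share in percent)."""
    summary = {}
    for name, valid in valid_counts.items():
        missing = n_cases - valid
        summary[name] = (missing, round(100 * missing / n_cases, 2))
    return summary


summary = missing_summary(VALID_COUNTS, N_CASES)
for name, (missing, percent) in summary.items():
    print(f"{name}: {missing} missing ({percent}%)")
```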

sunshine_credit_level

f13322428 Location:

Variable Format: character

Notes: UNF:6:Vn385qg8E0mQW2trRA7LAA==

num_gov_term

f13322428 Location:

Summary Statistics: Valid 15728.0; Min. 0.0; Mean 0.974567650050871; Max. 46.0; StDev 2.190574255624228

Variable Format: numeric

Notes: UNF:6:h/pHU4qijGVXi156GJVe7w==

num_gov_usr_mentioned

f13322428 Location:

Summary Statistics: Max. 1.0; Min. 0.0; StDev 0.07707957066732234; Valid 15728.0; Mean 0.005976602238046938

Variable Format: numeric

Notes: UNF:6:lHgV1/nEy/A1n3hP5hY3nA==

num_gov_topic_mentioned

f13322428 Location:

Summary Statistics: Valid 15728.0; Max. 1.0; Min. 0.0; Mean 0.018374872838249626; StDev 0.13430705038322632

Variable Format: numeric

Notes: UNF:6:qeFfwUnPSdpGzKRUOXIQaw==

all_emoji_sent_scores_pos

f13322428 Location:

Summary Statistics: StDev 1.8365354090969068; Valid 15728.0; Mean 0.8421286876907391; Min. 0.0; Max. 32.0

Variable Format: numeric

Notes: UNF:6:aF7vti3IgLt9RbQV/Os5hQ==

all_emoji_sent_scores_neg

f13322428 Location:

Summary Statistics: Valid 15728.0; Max. 0.0; Min. -30.0; StDev 1.4237594740065747; Mean -0.3948372329603409

Variable Format: numeric

Notes: UNF:6:dpZmyi/hBE/7cCcKdGdF5A==

polarity_gov_terms

f13322428 Location:

Summary Statistics: Min. 0.0; Valid 15728.0; StDev 0.2640947742297655; Mean 0.08764085449198872; Max. 3.267948487

Variable Format: numeric

Notes: UNF:6:9bKZpAzxJofKYsHxkXCYvA==

gender

f13322428 Location:

Variable Format: character

Notes: UNF:6:XVGosakKCDoiqZAZsFbHOw==

mentioned_users

f13322428 Location:

Variable Format: character

Notes: UNF:6:YLBIKNLaVwjW8BnYmwXWlw==

raw_text

f13322428 Location:

Variable Format: character

Notes: UNF:6:XPiD1ANUWktkrGT+98D/Qw==

plain_text

f13322428 Location:

Variable Format: character

Notes: UNF:6:pQ4+VV95HiVrQkhe4FUMZQ==

emoji_alt

f13322428 Location:

Variable Format: character

Notes: UNF:6:cFS+HrmDemKWL03h+svPZQ==

Other Study-Related Materials

Label:

feature_names.csv

Notes:

text/comma-separated-values

Other Study-Related Materials

Label:

gmc_bert_binary.py

Notes:

text/x-python

Other Study-Related Materials

Label:

gmc_bert_triple.py

Notes:

text/x-python

Other Study-Related Materials

Label:

gmc_random_binary.py

Notes:

text/x-python

Other Study-Related Materials

Label:

gmc_random_triple.py

Notes:

text/x-python

Other Study-Related Materials

Label:

t01_binary_prediction_cleaned_k_fold.py

Notes:

text/x-python

Other Study-Related Materials

Label:

t02_triple_prediction_cleaned2_k_fold.py

Notes:

text/x-python