Chinese patent in Google patent public data (doi:10.7910/DVN/ZVTIP1)


Document Description

Citation

Title:

Chinese patent in Google patent public data

Identification Number:

doi:10.7910/DVN/ZVTIP1

Distributor:

Harvard Dataverse

Date of Distribution:

2026-02-20

Version:

2

Bibliographic Citation:

Li, Ji; Shi, Dongbo, 2026, "Chinese patent in Google patent public data", https://doi.org/10.7910/DVN/ZVTIP1, Harvard Dataverse, V2

Study Description

Citation

Title:

Chinese patent in Google patent public data

Identification Number:

doi:10.7910/DVN/ZVTIP1

Authoring Entity:

Li, Ji (Xiamen University)

Shi, Dongbo (https://ror.org/05t6hvr95)

Distributor:

Harvard Dataverse

Access Authority:

Li, Ji

Depositor:

Li, Ji

Date of Deposit:

2026-02-14

Holdings Information:

https://doi.org/10.7910/DVN/ZVTIP1

Study Scope

Keywords:

Social Sciences, Chinese patents, Google Patent, Patent Data, Data Validation, Innovation

Abstract:

Research-ready dataset of Chinese invention patents (1985-2024) drawn from Google Patent Public Data. It includes: (1) 25.4 million processed patent records in 13 relational tables, (2) the GPPD-ADCP Master Key, a validated crosswalk to official CNIPA records, and (3) open-source processing scripts. The dataset has been systematically validated against the official CNIPA authority file, achieving a coverage rate of 99.96%.
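
As a rough illustration of what a coverage rate against an authority file can mean, the sketch below computes the share of authority-file identifiers that also appear in a dataset. This is an assumption about the metric, not the authors' definition (their method is in GPPD_analyse.R and process_gppd.sh); the function name coverage_rate and the identifiers are made up for the example.

```python
def coverage_rate(authority_ids, dataset_ids):
    """Share of authority-file IDs that also appear in the dataset.

    Assumed definition for illustration only; the dataset's own scripts
    may count matches differently.
    """
    authority = set(authority_ids)
    if not authority:
        raise ValueError("authority file is empty")
    matched = authority & set(dataset_ids)
    return len(matched) / len(authority)


# Toy, made-up identifiers purely for illustration.
authority = ["CN0001", "CN0002", "CN0003", "CN0004"]
dataset = ["CN0001", "CN0002", "CN0004", "CN9999"]

rate = coverage_rate(authority, dataset)
print(f"{rate:.2%}")
```

Records present in the dataset but absent from the authority file (CN9999 here) do not affect this ratio; whether they should is a design choice the sketch does not settle.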

Methodology and Processing

Sources Statement

Data Access

Notes:

CC BY 4.0 (http://creativecommons.org/licenses/by/4.0)

Other Study Description Materials

Related Publications

Citation

Title:

Chinese Patent in Google Patent Public Data: A Guide to Processing and Validation

Identification Number:

SSRN

Bibliographic Citation:

Li, Ji and Shi, Dongbo, Chinese Patent in Google Patent Public Data: A Guide to Processing and Validation (February 14, 2026). Available at SSRN: https://ssrn.com/abstract=6259679 or http://dx.doi.org/10.2139/ssrn.6259679

Other Study-Related Materials

Label:

00_readme.pdf

Notes:

application/pdf

Other Study-Related Materials

Label:

00version_history.md

Text:

Version history log.

Notes:

text/markdown

Other Study-Related Materials

Label:

01_GPPD_CN_patent_schema.png

Notes:

image/png

Other Study-Related Materials

Label:

cn_abstract_split.zip.001

Notes:

application/x-rar

Other Study-Related Materials

Label:

cn_abstract_split.zip.002

Notes:

application/x-rar

Other Study-Related Materials

Label:

cn_abstract_split.zip.003

Notes:

application/x-rar

Other Study-Related Materials

Label:

cn_abstract_split.zip.004

Notes:

application/x-rar

Other Study-Related Materials

Label:

cn_abstract_split.zip.005

Notes:

application/x-rar

Other Study-Related Materials

Label:

cn_abstract_split.zip.006

Notes:

application/x-rar

Other Study-Related Materials

Label:

cn_abstract_split.zip.007

Notes:

application/x-rar

Other Study-Related Materials

Label:

cn_abstract_split.zip.008

Notes:

application/x-rar
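
The numbered parts (.001, .002, ...) most likely need to be joined before extraction. The sketch below assumes the parts are plain byte-wise splits of a single archive, which this record does not confirm; if they were written as a multi-volume archive by a specific tool, that tool should be used instead (the listed MIME type, application/x-rar, differs from the .zip name, so it is worth checking). The function name join_parts and the output filename are hypothetical.

```python
import shutil
from pathlib import Path


def join_parts(directory, stem, out_name):
    """Concatenate stem.001, stem.002, ... in numeric order into out_name.

    Assumes the parts are raw byte-wise slices of one file.
    """
    directory = Path(directory)
    parts = sorted(
        (p for p in directory.glob(stem + ".*") if p.suffix[1:].isdigit()),
        key=lambda p: int(p.suffix[1:]),  # numeric sort, so .010 follows .009
    )
    if not parts:
        raise FileNotFoundError(f"no parts found for {stem}")
    out_path = directory / out_name
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream, no full load in memory
    return out_path
```

For example, join_parts(".", "cn_abstract_split.zip", "cn_abstract.zip") would write cn_abstract.zip next to the downloaded parts; the same call pattern would apply to the cn_embedding_split.zip parts.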

Other Study-Related Materials

Label:

cn_app_pub_number.txt.zip

Text:

Master identifiers and family IDs.

Notes:

application/zip

Other Study-Related Materials

Label:

cn_assignee.txt.zip

Text:

Patent assignees.

Notes:

application/zip

Other Study-Related Materials

Label:

cn_backward.txt.zip

Text:

Backward citations.

Notes:

application/zip

Other Study-Related Materials

Label:

cn_child.txt.zip

Text:

Identification of parent and child applications.

Notes:

application/zip

Other Study-Related Materials

Label:

cn_date.txt.zip

Text:

Application, publication, and priority dates.

Notes:

application/zip

Other Study-Related Materials

Label:

cn_embedding_split.zip.001

Text:

High-dimensional semantic vectors.

Notes:

application/x-rar

Other Study-Related Materials

Label:

cn_embedding_split.zip.002

Text:

High-dimensional semantic vectors.

Notes:

application/x-rar

Other Study-Related Materials

Label:

cn_embedding_split.zip.003

Text:

High-dimensional semantic vectors.

Notes:

application/x-rar

Other Study-Related Materials

Label:

cn_embedding_split.zip.004

Text:

High-dimensional semantic vectors.

Notes:

application/x-rar

Other Study-Related Materials

Label:

cn_examiner.txt.zip

Text:

Patent examiner names.

Notes:

application/zip

Other Study-Related Materials

Label:

cn_inventor.txt.zip

Text:

Inventor names.

Notes:

application/zip

Other Study-Related Materials

Label:

cn_ipc.txt.zip

Text:

International Patent Classification (IPC) codes.

Notes:

application/zip

Other Study-Related Materials

Label:

cn_ipc_v2.txt.zip

Text:

IPC data table (v2.0). Features publication numbers for global linkage and an 'is_first' flag for primary IPC codes.

Notes:

application/zip
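
A minimal sketch of how the 'is_first' flag might be used to keep one primary IPC code per patent. Only the flag name comes from this record; the column names publication_number and ipc, the tab delimiter, the "1" encoding of the flag, and the sample rows are all assumptions made for illustration, so check the readme before relying on them.

```python
import csv
import io

# Made-up sample rows; the real column names and delimiter may differ.
SAMPLE = (
    "publication_number\tipc\tis_first\n"
    "CN-000001-A\tG06F 17/30\t1\n"
    "CN-000001-A\tH04L 29/08\t0\n"
    "CN-000002-A\tA61K 31/00\t1\n"
)


def primary_ipc(handle, delimiter="\t"):
    """Map each publication number to the IPC code flagged as first."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {
        row["publication_number"]: row["ipc"]
        for row in reader
        if row["is_first"] == "1"  # assumed encoding of the flag
    }


primary = primary_ipc(io.StringIO(SAMPLE))
print(primary)
```

With the real file, the io.StringIO(SAMPLE) argument would be replaced by an open text handle on the extracted table.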

Other Study-Related Materials

Label:

cn_npl.txt.zip

Text:

Non-patent literature (NPL) references.

Notes:

application/zip

Other Study-Related Materials

Label:

cn_title.txt.zip

Text:

Patent titles.

Notes:

application/zip

Other Study-Related Materials

Label:

cn_top_term.txt.zip

Text:

NLP-extracted technical keywords.

Notes:

application/zip

Other Study-Related Materials

Label:

GPPD_ADCP_Invention_MasterKey.txt.zip

Text:

GPPD-ADCP Master Key

Notes:

application/zip

Other Study-Related Materials

Label:

GPPD_analyse.R

Text:

R script for generating the Master Key and validation stats.

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

patent_parser.py

Notes:

text/x-python-script

Other Study-Related Materials

Label:

process_gppd.sh

Text:

Shell script for filtering raw data and calculating coverage.

Notes:

application/x-sh

Other Study-Related Materials

Label:

researchdata_parser.py

Text:

Python parser for patents.research.

Notes:

text/x-python-script