|
Part 1: Document Description
|
|
Citation |
|
|---|---|
|
Title: |
Chinese patent in Google patent public data |
|
Identification Number: |
doi:10.7910/DVN/ZVTIP1 |
|
Distributor: |
Harvard Dataverse |
|
Date of Distribution: |
2026-02-20 |
|
Version: |
2 |
|
Bibliographic Citation: |
Li, Ji; Shi, Dongbo, 2026, "Chinese patent in Google patent public data", https://doi.org/10.7910/DVN/ZVTIP1, Harvard Dataverse, V2 |
|
Citation |
|
|
Title: |
Chinese patent in Google patent public data |
|
Identification Number: |
doi:10.7910/DVN/ZVTIP1 |
|
Authoring Entity: |
Li, Ji (Xiamen University) |
|
Shi, Dongbo (https://ror.org/05t6hvr95) |
|
|
Distributor: |
Harvard Dataverse |
|
Access Authority: |
Li, Ji |
|
Depositor: |
Li, Ji |
|
Date of Deposit: |
2026-02-14 |
|
Holdings Information: |
https://doi.org/10.7910/DVN/ZVTIP1 |
|
Study Scope |
|
|
Keywords: |
Social Sciences, Chinese patents, Google Patent, Patent Data, Data Validation, Innovation |
|
Abstract: |
Research-ready Chinese invention patent dataset (1985-2024) from Google Patent Public Data, including: (1) 25.4 million processed patent records in 13 relational tables, (2) the GPPD-ADCP Master Key, a validated crosswalk with official CNIPA records, and (3) open-source processing scripts. The dataset has been systematically validated against the official CNIPA authority file, achieving a 99.96% coverage rate. |
|
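The coverage figure in the abstract can be read as the share of official authority records that also appear in the processed dataset. The sketch below is a minimal illustration under that assumption; the record does not spell out the exact definition, and the function name `coverage_rate` and the toy identifiers are hypothetical, not taken from the deposited scripts.

```python
def coverage_rate(authority_ids, dataset_ids):
    """Return the fraction of authority identifiers found in the dataset.

    Assumed definition: |authority ∩ dataset| / |authority|.
    Duplicates in either input are ignored.
    """
    authority = set(authority_ids)
    if not authority:
        raise ValueError("authority list is empty")
    matched = authority & set(dataset_ids)
    return len(matched) / len(authority)


# Toy identifiers invented for illustration; not real application numbers.
authority = ["A1", "A2", "A3", "A4"]
dataset = ["A1", "A2", "A3", "X9"]
print(coverage_rate(authority, dataset))  # 0.75
```

Identifiers present only in the dataset (here `X9`) do not change the rate under this definition, since the denominator is the authority file.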
Methodology and Processing |
|
|
Sources Statement |
|
|
Data Access |
|
|
Notes: |
CC BY 4.0 (http://creativecommons.org/licenses/by/4.0) |
|
Other Study Description Materials |
|
|
Related Publications |
|
|
Citation |
|
|
Title: |
Chinese Patent in Google Patent Public Data: A Guide to Processing and Validation |
|
Identification Number: |
SSRN |
|
Bibliographic Citation: |
Li, Ji and Shi, Dongbo, Chinese Patent in Google Patent Public Data: A Guide to Processing and Validation (February 14, 2026). Available at SSRN: https://ssrn.com/abstract=6259679 or http://dx.doi.org/10.2139/ssrn.6259679 |
|
Label: |
00_readme.pdf |
|
Notes: |
application/pdf |
|
Label: |
00version_history.md |
|
Text: |
Version history log. |
|
Notes: |
text/markdown |
|
Label: |
01_GPPD_CN_patent_schema.png |
|
Notes: |
image/png |
|
Label: |
cn_abstract_split.zip.001 |
|
Notes: |
application/x-rar |
|
Label: |
cn_abstract_split.zip.002 |
|
Notes: |
application/x-rar |
|
Label: |
cn_abstract_split.zip.003 |
|
Notes: |
application/x-rar |
|
Label: |
cn_abstract_split.zip.004 |
|
Notes: |
application/x-rar |
|
Label: |
cn_abstract_split.zip.005 |
|
Notes: |
application/x-rar |
|
Label: |
cn_abstract_split.zip.006 |
|
Notes: |
application/x-rar |
|
Label: |
cn_abstract_split.zip.007 |
|
Notes: |
application/x-rar |
|
Label: |
cn_abstract_split.zip.008 |
|
Notes: |
application/x-rar |
|
Label: |
cn_app_pub_number.txt.zip |
|
Text: |
Master identifiers and family IDs. |
|
Notes: |
application/zip |
|
Label: |
cn_assignee.txt.zip |
|
Text: |
Patent assignees. |
|
Notes: |
application/zip |
|
Label: |
cn_backward.txt.zip |
|
Text: |
Backward citations. |
|
Notes: |
application/zip |
|
Label: |
cn_child.txt.zip |
|
Text: |
Identification of parent and child applications. |
|
Notes: |
application/zip |
|
Label: |
cn_date.txt.zip |
|
Text: |
Application, publication, and priority dates. |
|
Notes: |
application/zip |
|
Label: |
cn_embedding_split.zip.001 |
|
Text: |
High-dimensional semantic vectors. |
|
Notes: |
application/x-rar |
|
Label: |
cn_embedding_split.zip.002 |
|
Text: |
High-dimensional semantic vectors. |
|
Notes: |
application/x-rar |
|
Label: |
cn_embedding_split.zip.003 |
|
Text: |
High-dimensional semantic vectors. |
|
Notes: |
application/x-rar |
|
Label: |
cn_embedding_split.zip.004 |
|
Text: |
High-dimensional semantic vectors. |
|
Notes: |
application/x-rar |
|
Label: |
cn_examiner.txt.zip |
|
Text: |
Patent examiner names. |
|
Notes: |
application/zip |
|
Label: |
cn_inventor.txt.zip |
|
Text: |
Inventor names. |
|
Notes: |
application/zip |
|
Label: |
cn_ipc.txt.zip |
|
Text: |
International Patent Classification (IPC) codes. |
|
Notes: |
application/zip |
|
Label: |
cn_ipc_v2.txt.zip |
|
Text: |
IPC data table (v2.0). Features publication numbers for global linkage and an 'is_first' flag for primary IPC codes. |
|
Notes: |
application/zip |
|
Label: |
cn_npl.txt.zip |
|
Text: |
Non-patent literature (NPL) references. |
|
Notes: |
application/zip |
|
Label: |
cn_title.txt.zip |
|
Text: |
Patent titles. |
|
Notes: |
application/zip |
|
Label: |
cn_top_term.txt.zip |
|
Text: |
NLP-extracted technical keywords. |
|
Notes: |
application/zip |
|
Label: |
GPPD_ADCP_Invention_MasterKey.txt.zip |
|
Text: |
GPPD-ADCP Master Key: validated crosswalk with official CNIPA records. |
|
Notes: |
application/zip |
|
Label: |
GPPD_analyse.R |
|
Text: |
R script for generating the Master Key and validation stats. |
|
Notes: |
type/x-r-syntax |
|
Label: |
patent_parser.py |
|
Notes: |
text/x-python-script |
|
Label: |
process_gppd.sh |
|
Text: |
Shell script for filtering raw data and calculating coverage. |
|
Notes: |
application/x-sh |
|
Label: |
researchdata_parser.py |
|
Text: |
Python parser for patents.research. |
|
Notes: |
text/x-python-script |
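The abstract and embedding tables are deposited as numbered parts (`cn_abstract_split.zip.001` to `.008`, `cn_embedding_split.zip.001` to `.004`). The sketch below shows one way to rejoin such parts, assuming they are plain byte-level splits of a single archive that can be concatenated in numeric order. The record does not state which tool produced them (the listed MIME type is `application/x-rar`), so the readme should be checked first. The function name `join_split_parts` and the output file name are hypothetical.

```python
import re
import shutil
from pathlib import Path


def join_split_parts(directory, stem, output_name):
    """Concatenate <stem>.001, <stem>.002, ... into one file, in numeric order.

    Assumes the parts are raw byte slices of one archive. Raises if no
    parts are found or if the numbering has gaps.
    """
    directory = Path(directory)
    pattern = re.compile(re.escape(stem) + r"\.(\d{3})$")
    parts = []
    for path in directory.iterdir():
        match = pattern.match(path.name)
        if match:
            parts.append((int(match.group(1)), path))
    parts.sort()
    if not parts:
        raise FileNotFoundError(f"no parts found for {stem!r} in {directory}")
    numbers = [number for number, _ in parts]
    if numbers != list(range(1, len(parts) + 1)):
        raise ValueError(f"missing or unexpected parts: found {numbers}")
    output = directory / output_name
    with open(output, "wb") as out:
        for _, path in parts:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # stream, parts may be large
    return output
```

For example, `join_split_parts("downloads", "cn_abstract_split.zip", "cn_abstract.zip")` would write one combined file next to the parts, where `downloads` is a placeholder directory. If the result does not open as a ZIP archive, the byte-concatenation assumption does not hold and an archiver that understands multi-volume sets is needed instead.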