Stanford NLP Model Output for Biofuel Patent Classification (doi:10.7910/DVN/29374)

Document Description

Citation

Title:

Stanford NLP Model Output for Biofuel Patent Classification

Identification Number:

doi:10.7910/DVN/29374

Distributor:

Harvard Dataverse

Date of Distribution:

2015-03-06

Version:

1

Bibliographic Citation:

Kessler, Jeff, 2015, "Stanford NLP Model Output for Biofuel Patent Classification", https://doi.org/10.7910/DVN/29374, Harvard Dataverse, V1

Study Description

Citation

Title:

Stanford NLP Model Output for Biofuel Patent Classification

Identification Number:

doi:10.7910/DVN/29374

Authoring Entity:

Kessler, Jeff (University of California, Davis)

Distributor:

Harvard Dataverse

Distributor:

Harvard Dataverse Network

Access Authority:

Jeff Kessler

Date of Deposit:

2015-03-06

Date of Distribution:

2015

Holdings Information:

https://doi.org/10.7910/DVN/29374

Study Scope

Keywords:

Biofuel Classifier

Topic Classification:

Natural Language Processing

Abstract:

This NLP model was generated using the Stanford NLP Classifier (available from http://nlp.stanford.edu/software/classifier.shtml). The model was trained on a random selection of 700 manually classified biofuel patents from 1976 through 2013 and validated against 300 manually classified biofuel patents on January 3, 2014. Included are the classification results and associated patent numbers for both the manually classified patents and the automatically categorized patents.

Time Period:

1976-2013

Geographic Coverage:

United States
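
The 700/300 training and validation split described in the abstract can be sketched as follows. This is only an illustrative sketch, not the author's procedure: the function name `split_patents`, the fixed seed, and the placeholder patent identifiers are all assumptions, and the actual selection method and file layout are not documented in this record.

```python
import random


def split_patents(patent_ids, n_train=700, seed=0):
    """Randomly split labeled patents into training and validation lists.

    n_train and seed are illustrative defaults, not values taken from the study.
    """
    shuffled = list(patent_ids)            # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a repeatable split
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical placeholder identifiers standing in for the 1000 labeled patents.
patent_ids = [f"patent-{i:04d}" for i in range(1000)]
train, validation = split_patents(patent_ids)
print(len(train), len(validation))  # 700 300
```

Fixing the seed makes the split repeatable, so the same validation set can be reused when comparing model settings.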

Methodology and Processing

Sources Statement

Data Access

Notes:

CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0)

Other Study Description Materials

Other Study-Related Materials

Label:

Manual Classification.csv

Text:

This is the initial list of 1,000 manually classified patents used to train and validate the NLP model.

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

ner-model.ser.gz

Text:

This is the model generated by the Stanford NLP Classifier.

Notes:

application/x-gzip

Other Study-Related Materials

Label:

NLP Classification.csv

Text:

This is the list of patents and their classifications as assigned by the NLP model, which was trained on the manually classified patents.

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

patents_test.prop

Text:

This is the properties file used to parameterize the model.

Notes:

text/plain; charset=US-ASCII