The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and 22 variables (doi:10.7910/DVN/0EGYWY)

Document Description

Citation

Title:

The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and 22 variables

Identification Number:

doi:10.7910/DVN/0EGYWY

Distributor:

Harvard Dataverse

Date of Distribution:

2020-02-16

Version:

2

Bibliographic Citation:

Borrett, Camille; Laurer, Moritz, 2020, "The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and 22 variables", https://doi.org/10.7910/DVN/0EGYWY, Harvard Dataverse, V2, UNF:6:+hH0OxeHX0vYGiIOintTyA== [fileUNF]
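The UNF (Universal Numerical Fingerprint) strings in this codebook have three colon-separated parts: the literal prefix `UNF`, a version number, and a base64-encoded digest. The sketch below splits one such string and decodes the digest. The function name `parse_unf` is a hypothetical helper, not part of any Dataverse tooling, and the sketch only inspects the string; it does not recompute the fingerprint from the data.

```python
import base64


def parse_unf(unf):
    """Split a 'UNF:<version>:<base64 digest>' string into (version, raw digest bytes)."""
    prefix, version, digest_b64 = unf.split(":", 2)
    if prefix != "UNF":
        raise ValueError(f"not a UNF string: {unf!r}")
    # validate=True rejects characters outside the base64 alphabet.
    return int(version), base64.b64decode(digest_b64, validate=True)


# The dataset-level fingerprint from the bibliographic citation above.
version, digest = parse_unf("UNF:6:+hH0OxeHX0vYGiIOintTyA==")
print(version, len(digest))  # prints: 6 16
```

Comparing the decoded bytes of two fingerprints is equivalent to comparing the strings; the decoding step is shown only to make the structure of the identifier explicit.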

Study Description

Citation

Title:

The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and 22 variables

Identification Number:

doi:10.7910/DVN/0EGYWY

Authoring Entity:

Borrett, Camille

Laurer, Moritz (Centre for European Policy Studies)

Other identifications and acknowledgements:

Andrea Renda

Producer:

Centre for European Policy Studies

Software used in Production:

R

Software used in Production:

Python

Grant Number:

822735

Distributor:

Harvard Dataverse

Access Authority:

Laurer, Moritz

Depositor:

Centre for European Policy Studies

Date of Deposit:

2020-02-16

Holdings Information:

https://doi.org/10.7910/DVN/0EGYWY

Study Scope

Keywords:

Computer and Information Science, Law, Social Sciences, Text-As-Data, European Union, Text Analysis

Topic Classification:

Text-As-Data, European Union, Law

Abstract:

The CEPS EurLex dataset

The dataset contains 142.036 EU laws, almost the entire corpus of the EU's digitally available legal acts passed between 1952 and 2019. It encompasses the three types of legally binding acts passed by the EU institutions: 102.304 regulations, 4.070 directives and 35.798 decisions, all in English. The dataset was scraped from the official EU legal database (Eur-lex.eu) and transformed into machine-readable CSV format with the programming languages R and Python.

The dataset was collected by the Centre for European Policy Studies (CEPS) for the TRIGGER project (https://trigger-project.eu/). We hope that it will facilitate future quantitative and computational research on the EU.

Brief description:

- The dataset is organised in tabular format, with each row representing one law and the columns representing 23 variables.
- The full text of 134.633 laws is included (column "act_raw_text"). For newer laws, the text was scraped from the HTML pages on Eur-lex.eu, while for older laws, the text was extracted from (scanned) PDF documents (if available in English).
- 22 additional variables are included, such as 'Act_name', 'Act_type', 'Subject_matter', 'Authors', 'Date_document', 'ELI_link', 'CELEX' (a unique identifier for every law). Please see the "CEPS_EurLex_codebook.pdf" file for an explanation of all variables.
- Given its size, the dataset was uploaded in different batches to facilitate usage. Some Excel files are provided for non-technical users. We recommend, however, the use of the CSV files, since Excel does not save large amounts of data properly. EurLex_all.csv is the master file containing all data.

Caveats:

- The Eur-lex.eu website does not consistently provide data for all the variables. In addition, the HTML documents were not always cleanly formatted, and text extraction from scanned PDFs is not entirely clean. Some data points are therefore missing for some laws, and some laws were excluded entirely.
- Not all (older) laws were available in English, especially since Ireland and the UK only joined the European Communities in 1973. Non-English laws are excluded from the dataset.

Other:

- For details on the types of EU legal acts: https://ec.europa.eu/info/law/law-making-process/types-eu-law_en
- An example of an experimental analysis with this dataset: https://trigger-project.eu/2019/10/28/a-data-science-approach-to-eu-differentiated-integration/
- The TRIGGER project is funded by the EU's Horizon 2020 programme, grant number 822735.
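Because the master file holds the full text of each law, it may be easier to stream it row by row than to open it in a spreadsheet. The following is a minimal sketch using only Python's standard library. It assumes a comma-delimited file with a header row; the column names `Act_type` and `act_raw_text` are taken from the abstract, while the function names and the sample rows are invented for illustration.

```python
import csv
import io
import sys
from collections import Counter


def raise_field_limit():
    """Allow very long fields; csv's default cap is 131072 characters per field."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return
        except OverflowError:
            # Some platforms reject sys.maxsize; retry with a smaller value.
            limit //= 10


def count_by_type(handle, type_col="Act_type", text_col="act_raw_text"):
    """Return (acts per type, acts per type with non-empty text), one row at a time."""
    totals, with_text = Counter(), Counter()
    for row in csv.DictReader(handle):
        act_type = (row.get(type_col) or "").strip() or "(missing)"
        totals[act_type] += 1
        if (row.get(text_col) or "").strip():
            with_text[act_type] += 1
    return totals, with_text


raise_field_limit()

# Tiny in-memory stand-in for the real file; every value here is invented.
sample = io.StringIO(
    "CELEX,Act_type,act_raw_text\n"
    "X1,Regulation,Some text\n"
    "X2,Directive,\n"
    "X3,Regulation,More text\n"
)
totals, with_text = count_by_type(sample)
print(totals["Regulation"], with_text["Regulation"], with_text["Directive"])  # prints: 2 2 0
```

With the downloaded file, the same function could be called on `open("EurLex_all.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption and may need adjusting.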

Time Period:

1952-2019

Date of Collection:

2019-07-01 to 2019-10-01

Kind of Data:

Legal data

Kind of Data:

Textual data

Methodology and Processing

Sources Statement

Data Sources:

The official EUR-Lex database: https://eur-lex.europa.eu/homepage.html

Data Access

Notes:

CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0)

Other Study Description Materials

File Description--f3724182

File: EurLex_all_no_text.tab

  • Number of cases: 142036

  • No. of variables per record: 23

  • Type of File: text/tab-separated-values

Notes:

UNF:6:ALexVT8UYNA2w3u4iQeR7Q==

Master file, without full text for easier use (Excel).

File Description--f3724186

File: EurLex_decisions_no_text.tab

  • Number of cases: 35798

  • No. of variables per record: 23

  • Type of File: text/tab-separated-values

Notes:

UNF:6:n+X1deQWBXScELg8426XKg==

Only decisions, without full text (Excel).

File Description--f3724185

File: EurLex_directives.tab

  • Number of cases: 4070

  • No. of variables per record: 24

  • Type of File: text/tab-separated-values

Notes:

UNF:6:qPnaXyWRbOjO2W4cozU45w==

Only directives, with full text.

File Description--f3724181

File: EurLex_directives_no_text.tab

  • Number of cases: 4070

  • No. of variables per record: 23

  • Type of File: text/tab-separated-values

Notes:

UNF:6:gb1IdA/ZViFt8g+15g9ZsQ==

Only directives, without full text (Excel).

File Description--f3724184

File: EurLex_regulations_no_text_all.tab

  • Number of cases: 102167

  • No. of variables per record: 23

  • Type of File: text/tab-separated-values

Notes:

UNF:6:BgwKe5WA7BTUkNggZoEgdQ==

All regulations, without full text (Excel)
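The case and variable counts in the file descriptions above can serve as a quick integrity check after download. The sketch below counts data rows and columns in a tab-separated file and keeps the documented figures in a lookup table. It assumes each `.tab` file starts with a single header row; `table_shape` and `EXPECTED_SHAPES` are hypothetical names, and the sample data is invented.

```python
import csv
import io

# (number of cases, variables per record) as listed in the file descriptions.
EXPECTED_SHAPES = {
    "EurLex_all_no_text.tab": (142036, 23),
    "EurLex_decisions_no_text.tab": (35798, 23),
    "EurLex_directives.tab": (4070, 24),
    "EurLex_directives_no_text.tab": (4070, 23),
    "EurLex_regulations_no_text_all.tab": (102167, 23),
}


def table_shape(handle):
    """Return (data rows, columns) for a tab-separated file with one header row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = sum(1 for _ in reader)
    return rows, len(header)


# In-memory stand-in for a downloaded file; the contents are invented.
sample = io.StringIO("CELEX\tActname\nX1\tFirst act\nX2\tSecond act\n")
shape = table_shape(sample)
print(shape)  # prints: (2, 2)
```

For a real file, the result of `table_shape(open(name, newline="", encoding="utf-8"))` could be compared with `EXPECTED_SHAPES[name]`. For the file that includes full text, `csv.field_size_limit` may need to be raised first.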

Variable Description

List of Variables:

Variables

CELEX

f3724182 Location:

Variable Format: character

Notes: UNF:6:sQgiSdwBr+ZePrR8IAvxFA==

Actname

f3724182 Location:

Variable Format: character

Notes: UNF:6:UKTwN83Fq+X/uB5DV+w8pg==

Acttype

f3724182 Location:

Variable Format: character

Notes: UNF:6:jD79qBu6oJvqEbOxGFyE0Q==

Status

f3724182 Location:

Variable Format: character

Notes: UNF:6:likJiuv6TIt/Kx1KNJjJiw==

EUROVOC

f3724182 Location:

Variable Format: character

Notes: UNF:6:8nba5ubl2zqJQU34ctNzHA==

Subjectmatter

f3724182 Location:

Variable Format: character

Notes: UNF:6:0/HyJYHSkfptb13/IoEnPg==

Treaty

f3724182 Location:

Variable Format: character

Notes: UNF:6:mmO8trKN4F5gf+CnUZDmYw==

Legalbasiscelex

f3724182 Location:

Variable Format: character

Notes: UNF:6:MrJOPS6VRPCiziceSigKjA==

Authors

f3724182 Location:

Variable Format: character

Notes: UNF:6:tJeipnoktZ8E+9n+JxZrrw==

Procedurenumber

f3724182 Location:

Variable Format: character

Notes: UNF:6:k1nuJHfXLcXMy/Dvj6lPvg==

Datedocument

f3724182 Location:

Variable Format: character

Notes: UNF:6:ig+P+hPx6NThD2RkkEpKVg==

Datepublication

f3724182 Location:

Variable Format: character

Notes: UNF:6:kZeGxCX59ieWDizNoJhlEg==

Firstentryintoforce

f3724182 Location:

Variable Format: character

Notes: UNF:6:9GeiNQNCjCQi46obcIv5Vw==

Temporalstatus

f3724182 Location:

Variable Format: character

Notes: UNF:6:fZC/4+1/X+BCnm1XgaGjUA==

Actcites

f3724182 Location:

Variable Format: character

Notes: UNF:6:ldpclLBFp6+iuxt+J4eeYg==

Citeslinks

f3724182 Location:

Variable Format: character

Notes: UNF:6:sNAvC/BVd49AR+7hOMGivw==

Actammends

f3724182 Location:

Variable Format: character

Notes: UNF:6:2AibL+LDMuAFczgbfXCf2A==

Ammendslinks

f3724182 Location:

Variable Format: character

Notes: UNF:6:Fz/diaJTn5ddAXsKncqbLQ==

Eurlexlink

f3724182 Location:

Summary Statistics: Min. NaN; Max. NaN; Mean NaN; StDev NaN; Valid 0.0;

Variable Format: numeric

Notes: UNF:6:5l5wFnmNTi8K8AphgQWhrw==

ELIlink

f3724182 Location:

Summary Statistics: Max. NaN; Valid 0.0; Min. NaN; StDev NaN; Mean NaN;

Variable Format: numeric

Notes: UNF:6:5l5wFnmNTi8K8AphgQWhrw==

Proposallink

f3724182 Location:

Summary Statistics: Mean NaN; Min. NaN; StDev NaN; Valid 0.0; Max. NaN

Variable Format: numeric

Notes: UNF:6:5l5wFnmNTi8K8AphgQWhrw==

Oeillink

f3724182 Location:

Summary Statistics: StDev NaN; Max. NaN; Mean NaN; Valid 0.0; Min. NaN;

Variable Format: numeric

Notes: UNF:6:5l5wFnmNTi8K8AphgQWhrw==

Additionalinfo

f3724182 Location:

Variable Format: character

Notes: UNF:6:yCMQt09bao+21kwLugywBg==

CELEX

f3724186 Location:

Variable Format: character

Notes: UNF:6:lzZW297KHfWkgcOCZ9GJpw==

Actname

f3724186 Location:

Variable Format: character

Notes: UNF:6:7jaASSgUog3EsHuOAEz42A==

Acttype

f3724186 Location:

Variable Format: character

Notes: UNF:6:hdJHy7xb49gOCB0QTaaaiw==

Status

f3724186 Location:

Variable Format: character

Notes: UNF:6:xcyGZB40LOPrtkuKBmEXeQ==

EUROVOC

f3724186 Location:

Variable Format: character

Notes: UNF:6:NC2Vc78EtuR8QVj+B+Qspw==

Subjectmatter

f3724186 Location:

Variable Format: character

Notes: UNF:6:2OHtP1t2/jyik/nV4vuxoA==

Treaty

f3724186 Location:

Variable Format: character

Notes: UNF:6:6T5YS6Dn/08IsEPrG3fMQg==

Legalbasiscelex

f3724186 Location:

Variable Format: character

Notes: UNF:6:HlFqkkQyqtMQzTj73q3BTg==

Authors

f3724186 Location:

Variable Format: character

Notes: UNF:6:csIEC9StKR2Thc/1gmvomg==

Procedurenumber

f3724186 Location:

Variable Format: character

Notes: UNF:6:tEu4eScL023BZ/Skt0BAtg==

Datedocument

f3724186 Location:

Variable Format: character

Notes: UNF:6:e0wavxrkLUh2JF6MCU5/cQ==

Datepublication

f3724186 Location:

Variable Format: character

Notes: UNF:6:n8gBhvdK/Bbsq8u5vDDjkw==

Firstentryintoforce

f3724186 Location:

Variable Format: character

Notes: UNF:6:5N9hTVZPePE0whhu8fpChg==

Temporalstatus

f3724186 Location:

Variable Format: character

Notes: UNF:6:iHCKJjTVHDeD7Un8WUazDQ==

Actcites

f3724186 Location:

Variable Format: character

Notes: UNF:6:p3QWEkmE7DUcPwhD/PWeSg==

Citeslinks

f3724186 Location:

Variable Format: character

Notes: UNF:6:WyBymXnF5MV2L3SDywc9aw==

Actammends

f3724186 Location:

Variable Format: character

Notes: UNF:6:St5nTyhrcdbyJYUuEcf75w==

Ammendslinks

f3724186 Location:

Variable Format: character

Notes: UNF:6:Gxc7c5GDniGO/bMUi+JNVA==

Eurlexlink

f3724186 Location:

Variable Format: character

Notes: UNF:6:S4xh15qvm5i6vnUMSoSRfA==

ELIlink

f3724186 Location:

Summary Statistics: Max. NaN; StDev NaN; Mean NaN; Min. NaN; Valid 0.0

Variable Format: numeric

Notes: UNF:6:nvJDwiAqcRZfvn1e2Wm4bA==

Proposallink

f3724186 Location:

Summary Statistics: Max. NaN; Valid 0.0; Min. NaN; Mean NaN; StDev NaN

Variable Format: numeric

Notes: UNF:6:nvJDwiAqcRZfvn1e2Wm4bA==

Oeillink

f3724186 Location:

Summary Statistics: Mean NaN; Max. NaN; Min. NaN; StDev NaN; Valid 0.0

Variable Format: numeric

Notes: UNF:6:nvJDwiAqcRZfvn1e2Wm4bA==

Additionalinfo

f3724186 Location:

Variable Format: character

Notes: UNF:6:IUrSXHpaO+QpMDyOOjLkog==

CELEX

f3724185 Location:

Variable Format: character

Notes: UNF:6:OqI8ikIDpFt3+UCHYdspwA==

Act_name

f3724185 Location:

Variable Format: character

Notes: UNF:6:J6b21PSnfEY/5/AQaq+20g==

Act_type

f3724185 Location:

Variable Format: character

Notes: UNF:6:gc/f8nDJbTVhmwp31XdG1w==

Status

f3724185 Location:

Variable Format: character

Notes: UNF:6:KCiADxqlOZCua/ng3G+EGg==

EUROVOC

f3724185 Location:

Variable Format: character

Notes: UNF:6:REfCJBFpy2Kb4WzlwCuaHg==

Subject_matter

f3724185 Location:

Variable Format: character

Notes: UNF:6:HnNH9kMz5j85m8NfoQgoww==

Treaty

f3724185 Location:

Variable Format: character

Notes: UNF:6:ut6sk86TG7pQRaveOR1jhg==

Legal_basis_celex

f3724185 Location:

Variable Format: character

Notes: UNF:6:zzeXojQam1O76bznw1r6RQ==

Authors

f3724185 Location:

Variable Format: character

Notes: UNF:6:Qq0cz0AkmzHfnrNLj2FgDQ==

Procedure_number

f3724185 Location:

Variable Format: character

Notes: UNF:6:dDnx5THFokESAvKvoxidog==

Date_document

f3724185 Location:

Variable Format: character

Notes: UNF:6:c7v/dX3h/xnJlMSV4yZsEA==

Date_publication

f3724185 Location:

Variable Format: character

Notes: UNF:6:aYedgitIzXNPu6qsN78PxA==

First_entry_into_force

f3724185 Location:

Variable Format: character

Notes: UNF:6:3A/3gzkB5ReaDD/i75nXqw==

Temporal_status

f3724185 Location:

Summary Statistics: Min. NaN; StDev NaN; Valid 0.0; Mean NaN; Max. NaN

Variable Format: numeric

Notes: UNF:6:07bGLdAvY0kWn6B9j2JqkA==

Act_cites

f3724185 Location:

Variable Format: character

Notes: UNF:6:9+coX40fGRRSvaiuEDNFrA==

Cites_links

f3724185 Location:

Variable Format: character

Notes: UNF:6:jGYiSNpiqQ2FjgTHhtdIpw==

Act_ammends

f3724185 Location:

Variable Format: character

Notes: UNF:6:dML2DHcSz1EgzhdOVd2vnw==

Ammends_links

f3724185 Location:

Variable Format: character

Notes: UNF:6:Fo2i+RzRdm6vbjWYdQtVMA==

Eurlex_link

f3724185 Location:

Variable Format: character

Notes: UNF:6:YxPfbzm72poR1yYJdvGzzw==

ELI_link

f3724185 Location:

Variable Format: character

Notes: UNF:6:mvFIlfqjOk9ePiYtL6tGvg==

Proposal_link

f3724185 Location:

Variable Format: character

Notes: UNF:6:yr5pA4cs/lX0RJHvCMTmtA==

Oeil_link

f3724185 Location:

Variable Format: character

Notes: UNF:6:7dRS+39XvOpXsVG1quRYBA==

Additional_info

f3724185 Location:

Variable Format: character

Notes: UNF:6:W10Z9WTu2cbDwBuHgqhJCQ==

act_raw_text

f3724185 Location:

Variable Format: character

Notes: UNF:6:dOagXV6gsBamyDzL7082Ag==

CELEX

f3724181 Location:

Variable Format: character

Notes: UNF:6:OqI8ikIDpFt3+UCHYdspwA==

Actname

f3724181 Location:

Variable Format: character

Notes: UNF:6:J6b21PSnfEY/5/AQaq+20g==

Acttype

f3724181 Location:

Variable Format: character

Notes: UNF:6:gc/f8nDJbTVhmwp31XdG1w==

Status

f3724181 Location:

Variable Format: character

Notes: UNF:6:KCiADxqlOZCua/ng3G+EGg==

EUROVOC

f3724181 Location:

Variable Format: character

Notes: UNF:6:REfCJBFpy2Kb4WzlwCuaHg==

Subjectmatter

f3724181 Location:

Variable Format: character

Notes: UNF:6:HnNH9kMz5j85m8NfoQgoww==

Treaty

f3724181 Location:

Variable Format: character

Notes: UNF:6:ut6sk86TG7pQRaveOR1jhg==

Legalbasiscelex

f3724181 Location:

Variable Format: character

Notes: UNF:6:zzeXojQam1O76bznw1r6RQ==

Authors

f3724181 Location:

Variable Format: character

Notes: UNF:6:Qq0cz0AkmzHfnrNLj2FgDQ==

Procedurenumber

f3724181 Location:

Variable Format: character

Notes: UNF:6:dDnx5THFokESAvKvoxidog==

Datedocument

f3724181 Location:

Variable Format: character

Notes: UNF:6:c7v/dX3h/xnJlMSV4yZsEA==

Datepublication

f3724181 Location:

Variable Format: character

Notes: UNF:6:QVpcjFNnej0m/LrqSAB/cg==

Firstentryintoforce

f3724181 Location:

Variable Format: character

Notes: UNF:6:AB34p3TBu2cG/+v99fijgA==

Temporalstatus

f3724181 Location:

Summary Statistics: StDev NaN; Mean NaN; Min. NaN; Max. NaN; Valid 0.0;

Variable Format: numeric

Notes: UNF:6:07bGLdAvY0kWn6B9j2JqkA==

Actcites

f3724181 Location:

Variable Format: character

Notes: UNF:6:9+coX40fGRRSvaiuEDNFrA==

Citeslinks

f3724181 Location:

Variable Format: character

Notes: UNF:6:Dr1ZHHXwIRH/LzJlq7ApdQ==

Actammends

f3724181 Location:

Variable Format: character

Notes: UNF:6:dML2DHcSz1EgzhdOVd2vnw==

Ammendslinks

f3724181 Location:

Variable Format: character

Notes: UNF:6:gnKU9FxB32K6Xbcr80t47w==

Eurlexlink

f3724181 Location:

Variable Format: character

Notes: UNF:6:YxPfbzm72poR1yYJdvGzzw==

ELIlink

f3724181 Location:

Variable Format: character

Notes: UNF:6:mvFIlfqjOk9ePiYtL6tGvg==

Proposallink

f3724181 Location:

Variable Format: character

Notes: UNF:6:yr5pA4cs/lX0RJHvCMTmtA==

Oeillink

f3724181 Location:

Variable Format: character

Notes: UNF:6:7dRS+39XvOpXsVG1quRYBA==

Additionalinfo

f3724181 Location:

Variable Format: character

Notes: UNF:6:W10Z9WTu2cbDwBuHgqhJCQ==

CELEX

f3724184 Location:

Variable Format: character

Notes: UNF:6:Ccm94Fh9WZ1MFrgDpfWsEQ==

Actname

f3724184 Location:

Variable Format: character

Notes: UNF:6:AwZRd2NOamjBcwEOe3qmFg==

Acttype

f3724184 Location:

Variable Format: character

Notes: UNF:6:LETivE/Vm2Dlv5+FEUu3GQ==

Status

f3724184 Location:

Variable Format: character

Notes: UNF:6:xSGJjN6nKGLe5ifTImRyJw==

EUROVOC

f3724184 Location:

Variable Format: character

Notes: UNF:6:6zdzfV26z1Hhedt71iG7gw==

Subjectmatter

f3724184 Location:

Variable Format: character

Notes: UNF:6:lnLgiRpkbm+DsNHU89XaQg==

Treaty

f3724184 Location:

Variable Format: character

Notes: UNF:6:YCSmzKgrHODMCfePfFEVFg==

Legalbasiscelex

f3724184 Location:

Variable Format: character

Notes: UNF:6:wl/6hsmpFj05PhlddDahiQ==

Authors

f3724184 Location:

Variable Format: character

Notes: UNF:6:0uKMfSXfbkD431SiNMGjFQ==

Procedurenumber

f3724184 Location:

Variable Format: character

Notes: UNF:6:T8TFnaoVGoPTx6ASrkQWng==

Datedocument

f3724184 Location:

Variable Format: character

Notes: UNF:6:4nHTZZ/FyrtCvhlhiPg88g==

Datepublication

f3724184 Location:

Summary Statistics: Valid 0.0; Mean NaN; StDev NaN; Min. NaN; Max. NaN

Variable Format: numeric

Notes: UNF:6:j1r/hvaUtHKtF7Q4mqhdTg==

Firstentryintoforce

f3724184 Location:

Variable Format: character

Notes: UNF:6:WkxarQbYBpnFA6AwCbPoIQ==

Temporalstatus

f3724184 Location:

Summary Statistics: Min. NaN; Mean NaN; StDev NaN; Max. NaN; Valid 0.0;

Variable Format: numeric

Notes: UNF:6:j1r/hvaUtHKtF7Q4mqhdTg==

Actcites

f3724184 Location:

Variable Format: character

Notes: UNF:6:CN3YUBm6xB+UG3eHBrL1Ng==

Citeslinks

f3724184 Location:

Variable Format: character

Notes: UNF:6:sNdUOOxt3swurWranzVOAg==

Actammends

f3724184 Location:

Variable Format: character

Notes: UNF:6:3rVvRGeIdZYxzaLv0hlUPA==

Ammendslinks

f3724184 Location:

Variable Format: character

Notes: UNF:6:tqJmtq4yrYIXpd80JkHOow==

Eurlexlink

f3724184 Location:

Variable Format: character

Notes: UNF:6:lrw+VihwAyorQ4mN2HyUIA==

ELIlink

f3724184 Location:

Summary Statistics: Max. NaN; Mean NaN; StDev NaN; Min. NaN; Valid 0.0

Variable Format: numeric

Notes: UNF:6:j1r/hvaUtHKtF7Q4mqhdTg==

Proposallink

f3724184 Location:

Summary Statistics: StDev NaN; Min. NaN; Valid 0.0; Mean NaN; Max. NaN;

Variable Format: numeric

Notes: UNF:6:j1r/hvaUtHKtF7Q4mqhdTg==

Oeillink

f3724184 Location:

Summary Statistics: Mean NaN; StDev NaN; Valid 0.0; Max. NaN; Min. NaN

Variable Format: numeric

Notes: UNF:6:j1r/hvaUtHKtF7Q4mqhdTg==

Additionalinfo

f3724184 Location:

Variable Format: character

Notes: UNF:6:tN3y83tZjjGSyQl5SNngBg==
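In the variable list above, EurLex_directives.tab (f3724185) uses underscored column names such as `Act_name`, while the other four tabular files list the same variables without underscores (`Actname`). When combining files, a small lookup can bring the names into one spelling. This is a sketch only: `CANONICAL` and `canonical_columns` are hypothetical names, and treating the underscored spelling as the canonical one is an assumption.

```python
# Column names as listed for EurLex_directives.tab (f3724185) in this codebook.
UNDERSCORED = [
    "CELEX", "Act_name", "Act_type", "Status", "EUROVOC", "Subject_matter",
    "Treaty", "Legal_basis_celex", "Authors", "Procedure_number",
    "Date_document", "Date_publication", "First_entry_into_force",
    "Temporal_status", "Act_cites", "Cites_links", "Act_ammends",
    "Ammends_links", "Eurlex_link", "ELI_link", "Proposal_link", "Oeil_link",
    "Additional_info", "act_raw_text",
]

# Lookup from the underscore-free spelling to the underscored one.
CANONICAL = {name.replace("_", ""): name for name in UNDERSCORED}


def canonical_columns(columns):
    """Map each column name to its underscored form; unknown names pass through."""
    return [CANONICAL.get(col.replace("_", ""), col) for col in columns]


renamed = canonical_columns(["CELEX", "Actname", "Legalbasiscelex", "ELIlink"])
print(renamed)  # prints: ['CELEX', 'Act_name', 'Legal_basis_celex', 'ELI_link']
```

The spelling `Act_ammends` / `Ammends_links` is kept exactly as it appears in the dataset, since correcting it would break the match with the actual column headers.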

Other Study-Related Materials

Label:

CEPS_EurLex_codebook.pdf

Text:

A codebook, explaining the different variables available in the dataset

Notes:

application/pdf

Other Study-Related Materials

Label:

EurLex_all.csv

Text:

Master file. Contains all data. We recommend using this file.

Notes:

text/csv

Other Study-Related Materials

Label:

EurLex_decisions.csv

Text:

Only decisions, with full text.

Notes:

text/csv

Other Study-Related Materials

Label:

EurLex_regulations_1952_1990.csv

Text:

Only regulations from 1952 - 1990, with full text

Notes:

text/csv

Other Study-Related Materials

Label:

EurLex_regulations_1990_2000.csv

Text:

Only regulations from 1990 - 2000, with full text

Notes:

text/csv

Other Study-Related Materials

Label:

EurLex_regulations_2000_2019.csv

Text:

Only regulations from 2000 - 2019, with full text

Notes:

text/csv

Other Study-Related Materials

Label:

EurLex_regulations_all.csv

Text:

All regulations, with full text

Notes:

text/csv