Part 1: Document Description
|
|
Citation |
|
|---|---|
|
Title: |
The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and 22 variables |
|
Identification Number: |
doi:10.7910/DVN/0EGYWY |
|
Distributor: |
Harvard Dataverse |
|
Date of Distribution: |
2020-02-16 |
|
Version: |
2 |
|
Bibliographic Citation: |
Borrett, Camille; Laurer, Moritz, 2020, "The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and 22 variables", https://doi.org/10.7910/DVN/0EGYWY, Harvard Dataverse, V2, UNF:6:+hH0OxeHX0vYGiIOintTyA== [fileUNF] |
|
Citation |
|
|
|
Authoring Entity: |
Borrett, Camille |
|
Laurer, Moritz (Centre for European Policy Studies) |
|
|
Other identifications and acknowledgements: |
Andrea Renda |
|
Producer: |
Centre for European Policy Studies |
|
Software used in Production: |
R |
|
Software used in Production: |
Python |
|
Grant Number: |
822735 |
|
Distributor: |
Harvard Dataverse |
|
Access Authority: |
Laurer, Moritz |
|
Depositor: |
Centre for European Policy Studies |
|
Date of Deposit: |
2020-02-16 |
|
Holdings Information: |
https://doi.org/10.7910/DVN/0EGYWY |
|
Study Scope |
|
|
Keywords: |
Computer and Information Science, Law, Social Sciences, Text-As-Data, European Union, Text Analysis |
|
Topic Classification: |
Text-As-Data, European Union, Law |
|
Abstract: |
The CEPS EurLex dataset

The dataset contains 142.036 EU laws, almost the entire corpus of the EU's digitally available legal acts passed between 1952 and 2019. It encompasses the three types of legally binding acts passed by the EU institutions: 102.304 regulations, 4.070 directives and 35.798 decisions, all in English. The dataset was scraped from the official EU legal database (Eur-lex.eu) and transformed into machine-readable CSV format with the programming languages R and Python.

The dataset was collected by the Centre for European Policy Studies (CEPS) for the TRIGGER project (https://trigger-project.eu/). We hope that it will facilitate future quantitative and computational research on the EU.

Brief description:
- The dataset is organised in tabular format, with one row per law and 23 columns, one per variable.
- The full text of 134.633 laws is included (column "act_raw_text"). For newer laws, the text was scraped from the HTML pages on Eur-lex.eu; for older laws, it was extracted from (scanned) PDF documents, where these were available in English.
- 22 additional variables are included, such as 'Act_name', 'Act_type', 'Subject_matter', 'Authors', 'Date_document', 'ELI_link' and 'CELEX' (a unique identifier for every law). Please see the "CEPS_EurLex_codebook.pdf" file for an explanation of all variables.
- Given its size, the dataset was uploaded in several batches to facilitate usage. Some Excel files are provided for non-technical users. We recommend using the CSV files, however, since Excel does not save large amounts of data properly. EurLex_all.csv is the master file containing all data.

Caveats:
- The Eur-lex.eu website does not consistently provide data for all variables. In addition, the HTML documents were not always cleanly formatted, and text extraction from scanned PDFs is not entirely clean. Some data points are therefore missing for some laws, and some laws were excluded entirely.
- Not all (older) laws were available in English, not least because Ireland and the UK only joined the European Communities in 1973. Non-English laws are excluded from the dataset.

Other:
- For details on the types of EU legal acts: https://ec.europa.eu/info/law/law-making-process/types-eu-law_en
- An example of an experimental analysis with this dataset: https://trigger-project.eu/2019/10/28/a-data-science-approach-to-eu-differentiated-integration/
- The TRIGGER project is funded by the EU's Horizon 2020 programme, grant number 822735. |
|
Time Period: |
1952-2019 |
|
Date of Collection: |
2019-07-01 to 2019-10-01 |
|
Kind of Data: |
Legal data |
|
Kind of Data: |
Textual data |
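The abstract describes one row per law, with the full text in "act_raw_text" and the act type in 'Act_type'. A minimal sketch of how such a file might be summarised without loading it all into memory follows. The column names are taken from the abstract. The function name `summarise_laws`, the sample rows and the 'Act_type' values shown are assumptions made for illustration, not taken from the codebook.

```python
import csv
import io
from collections import Counter


def summarise_laws(handle):
    """Stream a EurLex-style CSV; count rows per act type and rows with text."""
    # Full legal texts can exceed the csv module's default field size limit
    # (131072 characters), so raise it before reading.
    csv.field_size_limit(2**31 - 1)
    per_type = Counter()
    with_text = 0
    for row in csv.DictReader(handle):
        per_type[row.get("Act_type") or "unknown"] += 1
        if (row.get("act_raw_text") or "").strip():
            with_text += 1
    return per_type, with_text


# Tiny in-memory stand-in for EurLex_all.csv; all values are invented.
sample = io.StringIO(
    "CELEX,Act_type,act_raw_text\n"
    "X1,Regulation,Some text\n"
    "X2,Directive,\n"
    "X3,Regulation,More text\n"
)
counts, n_text = summarise_laws(sample)
print(dict(counts), n_text)
```

With the real master file, the same function could be called on a handle opened with `open("EurLex_all.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption that should be checked against the downloaded file.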
|
Methodology and Processing |
|
|
Sources Statement |
|
|
Data Sources: |
The official EUR-Lex database: https://eur-lex.europa.eu/homepage.html |
|
Data Access |
|
|
Notes: |
CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0) |
|
Other Study Description Materials |
|
|
File Description--f3724182 |
|
|
File: EurLex_all_no_text.tab |
|
|
|
|
Notes: |
UNF:6:ALexVT8UYNA2w3u4iQeR7Q== |
|
Master file, without full text for easier use (Excel). |
|
|
File Description--f3724186 |
|
|
File: EurLex_decisions_no_text.tab |
|
|
|
|
Notes: |
UNF:6:n+X1deQWBXScELg8426XKg== |
|
Only decisions, without full text (Excel). |
|
|
File Description--f3724185 |
|
|
File: EurLex_directives.tab |
|
|
|
|
Notes: |
UNF:6:qPnaXyWRbOjO2W4cozU45w== |
|
Only directives, with full text. |
|
|
File Description--f3724181 |
|
|
File: EurLex_directives_no_text.tab |
|
|
|
|
Notes: |
UNF:6:gb1IdA/ZViFt8g+15g9ZsQ== |
|
Only directives, without full text (Excel). |
|
|
File Description--f3724184 |
|
|
File: EurLex_regulations_no_text_all.tab |
|
|
|
|
Notes: |
UNF:6:BgwKe5WA7BTUkNggZoEgdQ== |
|
All regulations, without full text (Excel) |
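The .tab files listed above are the tab-delimited versions that Dataverse produces when it ingests tabular uploads. A minimal sketch of reading one and keeping only a few columns follows. The helper name `read_tab` and the sample row, including its date format, are hypothetical; the column names come from the abstract.

```python
import csv
import io


def read_tab(handle, columns):
    """Read a tab-delimited file and keep only the requested columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Columns absent from a row are returned as empty strings.
    return [{c: row.get(c, "") for c in columns} for row in reader]


# In-memory stand-in for a file such as EurLex_directives_no_text.tab;
# the values are invented for illustration.
sample = io.StringIO(
    "CELEX\tAct_type\tDate_document\n"
    "D1\tDirective\t2001-01-01\n"
)
records = read_tab(sample, ["CELEX", "Act_type"])
print(records)
```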
|
|
List of Variables: |
|
|
Variables |
|
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:sQgiSdwBr+ZePrR8IAvxFA== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:UKTwN83Fq+X/uB5DV+w8pg== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:jD79qBu6oJvqEbOxGFyE0Q== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:likJiuv6TIt/Kx1KNJjJiw== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:8nba5ubl2zqJQU34ctNzHA== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:0/HyJYHSkfptb13/IoEnPg== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:mmO8trKN4F5gf+CnUZDmYw== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:MrJOPS6VRPCiziceSigKjA== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:tJeipnoktZ8E+9n+JxZrrw== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:k1nuJHfXLcXMy/Dvj6lPvg== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:ig+P+hPx6NThD2RkkEpKVg== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:kZeGxCX59ieWDizNoJhlEg== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:9GeiNQNCjCQi46obcIv5Vw== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:fZC/4+1/X+BCnm1XgaGjUA== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:ldpclLBFp6+iuxt+J4eeYg== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:sNAvC/BVd49AR+7hOMGivw== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:2AibL+LDMuAFczgbfXCf2A== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:Fz/diaJTn5ddAXsKncqbLQ== |
|
f3724182 Location: |
Summary Statistics: Min. NaN; Max. NaN; Mean NaN; StDev NaN; Valid 0.0; Variable Format: numeric Notes: UNF:6:5l5wFnmNTi8K8AphgQWhrw== |
|
f3724182 Location: |
Summary Statistics: Max. NaN; Valid 0.0; Min. NaN; StDev NaN; Mean NaN; Variable Format: numeric Notes: UNF:6:5l5wFnmNTi8K8AphgQWhrw== |
|
f3724182 Location: |
Summary Statistics: Mean NaN; Min. NaN; StDev NaN; Valid 0.0; Max. NaN Variable Format: numeric Notes: UNF:6:5l5wFnmNTi8K8AphgQWhrw== |
|
f3724182 Location: |
Summary Statistics: StDev NaN; Max. NaN; Mean NaN; Valid 0.0; Min. NaN; Variable Format: numeric Notes: UNF:6:5l5wFnmNTi8K8AphgQWhrw== |
|
f3724182 Location: |
Variable Format: character Notes: UNF:6:yCMQt09bao+21kwLugywBg== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:lzZW297KHfWkgcOCZ9GJpw== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:7jaASSgUog3EsHuOAEz42A== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:hdJHy7xb49gOCB0QTaaaiw== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:xcyGZB40LOPrtkuKBmEXeQ== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:NC2Vc78EtuR8QVj+B+Qspw== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:2OHtP1t2/jyik/nV4vuxoA== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:6T5YS6Dn/08IsEPrG3fMQg== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:HlFqkkQyqtMQzTj73q3BTg== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:csIEC9StKR2Thc/1gmvomg== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:tEu4eScL023BZ/Skt0BAtg== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:e0wavxrkLUh2JF6MCU5/cQ== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:n8gBhvdK/Bbsq8u5vDDjkw== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:5N9hTVZPePE0whhu8fpChg== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:iHCKJjTVHDeD7Un8WUazDQ== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:p3QWEkmE7DUcPwhD/PWeSg== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:WyBymXnF5MV2L3SDywc9aw== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:St5nTyhrcdbyJYUuEcf75w== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:Gxc7c5GDniGO/bMUi+JNVA== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:S4xh15qvm5i6vnUMSoSRfA== |
|
f3724186 Location: |
Summary Statistics: Max. NaN; StDev NaN; Mean NaN; Min. NaN; Valid 0.0 Variable Format: numeric Notes: UNF:6:nvJDwiAqcRZfvn1e2Wm4bA== |
|
f3724186 Location: |
Summary Statistics: Max. NaN; Valid 0.0; Min. NaN; Mean NaN; StDev NaN Variable Format: numeric Notes: UNF:6:nvJDwiAqcRZfvn1e2Wm4bA== |
|
f3724186 Location: |
Summary Statistics: Mean NaN; Max. NaN; Min. NaN; StDev NaN; Valid 0.0 Variable Format: numeric Notes: UNF:6:nvJDwiAqcRZfvn1e2Wm4bA== |
|
f3724186 Location: |
Variable Format: character Notes: UNF:6:IUrSXHpaO+QpMDyOOjLkog== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:OqI8ikIDpFt3+UCHYdspwA== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:J6b21PSnfEY/5/AQaq+20g== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:gc/f8nDJbTVhmwp31XdG1w== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:KCiADxqlOZCua/ng3G+EGg== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:REfCJBFpy2Kb4WzlwCuaHg== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:HnNH9kMz5j85m8NfoQgoww== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:ut6sk86TG7pQRaveOR1jhg== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:zzeXojQam1O76bznw1r6RQ== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:Qq0cz0AkmzHfnrNLj2FgDQ== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:dDnx5THFokESAvKvoxidog== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:c7v/dX3h/xnJlMSV4yZsEA== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:aYedgitIzXNPu6qsN78PxA== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:3A/3gzkB5ReaDD/i75nXqw== |
|
f3724185 Location: |
Summary Statistics: Min. NaN; StDev NaN; Valid 0.0; Mean NaN; Max. NaN Variable Format: numeric Notes: UNF:6:07bGLdAvY0kWn6B9j2JqkA== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:9+coX40fGRRSvaiuEDNFrA== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:jGYiSNpiqQ2FjgTHhtdIpw== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:dML2DHcSz1EgzhdOVd2vnw== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:Fo2i+RzRdm6vbjWYdQtVMA== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:YxPfbzm72poR1yYJdvGzzw== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:mvFIlfqjOk9ePiYtL6tGvg== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:yr5pA4cs/lX0RJHvCMTmtA== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:7dRS+39XvOpXsVG1quRYBA== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:W10Z9WTu2cbDwBuHgqhJCQ== |
|
f3724185 Location: |
Variable Format: character Notes: UNF:6:dOagXV6gsBamyDzL7082Ag== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:OqI8ikIDpFt3+UCHYdspwA== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:J6b21PSnfEY/5/AQaq+20g== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:gc/f8nDJbTVhmwp31XdG1w== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:KCiADxqlOZCua/ng3G+EGg== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:REfCJBFpy2Kb4WzlwCuaHg== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:HnNH9kMz5j85m8NfoQgoww== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:ut6sk86TG7pQRaveOR1jhg== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:zzeXojQam1O76bznw1r6RQ== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:Qq0cz0AkmzHfnrNLj2FgDQ== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:dDnx5THFokESAvKvoxidog== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:c7v/dX3h/xnJlMSV4yZsEA== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:QVpcjFNnej0m/LrqSAB/cg== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:AB34p3TBu2cG/+v99fijgA== |
|
f3724181 Location: |
Summary Statistics: StDev NaN; Mean NaN; Min. NaN; Max. NaN; Valid 0.0; Variable Format: numeric Notes: UNF:6:07bGLdAvY0kWn6B9j2JqkA== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:9+coX40fGRRSvaiuEDNFrA== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:Dr1ZHHXwIRH/LzJlq7ApdQ== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:dML2DHcSz1EgzhdOVd2vnw== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:gnKU9FxB32K6Xbcr80t47w== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:YxPfbzm72poR1yYJdvGzzw== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:mvFIlfqjOk9ePiYtL6tGvg== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:yr5pA4cs/lX0RJHvCMTmtA== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:7dRS+39XvOpXsVG1quRYBA== |
|
f3724181 Location: |
Variable Format: character Notes: UNF:6:W10Z9WTu2cbDwBuHgqhJCQ== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:Ccm94Fh9WZ1MFrgDpfWsEQ== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:AwZRd2NOamjBcwEOe3qmFg== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:LETivE/Vm2Dlv5+FEUu3GQ== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:xSGJjN6nKGLe5ifTImRyJw== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:6zdzfV26z1Hhedt71iG7gw== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:lnLgiRpkbm+DsNHU89XaQg== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:YCSmzKgrHODMCfePfFEVFg== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:wl/6hsmpFj05PhlddDahiQ== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:0uKMfSXfbkD431SiNMGjFQ== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:T8TFnaoVGoPTx6ASrkQWng== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:4nHTZZ/FyrtCvhlhiPg88g== |
|
f3724184 Location: |
Summary Statistics: Valid 0.0; Mean NaN; StDev NaN; Min. NaN; Max. NaN Variable Format: numeric Notes: UNF:6:j1r/hvaUtHKtF7Q4mqhdTg== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:WkxarQbYBpnFA6AwCbPoIQ== |
|
f3724184 Location: |
Summary Statistics: Min. NaN; Mean NaN; StDev NaN; Max. NaN; Valid 0.0; Variable Format: numeric Notes: UNF:6:j1r/hvaUtHKtF7Q4mqhdTg== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:CN3YUBm6xB+UG3eHBrL1Ng== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:sNdUOOxt3swurWranzVOAg== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:3rVvRGeIdZYxzaLv0hlUPA== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:tqJmtq4yrYIXpd80JkHOow== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:lrw+VihwAyorQ4mN2HyUIA== |
|
f3724184 Location: |
Summary Statistics: Max. NaN; Mean NaN; StDev NaN; Min. NaN; Valid 0.0 Variable Format: numeric Notes: UNF:6:j1r/hvaUtHKtF7Q4mqhdTg== |
|
f3724184 Location: |
Summary Statistics: StDev NaN; Min. NaN; Valid 0.0; Mean NaN; Max. NaN; Variable Format: numeric Notes: UNF:6:j1r/hvaUtHKtF7Q4mqhdTg== |
|
f3724184 Location: |
Summary Statistics: Mean NaN; StDev NaN; Valid 0.0; Max. NaN; Min. NaN Variable Format: numeric Notes: UNF:6:j1r/hvaUtHKtF7Q4mqhdTg== |
|
f3724184 Location: |
Variable Format: character Notes: UNF:6:tN3y83tZjjGSyQl5SNngBg== |
|
Label: |
CEPS_EurLex_codebook.pdf |
|
Text: |
A codebook, explaining the different variables available in the dataset |
|
Notes: |
application/pdf |
|
Label: |
EurLex_all.csv |
|
Text: |
Master file. Contains all data. We recommend using this file. |
|
Notes: |
text/csv |
|
Label: |
EurLex_decisions.csv |
|
Text: |
Only decisions, with full text. |
|
Notes: |
text/csv |
|
Label: |
EurLex_regulations_1952_1990.csv |
|
Text: |
Only regulations from 1952 - 1990, with full text |
|
Notes: |
text/csv |
|
Label: |
EurLex_regulations_1990_2000.csv |
|
Text: |
Only regulations from 1990 - 2000, with full text |
|
Notes: |
text/csv |
|
Label: |
EurLex_regulations_2000_2019.csv |
|
Text: |
Only regulations from 2000 - 2019, with full text |
|
Notes: |
text/csv |
|
Label: |
EurLex_regulations_all.csv |
|
Text: |
All regulations, with full text |
|
Notes: |
text/csv |
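The regulation batches above are labelled 1952 - 1990, 1990 - 2000 and 2000 - 2019, so the boundary years 1990 and 2000 each appear in two labels. The record does not say whether any law is actually present in two files. The sketch below assumes it could be, and shows one way to combine batches while keeping a single row per 'CELEX' identifier. The function name `merge_batches` and the sample rows are hypothetical.

```python
import csv
import io


def merge_batches(handles, key="CELEX"):
    """Concatenate CSV batches, keeping the first row seen for each key."""
    # Full-text columns can exceed the csv module's default field size limit.
    csv.field_size_limit(2**31 - 1)
    seen = set()
    merged = []
    for handle in handles:
        for row in csv.DictReader(handle):
            if row[key] in seen:
                continue  # already taken from an earlier batch
            seen.add(row[key])
            merged.append(row)
    return merged


# In-memory stand-ins for two batch files; identifiers are invented.
batch_a = io.StringIO("CELEX,Act_type\nA1,Regulation\nA2,Regulation\n")
batch_b = io.StringIO("CELEX,Act_type\nA2,Regulation\nA3,Regulation\n")
rows = merge_batches([batch_a, batch_b])
print([r["CELEX"] for r in rows])
```

Since EurLex_regulations_all.csv already holds all regulations, a merge like this is only needed when working from the smaller batch files.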