Part 1: Document Description
|
|
Citation |
|
|---|---|
|
Title: |
Replication Data for: Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics |
|
Identification Number: |
doi:10.7910/DVN/JMFHTN |
|
Distributor: |
Harvard Dataverse |
|
Date of Distribution: |
2015-10-08 |
|
Version: |
1 |
|
Bibliographic Citation: |
Asgari, Ehsaneddin, 2015, "Replication Data for: Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics", https://doi.org/10.7910/DVN/JMFHTN, Harvard Dataverse, V1, UNF:6:MdFOywP8u70n6695tyjGAw== [fileUNF] |
|
Citation |
|
|
Title: |
Replication Data for: Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics |
|
Identification Number: |
doi:10.7910/DVN/JMFHTN |
|
Authoring Entity: |
Asgari, Ehsaneddin (University of California, Berkeley) |
|
Distributor: |
Harvard Dataverse |
|
Access Authority: |
Asgari, Ehsaneddin |
|
Depositor: |
Asgari, Ehsaneddin |
|
Date of Deposit: |
2015-10-08 |
|
Holdings Information: |
https://doi.org/10.7910/DVN/JMFHTN |
|
Study Scope |
|
|
Keywords: |
Chemistry, Computer and Information Science, Medicine, Health and Life Sciences, Deep Proteomic, Deep Learning, Deep Genomics, Distributed Representation, Disordered Protein Prediction, Family Classification Benchmark, Protein Data Visualization, t-SNE, Word2Vec |
|
Abstract: |
Users should cite: Asgari E, Mofrad MRK. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS ONE 10(11): e0141287. doi:10.1371/journal.pone.0141287 (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141287). This archive also contains the family classification data used in that PLoS ONE paper. The data can serve as a benchmark for the protein family classification task. |
|
Methodology and Processing |
|
|
Sources Statement |
|
|
Data Access |
|
|
Citation Requirement: |
If you use this data and method, please cite the following paper: Asgari, Ehsaneddin and Mofrad, Mohammad R.K. "Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics". PLoS ONE (2015). In Press. |
|
Notes: |
This dataset is made available under a Creative Commons CC0 license with the following additional/modified terms and conditions: |
|
If you use this data and method, please cite the following paper: Asgari, Ehsaneddin and Mofrad, Mohammad R.K. "Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics". PLoS ONE (2015). In Press. |
|
|
Other Study Description Materials |
|
|
Related Publications |
|
|
Citation |
|
|
Title: |
Asgari E, Mofrad MRK (2015) Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS ONE 10(11): e0141287. https://doi.org/10.1371/journal.pone.0141287 |
|
Identification Number: |
10.1371/journal.pone.0141287 |
|
Bibliographic Citation: |
Asgari E, Mofrad MRK (2015) Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS ONE 10(11): e0141287. https://doi.org/10.1371/journal.pone.0141287 |
|
File Description--f2712444 |
|
|
File: family_classification_metadata.tab |
|
|
|
|
Notes: |
UNF:6:njDXLW5qKIEwZHQNVwue0g== |
|
File Description--f2712443 |
|
|
File: family_classification_sequences.tab |
|
|
|
|
Notes: |
UNF:6:yizYYNIv4P+F07al61ev0g== |
|
List of Variables: |
|
|
Variables |
|
|
f2712444 Location: |
Variable Format: character Notes: UNF:6:/YAzQQUr0TT+SrQCOLZY7g== |
|
f2712444 Location: |
Variable Format: character Notes: UNF:6:GDMHWRUl4xOqo19A5EpPSQ== |
|
f2712444 Location: |
Variable Format: character Notes: UNF:6:DnOuVFUmi/vjizsxYrRJdA== |
|
f2712444 Location: |
Variable Format: character Notes: UNF:6:aPOODovMmWM64CAmrByv1w== |
|
f2712444 Location: |
Variable Format: character Notes: UNF:6:OvrFPu8auY/NVFHl+biN4Q== |
|
f2712443 Location: |
Variable Format: character Notes: UNF:6:yizYYNIv4P+F07al61ev0g== |
|
Label: |
family_classification_protVec.csv |
|
Notes: |
text/csv |
|
Label: |
protVec_100d_3grams.csv |
|
Text: |
Protein vectors (ProtVec): a distributed representation of proteins for deep learning applications in proteomics. Each 3-gram is represented by a 100-dimensional vector. |
|
Notes: |
text/csv |
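
The ProtVec description above says that each 3-gram maps to a 100-dimensional vector. As a minimal sketch of how such a table might be used, the Python below splits a protein sequence into 3-grams at the three possible reading offsets and sums the vectors it finds. Several things here are assumptions and do not come from this record: the function names `split_3grams` and `embed_sequence`, the choice to sum vectors, the decision to skip unknown 3-grams, and the toy 4-dimensional lookup table, which stands in for the real file so the example stays self-contained. The column layout of protVec_100d_3grams.csv is not described here, so no file parsing is attempted.

```python
from typing import Dict, List


def split_3grams(seq: str, n: int = 3) -> List[List[str]]:
    """Return n lists of non-overlapping n-grams, one list per start offset."""
    return [
        [seq[i:i + n] for i in range(offset, len(seq) - n + 1, n)]
        for offset in range(n)
    ]


def embed_sequence(seq: str, table: Dict[str, List[float]], dim: int) -> List[float]:
    """Sum the vectors of every 3-gram in seq (assumed aggregation rule)."""
    total = [0.0] * dim
    for grams in split_3grams(seq):
        for gram in grams:
            vec = table.get(gram)
            if vec is None:
                # Assumption: 3-grams missing from the table are ignored.
                continue
            for k, value in enumerate(vec):
                total[k] += value
    return total


# Hypothetical toy table; the real ProtVec vectors have 100 dimensions.
toy_table = {
    "MKT": [1.0, 0.0, 0.0, 0.0],
    "AYI": [0.0, 1.0, 0.0, 0.0],
    "KTA": [0.0, 0.0, 1.0, 0.0],
}

grams = split_3grams("MKTAYIA")
vector = embed_sequence("MKTAYIA", toy_table, dim=4)
```

With the real table, `dim` would be 100 and `toy_table` would be replaced by a dictionary built from protVec_100d_3grams.csv once its column layout has been checked.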