ATOMICA (doi:10.7910/DVN/4DUBJX)

Document Description

Citation

Title:

ATOMICA

Identification Number:

doi:10.7910/DVN/4DUBJX

Distributor:

Harvard Dataverse

Date of Distribution:

2025-04-02

Version:

1

Bibliographic Citation:

Fang, Ada; Zaixi Zhang; Andrew Zhou; Marinka Zitnik, 2025, "ATOMICA", https://doi.org/10.7910/DVN/4DUBJX, Harvard Dataverse, V1, UNF:6:5HKIlWTQwgwDI29MfVc/5Q== [fileUNF]

Study Description

Citation

Title:

ATOMICA

Identification Number:

doi:10.7910/DVN/4DUBJX

Authoring Entity:

Fang, Ada (Harvard University)

Zaixi Zhang (Harvard University)

Andrew Zhou (Harvard University)

Marinka Zitnik (Harvard University)

Distributor:

Harvard Dataverse

Access Authority:

Fang, Ada

Depositor:

Fang, Ada

Date of Deposit:

2025-03-27

Holdings Information:

https://doi.org/10.7910/DVN/4DUBJX

Study Scope

Keywords:

Chemistry, Computer and Information Science, Medicine, Health and Life Sciences

Abstract:

Datasets used to develop and evaluate ATOMICA.

Methodology and Processing

Sources Statement

Data Access

Notes:

CC BY 4.0 (http://creativecommons.org/licenses/by/4.0)

Other Study Description Materials

File Description--f11033981

File: ATOMICAScore_protein_small_molecule_results.tab

  • Number of cases: 5691

  • No. of variables per record: 24

  • Type of File: text/tab-separated-values

Notes:

UNF:6:CgFPlCHL6R4A7UpNxTlPQA==

ATOMICAScore for identification of amino acid blocks involved in intermolecular bonds.

File Description--f11034562

File: ADP_ids_sequence_30_split.tab

  • Number of cases: 334388

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:LDK8HUM0sSAUVE5LyLMcZQ==

30% sequence similarity split for ADP ligands on protein-small molecule complexes

File Description--f11034561

File: ATP_ids_sequence_30_split.tab

  • Number of cases: 331809

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:IcRGbZgxYwKdA6tCnYag2A==

30% sequence similarity split for ATP ligands on protein-small molecule complexes

File Description--f11034569

File: CA_ids_sequence_30_split.tab

  • Number of cases: 202273

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:KMAaUCWusRU7w419n53vkA==

30% sequence similarity split for CA ligands on protein-ion complexes

File Description--f11034575

File: CIT_ids_sequence_30_split.tab

  • Number of cases: 307807

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:mcfcqtIqd1YtfLuFlIQxeg==

30% sequence similarity split for CIT ligands on protein-small molecule complexes

File Description--f11034579

File: CLA_ids_sequence_30_split.tab

  • Number of cases: 309156

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:gmZqNjII8J4S9H8n0stiOg==

30% sequence similarity split for CLA ligands on protein-small molecule complexes

File Description--f11034566

File: CO_ids_sequence_30_split.tab

  • Number of cases: 202157

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:SWJEVMiBBW5lrmzFsOFdUQ==

30% sequence similarity split for CO ligands on protein-ion complexes

File Description--f11034565

File: CU_ids_sequence_30_split.tab

  • Number of cases: 200746

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:deP/TqmZcjtq5B4a+qQyQQ==

30% sequence similarity split for CU ligands on protein-ion complexes

File Description--f11034865

File: dark_proteome_predictions.tab

  • Number of cases: 3065

  • No. of variables per record: 4

  • Type of File: text/tab-separated-values

Notes:

UNF:6:esWxnh1+pVYCV86nAhVL7A==

Predictions of ATOMICA-Ligand on dark proteome ion and small molecule binding sites

File Description--f11034572

File: FAD_ids_sequence_30_split.tab

  • Number of cases: 333866

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:2/hy9HcUiQJgi6uaRupi0g==

30% sequence similarity split for FAD ligands on protein-small molecule complexes

File Description--f11034574

File: FE_ids_sequence_30_split.tab

  • Number of cases: 201812

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:UOHdwrCxuymh8p9KMmKsPQ==

30% sequence similarity split for FE ligands on protein-ion complexes

File Description--f11034571

File: GDP_ids_sequence_30_split.tab

  • Number of cases: 319029

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:4Y4Yzf+EXHwlGlzJmA/2sQ==

30% sequence similarity split for GDP ligands on protein-small molecule complexes

File Description--f11034576

File: GTP_ids_sequence_30_split.tab

  • Number of cases: 315492

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:vbYOTn7VbUyDd9ms1DBD9g==

30% sequence similarity split for GTP ligands on protein-small molecule complexes

File Description--f11034573

File: HEC_ids_sequence_30_split.tab

  • Number of cases: 269320

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:cCegCG8MKhvVG1heMEgv2A==

30% sequence similarity split for HEC ligands on protein-small molecule complexes

File Description--f11034563

File: HEM_ids_sequence_30_split.tab

  • Number of cases: 333487

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:n3Xba22j9a+xMc84zCIflQ==

30% sequence similarity split for HEM ligands on protein-small molecule complexes

File Description--f11034568

File: K_ids_sequence_30_split.tab

  • Number of cases: 203260

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:C2L2+pA4uW9XjEHmeMHSzg==

30% sequence similarity split for K ligands on protein-ion complexes

File Description--f11034580

File: MG_ids_sequence_30_split.tab

  • Number of cases: 200799

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:57pxZCJpRCm//0MIgLroWg==

30% sequence similarity split for MG ligands on protein-ion complexes

File Description--f11034578

File: MN_ids_sequence_30_split.tab

  • Number of cases: 203784

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:J2Iu1Tpo2vqwUCOjZIJI2A==

30% sequence similarity split for MN ligands on protein-ion complexes

File Description--f11034567

File: NAD_ids_sequence_30_split.tab

  • Number of cases: 310015

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:/qTQMj4j7+nJDNXPGMNQ8Q==

30% sequence similarity split for NAD ligands on protein-small molecule complexes

File Description--f11034564

File: NAP_ids_sequence_30_split.tab

  • Number of cases: 308983

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:doR2+I+invnVWFKu8Oal/Q==

30% sequence similarity split for NAP ligands on protein-small molecule complexes

File Description--f11034577

File: NDP_ids_sequence_30_split.tab

  • Number of cases: 288706

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:gYOjKlHbMP8WxSNL0O5znA==

30% sequence similarity split for NDP ligands on protein-small molecule complexes

File Description--f11034570

File: ZN_ids_sequence_30_split.tab

  • Number of cases: 202806

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:vM5BKDQvTbtaVAgkE7/Hpw==

30% sequence similarity split for ZN ligands on protein-ion complexes

File Description--f11033982

File: PDNA_ids.tab

  • Number of cases: 2750

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:fkbEYNjsww0D7xAp+bpM4g==

Protein-DNA 30% sequence similarity split

File Description--f11033989

File: Pion_ids.tab

  • Number of cases: 74514

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:RmxcW4MpC8dDKJ5YhhetaQ==

Protein-ion 30% sequence similarity split

File Description--f11033996

File: PL_ids.tab

  • Number of cases: 119017

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:NyV65rMg/9kna1s6w9peyA==

Protein-small molecule 30% sequence similarity split

File Description--f11033993

File: Ppeptide_ids.tab

  • Number of cases: 8475

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:K34Ftjt7kdT5Z3ji6zZiWw==

Protein-peptide 30% sequence similarity split

File Description--f11033984

File: PP_ids.tab

  • Number of cases: 124541

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:Hj9F0FLleU13zaO5vffu9g==

Protein-protein 30% sequence similarity split

File Description--f11033983

File: PRNA_ids.tab

  • Number of cases: 3511

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:Yz3HNEhVX8Z/vWiRfXNOxg==

Protein-RNA 30% sequence similarity split

File Description--f11033985

File: RNAL_ids.tab

  • Number of cases: 5185

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:C5VSg8nATMoeMgYItM6dqA==

RNA-ligand 30% sequence similarity split
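The split files listed above each hold two variables per record (`split` and `id`, per the variable description). A minimal sketch of grouping identifiers by split with the Python standard library follows. The helper name `group_ids_by_split`, the sample identifiers, and the split labels `train`, `valid`, and `test` are assumptions for illustration, since this codebook does not list the values stored in `split`.

```python
import csv
import io
from collections import defaultdict


def group_ids_by_split(handle):
    """Group the `id` column by the `split` column of a tab-separated file."""
    groups = defaultdict(list)
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        groups[row["split"]].append(row["id"])
    return dict(groups)


# Hypothetical sample rows; a real file such as PL_ids.tab would be opened
# with open(path, newline="") and passed in the same way.
sample = io.StringIO(
    "split\tid\n"
    "train\texample_0001\n"
    "train\texample_0002\n"
    "valid\texample_0003\n"
    "test\texample_0004\n"
)
groups = group_ids_by_split(sample)
for name, ids in groups.items():
    print(name, len(ids))
```

Reading by column name rather than by position keeps the sketch independent of the column order in the downloaded file.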

Variable Description

List of Variables:

Variables

item_id

f11033981 Location:

Variable Format: character

Notes: UNF:6:dwB19c//Lbr0hYM3pZe4Xg==

topk_Hydrogen Bonds

f11033981 Location:

Summary Statistics: StDev 1.4490795283072564; Mean 1.8360569319978937; Valid 5691.0; Min. 0.0; Max. 8.0

Variable Format: numeric

Notes: UNF:6:g9EP/AqkCV4maI5DcZiouQ==

topk_Hydrophobic Interactions

f11033981 Location:

Summary Statistics: Min. 0.0; Max. 7.0; StDev 1.1469985951158388; Valid 5691.0; Mean 0.7083113688279761

Variable Format: numeric

Notes: UNF:6:HH05yt1jR6UYZZrgy0C74Q==

topk_pi-Stacking

f11033981 Location:

Summary Statistics: Min. 0.0; Mean 0.11509400808293671; Valid 5691.0; Max. 3.0; StDev 0.34509198099013855

Variable Format: numeric

Notes: UNF:6:9w973QZUgIbS0kP9D/j1YQ==

topk_Metal Complexes

f11033981 Location:

Summary Statistics: Valid 5691.0; Max. 4.0; Min. 0.0; Mean 0.05183623264804117; StDev 0.36635964145056626

Variable Format: numeric

Notes: UNF:6:lTjuZ7lj3nfd81mtd9qnWw==

topk_Halogen Bonds

f11033981 Location:

Summary Statistics: Max. 2.0; Min. 0.0; Valid 5691.0; Mean 0.0028114566859954427; StDev 0.05922020610167539

Variable Format: numeric

Notes: UNF:6:3D6iQ95W2ZCmJPsTdNDvug==

rand_Hydrogen Bonds

f11033981 Location:

Summary Statistics: StDev 1.23004406077369; Max. 7.0; Mean 1.3772623440520109; Valid 5691.0; Min. 0.0

Variable Format: numeric

Notes: UNF:6:p5IftzzHhu4+dXqhbE7uQQ==

rand_Hydrophobic Interactions

f11033981 Location:

Summary Statistics: Max. 6.0; Mean 0.4795290810050949; StDev 0.8416200314378888; Min. 0.0; Valid 5691.0

Variable Format: numeric

Notes: UNF:6:8i+zY/OH/2yBXl9Fj6cNuQ==

rand_pi-Stacking

f11033981 Location:

Summary Statistics: StDev 0.2793959263460065; Mean 0.07239500966438224; Min. 0.0; Valid 5691.0; Max. 2.0

Variable Format: numeric

Notes: UNF:6:++NSU/4aAmUJVWc0TG8sXQ==

rand_Metal Complexes

f11033981 Location:

Summary Statistics: Max. 4.0; Min. 0.0; Mean 0.0379546652609383; Valid 5691.0; StDev 0.2792790719931769

Variable Format: numeric

Notes: UNF:6:U/3JosmVJ2NB3qQ44+ewXw==

rand_Halogen Bonds

f11033981 Location:

Summary Statistics: StDev 0.04954194341563684; Mean 0.002460024600245975; Max. 1.0; Valid 5691.0; Min. 0.0

Variable Format: numeric

Notes: UNF:6:xWLIZwggKUMwCCRm73ZciQ==

topk_total

f11033981 Location:

Summary Statistics: StDev 1.4595818169397208; Mean 2.7141099982428414; Min. 0.0; Valid 5691.0; Max. 8.0

Variable Format: numeric

Notes: UNF:6:nnff82jHd9YikpkZ6+pqDw==

rand_total

f11033981 Location:

Summary Statistics: Valid 5691.0; Max. 7.0; Mean 1.969601124582674; Min. 0.0; StDev 1.2507811966345896

Variable Format: numeric

Notes: UNF:6:zPlwWq3z0BTlcMqSNmYTDg==

Hydrogen Bonds

f11033981 Location:

Summary Statistics: Min. 0.0; Mean 5.865577227200848; StDev 4.88957440758518; Max. 36.0; Valid 5691.0

Variable Format: numeric

Notes: UNF:6:y5Sc7PqIHXrwkIOVbmPowA==

pi-Stacking

f11033981 Location:

Summary Statistics: Min. 0.0; Mean 0.3363205060622032; Max. 8.0; Valid 5691.0; StDev 0.7557957144180683

Variable Format: numeric

Notes: UNF:6:1X41ez6crXCW9X2qAypdyg==

Hydrophobic Interactions

f11033981 Location:

Summary Statistics: Mean 2.3535406782639257; Min. 0.0; Max. 31.0; Valid 5691.0; StDev 3.6704940491824583

Variable Format: numeric

Notes: UNF:6:OtX4UZO05HM2xQLaaz4w5w==

Metal Complexes

f11033981 Location:

Summary Statistics: Min. 0.0; Max. 7.0; Valid 5691.0; StDev 0.5520930580785177; Mean 0.11298541556844163

Variable Format: numeric

Notes: UNF:6:Cu0mTzuCHbFjPLJaJkUQUg==

Halogen Bonds

f11033981 Location:

Summary Statistics: StDev 0.09350269117683291; Min. 0.0; Mean 0.00667720962923887; Max. 2.0; Valid 5691.0

Variable Format: numeric

Notes: UNF:6:I1DqwxA14XKgh5AvmRrHKw==

esm_Hydrogen Bonds

f11033981 Location:

Summary Statistics: Min. 0.0; Valid 5691.0; StDev 1.464239618509718; Mean 1.697241258126868; Max. 7.0

Variable Format: numeric

Notes: UNF:6:JIrXkmV+op0k08B0JuLr8g==

esm_Hydrophobic Interactions

f11033981 Location:

Summary Statistics: StDev 0.8998385453313613; Max. 6.0; Valid 5691.0; Min. 0.0; Mean 0.4837462660340889

Variable Format: numeric

Notes: UNF:6:1d2EZ9AVffERkyC9TEMhIw==

esm_pi-Stacking

f11033981 Location:

Summary Statistics: Min. 0.0; Max. 3.0; Mean 0.08645229309435956; Valid 5691.0; StDev 0.3135671156108943

Variable Format: numeric

Notes: UNF:6:/YumVz7dk8p5QgALyC0b1g==

esm_Metal Complexes

f11033981 Location:

Summary Statistics: StDev 0.5207989211123341; Mean 0.10068529256721126; Max. 7.0; Min. 0.0; Valid 5691.0

Variable Format: numeric

Notes: UNF:6:bUf1aV/1rYK6EmxuwfnJow==

esm_Halogen Bonds

f11033981 Location:

Summary Statistics: Valid 5691.0; StDev 0.0458750057150348; Max. 1.0; Mean 0.002108592514496593; Min. 0.0

Variable Format: numeric

Notes: UNF:6:bJKfsO006szKxgT5UZH4xA==

esm_total

f11033981 Location:

Summary Statistics: Valid 5691.0; StDev 1.4600068366101557; Mean 2.370233702337024; Max. 8.0; Min. 0.0

Variable Format: numeric

Notes: UNF:6:0Rd7lZ+uRcWcP09uDrZokQ==
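The means reported for f11033981 suggest that each `*_total` variable is the sum of its five interaction-type variables (hydrogen bonds, hydrophobic interactions, pi-stacking, metal complexes, halogen bonds). The sketch below checks that relationship using only the means printed in this codebook. It confirms the relationship for the means, not for individual rows, and the dictionary layout is an illustrative choice rather than part of the dataset.

```python
import math

# Means copied from the Summary Statistics entries of this codebook, in the
# order: Hydrogen Bonds, Hydrophobic Interactions, pi-Stacking,
# Metal Complexes, Halogen Bonds.
component_means = {
    "topk": [1.8360569319978937, 0.7083113688279761, 0.11509400808293671,
             0.05183623264804117, 0.0028114566859954427],
    "rand": [1.3772623440520109, 0.4795290810050949, 0.07239500966438224,
             0.0379546652609383, 0.002460024600245975],
    "esm": [1.697241258126868, 0.4837462660340889, 0.08645229309435956,
            0.10068529256721126, 0.002108592514496593],
}

# Means of topk_total, rand_total, and esm_total from the same codebook.
total_means = {
    "topk": 2.7141099982428414,
    "rand": 1.969601124582674,
    "esm": 2.370233702337024,
}

# A mean of sums equals the sum of means, so matching values are consistent
# with each total being the row-wise sum of its five components.
consistent = {
    prefix: math.isclose(sum(values), total_means[prefix], abs_tol=1e-9)
    for prefix, values in component_means.items()
}
print(consistent)
```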

split

f11034562 Location:

Variable Format: character

Notes: UNF:6:W2W3xPh9cuHUdgo+dc1UWA==

id

f11034562 Location:

Variable Format: character

Notes: UNF:6:/p/X39axrWiOrs/p5dEnfw==

split

f11034561 Location:

Variable Format: character

Notes: UNF:6:Tm1qcsaQBG790iIYnWuRUw==

id

f11034561 Location:

Variable Format: character

Notes: UNF:6:kb3IJP5HXQxoln2TF6sFYA==

split

f11034569 Location:

Variable Format: character

Notes: UNF:6:Mu6kx2PsddZW8RV4slvaiw==

id

f11034569 Location:

Variable Format: character

Notes: UNF:6:yAt7z0RPkiUwzGRPlBMFCg==

split

f11034575 Location:

Variable Format: character

Notes: UNF:6:8TuvFADYF6o0YyyGSKSEGA==

id

f11034575 Location:

Variable Format: character

Notes: UNF:6:6/RYcDZYWPTh9bGkrwVYXQ==

split

f11034579 Location:

Variable Format: character

Notes: UNF:6:uYaDR/iGL8uHo6YTqGQLCA==

id

f11034579 Location:

Variable Format: character

Notes: UNF:6:9t8xqhVErRcAHRORFcp0rw==

split

f11034566 Location:

Variable Format: character

Notes: UNF:6:qgPBqy68NUtH+Ms5LlFpZg==

id

f11034566 Location:

Variable Format: character

Notes: UNF:6:PiC4acDszQW+XlOH0QoUyA==

split

f11034565 Location:

Variable Format: character

Notes: UNF:6:nn3S1dwtrDMz7KJw/1mQsQ==

id

f11034565 Location:

Variable Format: character

Notes: UNF:6:7HLnmyhtVD6aIIswlqrq4w==

ligand

f11034865 Location:

Variable Format: character

Notes: UNF:6:LQTDESOSB1wxyvBf31DfYg==

id

f11034865 Location:

Variable Format: character

Notes: UNF:6:D6t2WMLVb1dsFYft1PXvZg==

mean

f11034865 Location:

Summary Statistics: Min. 0.050465149666666674; Valid 3065.0; Mean 0.7859892032410034; StDev 0.1924985931413104; Max. 0.9985229133333333

Variable Format: numeric

Notes: UNF:6:/1CDDYEbNLO73SbqoUEWdA==

std

f11034865 Location:

Summary Statistics: Mean 0.16827149435310784; Valid 3065.0; Max. 0.5741644156525099; Min. 1.2830682301940609E-4; StDev 0.18629739185168348

Variable Format: numeric

Notes: UNF:6:4xEj6cvnZ55GROHU0szMHA==
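The four variables of f11034865 (`ligand`, `id`, `mean`, `std`) lend themselves to simple filtering. A minimal sketch follows, assuming `mean` is an averaged prediction score and `std` its spread across repeated predictions; the codebook does not state this. The thresholds, the helper name `confident_predictions`, and the sample rows are hypothetical.

```python
import csv
import io


def confident_predictions(handle, min_mean=0.9, max_std=0.05):
    """Return (id, ligand, mean) for rows with a high mean and a low std."""
    selected = []
    for row in csv.DictReader(handle, delimiter="\t"):
        mean = float(row["mean"])
        std = float(row["std"])
        if mean >= min_mean and std <= max_std:
            selected.append((row["id"], row["ligand"], mean))
    return selected


# Hypothetical rows in the documented four-column layout.
sample = io.StringIO(
    "ligand\tid\tmean\tstd\n"
    "ZN\tprotein_A\t0.97\t0.01\n"
    "MG\tprotein_B\t0.95\t0.20\n"
    "HEM\tprotein_C\t0.40\t0.02\n"
)
hits = confident_predictions(sample)
print(hits)
```

Both cutoffs are keyword arguments so that they can be tuned against the reported ranges (`mean` between roughly 0.05 and 1.0, `std` up to roughly 0.57).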

split

f11034572 Location:

Variable Format: character

Notes: UNF:6:ix3lSyF+lj/U3F7gKIS8xw==

id

f11034572 Location:

Variable Format: character

Notes: UNF:6:Z1L1zeNstKm7iJX06hmDQg==

split

f11034574 Location:

Variable Format: character

Notes: UNF:6:2KDwlh/okimRksKwXpCLcA==

id

f11034574 Location:

Variable Format: character

Notes: UNF:6:7+I3oaGm0r2E0Ay9AhNwlg==

split

f11034571 Location:

Variable Format: character

Notes: UNF:6:82Wt247nR/DxAVgkSZpZtQ==

id

f11034571 Location:

Variable Format: character

Notes: UNF:6:cFMhPXJxqqqhj+sbJzjViQ==

split

f11034576 Location:

Variable Format: character

Notes: UNF:6:w0nW2E8fdz6SOvOHAdNEHQ==

id

f11034576 Location:

Variable Format: character

Notes: UNF:6:2BRMW/UXdRSvmENSAQA3+A==

split

f11034573 Location:

Variable Format: character

Notes: UNF:6:gAZ2J+HyD/Y2i/F63BNGPA==

id

f11034573 Location:

Variable Format: character

Notes: UNF:6:OAZ5WFqc53wpXxFcv3rnpw==

split

f11034563 Location:

Variable Format: character

Notes: UNF:6:9OWry8peiJcp9l3YVBU2tw==

id

f11034563 Location:

Variable Format: character

Notes: UNF:6:er4srlDjBGS/QWd1KNTipQ==

split

f11034568 Location:

Variable Format: character

Notes: UNF:6:dE6dJxD65aD7UvJ2J5NwMQ==

id

f11034568 Location:

Variable Format: character

Notes: UNF:6:VEhJEdM1DuMqC2Kpq2If1A==

split

f11034580 Location:

Variable Format: character

Notes: UNF:6:0N24Ye1a4cOoCWbvEfeecQ==

id

f11034580 Location:

Variable Format: character

Notes: UNF:6:OVa00G+9XzUHBnJrJ/Pu7Q==

split

f11034578 Location:

Variable Format: character

Notes: UNF:6:x+0JQRJSfU5UAgbkjr396w==

id

f11034578 Location:

Variable Format: character

Notes: UNF:6:8Q7rkcWSfy3/aA7R9+sfBw==

split

f11034567 Location:

Variable Format: character

Notes: UNF:6:Ln6qdA5E3ECaohf0xE5CLw==

id

f11034567 Location:

Variable Format: character

Notes: UNF:6:T23cuML00ycGUqL6DTLBBA==

split

f11034564 Location:

Variable Format: character

Notes: UNF:6:5Vb0sh62D/jfQ5rDxPKZvQ==

id

f11034564 Location:

Variable Format: character

Notes: UNF:6:/ILwC5eRRhGM87TjJMT0WA==

split

f11034577 Location:

Variable Format: character

Notes: UNF:6:5VuGE2S9Lrwxuf8oXHzl1g==

id

f11034577 Location:

Variable Format: character

Notes: UNF:6:xHomkJMlBJzpi5AugMBj/A==

split

f11034570 Location:

Variable Format: character

Notes: UNF:6:o5K657t/1S6uyBSaelWYRQ==

id

f11034570 Location:

Variable Format: character

Notes: UNF:6:uPpUe3P595KNYUbRC9xwCA==

split

f11033982 Location:

Variable Format: character

Notes: UNF:6:8Tx9l4Fgk/lNGjAVkFOKPA==

id

f11033982 Location:

Variable Format: character

Notes: UNF:6:jafllybnEYaVFex5JCfvGw==

split

f11033989 Location:

Variable Format: character

Notes: UNF:6:8NzJMGT/RCdljW5VHCEBBQ==

id

f11033989 Location:

Variable Format: character

Notes: UNF:6:TmGgvuHqXRrMStm6+PS07A==

split

f11033996 Location:

Variable Format: character

Notes: UNF:6:3HEfy3ACfN11u04C2dR9Gg==

id

f11033996 Location:

Variable Format: character

Notes: UNF:6:8uSPbDjqnLVjmEjatC0Kxg==

split

f11033993 Location:

Variable Format: character

Notes: UNF:6:SXiwSAydjHbG252+xNRebg==

id

f11033993 Location:

Variable Format: character

Notes: UNF:6:uwgRgM1KtYp1iZWlVnQBXw==

split

f11033984 Location:

Variable Format: character

Notes: UNF:6:BSmEFJj5k+39CF5csw4PaA==

id

f11033984 Location:

Variable Format: character

Notes: UNF:6:HOHpWbBYRsy3AqELyn9gaQ==

split

f11033983 Location:

Variable Format: character

Notes: UNF:6:r/KV+RjJV4wX+sn+3jpitw==

id

f11033983 Location:

Variable Format: character

Notes: UNF:6:zCx8b950i0IwxFE4auXgnw==

split

f11033985 Location:

Variable Format: character

Notes: UNF:6:lug//4aoMVAzoB/7ZQS1DQ==

id

f11033985 Location:

Variable Format: character

Notes: UNF:6:L/EUtWGDwQXpgvFsT45sCQ==

Other Study-Related Materials

Label:

annotated_dark_proteome_AF3_outputs.tar.gz

Text:

AlphaFold3 structures and confidence scores of predicted dark proteome metal ion and small molecule complexes

Notes:

application/x-gzip

Other Study-Related Materials

Label:

is_dark_90_plddt_PeSTo_80_ion.jsonl.gz

Text:

Processed AFDB Cluster representative proteins (with pLDDT > 90) which have predicted ion binding sites (with PeSTo confidence > 80)

Notes:

application/x-gzip

Other Study-Related Materials

Label:

is_dark_90_plddt_PeSTo_80_small_molecule.jsonl.gz

Text:

Processed AFDB Cluster representative proteins (with pLDDT > 90) which have predicted small molecule binding sites (with PeSTo confidence > 80)

Notes:

application/x-gzip

Other Study-Related Materials

Label:

ATOMICANet_ion.gml

Text:

ATOMICANet-Ion for protein-ion interfaceome

Notes:

application/gml+xml

Other Study-Related Materials

Label:

ATOMICANet_lipid.gml

Text:

ATOMICANet-Lipid for protein-lipid interfaceome

Notes:

application/gml+xml

Other Study-Related Materials

Label:

ATOMICANet_nucleic_acid.gml

Text:

ATOMICANet-Nucleic-Acid for protein-nucleic acid interfaceome

Notes:

application/gml+xml

Other Study-Related Materials

Label:

ATOMICANet_protein.gml

Text:

ATOMICANet-Protein for protein-protein interfaceome

Notes:

application/gml+xml

Other Study-Related Materials

Label:

ATOMICANet_small_molecule.gml

Text:

ATOMICANet-Small-Molecule for protein-small molecule interfaceome

Notes:

application/gml+xml

Other Study-Related Materials

Label:

pesto_70_plddt_70_ion.jsonl.gz

Text:

Human proteome protein-ion interfaces from AlphaFold2 structures with a pLDDT cutoff of 70% and a PeSTo confidence cutoff of 70

Notes:

application/x-gzip

Other Study-Related Materials

Label:

pesto_70_plddt_70_lipid.jsonl.gz

Text:

Human proteome protein-lipid interfaces from AlphaFold2 structures with a pLDDT cutoff of 70% and a PeSTo confidence cutoff of 70

Notes:

application/x-gzip

Other Study-Related Materials

Label:

pesto_70_plddt_70_nucleic_acid.jsonl.gz

Text:

Human proteome protein-nucleic acid interfaces from AlphaFold2 structures with a pLDDT cutoff of 70% and a PeSTo confidence cutoff of 70

Notes:

application/x-gzip

Other Study-Related Materials

Label:

pesto_70_plddt_70_protein.jsonl.gz

Text:

Human proteome protein-protein interfaces from AlphaFold2 structures with a pLDDT cutoff of 70% and a PeSTo confidence cutoff of 70

Notes:

application/x-gzip

Other Study-Related Materials

Label:

pesto_70_plddt_70_small_molecule.jsonl.gz

Text:

Human proteome protein-small molecule interfaces from AlphaFold2 structures with a pLDDT cutoff of 70% and a PeSTo confidence cutoff of 70

Notes:

application/x-gzip

Other Study-Related Materials

Label:

CSD.jsonl.gz

Text:

CSD pre-training data

Notes:

application/x-gzip

Other Study-Related Materials

Label:

CSD_ids.csv

Text:

CSD molecule motif similarity split

Notes:

text/csv

Other Study-Related Materials

Label:

PDNA.jsonl.gz

Text:

Protein-DNA pre-training data

Notes:

application/x-gzip

Other Study-Related Materials

Label:

Pion.jsonl.gz

Text:

Protein-ion pre-training data

Notes:

application/x-gzip

Other Study-Related Materials

Label:

PL.jsonl.gz

Text:

Protein-small molecule pre-training data

Notes:

application/x-gzip

Other Study-Related Materials

Label:

PP.jsonl.gz

Text:

Protein-protein pre-training data

Notes:

application/x-gzip

Other Study-Related Materials

Label:

Ppeptide.jsonl.gz

Text:

Protein-peptide pre-training data

Notes:

application/x-gzip

Other Study-Related Materials

Label:

PRNA.jsonl.gz

Text:

Protein-RNA pre-training data

Notes:

application/x-gzip

Other Study-Related Materials

Label:

RNAL.jsonl.gz

Text:

RNA-ligand pre-training data

Notes:

application/x-gzip
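Several of the materials listed above are gzip-compressed JSON Lines files (`*.jsonl.gz`). A minimal sketch of streaming such a file one record at a time with the Python standard library follows. The record fields are not documented in this codebook, so the sample key below is hypothetical, and `iter_jsonl_gz` is an illustrative helper name.

```python
import gzip
import json
import os
import tempfile


def iter_jsonl_gz(path):
    """Yield one parsed JSON object per non-empty line of a .jsonl.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Build a tiny stand-in file; the "id" key is hypothetical and does not
# describe the actual schema of the deposited files.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.jsonl.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as out:
    out.write(json.dumps({"id": "example_0001"}) + "\n")
    out.write(json.dumps({"id": "example_0002"}) + "\n")

records = list(iter_jsonl_gz(demo_path))
print(len(records))
```

Iterating line by line avoids decompressing a whole file into memory, which may matter for the larger pre-training files.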