KEGG-C via PubChem


KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies. A recent reference is Kanehisa M. Molecular network analysis of diseases and drugs in KEGG. Methods Mol Biol. 2013;939:263-75. doi: 10.1007/978-1-62703-107-3_17. We are grateful to the authors for creating and curating KEGG, and we thank them for making the structures available via PubChem for incorporation into ZINC.

Contact Information

Not available
no phone
no fax

ZINC Subset Overview

Last updated
Source catalog size
Number filtered out
Upload to PubChem?
Not for Sale (Annotated)

Quick Links

Sample molecules
Detailed view
Only from this vendor
Unix download
mol2 unix
SDF unix
Flexibase unix [Scripts to download database files on Linux and MacOS]
Win download
mol2 windows
SDF windows
Flexibase windows [Scripts to download database files on Windows]

Chemical Diversity and Clustering

We assess the chemical diversity of a subset by clustering its molecules. First, we sort the ligands by increasing molecular weight. Then, we use the SUBSET 1.0 algorithm (Voigt JH, Bienfait B, Wang S, Nicklaus MC. J Chem Inf Comput Sci. 2001;41:702-12) to progressively select compounds whose Tanimoto similarity to every previously selected compound falls below the cutoff, using ChemAxon default fingerprints. The resulting representatives have two useful properties:

  • 1) Each representative differs from all the others by at least the Tanimoto cutoff, and
  • 2) All the molecules in the subset are within the Tanimoto cutoff of at least one representative.
Thus the representatives can be said to "cover" the chemical space of the subset at a given Tanimoto level. N/A indicates that clustering is pending.

Tanimoto Cutoff Level 60% 70% 80% 90% 100%
Number of Representatives 1,733 2,964 4,755 7,316 17,116
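The selection procedure described above can be sketched as a greedy pass over molecules sorted by molecular weight. This is a minimal illustration, not the SUBSET 1.0 implementation: it assumes each fingerprint is a Python frozenset of "on" bit indices (standing in for ChemAxon fingerprints) and computes Tanimoto similarity as T(A, B) = |A ∩ B| / |A ∪ B|. The function names `tanimoto` and `select_representatives` and the toy molecules are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical fingerprint representation: the set of bit positions that are on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two bit sets."""
    union = len(a | b)
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / union


def select_representatives(
    molecules: Sequence[Tuple[str, float, Fingerprint]],
    cutoff: float,
) -> List[str]:
    """Greedy selection over (name, molecular_weight, fingerprint) tuples.

    Molecules are visited in order of increasing molecular weight; a molecule
    becomes a representative only if its similarity to every representative
    chosen so far is below `cutoff`. Any rejected molecule therefore has
    similarity >= cutoff to at least one representative.
    """
    reps: List[Tuple[str, Fingerprint]] = []
    for name, _mw, fp in sorted(molecules, key=lambda m: m[1]):
        if all(tanimoto(fp, rep_fp) < cutoff for _, rep_fp in reps):
            reps.append((name, fp))
    return [name for name, _ in reps]


if __name__ == "__main__":
    # Toy data (hypothetical): A and B share 3 of 5 bits, so T(A, B) = 0.6.
    mols = [
        ("A", 180.2, frozenset({1, 2, 3, 4})),
        ("B", 194.2, frozenset({1, 2, 3, 5})),
        ("C", 250.3, frozenset({7, 8, 9})),
    ]
    print(select_representatives(mols, 0.5))  # B is merged into A
    print(select_representatives(mols, 0.7))  # B becomes its own representative
```

With the toy data, a 50% cutoff yields `['A', 'C']` and a 70% cutoff yields `['A', 'B', 'C']`. This mirrors the table above: raising the cutoff makes fewer molecules count as "similar enough", so the number of representatives grows, up to one per distinct fingerprint at 100%.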

Physical Property Distributions

We compute the physical properties of each molecule in the subset and plot their distributions.
Download Calculated Physical Properties

Tab-delimited information files

Ready-to-dock molecular files

Format      Reference (pH 7)   Mid (pH 6-8)   High (pH 8-9.5)   Low (pH 4.5-6)   Download
SMILES      All                All            All               All
MOL2        All                All            All               All              Single, Usual, Metals, All; Single, Usual, Metals, All
SDF         All                All            All               All              Single, Usual, Metals, All; Single, Usual, Metals, All
Flexibase   0-14               0-2            0-2               0-1              Single, Usual, Metals, All; Single, Usual, Metals, All