This page provides access to SDfiles for the compounds used in Robert Jorissen and Mike Gilson's paper on the use of a Support Vector Machine method for compound screening. These files are freely available for academic, commercial, or personal use.
We ask that you cite the following reference in any publication that uses this information: Virtual Screening of Molecular Databases Using a Support Vector Machine, Jorissen & Gilson, J. Chem. Inf. Model., Web Release 4/15/2005. (Please update the citation once the article has a volume and page numbers.)
The download contains:
Each SDfile begins with 125 known binders: 25 compounds for each of five protein targets. The targets are as follows:
The compounds that follow the binders are "decoys" from the National Cancer Institute (NCI) diversity set. Note that the decoys in compounds_1ST.sdf are the same as those in compounds_ODD.sdf, and the decoys in compounds_2ND.sdf are the same as those in compounds_EVEN.sdf.
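The layout described above (125 binders first, then decoys) can be separated programmatically. The following is a minimal Python sketch, not part of the original distribution. It relies only on the SDfile convention that each record ends with a line containing `$$$$`. The function names are hypothetical, and grouping the binders into consecutive blocks of 25 assumes that the compounds for each target are stored contiguously, which this page does not state explicitly.

```python
from typing import List, Tuple

N_BINDERS = 125          # stated on this page
BINDERS_PER_TARGET = 25  # stated on this page


def read_sd_records(text: str) -> List[str]:
    """Split SDfile text into records; each record ends with a '$$$$' line."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Anything after the last '$$$$' is not a complete record and is ignored.
    return records


def split_binders_decoys(records: List[str]) -> Tuple[List[List[str]], List[str]]:
    """Return (binder groups, decoys).

    ASSUMPTION: the 25 binders for each target are contiguous, so the
    first 125 records are cut into consecutive blocks of 25.
    """
    binders = records[:N_BINDERS]
    decoys = records[N_BINDERS:]
    groups = [
        binders[i:i + BINDERS_PER_TARGET]
        for i in range(0, len(binders), BINDERS_PER_TARGET)
    ]
    return groups, decoys
```

Typical use would be to read one file, for example `compounds_1ST.sdf`, pass its text to `read_sd_records`, and then call `split_binders_decoys` on the result. Checking that each group really corresponds to a single target against the target list is advisable before relying on the grouping.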
The NCI diversity set compounds were filtered and prepared as follows:
For information on SDfile format, see the ctable.pdf document at the MDL web site.
Download here: jorissen-gilson-SVM-compounds.tar.gz
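Once downloaded, the archive can be inspected without unpacking it to disk. The sketch below uses Python's standard `tarfile` module. The helper names are hypothetical, and the code assumes only that the SDfiles inside the archive carry a `.sdf` extension, as the file names mentioned on this page do. The internal directory layout of the archive is not assumed.

```python
import tarfile
from typing import List


def list_sdfiles(archive_path: str) -> List[str]:
    """Return the sorted member names ending in '.sdf' in a gzipped tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.lower().endswith(".sdf")
        )


def read_member_text(archive_path: str, member_name: str) -> str:
    """Read one archive member into a string without extracting it to disk."""
    with tarfile.open(archive_path, "r:gz") as tar:
        handle = tar.extractfile(member_name)
        if handle is None:
            raise ValueError(member_name + " is not a regular file")
        # SDfiles are plain text; replace any undecodable byte rather than fail.
        return handle.read().decode("ascii", errors="replace")
```

For example, `list_sdfiles("jorissen-gilson-SVM-compounds.tar.gz")` would list the SDfiles in the download, and `read_member_text` returns the contents of any one of them for further parsing.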