This file contains experimental and calculated values of the endpoint for visible sets


Optimization based on the traditional correlation coefficient (r2)


SMILES is used in the model
Data from SMILES-file (#TrainingSet.txt)
Threshold=3
The number of active SMILES attributes (ASA) =25



IMPORTANT: In the case of the classic scheme W%=N101/Nall; otherwise W%=N111/Nall
Percent of ASA with presence in all sets (W%) =96

Defect of Split = 1.1806

Intercept (c0) and slope (c1) calculated for each set individually:
Training set   : c0=  -9.08919 c1=   1.13464
InvTraining set: c0=  -6.56787 c1=   0.81804
Calibration set: c0=  -6.81932 c1=   0.85877

Slope and intercept calculated with the subtraining set give the model:

Endpoint =  -9.0891909 ( 0.1362470) +    1.1346407 ( 0.0178854) * DCW(3,10)
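
The model line above can be applied directly to a DCW(3,10) value. The following is a minimal Python sketch using the coefficients printed in this file; the function name `predict_endpoint` is an illustrative assumption and is not part of the software output.

```python
# Coefficients copied from the model line in this file.
C0 = -9.0891909  # intercept
C1 = 1.1346407   # slope


def predict_endpoint(dcw: float) -> float:
    """Return the calculated endpoint for a given DCW(3,10) value."""
    return C0 + C1 * dcw


# DCW(3,10) of the first training-set compound listed in this file.
# The table gives Calc = 2.9641 for this compound; small differences
# in the last digit are possible because DCW is printed with 5 decimals.
first_calc = predict_endpoint(10.62296)
print(round(first_calc, 4))
```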

Statistical characteristics of the model:

N is the number of compounds in the set;
R is correlation coefficient;
Q is cross-validated correlation coefficient;
CCC is concordance correlation coefficient;
IIC is index of ideality of correlation;
s is standard error of estimation;
MAE is mean absolute error;
F is the Fisher F-ratio
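
Several of the characteristics defined above can be computed from paired experimental and calculated values. The sketch below is one possible implementation, not the code used by the software: the function name `basic_stats` is hypothetical, CCC follows Lin's definition with population (1/n) moments, and the n-2 denominator for s is an assumption that this file does not confirm.

```python
import math
from typing import Dict, Sequence


def basic_stats(expr: Sequence[float], calc: Sequence[float]) -> Dict[str, float]:
    """Compute R2, CCC, MAE and s for experimental vs. calculated values."""
    n = len(expr)
    mean_x = sum(expr) / n
    mean_y = sum(calc) / n
    sxx = sum((x - mean_x) ** 2 for x in expr)
    syy = sum((y - mean_y) ** 2 for y in calc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expr, calc))

    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    # Lin's concordance correlation coefficient (population moments).
    ccc = 2.0 * sxy / (sxx + syy + n * (mean_x - mean_y) ** 2)
    mae = sum(abs(x - y) for x, y in zip(expr, calc)) / n
    # Assumption: standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum((x - y) ** 2 for x, y in zip(expr, calc)) / (n - 2))
    return {"R2": r2, "CCC": ccc, "MAE": mae, "s": s}


# Toy data: a constant shift keeps R2 at 1 but lowers CCC.
shifted = basic_stats([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(shifted)
```

The toy example illustrates why CCC is reported alongside R2: a systematic offset between experimental and calculated values leaves R2 unchanged but reduces CCC.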

Blk is the number of SMILES attributes in given SMILES, which are blocked
All is the number of all SMILES attributes in given SMILES string

Y-randomization: 1000 permutations for each average
The randomized correlation coefficients are not constant;
as a rule, they vary within a range of about 0.03.

                                 : Train  :InvTrain: Calib 
                                 :      24:      24:      23
                                 :  0.8728:  0.8630:  0.8284
                                1:  0.0818:  0.0200:  0.0345
                                2:  0.0653:  0.0243:  0.0000
                                3:  0.0000:  0.0955:  0.0387
                                4:  0.0816:  0.0211:  0.0078
                                5:  0.1212:  0.0031:  0.0888
                                6:  0.0023:  0.0963:  0.0422
                                7:  0.0050:  0.0592:  0.0501
                                8:  0.0109:  0.0116:  0.0491
                                9:  0.1131:  0.0562:  0.0104
                               10:  0.0244:  0.0421:  0.1130
Rr2, i.e. average randomized R2  :  0.0506:  0.0429:  0.0435
   CRp2=R*sqrt(R2-Rr2) [1]       :  0.8471:  0.8413:  0.8064

 CRp2 should be greater than 0.5 [1]
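
The CRp2 row of the Y-randomization table can be recomputed from the ten randomized values and the R2 of the unscrambled model. The sketch below is a minimal illustration using the training-set column of this file; the function name `crp2` is hypothetical, and the ten values are treated as squared correlation coefficients, which is an assumption consistent with the tabulated averages.

```python
import math
from typing import Sequence


def crp2(r2: float, randomized_r2: Sequence[float]) -> float:
    """CRp2 = R * sqrt(R2 - Rr2), with Rr2 the mean randomized R2."""
    rr2 = sum(randomized_r2) / len(randomized_r2)
    return math.sqrt(r2) * math.sqrt(r2 - rr2)


# Training-set column of the Y-randomization table (runs 1 to 10).
train_randomized = [0.0818, 0.0653, 0.0000, 0.0816, 0.1212,
                    0.0023, 0.0050, 0.0109, 0.1131, 0.0244]

train_crp2 = crp2(0.8728, train_randomized)
# Should agree with the tabulated 0.8471 up to rounding of the inputs.
print(round(train_crp2, 4))
```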

REFERENCE for Y-scrambling
[1] P.K. Ojha, K. Roy, Comparative QSARs for antimalarial endochins:
    Importance of descriptor-thinning and noise reduction prior to
     feature selection, Chemometr. Intell. Lab. 109 (2011) 146-161

External validation characteristics for the model taken from
REFERENCES
[1] Golbraikh A., Tropsha A. J.Mol.Graph.Model. 20(2002)269; // R02, k,kk
[2] Roy P.P., Roy K. Chem. Biol. Drug Des. 73(2009) 442; // Rm2
[3] PK Ojha,I Mitra, RN Das,K Roy,Chemometr Intell Lab 107(2011)194-205
    // Average of Rm2 and absolute difference Rm2(x,y)-Rm2(y,x)
    // x,y are experimental and predicted values of endpoint
[4] Lin, L. I-K. A concordance correlation coefficient to
    evaluate reproducibility (1989) Biometrics, 45 (1), 255-268.
[5] Toropova, A.P.,Toropov, A.A. The index of ideality of correlation:
    A criterion of predictability of QSAR models for skin permeability?
    (2016) Science of the Total Environment. Article in Press.

The range of endpoint:
Min= -3.1 Max=  4.0 Middle=  0.4

n           =      23
r2          =    0.8284
r02         =    0.8271
rr02        =    0.8273
(r2-r02)/r2 =    0.0016 should be < 0.1 [1]
(r2-rr02)/r2=    0.0013 should be < 0.1 [1]
k           =    1.0946 should be 0.85 <  k < 1.15 [1]
kk          =    0.7558 should be 0.85 < kk < 1.15 [1]
Rm2(test)   =    0.7983 should be > 0.5 [2]

n           =      23
r2          =    0.8284
r02         =    0.8273
rr02        =    0.8271
(r2-r02)/r2 =    0.0013 should be < 0.1 [1]
(r2-rr02)/r2=    0.0016 should be < 0.1 [1]
k           =    0.7558 should be 0.85 <  k < 1.15 [1]
kk          =    1.0946 should be 0.85 < kk < 1.15 [1]
R*m2(test)  =    0.8014 should be > 0.5 [2]

Average Rm2 = 0.7998 should be larger than 0.5 [3]
Delta Rm2 = 0.0031 should be lower than 0.2 [3]
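
The external validation criteria above can be checked from the reported r2, r02 and slope values. The sketch below assumes the usual form Rm2 = r2*(1 - sqrt(|r2 - r02|)) from reference [2]; the function names are hypothetical. Because the inputs are the 4-decimal values printed in this file, the results agree with the report only to about three decimals.

```python
import math


def rm2(r2: float, r02: float) -> float:
    """Rm2 metric computed from r2 and the through-origin r02."""
    return r2 * (1.0 - math.sqrt(abs(r2 - r02)))


def relative_gap(r2: float, r02: float) -> float:
    """(r2 - r02) / r2, which should be below 0.1 according to [1]."""
    return (r2 - r02) / r2


def slope_in_range(k: float) -> bool:
    """Check the 0.85 < k < 1.15 criterion from [1]."""
    return 0.85 < k < 1.15


# Values printed in this file for the calibration set (n = 23).
r2, r02, rr02 = 0.8284, 0.8271, 0.8273

rm2_xy = rm2(r2, r02)    # reported as Rm2(test)
rm2_yx = rm2(r2, rr02)   # reported as R*m2(test)
average_rm2 = (rm2_xy + rm2_yx) / 2.0
delta_rm2 = abs(rm2_xy - rm2_yx)

print(round(rm2_xy, 4), round(rm2_yx, 4))
print(round(average_rm2, 4), round(delta_rm2, 4))
print(slope_in_range(1.0946), slope_in_range(0.7558))
```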

        :  n :  R2   :  CCC  :  IIC  :  Q2   :     s  :    MAE :  F     
Training:  24: 0.8728: 0.9321: 0.5605: 0.8514:   0.805:   0.609:      151
InvTrain:  24: 0.8630: 0.8997: 0.8236: 0.8321:   0.828:   0.639:      139
Calib   :  23: 0.8284: 0.8942: 0.8643: 0.7981:   0.955:   0.769:      101
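
The F column of the summary table is consistent with the standard F-ratio for a regression on a single descriptor, F = R2/(1-R2)*(n-2). The sketch below checks this against the three rows; the function name `f_ratio` is hypothetical, and the formula is an assumption about how the software computes F rather than something this file states.

```python
def f_ratio(r2: float, n: int, n_descriptors: int = 1) -> float:
    """F-ratio of a least-squares regression with n_descriptors predictors."""
    dof = n - n_descriptors - 1  # residual degrees of freedom
    return (r2 / (1.0 - r2)) * dof / n_descriptors


# (set name, n, R2) copied from the summary table.
summary = [("Training", 24, 0.8728),
           ("InvTrain", 24, 0.8630),
           ("Calib", 23, 0.8284)]

f_values = {name: f_ratio(r2, n) for name, n, r2 in summary}
for name, value in f_values.items():
    print(name, round(value))
```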

Training set is indicated by +;
Invisible training set is indicated by -;
Calibration set is indicated by #

B a l a n c e   o f   c o r r e l a t i o n s :
    Training set - invisible Training set - Calibration set

 :SMILES                                            :   DCW(3,10):        Expr:        Calc:   Expr-Calc:Blk/All: ID 
+:Nc4cccc1c4c2cccc3cccc1c23                         :    10.62296:      2.8800:      2.9641:     -0.0841:  1/ 49: 13177-27-0
+:Nc1cccc2cccnc12                                   :     6.15652:     -1.1400:     -2.1037:      0.9637:  0/ 29: 578-66-5
+:Nc1ccc2ccccc2c1                                   :     5.93452:     -0.6700:     -2.3556:      1.6856:  0/ 29: 91-59-8
+:Cc1cc(C)c(N)cc1C                                  :     5.84556:     -1.3200:     -2.4566:      1.1366:  0/ 31: 137-17-7
+:Nc1ccc2c3ccccc3Nc2c1                              :     8.79919:      0.6000:      0.8947:     -0.2947:  1/ 39: 4539-51-9
+:Cc1ccc(O)c(N)c1                                   :     6.12225:     -2.1000:     -2.1426:      0.0426:  0/ 29: 95-84-1
+:Nc1ccc(cc1)Sc2ccc(N)cc2                           :     7.57104:      0.3100:     -0.4988:      0.8088:  3/ 45: 139-65-1
+:CC(C)c1ccc(N)cc1N                                 :     5.63158:     -3.0000:     -2.6994:     -0.3006:  1/ 33: 00-00-01
+:Fc1ccc(N)c(F)c1                                   :     6.82390:     -2.7000:     -1.3465:     -1.3535:  1/ 29: 367-25-9
+:Nc4ccc3cccc2c1ccccc1c4c23                         :    10.62296:      3.3500:      2.9641:      0.3859:  1/ 49: 13177-25-8
+:Nc1ccc(Cl)cc1                                     :     5.93470:     -2.5200:     -2.3554:     -0.1646:  1/ 23: 106-47-8
+:Nc1cc(ccc1)c2cc(N)ccc2                            :     7.82973:     -1.3000:     -0.2053:     -1.0947:  0/ 43: 2050-89-7
+:Nc1ccc(OC)cc1C                                    :     5.55900:     -3.0000:     -2.7817:     -0.2183:  0/ 27: 102-50-1
+:Nc1cc(N)ccc1CCCC                                  :     5.49299:     -2.7000:     -2.8566:      0.1566:  0/ 31: 00-00-02
+:Nc3ccc4c2cccc1cccc(c12)c4c3                       :    11.47238:      3.8000:      3.9278:     -0.1278:  1/ 53: 5869-25-0
+:Nc1ccc(Br)cc1                                     :     5.93470:     -2.7000:     -2.3554:     -0.3446:  3/ 23: 106-40-1
+:Nc1cc(Cl)ccc1O                                    :     5.61192:     -3.0000:     -2.7217:     -0.2783:  2/ 25: 95-85-2
+:Nc1ccc(cc1OC)c2ccc(N)c(OC)c2                      :     7.53273:      0.1500:     -0.5422:      0.6922:  1/ 55: 119-90-4
+:[O-][N+](=O)c1ccc(N)c2ccccc12                     :     6.61026:     -1.7700:     -1.5889:     -0.1811: 13/ 57: 776-34-1
+:Nc2cccc1nc3ccccc3nc12                             :     8.31844:     -0.0100:      0.3493:     -0.3593:  2/ 41: 2876-22-4
+:Fc2cc(Cc1ccc(N)c(F)c1)ccc2N                       :     8.38906:      0.2300:      0.4294:     -0.1994:  2/ 53: 13824-23-2
+:Nc1cc2ccccc2nc1                                   :     6.06974:     -3.1400:     -2.2022:     -0.9378:  1/ 29: 580-17-6
+:Nc1cc2c3ccccc3Nc2cc1                              :     8.79919:     -0.4800:      0.8947:     -1.3747:  1/ 39: 6377-12-4
+:Nc1ccc2nc3cc(N)ccc3nc2c1                          :    10.23938:      3.9700:      2.5288:      1.4412:  2/ 47: 120209-97-4
-:Nc1cccc2ncccc12                                   :     5.53722:     -2.0000:     -2.8064:      0.8064:  1/ 29: 611-34-7
-:Nc1ccc2cc3ccccc3cc2c1                             :    10.68771:      2.6200:      3.0375:     -0.4175:  0/ 41: 613-13-8
-:Nc1ccc(cc1Cl)c2ccc(N)c(Cl)c2                      :     8.77835:      0.8100:      0.8711:     -0.0611:  3/ 51: 91-94-1
-:Cc1ccc(N)c(C)c1                                   :     5.62674:     -2.2200:     -2.7049:      0.4849:  0/ 29: 95-68-1
-:Nc1ccc(cc1)c2ccccc2                               :     7.52012:     -0.1400:     -0.5566:      0.4166:  0/ 37: 92-67-1
-:Oc1ccc2c3ccc(N)cc3Cc2c1                           :     9.45314:      0.4100:      1.6367:     -1.2267:  2/ 45: 1953-38-4
-:Cc1cc(N)c(C)cc1                                   :     5.62674:     -2.4000:     -2.7049:      0.3049:  0/ 29: 95-78-3
-:Cc1cc(ccc1N)c2ccc(N)c(C)c2                        :     7.75390:      0.0100:     -0.2913:      0.3013:  1/ 51: 119-93-7
-:Nc2ccccc2c1cc(ccc1)[N+]([O-])=O                   :     7.38300:     -0.8900:     -0.7121:     -0.1779: 15/ 61: 34862-87-8
-:Nc2cc3ccccc3c1ccccc12                             :    10.15519:      2.9800:      2.4333:      0.5467:  0/ 41: 947-73-9
-:Clc1cc(N)cc(Cl)c1N                                :     6.81583:     -0.6900:     -1.3557:      0.6657:  4/ 31: 609-20-1
-:Nc2ccccc2c1ccc(N)cc1                              :     6.40685:     -0.9200:     -1.8197:      0.8997:  0/ 39: 492-17-1
-:Nc1ccc(cc1)Oc2ccc(N)cc2                           :     7.30487:     -1.1400:     -0.8008:     -0.3392:  1/ 45: 101-80-4
-:Nc2cccc1c3ccccc3Cc12                              :     8.35418:      0.4300:      0.3898:      0.0402:  1/ 39: 6344-63-4
-:FC(F)(F)c1cc(N)ccc1                               :     6.95370:     -0.8000:     -1.1992:      0.3992:  2/ 37: 98-16-8
-:Nc1cc(ccc1)c2ccc(cc2)[N+]([O-])=O                 :     8.63920:      0.6900:      0.7132:     -0.0232: 16/ 65: 53059-29-3
-:CCc1cc(ccc1N)Cc2ccc(N)c(CC)c2                     :     6.72318:     -0.9900:     -1.4608:      0.4708:  1/ 57: 19900-65-3
-:Nc1ccc(cc1)c2ccc(N)cc2                            :     7.82973:     -0.3900:     -0.2053:     -0.1847:  0/ 43: 92-87-5
-:Nc1cc(Cl)ccc1N                                    :     5.92662:     -0.4900:     -2.3646:      1.8746:  2/ 25: 95-83-0
-:Nc2ccc3ccc1ccccc1c3c2                             :    11.76652:      3.7700:      4.2616:     -0.4916:  0/ 41: 1892-54-2
-:Nc2cccc1cc3ccccc3cc12                             :    10.15519:      1.1800:      2.4333:     -1.2533:  0/ 41: 610-49-1
-:Nc2c3ccccc3cc1ccccc12                             :    10.15519:      0.8700:      2.4333:     -1.5633:  0/ 41: 779-03-3
-:Nc2cccc3Nc1ccccc1c23                              :     7.64506:     -1.4200:     -0.4148:     -1.0052:  2/ 39: 18992-64-8
-:Nc4cc2c(ccc1ccccc12)c3ccccc34                     :    10.84876:      1.8300:      3.2203:     -1.3903:  2/ 57: 2642-98-0
#:Brc1ccc2c3ccc(N)cc3Cc2c1                          :     9.77591:      2.6200:      2.0030:      0.6170:  3/ 45: 6638-60-4
#:Nc1cc(C)ccc1OC                                    :     4.95525:     -2.0500:     -3.4668:      1.4168:  1/ 27: 120-71-8
#:Nc2cccc3Cc1ccccc1c23                              :     7.73257:      1.1300:     -0.3155:      1.4455:  2/ 39: 7083-63-8
#:Nc1cc(ccc1)c2cc(ccc2)[N+]([O-])=O                 :     8.63920:     -0.5500:      0.7132:     -1.2632: 16/ 65: 31835-64-0
#:Nc2ccccc2c1ccc(cc1)[N+]([O-])=O                   :     7.38300:     -0.6200:     -0.7121:      0.0921: 15/ 61: 6272-52-2
#:Nc2ccccc2c1ccccc1N                                :     6.08917:     -1.5200:     -2.1802:      0.6602:  1/ 35: 1454-80-4
#:Nc3cccc2c3ccc1ccccc12                             :    10.15519:      2.3800:      2.4333:     -0.0533:  0/ 41: 4176-53-8
#:Nc1ccc2nc3ccccc3nc2c1                             :     9.92978:      0.5500:      2.1775:     -1.6275:  2/ 41: 2876-23-5
#:O=[N+]([O-])c1cc(ccc1N)[N+]([O-])=O               :     6.37362:     -2.0000:     -1.8574:     -0.1426: 30/ 69: 97-02-9
#:Nc2ccc(Cc1ccc(N)cc1)cc2                           :     6.70221:     -1.6000:     -1.4846:     -0.1154:  0/ 45: 101-77-9
#:Nc2cc4cccc3c1ccccc1c(c2)c34                       :    12.29737:      3.2300:      4.8639:     -1.6339:  2/ 53: 13177-26-9
#:Nc2ccc(CCc1ccc(N)cc1)cc2                          :     6.75061:     -2.1500:     -1.4297:     -0.7203:  0/ 47: 621-95-4
#:Nc1cc2ccc3ccccc3c2cc1                             :    10.68771:      2.4600:      3.0375:     -0.5775:  0/ 41: 3366-65-2
#:Nc1c(cc(cc1Br)[N+]([O-])=O)[N+]([O-])=O           :     7.11413:     -0.5400:     -1.0172:      0.4772: 33/ 75: 1817-73-8
#:Nc2ccccc2c1ccccc1                                 :     6.09724:     -1.4900:     -2.1710:      0.6810:  0/ 33: 90-41-5
#:Nc1ccccc1Cl                                       :     4.81938:     -3.0000:     -3.6209:      0.6209:  2/ 19: 95-51-2
#:Nc1ccc(cc1)C2CCCCC2                               :     6.45237:     -1.2400:     -1.7681:      0.5281:  3/ 37: 6373-50-8
#:Nc2ccc(Oc1ccccc1)cc2                              :     6.99526:      0.3800:     -1.1521:      1.5321:  1/ 39: 139-59-3
#:[O-][N+](=O)c1ccc2c(c1)Cc3cc(N)ccc23              :     9.79568:      3.0000:      2.0254:      0.9746: 14/ 71: 1214-32-0
#:Nc1ccc(cc1)c2ccc(cc2)[N+]([O-])=O                 :     8.63920:      1.0400:      0.7132:      0.3268: 16/ 65: 1211-40-1
#:Nc1cc(ccc1)c2ccc(N)cc2                            :     7.82973:      0.2000:     -0.2053:      0.4053:  0/ 43: 32316-90-8
#:Nc4ccc1ccc2cccc3ccc4c1c23                         :    10.62296:      1.4300:      2.9641:     -1.5341:  1/ 49: 1606-67-3
#:CC(C)c1cc(ccc1N)Cc2ccc(N)c(c2)C(C)C               :     6.67351:     -1.7700:     -1.5172:     -0.2528:  2/ 69: 19900-66-4
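
The rows of the table above are colon-delimited, so they can be read with a few lines of Python. The sketch below is one possible parser, not part of the software: the names `Row`, `SUBSETS` and `parse_row` are hypothetical, and it assumes that neither the SMILES string nor the ID contains a colon, which holds for every row in this file.

```python
from typing import NamedTuple


class Row(NamedTuple):
    subset: str
    smiles: str
    dcw: float
    expr: float
    calc: float
    blocked: int
    total: int
    compound_id: str


# Set markers as described in the legend of this file.
SUBSETS = {"+": "training", "-": "invisible training", "#": "calibration"}


def parse_row(line: str) -> Row:
    """Parse one data row of the SMILES table."""
    fields = [field.strip() for field in line.split(":")]
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    marker, smiles, dcw, expr, calc, _residual, blk_all, compound_id = fields
    blocked, total = (int(part) for part in blk_all.split("/"))
    return Row(SUBSETS[marker], smiles, float(dcw), float(expr),
               float(calc), blocked, total, compound_id)


# First data row of the table, copied verbatim.
example = ("+:Nc4cccc1c4c2cccc3cccc1c23                         :"
           "    10.62296:      2.8800:      2.9641:     -0.0841:"
           "  1/ 49: 13177-27-0")
row = parse_row(example)
print(row.subset, row.smiles, round(row.expr - row.calc, 4))
```

The Expr-Calc column is skipped on purpose, since it can be recomputed from the two parsed values and compared with the printed residual as a consistency check.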
