Retrospective Study
Copyright ©2014 Baishideng Publishing Group Inc.
World J Gastroenterol. Oct 21, 2014; 20(39): 14463-14471
Published online Oct 21, 2014. doi: 10.3748/wjg.v20.i39.14463
Table 1 Characteristics of the training and testing sets[1,2] n (%)
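| Characteristic | Training set (n = 162): CRC (n = 55) | Training set: Non-CRC (n = 107) | Training set: P value | Testing set (n = 176): CRC (n = 56) | Testing set: Non-CRC (n = 120) | Testing set: P value | P value: Cases | P value: Controls |
|---|---|---|---|---|---|---|---|---|
| Age, yr (S.E.) | 66.47 (1.50) | 68.31 (1.12) | 0.335 | 67.38 (1.83) | 69.99 (1.03) | 0.216 | 0.704 | 0.270 |
| Gender | | | 0.630 | | | 0.176 | 0.387 | 0.313 |
| Male | 32 (58.2) | 58 (54.2) | | 28 (50.0) | 73 (60.8) | | | |
| Female | 23 (41.8) | 49 (45.8) | | 28 (50.0) | 47 (39.2) | | | |
| Stage | | | - | | | - | 0.447 | - |
| I | 21 (38.2) | - | | 15 (26.8) | - | | | |
| II | 10 (18.2) | - | | 9 (16.1) | - | | | |
| III | 14 (25.5) | - | | 21 (37.5) | - | | | |
| IV | 10 (18.2) | - | | 11 (19.6) | - | | | |
| Tumor site | | - | - | | - | - | 0.286 | - |
| Colon | 28 (50.9) | | | 30 (53.6) | | | | |
| Rectum | 22 (40.0) | | | 16 (28.6) | | | | |
| Cecum | 4 (7.3) | | | 5 (8.9) | | | | |
| Colon+Rectum | 1 (1.8) | | | 5 (8.9) | | | | |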
Table 2 Multivariate analysis of colorectal cancer-related molecular markers and the discrimination model based on age, sex, and 15 genes, using the logistic regression model on the training set
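| Variable | B | OR | 95%CI of OR: Upper | 95%CI of OR: Lower | P value |
|---|---|---|---|---|---|
| Sex | 0.577 | 1.780 | 7.582 | 0.418 | 0.435 |
| Age | 0.028 | 1.028 | 1.083 | 0.976 | 0.293 |
| MCM4 | 0.142 | 1.152 | 4.504 | 0.295 | 0.838 |
| ZNF264 | 1.450 | 4.265 | 18.208 | 0.999 | 0.050 |
| RNF4 | -0.550 | 0.577 | 5.146 | 0.065 | 0.622 |
| GRB2 | 2.009 | 7.456 | 37.131 | 1.497 | 0.014 |
| MDM2 | 1.359 | 3.892 | 15.166 | 0.999 | 0.050 |
| STAT2 | -1.178 | 0.308 | 1.466 | 0.065 | 0.139 |
| WEE1 | 1.264 | 3.540 | 14.784 | 0.848 | 0.083 |
| DUSP6 | 2.465 | 11.769 | 40.330 | 3.435 | 1.33E-11 |
| CPEB4 | 2.045 | 7.725 | 27.695 | 2.155 | 0.002 |
| MMD | -1.067 | 0.344 | 0.865 | 0.137 | 0.023 |
| NF1 | -1.417 | 0.243 | 1.517 | 0.039 | 0.130 |
| IRF4 | 0.057 | 1.059 | 3.350 | 0.335 | 0.923 |
| EIF2S3 | -2.105 | 0.122 | 0.718 | 0.021 | 0.020 |
| EXT2 | -1.933 | 0.145 | 1.235 | 0.017 | 0.077 |
| POLDIP2 | -1.294 | 0.274 | 1.515 | 0.050 | 0.138 |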
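In a logistic regression model, the odds ratio for a variable is the exponential of its coefficient, OR = exp(B). The short Python sketch below recomputes the OR for three variables from the B values in Table 2. The dictionary and function names are illustrative and do not come from the study. Because the published B values are rounded to three decimals, the recomputed ORs can differ from the tabulated ones in the last digits.

```python
import math

# Coefficients (B) copied from Table 2; the variable names here are illustrative.
coefficients = {"DUSP6": 2.465, "CPEB4": 2.045, "EIF2S3": -2.105}


def odds_ratio(b: float) -> float:
    """Return the odds ratio implied by a logistic regression coefficient."""
    return math.exp(b)


odds_ratios = {gene: odds_ratio(b) for gene, b in coefficients.items()}

for gene, value in odds_ratios.items():
    print(f"{gene}: B = {coefficients[gene]:+.3f}, OR = {value:.3f}")
```

A positive B (DUSP6, CPEB4) gives an OR above 1, and a negative B (EIF2S3) gives an OR below 1, which matches the direction of the tabulated values.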
Table 3 Discrimination power and receiver operating characteristic analysis of different combinations of colorectal cancer-associated genes in the training set
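| Genes used for models | AUC | SE | P value | 95%CI: Lower | 95%CI: Upper |
|---|---|---|---|---|---|
| DUSP6 | 0.804 | 0.038 | < 0.001 | 0.73 | 0.879 |
| DUSP6, CPEB4 | 0.855 | 0.032 | < 0.001 | 0.791 | 0.919 |
| DUSP6, CPEB4, EIF2S3 | 0.882 | 0.032 | < 0.001 | 0.820 | 0.945 |
| DUSP6, CPEB4, EIF2S3, MDM2 | 0.895 | 0.030 | < 0.001 | 0.838 | 0.953 |
| DUSP6, CPEB4, EIF2S3, MDM2, MMD | 0.905 | 0.028 | < 0.001 | 0.849 | 0.960 |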
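The confidence bounds in Table 3 are numerically close to a normal-approximation interval, AUC ± 1.96 × SE. The source does not state how the intervals were computed, so the formula in this Python sketch is an assumption, and the function name is illustrative. Small differences from the table in the third decimal are expected because the published AUC and SE are rounded.

```python
# AUC and SE copied from Table 3 for the single-gene and five-gene models.
models = {
    "DUSP6": (0.804, 0.038),
    "DUSP6, CPEB4, EIF2S3, MDM2, MMD": (0.905, 0.028),
}

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def normal_ci(auc: float, se: float, z: float = Z_95) -> tuple[float, float]:
    """Assumed normal-approximation interval: AUC +/- z * SE, clipped to [0, 1]."""
    lower = max(0.0, auc - z * se)
    upper = min(1.0, auc + z * se)
    return lower, upper


intervals = {name: normal_ci(auc, se) for name, (auc, se) in models.items()}

for name, (lower, upper) in intervals.items():
    print(f"{name}: {lower:.3f} to {upper:.3f}")
```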
Table 4 Mean expression levels, standard error and statistical power of selected genes between case and control groups in the training and testing sets
| Selected genes | Training set: Case (n = 55) | Training set: Control (n = 107) | Training set: Power | Testing set: Case (n = 56) | Testing set: Control (n = 120) | Testing set: Power |
|---|---|---|---|---|---|---|
| MDM2 | -0.4225 (0.08945) | -0.8913 (0.04572) | 1 | -0.3270 (0.09063) | -0.9209 (0.03618) | 1 |
| DUSP6 | 2.5483 (0.13248) | 1.5458 (0.06415) | 1 | 2.0335 (0.12041) | 1.7462 (0.06135) | 1 |
| CPEB4 | 1.3413 (0.11016) | 0.3932 (0.09799) | 1 | 1.4595 (0.11851) | 0.4014 (0.06980) | 1 |
| MMD | 2.0567 (0.15441) | 1.3178 (0.09799) | 1 | 1.7029 (0.15958) | 1.4320 (0.07806) | 1 |
| EIF2S3 | 3.4489 (0.07883) | 3.6158 (0.05331) | 1 | 3.4311 (0.05937) | 3.5620 (0.03815) | 1 |
Table 5 Performance of the statistical model based on the five-gene profile logistic probabilities for the training set
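| Logit(P) | Sensitivity | Specificity | PPV | NPV | Accuracy |
|---|---|---|---|---|---|
| 0.020 | 99% | 16% | 2.3% | 99.9% | 44.2% |
| 0.051 | 95% | 63% | 12.1% | 99.6% | 73.9% |
| 0.178 | 90% | 72% | 41.1% | 97.1% | 78.1% |
| 0.500 | 78% | 92% | 82.7% | 89.1% | 87.0% |
| 0.475 | 80% | 90% | 87.8% | 83.3% | 86.6% |
| 0.685 | 61% | 95% | 96.4% | 52.9% | 83.5% |
| 0.901 | 25% | 99% | 99.6% | 12.6% | 73.9% |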
Table 6 Performance of the statistical model on the training, testing sets and external validation dataset from 12 public microarray studies with Logit(P) = 0.5
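| | Training set | Testing set | External validation |
|---|---|---|---|
| Non-Cancers | 107 | 120 | 88 |
| True negative | 98 | 110 | 72 |
| False positive | 9 | 10 | 16 |
| Colorectal Cancers | 55 | 56 | 519 |
| False negative | 12 | 19 | 15 |
| True positive | 43 | 37 | 504 |
| Total | 162 | 176 | 607 |
| Sensitivity | 78.2% | 66.1% | 97.1% |
| Specificity | 91.5% | 91.7% | 81.8% |
| PPV | 82.7% | 78.7% | 96.9% |
| NPV | 89.1% | 85.3% | 82.8% |
| Accuracy | 87.0% | 83.5% | 94.9% |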
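The percentages in Table 6 follow from the four confusion-matrix counts reported for each data set. The Python sketch below recomputes them using the standard definitions. The class and variable names are illustrative and are not taken from the study. Recomputed this way, the training-set specificity is 98/107 = 91.6%, a rounding-level difference from the 91.5% printed in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary classifier (cancer vs non-cancer)."""
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


# Counts copied from Table 6 (Logit(P) = 0.5).
counts = {
    "training": Confusion(tp=43, fn=12, tn=98, fp=9),
    "testing": Confusion(tp=37, fn=19, tn=110, fp=10),
    "external": Confusion(tp=504, fn=15, tn=72, fp=16),
}

for name, c in counts.items():
    print(
        f"{name}: sensitivity={c.sensitivity():.1%}, specificity={c.specificity():.1%}, "
        f"PPV={c.ppv():.1%}, NPV={c.npv():.1%}, accuracy={c.accuracy():.1%}"
    )
```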
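Table 7 Logistic regression models for 12 pooled microarray data sets as the external validation of colorectal cancer-associated genes from three studies

| Variable | Model 1: B | Model 1: S.E. | Model 1: P value | Model 2: B | Model 2: S.E. | Model 2: P value | Model 3: B | Model 3: S.E. | Model 3: P value |
|---|---|---|---|---|---|---|---|---|---|
| Five selected genes of this study: | | | | | | | | | |
| MDM2 | 6.069 | 1.461 | < 0.001 | | | | | | |
| DUSP6 | 1.360 | 0.235 | < 0.001 | | | | | | |
| CPEB4 | -3.177 | 0.383 | < 0.001 | | | | | | |
| MMD | 0.335 | 0.442 | 0.448 | | | | | | |
| EIF2S3 | 1.462 | 0.244 | < 0.001 | | | | | | |
| Seven selected genes of Marshall et al[14] | | | | | | | | | |
| ANXA3 | | | | 0.559 | 0.212 | 0.008 | | | |
| CLEC4D | | | | 46.259 | 9.918 | < 0.001 | | | |
| LMNB1 | | | | 1.883 | 0.330 | < 0.001 | | | |
| PRRG4 | | | | -1.284 | 0.371 | 0.001 | | | |
| TNFAIP6 | | | | 1.787 | 0.377 | < 0.001 | | | |
| VNN1 | | | | 0.207 | 0.159 | 0.194 | | | |
| IL2RB | | | | 0.269 | 0.216 | 0.213 | | | |
| Five selected genes of Han et al[15] | | | | | | | | | |
| CDA | | | | | | | -0.496 | 0.090 | < 0.001 |
| MGC20553 | | | | | | | -1.386 | 0.197 | < 0.001 |
| BANK1 | | | | | | | 0.565 | 0.373 | 0.129 |
| BCNP1 | | | | | | | -0.944 | 1.148 | 0.411 |
| MS4A1 | | | | | | | -1.483 | 0.457 | 0.001 |
| Constant | -32.758 | 6.001 | < 0.001 | -124.678 | 25.437 | < 0.001 | 16.601 | 2.995 | < 0.001 |

| Model statistic | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| H-L | 0.460 | 0.044 | 0.194 |
| R² | 0.853 | 0.841 | 0.693 |
| AUC | 0.978 | 0.985 | 0.957 |
| Accuracy | 0.949 | 0.974 | 0.939 |
| Specificity | 0.818 | 0.886 | 0.716 |
| Sensitivity | 0.971 | 0.988 | 0.977 |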