Chang YT, Huang CS, Yao CT, Su SL, Terng HJ, Chou HL, Chou YC, Chen KH, Shih YW, Lu CY, Lai CH, Jian CE, Lin CH, Chen CT, Wu YS, Lin KS, Wetter T, Chang CW, Chu CM. Gene expression profile of peripheral blood in colorectal cancer. World J Gastroenterol 2014; 20(39): 14463-14471 [PMID: 25339833 DOI: 10.3748/wjg.v20.i39.14463]
Corresponding Author of This Article
Chi-Ming Chu, PhD, Professor, Division of Biomedical Statistics and Informatics, School of Public Health, National Defense Medical Center, Mingquan East Road 161, Taipei 114, Taiwan. chuchiming@web.de
Research Domain of This Article
Medical Informatics
Article-Type of This Article
Retrospective Study
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
World J Gastroenterol. Oct 21, 2014; 20(39): 14463-14471 Published online Oct 21, 2014. doi: 10.3748/wjg.v20.i39.14463
Table 1 Characteristics of the training and testing sets[1,2] n (%)
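| Characteristic | Training set (n = 162): CRC (n = 55) | Training set: Non-CRC (n = 107) | Training set: P value | Testing set (n = 176): CRC (n = 56) | Testing set: Non-CRC (n = 120) | Testing set: P value | P value: Cases | P value: Controls |
|---|---|---|---|---|---|---|---|---|
| Age, yr (SE) | 66.47 (1.50) | 68.31 (1.12) | 0.335 | 67.38 (1.83) | 69.99 (1.03) | 0.216 | 0.704 | 0.270 |
| Gender | | | 0.630 | | | 0.176 | 0.387 | 0.313 |
| Male | 32 (58.2) | 58 (54.2) | | 28 (50.0) | 73 (60.8) | | | |
| Female | 23 (41.8) | 49 (45.8) | | 28 (50.0) | 47 (39.2) | | | |
| Stage | | | - | | | - | 0.447 | - |
| I | 21 (38.2) | - | | 15 (26.8) | - | | | |
| II | 10 (18.2) | - | | 9 (16.1) | - | | | |
| III | 14 (25.5) | - | | 21 (37.5) | - | | | |
| IV | 10 (18.2) | - | | 11 (19.6) | - | | | |
| Tumor site | | - | - | | - | - | 0.286 | - |
| Colon | 28 (50.9) | | | 30 (53.6) | | | | |
| Rectum | 22 (40.0) | | | 16 (28.6) | | | | |
| Cecum | 4 (7.3) | | | 5 (8.9) | | | | |
| Colon + Rectum | 1 (1.8) | | | 5 (8.9) | | | | |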
Table 2 Multivariate analysis of colorectal cancer-related molecular markers and the discrimination model based on age, sex, and 15 genes, using the logistic regression model on the training set
| Variable | B | OR | 95%CI of OR, upper | 95%CI of OR, lower | P value |
|---|---|---|---|---|---|
| Sex | 0.577 | 1.780 | 7.582 | 0.418 | 0.435 |
| Age | 0.028 | 1.028 | 1.083 | 0.976 | 0.293 |
| MCM4 | 0.142 | 1.152 | 4.504 | 0.295 | 0.838 |
| ZNF264 | 1.450 | 4.265 | 18.208 | 0.999 | 0.050 |
| RNF4 | -0.550 | 0.577 | 5.146 | 0.065 | 0.622 |
| GRB2 | 2.009 | 7.456 | 37.131 | 1.497 | 0.014 |
| MDM2 | 1.359 | 3.892 | 15.166 | 0.999 | 0.050 |
| STAT2 | -1.178 | 0.308 | 1.466 | 0.065 | 0.139 |
| WEE1 | 1.264 | 3.540 | 14.784 | 0.848 | 0.083 |
| DUSP6 | 2.465 | 11.769 | 40.330 | 3.435 | 1.33E-11 |
| CPEB4 | 2.045 | 7.725 | 27.695 | 2.155 | 0.002 |
| MMD | -1.067 | 0.344 | 0.865 | 0.137 | 0.023 |
| NF1 | -1.417 | 0.243 | 1.517 | 0.039 | 0.130 |
| IRF4 | 0.057 | 1.059 | 3.350 | 0.335 | 0.923 |
| EIF2S3 | -2.105 | 0.122 | 0.718 | 0.021 | 0.020 |
| EXT2 | -1.933 | 0.145 | 1.235 | 0.017 | 0.077 |
| POLDIP2 | -1.294 | 0.274 | 1.515 | 0.050 | 0.138 |
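In a logistic regression model, the odds ratio for a variable is the exponential of its coefficient, OR = exp(B). The following minimal sketch checks that relationship against a few rows of Table 2. The names `TABLE2`, `odds_ratio`, and `matches_reported` are illustrative and do not come from the article, and the tolerance is an assumption chosen to absorb rounding of B to three decimals.

```python
import math

# (B, reported OR) pairs copied from a subset of rows in Table 2.
TABLE2 = {
    "GRB2": (2.009, 7.456),
    "DUSP6": (2.465, 11.769),
    "CPEB4": (2.045, 7.725),
    "MMD": (-1.067, 0.344),
    "EIF2S3": (-2.105, 0.122),
}


def odds_ratio(b: float) -> float:
    """Odds ratio implied by a logistic regression coefficient B."""
    return math.exp(b)


def matches_reported(b: float, reported_or: float, rel_tol: float = 5e-3) -> bool:
    """True if exp(B) agrees with the reported OR within a rounding tolerance."""
    return math.isclose(odds_ratio(b), reported_or, rel_tol=rel_tol)


for gene, (b, reported) in TABLE2.items():
    print(f"{gene}: exp(B) = {odds_ratio(b):.3f}, reported OR = {reported}")
```

A positive B (for example DUSP6) gives an OR above 1, and a negative B (for example EIF2S3) gives an OR below 1.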
Table 3 Discrimination power and receiver operating characteristic analysis of different combinations of colorectal cancer-associated genes in the training set
| Genes used for models | AUC | SE | P value | 95%CI, lower | 95%CI, upper |
|---|---|---|---|---|---|
| DUSP6 | 0.804 | 0.038 | < 0.001 | 0.73 | 0.879 |
| DUSP6, CPEB4 | 0.855 | 0.032 | < 0.001 | 0.791 | 0.919 |
| DUSP6, CPEB4, EIF2S3 | 0.882 | 0.032 | < 0.001 | 0.820 | 0.945 |
| DUSP6, CPEB4, EIF2S3, MDM2 | 0.895 | 0.030 | < 0.001 | 0.838 | 0.953 |
| DUSP6, CPEB4, EIF2S3, MDM2, MMD | 0.905 | 0.028 | < 0.001 | 0.849 | 0.960 |
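The confidence limits in Table 3 are numerically close to a normal-approximation interval, AUC ± 1.96 × SE. The article does not state how the intervals were computed, so the sketch below is only an assumption-based consistency check. The function name `normal_approx_ci` and the tolerance are illustrative.

```python
def normal_approx_ci(estimate: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Two-sided interval estimate +/- z * SE (normal approximation, assumed here)."""
    half_width = z * se
    return estimate - half_width, estimate + half_width


# (model, AUC, SE, reported lower, reported upper) copied from Table 3.
TABLE3 = [
    ("DUSP6", 0.804, 0.038, 0.730, 0.879),
    ("DUSP6, CPEB4, EIF2S3, MDM2, MMD", 0.905, 0.028, 0.849, 0.960),
]

for model, auc, se, rep_lo, rep_hi in TABLE3:
    lo, hi = normal_approx_ci(auc, se)
    print(f"{model}: computed ({lo:.3f}, {hi:.3f}), reported ({rep_lo}, {rep_hi})")
```

Small differences in the third decimal are expected because the reported SE values are themselves rounded.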
Table 4 Mean expression levels, standard error and statistical power of selected genes between case and control groups in the training and testing sets
| Selected genes | Training set: Case (n = 55) | Training set: Control (n = 107) | Training set: Power | Testing set: Case (n = 56) | Testing set: Control (n = 120) | Testing set: Power |
|---|---|---|---|---|---|---|
| MDM2 | -0.4225 (0.08945) | -0.8913 (0.04572) | 1 | -0.3270 (0.09063) | -0.9209 (0.03618) | 1 |
| DUSP6 | 2.5483 (0.13248) | 1.5458 (0.06415) | 1 | 2.0335 (0.12041) | 1.7462 (0.06135) | 1 |
| CPEB4 | 1.3413 (0.11016) | 0.3932 (0.09799) | 1 | 1.4595 (0.11851) | 0.4014 (0.06980) | 1 |
| MMD | 2.0567 (0.15441) | 1.3178 (0.09799) | 1 | 1.7029 (0.15958) | 1.4320 (0.07806) | 1 |
| EIF2S3 | 3.4489 (0.07883) | 3.6158 (0.05331) | 1 | 3.4311 (0.05937) | 3.5620 (0.03815) | 1 |
Table 5 Performance of the statistical model based on the five-gene profile logistic probabilities for the training set
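| Logit(P) | Sensitivity | Specificity | PPV | NPV | Accuracy |
|---|---|---|---|---|---|
| 0.020 | 99% | 16% | 2.3% | 99.9% | 44.2% |
| 0.051 | 95% | 63% | 12.1% | 99.6% | 73.9% |
| 0.178 | 90% | 72% | 41.1% | 97.1% | 78.1% |
| 0.500 | 78% | 92% | 82.7% | 89.1% | 87.0% |
| 0.475 | 80% | 90% | 87.8% | 83.3% | 86.6% |
| 0.685 | 61% | 95% | 96.4% | 52.9% | 83.5% |
| 0.901 | 25% | 99% | 99.6% | 12.6% | 73.9% |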
Table 6 Performance of the statistical model on the training, testing sets and external validation dataset from 12 public microarray studies with Logit(P) = 0.5
| | Training set | Testing set | External validation |
|---|---|---|---|
| Non-Cancers | 107 | 120 | 88 |
| True negative | 98 | 110 | 72 |
| False positive | 9 | 10 | 16 |
| Colorectal Cancers | 55 | 56 | 519 |
| False negative | 12 | 19 | 15 |
| True positive | 43 | 37 | 504 |
| Total | 162 | 176 | 607 |
| Sensitivity | 78.2% | 66.1% | 97.1% |
| Specificity | 91.5% | 91.7% | 81.8% |
| PPV | 82.7% | 78.7% | 96.9% |
| NPV | 89.1% | 85.3% | 82.8% |
| Accuracy | 87.0% | 83.5% | 94.9% |
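The performance measures in Table 6 follow from the four confusion-matrix counts by the standard definitions. The sketch below recomputes them from the counts reported in the table. The names `classification_metrics` and `TABLE6` are illustrative and not taken from the article.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard diagnostic measures from confusion-matrix counts, as fractions."""
    return {
        "sensitivity": tp / (tp + fn),            # true positives among all cancers
        "specificity": tn / (tn + fp),            # true negatives among all non-cancers
        "ppv": tp / (tp + fp),                    # cancers among positive calls
        "npv": tn / (tn + fn),                    # non-cancers among negative calls
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Counts copied from Table 6 (Logit(P) = 0.5).
TABLE6 = {
    "training": dict(tp=43, fp=9, tn=98, fn=12),
    "testing": dict(tp=37, fp=10, tn=110, fn=19),
    "external": dict(tp=504, fp=16, tn=72, fn=15),
}

for name, counts in TABLE6.items():
    metrics = classification_metrics(**counts)
    summary = ", ".join(f"{key} {100 * value:.1f}%" for key, value in metrics.items())
    print(f"{name}: {summary}")
```

Recomputed values can differ from the table in the last digit because of rounding. For example, the training-set specificity 98/107 rounds to 91.6%, while the table reports 91.5%.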
Table 7 Logistic regression models for 12 pooled microarray data sets as the external validation of colorectal cancer-associated genes from three studies