Retrospective Study
Copyright ©2014 Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Dec 14, 2014; 20(46): 17476-17482
Published online Dec 14, 2014. doi: 10.3748/wjg.v20.i46.17476
Verification of gene expression profiles for colorectal cancer using 12 internet public microarray datasets
Yu-Tien Chang, Chung-Tay Yao, Sui-Lung Su, Yu-Ching Chou, Chi-Ming Chu, Chi-Shuan Huang, Harn-Jing Terng, Hsiu-Ling Chou, Thomas Wetter, Kang-Hua Chen, Chi-Wen Chang, Yun-Wen Shih, Ching-Huang Lai
Yu-Tien Chang, Chi-Ming Chu, Yun-Wen Shih, Division of Biomedical Statistics and Informatics, School of Public Health, National Defense Medical Center, Taipei 114, Taiwan
Yu-Tien Chang, Chi-Ming Chu, Graduate Institute of Medical Sciences, National Defense Medical Center, Taipei 114, Taiwan
Chung-Tay Yao, Department of Emergency, Cathay General Hospital, Taipei 106, Taiwan
Sui-Lung Su, Yu-Ching Chou, Ching-Huang Lai, Department of Epidemiology, School of Public Health, National Defense Medical Center, Taipei 114, Taiwan
Chi-Shuan Huang, Division of Colorectal Surgery, Cheng Hsin Rehabilitation Medical Center, Taipei 112, Taiwan
Harn-Jing Terng, Advpharma, Inc., Taipei 221, Taiwan
Hsiu-Ling Chou, Department of Nursing, Far Eastern Memorial Hospital and Oriental Institute of Technology, New Taipei 220, Taiwan
Thomas Wetter, Department of Medical Informatics, Faculty of Medicine, University of Heidelberg, 69120 Heidelberg, Germany
Kang-Hua Chen, Chi-Wen Chang, School of Nursing, College of Medicine, Chang Gung University, Tao-Yuan 333, Taiwan
Author contributions: Chu CM and Chang CW designed the research; Shih YW, Chang YT, Terng HJ and Wetter T performed the research; Chou YC, Su SL, Huang CS, Chou HL, Chen KH and Lai CH analyzed the data; Chu CM and Chang CW wrote the paper.
Correspondence to: Chi-Ming Chu, PhD, Professor, Division of Biomedical Statistics and Informatics, School of Public Health, National Defense Medical Center, No. 161, Section 6, Min-Chuan East Road, Taipei 114, Taiwan. chuchiming@web.de
Telephone: +886-9-63367484 Fax: +886-2-87923147
Received: July 6, 2013
Revised: February 17, 2014
Accepted: March 12, 2014
Published online: December 14, 2014
Processing time: 532 Days and 3.3 Hours
Abstract

AIM: To verify gene expression profiles for colorectal cancer using 12 internet public microarray datasets.

METHODS: Logistic regression analysis was performed, and the odds ratio for each gene was determined between colorectal cancer (CRC) cases and controls. Twelve public microarray datasets (GSE4107, GSE4183, GSE8671, GSE9348, GSE10961, GSE13067, GSE13294, GSE13471, GSE14333, GSE15960, GSE17538 and GSE18105), which together included 519 cases of adenocarcinoma and 88 normal mucosa controls, were pooled and used to verify 17 selected genes from 3 published studies and to estimate their external generalizability.
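As a rough illustration of how a per-gene odds ratio is obtained from logistic regression, the sketch below fits a one-predictor model by Newton-Raphson and exponentiates the slope. The function name `fit_logistic_1d`, the dichotomized "high/low expression" predictor and the counts are all hypothetical and chosen only so that the result can be checked against the cross-product ratio of a 2 x 2 table; they are not data from the pooled datasets, and the study itself may have modeled expression differently.

```python
import math

def fit_logistic_1d(x, y, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson and return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the (negated) Hessian
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system H * delta = g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy counts (NOT study data):
#   cases:    30 with high expression, 10 with low expression
#   controls:  5 with high expression, 15 with low expression
x = [1] * 30 + [0] * 10 + [1] * 5 + [0] * 15   # 1 = high expression
y = [1] * 40 + [0] * 20                        # 1 = CRC case

b0, b1 = fit_logistic_1d(x, y)
odds_ratio = math.exp(b1)                      # OR per unit change in x
cross_product = (30 * 15) / (10 * 5)           # 2 x 2 table odds ratio

print(f"logistic OR = {odds_ratio:.3f}, 2x2 OR = {cross_product:.3f}")
```

With a single binary predictor the two estimates coincide, which makes this a convenient sanity check; with continuous expression values the same `exp(b1)` is read as the odds ratio per unit increase in expression.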

RESULTS: We validated the 17 CRC-associated genes from the studies by Chang et al (Model 1: 5 genes), Marshall et al (Model 2: 7 genes) and Han et al (Model 3: 5 genes) by multivariate logistic regression analysis, using the pooled 12 public microarray datasets as external validation. The Hosmer-Lemeshow (H-L) goodness-of-fit test was statistically significant (P = 0.044) for Model 2 of Marshall et al, indicating that observed event rates did not match expected event rates in subgroups of the model population. Models 1, 3 and 4 were well calibrated, that is, expected and observed event rates in subgroups were similar, with non-significant H-L P values of 0.460, 0.194 and 1.000, respectively. A 7-gene model (Model 4) of CPEB4, EIF2S3, MGC20553, MS4A1, ANXA3, TNFAIP6 and IL2RB was pairwise selected and showed the best results in logistic regression analysis (H-L P = 1.000, R² = 0.951, area under the curve = 0.999, accuracy = 0.968, specificity = 0.966 and sensitivity = 0.994).
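The calibration check reported above can be sketched as follows. This is a minimal illustration of the Hosmer-Lemeshow statistic, not the authors' implementation: the helper names (`hosmer_lemeshow`, `chi2_sf_even_df`, `make_data`), the choice of 4 groups (10 is more conventional) and the toy predictions are assumptions made for the example. Samples are sorted by predicted probability, split into groups, and the observed and expected event counts in each group are compared; the statistic is referred to a chi-square distribution with (groups - 2) degrees of freedom.

```python
import math

def hosmer_lemeshow(y_true, p_pred, groups=10):
    """Return (H-L statistic, degrees of freedom) for binary outcomes."""
    pairs = sorted(zip(p_pred, y_true))        # order by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)    # expected events in group
        observed = sum(y for _, y in chunk)    # observed events in group
        variance = expected * (1.0 - expected / size)
        stat += (observed - expected) ** 2 / variance
    return stat, groups - 2

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total

def make_data(probs, events, size=10):
    """Build toy outcomes: `size` samples per risk level with `events` cases."""
    y, p = [], []
    for prob, k in zip(probs, events):
        y += [1] * k + [0] * (size - k)
        p += [prob] * size
    return y, p

# Hypothetical toy data (NOT study data): observed events 1, 3, 7, 9 per 10.
y_good, p_good = make_data([0.1, 0.3, 0.7, 0.9], [1, 3, 7, 9])      # calibrated
y_bad, p_bad = make_data([0.02, 0.1, 0.9, 0.98], [1, 3, 7, 9])      # overconfident

for label, (y, p) in {"calibrated": (y_good, p_good),
                      "overconfident": (y_bad, p_bad)}.items():
    stat, df = hosmer_lemeshow(y, p, groups=4)
    print(label, round(stat, 3), round(chi2_sf_even_df(stat, df), 4))
```

A non-significant P value (as for Models 1, 3 and 4) means the test found no evidence of miscalibration; a small P value (as for Model 2) means the predicted and observed event rates diverge in at least some risk groups.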

CONCLUSION: A novel gene expression profile was associated with CRC and can potentially be applied to blood-based detection assays.

Keywords: Gene expression profiles; Colorectal cancer; Microarray; Gene Expression Omnibus; Gene Expression Omnibus series

Core tip: The 7-gene logistic regression model (CPEB4, EIF2S3, MGC20553, MS4A1, ANXA3, TNFAIP6 and IL2RB), which showed the best results, should be further verified in more samples. The causal relationships between the selected genes and colorectal cancer (CRC) also need to be confirmed. The expression signature of these CRC-associated genes can be evaluated for early detection of CRC, whether before symptoms are detectable, during treatment, or during remission, and early detection can in turn improve patient survival.