Published online Dec 14, 2014. doi: 10.3748/wjg.v20.i46.17476
Revised: February 17, 2014
Accepted: March 12, 2014
Processing time: 532 Days and 3.3 Hours
AIM: To verify gene expression profiles for colorectal cancer using 12 publicly available online microarray datasets.
METHODS: Logistic regression analysis was performed, and the odds ratio for each gene was determined between colorectal cancer (CRC) cases and controls. Twelve public microarray datasets (GSE4107, GSE4183, GSE8671, GSE9348, GSE10961, GSE13067, GSE13294, GSE13471, GSE14333, GSE15960, GSE17538 and GSE18105), which together included 519 cases of adenocarcinoma and 88 normal mucosa controls, were pooled and used to verify 17 selected genes from 3 published studies and to estimate their external generalizability.
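As an illustration of how a per-gene odds ratio can be obtained from logistic regression, the following is a minimal sketch, not the authors' pipeline. It assumes a single gene with log2 expression values as the only predictor, fits the model by Newton-Raphson, and reports exp(slope) as the odds ratio per unit increase in expression. The expression values, the function name `fit_logistic` and the variable names are hypothetical.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson and return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical log2 expression values for one gene (not data from the study).
cases = [7.1, 8.3, 6.9, 9.0, 7.8, 8.8, 6.2, 7.5]      # CRC samples, label 1
controls = [6.0, 6.8, 5.5, 7.2, 6.4, 5.9, 7.6, 6.6]   # normal mucosa, label 0

expression = cases + controls
labels = [1] * len(cases) + [0] * len(controls)

b0, b1 = fit_logistic(expression, labels)
odds_ratio = math.exp(b1)  # odds ratio per one-unit increase in log2 expression
print(f"slope = {b1:.3f}, odds ratio = {odds_ratio:.3f}")
```

An odds ratio above 1 would indicate that higher expression is associated with CRC in this toy data. A multivariate model would use one coefficient per gene and is better fitted with an established statistics package.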
RESULTS: We validated the 17 CRC-associated genes from the studies by Chang et al (Model 1: 5 genes), Marshall et al (Model 2: 7 genes) and Han et al (Model 3: 5 genes) and performed multivariate logistic regression analysis using the 12 pooled public microarray datasets as an external validation. The Hosmer-Lemeshow (H-L) goodness-of-fit test was statistically significant (P = 0.044) for Model 2 of Marshall et al, indicating that observed event rates did not match expected event rates in subgroups of the model population. Models 1, 3 and 4 were well calibrated, meaning that expected and observed event rates in subgroups were similar, with non-significant H-L P values of 0.460, 0.194 and 1.000, respectively. A 7-gene model (Model 4) of CPEB4, EIF2S3, MGC20553, MS4A1, ANXA3, TNFAIP6 and IL2RB was pairwise selected and showed the best results in logistic regression analysis (H-L P = 1.000, R² = 0.951, area under the curve = 0.999, accuracy = 0.968, specificity = 0.966 and sensitivity = 0.994).
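The calibration and discrimination measures reported above can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: the H-L statistic is computed over equal-sized groups of sorted predicted probabilities with groups minus 2 degrees of freedom, the P value uses the closed-form chi-square tail that holds only for even degrees of freedom, and the area under the curve is computed as the fraction of case-control pairs ranked correctly. All function names and the demonstration data are hypothetical.

```python
import math

def hosmer_lemeshow(probs, labels, groups=10):
    """Return (chi-square statistic, degrees of freedom) for the H-L test."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        if size == 0:
            continue
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        denom = expected * (1.0 - expected / size)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, groups - 2

def chi2_sf_even(x, dof):
    """Upper-tail chi-square probability; exact only for even, positive dof."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(dof // 2))

def classification_metrics(probs, labels, threshold=0.5):
    """Return (sensitivity, specificity, accuracy) at a probability cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, tn / neg, (tp + tn) / len(labels)

def auc(probs, labels):
    """Fraction of case-control pairs where the case scores higher (ties 0.5)."""
    case = [p for p, y in zip(probs, labels) if y == 1]
    ctrl = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in case for b in ctrl)
    return wins / (len(case) * len(ctrl))

# Hypothetical predicted probabilities and true labels (not study data).
demo_probs = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.55,
              0.60, 0.62, 0.70, 0.75, 0.80, 0.85, 0.90, 0.92, 0.95, 0.97]
demo_labels = [0, 0, 0, 1, 0, 0, 0, 1, 0, 1,
               1, 0, 1, 1, 1, 1, 1, 1, 1, 1]

stat, dof = hosmer_lemeshow(demo_probs, demo_labels, groups=4)
p_value = chi2_sf_even(stat, dof)
sens, spec, acc = classification_metrics(demo_probs, demo_labels)
print(f"H-L chi2 = {stat:.3f} (dof = {dof}), P = {p_value:.3f}")
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}, "
      f"accuracy = {acc:.3f}, AUC = {auc(demo_probs, demo_labels):.3f}")
```

A non-significant H-L P value, as reported for Models 1, 3 and 4, corresponds to a small statistic, that is, observed and expected event counts that agree within each group.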
CONCLUSION: A novel gene expression profile was associated with CRC and can potentially be applied to blood-based detection assays.
Core tip: The 7-gene (CPEB4, EIF2S3, MGC20553, MS4A1, ANXA3, TNFAIP6 and IL2RB) logistic regression model, which showed the best results, should be further verified in more samples. Causal relationships between the selected genes and colorectal cancer (CRC) also need to be confirmed. The expression signature of these CRC-associated genes can be evaluated for early detection of CRC. Detecting CRC before symptoms appear, during treatment, or during remission could thus improve patient survival.