Published online Oct 21, 2014. doi: 10.3748/wjg.v20.i39.14463
Revised: April 8, 2014
Accepted: June 12, 2014
Published online: October 21, 2014
Processing time: 292 Days and 19.5 Hours
AIM: To evaluate optimal molecular markers for detecting colorectal cancer (CRC) in a blood-based assay.
METHODS: An age- and sex-matched case-control design (111 CRC and 227 non-cancer samples) was applied. Total RNA isolated from the 338 blood samples was reverse-transcribed, and the relative transcript levels of candidate genes were analyzed. The training set consisted of 162 samples randomly selected from the total of 338 samples. A logistic regression analysis was performed, and the odds ratio for each gene was determined between CRC and non-cancer samples. The remaining samples (n = 176) formed the testing set, which was used to validate the logistic model and to verify its inferred performance (generality). By pooling 12 public microarray datasets (GSE 4107, 4183, 8671, 9348, 10961, 13067, 13294, 13471, 14333, 15960, 17538, and 18105), which included 519 cases of adenocarcinoma and 88 controls of normal mucosa, we verified the genes selected by the logistic models and estimated their external generality.
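The random partition into training and testing sets can be illustrated with a minimal sketch. The sample identifiers, the seed, and the helper name `split_samples` are assumptions made for illustration only; the abstract does not describe how the randomization was actually carried out.

```python
import random


def split_samples(sample_ids, n_train, seed=0):
    """Randomly partition sample IDs into a training and a testing set.

    Hypothetical helper: the seed and the shuffling scheme are illustrative
    assumptions, not the procedure used in the study.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# 338 placeholder identifiers standing in for the blood samples
sample_ids = [f"S{i:03d}" for i in range(1, 339)]

# 162 samples for training; the remaining 176 form the testing set
train_ids, test_ids = split_samples(sample_ids, n_train=162)
print(len(train_ids), len(test_ids))
```

Keeping the testing samples entirely out of model fitting is what allows the test-set figures to serve as an estimate of generality.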
RESULTS: The logistic regression analysis selected five significant genes (P < 0.05; MDM2, DUSP6, CPEB4, MMD, and EIF2S3), with odds ratios of 2.978, 6.029, 3.776, 0.538 and 0.138, respectively. The five-gene model discriminated CRC cases from controls stably in the training set, with accuracies ranging from 73.9% to 87.0%, a sensitivity of 95% and a specificity of 95%. The discrimination model also performed well in the test set, with 83.5% accuracy, 66.0% sensitivity, 92.0% specificity, a positive predictive value of 89.2% and a negative predictive value of 73.0%. As an external validation, multivariate logistic regression was applied to the 12 pooled public microarray datasets. Models that provided similar expected and observed event rates in subgroups were termed well calibrated. The model comprising MDM2, DUSP6, CPEB4, MMD, and EIF2S3 yielded the following logistic regression results: Hosmer-Lemeshow (H-L) P = 0.460, R2 = 0.853, AUC = 0.978, accuracy = 0.949, specificity = 0.818 and sensitivity = 0.971.
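As a reading aid for the figures above, the following sketch shows how an odds ratio relates to a logistic regression coefficient (coefficient = ln(odds ratio)) and how accuracy, sensitivity, specificity, and predictive values follow from a confusion matrix. The odds ratios are those reported here; the confusion-matrix counts and the function name `diagnostic_metrics` are hypothetical and are not the study's data. The model intercept is not reported, so no predicted probabilities are computed.

```python
import math

# Odds ratios reported for the five-gene model
odds_ratios = {
    "MDM2": 2.978,
    "DUSP6": 6.029,
    "CPEB4": 3.776,
    "MMD": 0.538,
    "EIF2S3": 0.138,
}

# In logistic regression, coefficient = ln(odds ratio):
# OR > 1 gives a positive coefficient, OR < 1 a negative one.
coefficients = {gene: math.log(value) for gene, value in odds_ratios.items()}


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts for illustration only (not taken from the study)
example = diagnostic_metrics(tp=45, fp=10, tn=40, fn=5)

for gene, beta in coefficients.items():
    print(f"{gene}: coefficient = {beta:+.3f}")
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Under this reading, higher transcript levels of MDM2, DUSP6, and CPEB4 are associated with increased odds of CRC, whereas higher levels of MMD and EIF2S3 are associated with decreased odds.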
CONCLUSION: A novel gene expression profile was associated with CRC and can potentially be applied to blood-based detection assays.
Core tip: A novel gene expression profile was associated with colorectal cancer and can potentially be applied to blood-based detection assays. In logistic regression analysis, the model comprising MDM2, DUSP6, CPEB4, MMD, and EIF2S3 yielded H-L P = 0.460, R2 = 0.853, AUC = 0.978, accuracy = 0.949, specificity = 0.818 and sensitivity = 0.971.
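The H-L value refers to the Hosmer-Lemeshow goodness-of-fit test, which compares observed and expected event counts across subgroups of predicted risk; a large P value indicates no evidence of poor calibration. The sketch below computes only the test statistic, using equal-sized groups ordered by predicted probability. The grouping scheme, the function name, and the toy data are assumptions for illustration; the P value would additionally require a chi-square distribution with (number of groups - 2) degrees of freedom, which is not computed here.

```python
def hosmer_lemeshow_statistic(y_true, y_prob, n_groups=10):
    """Hosmer-Lemeshow chi-square statistic (illustrative sketch).

    Samples are sorted by predicted probability and cut into n_groups
    roughly equal groups; observed and expected events are then compared.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        expected = sum(p for p, _ in group)   # expected number of events
        observed = sum(y for _, y in group)   # observed number of events
        variance = expected * (1 - expected / size)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Toy data (hypothetical): two risk groups with predicted rates 0.25 and 0.75
probs = [0.25] * 4 + [0.75] * 4
calibrated = [0, 0, 0, 1, 1, 1, 1, 0]      # observed rates match predictions
miscalibrated = [0, 0, 0, 0, 1, 1, 1, 1]   # observed rates are 0 and 1

stat_good = hosmer_lemeshow_statistic(calibrated, probs, n_groups=2)
stat_bad = hosmer_lemeshow_statistic(miscalibrated, probs, n_groups=2)
print(stat_good, stat_bad)
```

A statistic near zero means that observed and expected event counts agree within each group, which corresponds to the description of a well-calibrated model.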