Published online Oct 28, 2020. doi: 10.3748/wjg.v26.i40.6207
Peer-review started: June 28, 2020
First decision: July 28, 2020
Revised: August 9, 2020
Accepted: September 25, 2020
Article in press: September 25, 2020
Published online: October 28, 2020
Processing time: 122 Days and 0.9 Hours
Identifying genetic mutations in cancer patients has become increasingly important because distinctive mutational patterns can inform the choice of the optimal therapeutic strategy. In recent years, the digitization of pathology slide images has increased rapidly, producing large volumes of digitized tissue data. By combining the routine digitization of pathology whole-slide images (WSIs) with deep learning, computer-aided mutation prediction from cancer pathology images can serve as a time- and cost-effective complementary method for personalized treatment.
Recent studies have reported that deep learning-based molecular cancer subtyping and microsatellite instability prediction can be performed directly from standard hematoxylin and eosin (H&E) sections in diverse cancers. Motivated by these studies, we attempted to predict frequently occurring and clinically meaningful mutations from H&E-stained colorectal cancer (CRC) tissue WSIs with deep learning-based classifiers. Cost-effective alternatives to current molecular tests can support decision-making in the management of patients with CRC.
The present study aimed to investigate the feasibility of deep learning-based mutation prediction for the frequently occurring mutations in CRCs using H&E WSIs.
We built and tested the classifiers for mutation prediction on The Cancer Genome Atlas (TCGA) CRC dataset (n = 629) and validated them on the Seoul St. Mary Hospital (SMH) CRC dataset (n = 142). Based on the frequency of mutations in both the TCGA and SMH datasets, we chose the APC, KRAS, PIK3CA, SMAD4, and TP53 genes for the current study. The classifiers were trained on 360 × 360 pixel patches of tissue images. Receiver operating characteristic (ROC) curves and their areas under the curve (AUCs) are presented for all classifiers to demonstrate the performance of each.
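To illustrate the patch-based setup, the sketch below lists the top-left corners of the 360 × 360 pixel patches that fit inside an image of a given size. Only the patch size comes from the text; the function name `patch_origins`, the non-overlapping stride, and the choice to drop partial patches at the border are assumptions made for illustration and are not stated to be the study's tiling procedure.

```python
def patch_origins(width, height, patch=360, stride=None):
    """Return top-left (x, y) corners of all full patches inside the image.

    Assumptions (not from the study): patches do not overlap unless a
    smaller stride is given, and partial patches at the border are dropped.
    """
    if stride is None:
        stride = patch  # non-overlapping tiles by default
    xs = range(0, width - patch + 1, stride)
    ys = range(0, height - patch + 1, stride)
    # Row-major order: left to right, then top to bottom.
    return [(x, y) for y in ys for x in xs]


# Hypothetical 1000 x 800 pixel region of a slide.
origins = patch_origins(1000, 800)
```

In practice, a tissue-detection step would usually filter out background patches before training; that step is omitted here.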
The AUCs of the ROC curves ranged from 0.693 to 0.809 for the TCGA frozen WSIs and from 0.645 to 0.783 for the TCGA formalin-fixed paraffin-embedded WSIs. Prediction performance improved further when the classifiers were trained on both TCGA and SMH data, suggesting that performance can be enhanced by expanding the datasets.
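For readers unfamiliar with the metric, the AUC equals the probability that a randomly chosen mutated sample receives a higher predicted score than a randomly chosen wild-type sample, with ties counted as one half. The following is a minimal sketch of that pairwise definition; the function name `roc_auc` and the example labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """Compute the ROC AUC from binary labels (1 = mutated, 0 = wild-type).

    Uses the pairwise (Mann-Whitney) definition: the fraction of
    positive/negative pairs in which the positive sample scores higher,
    counting ties as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for five samples (illustrative values only).
example_labels = [1, 0, 1, 0, 1]
example_scores = [0.9, 0.4, 0.6, 0.7, 0.8]
example_auc = roc_auc(example_labels, example_scores)  # 5 of 6 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the reported ranges.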
The present study demonstrated that the APC, KRAS, PIK3CA, SMAD4, and TP53 mutations can be predicted from H&E pathology images using deep learning-based classifiers, showing the potential for deep learning-based mutation prediction in the CRC tissue slides.
Although the classifiers in this study are not yet accurate enough to predict genetic mutations in the clinic, the results show the potential of deep learning-based methods to learn features that discriminate wild-type from mutated tissues and that are not easily discernible to the human eye. Deep learning models may therefore assist pathologists in detecting cancer subtypes or gene mutations.