Basic Study
Copyright ©The Author(s) 2020. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Oct 28, 2020; 26(40): 6207-6223
Published online Oct 28, 2020. doi: 10.3748/wjg.v26.i40.6207
Prediction of clinically actionable genetic alterations from colorectal cancer histopathology images using deep learning
Hyun-Jong Jang, Ahwon Lee, J Kang, In Hye Song, Sung Hak Lee
Hyun-Jong Jang, Department of Physiology, Department of Biomedicine and Health Sciences, Catholic Neuroscience Institute, The Catholic University of Korea, Seoul 06591, South Korea
Ahwon Lee, J Kang, In Hye Song, Sung Hak Lee, Department of Hospital Pathology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, South Korea
Author contributions: Jang HJ and Lee SH designed research; Lee SH collected material and clinical data from patients; Lee A, Kang J, Song IH and Lee SH performed the assays; Jang HJ, Lee A, Kang J, Song IH and Lee SH analyzed data; Jang HJ and Lee SH wrote the paper.
Supported by Research Fund of Seoul St. Mary’s Hospital made in the program year of 2018.
Institutional review board statement: The study was reviewed and approved by the Institutional Review Board of the College of Medicine at the Catholic University of Korea, No. KC19SESI0787.
Conflict-of-interest statement: The authors declare that they have no conflicts of interest.
Data sharing statement: No additional data are available.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Corresponding author: Sung Hak Lee, MD, PhD, Associate Professor, Department of Hospital Pathology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, 222 Banpo-daero, Seocho-gu, Seoul 06591, South Korea. hakjjang@catholic.ac.kr
Received: June 28, 2020
Peer-review started: June 28, 2020
First decision: July 28, 2020
Revised: August 9, 2020
Accepted: September 25, 2020
Article in press: September 25, 2020
Published online: October 28, 2020
Processing time: 122 Days and 0.9 Hours
ARTICLE HIGHLIGHTS
Research background

Identifying genetic mutations in cancer patients has become increasingly important because distinctive mutational patterns can help determine the optimal therapeutic strategy. In recent years, the digitization of pathology slides has increased rapidly, producing large volumes of digitized tissue data. When the routine digitization of pathology whole-slide images (WSIs) is combined with deep learning, computer-aided mutation prediction from cancer pathology images can become a time- and cost-effective complementary method for personalized treatment.

Research motivation

Recent studies have reported that deep learning-based molecular cancer subtyping and microsatellite instability prediction can be performed directly from standard hematoxylin and eosin (H&E) sections in diverse cancers. Motivated by these studies, we tried to predict frequently occurring and clinically meaningful mutations from H&E-stained colorectal cancer (CRC) tissue WSIs using deep learning-based classifiers. Cost-effective alternatives to current molecular tests could support decision-making in the management of patients with CRC.

Research objectives

The present study aimed to investigate the feasibility of deep learning-based prediction of frequently occurring mutations in CRC from H&E WSIs.

Research methods

We built and tested the classifiers for mutation prediction on The Cancer Genome Atlas (TCGA) CRC dataset (n = 629) and validated them on the Seoul St. Mary’s Hospital (SMH) CRC dataset (n = 142). Based on the frequency of mutations in both the TCGA and SMH datasets, we chose the APC, KRAS, PIK3CA, SMAD4, and TP53 genes for the current study. The classifiers were trained on 360 × 360 pixel patches of tissue images. Receiver operating characteristic (ROC) curves and the areas under the curves (AUCs) are presented for all classifiers to demonstrate their performance.
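The patch-based setup described above can be illustrated with a minimal sketch of cutting an image region into 360 × 360 pixel tiles. Only the patch size comes from the text; the function name tile_image, the non-overlapping grid, the dropping of incomplete edge tiles, and the blank placeholder image are assumptions made for illustration and may differ from the procedure actually used in the study.

```python
import numpy as np

PATCH_SIZE = 360  # patch edge length in pixels, as stated in the text


def tile_image(image, patch=PATCH_SIZE):
    """Split an H x W x C array into non-overlapping patch x patch tiles.

    Assumption: regions at the right and bottom edges that do not fill
    a whole tile are dropped.
    """
    height, width = image.shape[:2]
    n_rows, n_cols = height // patch, width // patch
    tiles = []
    for r in range(n_rows):
        for c in range(n_cols):
            top, left = r * patch, c * patch
            tiles.append(image[top:top + patch, left:left + patch])
    return tiles


# Placeholder RGB region standing in for part of a slide (not real data).
region = np.zeros((1000, 1500, 3), dtype=np.uint8)
tiles = tile_image(region)
print(len(tiles), tiles[0].shape)
```

In practice a whole-slide image is usually too large to hold in memory at full resolution, so a real pipeline would likely read one region at a time from the slide file and discard background tiles before training.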

Research results

The AUCs of the ROC curves ranged from 0.693 to 0.809 for the TCGA frozen WSIs and from 0.645 to 0.783 for the TCGA formalin-fixed paraffin-embedded WSIs. Moreover, prediction performance improved when the classifiers were trained on both TCGA and SMH data, indicating that performance can be enhanced by expanding the datasets.
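For readers unfamiliar with the metric, the AUC of a ROC curve equals the probability that a randomly chosen mutated sample receives a higher score than a randomly chosen wild-type sample. The sketch below computes it directly from that definition. The function name roc_auc and the labels and scores are made up for illustration; they are not data from this study, and the study's own evaluation code may differ.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for mutated, 0 for wild-type.
    scores: predicted probability of mutation for each sample.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example with invented values (not results from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which gives a scale for reading the ranges reported above.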

Research conclusions

The present study demonstrated that APC, KRAS, PIK3CA, SMAD4, and TP53 mutations can be predicted from H&E pathology images using deep learning-based classifiers, showing the potential of deep learning-based mutation prediction from CRC tissue slides.

Research perspectives

Although the classifiers in this study did not perform well enough to be used for predicting genetic mutations in the clinic, the results show the potential of deep learning-based methods to learn features that discriminate wild-type from mutated tissues and that are not easily discernible to the human eye. Deep learning models may therefore assist pathologists in detecting cancer subtypes or gene mutations.