Clinical and Translational Research
Copyright ©The Author(s) 2023. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastrointest Oncol. Jul 15, 2023; 15(7): 1215-1226
Published online Jul 15, 2023. doi: 10.4251/wjgo.v15.i7.1215
Integrated analysis of single-cell and bulk RNA-seq establishes a novel signature for prediction in gastric cancer
Fei Wen, Xin Guan, Hai-Xia Qu, Xiang-Jun Jiang
Fei Wen, Qingdao University, Medical College, Qingdao 266000, Shandong Province, China
Xin Guan, Hai-Xia Qu, Xiang-Jun Jiang, Department of Gastroenterology, Qingdao Municipal Hospital, Qingdao 266071, Shandong Province, China
Author contributions: Jiang XJ designed and coordinated the study; Wen F, Qu HX, and Guan X performed data collection and analysis; Wen F interpreted the data and wrote the manuscript; and all authors approved the final version of the article.
Institutional review board statement: Because this study analyzed only publicly available sequencing data from the Gene Expression Omnibus (GEO) database, it raised no ethical issues; institutional review board approval and institutional animal care and use committee approval were therefore not applicable.
Conflict-of-interest statement: All the authors report having no relevant conflicts of interest for this article.
Data sharing statement: No additional data are available.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Xiang-Jun Jiang, PhD, Doctor, Department of Gastroenterology, Qingdao Municipal Hospital, No. 1 Jiaozhou Road, Qingdao 266071, Shandong Province, China. drjxj@163.com
Received: February 1, 2023
Peer-review started: February 1, 2023
First decision: March 21, 2023
Revised: March 31, 2023
Accepted: May 8, 2023
Article in press: May 8, 2023
Published online: July 15, 2023
Processing time: 161 Days and 2.5 Hours
Abstract
BACKGROUND

Single-cell sequencing technology makes it possible to analyze changes in specific cell types as a disease progresses. However, previous single-cell sequencing studies of gastric cancer (GC) have focused largely on immune and stromal cells, and the alterations that gastric epithelial cells undergo during the development of GC remain to be elucidated.

AIM

To create a GC prediction model based on single-cell and bulk RNA sequencing (bulk RNA-seq) data.

METHODS

In this study, we conducted a comprehensive analysis integrating three single-cell RNA sequencing (scRNA-seq) datasets and ten bulk RNA-seq datasets, focusing mainly on cell proportions and differentially expressed genes (DEGs). Specifically, we performed differential expression analysis between epithelial cells from GC tissues and those from normal gastric tissues (NAGs), and used both single-cell and bulk RNA-seq data to establish a prediction model for GC. We further validated the accuracy of this model in bulk RNA-seq data and used Kaplan–Meier plots to assess the correlation between the genes in the model and GC prognosis.
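The Kaplan–Meier plots mentioned above rest on the product-limit estimator, which multiplies, at each time an event occurs, the fraction of at-risk patients who survive that time. The following is a minimal pure-Python sketch for illustration only: it is not the authors' code, the function name `kaplan_meier` is hypothetical, and the input values are toy numbers rather than study data. Comparing high- and low-expression groups (for example, with a log-rank test) is not shown.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier product-limit estimate of the survival function S(t).

    times  -- follow-up time for each patient (hypothetical toy input)
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns (t, S(t)) pairs at each distinct time with at least one event.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group every subject whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Subjects censored at t still count as at risk at t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        n_at_risk -= removed
    return curve
```

For the toy input `kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1])`, the estimate steps down to 5/6, 2/3, 4/9 and 0 at times 2, 3, 5 and 9; the patient censored at time 8 lowers the risk set without producing a step.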

RESULTS

By analyzing scRNA-seq data from a total of 70707 cells from GC tissue, NAG, and chronic gastric tissue, we identified 10 cell types and screened DEGs between GC and normal epithelial cells. After identifying DEGs between GC and normal gastric samples in the bulk RNA-seq data, we constructed GC predictive classifiers using the least absolute shrinkage and selection operator (LASSO) and random forest methods. The LASSO classifier performed well in both validation and model verification using The Cancer Genome Atlas and Genotype-Tissue Expression (GTEx) datasets [area under the curve (AUC) = 0.988 at λmin and 0.994 at λ1se], and the random forest model also performed well on the validation set (AUC = 0.92). The genes TIMP1, PLOD3, CKS2, TYMP, TNFRSF10B, CPNE1, GDF15, BCAP31, and CLDN7 had high importance values in multiple GC predictive models, and Kaplan–Meier Plotter analysis showed that they are associated with GC prognosis, suggesting their potential use in GC diagnosis and treatment.
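The AUC values reported above measure how well a classifier's scores separate GC samples from normal samples: an AUC is the probability that a randomly chosen GC sample scores higher than a randomly chosen normal one. The two LASSO AUCs are assumed here to follow the glmnet convention of choosing the penalty either at the cross-validation minimum (lambda.min) or at the largest penalty whose error is within one standard error of that minimum (lambda.1se), which yields a sparser model. The sketch below illustrates both ideas in pure Python under those assumptions; it is not the authors' pipeline, the names `auc` and `select_lambdas` are hypothetical, and all numbers are toy values.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC computed as the Mann-Whitney probability that a random
    positive (label 1, e.g. GC) outscores a random negative (label 0);
    tied scores count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_lambdas(
    lambdas: Sequence[float],
    cv_mean: Sequence[float],
    cv_se: Sequence[float],
) -> Tuple[float, float]:
    """Return (lambda_min, lambda_1se) from cross-validation results.

    lambda_min minimizes the mean cross-validated error; lambda_1se is the
    largest penalty whose mean error is within one standard error of that
    minimum (assumed glmnet-style one-standard-error rule).
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, m in zip(lambdas, cv_mean) if m <= threshold)
    return lambdas[i_min], lambda_1se
```

With toy scores 0.9, 0.8 and 0.35 for positives and 0.6, 0.2 and 0.1 for negatives, 8 of the 9 positive-negative pairs are correctly ordered, so `auc` returns 8/9 (about 0.889). Because lambda_1se never selects a smaller penalty than lambda_min, it keeps the same number of genes or fewer.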

CONCLUSION

A predictive classifier was established based on integrated single-cell and bulk RNA-seq data, and its genes are expected to serve as auxiliary markers in the clinical diagnosis of GC.

Keywords: Gastric cancer; Single-cell RNA sequencing; Prediction model; Least absolute shrinkage and selection operator; Random forest

Core Tip: In this study, we integrated and analyzed three single-cell RNA sequencing datasets and 10 bulk RNA sequencing datasets of gastric cancer (GC) from the Gene Expression Omnibus database. We conducted a differential expression analysis of epithelial cell subpopulations from GC tissue and normal gastric mucosa and constructed GC prediction classifiers using the least absolute shrinkage and selection operator (LASSO) and random forest methods. The LASSO prediction model was further validated in The Cancer Genome Atlas stomach adenocarcinoma dataset. TIMP1, PLOD3, CKS2, TYMP, TNFRSF10B, CPNE1, GDF15, BCAP31, and CLDN7 were selected as the predictive genes for GC. This study provides a new approach for constructing prediction models based on single-cell sequencing data and offers new reference targets for the clinical diagnosis and treatment of GC.