Hung M, Park J, Hon ES, Bounsanga J, Moazzami S, Ruiz-Negrón B, Wang D. Artificial intelligence in dentistry: Harnessing big data to predict oral cancer survival. World J Clin Oncol 2020; 11(11): 918-934 [PMID: 33312886 DOI: 10.5306/wjco.v11.i11.918]
Corresponding Author of This Article
Man Hung, PhD, Professor, Research Dean, College of Dental Medicine, Roseman University of Health Sciences, 10894 S River Front Parkway, South Jordan, UT 84095, United States. mhung@roseman.edu
Research Domain of This Article
Dentistry, Oral Surgery & Medicine
Article-Type of This Article
Retrospective Study
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
World J Clin Oncol. Nov 24, 2020; 11(11): 918-934. Published online Nov 24, 2020. doi: 10.5306/wjco.v11.i11.918
Table 1 Number of oral cancer cases from various anatomical sites
| ICD-O-3 codes | Sites | Number of cases |
|---|---|---|
| C000 | External upper lip | 413 |
| C001 | External lower lip | 2444 |
| C002 | External lip, NOS | 92 |
| C003 | Mucosa of upper lip | 104 |
| C004 | Mucosa of lower lip | 567 |
| C005 | Mucosa of lip, NOS | 29 |
| C006 | Commissure of lip | 85 |
| C008 | Overlapping lesion of lip | 46 |
| C009 | Lip, NOS (excludes skin of lip C44.0) | 153 |
| C019 | Base of tongue, NOS | 10840 |
| C020 | Dorsal surface of tongue, NOS | 652 |
| C021 | Border of tongue | 2632 |
| C022 | Ventral surface of tongue, NOS | 1688 |
| C023 | Anterior 2/3 of tongue, NOS | 2807 |
| C024 | Lingual tonsil | 170 |
| C028 | Overlapping lesion of tongue | 581 |
| C029 | Tongue, NOS | 3050 |
| C030 | Upper gum | 821 |
| C031 | Lower gum | 1680 |
| C039 | Gum, NOS | 210 |
| C040 | Anterior floor of mouth | 1362 |
| C041 | Lateral floor of mouth | 352 |
| C048 | Overlapping lesion of floor of mouth | 136 |
| C049 | Floor of mouth, NOS | 2284 |
| C050 | Hard palate | 1155 |
| C051 | Soft palate, NOS (excludes nasopharyngeal surface of soft palate C11.3) | 1301 |
| C052 | Uvula | 180 |
| C058 | Overlapping lesion of palate | 206 |
| C059 | Palate, NOS | 154 |
| C060 | Cheek mucosa | 1787 |
| C061 | Vestibule of mouth | 134 |
| C062 | Retromolar area | 1413 |
| C068 | Overlapping lesion of other and unspecified parts of mouth | 142 |
| C069 | Mouth, NOS | 487 |
| C079 | Parotid gland | 7111 |
| C080 | Submandibular gland | 1149 |
| C081 | Sublingual gland | 94 |
| C088 | Overlapping lesion of major salivary glands | 6 |
| C089 | Major salivary gland, NOS (excludes minor salivary gland, NOS C06.9) | 287 |
| C090 | Tonsillar fossa | 1735 |
| C091 | Tonsillar pillar | 888 |
| C098 | Overlapping lesion of tonsil | 109 |
| C099 | Tonsil, NOS (excludes lingual tonsil C02.4 and pharyngeal tonsil C11.1) | 9521 |
| C100 | Vallecula | 282 |
| C101 | Anterior surface of epiglottis | 88 |
| C102 | Lateral wall of oropharynx | 184 |
| C103 | Posterior wall of oropharynx | 246 |
| C104 | Branchial cleft (site of neoplasm) | 37 |
| C108 | Overlapping lesion of oropharynx | 277 |
| C109 | Oropharynx, NOS | 940 |
| C129 | Pyriform sinus | 1707 |
| C130 | Postcricoid region | 78 |
| C131 | Hypopharyngeal aspect of aryepiglottic fold, NOS (excludes laryngeal aspect of aryepiglottic fold C32.1) | 214 |
| C132 | Posterior wall of hypopharynx | 250 |
| C138 | Overlapping lesion of hypopharynx | 113 |
| C139 | Hypopharynx, NOS | 816 |
| C739 | Thyroid gland | 111425 |
Table 2 List of all 10 variables included in the final machine learning model building and validation
| Variables | Variable description |
|---|---|
| Age at diagnosis | This data item represents the age of the patient at diagnosis for this cancer. The code is three digits and represents the patient’s actual age in years |
| Year of diagnosis | The year of diagnosis is the year the tumor was first diagnosed by a recognized medical practitioner, whether clinically or microscopically confirmed |
| Month of diagnosis | The month of diagnosis is the month the tumor was first diagnosed by a recognized medical practitioner, whether clinically or microscopically confirmed |
| Primary site | This data item identifies the site in which the primary tumor originated. See the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3)[18] for topography codes. The decimal point is eliminated |
| CS tumor size | Information on tumor size. Available for 2004-2015 diagnosis years. Earlier cases may be converted and new codes added which weren't available for use prior to the current version of CS. For more information, see http://seer.cancer.gov/seerstat/variables/seer/ajcc-stage[19] |
| CS extension | Information on extension of the tumor. Available for 2004-2015 diagnosis years. Earlier cases may be converted and new codes added which weren't available for use prior to the current version of CS. For more information, see http://seer.cancer.gov/seerstat/variables/seer/ajcc-stage[19] |
|  | This is the AJCC “Stage Group” component that is derived from CS detailed site-specific codes, using the CS algorithm, effective with 2004-2015 diagnosis years. See the CS site-specific schema for details (http://seer.cancer.gov/seerstat/variables/seer/ajcc-stage)[19] |
| RX Summ-surg prim site | Surgery of primary site describes a surgical procedure that removes and/or destroys tissue of the primary site performed as part of the initial work-up or first course of therapy |
| Site recode ICD-O-3/WHO 2008 | A recode based on primary site and ICD-O-3 Histology in order to make analyses of site/histology groups easier. For example, the lymphomas are excluded from stomach and Kaposi and mesothelioma are separate categories based on histology. For more information, see http://seer.cancer.gov/siterecode/icdo3_dwhoheme/index.html[20] |
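The Primary site entry notes that the decimal point is eliminated from ICD-O-3 topography codes, which is why Table 1 lists C000 where the classification writes C00.0. A minimal Python sketch of that conversion follows. The function names `to_compact` and `to_dotted` are hypothetical and are not part of SEER software or the authors' pipeline.

```python
import re

# Assumed shape of a topography code: "C", two digits, an optional
# decimal point, then one digit (for example "C00.0" or "C000").
_TOPOGRAPHY = re.compile(r"^C(\d{2})\.?(\d)$")


def to_compact(code: str) -> str:
    """Return the code without its decimal point, e.g. 'C44.0' -> 'C440'."""
    match = _TOPOGRAPHY.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised topography code: {code!r}")
    return f"C{match.group(1)}{match.group(2)}"


def to_dotted(code: str) -> str:
    """Return the code with its decimal point, e.g. 'C019' -> 'C01.9'."""
    match = _TOPOGRAPHY.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised topography code: {code!r}")
    return f"C{match.group(1)}.{match.group(2)}"
```

For example, `to_compact("C44.0")` gives `"C440"`, matching the compact form used in the site listing, while `to_dotted("C019")` gives `"C01.9"`.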
Table 3 Demographic characteristics of the sample (n = 177714)
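| Variable | Mean | SD | Median | n | % |
|---|---|---|---|---|---|
| Survival months/mo | 60.35 | 40.98 | 54.00 | | |
| Age at diagnosis/yr | 54.62 | 16.10 | 55.00 | | |
| Tumor size/(ID, cm) | 22.56 | 21.74 | 19.00 | | |
| Marital status | | | | | |
| Single | | | | 35688 | 20.08 |
| Married | | | | 110480 | 62.17 |
| Separated | | | | 1746 | 0.98 |
| Divorced | | | | 16401 | 9.23 |
| Widowed | | | | 13055 | 7.35 |
| Unmarried or domestic partner | | | | 344 | 0.19 |
| Sex | | | | | |
| Male | | | | 72179 | 40.62 |
| Female | | | | 105535 | 59.38 |
| Race | | | | | |
| White | | | | 148556 | 83.60 |
| Black | | | | 16051 | 9.03 |
| Other | | | | 13107 | 7.38 |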
Table 4 Machine learning model performance
| Performance indicators | Linear regression | Decision tree | Random forest | XGBoost |
|---|---|---|---|---|
| MSE | 647.49 | 538.30 | 489.58 | 486.55 |
| RMSE | 25.45 | 23.20 | 22.13 | 22.06 |
| MAE | 18.21 | 14.45 | 13.63 | 13.55 |
| R² score | 0.620 | 0.681 | 0.709 | 0.711 |
| Adjusted R² score | 0.620 | 0.681 | 0.709 | 0.711 |
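The indicators in Table 4 follow standard definitions: RMSE is the square root of MSE, and adjusted R² penalizes R² for the number of predictors. The sketch below assumes plain Python lists of observed and predicted survival months. The name `regression_metrics` and its parameters are illustrative and do not reproduce the authors' implementation.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Compute MSE, RMSE, MAE, R2 and adjusted R2 for paired values.

    n_features is the number of predictors in the model (an assumed
    input here; the final model in this article used 10 variables).
    """
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n <= n_features + 1:
        raise ValueError("need more observations than n_features + 1")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)        # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")

    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R2 shrinks R2 by the factor (n - 1) / (n - p - 1).
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": r2,
        "adj_r2": adj_r2,
    }
```

When the sample is large relative to the number of predictors, the factor (n - 1)/(n - p - 1) is close to 1, which would be consistent with the identical R² and adjusted R² values reported in the table.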