Published online Aug 16, 2018. doi: 10.12998/wjcc.v6.i8.200
Peer-review started: March 28, 2018
First decision: May 16, 2018
Revised: June 7, 2018
Accepted: June 26, 2018
Article in press: June 27, 2018
Published online: August 16, 2018
Processing time: 141 Days and 20.7 Hours
PM2.5 and PM10 (particulate matter with aerodynamic diameters of 2.5 µm or less and 10 µm or less, respectively), also known as particulate matter (PM) pollution, can deposit in the respiratory tract and may trigger inflammatory reactions. Several studies have found that PM2.5 and PM10 concentrations may be associated with the occurrence of upper respiratory tract infections (URIs) and with increased mortality among patients hospitalized for pneumonia. Machine learning uses computational statistics to develop optimized algorithms that learn from data and make predictions based on it. Machine learning applied to potentially hazardous exposures has been used successfully to predict the occurrence of several clinical diseases, such as myocardial infarction, and the associated risk of mortality. In addition, machine learning methods such as artificial neural networks make it possible to train on big data to predict clinical diseases. For example, the Delphi group at Carnegie Mellon University, working with the United States Centers for Disease Control and Prevention, has been developing a machine learning model that accurately tracks the spread of influenza.
Because the severity of air pollution varies geographically, its hazardous effects on human health may also differ by region and ethnicity. It is therefore reasonable to develop a surveillance system that forecasts the probability of disease occurrence related to regional air pollution. Accordingly, we attempted to build a machine learning model that relates PM2.5 and PM10 concentrations to the volume of outpatient visits for acute URIs in Taiwan.
To examine the accuracy of machine learning in relating PM2.5 and PM10 concentrations to URIs.
Daily nationwide and regional outdoor PM2.5 and PM10 concentrations over 30 consecutive days, obtained from the Taiwan Environmental Protection Administration, were used as inputs to a multilayer perceptron (MLP) machine learning model that related them to outpatient visits for URIs in the following week. The URI data were obtained from the datasets of the Centers for Disease Control in Taiwan for 2009 to 2016. The middle month of each season (January, April, July, and October) was used for testing, and the remaining months were used for training. Weekly URI case counts were classified by tertile into high, moderate, and low volumes.
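The pairing of a 30-day exposure window with the following week's URI volume, and the season-based train/test split, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the daily readings for one pollutant and one region are held in a dictionary keyed by date, that each weekly URI count is keyed by the first day of its week, that the 30-day window ends on the day before that week begins, and that a sample is assigned to the test set by the month in which its target week starts. The names build_samples, split_by_month, and TEST_MONTHS are hypothetical.

```python
from datetime import date, timedelta

# Middle month of each season; weeks starting in these months form the test set.
TEST_MONTHS = {1, 4, 7, 10}


def build_samples(daily_pm: dict[date, float], weekly_uri: dict[date, int], window: int = 30):
    """Pair the `window` days of PM readings before each week with that week's URI count.

    daily_pm: date -> PM concentration (one pollutant, one region).
    weekly_uri: first date of a week -> outpatient URI visits in that week.
    Weeks without a complete input window are skipped (an assumption of this sketch).
    Returns a list of (week_start, features, uri_count) tuples in date order.
    """
    samples = []
    for week_start, uri_count in sorted(weekly_uri.items()):
        # Chronological window that ends on the day before the target week begins.
        days = [week_start - timedelta(days=k) for k in range(window, 0, -1)]
        if all(d in daily_pm for d in days):
            samples.append((week_start, [daily_pm[d] for d in days], uri_count))
    return samples


def split_by_month(samples):
    """Send a sample to the test set if its target week starts in a test month."""
    train = [s for s in samples if s[0].month not in TEST_MONTHS]
    test = [s for s in samples if s[0].month in TEST_MONTHS]
    return train, test
```

Under these assumptions, the nationwide analysis would call build_samples once per pollutant with the national daily series, and the regional analyses once per region and pollutant.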
Both PM concentrations and URI cases peaked in winter and spring. In the nationwide analysis, the MLP model accurately related PM2.5 and PM10 concentrations to URI volumes in the elderly (accuracy of 89.05% and 88.32%, respectively) and in the overall population (81.75% and 83.21%, respectively). In the regional analyses, models using PM2.5 were more accurate than those using PM10 for the elderly, particularly in the Central region (78.10% and 74.45%, respectively), whereas models using PM10 were more accurate than those using PM2.5 for the overall population, particularly in the Northern region (73.19% and 63.04%, respectively).
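The tertile labels and the accuracy percentages can be illustrated with a short sketch. It assumes that accuracy means the share of test weeks whose predicted volume class matches the observed tertile class, and it computes the tertile cutoffs with Python's statistics.quantiles from whatever weekly counts are passed in; the abstract does not state whether the cutoffs came from all 2009 to 2016 weeks or from the training weeks only, nor how counts falling exactly on a cutoff were assigned. The function names are hypothetical.

```python
import statistics


def tertile_cutoffs(weekly_counts):
    """Return the two cut points that divide weekly URI counts into thirds.

    Assumption: cutoffs are derived from the counts passed in (e.g., training weeks).
    """
    low_cut, high_cut = statistics.quantiles(weekly_counts, n=3)
    return low_cut, high_cut


def volume_class(count, cutoffs):
    """Map a weekly URI count to 'low', 'moderate', or 'high' (ties go to the lower class)."""
    low_cut, high_cut = cutoffs
    if count <= low_cut:
        return "low"
    if count <= high_cut:
        return "moderate"
    return "high"


def accuracy(predicted, observed):
    """Fraction of weeks whose predicted class equals the observed class."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)
```

With three classes, guessing uniformly at random would be right about one time in three in expectation, which gives a reference point for the reported accuracies of 63.04% to 89.05%.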
Machine learning could accurately relate short-term PM2.5 and PM10 concentrations to subsequent URI occurrence. Our findings suggest that the effects of PM2.5 and PM10 on URIs may differ by age; the underlying mechanism needs further evaluation.
We used MLP machine learning to successfully relate PM concentration data to the volume of URI cases. In future work, data on additional air pollutants and other meteorological parameters could be added as inputs to the current MLP model.
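A minimal one-hidden-layer perceptron, written in plain Python so that it runs without external libraries, shows how such a model can map a window of PM readings to three volume classes (here 0, 1, and 2, which could stand for low, moderate, and high). The hidden-layer size, tanh activation, softmax output, learning rate, and full-batch gradient descent are all illustrative assumptions; this section does not report the architecture or training settings of the study's MLP. The class TinyMLP and its methods are hypothetical.

```python
import math
import random


class TinyMLP:
    """Hypothetical one-hidden-layer perceptron for three URI volume classes.

    Layer sizes, tanh activation, softmax output and full-batch gradient
    descent are illustrative assumptions, not the study's reported settings.
    """

    def __init__(self, n_inputs, n_hidden=8, n_classes=3, seed=0):
        rng = random.Random(seed)
        # Small random weights scaled by fan-in; biases start at zero.
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_inputs)) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
                   for _ in range(n_classes)]
        self.b2 = [0.0] * n_classes

    def _forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return hidden, [e / total for e in exps]

    def predict_proba(self, x):
        return self._forward(x)[1]

    def predict(self, x):
        probs = self.predict_proba(x)
        return max(range(len(probs)), key=probs.__getitem__)

    def loss(self, xs, ys):
        """Mean cross-entropy of the true class labels."""
        return -sum(math.log(self.predict_proba(x)[y]) for x, y in zip(xs, ys)) / len(xs)

    def fit_step(self, xs, ys, lr=0.1):
        """One full-batch gradient-descent step on the mean cross-entropy."""
        n = len(xs)
        gw1 = [[0.0] * len(row) for row in self.w1]
        gb1 = [0.0] * len(self.b1)
        gw2 = [[0.0] * len(row) for row in self.w2]
        gb2 = [0.0] * len(self.b2)
        for x, y in zip(xs, ys):
            hidden, probs = self._forward(x)
            # For softmax + cross-entropy, d(loss)/d(logit_k) = p_k - 1[k == y].
            d_logits = [p - (1.0 if k == y else 0.0) for k, p in enumerate(probs)]
            for k, dk in enumerate(d_logits):
                gb2[k] += dk / n
                for j, h in enumerate(hidden):
                    gw2[k][j] += dk * h / n
            for j, h in enumerate(hidden):
                # Back-propagate through the output weights, then tanh' = 1 - h^2.
                dh = sum(d_logits[k] * self.w2[k][j] for k in range(len(d_logits)))
                dz = dh * (1.0 - h * h)
                gb1[j] += dz / n
                for i, xi in enumerate(x):
                    gw1[j][i] += dz * xi / n
        # Apply all updates only after the gradients are fully accumulated.
        for weights, grads in ((self.w1, gw1), (self.w2, gw2)):
            for row, grow in zip(weights, grads):
                for i, g in enumerate(grow):
                    row[i] -= lr * g
        for bias, grads in ((self.b1, gb1), (self.b2, gb2)):
            for i, g in enumerate(grads):
                bias[i] -= lr * g
```

Because the input layer takes a flat list of numbers, adding another pollutant or a meteorological variable over the same 30 days would only widen n_inputs (for example, 30 daily values for each of two variables gives 60 inputs); the rest of the model is unchanged. Inputs on very different scales would usually be standardized before training.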