Tunable structure priors for Bayesian rule learning for knowledge integrated biomarker discovery

doi:10.5306/wjco.v9.i5.98

Journal: World Journal of Clinical Oncology

ISSN: 2218-4333

Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Basic Study

World J Clin Oncol. Sep 14, 2018; 9(5): 98-109
Published online Sep 14, 2018. doi: 10.5306/wjco.v9.i5.98

Jeya Balaji Balasubramanian, Vanathi Gopalakrishnan

Jeya Balaji Balasubramanian, Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA 15260, United States

Vanathi Gopalakrishnan, Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15206, United States

Author contributions: Balasubramanian JB developed the concept, conducted the research, and prepared the first draft of the manuscript in consultation with research mentor and senior author Gopalakrishnan V; All authors contributed to writing and editing the manuscript.

Supported by National Institute of General Medical Sciences of the National Institutes of Health, No. R01GM100387.

Conflict-of-interest statement: The authors declare no conflicts of interest with respect to the submitted manuscript.

Open-Access: This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/

Correspondence to: Vanathi Gopalakrishnan, PhD, Associate Professor, Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Room 530, 5607 Baum Boulevard, Pittsburgh, PA 15206, United States. vanathi@pitt.edu

Telephone: +1-412-6243290 Fax: +1-412-6245310

Received: April 27, 2018
Peer-review started: April 27, 2018
First decision: July 9, 2018
Revised: July 24, 2018
Accepted: August 5, 2018
Article in press: August 5, 2018
Published online: September 14, 2018
Processing time: 140 Days and 15.9 Hours

ARTICLE HIGHLIGHTS

Research background

Biomedicine is increasingly a data-driven science, owing largely to the explosion of data generated by high-throughput technologies. Such datasets are often high-dimensional: a very large number of candidate variables could explain the outcome variable of interest, but there are too few instances to support any one model hypothesis. In many applications, domain knowledge is available in addition to the data and can guide the data mining process toward more meaningful models. It is therefore important to develop data mining tools that leverage this knowledge, yet few current methods can incorporate it.

Research motivation

Data mining methods that can incorporate domain knowledge will help learn more meaningful models. They will benefit many domains, especially those, such as biomedicine, where data are scarce but domain knowledge is available to assist the data mining process.

Research objectives

In this work, our objective was to extend a rule learning algorithm, Bayesian rule learning (BRL), so that it can incorporate prior domain knowledge. BRL is a good candidate because it has been applied successfully to high-dimensional biomedical data analysis tasks. We implemented the extension as a tool called BRL_p, which has tunable priors: the user controls the degree to which their specified knowledge is incorporated. BRL_p is a novel data mining tool that lets the user specify domain knowledge (including uncertain domain knowledge) and incorporates it into the model search process.

Research methods

BRL searches over a space of Bayesian belief network models (BNs) to find the optimal network and infers a rule set from that model. We implemented a way for the BN search to use informative priors, that is, a distribution encoding the relative importance of each model before the training data are seen. The resulting method, BRL_p, incorporates user-specified domain knowledge into the data mining process. BRL_p has a hyperparameter λ that lets the user adjust the degree to which their specified prior knowledge is incorporated.
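To make the role of λ concrete, the following minimal Python sketch shows one way a tunable structure prior could enter a model score. It is an illustration under stated assumptions, not the BRL_p implementation: it assumes the user expresses knowledge as a belief probability that a variable is a parent of the outcome, and that λ scales the log prior before it is added to a data-driven log score. The names `log_structure_prior`, `model_score`, `edge_prior`, and the numeric scores are all hypothetical.

```python
import math


def log_structure_prior(parents, edge_prior, lam):
    """Unnormalized log prior for a candidate set of parents of the outcome.

    parents:    set of variable names included in the candidate model
    edge_prior: dict mapping a variable name to the user's belief (strictly
                between 0 and 1) that it is a parent of the outcome;
                variables not listed are treated as uninformative
    lam:        non-negative weight; lam = 0 ignores the prior entirely
    (This functional form is an assumption made for illustration.)
    """
    total = 0.0
    for var, p in edge_prior.items():
        total += math.log(p) if var in parents else math.log(1.0 - p)
    return lam * total


def model_score(log_data_score, parents, edge_prior, lam):
    """Log posterior score up to a constant: data term plus weighted prior."""
    return log_data_score + log_structure_prior(parents, edge_prior, lam)


# Hypothetical example: the data slightly favor model A (parent g1), while
# the user believes g2 is very likely relevant.
edge_prior = {"g2": 0.9}
candidates = {"A": (-10.0, {"g1"}), "B": (-10.5, {"g2"})}


def best_model(lam):
    return max(
        candidates,
        key=lambda name: model_score(
            candidates[name][0], candidates[name][1], edge_prior, lam
        ),
    )
```

In this toy setting, `best_model(0.0)` follows the data alone, while a sufficiently large λ lets the stated belief about `g2` override the small difference in the data term.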

We evaluated BRL_p by comparing it to BRL (without informative priors) and to other state-of-the-art classifiers on a simple simulated dataset and a real-world lung cancer prognostic dataset. We measured the degree of acceptance of the specified prior knowledge as a function of the hyperparameter λ in BRL_p. We also tracked changes in predictive power using the area under the receiver operating characteristic curve (AUC).
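The two evaluation quantities mentioned above can be sketched in a few lines of Python. The AUC function uses the standard pairwise-ranking definition (the probability that a randomly chosen positive instance is scored above a randomly chosen negative one, with ties counted as one half). The `prior_acceptance` function is only an assumed stand-in for the "degree of acceptance" of prior knowledge, since this section does not define that measure precisely; here it is taken to be the fraction of user-specified variables that appear in the learned model.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels
    scores: iterable of predicted scores for the positive class
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def prior_acceptance(specified, learned):
    """Fraction of user-specified variables present in the learned model.

    This definition is an assumption made for illustration only.
    """
    specified = set(specified)
    if not specified:
        raise ValueError("no prior knowledge was specified")
    return len(specified & set(learned)) / len(specified)
```

Computing both quantities for each value of λ in a sweep would show how acceptance of the prior and predictive performance change together.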

Research results

In both the experiments with simulated data and those with the real-world lung cancer prognostic data, we observed that the degree of incorporation of the specified prior knowledge increased with increasing values of λ. We also observed that specifying prior knowledge relevant to the problem dataset could sometimes help find models with better predictive performance. Compared with the state-of-the-art classifiers, BRL_p performed better than the other interpretable models, but the more complex, non-interpretable models achieved better predictive performance than BRL_p.

Research conclusions

BRL_p allows users to incorporate their specified domain knowledge into the data mining task and to control the degree of incorporation with a hyperparameter. It is a novel rule learning algorithm that we have made publicly available via GitHub. We anticipate its use in many applications, especially those that suffer from data scarcity but have additional domain knowledge available to assist in the data mining task.

Research perspectives

In this paper, we explored the specification of simple domain knowledge; the incorporation of more complex forms of knowledge remains to be explored. We incorporated domain knowledge from the literature, and we also want to explore domain knowledge available from other sources. These future directions may motivate further development of BRL_p.