Case Control Study Open Access
Copyright ©The Author(s) 2024. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Dec 14, 2024; 30(46): 4880-4903
Published online Dec 14, 2024. doi: 10.3748/wjg.v30.i46.4880
Multi-clustering study on the association between human leukocyte antigen-DP-DQ and hepatitis B virus-related hepatocellular carcinoma and cirrhosis in Viet Nam
Thuy Thu Nguyen, Tu Cam Ho, Van-Khanh Tran, Center for Gene and Protein Research, Hanoi Medical University, Hanoi 116177, Viet Nam
Tu Cam Ho, Institute of Virology, TUM School of Medicine, Technical University of Munich, Munich 81675, Germany
Huong Thi Thu Bui, Department of Biochemistry, Thai Nguyen University of Medicine and Pharmacy, Thai Nguyen 251540, Viet Nam
Tue Trong Nguyen, Medical Technology Department, Hanoi Medical University, Hanoi 116177, Viet Nam
Tue Trong Nguyen, Clinical Laboratory, Hanoi Medical University Hospital, Hanoi 116177, Viet Nam
ORCID number: Thuy Thu Nguyen (0000-0003-1306-6159); Tu Cam Ho (0000-0001-8239-096X); Huong Thi Thu Bui (0000-0002-4101-5618); Van-Khanh Tran (0000-0002-5059-8106); Tue Trong Nguyen (0000-0002-5986-831X).
Author contributions: Nguyen TT and Nguyen TT designed the present study; Nguyen TT received the grant for the study; Nguyen TT, Nguyen TT, Ho TC and Bui HTT performed the data collection and the experiments; Ho TC performed the data mining and hierarchical clustering analysis study; Nguyen TT, Ho TC and Nguyen TT wrote the main manuscript; Tran VK revised the manuscript. All authors read and approved the final manuscript.
Supported by National Foundation for Science and Technology Development (NAFOSTED)-Ministry of Science and Technology, Viet Nam, No. 108.02-2019.307.
Institutional review board statement: The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of Hanoi Medical University (No. HMUIRB109). Written informed consent was obtained from the subjects regarding the use of the samples and information for research purposes.
Informed consent statement: All study participants, or their legal guardians, provided informed written consent prior to study enrollment.
Conflict-of-interest statement: No conflict of interest has been declared by any of the authors impacting on the work presented in this manuscript.
Data sharing statement: The datasets used and analyzed during the current study are available from the corresponding author on reasonable request. We, however, cannot provide personal information or data containing patient identification in any form.
STROBE statement: The authors have read the STROBE Statement-a checklist of items, and the manuscript was prepared and revised according to the STROBE Statement-a checklist of items.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Tue Trong Nguyen, DPhil, Academic Research, Chief Technician, Research Scientist, Researcher, Medical Technology Department, Hanoi Medical University, No. 1 Ton That Tung Street, Dong Da District, Hanoi 116177, Viet Nam. trongtue@hmu.edu.vn
Received: March 7, 2024
Revised: September 4, 2024
Accepted: October 16, 2024
Published online: December 14, 2024
Processing time: 258 Days and 22.3 Hours

Abstract
BACKGROUND

Human leukocyte antigen (HLA) class II molecules are cell surface receptor proteins found on antigen-presenting cells. Polymorphisms and mutations in HLA genes can affect the immune response and the progression of hepatitis B.

AIM

To investigate the association of rs2856718 (HLA-DQ) and of rs3077 and rs9277535 (HLA-DP) with hepatitis B virus (HBV)-related cirrhosis and hepatocellular carcinoma (HCC).

METHODS

In this case-control study, the genotypes of these single nucleotide polymorphisms (SNPs) were determined by TaqMan real-time PCR in 315 healthy controls, 471 patients with chronic hepatitis B, 250 patients with HBV-related liver cirrhosis, and 251 patients with HCC. We tested the genotype distributions of rs2856718, rs3077, and rs9277535 for Hardy-Weinberg equilibrium and linkage disequilibrium, and then applied hierarchical clustering analysis to model the complex interactions between the markers in each patient group.

RESULTS

The physical distance separating these SNPs was 29816 kb, with linkage disequilibrium (D’) values ranging from 0.07 to 0.34. The close linkage between rs3077 and rs9277535 was attributed to their short separation of 21 kb. The D’ value decreased from moderate in the healthy control group (D’ = 0.50, P < 0.05) to weak in the hepatic disease group (D’ < 0.3, P < 0.05). In combinations of the three variants rs2856718, rs3077, and rs9277535, the A allele was associated with a decreased hepatic disease risk [A-A-A haplotype, risk ratio (RR) = 0.44 (0.14; 1.37), P < 0.05], whereas the G allele had the opposite effect [G-A/G-G haplotype, RR = 1.12 (1.02; 1.23), P < 0.05]. In liver cancer cases, the A-A-A/G haplotype increased the risk of HCC by 158% (RR = 2.58, P < 0.05).

CONCLUSION

Rs9277535 affects liver fibrosis progression due to HBV infection, while rs3077 is associated with a risk of HBV-related HCC. The link between rs2856718, rs3077, and rs9277535 and disease risk was determined using a multi-clustering analysis.

Key Words: Human leukocyte antigen; Multi-clustering study; Hepatitis B virus; Hepatocellular carcinoma; Cirrhosis

Core Tip: A significant correlation was observed between human leukocyte antigen (HLA)-DP-DQ polymorphisms and the risk of hepatitis B virus (HBV)-related liver cirrhosis (LC) and hepatocellular carcinoma (HCC). In this study, individuals carrying specific HLA-DP and HLA-DQ genotypes had a higher prevalence of LC and HCC. These findings support the hypothesis that HLA polymorphisms play a crucial role in the progression of HBV-related liver diseases, with specific HLA-DP and HLA-DQ alleles associated with the risk of HBV-related LC and HCC. By clarifying the genetic underpinnings of these diseases, we highlight the potential for developing personalized medical approaches and targeted therapies for affected individuals.



INTRODUCTION

Hepatitis B virus (HBV) is the primary cause of viral hepatitis morbidity. According to World Health Organization estimates (2022), there were 254 million chronic hepatitis B infections, with 1.2 million new infections each year. In 2022, HBV caused approximately 1.1 million deaths worldwide, mostly from liver cirrhosis (LC) and hepatocellular carcinoma (HCC)[1]. Viet Nam has been identified as one of 20 countries responsible for 75% of the global burden of viral hepatitis[2], with an HBV prevalence in adults of approximately 10.5%[3].

Several studies have sought to identify susceptibility loci in the human leukocyte antigen (HLA) region. HLA molecules present viral antigens to immune cells that kill virus-infected and cancerous cells. Genome-wide association studies (GWAS) have shown that three single nucleotide polymorphisms (SNPs), rs3077 and rs9277535 in HLA-DP and rs2856718 in HLA-DQ, are associated with chronic HBV infection in the Japanese population[4,5]. Hu et al[6] genotyped these SNPs in Southeast Chinese patients to examine the association of HLA-DP-DQ variants with the risk of both HBV infection and HCC development. HLA-DQ rs2856718 was significantly associated with a decreased HCC risk, HLA-DP rs9277535 and HLA-DQ rs2856718 were related to HBV clearance, and HLA-DP rs3077 was significantly associated with susceptibility to persistent HBV infection and the development of HCC. Genetic variations in the HLA-DP and HLA-DQ loci may therefore be markers of HBV clearance and HCC risk[6]. Ji et al[7] reported that the HLA-DQ rs2856718 polymorphism significantly reduced the risk of HCC in HBV-infected patients and increased the risk of cirrhosis compared with asymptomatic hepatitis B surface antigen (HBsAg) carriers plus chronic hepatitis B (CHB) patients. In Saudi Arabia, the HLA-DQ rs2856718-G allele decreases the risk of chronic HBV infection, and the HLA-DP rs3077 and rs9277535 risk alleles are associated with HBV susceptibility[8]. Other studies have shown that the HLA-DP rs3077 polymorphism may be protective, favoring spontaneous resolution of HBV infection in the Indonesian population[9] and reducing HCC susceptibility in Asian populations[10]. The HLA-DP rs9277535 polymorphism reduces the risk of persistent HBV infection in Indonesians[9] but enhances susceptibility to HCC in Asians[10].

Treatment with nucleotide analogs or interferon-alpha inhibits HBV replication in chronically infected patients and slows disease progression to HCC. HBsAg loss is the goal of HBV treatment but is rarely achieved (≤ 1% per year), so lifelong treatment is often required. Chronic HBV infection is a heterogeneous disease divided into phases based on clinical parameters such as HBsAg, hepatitis B e antigen (HBeAg), HBV deoxyribonucleic acid (HBV DNA), and alanine aminotransferase (ALT) levels. The immune system is essential for controlling HBV infection; however, the parameters and combinations associated with achieving a functional cure are still not fully understood, and a deeper understanding of clinical and immune parameters is required[11]. GWAS have identified HLA genes, which regulate adaptive immunity by presenting processed antigens for T cell recognition, as critical genetic factors for HBV persistence. Several 3’-untranslated region (3’-UTR) variants in HLA-B, HLA-G, and HLA-DQA1 have been associated with spontaneous HBsAg seroclearance after adjustment for host factors (age, gender, and ethnicity) and viral factors (HBV genotype and viral load)[12].

Machine learning (ML) techniques have shown significant potential for capturing intricate patterns. Harnessed properly, these patterns can greatly enhance our understanding of diseases and support more precise diagnosis, personalized treatment recommendations, and accurate outcome predictions. ML models have achieved notable accuracy on HBV-related tasks such as early detection, risk assessment, and prediction of HBsAg seroclearance[11].

In this study, we performed multivariate statistical analysis using several hierarchical clustering analysis (HCA) algorithms and Pearson’s correlation in R (version 4.1.0)[13]. Chi-square statistics were computed with Yates’ continuity correction, generating Pyates. A bias-corrected correlation coefficient (Puncor) and Fisher’s r-to-z transformed correlation coefficient (Pfisher) were used as two alternative criteria. HCA was used to generate a dendrogram hierarchy of clusters. By combining dimensionality reduction methods [principal component analysis (PCA), correspondence analysis (CA), multiple correspondence analysis (MCA), factor analysis of mixed data (FAMD), and multiple factor analysis (MFA)] with hierarchical clustering and partitioning clustering, particularly the K-means method, we studied the complex correlation between genetic polymorphisms of HLA-DP and HLA-DQ and the risk of HBV-related cirrhosis and HCC in Viet Nam.
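The Pfisher criterion relies on Fisher’s r-to-z transformation of a Pearson correlation. A minimal Python sketch of that test for a single correlation coefficient follows, assuming the usual large-sample standard error 1/√(n − 3); the function name fisher_z_test is hypothetical and this is not the authors’ R code.

```python
import math


def fisher_z_test(r: float, n: int):
    """Test H0: rho = 0 for a Pearson correlation r from n pairs.

    Uses Fisher's r-to-z transform z = atanh(r) = 0.5 * ln((1 + r) / (1 - r)),
    whose standard error is approximately 1 / sqrt(n - 3).
    Returns (standardized statistic, two-sided P value).
    """
    if not -1.0 < r < 1.0 or n <= 3:
        raise ValueError("need |r| < 1 and n > 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    stat = z / se
    # Two-sided P value from the standard normal: 2 * (1 - Phi(|stat|)).
    p_two_sided = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_two_sided


stat, p = fisher_z_test(0.5, 28)
print(round(stat, 3), round(p, 4))
```

A one-sided P value is half the two-sided value when the correlation lies in the hypothesized direction.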

MATERIALS AND METHODS
Patients

This case-control study enrolled healthy controls and patients with chronic hepatitis B, HBV-related LC, and HBV-related HCC between 2020 and 2023. A total of 315 healthy controls were recruited from subjects who underwent routine physical examinations at Hanoi Medical University Hospital. None of the healthy controls had a history of hepatitis virus infection or other liver diseases. A total of 251 HBsAg-positive HCC patients were recruited from the Viet Nam National Cancer Hospital. In total, 721 patients with persistent HBV infection were enrolled from the National Hospital of Tropical Diseases, Thanh Nhan Hospital, and Thai Nguyen Hospital; of these, 471 had active chronic hepatitis B and 250 had LC. Persistent HBV infection was defined as positivity for both HBsAg and antibodies against the hepatitis B core antigen (anti-HBc) for at least six months, with or without HBeAg positivity.

Genomic DNA extraction

Genomic DNA was extracted from peripheral venous blood using the Wizard® Genomic DNA Purification Kit (A1125; Promega, United States). DNA was isolated from whole blood samples using the manual protocol provided by the manufacturer. The isolated DNA was stored at –20 °C until use.

HLA gene polymorphism typing

Three SNPs (rs2856718, rs3077, and rs9277535) were genotyped using the TaqMan allelic discrimination method on a QuantStudio 3 real-time PCR system (Applied Biosystems, United States), and the results were analyzed using allelic discrimination software (Applied Biosystems, United States). PCR reactions were performed in 20 µL of reaction mixture containing 50 ng of genomic DNA, 10 µL of 2X TaqMan™ Fast Advanced Master Mix (Applied Biosystems, United States), and 0.5 µL of 40X SNP-specific primer/probe assay. The primer/probe assays were HLA-DP rs3077 (C_11916951_10), HLA-DP rs9277535 (C_29715274_20), and HLA-DQ rs2856718 (C_27015374_30) (Applied Biosystems). The thermal cycling conditions were as follows: hold at 95 °C for 10 minutes, followed by 40 cycles of 95 °C for 15 seconds and 60 °C for 1 minute.

Statistical analysis and HCA

Statistical tests and cluster analyses were performed using R (version 4.4.1). In this correlation study, we used the effect size index d according to Cohen’s criterion, d = ∆/σ, where σ is the standard deviation (the square root of the variance) and ∆ is the influence of the risk factor (e.g., treatment or genotype) on the population phenotype. With μ1 denoting the mean of the patient group and μ2 that of the control group, ∆ = μ1 − μ2. The smaller ∆ is, the smaller the difference between groups and the larger the required sample size per group, according to the formula:

$$n = \frac{2\,(Z_{\alpha/2} + Z_{\beta})^{2}\,\sigma^{2}}{\Delta^{2}} = \frac{2\,(Z_{\alpha/2} + Z_{\beta})^{2}}{d^{2}}$$

where Zα/2 is the standard normal critical value for the two-sided significance level α (the probability of making a type I error), Zβ is the critical value corresponding to β (the probability of making a type II error), and the power of the study is 1 − β. Cohen’s effect size was calculated as d = ∆/σ[14]. We set d at a moderate level (d = 0.6), with a power of 90%, using a two-sided two-sample t test at a significance level of 0.05.
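The sample size formula above can be evaluated directly. The following is a minimal Python sketch of the normal approximation only, assuming a two-sided, two-sample comparison with equal group sizes; the function name n_per_group is hypothetical. R’s pwr.t.test solves the same problem with the noncentral t distribution, so it returns a slightly larger n (roughly one more subject per group for these inputs).

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Normal-approximation sample size per group for a two-sample comparison.

    n = 2 * (Z_{alpha/2} + Z_beta)^2 / d^2, where d = delta / sigma is Cohen's d
    and power = 1 - beta.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # critical value for power 1 - beta
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


print(n_per_group(0.6), n_per_group(0.5))  # 59 85
```

The required n is very sensitive to the chosen effect size: moving from d = 0.6 to d = 0.5 raises the approximate minimum from 59 to 85 per group.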

The distribution of genotypes between patients and controls was compared using the Chi-square test. Categorical and continuous variables were compared using the Chi-square test and Student’s t-test, respectively. Data were expressed as median (range) or number of cases. Statistical significance was defined as P < 0.05. Hardy-Weinberg equilibrium (HWE) of the genotype distributions and linkage disequilibrium (LD) between SNPs were examined. Chi-square statistics were computed using Yates’ correction for continuity, generating Pyates. The Pearson (product-moment) correlation coefficient is frequently used as the outcome measure in meta-analyses. Pearson’s method is advantageous when all or most of the nonzero parameters share the same sign, and it is useful in genomic settings for screening age-related genes, which is also our objective. Two alternative criteria were a bias-corrected version of the correlation coefficient (Puncor) and Fisher’s r-to-z transformed correlation coefficient (Pfisher).
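Pyates comes from a continuity-corrected chi-square test. A minimal Python sketch of Yates’ correction for a 2 × 2 genotype table follows; the function name yates_chi2 is hypothetical. As an illustration it uses the rs3077 AA and GG counts for CHB patients and healthy controls from Table 2. Because the exact test behind each reported P value is not stated, this sketch is not claimed to reproduce those values.

```python
import math


def yates_chi2(a: int, b: int, c: int, d: int):
    """Yates-corrected chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (chi-square statistic, P value with 1 degree of freedom).
    """
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    num = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = num / den
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: CHB, healthy control; columns: rs3077 AA, GG (counts from Table 2).
chi2, p = yates_chi2(29, 246, 43, 133)
print(round(chi2, 2), p)
```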

The main advantage of HCA is that it reveals possible correlations among several factors, providing reference markers that are useful for diagnostic control and for improving outcome prevention. Establishing the association between genetic characteristics and clinical outcomes otherwise requires several in vitro studies, which have their own limitations. Because HCA and K-means cannot handle missing or noisy data, the dataset must first be cleaned and prepared. We combined and validated the data using K-means clustering, which offers several options for the optimal number of clusters, to produce a PCA cluster plot and define the principal component positions. Because our data contained variables of different types, calculating the distance matrix for HCA and K-means was challenging. We calculated the distance between each observation and estimated the cluster distances from the remaining observations. The between-cluster distance (linkage method) can be complete, single, average, Ward, McQuitty, or centroid. A cluster tree was generated, and the correlation between the cophenetic distances and the original distance data was computed. The number of clusters was determined using K-means, which calculates clustering indices and reallocates observations to the closest cluster. The K-means computation was optimized using 20 indices to construct a PCA cluster plot that visualized the best cluster number. PCA is a dimensionality reduction method that transforms a large dataset into a smaller one that still contains most of the information in the original set[13].
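The linkage comparison and cophenetic check described above can be illustrated in Python with SciPy, where the McQuitty method is called "weighted" (WPGMA). This is a minimal sketch, not the authors’ R pipeline: the eight individuals and their A-allele counts are hypothetical toy data, and K-means and the 20-index selection are not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical toy data (not study data): one row per individual, one column per SNP
# (rs2856718, rs3077, rs9277535), coded as the number of A alleles (0, 1 or 2).
X = np.array([
    [0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1],
    [2, 2, 2], [2, 1, 2], [1, 2, 2], [2, 2, 1],
], dtype=float)

dist = pdist(X, metric="euclidean")  # condensed matrix of pairwise distances

results = {}
# "weighted" is WPGMA (McQuitty); "centroid" and "ward" assume Euclidean distances.
for method in ("single", "complete", "average", "weighted", "centroid", "ward"):
    Z = linkage(X, method=method, metric="euclidean")
    # Cophenetic correlation: how well tree distances preserve the original distances.
    coph_corr, _ = cophenet(Z, dist)
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram into 2 clusters
    results[method] = (coph_corr, labels)

# Prefer the linkage whose dendrogram best preserves the original distances.
best = max(results, key=lambda m: results[m][0])
print(best, round(float(results[best][0]), 3), results[best][1])
```

With mixed variable types, as in this study, the Euclidean distance on dosage codes would need to be replaced by a mixed-type dissimilarity; that step is not shown here.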

Cluster analysis was used to evaluate factors that may interact with gene variants to influence disease progression. Based on the clustering and decision tree results, we identified the most important factors in each patient group. We maximized multiple metrics simultaneously to obtain the optimal cut-off point and report the corresponding sensitivity and specificity in the receiver operating characteristic (ROC) plot.
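The exact set of metrics maximized for the ROC cut-off is not specified. The following is a minimal Python sketch of one common single criterion, Youden’s J = sensitivity + specificity − 1, shown as an assumption rather than as the authors’ method. The function name youden_cutoff and the toy scores are hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    scores: higher values indicate disease; labels: 1 = case, 0 = control.
    A sample is called positive when its score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sens, spec = tp / pos, tn / neg
        j = sens + spec - 1
        if best is None or j > best[0]:  # keep the first threshold with the highest J
            best = (j, t, sens, spec)
    return best[1:]


# Hypothetical toy scores that separate cases from controls perfectly.
print(youden_cutoff([0.1, 0.2, 0.3, 0.6, 0.7, 0.8], [0, 0, 0, 1, 1, 1]))  # (0.6, 1.0, 1.0)
```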

RESULTS

We set d at a moderate level (d = 0.6), with a power of 90%, in a two-sided two-sample t test at a significance level of 0.05. This assumes that the risk factor considered (HLA genotype) has a moderate or greater influence on the phenotype (HCC or cirrhosis progression). Using the formula in the Methods section and the pwr.t.test function, the minimum expected sample size for each group was 86 (Figure 1)[14].

Figure 1 Graphic of sample size estimation.
Subject characteristics and evaluation of the LD

The mean age was similar across the patient and participant groups. The liver enzymes ALT and aspartate aminotransferase (AST) differed significantly between the pathological and healthy groups, and between the cirrhosis group and the remaining disease groups (Table 1). We determined the distribution of genotype or haplotype combinations in each patient/participant group and found that haplotype A/G-G-G (in the order rs2856718, rs3077, and rs9277535) had the highest frequency, especially in the HCC group (19.1%, 48/251 patients) (Figure 2).

Figure 2 The distribution of genotype or haplotype combinations in each patient/participant group, in which haplotype A/G-G-G (in the order rs2856718, rs3077, rs9277535) had the highest frequency, especially in the hepatocellular carcinoma group (19.1%, 48/251 patients). A: The distribution of genotype or haplotype combinations by number of patients; B: The distribution of genotype or haplotype combinations by prevalence (%).
Table 1 Clinical characteristics of study subjects.
Variable | Healthy control (n = 315) | CHB (n = 471) | HBV-related LC (n = 250) | HBV-related HCC (n = 251) | HBV carrier (n = 972)
Age, mean ± SD | 52.78 ± 11.63 | 50.43 ± 14.67 | 54.75 ± 13.36 | 55.99 ± 9.91 | 52.98 ± 13.48
Age < 60 | 237 | 353 | 165 | 165 |
Age ≥ 60 | 78 | 118 | 85 | 86 |
Gender, male | 244 | 367 | 190 | 212 | 769
Gender, female, n (%) | 71 (22.5) | 104 (22.1) | 60 (24.0) | 39 (15.5) | 203 (20.9)
ALT | 24.92 ± 11.93 | 87.79 ± 168.66 | 388.67 ± 442.21 | 73.19 ± 73.53 |
AST | 24.91 ± 7.68 | 70.89 ± 112.36 | 362.19 ± 401.41 | 87.89 ± 177.25 |

Examining each variant individually, we identified several markers specific to each group. For the risk of chronic hepatitis B, the AA genotype and A allele of rs3077 and the AA genotype and A allele of rs9277535 appeared to be protective factors (OR < 1, P < 0.05), both when healthy controls were compared with the CHB group specifically and when they were compared with all HBV carriers (Table 2). In addition, the AG genotype of rs3077 appeared to protect against progression from chronic hepatitis B or cirrhosis to liver cancer (OR < 1, P < 0.05) (Table 3). By contrast, the AA genotype of rs9277535 was a risk factor for developing cirrhosis in patients with chronic hepatitis B (OR > 1, P < 0.05) (Table 3).

Table 2 Correlation of human leukocyte antigen-DP and human leukocyte antigen-DQ polymorphisms with the risk of chronic active hepatitis B, n (%).
SNPs | Healthy control | CHB | HBV-related LC | HBV-related HCC | Healthy control vs CHB: OR (95%CI) | P value | Healthy control vs HBV carrier: OR (95%CI) | P value
rs2856718
GG | 106 (33.7) | 173 (36.7) | 100 (40.0) | 87 (34.7) | 1 | | 1 |
AG | 149 (47.3) | 208 (44.2) | 97 (38.8) | 119 (47.4) | 0.855 (0.621-1.179) | 0.3392 | 0.838 (0.630-1.115) | 0.2253
AA | 60 (19.0) | 90 (19.1) | 53 (21.2) | 45 (17.9) | 0.919 (0.612-1.380) | 0.6840 | 0.923 (0.642-1.326) | 0.6631
AA + AG | 209 (66.3) | 298 (63.3) | 150 (60.0) | 164 (65.3) | 0.874 (0.648-1.179) | 0.3767 | 0.862 (0.660-1.127) | 0.2774
G allele | 361 (57.3) | 554 (58.8) | 297 (59.4) | 293 (58.4) | 1 | | 1 |
A allele | 269 (42.7) | 384 (41.2) | 203 (40.6) | 209 (41.6) | 0.930 (0.758-1.141) | 0.4882 | 0.939 (0.782-1.126) | 0.4937
rs3077
GG | 133 (42.2) | 246 (52.2) | 128 (51.2) | 152 (60.5) | 1 | | 1 |
AG | 139 (44.1) | 196 (41.6) | 107 (42.8) | 84 (33.5) | 0.762 (0.563-1.032) | 0.0702 | 0.704 (0.537-0.924) | 0.0113
AA | 43 (13.7) | 29 (6.2) | 15 (6.0) | 15 (6.0) | 0.365 (0.218-0.611) | 0.0001 | 0.347 (0.224-0.537) | < 0.0001
AA + AG | 182 (57.8) | 225 (47.8) | 122 (48.8) | 99 (39.5) | 0.668 (0.501-0.891) | 0.0060 | 0.620 (0.479-0.801) | 0.0003
G allele | 405 (64.3) | 688 (73.0) | 364 (72.5) | 388 (77.3) | 1 | | 1 |
A allele | 225 (35.7) | 254 (27.0) | 138 (27.5) | 114 (22.7) | 0.665 (0.535-0.826) | 0.0002 | 0.632 (0.521-0.765) | < 0.0001
rs9277535
GG | 159 (50.5) | 275 (58.4) | 142 (56.8) | 154 (61.4) | 1 | | 1 |
AG | 123 (39.0) | 187 (39.7) | 95 (38.0) | 88 (35.0) | 0.879 (0.651-1.186) | 0.3993 | 0.838 (0.640-1.097) | 0.1972
AA | 33 (10.5) | 9 (1.9) | 13 (5.2) | 9 (3.6) | 0.158 (0.074-0.338) | < 0.0001 | 0.262 (0.155-0.440) | < 0.0001
AA + AG | 156 (49.5) | 196 (41.6) | 108 (43.2) | 97 (38.6) | 0.726 (0.545-0.968) | 0.0290 | 0.716 (0.555-0.924) | 0.0102
G allele | 441 (70.0) | 737 (78.1) | 379 (75.8) | 396 (78.9) | 1 | | 1 |
A allele | 189 (30.0) | 205 (21.9) | 121 (24.2) | 106 (21.1) | 0.649 (0.516-0.817) | 0.0002 | 0.667 (0.545-0.815) | 0.0001
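The OR and 95%CI entries in Table 2 follow the standard 2 × 2 odds ratio with a log-based (Woolf) interval. The following minimal Python sketch rests on that assumption (the function name odds_ratio_ci is hypothetical). With the rs3077 AA vs GG counts for CHB patients vs healthy controls, it returns the values printed in Table 2.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a: cases with the test genotype, b: cases with the reference genotype,
    c: controls with the test genotype, d: controls with the reference genotype.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# rs3077 AA (test) vs GG (reference); CHB cases vs healthy controls, counts from Table 2.
print([round(v, 3) for v in odds_ratio_ci(29, 246, 43, 133)])  # [0.365, 0.218, 0.611]
```

Pooling the three patient groups (AA = 29 + 15 + 15, GG = 246 + 128 + 152) in the same function gives the healthy control vs HBV carrier column, 0.347 (0.224-0.537).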
Table 3 Correlation of human leukocyte antigen-DP and human leukocyte antigen-DQ polymorphisms with the risk of hepatitis B virus-related liver cirrhosis and hepatocellular carcinoma.
SNPs | CHB vs HBV-related LC: OR (95%CI) | P value | CHB vs HBV-related HCC: OR (95%CI) | P value | HBV-related LC vs HBV-related HCC: OR (95%CI) | P value
rs2856718
GG | 1 | | 1 | | 1 |
AG | 0.807 (0.572-1.139) | 0.2219 | 1.138 (0.808-1.602) | 0.4601 | 1.410 (0.952-2.089) | 0.0865
AA | 1.019 (0.670-1.549) | 0.9307 | 1.006 (0.647-1.564) | 0.9795 | 0.976 (0.598-1.594) | 0.9224
AA + AG | 0.871 (0.637-1.193) | 0.3891 | 1.094 (0.794-1.508) | 0.5813 | 1.257 (0.874-1.806) | 0.2170
G allele | 1 | | 1 | | 1 |
A allele | 0.986 (0.791-1.230) | 0.9011 | 1.029 (0.826-1.282) | 0.7984 | 1.044 (0.811-1.342) | 0.7396
rs3077
GG | 1 | | 1 | | 1 |
AG | 1.049 (0.763-1.442) | 0.7673 | 0.694 (0.501-0.961) | 0.0278 | 0.661 (0.457-0.957) | 0.0284
AA | 0.994 (0.514-1.921) | 0.9859 | 0.837 (0.435-1.612) | 0.5949 | 0.842 (0.397-1.789) | 0.6548
AA + AG | 1.042 (0.767-1.416) | 0.7924 | 0.712 (0.522-0.972) | 0.0324 | 0.683 (0.479-0.974) | 0.0352
G allele | 1 | | 1 | | 1 |
A allele | 1.027 (0.805-1.310) | 0.8305 | 0.791 (0.613-1.02) | 0.0705 | 0.775 (0.582-1.032) | 0.0810
rs9277535
GG | 1 | | 1 | | 1 |
AG | 0.984 (0.715-1.354) | 0.9204 | 0.840 (0.610-1.159) | 0.2883 | 0.854 (0.591-1.235) | 0.4022
AA | 2.797 (1.168-6.702) | 0.0210 | 1.786 (0.694-4.593) | 0.2290 | 0.638 (0.265-1.539) | 0.3174
AA + AG | 1.067 (0.783-1.455) | 0.6814 | 0.884 (0.646-1.209) | 0.4393 | 0.828 (0.580-1.183) | 0.3001
G allele | 1 | | 1 | | 1 |
A allele | 1.148 (0.888-1.484) | 0.2924 | 0.962 (0.739-1.254) | 0.7759 | 0.838 (0.623-1.128) | 0.2438

It is necessary to determine whether these ORs reflect a genuine effect of genotype on the pathological condition or simply the distribution of allele frequencies in the general population. To this end, we tested linkage disequilibrium and Hardy-Weinberg equilibrium in each sample group. The genotype distributions of the three SNPs (rs2856718, rs3077, and rs9277535) were consistent with HWE, showing no significant difference between observed and expected genotype frequencies in the HCC, cirrhosis, chronic hepatitis B, and control groups (P > 0.05). The physical distance separating these SNPs was 29816 kb, and the D’ value ranged from 0.07 to 0.34. Disequilibrium is considered strong when D’ ≥ 0.80, moderate when D’ is around 0.50, and weak when D’ is close to 0. Notably, the close linkage between rs3077 and rs9277535 reflects their short separation of 21 kb. D’ decreased from the healthy controls (D’ = 0.50, P < 0.05) to the hepatic disease group (D’ < 0.3, P < 0.05), indicating a shift from moderate to weak disequilibrium (Figure 3 and Table 4). The variant rs2856718 showed moderate LD with rs3077 (D’ from 0.27 to 0.34) but very weak LD with rs9277535 (D’ from 0.07 to 0.20). This suggests that rs2856718 acts independently on disease status, except when a specific rs3077 genotype influences it.
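The HWE check can be sketched as a chi-square goodness-of-fit test with one degree of freedom. The following is a minimal Python illustration; the function name hwe_chi2 is hypothetical, and the authors’ exact HWE procedure in R is not specified. It uses the rs3077 genotype counts of the healthy controls from Table 2 (GG 133, AG 139, AA 43), for which the test is not significant, in line with the P > 0.05 reported above.

```python
import math


def hwe_chi2(n_hom1: int, n_het: int, n_hom2: int):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    n_hom1, n_het, n_hom2: counts of the two homozygotes and the heterozygote.
    Returns (chi-square statistic, P value).
    """
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of the first allele
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# rs3077 in healthy controls (Table 2): GG = 133, AG = 139, AA = 43.
print(hwe_chi2(133, 139, 43))
```

When some genotype classes are rare, an exact HWE test is often preferred over this chi-square approximation.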

Figure 3 Linkage disequilibrium pairwise map. The physical distance separating these single nucleotide polymorphisms (SNPs) was 29,816 kb, and the D’ value ranged from 0.07 to 0.42. Disequilibrium is strong when D’ ≥ 0.80, moderate when D’ is around 0.50, and weak when D’ is close to 0. Notably, the close linkage between rs3077 and rs9277535 reflects their 21-kb separation. D’ decreased from the healthy controls (D’ = 0.50, P < 0.05) to the hepatic disease group (D’ < 0.3, P < 0.05), that is, from moderate to weak disequilibrium. The variant rs2856718 showed moderate linkage disequilibrium with rs3077 (D’ from 0.27 to 0.34) but very weak linkage disequilibrium with rs9277535 (D’ from 0.07 to 0.20), suggesting that this SNP is independent of disease status except when influenced by a specific rs3077 genotype.
Table 4 Pairwise linkage disequilibrium results.
Group | Statistic | rs2856718 vs rs3077 | rs2856718 vs rs9277535 | rs3077 vs rs9277535
Hepatitis | D | -0.038 | -0.006 | 0.072
Hepatitis | D' | 0.340 | 0.069 | 0.450
Hepatitis | Corr. (R) | -0.173 | -0.030 | 0.391
Hepatitis | R² | 0.030 | 0.001 | 0.153
Hepatitis | χ² | 28.168 | 0.874 | 143.729
Hepatitis | P value | < 0.0001 | 0.350 | < 2.2 × 10⁻¹⁶
Hepatitis | n | 471 | 471 | 471
Cirrhosis | D | -0.032 | -0.012 | 0.074
Cirrhosis | D' | 0.288 | 0.117 | 0.419
Cirrhosis | Corr. (R) | -0.146 | -0.055 | 0.385
Cirrhosis | R² | 0.021 | 0.003 | 0.148
Cirrhosis | χ² | 10.691 | 1.498 | 74.223
Cirrhosis | P value | 0.001 | 0.221 | < 2.2 × 10⁻¹⁶
Cirrhosis | n | 250 | 250 | 250
HCC | D | -0.028 | -0.015 | 0.066
HCC | D' | 0.298 | 0.166 | 0.405
HCC | Corr. (R) | -0.136 | -0.073 | 0.387
HCC | R² | 0.019 | 0.005 | 0.150
HCC | χ² | 9.352 | 2.651 | 75.122
HCC | P value | 0.002 | 0.103 | < 2.2 × 10⁻¹⁶
HCC | n | 251 | 251 | 251
Healthy control | D | -0.041 | -0.025 | 0.109
Healthy control | D' | 0.272 | 0.198 | 0.565
Healthy control | Corr. (R) | -0.175 | -0.112 | 0.496
Healthy control | R² | 0.031 | 0.013 | 0.246
Healthy control | χ² | 19.260 | 7.903 | 155.016
Healthy control | P value | < 0.0001 | 0.005 | < 2.2 × 10⁻¹⁶
Healthy control | n | 315 | 315 | 315
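The quantities in Table 4 are linked by standard two-locus formulas: D = pAB − pA·pB, D’ = D/Dmax, r = D/√(pA(1 − pA)pB(1 − pB)), and χ² ≈ 2n·r² over the 2n chromosomes. The following minimal Python sketch computes them; the function name ld_stats is hypothetical. Because haplotype frequencies are not reported, the example back-calculates pAB from the healthy-control D for rs3077 vs rs9277535 and the A-allele frequencies in Table 2 (225/630 and 189/630). It recovers D’ and r as listed, and χ² agrees up to the rounding of D.

```python
import math


def ld_stats(p_ab: float, p_a: float, p_b: float, n_chrom: int):
    """Pairwise LD statistics from a haplotype frequency and two allele frequencies.

    p_ab: frequency of the haplotype carrying allele a (locus 1) and allele b (locus 2).
    p_a, p_b: frequencies of those alleles; n_chrom: number of chromosomes (2 x individuals).
    Returns (D, |D'|, r, r^2, chi-square = n_chrom * r^2).
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max  # Table 4 reports D' as a non-negative value
    r = d / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r, r * r, n_chrom * r * r


p_a, p_b = 225 / 630, 189 / 630  # A-allele frequencies of rs3077 and rs9277535 in controls
d, d_prime, r, r2, chi2 = ld_stats(0.109 + p_a * p_b, p_a, p_b, 630)
print(round(d_prime, 3), round(r, 3))  # 0.565 0.496
```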

We performed a risk ratio (RR) test to better understand the disease risk associated with genotype or haplotype combinations of the three variants. Detailed results are presented in Table 5 and Figure 4. When the chronic hepatitis B group was compared with the other groups and the healthy control group, the G-A/G-G haplotype [RR = 1.31 (1.05; 1.64), P < 0.01] increased the risk of chronic hepatitis B by 31% (Figure 4A). This haplotype also increased the risk of overall liver disease by 12% relative to healthy individuals [RR = 1.12 (1.02; 1.23), P < 0.05] (Figure 4D). In liver cancer, the A-A-A/G haplotype increased the risk by 158% [RR = 2.58 (1.15; 5.79), P < 0.05] (Figure 4B). The A/G-A-A, A-A-A, A/G-A/G-A, G-A-A, and A-A/G-G haplotypes significantly reduced the risk of liver disease, by 18% [RR = 0.82 (0.63; 1.08)] to 71% [RR = 0.29 (0.09; 0.99)] (P < 0.01 and P < 0.05) (Figure 4D and Table 5).
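The exact RR and confidence-interval method is not stated. The following minimal Python sketch assumes the common log-based (Katz) interval for a risk ratio from a 2 × 2 table; the function name risk_ratio_ci and the counts in the example are hypothetical and not taken from Table 5.

```python
import math


def risk_ratio_ci(a: int, n1: int, c: int, n2: int, z: float = 1.96):
    """Risk ratio of group 1 vs group 2 with a log-based (Katz) confidence interval.

    a of n1 subjects in group 1 and c of n2 subjects in group 2 have the outcome.
    """
    rr = (a / n1) / (c / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)  # standard error of log(RR)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts: 10/100 carriers vs 5/100 non-carriers develop the outcome.
print([round(v, 3) for v in risk_ratio_ci(10, 100, 5, 100)])
```

An RR of 2.58 corresponds to a 158% increase in risk, and an RR of 0.29 to a 71% reduction, which is how the percentages in the text above relate to the RR values.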

Figure 4 Forest plots showing the results of the risk ratio test for haplotype or genotype combinations of the three variants (in the order rs2856718, rs3077, rs9277535). A: Chronic hepatitis B group; B: Hepatitis B virus (HBV)-related hepatocarcinoma group; C: HBV-related liver cirrhosis group; D: Healthy control group.
Table 5 Risk ratio of haplotype rs2856718-rs3077-rs9277535.
Haplotypes (rs2856718_rs3077_rs9277535), columns 1 to 27 in order: A/G_A/G_A | A/G_A/G_A/G | A/G_A/G_G | A/G_A_A | A/G_A_A/G | A/G_A_G | A/G_G_A | A/G_G_A/G | A/G_G_G | A_A/G_A | A_A/G_A/G | A_A/G_G | A_A_A | A_A_A/G | A_A_G | A_G_A | A_G_A/G | A_G_G | G_A/G_A | G_A/G_A/G | G_A/G_G | G_A_A | G_A_A/G | G_A_G | G_G_A | G_G_A/G | G_G_G
Hepatitis
N | 2 | 51 | 30 | NA | 4 | 3 | NA | 35 | 83 | 1 | 17 | 11 | 1 | 2 | 1 | NA | 12 | 45 | 1 | 37 | 46 | 4 | 11 | 3 | NA | 18 | 53
% | 0.4 | 10.8 | 6.4 | NA | 0.8 | 0.6 | NA | 7.4 | 17.6 | 0.2 | 3.6 | 2.3 | 0.2 | 0.4 | 0.2 | NA | 2.5 | 9.6 | 0.2 | 7.9 | 9.8 | 0.8 | 2.3 | 0.6 | NA | 3.8 | 11.3
Pfisher | 1.15 × 10⁻¹ | 3.89 × 10⁻¹ | 3.24 × 10⁻¹ | 1.53 × 10⁻² | 2.96 × 10⁻¹ | 3.78 × 10⁻¹ | 1.52 × 10⁻¹ | 2.85 × 10⁻¹ | 1.95 × 10⁻¹ | 5.00 × 10⁻¹ | 1.54 × 10⁻¹ | 4.27 × 10⁻¹ | 2.12 × 10⁻¹ | 5.00 × 10⁻¹ | 5.00 × 10⁻¹ | 2.68 × 10⁻¹ | 2.83 × 10⁻¹ | 3.86 × 10⁻¹ | 8.36 × 10⁻² | 3.38 × 10⁻¹ | 1.47 × 10⁻² | 1.59 × 10⁻¹ | 4.24 × 10⁻¹ | 5.00 × 10⁻¹ | 5.00 × 10⁻¹ | 3.19 × 10⁻¹ | 3.60 × 10⁻¹
Pyates | 1.27 × 10⁻¹ | 4.18 × 10⁻¹ | 3.18 × 10⁻¹ | 2.62 × 10⁻² | 2.97 × 10⁻¹ | 3.71 × 10⁻¹ | 1.58 × 10⁻¹ | 3.30 × 10⁻¹ | 2.05 × 10⁻¹ | 5.00 × 10⁻¹ | 1.88 × 10⁻¹ | 4.69 × 10⁻¹ | 2.77 × 10⁻¹ | 5.00 × 10⁻¹ | 5.00 × 10⁻¹ | 3.67 × 10⁻¹ | 3.65 × 10⁻¹ | 3.97 × 10⁻¹ | 1.06 × 10⁻¹ | 3.64 × 10⁻¹ | 1.78 × 10⁻² | 1.91 × 10⁻¹ | 5.00 × 10⁻¹ | 5.00 × 10⁻¹ | 5.00 × 10⁻¹ | 3.73 × 10⁻¹ | 3.76 × 10⁻¹
Puncor | 7.49 × 10⁻² | 3.82 × 10⁻¹ | 2.79 × 10⁻¹ | 1.11 × 10⁻² | 2.11 × 10⁻¹ | 2.60 × 10⁻¹ | 6.40 × 10⁻² | 2.90 × 10⁻¹ | 1.83 × 10⁻¹ | 3.15 × 10⁻¹ | 1.45 × 10⁻¹ | 3.96 × 10⁻¹ | 1.55 × 10⁻¹ | 4.34 × 10⁻¹ | 3.47 × 10⁻¹ | 1.41 × 10⁻¹ | 2.94 × 10⁻¹ | 3.61 × 10⁻¹ | 5.56 × 10⁻² | 3.25 × 10⁻¹ | 1.35 × 10⁻² | 1.30 × 10⁻¹ | 4.40 × 10⁻¹ | 4.79 × 10⁻¹ | 2.24 × 10⁻¹ | 3.15 × 10⁻¹ | 3.42 × 10⁻¹
RR | 0.45 | 1.04 | 0.92 | NA | 0.73 | 0.74 | NA | 1.08 | 1.09 | 0.68 | 1.23 | 0.94 | 0.45 | 0.91 | 1.37 | NA | 1.13 | 0.96 | 0.3 | 0.94 | 1.31 | 0.64 | 1.04 | 1.02 | NA | 1.1 | 0.95
Lower | 0.13 | 0.82 | 0.68 | NA | 0.31 | 0.28 | NA | 0.83 | 0.91 | 0.12 | 0.86 | 0.58 | 0.08 | 0.29 | 0.34 | NA | 0.73 | 0.75 | 0.05 | 0.72 | 1.05 | 0.27 | 0.65 | 0.42 | NA | 0.76 | 0.76
Upper | 1.61 | 1.3 | 1.24 | NA | 1.69 | 1.96 | NA | 1.41 | 1.31 | 3.73 | 1.77 | 1.52 | 2.72 | 2.83 | 5.48 | NA | 1.76 | 1.22 | 1.92 | 1.23 | 1.64 | 1.51 | 1.66 | 2.51 | NA | 1.58 | 1.2
HCC
N | 2 | 23 | 18 | 1 | 3 | 2 | 2 | 20 | 48 | NA | 5 | NA | NA | 3 | 1 | NA | 5 | 31 | 2 | 18 | 16 | 1 | 3 | 1 | 1 | 8 | 37
% | 0.8 | 9.2 | 7.2 | 0.4 | 1.2 | 0.8 | 0.8 | 8 | 19.1 | NA | 2 | NA | NA | 1.2 | 0.4 | NA | 2 | 12.4 | 0.8 | 7.2 | 6.4 | 0.4 | 1.2 | 0.4 | 0.4 | 3.2 | 14.7
Pfisher | 5.00 × 10⁻¹ | 2.46 × 10⁻¹ | 4.45 × 10⁻¹ | 5.00 × 10⁻¹ | 5.00 × 10⁻¹ | 5.00 × 10⁻¹ | 8.65 × 10⁻² | 2.44 × 10⁻¹ | 1.08 × 10⁻¹ | 5.00 × 10⁻¹ | 2.04 × 10⁻¹ | 6.15 × 10⁻⁴ | 3.02 × 10⁻¹ | 4.62 × 10⁻² | 1.76 × 10⁻¹ | 5.00 × 10⁻¹ | 5.00 × 10⁻¹ | 7.96 × 10⁻² | 3.45 × 10⁻¹ | 2.63 × 10⁻¹ | 2.53 × 10⁻¹ | 1.10 × 10⁻¹ | 1.70 × 10⁻¹ | 5.00 × 10⁻¹ | 9.75 × 10⁻² | 5.00 × 10⁻¹ | 5.09 × 10⁻²
Pyates | 5.00 × 10⁻¹ | 2.58 × 10⁻¹ | 4.84 × 10⁻¹ | 4.15 × 10⁻¹ | 5.00 × 10⁻¹ | 5.00 × 10⁻¹ | 1.81 × 10⁻¹ | 2.76 × 10⁻¹ | 1.14 × 10⁻¹ | 3.62 × 10⁻¹ | 2.14 × 10⁻¹ | 4.75 × 10⁻³ | 2.44 × 10⁻¹ | 8.48 × 10⁻² | 4.22 × 10⁻¹ | 5.00 × 10⁻¹ | 4.71 × 10⁻¹ | 9.65 × 10⁻² | 5.00 × 10⁻¹ | 2.73 × 10⁻¹ | 2.44 × 10⁻¹ | 1.32 × 10⁻¹ | 1.53 × 10⁻¹ | 4.79 × 10⁻¹ | 2.21 × 10⁻¹ | 4.58 × 10⁻¹ | 6.16 × 10⁻²
Puncor | 4.02 × 10⁻¹ | 2.22 × 10⁻¹ | 4.29 × 10⁻¹ | 2.62 × 10⁻¹ | 4.81 × 10⁻¹ | 4.56 × 10⁻¹ | 6.16 × 10⁻² | 2.32 × 10⁻¹ | 9.65 × 10⁻² | 1.62 × 10⁻¹ | 1.58 × 10⁻¹ | 2.40 × 10⁻³ | 1.13 × 10⁻¹ | 2.94 × 10⁻² | 1.38 × 10⁻¹ | 2.43 × 10⁻¹ | 3.78 × 10⁻¹ | 7.79 × 10⁻² | 4.18 × 10⁻¹ | 2.32 × 10⁻¹ | 2.05 × 10⁻¹ | 7.68 × 10⁻² | 1.04 × 10⁻¹ | 3.08 × 10⁻¹ | 2.11 × 10⁻² | 3.83 × 10⁻¹ | 4.94 × 10⁻²
RR | 0.85 | 0.86 | 1.04 | 0.57 | 1.03 | 0.93 | 2.58 | 1.17 | 1.21 | NA | 0.67 | NA | NA | 2.58 | 2.57 | NA | 0.88 | 1.28 | 1.14 | 0.85 | 0.83 | 0.3 | 0.52 | 0.64 | 5.14 | 0.91 | 1.3
Lower | 0.24 | 0.58 | 0.68 | 0.09 | 0.37 | 0.26 | 0.96 | 0.78 | 0.91 | NA | 0.29 | NA | NA | 1.15 | 0.64 | NA | 0.39 | 0.92 | 0.33 | 0.55 | 0.52 | 0.04 | 0.18 | 0.1 | 4.6 | 0.48 | 0.96
Upper | 3.04 | 1.27 | 1.6 | 3.62 | 2.84 | 3.28 | 6.91 | 1.74 | 1.59 | NA | 1.52 | NA | NA | 5.79 | 10.32 | NA | 1.97 | 1.77 | 3.89 | 1.32 | 1.31 | 2.01 | 1.54 | 4.01 | 5.75 | 1.72 | 1.76
Cirrhosis
N | 1 | 21 | 19 | 1 | 2 | 1 | 1 | 17 | 36 | 2 | 9 | 9 | 1 | NA | NA | 1 | 6 | 24 | 2 | 22 | 20 | 4 | 6 | 1 | NA | 12 | 32
% | 0.4 | 8.4 | 7.6 | 0.4 | 0.8 | 0.4 | 0.4 | 6.8 | 14.4 | 0.8 | 3.6 | 3.6 | 0.4 | NA | NA | 0.4 | 2.4 | 9.6 | 0.8 | 8.8 | 8 | 1.6 | 2.4 | 0.4 | NA | 4.8 | 12.8
Pfisher | 2.40 × 10⁻¹ | 1.26 × 10⁻¹ | 3.38 × 10⁻¹ | 5.00 × 10⁻¹ | 3.74 × 10⁻¹ | 3.51 × 10⁻¹ | 2.89 × 10⁻¹ | 5.00 × 10⁻¹ | 1.96 × 10⁻¹ | 8.59 × 10⁻² | 2.66 × 10⁻¹ | 1.27 × 10⁻¹ | 5.00 × 10⁻¹ | 3.02 × 10⁻¹ | 5.00 × 10⁻¹ | 1.75 × 10⁻¹ | 4.07 × 10⁻¹ | 4.53 × 10⁻¹ | 3.45 × 10⁻¹ | 3.99 × 10⁻¹ | 3.95 × 10⁻¹ | 3.78 × 10⁻¹ | 4.07 × 10⁻¹ | 5.00 × 10⁻¹ | 5.00 × 10⁻¹ | 1.24 × 10⁻¹ | 2.92 × 10⁻¹
Pyates | 2.71 × 10⁻¹ | 1.39 × 10⁻¹ | 3.68 × 10⁻¹ | 4.17 × 10⁻¹ | 3.93 × 10⁻¹ | 3.13 × 10⁻¹ | 5.00 × 10⁻¹ | 5.00 × 10⁻¹ | 1.97 × 10⁻¹ | 1.80 × 10⁻¹ | 3.21 × 10⁻¹ | 1.51 × 10⁻¹ | 5.00 × 10⁻¹ | 2.46 × 10⁻¹ | 5.00 × 10⁻¹ | 4.21 × 10⁻¹ | 5.00 × 10⁻¹ | 4.66 × 10⁻¹ | 5.00 × 10⁻¹ | 4.28 × 10⁻¹ | 4.51 × 10⁻¹ | 4.51 × 10⁻¹ | 5.00 × 10⁻¹ | 4.81 × 10⁻¹ | 5.00 × 10⁻¹ | 1.45 × 10⁻¹ | 3.17 × 10⁻¹
Puncor | 1.65 × 10⁻¹ | 1.15 × 10⁻¹ | 3.17 × 10⁻¹ | 2.63 × 10⁻¹ | 2.74 × 10⁻¹ | 1.92 × 10⁻¹ | 3.89 × 10⁻¹ | 4.68 × 10⁻¹ | 1.71 × 10⁻¹ | 6.08 × 10⁻² | 2.50 × 10⁻¹ | 1.04 × 10⁻¹ | 4.32 × 10⁻¹ | 1.14 × 10⁻¹ | 2.44 × 10⁻¹ | 1.37 × 10⁻¹ | 4.31 × 10⁻¹ | 4.19 × 10⁻¹ | 4.16 × 10⁻¹ | 3.78 × 10⁻¹ | 3.99 × 10⁻¹ | 3.33 × 10⁻¹ | 4.31 × 10⁻¹ | 3.10 × 10⁻¹ | 3.12 × 10⁻¹ | 1.06 × 10⁻¹ | 2.80 × 10⁻¹
RR | 0.43 | 0.78 | 1.11 | 0.57 | 0.68 | 0.47 | 1.29 | 0.98 | 0.86 | 2.59 | 1.23 | 1.46 | 0.86 | NA | NA | 2.58 | 1.07 | 0.96 | 1.15 | 1.06 | 1.06 | 1.21 | 1.07 | 0.64 | NA | 1.39 | 1.1
Lower | 0.07 | 0.52 | 0.73 | 0.09 | 0.19 | 0.07 | 0.24 | 0.63 | 0.62 | 0.96 | 0.69 | 0.83 | 0.14 | NA | NA | 0.64 | 0.52 | 0.66 | 0.34 | 0.72 | 0.7 | 0.51 | 0.52 | 0.1 | NA | 0.85 | 0.79
Upper | 2.8 | 1.18 | 1.68 | 3.63 | 2.5 | 3.03 | 7.06 | 1.53 | 1.18 | 6.94 | 2.2 | 2.58 | 5.15 | NA | NA | 10.36 | 2.19 | 1.4 | 3.91 | 1.57 | 1.59 | 2.88 | 2.19 | 4.03 | NA | 2.29 | 1.54
Healthy controlN7402276511744171241NA162843016893NA729
%2.212.772.21.91.60.35.4140.32.23.81.30.3NA0.31.98.91.39.55.12.52.91NA2.29.2
Pfisher6.13 × 10-037.00 × 10-025.00 × 10-015.62 × 10-041.11 × 10-017.52 × 10-025.00 × 10-011.26 × 10-019.50 × 10-025.00 × 10-012.24 × 10-014.77 × 10-021.73 × 10-025.00 × 10-015.00 × 10-012.15 × 10-014.14 × 10-012.58 × 10-011.17 × 10-012.05 × 10-012.55 × 10-022.15 × 10-021.94 × 10-012.07 × 10-015.00 × 10-011.08 × 10-016.52 × 10-02
Pyates8.12 × 10-038.59 × 10-025.00 × 10-014.14 × 10-041.35 × 10-011.01 × 10-015.00 × 10-011.37 × 10-011.05 × 10-015.00 × 10-012.45 × 10-016.34 × 10-022.66 × 10-025.00 × 10-015.00 × 10-014.93 × 10-013.97 × 10-012.70 × 10-011.56 × 10-012.18 × 10-013.36 × 10-022.90 × 10-022.70 × 10-013.27 × 10-015.00 × 10-011.07 × 10-016.65 × 10-02
Puncor3.06 × 10-037.05 × 10-024.78 × 10-019.49 × 10-057.98 × 10-025.20 × 10-024.90 × 10-011.11 × 10-019.04 × 10-024.90 × 10-011.89 × 10-014.13 × 10-027.99 × 10-033.28 × 10-012.10 × 10-012.00 × 10-013.16 × 10-012.35 × 10-018.10 × 10-021.85 × 10-012.55 × 10-021.46 × 10-022.03 × 10-011.95 × 10-012.85 × 10-017.83 × 10-025.44 × 10-02
RR0.550.921.000.290.790.720.991.081.060.991.080.820.441.101.320.661.051.040.730.951.120.700.910.831.321.121.08
Lower0.280.820.880.090.520.420.560.970.980.560.930.630.140.771.280.170.870.940.410.841.020.450.710.481.280.990.99
Upper1.071.041.130.991.201.241.751.201.141.751.261.081.371.581.372.651.271.141.321.071.231.091.171.421.371.281.18
HCA

Cluster analysis was used to identify factors that may interact with gene variants to influence disease progression. Pearson's correlation was tested in each group; the results are shown in Figures 5A, 6A, 7A, and 8A for the hepatitis B, cirrhosis, liver cancer, and healthy control groups, respectively. In all groups, the variant pair rs3077-rs9277535 showed a significant positive correlation, with R ranging from 0.5 in the healthy control group to 0.3 in the disease groups (P < 0.01). A weak but consistent negative correlation was found between rs2856718 and rs3077, with R from –0.17 to –0.12 (P < 0.001) (Figures 5A, 6A, 7A, and 8A).
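The correlation step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes an additive genotype coding (number of A alleles: GG = 0, AG = 1, AA = 2), which the text does not specify, and uses toy genotypes rather than study data. In practice, `scipy.stats.pearsonr` returns the same coefficient together with a P value.

```python
from math import sqrt

# Assumed additive coding: count of A alleles in each genotype (not stated in the source).
GENOTYPE_CODE = {"GG": 0, "AG": 1, "GA": 1, "AA": 2}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def genotype_correlation(snp1, snp2):
    """Correlate two SNPs after converting genotype strings to allele counts."""
    return pearson_r([GENOTYPE_CODE[g] for g in snp1], [GENOTYPE_CODE[g] for g in snp2])


# Toy genotypes for ten hypothetical individuals (not study data).
rs3077 = ["AA", "AG", "GG", "AG", "AA", "GG", "AG", "AG", "GG", "AA"]
rs9277535 = ["AG", "AG", "GG", "AA", "AA", "AG", "AG", "GG", "GG", "AA"]
r = genotype_correlation(rs3077, rs9277535)
print(f"r = {r:.3f}")  # r = 0.667
```

A matrix of such coefficients across all SNPs and biochemical variables is what a correlation heatmap displays.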

Figure 5 Hierarchical clustering analysis results in the chronic hepatitis B patient group. A: Correlation heatmap; B: Dendrogram; C: Principal component analysis map. ALT: Alanine aminotransferase; AST: Aspartate aminotransferase; ALB: Albumin; GGT: Gamma glutamyl transferase.
Figure 6 Hierarchical clustering analysis results in the hepatitis B virus-related liver cirrhosis patient group. A: Correlation heatmap; B: Dendrogram; C: Principal component analysis map. ALT: Alanine aminotransferase; AST: Aspartate aminotransferase; ALB: Albumin; GGT: Gamma glutamyl transferase; AFP: Alpha-fetoprotein.
Figure 7 Hierarchical clustering analysis results in the hepatitis B virus-related hepatocarcinoma patient group. A: Correlation heatmap; B: Dendrogram; C: Principal component analysis map. ALT: Alanine aminotransferase; AST: Aspartate aminotransferase; ALB: Albumin; GGT: Gamma glutamyl transferase; AFP: Alpha-fetoprotein.
Figure 8 Hierarchical clustering analysis results in the healthy control group. A: Correlation heatmap; B: Dendrogram; C: Principal component analysis map. ALT: Alanine aminotransferase; AST: Aspartate aminotransferase; ALB: Albumin; GGT: Gamma glutamyl transferase.

To identify the most important factors in each patient group, we constructed a decision tree based on the results of the Pearson correlation test. Hierarchical clustering starts with each data point as a separate cluster and iteratively merges the most similar clusters, producing a tree-like structure (dendrogram) that shows the relationships between clusters and their hierarchy. The height of each merge represents the distance between the merged clusters, and the optimal number of clusters lies where the merge height changes the most. The positions of the clades and leaves show which element is closest to the root, which is an important factor in determining the subsequent threshold values. Elements were ranked from 1 (the leaf closest to the root) in the dendrogram (Figures 5B, 6B, 7B, and 8B).
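The merge-and-cut logic described above can be sketched in a few lines. This is an illustrative sketch, not the authors' pipeline: it assumes a correlation-based distance (1 - r) and average linkage, neither of which is specified in the text, and the correlations are hypothetical. The number of clusters is chosen where the jump between successive merge heights is largest.

```python
from itertools import combinations


def average_linkage(dist, labels):
    """Naive agglomerative clustering with average linkage.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    Returns a list of merges as (cluster_a, cluster_b, height).
    """
    clusters = [frozenset([lab]) for lab in labels]  # every leaf starts as its own cluster
    merges = []

    def linkage(c1, c2):
        # Average linkage: mean of all pairwise leaf distances between two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


def n_clusters_at_largest_gap(merges, n_leaves):
    """Cut the dendrogram where the merge height increases the most."""
    heights = [h for _, _, h in merges]
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return n_leaves - (i + 1)  # clusters remaining after the first i + 1 merges


# Hypothetical Pearson correlations between two SNPs and two liver enzymes.
corr = {
    ("rs3077", "rs9277535"): 0.5, ("ALT", "AST"): 0.8,
    ("rs3077", "ALT"): 0.05, ("rs3077", "AST"): 0.0,
    ("rs9277535", "ALT"): 0.1, ("rs9277535", "AST"): 0.0,
}
dist = {frozenset(pair): 1.0 - r for pair, r in corr.items()}
merges = average_linkage(dist, ["rs3077", "rs9277535", "ALT", "AST"])
print(n_clusters_at_largest_gap(merges, 4))  # 2: {rs3077, rs9277535} and {ALT, AST}
```

Average linkage is used here because its merge heights never decrease, so the "largest jump" rule is well defined; other linkages would be equally plausible choices.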

PCA plots help to model the relationships between biomarkers. PCA summarizes and visualizes information in a dataset of individuals/observations described by multiple intercorrelated quantitative variables, each of which can be considered a separate dimension. We considered 20 different index values and selected the number of clusters that separated the clusters while avoiding overlap. We then applied K-means clustering, an ML algorithm, with the Euclidean distance as the measure to obtain the optimal number of clusters. The resulting chart shows that closely related markers fell in the same cluster (Figures 5C, 6C, 7C, and 8C). Together, the dendrogram and PCA map summarized the complete dataset, showing how the studied variants combined with other variables in relation to the outcomes.
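The PCA projection and K-means step can be sketched with NumPy. This is an assumption-laden illustration rather than the authors' workflow: variables are standardized before PCA (computed via singular value decomposition), K-means uses Lloyd's algorithm with Euclidean distance and a fixed random seed, and the data are synthetic. Choosing k itself (the text mentions comparing 20 index values) is not shown.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Standardize columns, then project onto the leading principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # share of total variance per component
    return Z @ vt[:n_components].T, explained[:n_components]


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-means with Euclidean distance; stops once no point changes cluster."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no movement between clusters
        labels = new_labels
        # Move each center to the mean of its members (keep it if the cluster is empty).
        centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels, centers


# Synthetic data: two well-separated groups of five observations on three markers.
X = np.array([
    [0.0, 0.0, 0.1], [0.2, 0.1, 0.0], [0.1, 0.2, 0.1], [0.0, 0.1, 0.2], [0.1, 0.0, 0.0],
    [10.0, 10.0, 10.1], [10.2, 10.1, 10.0], [10.1, 10.2, 10.1], [10.0, 10.1, 10.2], [10.1, 10.0, 10.0],
])
scores, explained = pca_scores(X)
labels, _ = kmeans(scores, k=2)
print(explained.round(3), labels)
```

Running K-means on the PCA scores rather than the raw variables is one common choice; clustering the standardized variables directly would be an equally reasonable variant.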

Cut-off point and RR

Based on the results of the cluster analysis and decision trees, we identified the most important first-level factors in each patient group. For example, sex and albumin (ALB) concentration were the most important factors in the hepatitis B patient group (Figure 9). In the liver cancer group, cirrhosis status, tumor status (size and differentiation), ALB concentration, sex, and age were the primary factors (Figure 10). In the cirrhosis group, the rs2856718 genotype, age, international normalized ratio, gamma glutamyl transferase (GGT) concentration, and the Child-Pugh score were the most important factors. Bootstrapping on two genotype groups (AG vs GG/AA) gave the best accuracy (acc = 0.56) at an optimal age cut-off of 52 years (Figure 11). We maximized multiple metrics simultaneously to obtain the optimal cut-off points; their sensitivity and specificity are presented in ROC plots and their RR in dot and forest plots (ROC plots: Supplementary Figures 1-7, with the related dot and forest plots in Supplementary Figures 8-10; ROC plots: Supplementary Figures 11-18, with the related dot and forest plots in Supplementary Figures 19-28; and ROC plots: Supplementary Figures 29-40, with the related dot and forest plots in Supplementary Figures 41 and 42).
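The cut-off search can be sketched as follows. The text does not specify its software or its multi-metric criterion, so this is a simplified stand-in under stated assumptions: a single metric (accuracy) is maximized over the observed values, the search is repeated on bootstrap resamples, and the most frequently selected cut-off is reported. The ages and outcomes below are hypothetical.

```python
import random


def accuracy_at(cut, values, outcomes):
    """Accuracy of the rule 'predict outcome 1 when value >= cut'."""
    hits = sum((v >= cut) == bool(y) for v, y in zip(values, outcomes))
    return hits / len(values)


def best_cutoff(values, outcomes):
    """Try each observed value as a cut-off; ties go to the smallest value."""
    return max(sorted(set(values)), key=lambda c: accuracy_at(c, values, outcomes))


def bootstrap_cutoff(values, outcomes, n_boot=200, seed=1):
    """Refit the cut-off on bootstrap resamples and return the most frequent choice."""
    rng = random.Random(seed)
    n = len(values)
    counts = {}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        cut = best_cutoff([values[i] for i in idx], [outcomes[i] for i in idx])
        counts[cut] = counts.get(cut, 0) + 1
    cut = max(counts, key=counts.get)
    return cut, accuracy_at(cut, values, outcomes)


# Hypothetical ages and outcomes (1 = event), not study data.
ages = [35, 40, 45, 48, 50, 52, 55, 58, 60, 65]
event = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(bootstrap_cutoff(ages, event))
```

Reporting the modal bootstrap cut-off, rather than the single best value on the full sample, is one way to make the threshold less sensitive to individual observations.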

Figure 9 Diagram of optimal cut point in chronic hepatitis B patient group. ALT: Alanine aminotransferase; AST: Aspartate aminotransferase; ALB: Albumin; GGT: Gamma glutamyl transferase.
Figure 10 Diagram of optimal cut point in hepatitis B virus-related hepatocarcinoma patient group. ALT: Alanine aminotransferase; AST: Aspartate aminotransferase; ALB: Albumin; AFP: Alpha-fetoprotein.
Figure 11 Diagram of optimal cut point in hepatitis B virus-related liver cirrhosis patient group. ALT: Alanine aminotransferase; AST: Aspartate aminotransferase; ALB: Albumin; GGT: Gamma glutamyl transferase; AFP: Alpha-fetoprotein; INR: International normalized ratio; PLT: Platelet count.

To better understand the diagnostic significance of the genetic variants, we examined the RR of each haplotype for each directly related factor (based on the results of the cluster analysis and PCA map) at the corresponding cut-off points. Specifically, in the female hepatitis patient group, the AG/GG/GG haplotype reduced the risk of total bilirubin greater than 7.08 μmol/L by about 30% [RR = 0.72 (0.49; 1.06), P < 0.05]. In this group, the AA/GG/GG haplotype reduced the risk of alpha-fetoprotein (AFP) > 33.11 IU/mL by 56% [RR = 0.44 (0.20; 0.99), P < 0.01], and the GG/AG/AG haplotype reduced the risk of chronic hepatitis from the age of 31 years by about 37% [RR = 0.63 (0.31; 1.28), P < 0.05]. The AG/AG/GG haplotype was 40% more frequent in women with hepatitis B than in male patients [RR = 0.76 (0.56; 1.02), P < 0.05] (Table 6).
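The RR values and 95% confidence intervals quoted above can be computed from a 2 × 2 table. The sketch below uses the common log-scale (Katz) interval; the document does not state which interval method was used, and the counts in the example are hypothetical. It also converts an RR into the percentage change in risk, e.g. RR = 0.72 corresponds to a 28% reduction and RR = 2.20 to a 120% increase (a 2.2-fold risk).

```python
from math import exp, log, sqrt


def risk_ratio(a, b, c, d, z=1.959964):
    """Risk ratio with a 95% CI on the log scale (Katz method).

    a, b: exposed subjects with / without the outcome (e.g. above / below a cut-off)
    c, d: unexposed subjects with / without the outcome
    """
    rr = (a / (a + b)) / (c / (c + d))
    se = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))  # SE of log(RR)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)


def percent_change(rr):
    """Percentage change in risk implied by an RR (negative = reduction)."""
    return (rr - 1) * 100


rr, lower, upper = risk_ratio(20, 80, 10, 90)  # hypothetical counts
print(f"RR = {rr:.2f} ({lower:.2f}; {upper:.2f})")  # RR = 2.00 (0.99; 4.05)
print(round(percent_change(0.72)), round(percent_change(2.20)))  # -28 120
```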

Table 6 Risk ratios of rs2856718-rs3077-rs9277535 haplotypes according to preclinical factors.
Preclinical factors
Groups
Female
Male
All
Total bilirubin (µmol/L); Normal range: < 17 µmol/L
Hepatitis
AG/GG/GG
RR = 0.72 (0.49; 1.06)
Cutpoint = 7.08 µmol/L
Nupper/Nlower = 10/6
Pfisher = 0.017, Pyates = 0.027, Puncor = 0.011
HCC
AG/GG/GG
RR = 1.45 (1.05; 1.99)
Cutpoint = 43.6 µmol/L
Nupper/Nlower = 25/19
Pfisher = 0.02, Pyates = 0.03, Puncor = 0.02
AFP (IU/mL); Normal range: < 5.8 IU/mL
Hepatitis
AA/GG/GG
RR = 0.44 (0.20; 0.99)
Cutpoint = 33.11 IU/mL
Nupper/Nlower = 4/7
Pfisher = 0.008, Pyates = 0.008, Puncor = 0.002
ALB (g/L); Normal range: 35-50 g/L
HCC
GG/AG/AG
RR = 2.20 (1.24; 3.92)
Cutpoint = 43.6 g/L
Nupper/Nlower = 8/10
Pfisher = 0.016, Pyates = 0.018, Puncor = 0.008
Direct bilirubin (µmol/L); Normal range: < 4.3 µmol/L
HCC
GG/AG/GG
RR = 1.57 (1.09; 2.25)
Cutpoint = 5.13 µmol/L
Nupper/Nlower = 11/5
Pfisher = 0.034, Pyates = 0.046, Puncor = 0.026
Tumor size (cm)
HCC
GG/AA/AG
RR = 2.2 (1.94; 2.56)
Cutpoint = 5 cm
Nupper/Nlower = 5/0
Pfisher = 0.009, Pyates = 0.02, Puncor = 0.006
AA/GG/AG
RR = 2.25 (1.96; 2.59)
Cutpoint = 5 cm
Nupper/Nlower = 3/0
Pfisher = 0.046, Pyates = 0.092, Puncor = 0.028
ALT (U/L); Normal range: < 40 U/L at 37 °C
HCC
GG/AG/AG
RR = 0.74 (0.49; 1.13)
Cutpoint = 28.8 U/L
Nupper/Nlower = 10/8
Pfisher = 0.048, Pyates = 0.068, Puncor = 0.038
AST (U/L); Normal range: < 37 U/L at 37 °C
HCC
GG/GG/GG
RR = 0.52 (0.25; 1.10)
Cutpoint = 58.9 U/L
Nupper/Nlower = 6/26
Pfisher = 0.034, Pyates = 0.043, Puncor = 0.028
GG/GG/GG
RR = 0.53 (0.27; 1.06)
Cutpoint = 58.9 U/L
Nupper/Nlower = 7/30
Pfisher = 0.029, Pyates = 0.036, Puncor = 0.024
Age (years)
Hepatitis
GG/AG/AG
RR = 0.63 (0.31; 1.28)
Cutpoint = 31 years old
Nupper/Nlower = 3/2
Pfisher = 0.013, Pyates = 0.0086, Puncor = 0.0004
Cirrhosis
AA/AG/AG
RR = 0.38 (0.11; 1.31)
Cutpoint = 52 years old
Nupper/Nlower = 2/7
Pfisher = 0.023, Pyates = 0.038, Puncor = 0.018
HCC
GG/GG/GG
RR = 1.63 (0.98; 2.71)
Cutpoint = 63 years old
Nupper/Nlower = 13/24
Pfisher = 0.045, Pyates = 0.055, Puncor = 0.035
Gender (female, male)
Hepatitis
AG/AG/GG
RR = 0.76 (0.56; 1.02)
Cutpoint = Male
Nmale/Nfemale = 18/12
Pfisher = 0.03, Pyates = 0.01, Puncor = 0.007
HCC
AA/GG/GG
RR = 0.74 (0.567; 0.96)
Cutpoint = Male
Nmale/Nfemale = 12/11
Pfisher = 0.001, Pyates = 0.001, Puncor = 0.0005
AG/AG/GG
RR = 1.20 (1.13; 1.27)
Cutpoint = Male
Nmale/Nfemale = 18/0
Pfisher = 0.04, Pyates = 0.06, Puncor = 0.02
GGT (U/L); Male: 11-50 U/L at 37 °C; Female: 7-32 U/L at 37 °C
Cirrhosis
AG/GG/GG
RR = 1.71 (1.09; 2.70)
Cutpoint = 181.97 U/L
Nupper/Nlower = 15/21
Pfisher = 0.02, Pyates = 0.024, Puncor = 0.014
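Table 6 reports three P values per comparison: Fisher's exact test (Pfisher), the χ² test with Yates' continuity correction (Pyates), and the uncorrected χ² test (Puncor). The sketch below computes two-sided versions of all three from a 2 × 2 table using only the standard library; the text does not state the sidedness or software used, so these functions are not guaranteed to reproduce the tabulated values. The example table is hypothetical.

```python
from math import comb, erfc, sqrt


def chi2_1df_pvalue(stat):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square P value for [[a, b], [c, d]], optionally Yates-corrected."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # continuity correction
    stat = n * diff**2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2_1df_pvalue(stat)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact P: sum over tables no more likely than the observed one."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))


table = (3, 1, 1, 3)  # hypothetical counts
print(round(fisher_two_sided(*table), 4),
      round(chi2_2x2(*table, yates=True), 4),
      round(chi2_2x2(*table), 4))  # 0.4857 0.4795 0.1573
```

The small tolerance in the Fisher sum guards against floating-point ties between tables that are exactly as likely as the observed one.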

In the group of male liver cancer patients, the AG/GG/GG haplotype increased the risk of total bilirubin > 43.6 μmol/L by 45% [RR = 1.45 (1.05; 1.99), P < 0.05]. The GG/AG/AG haplotype increased the risk of ALB > 43.6 g/L 2.2-fold [RR = 2.20 (1.24; 3.92), P < 0.05] and reduced the risk of ALT > 28.8 U/L by about 26% [RR = 0.74 (0.49; 1.13), P < 0.05]. The GG/GG/GG haplotype reduced the risk of AST > 58.9 U/L by about 50% [RR = 0.52 (0.25; 1.10), P < 0.05]. In the HCC group, the GG/AA/AG and AA/GG/AG haplotypes increased the risk of tumors > 5 cm approximately 2.2-fold [RR = 2.23 (1.94; 2.56) and 2.25 (1.96; 2.59), respectively; P < 0.05]. The GG/AG/GG haplotype increased the risk of direct bilirubin > 5.13 μmol/L by 57% [RR = 1.57 (1.09; 2.25), P < 0.05]. The GG/GG/GG haplotype reduced the risk of AST > 58.9 U/L by about 50% [RR = 0.53 (0.27; 1.06), P < 0.05] and was associated with a 63% increased risk of liver cancer from the age of 63 years [RR = 1.63 (0.98; 2.71), P < 0.05]. The AA/GG/GG haplotype was 26% less frequent in male than in female patients with liver cancer, whereas the AG/AG/GG haplotype was 20% more frequent in male patients (Table 6).

In the cirrhosis group, the AA/AG/AG haplotype reduced the risk of HBV-infected patients developing cirrhosis from the age of 52 years by about 60% [RR = 0.38 (0.11; 1.31), P < 0.05] (Table 6, Supplementary Figure 41). The AG/GG/GG haplotype increased the risk of GGT > 181.97 U/L by 71% [RR = 1.71 (1.09; 2.70), P < 0.05] (Table 6).

DISCUSSION

In clinical and biological studies, the effect size, that is, the magnitude of the difference between two or more groups, is important: statistical significance is necessary but not sufficient to conclude that a relationship or effect is real. The standard deviation describes how tightly the data cluster around the mean: the lower the standard deviation, the more stable the data and the smaller the fluctuation around the mean, and the higher the standard deviation, the less stable the data and the greater the fluctuation. The number of samples obtained in each group was sufficient to test our initial hypothesis at an effect size of Cohen's d = 0.6, namely that the genotypes of HLA variants may modulate the progression of liver disease (HCC and cirrhosis). Subsequent analyses confirmed this hypothesis.
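The effect-size reasoning can be made concrete with a short calculation. The sketch below computes Cohen's d with a pooled standard deviation and a normal-approximation sample size per group for d = 0.6; the significance level (0.05, two-sided) and power (80%) are assumptions, since the text does not state them, and an exact t-test-based calculation gives a slightly larger n.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, stdev


def cohens_d(x, y):
    """Standardized mean difference using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided, two-sample comparison."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d**2)


print(n_per_group(0.6))  # 44 (with the assumed alpha = 0.05 and power = 0.80)
print(round(cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]), 3))  # -0.632
```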

Association between combinations of SNPs and susceptibility to HBV

HLA class I expression is higher in HCC cells than in normal hepatic cells[15]. In our study, the HLA variants examined, which belong to HLA class II, also showed a clear influence on the development of liver diseases, particularly liver cancer and cirrhosis. Multivariate analysis revealed that the three variants had suppressive or additive effects on the risk of developing liver cancer or cirrhosis due to hepatitis B. We began with an LD test to determine whether the physical relationship of the variants, that is, their location on the chromosome, influenced the risk of developing the disease. Combinations of two or three SNPs revealed significant associations between HLA variants and susceptibility to HBV-related diseases. Maximal LD (|D'| = 1) occurs when at least one of the possible haplotypes has a frequency of zero, and intermediate levels of LD imply recombination between markers or recurrent mutations. For each marker pair, the strength of association (R2) was measured using a standard contingency table χ2 test. R2 > 0.5 indicates a high association, 0.3-0.5 a moderate level, 0.1-0.3 a low association, and 0-0.1 no association. In this study, all R2 values were very low; the haplotypes were rare, their distribution followed Hardy-Weinberg equilibrium, and their LD depended moderately on disease or phenotype status (D' ranged from 0.27 in disease cases to 0.56 in healthy controls) (Figure 3 and Table 4).
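The LD measures discussed above can be computed directly from haplotype counts or frequencies. The sketch below covers two biallelic loci and returns D, |D'|, and r²; for a 2 × 2 table of N phased haplotypes, the Pearson χ² statistic equals N·r², which is how a contingency-table χ² test relates to R2. Haplotypes are taken as given here; estimating them from unphased genotypes (e.g. by an EM algorithm) is not shown, and the example counts are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """D, |D'| and r^2 for two biallelic loci from haplotype counts or frequencies."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at locus 2
    D = p_AB - p_A * p_B
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    r2 = D**2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, d_prime, r2


# Hypothetical counts of 100 haplotypes.
print([round(v, 3) for v in ld_stats(40, 10, 10, 40)])  # [0.15, 0.6, 0.36]
# One haplotype (Ab) absent -> complete LD: |D'| = 1 even though r^2 < 1.
print([round(v, 3) for v in ld_stats(50, 0, 20, 30)])  # [0.15, 1.0, 0.429]
```

The second example shows why low r² values can coexist with high D': the two measures answer different questions (recombination history versus predictability of one allele from the other).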

Variants of the HLA-DP locus have been studied and linked to HBV infection in Asian populations. The most significant alleles were the A alleles of HLA-DPA1 rs3077 and HLA-DPB1 rs9277535, which confer a decreased risk of HBV infection in Caucasians[16]. In a 2011 GWAS in a Japanese population, a significant association was found for two SNPs (rs2856718 and rs7453920) within the HLA-DQ locus (overall P = 5.98 × 10⁻²⁸ and 3.99 × 10⁻³⁷, respectively). The association of CHB with rs2856718 and rs7453920 remained significant even after stratification by rs3077 and rs9277535, indicating an independent effect of HLA-DQ variants on CHB susceptibility (P = 1.52 × 10⁻²¹ to 2.38 × 10⁻³⁰). In the Japanese cohort, rs2856718 was located in an LD block including the HLA-DQB1 and HLA-DQA1 genes (R2 = 0.1, D' = 0.73). Similar to HLA-DP, HLA-DQ genes are highly polymorphic, especially in exon 2, which encodes the antigen-binding site[5]. The SNP rs9277535 in the 3'-UTR of the HLA-DPB1 gene (approximately 22 kb from rs3077) was associated with an increased risk of chronic hepatitis B and reduced HLA-DPB1 expression (P = 10⁻¹⁵). Only weak LD was observed between rs9277535 and rs3077 in Europeans (R2 = 0.09, D' = 0.50; HapMap CEU) and Asians (R2 = 0.24, D' = 0.54), suggesting that the effects of these SNPs are likely independent[17]. The LD patterns between SNPs in this genomic region were comparable between Asian and European ancestry groups, even though the allele frequencies differed[17]. These findings indicate that variants in the HLA-DP-DQ antigen-binding regions contribute to the risk of persistent HBV infection.

The synergistic effect of variants on phenotype

After separating each variant, we determined how each genotype affected the risk of disease development (Tables 2 and 3). When the variants were combined, carriers of the AA genotype at both rs3077 and rs9277535 had an approximately 70% lower risk of liver disease [RR = 0.29 (0.09; 0.99), P < 0.001] (Table 5). The rs3077 and rs9277535 variants influence the immune response by affecting the binding of HLA class II molecules to CD4+ T cells. A significant association has been observed between the HLA-DP SNPs rs3077 and rs9277535 and susceptibility to systemic lupus erythematosus. The rs3077 polymorphism was significantly correlated with IL-17, IFN-γ, and cutaneous vasculitis, and carriers of the rs3077-AA genotype showed lower concentrations of inflammatory cytokines than carriers of the other two genotypes[18]. The rs3077-G allele was associated with a higher risk of chronic hepatitis B infection and decreased HLA-DPA1 expression, consistent with an additive genetic model. Because rs3077 and rs9277535 lie in the 3'-UTRs of HLA-DPA1 and HLA-DPB1, they are good candidates for allelic expression imbalance testing, which can show whether the mRNA expression associations found in the human liver cohort reflect differential expression of the non-risk A allele relative to the other allele. For both SNPs, the proportion of the non-risk A allele was significantly higher in cDNA than in DNA from heterozygous samples, indicating reduced expression of the risk alleles. These SNPs were strongly associated with chronic HBV infection. Together, these independent results strongly link lower HLA-DPA1 and HLA-DPB1 expression with an increased risk of chronic HBV infection[17].

HLA-DQ rs2856718, HLA-DP rs3077, and HLA-DP rs9277535 are significantly associated with a decreased risk of HCC and an increased risk of LC[7]. Some studies have shown that rs2856718 differentially affects LC and HCC risk in chronic HBV-infected patients with HBV mutations[7]. In our study, we showed the effect of this variant in combination with the genotypes of rs3077 and rs9277535: the more A alleles in the haplotype, the lower the risk of disease (hepatitis, cirrhosis, or HBV-related HCC) [A-A-A haplotype, RR = 0.44 (0.14; 1.37), P < 0.05]. Conversely, the more G alleles in the haplotype, the greater the risk of hepatic disease [G-A/G-G haplotype, RR = 1.12 (1.02; 1.23), P < 0.05] (Table 5).
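The observation that more A alleles accompany lower risk suggests a simple dosage summary. As a hypothetical illustration (not an analysis reported in the text), the sketch below counts A alleles in a three-SNP genotype label, assuming the rs2856718/rs3077/rs9277535 order used in Table 6, and tabulates the proportion of cases at each dosage. The records are invented for the example.

```python
from collections import defaultdict


def a_allele_count(label):
    """Number of A alleles in a label such as 'AG/GG/GG' (rs2856718/rs3077/rs9277535)."""
    genotypes = label.split("/")
    if len(genotypes) != 3 or any(len(g) != 2 or set(g) - {"A", "G"} for g in genotypes):
        raise ValueError(f"unexpected genotype label: {label!r}")
    return sum(g.count("A") for g in genotypes)


def case_proportion_by_dosage(records):
    """records: iterable of (genotype_label, is_case). Returns {A-allele count: case share}."""
    cases, totals = defaultdict(int), defaultdict(int)
    for label, is_case in records:
        k = a_allele_count(label)
        totals[k] += 1
        cases[k] += int(bool(is_case))
    return {k: cases[k] / totals[k] for k in sorted(totals)}


records = [("GG/GG/GG", 1), ("GG/GG/GG", 1), ("AG/GG/GG", 1), ("AG/GG/GG", 0), ("AA/AA/AA", 0)]
print(case_proportion_by_dosage(records))  # {0: 1.0, 1: 0.5, 6: 0.0}
```

A monotone decrease of the case proportion with dosage is what the "more A alleles, lower risk" statement would predict; a formal trend test would be needed to assess it statistically.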

Variations are only meaningful for diagnosis in specific cases

To understand when genetic factors are essential for diagnosis, all relevant risk factors must be considered and arranged in the specific context of each disease group. K-means cluster analysis with the Euclidean distance helps clarify these relationships: observations are repeatedly reassigned to the nearest cluster center, and the centers recomputed, until no observation moves between clusters[19]. The ROC curve shows the clinical sensitivity and specificity of every possible cut-off value: it plots the true-positive rate (sensitivity) against the false-positive rate (100 − specificity) across cut-off points, so each point on the curve represents the sensitivity/specificity pair for a particular decision threshold. The analysis also provides the accuracy, the percentage of correct predictions made by the model, which is one of the most commonly used measures of how well a single model performs.
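The construction of an ROC curve can be sketched in pure Python: every observed score is used as a threshold, each threshold yields one (1 - specificity, sensitivity) point, and the area under the curve (AUC) follows from the trapezoidal rule. Accuracy at a single threshold is included for comparison. The scores and labels are a small hypothetical example, not study data.

```python
def roc_points(scores, labels):
    """(1 - specificity, sensitivity) for the rule 'positive if score >= threshold'."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def accuracy(scores, labels, threshold):
    """Share of correct predictions at one threshold."""
    return sum((s >= threshold) == bool(y) for s, y in zip(scores, labels)) / len(labels)


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
pts = roc_points(scores, labels)
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```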

Significant differences were observed between the optimal cut-off value in each patient group and the average value in healthy individuals. Here, we discuss these cut-offs in combination with the different genotypes of the three variants. In the hepatitis patient group, the AG/GG/GG haplotype reduced the risk of total bilirubin greater than 7.08 μmol/L by about 30% [RR = 0.72 (0.49; 1.06), P < 0.05]. In the liver cancer group, the AG/GG/GG haplotype increased the risk of total bilirubin > 43.6 μmol/L by 45% [RR = 1.45 (1.05; 1.99), P < 0.05]; the normal range of total bilirubin is < 17 μmol/L. The GG/AG/GG haplotype increased the risk of direct bilirubin > 5.13 μmol/L by 57% [RR = 1.57 (1.09; 2.25), P < 0.05], whereas the upper limit of normal is 4.3 μmol/L. These may be important markers of liver damage based on bilirubin levels. High bilirubin levels indicate a problem in the liver or gallbladder; for example, high levels of direct bilirubin may indicate that the liver is not clearing bilirubin properly. These results are therefore indicative of liver damage or disease.

In the female hepatitis patient group, the AA/GG/GG haplotype reduced the risk of AFP > 33.11 IU/mL by 56% [RR = 0.44 (0.20; 0.99), P < 0.01]. An AFP level between 0 and 40 ng/mL or < 58 IU/mL is within the normal range for adults, whereas a very high level of AFP in the blood (> 400 ng/mL or 332 IU/mL) is a sign of a liver tumor. The optimal cut-off point of AFP in our patients was 436.5 IU/mL (log10 AFP = 2.64) (Supplementary Figure 14). In the liver cancer group, the AA/GG/GG haplotype was much more frequent in female patients (28%, 11/39 patients) than in male patients (9.4%, 20/212 patients) [RR = 0.74 (0.56; 0.96), P < 0.01], whereas the AG/AG/GG haplotype was 20% more frequent in male patients (Supplementary Figure 25). In the hepatitis B group, the AG/AG/GG haplotype was 40% more frequent in women than in men [RR = 0.76 (0.56; 1.02), P < 0.05] (Supplementary Figure 10).

In female patients, the GG/AG/AG haplotype reduced the risk of chronic hepatitis from the age of 31 years by about 37% [RR = 0.63 (0.31; 1.28), P < 0.05] (Table 6). The GG/GG/GG haplotype was associated with a 63% increased risk of liver cancer from the age of 63 years [RR = 1.63 (0.98; 2.71), P < 0.05]. In the cirrhosis group, the AA/AG/AG haplotype reduced the risk of HBV-infected patients developing cirrhosis from the age of 52 years by about 60% [RR = 0.38 (0.11; 1.31), P < 0.05] (Supplementary Figures 29 and 41). Studying the role of HLA in longevity is challenging owing to significant methodological problems, such as serological and molecular typing of different loci, sample sizes, inclusion criteria, and age cut-offs. However, within this complex scenario, some data emerge: on the whole, the (sex-specific) links between longevity and alleles or haplotypes of several genes that are risk factors for a variety of diseases (cardiovascular diseases, cancer), including HLA alleles and haplotypes, were not unexpected given previous studies on the genetics of longevity in centenarians[20].

In the liver cancer group, the GG/GG/GG haplotype reduced the risk of AST > 58.9 U/L by about 50% [RR = 0.53 (0.27; 1.06), P < 0.05], and the GG/AA/AG and AA/GG/AG haplotypes increased the risk of tumor size > 5 cm approximately 2.2-fold [RR = 2.23 (1.94; 2.56) and 2.25 (1.96; 2.59), respectively; P < 0.05]. In the cirrhosis group, the AG/GG/GG haplotype increased the risk of GGT > 181.97 U/L by 71% [RR = 1.71 (1.09; 2.70), P < 0.05]. The variant rs2856718 showed its effect on patient age with the best accuracy (0.58) when resampling was run in two groups with the AG and AA/GG genotypes; as shown by the three-variant haplotype, the AA/AG/AG haplotype reduced the risk of HBV-infected patients developing cirrhosis from the age of 52 years by about 60% [RR = 0.38 (0.11; 1.31), P < 0.05] (Table 6). HLA associations with clinical expression and outcomes in the King's College cohort showed that baseline AST levels were highest in patients expressing HLA-DRB1*03 and DRB1*07 (type 1 and type 2 autoimmune hepatitis susceptibility, respectively) compared with those expressing HLA-DRB1*13 (susceptibility to juvenile autoimmune sclerosing cholangitis). HLA-DRB1*13 cases showed increased baseline levels of alkaline phosphatase and GGT. This is indirect evidence that HLA genotypes contribute to disease outcomes. Further evidence for a role of HLA is that more severe histological inflammation and fibrosis were seen in children with either the HLA-DRB1*03 or DRB1*13 genotype than in those with other HLA-DR genotypes[21].

This research approach, which combines paraclinical data and bioinformatics, or more specifically, proceeds from biochemical results to LD analysis and then to multi-cluster/multivariate analysis, could be widely applied in pre-ML research in medicine, especially to move closer to personalized treatment strategies.

Limitations

Combining clinical parameters, immune phenotyping data, genetic information, and ML models shows promise for predicting the likelihood of a functional cure in patients with HBV. However, the limited size of multidimensional datasets and the characteristics of HBV cases pose challenges for ML systems and meta-clustering, which typically require substantial data for pattern recognition[11]. Clustering groups the data points of an unlabeled dataset by similarity, but complex algorithmic adjustments are required to replicate these strategies in pathological studies. Accordingly, the branching and iteration steps of such algorithms must remain logical and consistent with the statistical tests.

CONCLUSION

The SNP rs9277535 affects liver fibrosis due to HBV infection, while rs3077 is associated with the risk of HBV-related HCC. In this study, the links between rs2856718, rs3077, and rs9277535 and disease risk were determined using multi-clustering analysis. Using cluster analysis, a key ML method, we produced decision trees for each disease group. Ultimately, we observed a complex relationship between HLA variants and disease risk that occurred indirectly through other key markers, including bilirubin, GGT, AST, ALT, and AFP.

ACKNOWLEDGEMENTS

We thank the patients and participants for their consent and permission to publish our data. We thank our colleagues at the National Cancer Hospital in Hanoi, the National Hospital of Tropical Diseases, Thanh Nhan Hospital, and Thai Nguyen Hospital who supported us in patient recruitment and clinical data supply.

Footnotes

Provenance and peer review: Unsolicited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Gastroenterology and hepatology

Country of origin: Viet Nam

Peer-review report’s classification

Scientific Quality: Grade A, Grade B

Novelty: Grade A, Grade B

Creativity or Innovation: Grade A, Grade B

Scientific Significance: Grade A, Grade B

P-Reviewer: Salomon I; Wang L S-Editor: Qu XL L-Editor: A P-Editor: Yu HG

References
1.  World Health Organization  Hepatitis B. Available from: https://www.who.int/news-room/fact-sheets/detail/hepatitis-b.  [PubMed]  [DOI]  [Cited in This Article: ]
2.  Cooke GS, Andrieux-Meyer I, Applegate TL, Atun R, Burry JR, Cheinquer H, Dusheiko G, Feld JJ, Gore C, Griswold MG, Hamid S, Hellard ME, Hou J, Howell J, Jia J, Kravchenko N, Lazarus JV, Lemoine M, Lesi OA, Maistat L, McMahon BJ, Razavi H, Roberts T, Simmons B, Sonderup MW, Spearman CW, Taylor BE, Thomas DL, Waked I, Ward JW, Wiktor SZ; Lancet Gastroenterology & Hepatology Commissioners. Accelerating the elimination of viral hepatitis: a Lancet Gastroenterology & Hepatology Commission. Lancet Gastroenterol Hepatol. 2019;4:135-184.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 286]  [Cited by in F6Publishing: 355]  [Article Influence: 71.0]  [Reference Citation Analysis (0)]
3.  Flower B, Du Hong D, Vu Thi Kim H, Pham Minh K, Geskus RB, Day J, Cooke GS. Seroprevalence of Hepatitis B, C and D in Vietnam: A systematic review and meta-analysis. Lancet Reg Health West Pac. 2022;24:100468.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 1]  [Cited by in F6Publishing: 1]  [Article Influence: 0.5]  [Reference Citation Analysis (0)]
4.  Kamatani Y, Wattanapokayakit S, Ochi H, Kawaguchi T, Takahashi A, Hosono N, Kubo M, Tsunoda T, Kamatani N, Kumada H, Puseenam A, Sura T, Daigo Y, Chayama K, Chantratita W, Nakamura Y, Matsuda K. A genome-wide association study identifies variants in the HLA-DP locus associated with chronic hepatitis B in Asians. Nat Genet. 2009;41:591-595.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 396]  [Cited by in F6Publishing: 407]  [Article Influence: 27.1]  [Reference Citation Analysis (0)]
5.  Mbarek H, Ochi H, Urabe Y, Kumar V, Kubo M, Hosono N, Takahashi A, Kamatani Y, Miki D, Abe H, Tsunoda T, Kamatani N, Chayama K, Nakamura Y, Matsuda K. A genome-wide association study of chronic hepatitis B identified novel risk locus in a Japanese population. Hum Mol Genet. 2011;20:3884-3892.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 165]  [Cited by in F6Publishing: 178]  [Article Influence: 13.7]  [Reference Citation Analysis (0)]
6.  Hu L, Zhai X, Liu J, Chu M, Pan S, Jiang J, Zhang Y, Wang H, Chen J, Shen H, Hu Z. Genetic variants in human leukocyte antigen/DP-DQ influence both hepatitis B virus clearance and hepatocellular carcinoma development. Hepatology. 2012;55:1426-1431.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 127]  [Cited by in F6Publishing: 139]  [Article Influence: 11.6]  [Reference Citation Analysis (0)]
7.  Ji X, Zhang Q, Li B, Du Y, Yin J, Liu W, Zhang H, Cao G. Impacts of human leukocyte antigen DQ genetic polymorphisms and their interactions with hepatitis B virus mutations on the risks of viral persistence, liver cirrhosis, and hepatocellular carcinoma. Infect Genet Evol. 2014;28:201-209.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 21]  [Cited by in F6Publishing: 23]  [Article Influence: 2.3]  [Reference Citation Analysis (0)]
8.  Al-Qahtani AA, Al-Anazi MR, Abdo AA, Sanai FM, Al-Hamoudi W, Alswat KA, Al-Ashgar HI, Khalaf NZ, Eldali AM, Viswan NA, Al-Ahdal MN. Association between HLA variations and chronic hepatitis B virus infection in Saudi Arabian patients. PLoS One. 2014;9:e80445.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 43]  [Cited by in F6Publishing: 45]  [Article Influence: 4.5]  [Reference Citation Analysis (0)]
9.  Wasityastuti W, Yano Y, Ratnasari N, Triyono T, Triwikatmani C, Indrarti F, Heriyanto DS, Yamani LN, Liang Y, Utsumi T, Hayashi Y. Protective effects of HLA-DPA1/DPB1 variants against Hepatitis B virus infection in an Indonesian population. Infect Genet Evol. 2016;41:177-184.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 15]  [Cited by in F6Publishing: 17]  [Article Influence: 2.1]  [Reference Citation Analysis (0)]
10.  Zhang X, Zheng C, Zhou ZH, Li M, Gao YT, Jin SG, Sun XH, Gao YQ. Relationship between HLA-DP gene polymorphisms and the risk of hepatocellular carcinoma: a meta-analysis. Genet Mol Res. 2015;14:15553-15563.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 10]  [Cited by in F6Publishing: 10]  [Article Influence: 1.1]  [Reference Citation Analysis (0)]
11.  Dost S, Rivas A, Begali H, Ziegler A, Aliabadi E, Cornberg M, Kraft AR, Vidal M.   Unraveling the Hepatitis B Cure: A Hybrid AI Approach for Capturing Knowledge about the Immune System's Impact. Proceedings of the 12th Knowledge Capture Conference 2023. NY, United States: Association for Computing Machinery. 2023.  [PubMed]  [DOI]  [Cited in This Article: ]
12.  Chu YJ, Yang HI, Hu HH, Liu J, Lin YL, Chang CL, Luo WS, Jen CL, Chen CJ. HBV genotype-dependent association of HLA variants with the serodecline of HBsAg in chronic hepatitis B patients. Sci Rep. 2023;13:359.  [PubMed]  [DOI]  [Cited in This Article: ]  [Reference Citation Analysis (0)]
13. Nguyen TT, Ho CT, Bui HTT, Ho LK, Ta VT. Multidimensional Machine Learning for Assessing Parameters Associated With COVID-19 in Vietnam: Validation Study. JMIR Form Res. 2023;7:e42895.
14. R-bloggers. Calculating required sample size in R and SAS. 2024. Available from: https://www.r-bloggers.com/2017/02/calculating-required-sample-size-in-r-and-sas/.
15. Akazawa Y, Nobuoka D, Takahashi M, Yoshikawa T, Shimomura M, Mizuno S, Fujiwara T, Nakamoto Y, Nakatsura T. Higher human lymphocyte antigen class I expression in early-stage cancer cells leads to high sensitivity for cytotoxic T lymphocytes. Cancer Sci. 2019;110:1842-1852.
16. Vermehren J, Lötsch J, Susser S, Wicker S, Berger A, Zeuzem S, Sarrazin C, Doehring A. A common HLA-DPA1 variant is associated with hepatitis B virus infection but fails to distinguish active from inactive Caucasian carriers. PLoS One. 2012;7:e32605.
17. O'Brien TR, Kohaar I, Pfeiffer RM, Maeder D, Yeager M, Schadt EE, Prokunina-Olsson L. Risk alleles for chronic hepatitis B are associated with decreased mRNA expression of HLA-DPA1 and HLA-DPB1 in normal human liver. Genes Immun. 2011;12:428-433.
18. Zhang J, Zhan W, Yang B, Tian A, Chen L, Liao Y, Wu Y, Cai B, Wang L. Genetic Polymorphisms of rs3077 and rs9277535 in HLA-DP associated with Systemic lupus erythematosus in a Chinese population. Sci Rep. 2017;7:39757.
19. Nichols L, Taverner T, Crowe F, Richardson S, Yau C, Kiddle S, Kirk P, Barrett J, Nirantharakumar K, Griffin S, Edwards D, Marshall T. In simulated data and health records, latent class analysis was the optimum multimorbidity clustering algorithm. J Clin Epidemiol. 2022;152:164-175.
20. Caruso C, Candore G, Colonna Romano G, Lio D, Bonafè M, Valensin S, Franceschi C. HLA, aging, and longevity: a critical reappraisal. Hum Immunol. 2000;61:942-949.
21. Mack CL. HLA Associations in pediatric autoimmune liver diseases: Current state and future research initiatives. Front Immunol. 2022;13:1019339.