Grantham JP, Hii A, Shenfine J. Preoperative risk modelling for oesophagectomy: A systematic review. World J Gastrointest Surg 2023; 15(3): 450-470 [PMID: 37032794 DOI: 10.4240/wjgs.v15.i3.450]
Corresponding Author of This Article
James Paul Grantham, MBBS, MSc, Doctor, Department of General Surgery, Modbury Hospital, Smart Road, Adelaide 5092, South Australia, Australia. jamespgrantham91@gmail.com
Research Domain of This Article
Surgery
Article-Type of This Article
Systematic Reviews
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Author contributions: Grantham JP and Shenfine J designed the research; Grantham JP and Hii A performed the research and analysed the data; Grantham JP, Hii A and Shenfine J all contributed to writing and reviewing the paper.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
PRISMA 2009 Checklist statement: The authors have read the PRISMA 2009 Checklist and the manuscript was prepared and revised according to the PRISMA 2009 Checklist.
Received: November 21, 2022 Peer-review started: November 21, 2022 First decision: December 26, 2022 Revised: January 9, 2023 Accepted: February 22, 2023 Article in press: February 22, 2023 Published online: March 27, 2023 Processing time: 126 Days and 1.6 Hours
Abstract
BACKGROUND
Oesophageal cancer is a frequently observed and lethal malignancy worldwide. Surgical resection remains a realistic option for curative intent in the early stages of the disease. However, the decision to undertake oesophagectomy is significant as it exposes the patient to a substantial risk of morbidity and mortality. Therefore, appropriate patient selection, counselling and resource allocation are important. Many tools have been developed to aid surgeons in appropriate decision-making.
AIM
To examine all multivariate risk models that use exclusively preoperative information and establish which have the most clinical utility.
METHODS
A systematic review of the MEDLINE, EMBASE and Cochrane databases was conducted from 2000-2020. The search terms applied were ((Oesophagectomy) AND (Risk OR predict OR model OR score) AND (Outcomes OR complications OR morbidity OR mortality OR length of stay OR anastomotic leak)). The applied inclusion criteria were articles assessing multivariate based tools using exclusively preoperatively available data to predict perioperative patient outcomes following oesophagectomy. The exclusion criteria were publications that described models requiring intra-operative or post-operative data and articles appraising only univariate predictors such as American Society of Anesthesiologists score, cardiopulmonary fitness or pre-operative sarcopenia. Articles that exclusively assessed distant outcomes such as long-term survival were excluded as were publications using cohorts mixed with other surgical procedures. The articles generated from each search were collated, processed and then reported in accordance with PRISMA guidelines. All risk models were appraised for clinical credibility, methodological quality, performance, validation, and clinical effectiveness.
RESULTS
The initial search of composite databases yielded 8715 articles which reduced to 5827 following the deduplication process. After title and abstract screening, 197 potentially relevant texts were retrieved for detailed review. Twenty-seven published studies were ultimately included which examined twenty-one multivariate risk models utilising exclusively preoperative data. Most models examined were clinically credible and were constructed with sound methodological quality, but model performance was often insufficient to prognosticate patient outcomes. Three risk models were identified as being promising in predicting perioperative mortality: the National Surgical Quality Improvement Program surgical risk calculator, the revised STS score and the Takeuchi model. Two models were promising in predicting perioperative major morbidity: the predicting postoperative complications score and the prognostic nutritional index-multivariate model. Many of these models require external validation and demonstration of clinical effectiveness.
CONCLUSION
Whilst there are several promising models in predicting perioperative oesophagectomy outcomes, more research is needed to confirm their validity and demonstrate improved clinical outcomes with the adoption of these models.
Core Tip: The undertaking of an oesophagectomy incurs a high morbidity rate and can lead to mortality. It is therefore incumbent upon the surgeon to appropriately select and counsel prospective patients on anticipated risks. Multivariate clinical decision-making tools can be a powerful adjunct in improving this process when utilised preoperatively. In a world of countless proposed surgical risk models, choosing which model to use can prove challenging. This systematic review represents the largest and most comprehensive effort to determine which model is most relevant, valid and accurate in forecasting perioperative outcomes following oesophagectomy.
INTRODUCTION
Oesophageal cancer is the eighth most commonly diagnosed cancer worldwide and remains the sixth leading cause of cancer-related deaths globally[1]. The mainstay of curative treatment is surgical resection, an oesophagectomy, often in combination with neoadjuvant chemotherapy or chemoradiotherapy[2]. There are various surgical approaches to oesophagectomy, which are broadly classified as open, hybrid and minimally invasive techniques[3]. Irrespective of the approach, an oesophageal resection is a major surgical undertaking, often taking hours to perform, with a significant period of single lung ventilation[4,5]. Post-operative complications are common, occurring in approximately half of all patients[6]. These are most frequently respiratory in nature, occurring in 20%-40% of all patients[7,8]. Anastomotic leak, which can occur in 10%-20% of cases, is perhaps the most feared due to the associated high mortality[9]. The reported rate of mortality in high-volume centres is recognised to lie between 2% and 8%[10]. However, even non-life threatening complications can lead to significant morbidity which can exact a devastating toll on patient outcomes[7].
The substantial associated morbidity and mortality emphasises the critical role of the preoperative assessment in selecting suitable patients for oesophagectomy. The preoperative assessment establishes whether patients are fit enough to withstand the physiological strain of the surgery and provides an opportunity to counsel them about the risks of surgical treatment. It also permits the identification of higher-risk patients for whom more intense resource allocation may be warranted in the post-operative setting. In recent decades, surgeons have begun to turn to cognitive aids such as surgical risk prediction tools to help guide the decision-making process[11]. Several studies have demonstrated that the utilisation of predictive modelling to augment decision making is superior to isolated subjective clinical judgement[12,13]. By selecting more appropriate surgical candidates, informing patients more accurately and deploying resources in a more tailored fashion, these tools are designed to improve patient outcomes.
There are many available tools, some of which are generic surgical risk predictors whilst others have been specifically developed and validated for patients undergoing oesophagectomy. Some are based on preoperatively available data and others rely on intraoperative data. Naturally, only tools based exclusively on preoperative data can aid selection of appropriate surgical candidates or be used to better inform patients of their risk status. The clear advantages of utilising these multivariate risk prediction models framed against the proliferating multitude of these models has created a significant conundrum for surgeons attempting to determine which one to adopt. There have been two systematic reviews undertaken to aid surgeon choice of the best tool to utilise. The first, by Findlay et al[14], also assessed the quality of scientific rigor in the development studies from which the models were constructed. Their review concluded that none of the preoperative models evaluated accurately predicted morbidity or mortality. Warnell et al[15] also concluded that none of the existing models could be confidently applied to clinical practice. Despite the disheartening results, many new multivariate risk prediction models have since been developed.
The aim of this research is to conduct an up-to-date systematic review assessing which of the preoperative multivariate risk models most accurately predict outcomes following oesophagectomy. The primary outcome will be their ability to predict perioperative mortality. The secondary outcomes of the review will focus on their predictive capacity for major morbidity, overall morbidity and index complications such as anastomotic leak and adverse cardiorespiratory events. The working hypothesis is that this systematic review will help surgeons identify the most accurate preoperative prediction model with which to select appropriate patients for oesophagectomy, support informed consent regarding each patient's individual surgical risks, and allocate resources more appropriately to high-risk patients.
MATERIALS AND METHODS
Search strategy and article selection
A systematic review of the existing literature was undertaken, incorporating the MEDLINE, EMBASE and Cochrane review databases. The search terms applied were ((Oesophagectomy) AND (Risk OR predict OR model OR score) AND (Outcomes OR complications OR morbidity OR mortality OR length of stay OR anastomotic leak)). The articles generated from each search were collated and processed with reporting in accordance with the PRISMA model[16]. Duplicates were excluded, then preliminary screening of titles and abstracts for potentially relevant publications was conducted by the first author. Potentially relevant texts were then assessed in full for eligibility with reference to the inclusion and exclusion criteria by two authors. No pre-existing protocol for a systematic review on this topic was found.
Inclusion and exclusion criteria
The inclusion criteria applied were articles which assessed multivariate based tools using exclusively pre-operatively available data to predict perioperative patient outcomes following oesophagectomy. The perioperative period was defined as any duration whilst an inpatient from the index oesophagectomy admission and no more than 90 d post-operative if the patient had been discharged. Given the significant reduction in morbidity and mortality in recent decades, only articles published in English from 2000 onward were included. The exclusion criteria were publications that described models requiring intra-operative or post-operative data and articles appraising only univariate predictors such as American Society of Anesthesiologists score, cardiopulmonary fitness or pre-operative sarcopenia. Articles that exclusively assessed distant outcomes such as long-term survival or disease-free survival were excluded, as were publications using cohorts mixed with other surgical procedures. Studies which presented insufficient data for meaningful analysis, that is, lacking calibration measures in the form of P-values and/or discrimination statistics such as the area under the receiver operating characteristic curve (AUC), were also excluded. Abstracts that were superseded by full articles were excluded. Abstracts from conference proceedings not subsequently published in full were considered eligible for inclusion, provided they included sufficient data for meaningful analysis as outlined above.
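The definition of the perioperative period above can be read as a simple rule. The sketch below is one possible interpretation, not the authors' implementation: the function name, the day-counting convention (day 0 is the day of the index operation) and the reading that an event during the index admission counts regardless of how long that admission lasts are all assumptions made for illustration.

```python
from typing import Optional

# Post-operative limit stated in the inclusion criteria (90 d).
PERIOP_WINDOW_DAYS = 90


def is_perioperative(event_day: int, discharge_day: Optional[int]) -> bool:
    """Return True if an event falls within the perioperative period.

    Both arguments are days since the index operation (day 0).
    discharge_day is None when the patient has not yet been discharged.
    Assumption: any event during the index admission counts, and after
    discharge only events up to day 90 count.
    """
    if event_day < 0:
        return False  # before the operation
    during_index_admission = discharge_day is None or event_day <= discharge_day
    if during_index_admission:
        return True
    return event_day <= PERIOP_WINDOW_DAYS


# Hypothetical patients, invented purely to exercise the rule.
print(is_perioperative(120, None))  # still an inpatient at day 120
print(is_perioperative(60, 14))     # discharged day 14, event day 60
print(is_perioperative(95, 14))     # discharged day 14, event day 95
```

Individual studies used differing endpoints (inpatient, 30 d or 90 d), so a rule like this describes only the outer boundary of what the review accepted.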
Data extraction and synthesis
The essential study characteristics extracted included the study period, geographical location, data source including the number of centres involved, sample size and case mix descriptors such as type of operation. Patient characteristics including the proportion of neoadjuvant therapy use and histological subtype were also extracted. For each article, we recorded the model or models tested and essential performance metrics such as discrimination and calibration. Outcome measures such as definitions of perioperative mortality and morbidity were also extracted. Heterogeneity of surgical method was considered by classifying the surgical technique in each article as transthoracic, transhiatal, hybrid or totally minimally invasive oesophagectomy. Heterogeneity in outcome definitions was minimised by considering the broad outcomes of mortality, major morbidity (defined as grade three or four by the Clavien-Dindo classification), overall morbidity and respiratory complications[17]. Index outcomes such as anastomotic leak, readmission, return to theatre and length of stay were also considered when specifically reported. All risk prediction models were analysed in the following five domains: Clinical credibility, methodological quality, external validation, model performance and clinical effectiveness.
Clinical credibility
Clinical credibility is whether the characteristics of the prognostic model encourage clinicians to utilise the system[18-20]. This was first outlined in the systematic review of clinical prediction models in 2011 and applied to the appraisal of oesophageal resection risk models in 2014[14,21]. There are seven components addressed in the assessment of clinical credibility and each is scored in the affirmative, partially or negative. These include whether the model uses oesophageal specific factors and avoids using thresholds for data categorisation. It also considers whether the data is available prior to the time of clinical decision-making, if the data is objective and how easily the data required to generate the outcome can be obtained. The last two factors consider whether the model can be rendered in a way understandable to the clinician and if it effectively stratifies the risk of a particular outcome in a clinically useful fashion. A full description of the methods applied to assessing clinical credibility has been supplied in the Supplementary materials.
Methodological quality
We adopted the quality assessment framework of Minne et al[21] to ensure a high standard of methodological quality of the examined studies and to minimise the risk of bias[22-24]. This utilises a framework of twenty points with eight points allotted to study participation characteristics, four points to prognostic factor and outcome measurement characteristics and the remaining eight points to the methodological integrity of the study analysis[24]. Models which satisfied a particular component were awarded one point, partial satisfaction earned half a point and no points were awarded if the relevant component was not satisfied. A detailed outline of these assessment criteria can be found within the Supplementary materials.
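The point scheme described above (one point, half a point or no points per component, across sections of eight, four and eight components) can be sketched as follows. This is an illustrative sketch only: the section keys, rating labels and the example ratings are hypothetical and do not reproduce the individual checklist items, which are given in the Supplementary materials.

```python
# Points per rating, as described in the text.
POINTS = {"yes": 1.0, "partial": 0.5, "no": 0.0}

# Number of components in each section (8 + 4 + 8 = 20).
# The section keys are hypothetical labels chosen for this sketch.
SECTION_SIZES = {"study_participation": 8, "measurement": 4, "analysis": 8}


def score_section(name: str, ratings: list[str]) -> float:
    """Sum the points for one section, checking the component count."""
    expected = SECTION_SIZES[name]
    if len(ratings) != expected:
        raise ValueError(f"{name} needs {expected} ratings, got {len(ratings)}")
    return sum(POINTS[r] for r in ratings)


def score_model(ratings_by_section: dict[str, list[str]]) -> dict[str, float]:
    """Return the per-section scores plus the total out of 20."""
    scores = {
        name: score_section(name, ratings_by_section[name])
        for name in SECTION_SIZES
    }
    scores["total"] = sum(scores.values())
    return scores


# Invented ratings for a hypothetical model (not taken from the review).
example = {
    "study_participation": ["yes"] * 8,
    "measurement": ["yes", "yes", "yes", "partial"],
    "analysis": ["yes"] * 6 + ["partial", "no"],
}
print(score_model(example))
```

Keeping the section sizes in one mapping makes the 20-point ceiling explicit and catches a miscounted checklist before it distorts a total.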
External validation
We assessed whether the included studies reported a new model or externally validated an existing model. We subsequently analysed if a given model had been externally validated within a separate population.
Model performance
The performance of each model was compared based on discrimination and calibration metrics. Discrimination is the ability of the model to discern between those that will and will not develop an outcome, in this case post-operative complications[25]. The accuracy with which a predictive model discriminated between outcomes was measured in terms of the area under the receiver operating characteristic (ROC) curve, or c-statistic. A model with no discriminative ability has a c-statistic of 0.5, whereas a c-statistic of 1 indicates perfect discrimination[26]. The threshold for acceptable discriminative capacity has been previously defined as a c-statistic exceeding 0.7[27]. Calibration pertains to the fidelity between the actual and the predicted frequency of an outcome[25]. This is represented in terms of Hosmer and Lemeshow goodness of fit P-values and observed to expected outcome (O:E) ratios. A P-value of greater than 0.05 indicates adequate calibration on goodness of fit when applied to logistic regression models and an O:E ratio of 1 indicates perfect calibration[28]. An O:E ratio of < 1 indicates that the model overestimates the frequency of the predicted outcome, whereas a ratio of > 1 indicates that it underestimates it[28]. Where adequate data reporting allowed, weighted AUC discrimination metrics were generated for each model by calculating the mean across individual studies, weighted by study cohort size.
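The three quantities used above can be illustrated with a short sketch. It assumes a pairwise (concordance) calculation of the c-statistic with ties counted as one half, an O:E ratio computed as observed events divided by the sum of predicted risks, and a cohort-size-weighted arithmetic mean for pooling AUCs. The function names and all numbers are invented for illustration and are not data from the included studies.

```python
from typing import Sequence, Tuple


def c_statistic(predicted_risk: Sequence[float], outcome: Sequence[int]) -> float:
    """Fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    events = [p for p, y in zip(predicted_risk, outcome) if y == 1]
    non_events = [p for p, y in zip(predicted_risk, outcome) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_to_expected(predicted_risk: Sequence[float], outcome: Sequence[int]) -> float:
    """O:E ratio: above 1 means the model underestimates, below 1 overestimates."""
    expected = sum(predicted_risk)
    if expected == 0:
        raise ValueError("expected event count is zero")
    return sum(outcome) / expected


def weighted_mean_auc(studies: Sequence[Tuple[float, int]]) -> float:
    """Mean AUC across studies, weighted by cohort size (auc, n) pairs."""
    total_n = sum(n for _, n in studies)
    return sum(auc * n for auc, n in studies) / total_n


# Toy cohort of five patients (invented values).
risks = [0.02, 0.05, 0.10, 0.20, 0.40]
died = [0, 0, 1, 0, 1]
print(c_statistic(risks, died))            # 5 of 6 pairs concordant
print(observed_to_expected(risks, died))   # above 1: risk underestimated
print(weighted_mean_auc([(0.60, 100), (0.80, 300)]))  # close to 0.75
```

In the toy pooling example, the larger cohort pulls the weighted mean toward its own AUC, which is why a single large validation study can dominate a model's summary figure.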
Clinical effectiveness
We also assessed all studies for evidence that the application of any of the individual models has been clinically proven to improve patient outcomes.
RESULTS
Search results
The initial search of composite databases yielded 8715 articles which reduced to 5827 following the deduplication process. After title and abstract screening, 197 potentially relevant texts were retrieved for detailed review. Of these, a total of 27 articles satisfied the inclusion and exclusion criteria. The rationale for exclusion of the 170 articles omitted is illustrated (See Figure 1). In total, thirteen articles were developing new predictive risk models for oesophagectomy[29-41] (Table 1). Two of these studies, by Filip et al[36] and Wan et al[41] respectively, also served to externally validate other existing models. The remaining 14 articles exclusively externally validated existing models on new data sets[42-55] (Table 2). Many studies sought to test the performance of multiple models within the same dataset. These 27 articles appraised the use of a total of 21 different preoperative multivariate risk prediction models in oesophagectomy. As stated above, thirteen of the twenty-one models had their development study within the list of retrieved articles. The remaining eight models were developed for predicting outcomes in patients not initially undergoing oesophagectomy but were subsequently validated in an oesophagectomy cohort[56-63]. A reference key for the various abbreviations used in relation to the models is provided in Figure 2.
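The counts reported in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are chosen for illustration.

```python
# Article flow, as reported in the text.
identified = 8715
after_deduplication = 5827
full_text_reviewed = 197
included = 27

duplicates_removed = identified - after_deduplication
excluded_at_screening = after_deduplication - full_text_reviewed
excluded_at_full_text = full_text_reviewed - included

# Article and model counts, as reported in the text.
development_articles = 13
validation_only_articles = 14
models_developed_in_retrieved_articles = 13
models_developed_elsewhere = 8

print(duplicates_removed, excluded_at_screening, excluded_at_full_text)
print(development_articles + validation_only_articles)
print(models_developed_in_retrieved_articles + models_developed_elsewhere)
```

The full-text exclusions come to 170, the included articles to 27 and the models to 21, all matching the totals quoted in the paragraph.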
The included studies were published over a fourteen-year period and originated from four different continents. Ten studies arose from North America, nine from Europe and six from Asia. The remaining two each drew on databases from two continents: both involved Europe, with the second database arising from North America and Australia respectively. All multivariate models utilised logistic regression of retrospective patient cohort data. The thirteen articles developing a new predictive model had a median study population size of 1172 (range 90-10826). The fourteen articles exclusively validating existing models had a median study population size of 246 (range 43-1039).
There was significant heterogeneity in operative approach and technique within the studies. Twenty-two of the articles incorporated open oesophagectomy. All of these included an open transthoracic procedure (Ivor-Lewis, left thoracolumbar or McKeown) and fifteen of them also utilised a transhiatal approach. Eight articles included minimally invasive oesophagectomy, with three incorporating patients undergoing a hybrid oesophagectomy approach. Only two studies exclusively dealt with patients undergoing minimally-invasive oesophagectomy. Three studies of large national multicentre databases failed to detail the operative strategy.
In total, 24 of the 27 studies reported the overall rate of neoadjuvant therapy, including two studies for which this was an exclusion criterion. The rates observed varied significantly between studies, ranging from 3.6% to 87.0%. Across the combined samples, 33.6% of patients received neoadjuvant therapy. The histological subtype of oesophageal cancer was reported in 16 of the 27 studies, including three studies originating from Asia and thirteen from Western nations. Overall, where reported, 56.3% of patients had adenocarcinoma compared to 37.9% with squamous cell carcinoma, and 5.8% had another histological tumour type. These characteristics are reported across Tables 1 and 2.
Clinical credibility
The median clinical credibility score, out of 7, was 5.5 (range 4.5-6) (Table 3). Six models scored highest at 6 out of 7: The Rotterdam, Philadelphia, Amsterdam, prognostic nutritional index (PNI), and the original and revised STS models[30-33,37,56]. Twelve of these twenty-one preoperative models were oesophageal-specific and all models provided timely data for clinical decision making. Three of these models used subjectively reported patient health questionnaire data. Seventeen of the twenty-one preoperative models were considered easy to generate with the other four reliant on pre-operative spirometry, which may not be routinely performed. Three of the 21 preoperative models were considered challenging to understand. Sixteen of the twenty-one preoperative models were found to generate a useful scoring range to prognosticate patient outcomes.
Table 3 Clinical credibility of preoperative models.
Methodological quality - study participation
Only 20 of the models could be appraised for methodological quality, as the original development study for the prognostic nutritional index was unavailable in English[56]. Overall, the median score was 7.5 out of 8 (range 6-8). Of the model development studies, all but the Geriatric Nutritional Risk Index model sufficiently outlined the setting and period in which the study was conducted[59]. Five of the model development studies failed to outline their exclusion criteria appropriately. All studies detailed their patient mix and number of patients. Just one of the development studies had fewer than 100 patients and one model failed to report the mortality rate of patients. Sixteen models reported the characteristics of their cohort sufficiently and one scored partial marks in this area. Seven development studies did not utilise a sample patient group representative of the population to which the model would be applied. These shortcomings often related to a single gender within the sample, neoadjuvant treatment being an exclusion criterion or patients being selected based on age requirements.
Methodological quality - prognostic factor and outcome measurement
The majority of the development studies available for analysis performed well in defining their prognostic factors and outcome measurements. The median score was 4 out of 4 (range 3-4). The lowest performing models achieved three out of a possible four points and this occurred in four models. All development studies defined their prognostic factors and model type, as well as their outcomes. Four of the models failed to outline their handling of missing data and a further two only did so in part.
Methodological quality - analysis
The median score for methodological quality of analysis was 5.75 out of 8 (range 4-8). All studies which developed preoperative models had adequate reporting on their evaluation measures, model building strategy and testing method. Seven failed to test or report the model’s discriminatory capacity and fourteen also failed in reporting calibration. Only six studies also tested model performance on a testing set. Five studies had insufficient data to appraise the quality of their analysis fully and there were two instances of selective reporting found. One quarter of the preoperative models were compared to existing predictive tools within their development study.
Methodological quality - overall performance
Overall, the average score of methodological quality for the 20 studies appraised was 16.7 out of 20. The median and mode score achieved was 16.5. The lowest scoring models were the Charlson comorbidity index, Cologne score and geriatric nutritional risk index, all of which scored fourteen[29,57,59]. The best scoring risk prediction models in this group for methodological quality were the PNI-multivariate score and the RAI-revised score, each scoring nineteen out of 20[36,63]. The overall methodological quality of the preoperative models is outlined in Table 4.
Table 4 Methodological quality (overall performance) for preoperative models.
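| Model | Study participation (out of 8) | Measurements (out of 4) | Analysis (out of 8) | Total (out of 20) |
| --- | --- | --- | --- | --- |
| PNI | N/A | N/A | N/A | N/A |
| CCI | 6 | 4 | 5 | 14 |
| ACCI | 6 | 4 | 4.5 | 14.5 |
| GNRI | 6.5 | 3 | 4.5 | 14 |
| Cologne | 7 | 3 | 4 | 14 |
| Rotterdam | 7.5 | 4 | 6 | 17.5 |
| Philadelphia | 7.5 | 4 | 5 | 16.5 |
| Amsterdam | 8 | 3.5 | 7 | 18.5 |
| Original STS | 8 | 4 | 5.5 | 17.5 |
| Ferguson | 7.5 | 4 | 5 | 16.5 |
| NSQIP SRC | 7.5 | 3.5 | 6 | 16.5 |
| Takeuchi | 8 | 3 | 7 | 18 |
| PNI multivariate | 8 | 4 | 7 | 19 |
| Revised STS | 8 | 4 | 4.5 | 16.5 |
| PER | 7 | 4 | 4 | 15 |
| RAI-A | 7 | 4 | 6.5 | 17.5 |
| 5 Factor MFI | 6.5 | 3 | 6.5 | 16 |
| PPCS | 7 | 4 | 5.5 | 16.5 |
| JNCD | 8 | 4 | 6.5 | 18.5 |
| RAI-revised | 7 | 4 | 8 | 19 |
| RAI-revised (CC) | 8 | 4 | 6.5 | 18.5 |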
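The summary statistics quoted for overall methodological quality can be recomputed from the Total column of Table 4. The sketch below uses the totals as transcribed in the table, omitting PNI, which was not appraised; the variable names are chosen for illustration.

```python
import statistics

# Total scores (out of 20) from Table 4; PNI is omitted as it was not appraised.
totals = {
    "CCI": 14, "ACCI": 14.5, "GNRI": 14, "Cologne": 14,
    "Rotterdam": 17.5, "Philadelphia": 16.5, "Amsterdam": 18.5,
    "Original STS": 17.5, "Ferguson": 16.5, "NSQIP SRC": 16.5,
    "Takeuchi": 18, "PNI multivariate": 19, "Revised STS": 16.5,
    "PER": 15, "RAI-A": 17.5, "5 Factor MFI": 16, "PPCS": 16.5,
    "JNCD": 18.5, "RAI-revised": 19, "RAI-revised (CC)": 18.5,
}

scores = list(totals.values())
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)

# Models sharing the lowest and highest totals.
lowest = sorted(m for m, s in totals.items() if s == min(scores))
highest = sorted(m for m, s in totals.items() if s == max(scores))

print(len(scores), round(mean_score, 1), median_score, mode_score)
print(lowest)
print(highest)
```

Run on these totals, the mean is 16.7 and the median and mode are both 16.5, in agreement with the text, and the lowest and highest scoring models match those named there.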
External validation
Eight of the twenty-one preoperative prediction models had been previously developed and were externally validated within this group of articles. Of the thirteen preoperative risk models that were development studies within the collated articles, six were subsequently externally validated. In total 14 out of 21 preoperative models have been externally validated. These findings are outlined in Figure 3.
Figure 3 External validation status of pre-operative models.
CCI: Charlson comorbidity index; ACCI: Age-adjusted comorbidity index; GNRI: Geriatric nutritional risk index; NSQIP SRC: National Surgical Quality Improvement Program Surgical Risk Calculator; RAI-A: Administrative risk analysis index; MFI: Modified frailty index; STS: Society of Thoracic Surgeons Oesophagectomy Composite Score; PNI: Prognostic nutritional index; PPCS: Predicting postoperative complications score; JNCD: Japanese National Clinical Database.
Model performance - perioperative mortality
Fourteen of the twenty-seven included studies had an outcome measure related to perioperative mortality, but the mortality endpoints varied across studies, with some considering inpatient mortality and others selecting a post-operative time frame, typically 30 or 90 d. Multiple papers appraised two or more models, leading to a total of twenty instances of a preoperative risk model being tested for predicting mortality. Overall, thirteen of the twenty-one preoperative prediction models were tested against mortality. Eleven of the models reported discrimination, represented through area under the ROC curve. Three models had a weighted average exceeding 0.70, thereby indicating clinical utility. These were the Takeuchi score, the revised STS model and the National Surgical Quality Improvement Program (NSQIP) surgical risk calculator[35,37,60]. Calibration was represented more heterogeneously. The majority of studies used Hosmer-Lemeshow goodness of fit or O:E ratios, but across the twenty instances in which models were tested against mortality in these fourteen studies, calibration was reported in just eight. The calibration was adequate in all instances. The best performing preoperative model in terms of calibration was the Rotterdam score[30]. This was adequately calibrated to mortality in each of the three instances it was tested[30,42,52]. The Philadelphia score was also adequately calibrated in both studies in which it was tested[31,42]. The overall performance of these models in relation to predicting mortality outcomes is illustrated in Table 5.
Table 5 Summary of the performance for all preoperative models in predicting perioperative mortality.
Model performance - perioperative major morbidity
Five of the twenty-seven studies had an outcome measure related to perioperative major morbidity, all based on a grade three Clavien-Dindo complication or higher. All five preoperative multivariate models reported discrimination statistics in the form of area under the ROC curve. Two preoperative models had a weighted mean exceeding 0.7: The predicting postoperative complications score (PPCS) model and the PNI multivariate[36,39]. Neither model has been shown to reach the utility threshold on external validation in a second cohort. Calibration was reported on only one occasion in predicting major morbidity, namely for the PNI-multivariate model, which was found to be sufficiently calibrated[36]. Model performance in relation to major morbidity outcomes is summarised in Table 6.
Table 6 Summary of the performance for all preoperative models in predicting perioperative major morbidity.
Model performance - overall perioperative morbidity
Eleven out of the twenty-seven studies measured outcomes in relation to overall perioperative morbidity, rather than respiratory complications specifically. There were seventeen instances of a preoperative model being tested in predicting overall morbidity. Eleven different models were tested for these complications, with nine having discriminatory performance represented through area under the ROC curve. No model possessed a weighted mean that reached the threshold for clinical utility. The best performance was the Amsterdam model with a weighted AUC of 0.64[32]. Only eight of the seventeen instances in which the models were tested for predicting overall complications reported calibration, which was sufficient on five occasions. The Amsterdam model was well calibrated in all three studies in which it was reported[32,36,43]. The NSQIP was appropriately calibrated in one out of two studies and the Prognostic Nutritional Index was sufficiently calibrated in the sole study in which it was reported[36,53,54]. A summary of model performance in predicting perioperative morbidity outcomes is presented in Table 7.
Table 7 Summary of the performance for all preoperative models in predicting perioperative morbidity.
Model performance - perioperative respiratory complications/anastomotic leak/readmission/return to theatre
Four articles appraised the performance of three different models in predicting respiratory complications, across five instances. These were the Ferguson score, the geriatric nutritional risk index and the prognostic nutritional index. However, none of these reached a weighted mean c-statistic of clinical utility[34,56,59]. The Ferguson score was the best performing in terms of discrimination, reaching significance in two out of the three studies in which it was tested, but only had a weighted-average c-statistic of 0.669[34,49,50]. The Ferguson model was appropriately calibrated in both studies for which this was reported[34,49]. Calibration was not reported for any of the other models. A single study by Ohkura et al[40] assessed model performance in predicting anastomotic leak rate but this failed to reach sufficient discrimination and did not report calibration. Only the NSQIP surgical risk calculator was tested specifically for the prediction of readmission and return to theatre rates[53-55]. For return to theatre, this model was poorly calibrated and was unable to discriminate outcomes in all studies[53-55]. The surgical risk calculator demonstrated utility and good calibration for predicting readmission in a single study but overall performed poorly in this area too[55]. A summary of model performance for these secondary outcome measures is illustrated in Table 8.
Table 8 Summary of the performance for all preoperative models in predicting respiratory complications, return to theatre, readmission and anastomotic leak.
The performance of every preoperative model for each outcome against which it was tested is summarised in Tables 5-8. The weighted average area under the ROC curve for each of the four major outcomes is presented for every model in which it was reported (Figure 4). Meaningful subgroup analysis of model performance based on surgical approach was not feasible, as many articles incorporated multiple surgical approaches and did not delineate model performance for each technique. Similar limitations also prevented subgroup analyses of model performance on the basis of histological subtype and the administration of neoadjuvant chemotherapy.
Figure 4 Weighted mean of c-statistics for each major outcome.
CCI: Charlson comorbidity index; ROC: Receiver operating characteristic; ACCI: Age-adjusted Charlson comorbidity index; GNRI: Geriatric nutritional risk index; NSQIP SRC: National Surgical Quality Improvement Program Surgical Risk Calculator; RAI-A: Administrative risk analysis index; MFI: Modified frailty index; STS: Society of Thoracic Surgeons Oesophagectomy Composite Score; PNI: Prognostic nutritional index; PPCS: Predicting postoperative complications score.
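To illustrate how a weighted mean c-statistic of the kind reported here can fall below the value seen in individual studies, the following minimal Python sketch pools study-level values. It assumes that each study is weighted by its sample size, which is an assumption made for illustration; the weighting scheme actually applied is defined in the methods of this review. The function name and the study values are hypothetical.

```python
def weighted_mean_auc(studies):
    """Pool study-level c-statistics, weighting each by study sample size.

    `studies` is a list of (auc, n_patients) tuples. Weighting by sample
    size is an assumption of this sketch.
    """
    total_n = sum(n for _, n in studies)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    return sum(auc * n for auc, n in studies) / total_n


# Hypothetical example: one small study with good discrimination and one
# larger study with poor discrimination.
studies = [(0.75, 100), (0.60, 300)]
print(round(weighted_mean_auc(studies), 4))
```

Under this weighting, a single large study with poor discrimination can outweigh several smaller studies in which the model performed well.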
Clinical effectiveness
None of the models was tested prospectively to determine whether its adoption in clinical decision making would lead to improved clinical outcomes.
Overall performance
The overall performance of each model within the five domains is outlined in Table 9.
Table 9 Summary of the preoperative models across the five categories.
This systematic review included twenty-seven articles utilising twenty-one different preoperative risk prediction models deemed to forecast outcomes after oesophagectomy. Twelve of these were specifically devised for oesophageal resection and fourteen models have been externally validated. The clinical credibility of the development studies of these models was generally strong. The methodological quality of the majority of the studies was also sound, with more recent studies trending better in this assessment. Only one model’s development study was not available for analysis. However, with respect to model performance, the findings were underwhelming and there were only a few instances in which models demonstrated clinical utility.
Across the breadth of the articles, just three preoperative risk models possessed a weighted mean of discriminatory capacity sufficient to be of clinical utility in predicting perioperative mortality. These three models were the NSQIP surgical risk calculator, the Takeuchi score and the revised STS model[35,37,60]. It must be noted that the NSQIP surgical risk calculator and the Takeuchi score were each tested on two occasions, and each reached clinical utility on only one of them[35,51,54,55]. Furthermore, the revised STS model is yet to be externally validated. Calibration was not reported for the Takeuchi score or the revised STS model, but calibration of the NSQIP surgical risk calculator was reported once and was good[54]. A handful of other models displayed clinically useful discrimination in one of the two studies in which they were tested but failed to meet this threshold in the weighted mean. These included the Charlson comorbidity index, the age-adjusted Charlson comorbidity index and the Rotterdam score[30,46]. All three of these models performed well with respect to calibrating expected mortality in the studies in which this was reported[30,42,44,52].
In terms of the preoperative prediction of non-fatal complications, the performance of the models was also underwhelming. Only two models demonstrated clinical utility in forecasting perioperative major morbidity: The PPCS model and the PNI-multivariate[36,39]. The PNI-multivariate model had good calibration in its only study, whereas the calibration of the PPCS model remains unreported in the literature[36]. The clinical credibility of both was strong and the methodological quality of the PNI-multivariate was sound[36]. However, neither of these models has been externally validated. No preoperative risk model demonstrated adequate performance in discriminating overall morbidity. The best performer in this area was the Amsterdam score, which calibrated well but was unable to sufficiently discriminate outcomes[32]. Similarly, no model consistently displayed clinical utility in predicting respiratory complications. The most promising model was the Ferguson pulmonary score, developed specifically for predicting respiratory outcomes[34]. In two of three studies, it performed well in discrimination and calibration, but the weighted mean was adversely affected by a poor performance in the third study[34,49,50]. Discouragingly, no preoperative risk model could predict anastomotic leak, readmission or return to theatre.
The results of this systematic review are consistent with the major findings of previous systematic reviews in this area. Findlay et al[14] concluded that no preoperative model predicted post-operative morbidity or mortality with sufficient accuracy and Warnell et al[15] concluded that no models could be applied to clinical practice with any confidence. The models identified in our review as having clinical promise in predicting mortality and major complications were developed subsequent to these reviews. The reasons for the vast majority of these models failing to sufficiently predict outcomes are multifactorial. Most clinical prediction tools are generated from outcome data from the same cohort on which the model is subsequently tested[23]. This predisposes the models to bias through overfitting to the development dataset and, in turn, to poor performance when applied to an external population[23]. In addition, several models were developed from a single centre with a relatively small dataset, which further limited their ability to predict uncommon clinical outcomes, especially considering the relative rarity of mortality or major morbidity post-oesophagectomy. Larger development datasets are therefore required to reliably predict these events.
Aside from the studied multivariate risk models, a plethora of single-factor prognostic indicators has been researched over this period. There have been three studies of the discriminatory capacity of cardiopulmonary fitness testing (CPEX), often represented through anaerobic threshold and VO2 maximum[64]. In each study CPEX fell short of reaching clinical utility thresholds in predicting major complications following oesophagectomy[65,66]. Preoperative sarcopenia, represented through grip strength or volumetric psoas muscle analysis, has also been highlighted as a prognostic marker for perioperative and long-term outcomes following oesophagectomy. Again, however, the performance of sarcopenia in predicting outcomes following oesophagectomy has been highly variable[67]. A systematic review conducted in 2020 by Papaconstantinou et al[67] found a statistically significant relationship between preoperative sarcopenia and overall perioperative morbidity, respiratory complications and anastomotic leaks. However, the same study failed to demonstrate a statistically significant association between sarcopenia and perioperative mortality or major complications (Clavien-Dindo grade III or higher)[67].
There are a number of strengths to this review. The review was conducted thoroughly and reported in accordance with the PRISMA method, outlining the study search and selection strategy. There was no iterative manipulation of the search terms or strategy to allow for selective inclusion or exclusion of specific articles. To the knowledge of the authors, this is the third systematic review to appraise multivariate risk models in the prediction of perioperative outcomes following oesophagectomy. It is just the second to incorporate qualitative analysis of the risk models involved. This review is the first to consider the issue since 2015 and, over the intervening period, there has been a substantial proliferation of multivariate risk models in the literature. Therefore, this systematic review is the largest of its kind. Although somewhat peripheral to the scope of this review, the temporal gap between this review and the preceding systematic review means this review can uniquely consider the performance of these multivariate risk models against the burgeoning list of other recently developed clinical predictors as outlined above. In contrast to a previous related effort, this review has not excluded low-volume centres from the analysis. Perhaps the greatest strength of this submission is that it is the first to isolate models which exclusively use preoperative variables. This is important because, by their very nature, only preoperative risk prediction models can assist surgeons in selecting appropriate surgical candidates and in appropriately counselling these patients on their risks prior to an operation.
Despite this, a number of common challenges were encountered. The quality of the results generated was limited by the completeness of reporting in the original publications, and there is an additional risk of publication bias favouring positive findings. We limited our search to articles published in English and from the year 2000 onward, which, whilst pragmatic, could have led to the exclusion of valuable publications. This review also did not consider long-term survival or patient reported quality of life outcomes, both of which may influence the decision whether to undertake surgical intervention. Qualitative analysis of the risk prediction models, whilst deemed a source of strength, can sometimes be subjective. There were also several challenges unique to this topic, many of which were also encountered during the preceding systematic reviews. Across the studies, there was significant heterogeneity in clinical practice and in the methodology of outcome measurement. Much of this related to the regional and temporal variance observed in the treatment of oesophageal cancer within the studies.
These limitations also highlight areas in which further research could be focused. A few preoperative prediction models do show promise but have not yet been externally validated. If these models were tested in a different population group, it would certainly strengthen the case for their application. Owing to the low risk of mortality following oesophagectomy, any attempt to demonstrate clinical improvement would require a large, multicentre, long-term prospective clinical trial. This likely contributes to why none of the models has yet been shown prospectively to improve clinical outcomes. If a model were demonstrated to lead to better outcomes, it would encourage surgeons to utilise such a model in everyday practice. Finally, with an increasing emphasis on individualised medicine, future research should also seek to develop and define models that focus on long-term survival and patient reported quality of life outcomes.
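The scale of trial implied by a low event rate can be illustrated with a standard two-proportion sample size approximation. The following Python sketch is illustrative only: the function name, the mortality rates of 4% and 2%, the two-sided significance level of 0.05 and the power of 80% are all hypothetical assumptions and are not taken from this review or the included studies.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(p_control, p_model, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a difference between
    two event proportions (normal approximation, two-sided test)."""
    if p_control == p_model:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_model * (1 - p_model)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_model) ** 2)


# Hypothetical scenario: mortality of 4% with clinical discretion alone
# versus 2% with model-guided patient selection.
print(patients_per_arm(0.04, 0.02))
```

Under these assumed rates, the approximation calls for more than a thousand patients in each arm, and the requirement grows rapidly as the expected difference between arms narrows.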
CONCLUSION
A large number of clinical multivariate risk models have been developed or adapted for use in predicting perioperative outcomes including morbidity, major morbidity and mortality following oesophagectomy. By being based on preoperative variables, they are designed to aid in patient selection for surgical resection and to guide informed preoperative counselling of patients. This study has demonstrated that most models are clinically credible and were constructed with sound methodological quality, but their performance was often insufficient to prognosticate patient outcomes. In total, three models were identified as being capable of discriminating patients for mortality: The NSQIP surgical risk calculator, the revised STS score and the Takeuchi model. Two models predicted postoperative major morbidity: The PPCS model and PNI-multivariate model. However, most of these models are not externally validated and none have shown clinical effectiveness in improving outcomes. Further research is needed before prediction models can be confidently applied to clinical practice in selecting appropriate surgical candidates, counselling patients on surgical risk and guiding postoperative resource allocation.
ARTICLE HIGHLIGHTS
Research background
Oesophageal cancer is the eighth most common type of cancer and sixth leading cause of cancer-related death worldwide. If it is detected in the early stages, an oesophagectomy can be undertaken with realistic curative intent. Unfortunately, this surgery comes with a significant morbidity burden and can result in fatal outcomes, making appropriate selection of surgical candidates imperative. Numerous multivariate risk prediction models have been devised to augment this decision-making with ongoing conjecture as to which risk prediction tool is most reliable. This publication is the first systematic review in seven years to attempt to resolve which model most accurately predicts perioperative outcomes following oesophagectomy.
Research motivation
The identification of the best preoperative risk prediction model would allow surgeons to apply it in clinical practice. Such a tool may assist in augmenting clinical decision making to better identify and counsel appropriate surgical candidates for oesophagectomy. It is expected that improved patient selection would lead to overall improved perioperative outcomes for patients suffering from oesophageal cancer.
Research objectives
The objective of this research is to conduct a contemporary systematic review assessing which preoperative multivariate risk model best predicts perioperative oesophagectomy outcomes. The primary objective relates to appraising predictive performance for mortality outcomes. The secondary objectives are to assess the ability of the multivariate models in forecasting major morbidity, overall morbidity and specific key complications such as respiratory complications and anastomotic leak.
Research methods
A systematic review incorporating the MEDLINE, Embase and Cochrane databases was conducted, covering the period 2000-2020. Applied search terms were ((Oesophagectomy) AND (Risk OR predict OR model OR score) AND (Outcomes OR complications OR morbidity OR mortality OR length of stay OR anastomotic leak)). Only multivariate tools which exclusively utilised data available preoperatively to predict perioperative outcomes following oesophagectomy were included. Articles were generated, collated and then reported in accordance with PRISMA guidelines. All risk models were appraised across the five domains of clinical credibility, methodological quality, model performance, external validation and clinical effectiveness.
Research results
The initial search yielded 8715 articles, which were reduced to 197 potentially relevant texts after deduplication and title and abstract screening. Following detailed assessment of these articles, 27 published studies were ultimately included, examining 21 multivariate preoperative risk prediction models. The majority of models were clinically credible with sound methodological quality, but many models still require external validation and none had yet proven clinical effectiveness with their adoption. Three models adequately predicted perioperative mortality (National Surgical Quality Improvement Program surgical risk calculator, revised Society of Thoracic Surgeons oesophagectomy composite score and Takeuchi model) whilst two (predicting postoperative complications score and prognostic nutritional index-multivariate model) predicted major morbidity sufficiently.
Research conclusions
There are a few well-constructed and credible multivariate risk prediction models that demonstrate promise in forecasting perioperative mortality and major morbidity outcomes. However, more research is required in the sphere of external validation and to demonstrate improved clinical outcomes with the adoption of these models in preoperative surgical patient selection.
Research perspectives
There is a research gap in externally validating some of these models which have yet to be assessed outside of their development cohort. Ultimately, the direction of future research should involve the development of a prospective randomised controlled trial in which one group would utilise clinical discretion with the other applying one of the promising preoperative risk prediction models in determining appropriate surgical candidates. In such a trial, clinical effectiveness with the adoption of a risk prediction model could be demonstrated if improved patient outcomes were observed. This would provide compelling evidence for the broader application of such a risk prediction model in patient selection for oesophagectomy.
ACKNOWLEDGEMENTS
We would like to acknowledge the assistance of Nikki May, SA Health librarian, in the construction and execution of the search strategy. This work was initially undertaken as part of the University of Edinburgh, Masters of Surgical Science.
Footnotes
Provenance and peer review: Unsolicited article; Externally peer reviewed.
Kattan MW, Yu C, Stephenson AJ, Sartor O, Tombal B. Clinicians versus nomogram: predicting future technetium-99m bone scan positivity in patients with rising prostate-specific antigen after radical prostatectomy for prostate cancer. Urology. 2013;81:956-961.
Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, Chou R, Glanville J, Grimshaw JM, Hróbjartsson A, Lalu MM, Li T, Loder EW, Mayo-Wilson E, McDonald S, McGuinness LA, Stewart LA, Thomas J, Tricco AC, Welch VA, Whiting P, Moher D. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.
Taylor AP, Webb RI, Barry JC, Hosmer H, Gould RJ, Wood BJ. Adhesion of microbes using 3-aminopropyl triethoxy silane and specimen stabilisation techniques for analytical transmission electron microscopy. J Microsc. 2000;199:56-67.
Lagarde SM, Reitsma JB, Maris AK, van Berge Henegouwen MI, Busch OR, Obertop H, Zwinderman AH, van Lanschot JJ. Preoperative prediction of the occurrence and severity of complications after esophagectomy for cancer with use of a nomogram. Ann Thorac Surg. 2008;85:1938-1945.
Wright CD, Kucharczuk JC, O'Brien SM, Grab JD, Allen MS; Society of Thoracic Surgeons General Thoracic Surgery Database. Predictors of major morbidity and mortality after esophagectomy for esophageal cancer: a Society of Thoracic Surgeons General Thoracic Surgery Database risk adjustment model. J Thorac Cardiovasc Surg. 2009;137:587-95; discussion 596.
Takeuchi H, Miyata H, Gotoh M, Kitagawa Y, Baba H, Kimura W, Tomita N, Nakagoe T, Shimada M, Sugihara K, Mori M. A risk model for esophagectomy using data of 5354 patients included in a Japanese nationwide web-based database. Ann Surg. 2014;260:259-266.
Raymond DP, Seder CW, Wright CD, Magee MJ, Kosinski AS, Cassivi SD, Grogan EL, Blackmon SH, Allen MS, Park BJ, Burfeind WR, Chang AC, DeCamp MM, Wormuth DW, Fernandez FG, Kozower BD. Predictors of Major Morbidity or Mortality After Resection for Esophageal Cancer: A Society of Thoracic Surgeons General Thoracic Surgery Database Risk Adjustment Model. Ann Thorac Surg. 2016;102:207-214.
Reeh M, Metze J, Uzunoglu FG, Nentwich M, Ghadban T, Wellner U, Bockhorn M, Kluge S, Izbicki JR, Vashist YK. The PER (Preoperative Esophagectomy Risk) Score: A Simple Risk Score to Predict Short-Term and Long-Term Outcome in Patients with Surgically Treated Esophageal Cancer. Medicine (Baltimore). 2016;95:e2724.
Saito T, Tanaka K, Ebihara Y, Kurashima Y, Murakami S, Shichinohe T, Hirano S. Novel prognostic score of postoperative complications after transthoracic minimally invasive esophagectomy for esophageal cancer: a retrospective cohort study of 90 consecutive patients. Esophagus. 2019;16:155-161.
Ohkura Y, Miyata H, Konno H, Udagawa H, Ueno M, Shindoh J, Kumamaru H, Wakabayashi G, Gotoh M, Mori M. Development of a model predicting the risk of eight major postoperative complications after esophagectomy based on 10 826 cases in the Japan National Clinical Database. J Surg Oncol. 2020;121:313-321.
Bosch DJ, Pultrum BB, de Bock GH, Oosterhuis JK, Rodgers MG, Plukker JT. Comparison of different risk-adjustment models in assessing short-term surgical outcome after transthoracic esophagectomy in patients with esophageal cancer. Am J Surg. 2011;202:303-309.
Filip B, Hutanu I, Radu I, Anitei MG, Scripcariu V. Assessment of different prognostic scores for early postoperative outcomes after esophagectomy. Chirurgia (Bucur). 2014;109:480-485.
Yamana I, Takeno S, Shibata R, Shiwaku H, Maki K, Hashimoto T, Shiraishi T, Iwasaki A, Yamashita Y. Is the Geriatric Nutritional Risk Index a Significant Predictor of Postoperative Complications in Patients with Esophageal Cancer Undergoing Esophagectomy? Eur Surg Res. 2015;55:35-42.
D'Journo XB, Berbis J, Jougon J, Brichon PY, Mouroux J, Tiffet O, Bernard A, de Dominicis F, Massard G, Falcoz PE, Thomas P, Dahan M. External validation of a risk score in the prediction of the mortality after esophagectomy for cancer. Dis Esophagus. 2017;30:1-8.
Ravindran K, Escobar D, Gautam S, Puri R, Awad Z. Assessment of the American College of Surgeons National Surgical Quality Improvement Program Calculator in Predicting Outcomes and Length of Stay After Ivor Lewis Esophagectomy: A Single-Center Experience. J Surg Res. 2020;255:355-360.
Onodera T, Goseki N, Kosaki G. [Prognostic nutritional index in gastrointestinal surgery of malnourished cancer patients]. Nihon Geka Gakkai Zasshi. 1984;85:1001-1005.
Papaconstantinou D, Vretakakou K, Paspala A, Misiakos EP, Charalampopoulos A, Nastos C, Patapis P, Pikoulis E. The impact of preoperative sarcopenia on postoperative complications following esophagectomy for esophageal neoplasia: a systematic review and meta-analysis. Dis Esophagus. 2020;doaa002.