INTRODUCTION
Cardiotocography (CTG) or the interpretation of fetal heart rate (FHR) patterns has been the most widely accepted and practiced method of intrapartum fetal monitoring for over 50 years. However, it remains the most controversial and problematic issue in Obstetrics despite being the commonest medical procedure in the western world and also the most extensively studied[1,2]. Controlled prospective trials show that caesarean section and operative delivery rates are increased with continuous CTG without improvement in fetal/neonatal short or long term outcomes, irrespective of low or high risk labours[3]. Nevertheless, the CTG has become a “standard of care”, expected especially in the presence of common risk factors. Severe perinatal hypoxia remains rare but can lead to distressing catastrophic outcomes like perinatal death or permanent neurological damage. Most developed countries spend large sums of money on litigation for cerebral palsy. For example, the National Health Service in England paid out £3.1 billion (49% of the value of all claims) for negligence linked to maternity care in the past decade, mainly for cerebral palsy and errors in the interpretation of CTGs[4]. The term electronic fetal monitoring (EFM) is generally used to encompass techniques other than simple auscultation of FHR with a stethoscope or Doppler device. Although perceived to be a “defensive” practice, it is EFM that has been the main driver of the increasing litigation for neonatal neurological injury and cerebral palsy[5]. This financial imperative (rightly or wrongly) pushes the issue of “intrapartum fetal monitoring” to the top of the patient safety agenda. This highlights the need, opportunity and potential to improve intrapartum fetal monitoring and patient safety. There is a strong desire for a different, modern technology to replace CTG.
However, presently it is difficult to see which cutting edge technology would be user-friendly, relatively non-invasive, and at the same time cost-effective and suitable for mass-application to the physiological process of childbirth. National Professional bodies have devised 3-tier systems of CTG interpretation which utilise multiple FHR parameters in different combinations with increasing degrees of abnormality, with the aim of achieving acceptable positive and negative predictive values to detect fetal acidemia[1,6-11]. However, the North American 3-tier system has been found wanting[2,12,13], with a major drawback that almost 80% of all FHR tracings fall in Category II, of indeterminate significance[12]. “Early” decelerations are extremely rare and all other FHR decelerations (late and all variable decelerations) fall in Category II[12]. The clinicians can “continue to observe”, “evaluate further” or “deliver” on an individualized basis as no management algorithm can be prescribed for Category II, which has been a major criticism[12,13]. In the United Kingdom, there have been no significant clinical trials of the 3-tier system but the concept of “atypical variable decelerations” (with major impact on classification of CTGs) has been found to be flawed[4]. The National Institute for Health and Clinical Excellence (NICE) has recently abandoned the sub-classification of variable decelerations into “typical” and “atypical”[6]. Moreover, the CTG has a 99.8% false positive rate in predicting CP (cord pH < 7.00)[3] and there is scant evidence, if any at all, that EFM has improved neonatal well-being. The deep disappointment with EFM has led Sartwelle et al[5] to argue that EFM is a “junk science”. They propose that during medico-legal proceedings, the evidence from EFM should be considered invalid and inadmissible, based on the “Daubert doctrine” which excludes “junk science” from the courtrooms.
They make a strong reasoned argument for a “change in course or abandonment of the ship (i.e., EFM)”; a wake-up call for Obstetricians[5].
This brief analytical review is mainly directed at Obstetricians and midwives and intended to encourage a wide debate on the current perspectives, possible deficiencies, remedies, and future developments. It is not a “systematic review” of EFM which has already been presented by NICE and other national professional bodies[1,6,8,10,14,15]. This limited review is not intended to be entirely comprehensive or all-inclusive. Indeed some facets of EFM, not essential to the thesis presented, would be outside the remit of this paper. The issues and opinions described remain controversial by very nature but need to be debated and some experts may hold different views. The main focus is “CTG and FHR patterns especially decelerations”. Other techniques like “intermittent auscultation (IA) of FHR”, fetal electrocardiography (ECG) (STAN or ST analysis), fetal oximetry and computerised CTG interpretation will be discussed briefly and not in-depth. The 3-tier systems of CTG interpretation in specific or proposing an alternative “proven” system are not the subject or purpose of this review.
EFM AND EVIDENCE BASED MEDICINE
EFM (CTG) became a routine clinical practice following the pioneering work of Hon and Caldeyro-Barcia in the 1950s[16,17]. Following the revolution of “evidence based medicine” (EBM) since the 1980s, EFM has been subjected to numerous clinical studies and trials. The International Federation of Gynaecology and Obstetrics (FIGO)[11] first proposed a 3-tier system of graded FHR abnormalities, variations of which have been adopted by most national guidelines to standardise the terminology as well as clinical intervention. But none of the national 3-tier classifications were published with an estimate of their sensitivity or false positivity[18]. A recent high quality study found no correlation of the American 3-tier system to neonatal acidemia[13]. Although there are studies showing good correlation between CTG and neonatal acidemia, the overall quality of available evidence of reliability of CTG can be summarized in the words of NICE as, “The evidence is of moderate and low quality that there is moderate to low degree of association between different FHR parameters and neonatal acidosis”[6]. The extensive experience, evaluation and application of EBM have done very little to lessen these controversies. The reasons seem to be as follows: (1) the significant variations in definitions and grading of FHR parameters in different studies over the years make it extremely difficult to draw valid conclusions. Moreover, different and variable benchmarks of fetal outcome like pH (7.05 to 7.20); base deficit (-8.0 to -12.0), lactate, Apgar scores, etc. further complicate comparisons; (2) outcomes of importance (e.g., hypoxic ischemic encephalopathy) are very rare so that large numbers of cases would be needed to show a difference; (3) it is almost impossible to separate the “treatment effect” because the intervention in the presence of an abnormal CTG modifies the neonatal outcome. 
It is unethical and impractical to conduct truly blinded randomised controlled trials (RCTs); (4) the fetal heart rate is only a surrogate for fetal hypoxia and not a very good one[6]; (5) complex tasks of “pattern recognition” together with clinical evaluation may not be captured in simple algorithms and not reflected in the research trials and reviews[6]. Previous studies have often produced contradictory results owing to methodological and logistical limitations. This also means that definitive evidence from clinical studies (hard to come by) need not be a precondition for the reform of CTG interpretation if prompted and supported by careful systematic observation, deliberation and restoration of physiological principles. There is a place for a Bayesian approach with variable emphasis on observational data[19]; and (6) framing and confirmation biases: these poorly recognised biases seem to be important correctable factors. “Anchoring/framing bias” is the tendency to create a coherent initial picture without examining all available information[20-22]. “Confirmation bias” follows when we selectively focus upon evidence that supports our beliefs, while ignoring more comprehensive evidence that disproves those ideas[20-22]. These biases are said to be very common and may particularly apply to interpretation of FHR decelerations[22], which are centre-stage in interpretation of CTG[23].
Despite the lack of good quality evidence of improvement in neonatal outcome overall, many Obstetricians believe that failure to use EFM may lead to bad outcome. The CTG generates an explicit confirmation and documentation of FHR which can be reassuring to patients and health workers, but has a potential to provoke anxiety as well. Recently the author conducted a survey of personal preferences of clinicians in our Institute with existing liberal culture of “IA of FHR”. They were reminded of the evidence that CTG is not reliable in preventing CP or intrapartum hypoxia even in the presence of risk factors but does increase the operative delivery rate. All 14 Obstetricians and 11 of 15 midwives still chose to have CTG for themselves or their partners in the presence of risk factors. Thus, there seems a disconnect between the “evidence” and the beliefs/experience of birth-attendants themselves. Secondly, they seem to judge any harm from EFM in a different context/balance.
POSSIBLE REMEDIES TO IMPROVE CTG INTERPRETATION
There is a scope to make significant improvements and international congruence in CTG interpretation although one could not expect it to become a perfect test. The remedial measures could start by addressing variations and flaws/biases in categorization of FHR decelerations, variations in 3-tier CTG interpretation systems, standardization of CTG recording speed and refining the place of confirmatory tests of fetal well-being like fetal scalp blood sampling (FSBS).
VARIATIONS IN CATEGORIZATION OF FHR PARAMETERS
There is relative consistency and uniformity in defining normal and abnormal baseline FHR, baseline variability and accelerations in different countries on both sides of the Atlantic over the last 50 years. However, the opposite is true when it comes to FHR decelerations, which are more complex to interpret by their very nature. Unfortunately, unambiguous and specific standardised definitions of FHR decelerations are missing in NICE guidelines[5,6], thus leaving several different sources to propose their own definitions, often based on arbitrary and sometimes unscientific or implausible (fictional) criteria[4]. This can constitute major framing biases which can go on to corrupt subsequent systems of interpretation incorporating them. FHR decelerations are the most common aberrant features and thus often deterministic in classifying CTGs into the 3-tier systems. Indeed the FHR decelerations were the “low hanging fruit” (generally with major correlation to outcomes) which were immediately picked up by the pioneers like Hon et al[16] and the group of Caldeyro-Barcia[17] based on clearly discernible observational evidence[22]. They categorised FHR decelerations based on time relationship to contractions only, as indeed reflected in the terminology itself[16,17]. As a secondary hypothesis, they proposed that the early decelerations may be the result of head compression and those with variable time relationship to contractions may be due to cord compression, although their classification was not primarily “etiological”[16,17,24]. Similar categorization was rooted and practiced in British Obstetrics until very recently, which meant “early” decelerations were the most common variety (Figure 1)[25-28]. FHR decelerations can be said to be of two main types, one due to a benign parasympathetic (vagal) reflex and the other due to a hypoxic (chemoreceptor) vagal reflex or direct suppression of the myocardium in later stages[24]. 
The clue to differentiating these is in the “timing” rather than the “shape”, since hypoxia during a contraction has a lag time to develop or worsen[24]. FHR decelerations which start recovering immediately after the peak of contraction (early timing) are not likely to have a hypoxic component and hence it would be important to appropriately recognise them as benign (“early”). On the other hand, the etiology of decelerations would always remain putative and possibly multifactorial with one of the causes predominant[24]. Although the current American and European categorizations of FHR decelerations seem to claim their foundation and legitimacy from the pioneering work and terminology of Edward Hon, they constitute a significant departure from Hon’s original description[16]. Sometime in the late 20th century the American classification of decelerations seems to have become primarily “etiological” despite the many pitfalls. All decelerations with rapid descent were presumed to be due to cord compression even though head-compression could also cause rapid decelerations[23-25]. Thus all rapid decelerations were by definition called “variable” even though the majority of them started early during contractions with the nadir corresponding to the peak of contraction (early timing). This paradoxically made “early benign decelerations” extremely rare in recent American practice. Does this represent a framing bias in need of correction? Terms that are specific, precise and truly descriptive (embody what it says on the tin) tend to be useful or convey meaningful information. Misleading (ambiguous) terminology can lead to loss of meaning. 
Moreover, the major focus on putative “repeated cord compression” as a main cause for development of fetal hypoxemia lacks clinical evidence and ignores the most likely cause of hypoxemia, namely the repeated drop in maternal uteroplacental perfusion during contractions (especially on the background of reduced uteroplacental reserve or excessive uterine action). Etiological classification (misconstrued?) placing the vast majority of FHR decelerations in the category of “variable decelerations” - based on unscientific hypotheses and misjudged application of animal experiments[22,24] - does not seem to have worked and indeed has been suggested to lead to loss of meaning. The pathophysiological hypotheses proposed for “cord compression decelerations” have several contradictions and “rapid” descent of decelerations does not discriminate between decelerations due to cord or head compression[22,24]. The heterogeneity in categorization of FHR decelerations and the interpretation of their significance (pathological nature) has been so great over the years and in different studies in various countries that it has become impossible to draw any valid conclusions from the huge number of clinical studies available in the literature. Although more meaningful research is always welcome, it has been hard to come by in this field, and should not be a precondition for examining the validity of every aspect of CTG interpretation. Debating and arguing in a deliberative and interactive context can also help us to reach valid conclusions or come closer to the scientific truth[22]. There is substantial observational and experimental evidence that the shape or rate of descent of FHR decelerations does not correlate to the etiology of decelerations or fetal condition[17]. It would be greatly beneficial to reform the categorization of FHR decelerations in the USA and Europe, correcting the framing/confirmation biases and flaws - the compatibility of which with scientific practice can be debated. 
Such a reform could go a long way in improving the reliability and further evolution of 3-tier systems. Simply standardization/uniform adoption and application of EBM principles on their own are unlikely to compensate for fundamental framing and confirmation biases.
Figure 1 Diagrammatic representation of early, late and variable decelerations as practiced in British Obstetrics before 2007 (Reproduced with kind permission from “Principles of Obstetrics” by Bryan Hibbard, 1988)[26].
Note the apparent rapid descent of early decelerations. CTG paper speed 1 cm/min.
CTG RECORDING SPEED
On the surface, it may seem unimportant that there is a difference in the CTG recording speed in different countries viz 3 cm/min in North America and 1 cm/min in United Kingdom and Australia-New Zealand. However, the CTG speed represents the horizontal scale and determines the “apparent” shape of the FHR waveforms especially decelerations. Thus gradual decelerations (U shaped) on an American CTG would appear rapid (V shaped) on British CTG. In fact the faster speed of CTG (3 cm/min) may have (erroneously?) drawn more attention to the shape of the deceleration waveform. Baseline variability, accelerations and decelerations can be judged quite well on CTGs with both speeds. With abandonment of reliance on so called atypical features of FHR decelerations, it is no longer necessary to look for FHR variability during a deceleration, which may have been possible with faster CTG speed of 3 cm/min. In any case this was never found practically useful. The slower CTG speed (1 cm/min) does seem to have one distinct advantage that FHR patterns over much longer time periods, e.g., 30-60 min can be visually examined and interpreted at a glance whether on a paper tracing or on a monitor screen. Since visual characterization and analysis of FHR waveforms has been of such critical importance, it would be highly desirable to adopt one uniform speed of CTG tracing across the globe to reduce heterogeneity in description and interpretation.
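The effect of paper speed on the apparent shape of a deceleration is simple arithmetic on the horizontal scale. The following minimal Python sketch illustrates it for a hypothetical deceleration (a 40 bpm drop reached over 30 s, numbers chosen purely for illustration). The function names are assumptions of this sketch, and the vertical scale is assumed to be the same on both tracings.

```python
def width_on_paper_cm(duration_s: float, paper_speed_cm_per_min: float) -> float:
    """Horizontal extent on the tracing of an event lasting duration_s seconds."""
    return duration_s / 60.0 * paper_speed_cm_per_min


def apparent_slope_bpm_per_cm(drop_bpm: float, descent_s: float,
                              paper_speed_cm_per_min: float) -> float:
    """FHR drop per centimetre of paper during the descent phase."""
    return drop_bpm / width_on_paper_cm(descent_s, paper_speed_cm_per_min)


# Hypothetical deceleration: 40 bpm drop over 30 s (illustrative values only).
DROP_BPM = 40.0
DESCENT_S = 30.0

for speed in (1.0, 3.0):  # cm/min, the two recording speeds discussed in the text
    slope = apparent_slope_bpm_per_cm(DROP_BPM, DESCENT_S, speed)
    half_hour_cm = width_on_paper_cm(30 * 60, speed)
    print(f"{speed:.0f} cm/min: descent slope {slope:.1f} bpm/cm, "
          f"30 min of trace occupies {half_hour_cm:.0f} cm")
```

Under these assumptions the same descent is drawn three times steeper at 1 cm/min than at 3 cm/min, while 30 min of recording occupies one third of the length.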
CONFIRMATORY/ADJUNCTIVE TESTS OF FETAL WELL-BEING
FHR is a relatively non-specific and poor surrogate for fetal condition[6]. Thus, with the current CTG interpretation, to achieve a very low false negative rate for fetal acidemia, one has to settle for a high false positive rate. In some clinical situations an abnormal CTG may be enough to expedite delivery. But many times an adjunctive test may be necessary. In the United Kingdom, FSBS is a widely accepted and practiced test. Even FSBS is no stranger to controversy. There are reviews which propose that addition of FSBS to CTG interpretation does not improve outcomes or reduce operative intervention[29]. However, such meta-analyses are fraught with significant flaws and biases arising from the dubious and variable quality of the studies included. This seems to be another example of how evidence from meta-analysis of several clinical studies runs counter to practically observed benefits of a long accepted practice. The vast majority of British hospitals use FSBS and find it practically useful in everyday practice, and hence it is unlikely that FSBS will be abandoned in British practice any time soon. FSBS following an abnormal CTG quite often shows a normal result, thus allowing continuation of labor and often achieving vaginal delivery. It seems possible that the FSBS result may be falsely positive (acidemic) because of stasis of blood flow in peripheral tissues, especially in the presence of significant caput; but it is the extremely low “false negative” rate of FSBS that makes it very useful in day to day practice. FSBS is uncommon in American practice, but that probably leaves a gap to be filled. The fetal scalp stimulation test and vibroacoustic stimulation test seem promising[15] but they need to be more extensively and systematically studied. The role of fetal ECG (STAN) is discussed later.
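The trade-off between a low false negative rate and a high false positive rate is largely a consequence of the rarity of fetal acidemia. A minimal Python sketch of the standard predictive-value arithmetic illustrates this. The sensitivity, specificity and prevalence used here are hypothetical values chosen for illustration and are not taken from any of the cited studies.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics (assumptions for illustration only):
# a screening test tuned to miss very few cases of a rare outcome.
ppv, npv = predictive_values(sensitivity=0.95, specificity=0.50, prevalence=0.01)
print(f"PPV = {ppv:.1%}, NPV = {npv:.2%}")
```

With these assumed figures the negative predictive value is above 99.8% but the positive predictive value is only about 2%, so nearly all "abnormal" results are false positives. This is the gap that an adjunctive test with a very low false negative rate is intended to fill.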
IA
NICE suggests that about 45% of all labors are at low risk for fetal hypoxia and strongly recommends IA for these labors with fairly specific criteria to switch over to CTG[6]. The RCTs of IA vs CTG have shown equivalent perinatal outcome with reduced operative intervention in low risk labours[3]. However, even meta-analysis of these trials is underpowered to show possible differences because of the rarity of serious adverse outcomes. Fortunately, the incidence of significant birth asphyxia in the absence of risk factors or an acute intrapartum adverse event is very low. Hence, a few regimes of IA can be quite loose or relaxed without frequent noticeable adverse events. For example, in the Netherlands, where all home births receive IA only, no structured guidelines are followed and as a convention FHR is auscultated every 2 h or so in the active first stage (personal correspondence with Jonge A de, 2015)[30]. NICE on the other hand recommends counting FHR with a Doppler or Pinard stethoscope for 60 s after a contraction every 15 min in the first stage and every 5 min in the second stage of labor[6,7]. The intention is to detect/suspect late FHR decelerations[6]. In developing countries like China and India, the vast majority of labors (low and high risk) are monitored by IA and local protocols are often unstructured and variable, but are likely to be increasingly modelled on NICE guidelines (personal correspondence). With the increasing use of IA, it is hoped that there may be future refinements[31,32]. However, it seems unlikely that more RCTs of IA vs CTG (requiring a very large number of subjects) will be conducted in future.
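Why such trials require very large numbers can be seen from a conventional sample-size estimate for comparing two proportions. The following is a minimal Python sketch using the usual normal-approximation formula. The event rates (2 vs 1 per 1000), the significance level and the power are hypothetical assumptions for illustration and do not come from the cited trials.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per arm to detect p1 vs p2 (two-sided test).

    Normal-approximation formula:
        n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical rates of a serious adverse outcome (illustrative assumption):
# 2 per 1000 in one arm vs 1 per 1000 in the other.
print(n_per_arm(0.002, 0.001))
```

Under these assumptions the estimate is in the region of 23500 subjects per arm, i.e., tens of thousands of labors in total, and rarer outcomes or smaller differences would require more still.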
FETAL ECG
Fetal ECG is recorded with a fetal scalp electrode. ST segment analysis (STAN) has been practiced for over a decade, mainly in Nordic countries but also in a few centres in the United Kingdom and United States. Being a new arrival, STAN has undergone relatively ample evaluation in well-designed clinical trials. However, five RCTs and five systematic reviews with meta-analyses have shown very divergent results[33]. Does this suggest that we may be looking for marginal gains here? Moreover, an absence of clear background, lack of transparency and a sense of a “magic black box” have been associated with STAN[33]. Most importantly and perplexingly, an ST event is supposed to lose its significance if the CTG is “normal”. Hence, STAN results (unlike FSBS) seem dependent on the traditional CTG interpretation. Thus any major changes in the CTG interpretation[2,6] would further complicate the interpretation of the trials of CTG + STAN vs CTG alone by changing the “starting line” as well as the “finishing line”. The largest RCT, conducted in the United States on 11108 women, did not show improved fetal outcomes or a reduction in operative delivery, unlike some of the trials in Europe[15,34]. The authors of this trial highlighted the need for caution when extrapolating results from studies outside the United States. This seems a major weakness in the application of STAN, because the interpretation of “non-reassuring CTG” (e.g., Category II of the American 3-tier system) varies markedly between the United States, Europe, United Kingdom, Australia and New Zealand. Moreover, Category II of the American 3-tier system has been shown to be clinically unhelpful and already an additional algorithm[2] has been proposed to identify the “real” non-reassuring CTGs within Category II, which may become part of ACOG guidelines. At the same time there are some Obstetric units in Europe[35] and Australia (personal correspondence) which have abandoned the use of STAN because of serious adverse outcomes. 
All these seem major challenges to STAN and a different strategy which evaluates STAN results independent of CTG may need to be considered.
INTRAPARTUM FETAL PULSE OXIMETRY
This technology involves attachment of a light-emitting sensor to the fetal scalp or temple to measure the proportion of haemoglobin that is carrying oxygen, i.e., oxygenation. A recent Cochrane review found that intrapartum fetal pulse oximetry (IFPO) as an adjunct to CTG (thus again dependent on correct interpretation of CTG) did not improve neonatal outcome or reduce the overall incidence of caesarean section[36]. Again, IFPO in parallel with CTG and FSBS may need to be subjected to more extensive clinical studies.
COMPUTERISED CTG INTERPRETATION
It is hoped that the long anticipated computer aided analysis of CTG will be more objective and reliable (overcome human factors) and may eventually replace visual CTG interpretation. However, despite the exponential increase in analytical and functional power of digital technology in the last decade, the development and adoption of “computerised CTG” has remarkably lagged behind. The difficulties are providing good quality evidence of its reliability or superiority over visual CTG interpretation, licensing and medicolegal considerations. Secondly, although the computerised CTG interpretation takes away inter-observer variation, there can be variations in FHR signal sampling and processing[15], and there may be a need for standardization probably supported by clinical trials, a challenging task in the field of EFM. Many singular instruments/parameters like “total deceleration area”, “short-term variability”, “approximate entropy” and “phase-rectified signal averaging” have been shown to correlate to fetal status to variable degrees. But a much “stronger” correlation with useful positive and negative predictive values would be required for clinical application. It is worth noting that the FHR patterns during the expulsive second stage are quite different from first stage (more frequent and deeper decelerations and higher variability). A mental adjustment is made for this difference during visual CTG interpretation which may not occur in computerised analysis. Hence, separate/different computerised analysis criteria (e.g., deceleration area) for the first and second stage of labour would be highly desirable or indeed may improve correlation with fetal status[24]. One could argue that the computerised analysis should emulate the principles of visual CTG interpretation. The results of the “INFANT study”[38] evaluating a computer based “Intelligent Fetal Assessment” system (K2MS, Plymouth)[37] vs continuous CTG are eagerly awaited. 
The “Infant” (Intelligent Fetal Assessment) software emulates the principles of visual interpretation and provides 4 colour-coded categories[37]. Will such software need to be recalibrated and re-evaluated when visual CTG interpretation is changed significantly[2,6]? These are major challenges. Another software package, PeriCALM Patterns™ (PeriGen, Princeton, NJ, United States), attempts to recognise EFM patterns based on baseline, baseline variability, FHR decelerations and contractions[18]. The “Infant” as well as PeriCALM™ do not claim to replace human visual CTG interpretation but propose to provide additional support at present. In any case, recording and archiving all CTGs digitally and testing cord blood gases routinely in every delivery would be highly desirable. This would facilitate well designed retrospective studies, which can be very informative especially when prospective RCTs are often impractical and resource-intensive.
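To make the notion of a single computed parameter concrete, the following is a minimal Python sketch of one possible way to compute a “total deceleration area” from a sampled FHR trace. It is an illustration only and not the algorithm of any of the systems cited above. The fixed baseline, the depth and duration thresholds (15 bpm and 15 s) and the function name are assumptions of this sketch. The thresholds are exposed as parameters so that different criteria could be applied to the first and second stage of labour.

```python
from typing import Sequence


def total_deceleration_area(fhr_bpm: Sequence[float],
                            baseline_bpm: float,
                            sample_interval_s: float,
                            min_depth_bpm: float = 15.0,
                            min_duration_s: float = 15.0) -> float:
    """Sum of (baseline - FHR) * dt over qualifying decelerations, in bpm*s.

    A run of consecutive samples below the baseline counts as a deceleration
    only if its maximum depth and its duration both reach the thresholds
    (threshold values are assumptions of this sketch).
    """
    total = 0.0
    run: list[float] = []  # depths below baseline in the current run
    # A sentinel sample at baseline closes a run that reaches the end of the trace.
    for value in list(fhr_bpm) + [baseline_bpm]:
        depth = baseline_bpm - value
        if depth > 0:
            run.append(depth)
            continue
        if (run and max(run) >= min_depth_bpm
                and len(run) * sample_interval_s >= min_duration_s):
            total += sum(run) * sample_interval_s
        run = []
    return total


# Hypothetical trace sampled every 5 s around an assumed baseline of 140 bpm.
trace = [140, 140, 130, 120, 110, 120, 130, 140, 138, 140]
print(total_deceleration_area(trace, baseline_bpm=140, sample_interval_s=5))
```

In this example the 25 s dip to 110 bpm contributes to the area, whereas the brief 2 bpm dip is ignored as ordinary variability. A clinically usable implementation would also need baseline estimation, signal-loss handling and validation against outcomes, none of which is attempted here.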
COMMON PITFALLS IN INTRAPARTUM FETAL MONITORING
Errors in CTG interpretation are possible at any of the three stages involved namely signal acquisition, interpretation of signal (e.g., FHR pattern) and clinical intervention[18].
Signal acquisition
The technology of obtaining FHR record with external Doppler and fetal scalp electrode has improved remarkably. Both maternal and fetal heart rates should be recorded and displayed which should eliminate mistaking maternal heart rate signal as FHR. A good record of timing and duration of contractions should be obtained in order to correlate them to any FHR decelerations.
Interpretation of FHR patterns
Variation in interpretation of CTG remains a problem although a lot of standardisation of terminology and analysis has been achieved over last 15 years by most national guidelines[1,6-11,14]. The current 3-tier systems are a graded classification of increasing abnormality of combinations of many FHR parameters like baseline, baseline variability and types of decelerations. Experts admit that CTG interpretation still has an element of “art” (expertise and intuition) in addition to “science based rules”[6]. Moreover, if standardisation is of the wrong sort then it is likely to be misleading and counterproductive. Framing/confirmation biases should not be dismissed as only of “academic interest”[39]. A major or critical framing bias would corrupt all consequent structures (e.g., 3-tier systems) and developments. Thus the guidelines for CTG interpretation are still evolving and could undergo further material change. Secondly, similar CTG abnormalities have a more serious implication (higher positive predictive value) in the presence of high risk factors like growth retardation, thick meconium staining, infection, etc. Lastly, several human factors can affect interpretation like tiredness, tunnel vision, and failure of situational awareness etc[18]. These should be addressed by working hours regulations, appropriate staffing levels, fresh eyes approach and regular training updates/skills drills.
Clinical intervention
This final step is directly responsible for improving fetal outcome and safety[18]. The type and speed of clinical intervention has to be fine-tuned to the degree and evolution/progression of CTG abnormality in the given clinical scenario. It is a complex balance to undertake appropriate action without unduly increasing operative intervention. Automated warning systems when based on more reliable computerised CTG interpretation criteria would be very useful to improve patient safety.
CONCLUSION
Visual analysis of complex FHR patterns in response to uterine contractions (CTG) in the context of clinical setting remains the most widely practiced method of intrapartum fetal monitoring. National guidelines, (scientific) standardisation of terminology and structured systems of interpretation are important. However, there has been major criticism and disappointment associated with CTG. Hence, it seems urgent and important to have a fresh unbiased thorough assessment, reform CTG interpretation and eliminate any obvious framing/confirmation biases to restore scientific/physiological basis supplemented by clinical studies. International congruence on most aspects of CTG interpretation (definitions of FHR parameters, CTG recording speed, 3-tier systems, etc.) is highly desirable to facilitate future meaningful clinical studies, evaluation and progress. The birth attendants should apply critical thinking and reflection to all techniques of EFM and clinical cases/context, so that they can develop the ability (science and art) to make that final all-inclusive management decision. They should actively participate in the debate in this very practical and clinical subject thus contributing to the future knowledge and developments.
P- Reviewer: Bernardes J, Chang YL, Hen YS, Zhang XQ S- Editor: Ji FF L- Editor: A E- Editor: Wu HL