Misclassification of smoking habits: An updated review of the literature

doi:10.13105/wjma.v7.i2.31

Journal: World Journal of Meta-Analysis (ISSN 2308-3840)
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Meta-Analysis Open Access

World J Meta-Anal. Feb 22, 2019; 7(2): 31-50
Published online Feb 22, 2019. doi: 10.13105/wjma.v7.i2.31

Misclassification of smoking habits: An updated review of the literature

Janette S Hamling, Katharine J Coombs, Peter N Lee

Janette S Hamling, RoeLee Statistics Ltd., 17 Cedar Road, United Kingdom

Katharine J Coombs, Peter N Lee, P.N. Lee Statistics and Computing Ltd., Sutton SM2 5DA, United Kingdom

ORCID number: Janette S Hamling (0000-0001-7788-4738); Katharine J Coombs (0000-0003-0093-7162); Peter N Lee (0000-0002-8244-1904).

Author contributions: Hamling JS and Lee PN planned the study; Coombs KJ carried out the literature searches, assisted by the other authors; Hamling JS carried out the data entry, assisted by Lee PN; Hamling JS carried out the statistical analyses along lines discussed and agreed with Lee PN; Lee PN drafted the paper which was critically reviewed by the other authors.

Conflict-of-interest statement: All the authors are long-term consultants to various tobacco companies and organizations.

PRISMA 2009 Checklist statement: The authors have read the PRISMA 2009 Checklist, and the manuscript was prepared and revised according to the PRISMA 2009 Checklist.

Open-Access: This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/

Corresponding author: Peter N Lee, MA, Director, Senior Statistician, P.N. Lee Statistics and Computing Ltd., 17 Cedar Road, Sutton SM2 5DA, United Kingdom. peterlee@pnlee.co.uk

Telephone: +44-20-6428265 Fax: +44-20-8642135

Received: November 29, 2018
Peer-review started: November 29, 2018
First decision: December 15, 2018
Revised: January 21, 2019
Accepted: January 21, 2019
Article in press: January 21, 2019
Published online: February 22, 2019
Processing time: 85 Days and 15 Hours

Abstract

BACKGROUND

Misclassification of smoking habits leads to underestimation of true relationships between diseases and active smoking, and overestimation of true relationships with passive smoking. Information on misclassification rates can be obtained from studies using cotinine as a marker.

AIM

To estimate overall misclassification rates based on a review and meta-analysis of the available evidence, and to investigate how misclassification rates depend on other factors.

METHODS

We searched for studies using cotinine as a marker which involved at least 200 participants and which provided information on high cotinine levels in self-reported non-, never, or ex-smokers or on low levels in self-reported smokers. We estimated overall misclassification rates weighted on sample size and investigated heterogeneity by various study characteristics. Misclassification rates were calculated for two cotinine cut points to distinguish smokers and non-smokers, the higher cut point intended to distinguish regular smoking.

RESULTS

After avoiding double counting, 226 reports provided 294 results from 205 studies. A total of 115 results were from North America, 128 from Europe, 25 from Asia and 26 from other countries. A study on 6.2 million life insurance applicants was considered separately. Based on the lower cut point, true current smokers represented 4.96% (95% CI 4.32-5.60%) of reported non-smokers, 3.00% (2.45-3.54%) of reported never smokers, and 10.92% (9.23-12.61%) of reported ex-smokers. As percentages of true current smokers, non-, never and ex-smokers formed, respectively, 14.50% (12.36-16.65%), 5.70% (3.20-8.20%), and 8.93% (6.57-11.29%). Reported current smokers represented 3.65% (2.84-4.45%) of true non-smokers. There was considerable heterogeneity between misclassification rates. Rates of claiming never smoking were very high in Asian women smokers, the individual studies reporting rates of 12.5%, 22.4%, 33.3%, 54.2% and 66.3%. False claims of quitting were relatively high in pregnant women, in diseased individuals who may recently have been advised to quit, and in studies considering cigarette smoking rather than any smoking. False claims of smoking were higher in younger populations. Misclassification rates were higher in more recently published studies. There was no clear evidence that rates varied by the body fluid used for the cotinine analysis, the assay method used, or whether the respondent was aware their statements would be validated by cotinine, although here many studies did not provide relevant information. There was only limited evidence that rates were lower in studies classified as being of good quality, based on the extent to which other sources of nicotine were accounted for.

CONCLUSION

It is important for epidemiologists to consider the possibility of bias due to misclassification of smoking habits, especially in circumstances where rates are likely to be high. The evidence of higher rates in more recent studies suggests that the extent of misclassification bias in studies relating passive smoking to smoking-related disease may have been underestimated.

Key Words: Misclassification; Smoking; Cotinine; Cigarettes; Tobacco use; E-cigarettes; Passive smoking; Bias; Systematic review; Meta-analysis

Core tip: We update a meta-analysis of evidence on accuracy of reported smoking, using cotinine as a marker. From 200+ studies, we estimated various misclassification rates. True smokers represented 3.00% (2.45%-3.54%) of reported never smokers and 10.92% (9.23%-12.61%) of reported ex-smokers. Reported never and ex-smokers formed 5.70% (3.20%-8.20%) and 8.93% (6.57%-11.29%) of true smokers. Falsely claiming never smoking was extremely common in Asian women. Rates of falsely claiming quitting were high in pregnant women and diseased individuals advised to quit. Smoking misclassification causes overestimation of true passive smoking relationships, a problem exacerbated by increasing misclassification rates in recently published studies.

Citation: Hamling JS, Coombs KJ, Lee PN. Misclassification of smoking habits: An updated review of the literature. World J Meta-Anal 2019; 7(2): 31-50
URL: https://www.wjgnet.com/2308-3840/full/v7/i2/31.htm
DOI: https://dx.doi.org/10.13105/wjma.v7.i2.31

INTRODUCTION

When interviewed, someone may deny current or past smoking habits, or even falsely claim to be a smoker or to have smoked. While random misclassification of smoking habits tends to understate true relationships of disease with smoking, it may overstate relationships with spousal smoking. This overstatement arises because studies of the effects of spousal smoking are typically conducted in self-reported never smokers and because smokers tend to marry smokers. Thus, random misclassification of smoking results in a higher proportion of misclassified true smokers in the group whose spouse smokes[1]. In studying the relationship with disease of a variable correlated with smoking, smoking misclassification also affects the extent to which the statistician can adjust for confounding by smoking. Tzonou et al[2] note that even a 10% error in a confounding variable leaves about half the confounding effect remaining after adjustment.

To determine the likely extent of bias, it is clearly advantageous to obtain information on the extent of inaccuracy in reported statements on smoking. One approach (not considered here), which gives information on the extent to which smokers may deny past smoking, is to compare statements made at separate time points. A second approach (the subject of this review) is based on studies using cotinine (a metabolite of nicotine), typically measured in blood, saliva or urine, as an objective indicator of recent smoking. Levels higher than an appropriate cut-off cannot arise from passive smoking or dietary sources of nicotine, and must in practice have arisen from smoking, smokeless tobacco, nicotine replacement therapy or, in recent years, electronic cigarettes[3-6].

Over 20 years ago, Lee and Forey[7] reviewed evidence from 35 studies where smoking habits were validated by cotinine, and since then other reviews have considered some of the evidence[8-12]. However, these reviews are mostly quite old and, as will become apparent, none considers more than a fraction of the relevant evidence. Here we present a detailed review of the evidence, although, as there are numerous studies using cotinine to validate smoking status, we restrict attention to those measuring cotinine in urine, saliva or blood (serum or plasma) in 200+ participants. Also, as interest is mainly in high cotinine levels in self-reported non-smokers, we exclude studies restricted to self-reported smokers. However, provided that a study presents the required data for self-reported non-smokers, we do summarize data on low cotinine levels in self-reported smokers. We also exclude studies of young children, who would have a very low likelihood of smoking.

MATERIALS AND METHODS

Study inclusion criteria

These include: At least 200 participants with cotinine levels determined in saliva, urine, blood (serum or plasma); data available on misclassification rates in self-reported non-smokers, never smokers or ex-smokers (or self-reported non-, never- or ex-tobacco users); data in populations reasonably likely to smoke (i.e., not infants or young children); and published in English.

The study also had to provide data for cotinine cut points distinguishing smokers from non-smokers. For plasma, serum and saliva, the lower cut point (Cut 1) had to be in the range 8-35 ng/mL, while for urine it had to be within 50-150 ng/mL, these covering the range of cut points commonly considered appropriate: Reports showing the bimodal distribution of non-smoker and smoker cotinine in saliva[13,14], serum[15,16] and urine[17,18] support these ranges as do the ranges for non-smokers and smokers found in individual studies[9] and the ranges used in other analyses[13,19]. For some analyses, we used a higher cut point (Cut 2), as used by some researchers[20-26]. To ensure distinct ranges for Cut 1 and Cut 2 we required that, for plasma, serum and saliva Cut 2 had to be at least 50 ng/mL, while for urine it had to be 250-750 ng/mL. Studies using 10%, or 30%, of the mean smoker value for distinguishing smokers were also accepted as providing equivalent data to, respectively, Cut 1 and Cut 2.
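The cut-point criteria above can be expressed as a small lookup. The following Python sketch is illustrative only: the function name and the treatment of range endpoints as inclusive are assumptions, and only the numeric ranges are taken from the text.

```python
from typing import Optional

# Fluids that share the plasma/serum/saliva ranges described in the text.
BLOOD_OR_SALIVA = {"plasma", "serum", "saliva"}


def classify_cut_point(fluid: str, cut_ng_ml: float) -> Optional[str]:
    """Return "Cut 1", "Cut 2", or None if the cut point fits neither range.

    Hypothetical helper: endpoints are assumed to be inclusive.
    """
    fluid = fluid.lower()
    if fluid in BLOOD_OR_SALIVA:
        if 8 <= cut_ng_ml <= 35:      # Cut 1: 8-35 ng/mL
            return "Cut 1"
        if cut_ng_ml >= 50:           # Cut 2: at least 50 ng/mL
            return "Cut 2"
        return None
    if fluid == "urine":
        if 50 <= cut_ng_ml <= 150:    # Cut 1: 50-150 ng/mL
            return "Cut 1"
        if 250 <= cut_ng_ml <= 750:   # Cut 2: 250-750 ng/mL
            return "Cut 2"
        return None
    raise ValueError(f"Unsupported body fluid: {fluid}")
```

Under these assumptions, a serum cut point of 15 ng/mL counts as Cut 1, while a saliva cut point of 40 ng/mL falls between the two ranges and provides data for neither.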

Misclassification rates

Suppose one has data from a study as follows:

Self-reported smoking habits	Participants studied	Number misclassified
Non-smoker	A	E
Never smoker	B	F
Ex-smoker	C	G
Current smoker	D	H

Misclassified participants are those with cotinine levels above the defined cut point for non-smokers and those below the cut point for current smokers. Noting that A = B + C and E = F + G, we sought to derive the following “misclassification rates”, with “true” status based on cotinine levels:

Rates M1-M3: Percentage of self-reported non-smokers (E/A), never smokers (F/B) or ex-smokers (G/C) whose cotinine implies current smoking (“true current smokers”).

Rate M4: Percentage of self-reported current smokers (H/D) whose cotinine implies non-smoking (“true non-smokers”). This may include occasional smokers who did not smoke in the days leading up to the sample being taken for cotinine analysis.

Rates M5-M7: Percentage of true current smokers who report being non-smokers [E/(D - H + E)], never smokers [F/(D - H + E)] or ex-smokers [G/(D - H + E)].

Rates M8-M10: Percentage of self-reported current smokers plus misclassified non-smokers who report being non-smokers [E/(D + E)], never smokers [F/(D + E)] or ex-smokers [G/(D + E)].

Rate M11: Percentage of true non-smokers who report being current smokers [H/(A - E + H)]. As for rate M4, this may include some occasional smokers.

Not all these rates can be calculated for every study, often because data are unavailable for self-reported current smokers or because non-smokers are not separated into never and ex-smokers. Assuming that cotinine is the gold standard, rates M5-M7 are theoretically superior to rates M8-M10 for estimating the extent to which current smokers deny smoking. However, studies where cotinine is measured only in reported non-smokers provide no estimate of H, and so preclude estimation of rates M5-M7.
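The definitions of rates M1 to M11 can be illustrated with a short Python sketch. The function and helper names are hypothetical, and the counts in the usage note are invented for illustration rather than taken from any study. Rates that cannot be calculated from the available counts are returned as `None`, mirroring the situation described above.

```python
from typing import Dict, Optional

Number = Optional[float]


def _signed_sum(*terms: tuple) -> Number:
    """Sum (sign, value) pairs; return None if any value is missing."""
    if any(value is None for _, value in terms):
        return None
    return sum(sign * value for sign, value in terms)


def _pct(numerator: Number, denominator: Number) -> Number:
    """Percentage, or None when it cannot be calculated."""
    if numerator is None or not denominator:
        return None
    return 100.0 * numerator / denominator


def misclassification_rates(A=None, B=None, C=None, D=None,
                            E=None, F=None, G=None, H=None) -> Dict[str, Number]:
    """Rates M1-M11 (as percentages) from participant counts A-D and
    misclassified counts E-H, using the letters defined in the text."""
    # Use A = B + C and E = F + G to fill in non-smoker totals if absent.
    if A is None:
        A = _signed_sum((1, B), (1, C))
    if E is None:
        E = _signed_sum((1, F), (1, G))
    true_smokers = _signed_sum((1, D), (-1, H), (1, E))       # D - H + E
    reported_plus = _signed_sum((1, D), (1, E))               # D + E
    true_non_smokers = _signed_sum((1, A), (-1, E), (1, H))   # A - E + H
    return {
        "M1": _pct(E, A), "M2": _pct(F, B), "M3": _pct(G, C),
        "M4": _pct(H, D),
        "M5": _pct(E, true_smokers), "M6": _pct(F, true_smokers),
        "M7": _pct(G, true_smokers),
        "M8": _pct(E, reported_plus), "M9": _pct(F, reported_plus),
        "M10": _pct(G, reported_plus),
        "M11": _pct(H, true_non_smokers),
    }
```

For example, with invented counts B = 700, C = 300, D = 400, F = 20, G = 30 and H = 40, the sketch gives M1 = 5.0% and M4 = 10.0%. Omitting H leaves M5 to M7 and M11 undefined while M8 to M10 remain available.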

Where possible, rates were calculated for both Cut 1 and Cut 2. Where a study provided a choice of cut-offs for plasma, serum or saliva, we used that closest to 20 ng/mL for Cut 1 and that closest to 100 ng/mL for Cut 2. For urine, we used cut points closest to 100 ng/mL and 500 ng/mL respectively. These represent the mid-point of the ranges used.
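The choice among several reported cut-offs can be sketched as a nearest-value selection. The names below are hypothetical. The text does not say how ties were resolved, so the tie-breaking behaviour here (the first listed candidate wins) is an assumption.

```python
from typing import Sequence

# Target values (ng/mL) stated in the text for each fluid group and cut point.
TARGETS = {
    ("plasma/serum/saliva", "Cut 1"): 20,
    ("plasma/serum/saliva", "Cut 2"): 100,
    ("urine", "Cut 1"): 100,
    ("urine", "Cut 2"): 500,
}


def pick_closest(candidates: Sequence[float], target: float) -> float:
    """Return the candidate cut-off closest to the target value.

    Assumption: when two candidates are equally close, the first one wins.
    """
    if not candidates:
        raise ValueError("No candidate cut-offs supplied")
    return min(candidates, key=lambda cut: abs(cut - target))
```

In practice the candidates would first be restricted to those within the accepted range for the relevant cut point.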

Literature sources

We considered, in turn, four information sources: A previous attempt to summarize relevant data[7]; papers filed under “COT” in the P.N. Lee Statistics and Computing Ltd. database, accumulated over many years; a search on PubMed using the term “cotinine”; and studies referenced in misclassification review papers discovered in our searches. Initially, papers were accepted based on the study inclusion criteria described above, with doubtful cases resolved following intra-author discussions.

Data recorded

For each study report, data were extracted by one of us and checked by another. Recorded study characteristics included the source reference, location, sexes studied, representativeness of the sample, whether participants were aware their samples would be tested for smoking, body fluid and assay method used for cotinine assay, cut-offs used, whether smoking groups were differentially sampled, whether results were separately available for never and former smokers, and whether the sample was of the general population, pregnant women, from both arms of a case-control study, or of diseased individuals. Data from each study included the numbers of participants in the relevant smoking/non-smoking groups, and the numbers in these groups with cotinine values indicating misclassification. Where necessary, numbers were estimated from data provided in figures.

We also recorded information on the smoking (or tobacco use) index studied, and a study quality measure based on the extent of account taken of other nicotine sources that could produce cotinine levels above the cut point, such as other smoking products (pipes, cigars), smokeless tobacco (snuff, chewing tobacco), nicotine replacement therapy (gums, patches) and e-cigarettes.

The smoking indices considered were cigarette smoking, smoking (of any product) and any tobacco use (smoking or smokeless tobacco use). For all three indices, we recorded whether individuals using nicotine replacement therapy or e-cigarettes had been excluded from the estimation of misclassification rates and, if not, whether the author had referred in the source paper to nicotine replacement therapy or to e-cigarette use as possible confounders. For smokers (of any product) we similarly recorded data on consideration of smokeless tobacco, while for cigarette smokers specifically, we also recorded data on smoking of other products. A study was considered of good quality if users of all non-index tobacco products had been excluded from analysis.

Adjusting rates for differential sampling

In a few studies, populations were differentially sampled by reported smoking habits. This has no effect on rates M1-M4, as the calculation is within smoking group. However, for rates M5-M11, failure to consider differential sampling would bias rate calculations. We avoided this by calculating adjusted numbers of participants and of misclassified participants. Two relevant situations occurred. In the first, results were only available for non-smokers and current smokers, sampled in the ratio 1:S. With asterisks indicating adjusted numbers, we used the formulae D^* = D/S, H^* = H/S, A^* = A and E^* = E.

In the second situation, results were separately available for never-, ex- and current smokers, the groups being sampled in the ratio 1:U:V. Here, the adjusted numbers were B^* = ZB, C^* = ZC/U and D^* = ZD/V, where Z = (B + C)/(B + C/U) is a scaling factor set so the adjusted and observed numbers of non-smokers are equal. The adjusted numbers of misclassified individuals were then obtained by multiplying the adjusted numbers of participants by the observed misclassification rate, i.e., F^* = B^*F/B, G^* = C^*G/C and H^* = D^*H/D.
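The two adjustments for differential sampling can be sketched as follows. The function names and the dictionary return format are hypothetical, and the counts in the usage note are invented for illustration. The formulae follow those given above.

```python
from typing import Dict


def adjust_two_groups(A: float, E: float, D: float, H: float,
                      S: float) -> Dict[str, float]:
    """Non-smokers and current smokers sampled in the ratio 1:S."""
    return {"A": A, "E": E, "D": D / S, "H": H / S}


def adjust_three_groups(B: float, F: float, C: float, G: float,
                        D: float, H: float,
                        U: float, V: float) -> Dict[str, float]:
    """Never-, ex- and current smokers sampled in the ratio 1:U:V."""
    # Scaling factor chosen so that adjusted B + C equals observed B + C.
    Z = (B + C) / (B + C / U)
    B_adj = Z * B
    C_adj = Z * C / U
    D_adj = Z * D / V
    return {
        "B": B_adj, "C": C_adj, "D": D_adj,
        # Adjusted misclassified counts keep the observed within-group rates.
        "F": B_adj * F / B,
        "G": C_adj * G / C,
        "H": D_adj * H / D,
    }
```

With invented counts B = 100, C = 100 and D = 400 sampled in the ratio 1:2:4, Z is 4/3, and the adjusted never and ex-smoker numbers still sum to 200.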

Avoiding double-counting

It was necessary to ensure use of the greatest amount of information while avoiding double-counting as far as possible. This was particularly difficult for some large studies where many reports are available. Various rules were defined to avoid double-counting. Thus, results from a single study should not be included in the same analysis for sexes combined and individually, or (except for analysis by body fluid) for more than one body fluid. Also, when study results are reported in multiple publications (or in multiple forms in one publication), we preferred rates based on the most participants, rates for all four smoking groups rather than just some, rates for males and females separately rather than combined, and results not based on differential sampling.

Meta-analyses and meta-regressions

Mean misclassification rates with 95%CIs were estimated by analysis of variance, weighted by the number of participants the specific rate estimate was based on. Analyses were carried out based on all available results (avoiding double-counting) and, for Cut 1 only, by levels of various factors. These were body fluid, assay method, study type (separating studies of the general population, studies of pregnant women, and other studies, i.e., those of diseased individuals and case-control studies), age, participants’ awareness that cotinine samples were used to validate their reported smoking habits, period of publication of the source paper, study quality (as described above), for studies of women whether they were pregnant or not, the index of smoking (or tobacco use), sex, location, and the interaction of sex and location. Where rates were estimated by factor level, the significance of differences between factor levels was estimated by a heterogeneity test.
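A sample-size-weighted mean with an approximate 95%CI can be sketched as below. This is an assumption-laden illustration, not the procedure used for the results reported here. It fits an intercept-only weighted model, takes the residual mean square as the variance estimate, and uses a normal multiplier of 1.96 in place of a t quantile. The function name and the example numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def weighted_mean_ci(rates: Sequence[float], weights: Sequence[float],
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Weighted mean of study rates with an approximate 95%CI.

    rates   : misclassification rates (%) from individual results
    weights : number of participants each rate is based on
    z       : normal multiplier (assumption; a t quantile on n - 1
              degrees of freedom would be slightly wider)
    """
    n = len(rates)
    if n < 2 or n != len(weights):
        raise ValueError("Need at least two rates and matching weights")
    total_weight = sum(weights)
    mean = sum(w * r for r, w in zip(rates, weights)) / total_weight
    # Weighted residual mean square on n - 1 degrees of freedom.
    residual_ms = sum(w * (r - mean) ** 2
                      for r, w in zip(rates, weights)) / (n - 1)
    std_error = math.sqrt(residual_ms / total_weight)
    return mean, mean - z * std_error, mean + z * std_error
```

For invented rates of 2%, 4% and 6% based on 100, 200 and 100 participants, the weighted mean is 4.0% with limits of 2.04% and 5.96% under these assumptions.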

A publication by Palmier et al[26] reported results from a study of urine samples of about 6.2 million life insurance applicants, providing data only for Cut 2. This study involved more participants than all the other studies combined so including its results would have meant the overall weighted estimates were dominated by its contribution. We therefore excluded it from the meta-analyses and present its results separately. For each rate we also carried out a multivariate analysis. This involved a stepwise procedure successively including the most significant factor, stopping when no further factor was significant at P < 0.01.

RESULTS

Literature searches

Figure 1 summarizes the literature searches carried out. Our earlier review[7] presented results from 36 studies provided in 30 publications and two personal communications. Four publications[27-30] were rejected as reporting studies based on fewer than 200 cotinine measurements, as were two of three studies reported in another publication[31]. Two reports[32,33] have been replaced by a later fuller report[11]. One report[34] was superseded by a later report filed under “COT” in our in-house database[35]. This left 26 sources reporting 29 studies.


Figure 1 Data sources and processing.

Of 767 publications filed under “COT”, 32 were already considered in our earlier review[7] and four were reviews of misclassification studies[9-12]. This left 731 for further consideration. Of these, 591 failed our inclusion criteria and 33 provided inadequate data (e.g., having a cut point too low, testing using a substance with no accepted cut point, such as hair or umbilical cord serum, or providing too little information). Checking the bibliographies of the four reviews yielded nine additional data sources. This resulted in 116 publications providing useful data for 119 studies.

A PubMed search on “Cotinine” on 5th January 2017 produced 4353 hits. Of these, 3577 were rejected on inspection of the abstracts and 226 had already been considered. The remaining 550 publications were obtained and examined in more detail. Four hundred and twenty-three failed the inclusion criteria or provided inadequate information, leaving 127 for further consideration, these providing 130 study reports. Examining the reference lists in five further reviews produced no additional relevant references. Overall, therefore, there were 278 study reports from 269 sources.
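The counts in the three search stages can be tallied to show how the overall totals arise. The sketch below uses only numbers stated in the text; the variable names are hypothetical.

```python
# Stage 1: earlier review (sources, studies) retained after exclusions.
review_sources, review_studies = 26, 29

# Stage 2: publications filed under "COT" in the in-house database.
cot_remaining = 767 - 32 - 4               # minus earlier-review papers and reviews
cot_sources = cot_remaining - 591 - 33 + 9  # plus sources from review bibliographies
cot_studies = 119

# Stage 3: PubMed search on "Cotinine".
pubmed_examined = 4353 - 3577 - 226        # minus abstract rejections and known papers
pubmed_sources = pubmed_examined - 423
pubmed_reports = 130

total_sources = review_sources + cot_sources + pubmed_sources
total_reports = review_studies + cot_studies + pubmed_reports

print(cot_remaining, cot_sources, pubmed_examined, pubmed_sources)
print(total_sources, total_reports)
```

The intermediate values (731, 116, 550 and 127) and the totals (269 sources and 278 study reports) agree with those given in the text.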

Avoidance of double-counting

Supplementary File 1 describes our attempts to limit double-counting. It gives details of each study reported by more than one publication and each publication reporting more than one type of misclassification data, such as results at several stages (e.g., early pregnancy, late pregnancy) or for more than one body fluid or cotinine assessment method. For each such study Supplementary File 1 identifies the data available, which are to be excluded from analysis and, for the data to be included, whether it should always be included or only included in some analyses. For rejected reports, the reason for rejection is given, often because it reports a smaller sample size than given elsewhere. Where there is no difference in sample size, other reasons for rejection are given, such as a non-conventional cotinine test method or data for sexes combined when alternative sources give data by sex.

The decisions on inclusion or rejection took account of the smoking categories reported. For example, where one report gave results for non-smokers and current smokers but another for the same study gave only never smoker results, both sources could be included because the calculation of misclassification rates considers either non-smokers or never smokers. By this process, 52 study reports were excluded, leaving 226 study reports on 205 separate studies.

Often a study report provides multiple results. Many studies report males and females separately. Some reports split their analyses of misclassification by other factors, including race, age, study years (for studies conducted annually) and study arm (pregnant/non-pregnant, cases/controls). Consequently, our dataset of detailed results contains more entries, 294, than there are study reports, 226.

Study characteristics

Details of the main characteristics of each study and of the study reports used in analysis are given in Supplementary File 2. Table 1 gives the number of results analysed for each characteristic that was used as a factor in analysis. Of the 294 results, 11% were from studies considered in the 1995 review, a further 37% being from studies reported before 2003, the remaining 52% being reported later. Most results (83%) related to studies in Europe and North America, with the rest about equally split between Asia and other locations. Fifty-six percent of results were sex-specific, with more for females (40%) than males (16%), due to the large number of results for pregnant women, 19% of the total. The studies in pregnant women also formed a substantial proportion of results classified as “young”. Most results (64%) related to the general population.

Table 1 Distribution of study characteristics among the 294 detailed results.

Factor	Level	No. of results analysed^a
Body fluid	Urine	78^a
	Saliva	90
	Blood	126
Assay method	Chromatography	93
	Spectrometry	72
	Immunoassay	108
	Other	21^a
Age group^b	Young	103
	Not young	35
	All ages	108^a
	Not stated	48
Study type	General population	189^a
	Pregnancy	57
	Diseased or case-control	48
Awareness of validation by cotinine	Yes	22^a
	No	47
	Not specified	225
Time of publication	Studies considered in the 1995 review	31
	Studies reported before 2003	109
	Studies reported later	154^a
Study quality	Good	36
	Not good	258^a
Pregnancy (women only)	Not pregnant	61
	Pregnant	57
Tobacco products considered	Cigarettes	108
	Any smoking	160
	Any tobacco	26^a
Sex	Females	118
	Males	48
	Combined	128^a
Location	Canada/United States	115^a
	Europe	128
	Asia	25
	Other	26

^aFactor levels with this superscript are the levels applicable to the very large study by Palmier J, Lanzrath B, Dixon A and Idowu O[26] considered separately in our analyses.

^bStudies varied in how they reported the age range studied, sometimes giving a specific range of ages, sometimes a mean age and sometimes no information.

The categories were based on the available age information as follows: Young: Upper age limit < 50 or mean age < 30 or a pregnancy study; Not young: Lower age limit 30+ or mean age 60+, thus excluding young people; All ages: Lower age limit < 30 and upper age limit 50+ or lower age limit < 30 and mean age 30+; Not stated: All other combinations.
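The age categories can be sketched as a rule-based classifier. The function name and parameters are hypothetical. The order in which the rules are tested (Young, then Not young, then All ages) is an assumption, since the text does not state how overlapping conditions were resolved.

```python
from typing import Optional


def age_group(lower: Optional[float] = None, upper: Optional[float] = None,
              mean: Optional[float] = None, pregnancy: bool = False) -> str:
    """Assign an age category from whatever age information a study reports."""
    # Young: upper age limit < 50, or mean age < 30, or a pregnancy study.
    if pregnancy or (upper is not None and upper < 50) \
            or (mean is not None and mean < 30):
        return "Young"
    # Not young: lower age limit 30+ or mean age 60+.
    if (lower is not None and lower >= 30) or (mean is not None and mean >= 60):
        return "Not young"
    # All ages: lower limit < 30 with upper limit 50+ or with mean age 30+.
    if lower is not None and lower < 30 and (
            (upper is not None and upper >= 50)
            or (mean is not None and mean >= 30)):
        return "All ages"
    # Not stated: all other combinations.
    return "Not stated"
```

For instance, a study of adults aged 18 to 75 would be classified as “All ages”, and a study giving no age information as “Not stated”.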

The majority of results (77%) were from studies not specifying whether participants were aware their self-report would be validated, with only 7% (including the very large Palmier et al[26] study) from studies where participants were aware. Of all the results, 43% were based on blood samples, 31% on saliva, and 27% on urine. Self-report related to cigarette smoking specifically for 37% of results, to any smoking for 54% and to any use of tobacco for the remaining 9%. Only 12% of results were classified as “good” quality.

Misclassification rates

Full details of all meta-analyses and meta-regressions are given in Supplementary File 3, while Supplementary File 4 gives a series of tables presenting results for Cut 1 by the levels of each factor, referred to below as Supplementary Tables 1 and 2, etc.

Overall rates

Table 2 presents overall meta-analysis estimates of each misclassification rate, based on both cut points, as well as estimates from the very large study[26]. Using Cut 1 the percentage of reported non-smokers who are true smokers according to cotinine, M1, is 4.96%. The percentage of true smokers is lower for reported never smokers, M2 = 3.00%, and higher for reported ex-smokers, M3 = 10.92%. As expected, these three rates are lower using Cut 2; and M4, the percentage of self-reported current smokers with cotinine level below the cut point, is higher for Cut 2 than Cut 1. Rate M4 is particularly high in the Palmier study[26], where the urine-based cut point was 500 ng/mL.

Table 2 Misclassification rates from Palmier et al[26] and from the other studies combined by meta-analysis (based on weighted analysis, avoiding double-counting).

Rate		Other studies combined				Palmier
		Cut 1^a		Cut 2^b		Cut 2^b
		n	Rate (95%CI)	n	Rate (95%CI)	Rate
M1	% of self-reported non-smokers whose cotinine implies current smoking	209	4.96 (4.32 to 5.60)	65	3.66 (2.68 to 4.65)	2.01
M2	% of self-reported never smokers whose cotinine implies current smoking	86	3.00 (2.45 to 3.54)	22	2.34 (1.28 to 3.41)	-
M3	% of self-reported ex-smokers whose cotinine implies current smoking	88	10.92 (9.23 to 12.61)	24	6.79 (4.60 to 8.98)	-
M4	% of self-reported current smokers whose cotinine implies non-smoking	142	9.67 (7.73 to 11.61)	44	18.48 (14.46 to 22.50)	53.08
M5	% of true current smokers who report being non-smokers	136	14.50 (12.36 to 16.65)	43	10.42 (5.91 to 14.92)	19.31
M6	% of true current smokers who report being never smokers	52	5.70 (3.20 to 8.20)	13	4.34 (0.19 to 8.49)	-
M7	% of true current smokers who report being ex-smokers	52	8.93 (6.57 to 11.29)	13	7.89 (4.07 to 11.71)	-
M8	% of self-reported current smokers (plus misclassified non-smokers) who report being non-smokers	185	11.59 (10.00 to 13.17)	60	7.92 (5.19 to 10.65)	10.10
M9	% of self-reported current smokers (plus misclassified non-smokers) who report being never smokers	66	4.64 (2.73 to 6.54)	21	4.02 (1.68 to 6.35)	-
M10	% of self-reported current smokers (plus misclassified non-smokers) who report being ex-smokers	66	7.72 (5.95 to 9.50)	21	5.69 (3.54 to 7.84)	-
M11	% of true non-smokers who report being current smokers	137	3.65 (2.84 to 4.45)	43	7.67 (6.14 to 9.20)	8.84

^aThe lower cut point.

^bThe higher, more conservative, cut point.

Using Cut 1, rates M5 to M7, which have cotinine-defined current smokers as the base, are again higher for reporting of ex-smoking than of never smoking and, as for rates M1 to M3, are lower using Cut 2. As for M4, the percentage of true current smokers who report being non-smokers (M5) is high in the Palmier[26] study. The pattern of rates for M8 to M10 is similar to that for M5 to M7, though the misclassification rates are somewhat lower. Rate M11, which has cotinine-defined non-smokers as the base, is higher for Cut 2 than for Cut 1, as was noted above for M4.

Table 3 shows, for each rate definition, the distribution of available rate values using Cut 1. This illustrates the variability of the data. Table 3 also indicates where the median value lies. For all eleven rate definitions, some misclassification rate values were lower than 2%, while for all except M2 and M11, some exceeded 50%.

Table 3 Distribution of misclassification rate values from the studies included in Table 2 for the lower cut point (Cut 1).

		< 2%	2% to < 5%	5% to < 10%	10% to < 25%	25% to < 50%	> 50%	Total
M1	% of self-reported non-smokers whose cotinine implies current smoking	45	74¹	51	31	3	5	209
M2	% of self-reported never smokers whose cotinine implies current smoking	44¹	25	13	4	0	0	86
M3	% of self-reported ex-smokers whose cotinine implies current smoking	2	7	31	38¹	7	3	88
M4	% of self-reported current smokers whose cotinine implies non-smoking	16	29	34¹	42	18	3	142
M5	% of true current smokers who report being non-smokers	5	18	28	52¹	22	11	136
M6	% of true current smokers who report being never smokers	18	13¹	10	7	2	2	52
M7	% of true current smokers who report being ex-smokers	4	16	17¹	7	7	1	52
M8	% of self-reported current smokers (plus misclassified non-smokers) who report being non-smokers	11	30	42	65¹	28	9	185
M9	% of self-reported current smokers (plus misclassified non-smokers) who report being never smokers	25	19¹	11	7	2	2	66
M10	% of self-reported current smokers (plus misclassified non-smokers) who report being ex-smokers	7	21	22¹	10	5	1	66
M11	% of true non-smokers who report being current smokers	45	51¹	27	12	2	0	137

¹Includes the median.
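The bin marked as containing the median in each row of Table 3 follows directly from the bin counts. As an illustrative check (a minimal sketch, not part of the original analysis; the helper name `median_bin` is our own), the following locates the median bin for three of the eleven rows:

```python
from itertools import accumulate

BINS = ["< 2%", "2% to < 5%", "5% to < 10%", "10% to < 25%", "25% to < 50%", "> 50%"]

# Bin counts copied from Table 3 (Cut 1) for three of the eleven rates.
TABLE3 = {
    "M1": [45, 74, 51, 31, 3, 5],
    "M3": [2, 7, 31, 38, 7, 3],
    "M11": [45, 51, 27, 12, 2, 0],
}


def median_bin(counts):
    """Return the label of the bin containing the median of the binned values."""
    total = sum(counts)
    # For n values the median sits at rank (n + 1) / 2. If n is even and the
    # two middle ranks fall in different bins, this returns the upper bin.
    target = (total + 1) / 2
    for label, cumulative in zip(BINS, accumulate(counts)):
        if cumulative >= target:
            return label
    raise ValueError("counts must contain at least one value")


for rate, counts in TABLE3.items():
    print(rate, sum(counts), median_bin(counts))
```

For these rows the result agrees with the footnote markers in Table 3 (2% to < 5% for M1 and M11, 10% to < 25% for M3).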

Variation in rates by other factors

Table 4 summarizes, for each factor considered, the significance of the differences in rates between the levels of the factor (using Cut 1, univariate analysis). Supplementary File 4 gives full details of these analyses. These findings are discussed in the following sub-sections.

Table 4 For each factor, the significance of the differences in rates between the factor levels: Cut 1, univariate analyses.

Rate		Body fluid	Assay method	Study type	Age group	Aware validated	Time published	Study quality	Pregnancy	Tobacco products	Sex	Location	Sex × location
M1	% of self-reported non-smokers whose cotinine implies current smoking	NS	NS	^c¹	NS	NS	NS	NS	NS	NS	NS	NS	NS
M2	% of self-reported never smokers whose cotinine implies current smoking	NS	NS	NS	^b	NS	NS	^a	NS	^d¹	NS	^b	^d
M3	% of self-reported ex-smokers whose cotinine implies current smoking	^a	^b	^d¹	^d	^c	^a	NS	^d	^a¹	^a	^c	^b
M4	% of self-reported current smokers whose cotinine implies non-smoking	NS	^b	^d¹	^d¹	NS	^d¹	NS¹	NS	NS	NS	NS	NS
M5	% of true current smokers who report being non-smokers	NS	NS	^b	NS	NS	^d¹	NS	NS	NS	NS	NS	^c¹
M6	% of true current smokers who report being never smokers	NS	NS	NS	NS	NS	NS	NS	NS	NS	^b	NS	^d¹
M7	% of true current smokers who report being ex-smokers	NS	NS	^d¹	^c	NS	^b	NS	^a	NS	NS	NS	NS
M8	% of self-reported current smokers (plus misclassified non-smokers) who report being non-smokers	NS	NS	NS	^a	^b	^d¹	^a	NS	^b	NS	NS	^d¹
M9	% of self-reported current smokers (plus misclassified non-smokers) who report being never smokers	^a	NS	NS	NS	NS	NS¹	NS	NS	NS	^b	NS	^d¹
M10	% of self-reported current smokers (plus misclassified non-smokers) who report being ex-smokers	NS	NS	^d¹	^c	NS	^b	NS	^b	NS	^b	NS	NS
M11	% of true non-smokers who report being current smokers	^b	NS	^d¹	^d¹	NS	NS	NS	NS	NS	NS	^a	NS

^dP < 0.001.

^cP < 0.01.

^bP < 0.05.

^aP < 0.1.

NS (not significant): P ≥ 0.1.

¹Identifies, for each rate (M1-M11), the factors that, in multivariate analysis, were independently significantly (P < 0.01) associated with the misclassification rate.
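The letter codes in Table 4 are simply thresholds on the P value. As a small sketch of that coding (the function name `significance_code` is our own, and the P values are those quoted in the sub-sections that follow):

```python
def significance_code(p: float) -> str:
    """Map a P value to the letter code used in the Table 4 footnotes."""
    if p < 0.001:
        return "d"
    if p < 0.01:
        return "c"
    if p < 0.05:
        return "b"
    if p < 0.1:
        return "a"
    return "NS"


# P values quoted in the text, with the Table 4 cell they belong to.
quoted = [
    ("M11 by body fluid", 0.044),
    ("M3 by awareness of validation", 0.009),
    ("M8 by awareness of validation", 0.022),
    ("M3 by location", 0.002),
    ("M2 by location", 0.026),
]
for cell, p in quoted:
    print(f"{cell}: P = {p} -> {significance_code(p)}")
```

Each of these agrees with the code shown in the corresponding cell of Table 4.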

Body fluid

There is little evidence that misclassification rates vary by whether cotinine was measured in urine, saliva or blood. Only one rate, the percentage of true non-smokers claiming to be current smokers (M11), showed variation significant at P < 0.05, and then only marginally (P = 0.044), rates being somewhat higher for blood (4.6%) than for urine (2.2%) or saliva (2.9%) (Supplementary Table 1).

Assay method

There is little evidence that misclassification rates varied by assay method. Only rates M3 and M4 showed evidence of variation significant at P < 0.05. Rate M3 was high (19.0%) for the category “other”, representing studies that did not specify their method or used several methods in a single study, compared with 8.8%, 10.1% and 11.5% in the other categories. For M4 the rate was lower using chromatography or immunoassay (6.5% and 9.9% respectively, versus 13.1% and 13.0% for spectrometry and “other” respectively) (Supplementary Table 2).

Study type

Studies were classified as being of the general population, of pregnant women, or “other” (consisting of diseased groups and participants in case-control studies). For some misclassification rates (M2, M6, M7, M9 and M10), there were data from only two studies in pregnancy, most such studies recording cotinine levels in self-reported non-smokers or ex-smokers, not in self-reported never smokers. There were some major sources of variation (P < 0.001) by study type. First, reporting of quitting by true current smokers (M3) was higher in pregnant women (22.7%) than in the general population (8.7%) or “other” studies (12.0%). Second, the percentage of self-reported current smokers who were true non-smokers according to cotinine (M4) was over twice as high in the “other” group (21.9%) as in the general population or pregnant women (8.0% and 8.5% respectively). The same is true for the percentage of true non-smokers who report being current smokers (M11; 10.5% versus 2.9% and 3.5%). Third, the percentage of current smokers who report being ex-smokers is about twice as high in the “other” group as in the general population or pregnant women, whether they be true current smokers (M7; 21.3% versus 6.6% and 9.4%) or self-reported current smokers plus misclassified non-smokers (M10; 10.5% versus 2.9% and 3.5%) (Supplementary Table 3).

Age group

Studies were classified according to whether participants were young, not young, all ages or age not specified. Defining these groups was complicated by there being various ways to present age information in study reports. Participants were classified as young if the upper age limit was at most 50 years or the mean age was at most 30 years or the study was of pregnant women. Studies of not young participants had a lower age limit of at least 30 years or a mean age of at least 60 years. Studies classified as of all ages had an age range that included ages 30 to 50 years, with this inferred for studies with a lower age limit of at most 30 years and a mean age over 30. All other studies were classified as age not specified.
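The classification rules above can be sketched as a small function. This is only an illustration under stated assumptions: the text does not say how overlapping criteria were resolved, so we assume the rules are applied in the order given (young, then not young, then all ages), and the function and argument names are hypothetical.

```python
from typing import Optional


def classify_age_group(lower: Optional[float] = None,
                       upper: Optional[float] = None,
                       mean: Optional[float] = None,
                       pregnant: bool = False) -> str:
    """Assign a study to an age group from whatever age information it reports."""
    # Young: upper limit at most 50, or mean age at most 30, or pregnant women.
    if pregnant or (upper is not None and upper <= 50) or (mean is not None and mean <= 30):
        return "young"
    # Not young: lower limit at least 30, or mean age at least 60.
    if (lower is not None and lower >= 30) or (mean is not None and mean >= 60):
        return "not young"
    # All ages: range covers 30 to 50, or inferred from lower limit <= 30 and mean > 30.
    if lower is not None and lower <= 30:
        if (upper is not None and upper >= 50) or (mean is not None and mean > 30):
            return "all ages"
    return "age not specified"


print(classify_age_group(lower=18, upper=45))   # young
print(classify_age_group(lower=35, upper=70))   # not young
print(classify_age_group(lower=18, mean=42))    # all ages
print(classify_age_group())                     # age not specified
```

Under a different order of precedence, a study with, say, an age range of 30 to 50 years could fall into a different group, which is one reason the grouping should be read as approximate.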

The major sources of variation by age were similar to those for study type. Thus, self-reported quitting among true current smokers (M3) was highest (18.8%) in the young group, which included pregnant women, while the other rates showing clearly significant variation (P < 0.01) - M4, M7, M10 and M11 - were all highest in the not young group, to which the “other” study type group would mainly belong (Supplementary Table 4).

Awareness of validation by cotinine

In each analysis, the percentage of studies specifying whether or not participants were told that their self-report would be cotinine-validated was quite low, around 25%. The number of studies specifying that participants were told was at most 12 (M1 and M8), and was often only 1 or 2. While one might imagine knowledge of validation would encourage better self-report, the reverse seemed to be true. For the only analyses where significant variation was seen (M3, P = 0.009 and M8, P = 0.022) the misclassification rate was 20.2% and 16.4% respectively among those told, 13.5% and 7.3% among those not told and 10.0% and 12.4% for studies not specifying this. For those other rates where eight or more studies specified that participants were told (M1, M4, M5, M8 and M11), the percentage misclassified was generally highest in the group that was told, though never significantly (Supplementary Table 5).

Time of publication

Time of publication, as an approximate indicator of time of study conduct, was divided into three groups: Studies considered in the 1995 review, other studies published before 2003, and studies reported later. For five rates (M4, M5, M7, M8 and M10) there was significant (P < 0.05) variation by time of publication, always due to a higher rate in the most recently reported studies, typically by about twofold. These relate to erroneous claims, as judged by cotinine, relevant to self-reported current smoking (M4, 14.5% versus 6.8% and 6.0%), non-smoking (M5, 19.8% versus 10.1% and 10.4%; and M8, 14.9% versus 6.6% and 9.3%) and ex-smoking (M7, 12.8% versus 4.4% and 7.0%, and M10, 10.7% versus 4.4% and 6.9%) (Supplementary Table 6).

Study quality

Studies were classified as good or not good according to whether they had accounted for other nicotine sources. For some rates (M3, M6, M7, M9 and M10), the number of good studies was four or fewer, not allowing useful analysis. There was no significant (P < 0.05) variation by study quality for any of the other rates studied, nor any consistent evidence that misclassification rates were lower in the good studies (Supplementary Table 7).

Pregnancy

For rates M2, M6, M7, M9 and M10 the analyses were unhelpful, being based on only two studies in pregnant women. For five other rates (M1, M4, M5, M8 and M11), there was no significant difference between pregnant and non-pregnant women. However, the percentage of self-reported ex-smokers who were current smokers according to cotinine (M3) was clearly (P < 0.001) higher in pregnant women (22.7%) than in non-pregnant women (7.9%) (Supplementary Table 8).

Tobacco products considered

Studies were divided by whether cotinine levels were used to check statements made about cigarette smoking, any smoking, or any tobacco use. The number of studies classified under any tobacco use was relatively low, at most 15 (for M1) and was three or fewer for six of the rates. For the misclassification rate (M2) which showed the greatest heterogeneity by group (P < 0.001), the percentage of true current smokers among self-reported never smokers was 4.3% for cigarette smoking and 2.1% for any smoking. This is consistent with some self-reported never smokers of cigarettes using other nicotine-containing products not considered in the study, resulting in high cotinine levels. Other misclassification rates showing some evidence of heterogeneity between groups (P < 0.1, M3, M8) were also higher in studies where the statements checked concerned cigarette smoking (Supplementary Table 9).

Sex

There was no clear heterogeneity by sex, no P values being < 0.001. However, there were some indications of variation, with three P values < 0.05 and some other values close to 0.1. The percentage of true current smokers who reported never having smoked (M6) was higher for studies in females (12.0%) than for studies in males (3.5%), or studies which only reported combined sex results (4.1%), and a similar pattern was seen for rate M9, where the denominator also included misclassified non-smokers. The percentage of self-reported ex-smokers who proved to be true current smokers (M3) was also highest in females, consistent with the earlier results relating to pregnancy. Exceptionally, rate M7, which concerned true current smokers reporting having quit, and the similar rate M10, were highest where results were based on sexes combined (Supplementary Table 10).

Location

The clearest variation by location (P = 0.002) was seen for M3, the percentage of self-reported quitters who were current smokers according to the cotinine test. Here, misclassification rates were 15.2% in Canada/United States, 9.5% in Europe, 5.9% in Asia and 17.8% in other locations. There was also some evidence (P = 0.026) of higher rates in the “other” locations of true current smoking among self-reported never smokers (M2) (Supplementary Table 11).

Interaction between sex and location

It is claimed that some misclassification rates may be particularly high in Asian women, so we looked at the significance of the interaction between the four-level location variable and the three-level sex variable. Highly significant (P < 0.001) variations were seen for rates M2, M6, M8 and M9, with a significant (P < 0.01) variation also seen for M5.

Looking first at the percentage of true current smokers who reported being never smokers (M6), it was striking that, whereas mean rates varied from 0% to 6.4% in nine of the 12 subsets, they were much higher in Asian females (44.3%, n = 4), in females in “other” countries (40.6%, n = 1) and in females in Canada/United States (17.0%, n = 3). Looking further, it was clear that all four rates meta-analysed for Asian females were high (12.5%, 22.4%, 33.3% and 54.2%). The three rates from studies in Canada/United States females were variable (2.5%, 7.8%, 66.3%), with the last very high. This was from a study[36] conducted in the United States, but concerning Southeast Asian immigrants. The results are consistent with women who smoke in communities where smoking is culturally unacceptable being very likely to deny ever having done so. Notably, the single high rate (40.6%) for “other” countries comes from a study in the Republic of Karelia, Russia[37], the authors commenting on the cultural unacceptability of females smoking in Russia. Essentially similar patterns, based on the same studies with high rates, are seen for rate M9, which also concerns false claims of never smoking.

High rates in Asian females (38.4%, n = 6) and in females in “other” countries (26.7%, n = 5) were also seen for rate M5, which concerns true current smokers reporting that they are non-smokers. While rates were elevated in each Asian study (range 16.1% to 87.5%), rates in the “other” countries were only markedly elevated in the Karelian study (43.6%), in a study of pregnant women in New Zealand[38] (28.0%), and in indigenous females in Australia[39] (22.2%). Similar results, based on the same studies with high rates, are seen for rate M8, which also concerns false claims of non-smoking.

Rate M2, the percentage of self-reported never smokers who were true current smokers, was 3.0% overall. However, rates were again high in Asia, in “other” countries (based only on the Karelian study), and in Canada/United States, due mainly to the study of Southeast Asian immigrants. In each case, rates were high in males as well as in females (Supplementary Table 12).

Multivariate analyses

Details of these additional analyses are also given in Supplementary File 3, at the end of the section for each rate, under the title “Multivariate analysis”. Factors which remained significant in the multivariate analyses are marked with footnote 1 (¹) against the relevant variation in Table 4. As can be seen, some factors do not appear in any final multivariate analysis. These factors (body fluid, assay method, awareness of validation, pregnancy, sex and location) can all be regarded as not clearly associated with any of the 11 rates, as judged by a significance level of P < 0.01. For most of the rates, these factors were not significant (at P < 0.01) in the univariate analysis, though exceptionally, for M3, variations by awareness of validation, pregnancy and location which were significant at P < 0.01 in the univariate analysis were no longer significant at that level in the multivariate analysis.

Table 5 summarizes the results for five factors which showed a significant (P < 0.01) independent association with at least one misclassification rate. For each factor, the direction of the major differences is generally the same for each of these rates. Thus, rates are generally higher for studies of diseased populations and case-control studies than for general population studies, for the youngest age group, for studies published from 2003 onwards than for earlier studies, where the study quality is not good, and where the tobacco product considered is cigarettes only. Exceptionally, for study type, the difference between studies of the general population and studies of pregnant women is not in the same direction for all rates. Thus, reporting of quitting by current smokers (M3) was higher in pregnant women, but the percentage of self-reported current smokers who were true non-smokers according to cotinine (M4) was lower.

Table 5 Factors included in the final model for a rate, with the significant differences in misclassification rates (from base level) by factor level: Cut 1, multivariate analysis.

Factor (base level), rates that included the factor in multivariate analysis^a		Other factor levels: Difference in rate from base level, significance^a
Study type (base level = general population)		Pregnancy	Diseased/case-control
M1	% of self-reported non-smokers whose cotinine implies current smoking		3.92⁺⁺
M3	% of self-reported ex-smokers whose cotinine implies current smoking	14.63⁺⁺⁺	3.89⁺
M4	% of self-reported current smokers whose cotinine implies non-smoking	-14.24^---	10.71⁺⁺
M7	% of true current smokers who report being ex-smokers		14.74⁺⁺⁺
M10	% of self-reported current smokers (plus misclassified non-smokers) who report being ex-smokers		9.11⁺⁺⁺
M11	% of true non-smokers who report being current smokers	-2.48^-	6.92⁺⁺⁺
Age group (base level = young)		Not young	All ages	Not stated
M4	% of self-reported current smokers whose cotinine implies non-smoking	-11.03^-	-16.86^---	-18.28^---
M11	% of true non-smokers who report being current smokers		-3.87^---	-4.58^--
Time of publication (base level = in 1995 review)		Before 2003	2003 onwards
M4	% of self-reported current smokers whose cotinine implies non-smoking		11.17⁺⁺
M5	% of true current smokers who report being non-smokers		9.61⁽⁺⁾
M8	% of self-reported current smokers (plus misclassified non-smokers) who report being non-smokers		6.26⁺
M9	% of self-reported current smokers (plus misclassified non-smokers) who report being never smokers		6.11⁺⁺
Study quality (base level = good)		Not good
M4	% of self-reported current smokers whose cotinine implies non-smoking	6.32⁺⁺
Tobacco products considered (base level = cigarettes)		Any smoking	Any tobacco
M2	% of self-reported never smokers whose cotinine implies current smoking	-2.22^---
M3	% of self-reported ex-smokers whose cotinine implies current smoking	-3.61^-
Location in females (base level = Canada/United States)^c		Europe	Asia	Other
M5	% of true current smokers who report being non-smokers	6.73⁽⁺⁾	23.61⁺⁺⁺
M6	% of true current smokers who report being never smokers	-11.06^(-)	27.32⁺⁺
M8	% of self-reported current smokers (plus misclassified non-smokers) who report being non-smokers		23.19⁺⁺⁺
M9	% of self-reported current smokers (plus misclassified non-smokers) who report being never smokers	-8.62^-	30.96⁺⁺⁺	24.27⁺^b

^aFor each factor, the rates shown are those with which the factor showed a significant (P < 0.01) independent association. All differences shown are adjusted for the other factors significant for that rate. They represent the difference in misclassification rate from the rate for the base level. Only statistically significant differences are shown. The significance of the difference is coded as: ⁺⁺⁺P < 0.001; ^---P < 0.001; ⁺⁺P < 0.01; ^--P < 0.01; ⁺P < 0.05; ^-P < 0.05; ⁽⁺⁾P < 0.1; ^(-)P < 0.1.

^bResult based on three or fewer estimates.

^cVariations between location were generally not significant for males or for sexes combined - see Supplementary File 3 for full results for sex × location.

Independent significant (P < 0.01) variation was also seen for the sex by location interaction for four rates (M5, M6, M8 and M9), all relating to smokers reporting non-smoking or never smoking. The variation was predominantly due to the results for females. As shown in Table 5, rates were substantially higher in Asian women than in women in Europe or North America. Rates were also somewhat higher in women in “other” locations, but less clearly, those results being based on relatively few estimates. It should be noted that, for rate M2, the percentage of self-reported never smokers whose cotinine implies current smoking, highly significant (P < 0.001) variation for the sex by location interaction in univariate analysis was not significant (at P < 0.01) in the multivariate analysis after adjustment for the type of tobacco products considered.

DISCUSSION

We have attempted to obtain estimates of 11 different misclassification rates and relate them to a range of factors. Although the data are complex, a number of clear conclusions can be drawn. First, there is considerable between-study variation in the level of misclassification.

Second, false claims to have quit smoking are more common than false claims of never having smoked, and it is also clear that the proportion of true current smokers (as judged by cotinine) is higher for self-reported ex-smokers than for self-reported never smokers.

Third, many of the rates vary according to different factors. Notably, false claims of being a non-smoker or a never smoker (rates M5, M6, M8 and M9) are particularly high in Asian females, and for females in other populations where smoking by females is not considered acceptable. This is particularly clear for the percentage of true current smokers who report being never smokers (M6) where individual studies provide rates that sometimes exceed 50% as compared to an overall rate of 5.7%. Not included in our analyses, for reasons described in Supplementary File 1, are results reported from the Health Survey for England specifically for Bangladeshi women[40]. Of 227 who reported not being tobacco users, 45 (M1 = 19.8%) had saliva cotinine greater than 15 ng/mL.

There is a clear tendency for many of the rates (particularly M4, M5 and M8) to be higher in more recent studies, though the explanation requires further study. There is also evidence that the percentage of self-reported ex-smokers whose cotinine implies current smoking (M3) is particularly high in pregnant women and younger women, associations which are inter-related, as demonstrated in the multivariate analyses. For a number of the rates (M1, M3, M4, M7, M10 and M11), there is clear evidence that rates are higher in studies of diseased groups and case-control studies than in general population studies, suggesting that circumstances of interview or presence of disease may affect the answers given. Some of these rates (M4 and M11) are also higher in younger populations.

Some other variations in rates also require comment. One is the high percentage of self-reported never smokers whose cotinine implies current smoking (M2) in Asian populations of both sexes, and where only cigarette smoking was considered - to be expected as smoking of other tobacco products may also produce high cotinine levels. An interesting association is the high percentage of self-reported ex-smokers whose cotinine implies current smoking (M3) in populations who were aware they would be tested for cotinine. While this may suggest a tendency for the mention of possible cheating to inadvertently encourage cheating, the multivariate analyses did not include awareness of validation as an independent factor significant at P < 0.01, so more evidence is needed to confirm this. A problem here is that information on awareness was not available for many of the studies.

The conclusions summarized above were drawn from analysis of the rates based on the lower cut point and did not consider results from the study of about 6.2 million life insurance applicants[26], which used a cut point of 500 ng/mL in urine to validate statements about tobacco use made by applicants who knew that their responses would be confirmed biochemically. Of 545970 who proved to be cotinine positive, 105452 (M5 = 19.31%) reported being non-tobacco users, a false-negative self-reporting rate which the authors reported was higher in males and younger participants, and “may be the result of complex interactions among financial incentives, geography and presumptive peer groups, and gender”. It is interesting that the authors did not comment on the very high proportion of self-reported tobacco users who were cotinine negative (M4 = 498426/938944 = 53.08%). The reason for this is not obvious.
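The two percentages quoted for this study can be reproduced from the counts given. The sketch below reflects our reading of those counts: the M4 denominator of 938944 equals the cotinine-positive applicants who reported tobacco use (545970 minus 105452) plus the 498426 cotinine-negative applicants who reported tobacco use, so it is the total number of self-reported users. The variable names are our own.

```python
# Counts quoted in the text for the life insurance applicant study[26].
cotinine_positive = 545970       # applicants above the 500 ng/mL cut point
positive_denied_use = 105452     # cotinine positive, reported no tobacco use
negative_reported_use = 498426   # cotinine negative, reported tobacco use
self_reported_users = 938944     # denominator quoted for M4

# Consistency check: self-reported users = cotinine positive users + cotinine negative users.
positive_reported_use = cotinine_positive - positive_denied_use
assert positive_reported_use + negative_reported_use == self_reported_users

m5 = 100 * positive_denied_use / cotinine_positive      # true users who deny use
m4 = 100 * negative_reported_use / self_reported_users  # self-reported users not confirmed
print(f"M5 = {m5:.2f}%")  # M5 = 19.31%
print(f"M4 = {m4:.2f}%")  # M4 = 53.08%
```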

It is worth considering the effects of misclassification on the association of disease rates with both active and passive smoking. We consider first associations with current active smoking. Suppose that, in a given population, the proportion of true current smokers is P_C, the risk of a given disease is 1 unit in true non-smokers and R units in true current smokers, and the rate of misclassification of true current smokers as non-smokers (M5) is M_C. Let us initially ignore the reverse misclassification rate (M11), and assume misclassified and non-misclassified smokers have the same disease risk. The self-reported non-smokers then consist of a proportion P_N = 1 - P_C of the population at risk 1 and a proportion M_C P_C at risk R, so that instead of observing the true relative risk of R we will observe a reduced relative risk of R* = R(P_N + M_C P_C)/(P_N + R M_C P_C). The observed proportion of current smokers is likewise reduced to P_C* = P_C(1 - M_C). Thus, if P_C is 30%, and R is 10, setting M_C = 10% would be expected to produce observed values of P_C* = 27% and R* = 7.3. The bias in the risk estimate increases with increases in both M_C and P_C.
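The formula for R* is easy to evaluate directly. The following is a minimal sketch (function names are ours) that reproduces the worked example and tabulates a few further, purely illustrative, combinations of P_C and M_C:

```python
def observed_relative_risk(p_c: float, r: float, m_c: float) -> float:
    """R* when a fraction m_c of true current smokers report being non-smokers.

    Assumes, as in the text, no reverse misclassification and the same
    disease risk for misclassified and correctly classified smokers.
    """
    p_n = 1.0 - p_c
    return r * (p_n + m_c * p_c) / (p_n + r * m_c * p_c)


def observed_prevalence(p_c: float, m_c: float) -> float:
    """Observed proportion of current smokers: the smokers who admit smoking."""
    return p_c * (1.0 - m_c)


# Worked example from the text: P_C = 30%, R = 10, M_C = 10%.
print(round(observed_prevalence(0.30, 0.10), 2))         # 0.27
print(round(observed_relative_risk(0.30, 10, 0.10), 1))  # 7.3

# Illustrative grid: the attenuation grows with both M_C and P_C.
for p_c in (0.1, 0.3, 0.5):
    row = [round(observed_relative_risk(p_c, 10, m), 2) for m in (0.05, 0.10, 0.20)]
    print(p_c, row)
```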

Misclassified and non-misclassified smokers may not have the same disease risk for various reasons. Misclassified smokers may have smoked less, suggesting a lower risk than smokers who report their smoking. On the other hand, misclassification may be common in participants advised to quit by their doctor as they were considered to be at higher than average risk. However, assuming the risk for misclassified smokers exceeds that for non-smokers, positive misclassification bias will still occur[1].

In the above calculations we assumed the reverse misclassification rate (M11) is zero. Where the true proportions of current smokers and non-smokers are similar, a given value of M11 will bias the relative risk less than will the same value of M5. Thus, with 50% smokers, a rate of M5 of 10% decreases a true relative risk of 10 to an observed 5.5, while a rate of M11 of 10% decreases it to 9.2. However, as the true proportion of current smokers decreases, the biasing effects become more similar.
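To compare the two directions of misclassification, the observed groups can be treated as mixtures of true smokers and true non-smokers. The sketch below is our own illustration rather than a calculation taken from the source; it assumes that M11 is the fraction of true non-smokers who report current smoking and that misclassified individuals carry the risk of their true group.

```python
def observed_rr(p_c: float, r: float, m5: float = 0.0, m11: float = 0.0) -> float:
    """Observed relative risk with misclassification in either direction.

    p_c: true proportion of current smokers; r: true relative risk;
    m5: fraction of true smokers reporting non-smoking;
    m11: fraction of true non-smokers reporting current smoking.
    """
    p_n = 1.0 - p_c
    # Composition of the self-reported smokers.
    smokers_true, smokers_false = p_c * (1.0 - m5), p_n * m11
    # Composition of the self-reported non-smokers.
    nons_true, nons_false = p_n * (1.0 - m11), p_c * m5
    risk_reported_smokers = (smokers_true * r + smokers_false) / (smokers_true + smokers_false)
    risk_reported_nons = (nons_true + nons_false * r) / (nons_true + nons_false)
    return risk_reported_smokers / risk_reported_nons


# Example from the text: 50% smokers, true relative risk 10.
print(round(observed_rr(0.5, 10, m5=0.10), 1))   # 5.5
print(round(observed_rr(0.5, 10, m11=0.10), 1))  # 9.2

# With fewer smokers (25%, an illustrative value) the two biases are much closer.
print(round(observed_rr(0.25, 10, m5=0.10), 2), round(observed_rr(0.25, 10, m11=0.10), 2))
```

With `m11 = 0` the function reduces to the formula for R* given earlier.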

There are problems in using cotinine data to confirm smoking status. First, cotinine levels do not allow precise estimation of amount smoked, though they are clearly correlated with it. Second, cotinine levels may be increased in a never smoker from environmental tobacco smoke exposure, though in practice this will not produce levels consistent with active smoking. Finally, and most seriously, cotinine levels only relate to current (or quite recent) smoking habits, and do not distinguish never smokers from short, medium or long-term quitters. Those who report never smoking may in fact have smoked until quite recently and have higher risks of smoking-related disease because of this. However, their cotinine levels will not be elevated.

We now consider the effect of misclassification on the relative risk associated with passive smoking. Some years ago, Lee and Forey[1] noted that the relationship of passive smoking to lung cancer risk is commonly studied in never smokers, using marriage to a smoker as the index of exposure, and that, as smokers tend to marry smokers, relative risk estimates will be biased if some current or former smokers are misclassified as never smokers. They described in detail how the “misclassification bias” (the apparent risk from spousal smoking if no true effect existed) depends on various factors. They showed that the bias increased with the misclassification rate of ever smokers as never smokers, the relative risk of disease associated with ever smoking, the proportion of participants who have ever smoked, and the concordance ratio between spouses’ smoking habits, and decreased with the proportion of never smokers whose spouse has ever smoked.
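The direction of these dependencies can be illustrated with a deliberately simplified model. This is a sketch under our own assumptions and is not the mathematics of the cited paper[1]: we take the concordance ratio to be the odds ratio linking a participant's and the spouse's ever smoking, assume misclassification is independent of the spouse's smoking, and give misclassified ever smokers the full ever-smoker risk. All parameter values are hypothetical.

```python
def spousal_misclassification_bias(p_ever: float, m: float, r: float,
                                   concordance: float, q_never: float) -> float:
    """Apparent relative risk for spousal smoking among self-reported never
    smokers when spousal smoking has no true effect.

    p_ever: proportion of participants who have ever smoked
    m: proportion of ever smokers misclassified as never smokers
    r: relative risk of disease for ever smoking
    concordance: odds ratio linking participant and spouse ever smoking
    q_never: proportion of true never smokers whose spouse has ever smoked
    """
    odds_ever = concordance * q_never / (1.0 - q_never)
    q_ever = odds_ever / (1.0 + odds_ever)  # spouse smoked, among ever smokers

    def mean_risk(share_of_never: float, share_of_ever: float) -> float:
        true_never = (1.0 - p_ever) * share_of_never
        misclassified = p_ever * m * share_of_ever
        return (true_never + misclassified * r) / (true_never + misclassified)

    exposed = mean_risk(q_never, q_ever)
    unexposed = mean_risk(1.0 - q_never, 1.0 - q_ever)
    return exposed / unexposed


base = dict(p_ever=0.4, m=0.025, r=10.0, concordance=3.0, q_never=0.3)
print(round(spousal_misclassification_bias(**base), 2))
for name, value in [("m", 0.05), ("r", 20.0), ("p_ever", 0.6),
                    ("concordance", 5.0), ("q_never", 0.6)]:
    changed = {**base, name: value}
    print(name, value, round(spousal_misclassification_bias(**changed), 3))
```

In this toy model the bias is exactly 1 when there is no misclassification or no spousal concordance, and it moves in the directions described above when each parameter is varied in turn.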

The mathematics presented[1] also apply to smoking-related diseases other than lung cancer, and to other indices of passive smoking where the index of exposure may be associated with an increased likelihood of smoking. Thus, not only is someone married to a smoker more likely than average to be a smoker themselves, but the same is also true for those whose parents smoke, who live with a smoker, and who work with a smoker.

Application of these results to the misclassification data presented here is not straightforward, as they relate to misclassification of ever smokers, whereas the cotinine data relate to misclassification of current smokers. Denial of past smoking can only be checked from statements made on different occasions by the same individual, evidence for this not being considered here. It is also important to realise that misclassified ever smokers are likely to have lower disease risks than typical ever smokers, as they are more likely to smoke less or be ex-smokers. Lee and Forey[1] concluded that the effects of misclassification (taking into account both misclassification of current and ex-smokers as never smokers and the tendency for misclassified ever smokers to have lower risks than non-misclassified ever smokers) were equivalent overall to assuming that about 2.5% of average ever smokers are misclassified as never smokers, though noting that “an appropriate figure is probably in the range 2%-3% but could, not implausibly, be anywhere in the range 1%-4%”.

Recent bias estimations (e.g.,[41]) have used misclassification rates of 2.5% for studies in Western populations and 10% for studies in Asia. The use of higher rates for Asia was supported by evidence, partly referred to earlier[1], suggesting that misclassification rates are very much higher in Asian women. While the evidence presented here confirms the extremely high misclassification rates in Asian women, it does not suggest the same is true for Asian men. However, given the evidence that misclassification rates are higher in more recently published studies than in the studies considered in the 1996 paper, it seems the estimate of 2.5% for studies in Western populations may be too low.

In conclusion, the combined evidence from 205 studies provides extensive information on the extent to which self-reported smoking habits are confirmed by cotinine levels in blood, saliva or urine and the extent to which true smokers deny current smoking. Misclassification rates are heterogeneous, with false claims of never smoking much higher in Asian women, and false claims of having quit higher in pregnant women. A number of the rates are higher in diseased groups likely to have been advised to quit. Misclassification rates are higher in more recent studies, which exacerbates problems in determining true relationships of passive smoking with disease.

ARTICLE HIGHLIGHTS

Research background

Misclassification of smoking habits leads to underestimation of true relationships between diseases and active smoking, but overestimation of true relationships with passive smoking.

Research motivation

We estimated overall misclassification rates weighted on sample size and investigated heterogeneity by various study characteristics.

Research objectives

Research methods

We analysed data from studies using cotinine as a marker which involved at least 200 participants and provided information on high cotinine levels in self-reported non-, never-, or ex-smokers. Information on low levels in self-reported smokers was also analysed.

Research results

There was considerable heterogeneity between misclassification rates. Rates of claiming never smoking were very high in Asian women smokers, the individual studies reporting rates of 12.5%, 22.4%, 33.3%, 54.2% and 66.3%. False claims of quitting were relatively high in pregnant women, in diseased individuals who may recently have been advised to quit, and in studies considering cigarette smoking rather than any smoking. False claims of smoking were higher in younger populations. There was no clear evidence that rates varied by the body fluid used for the cotinine analysis, the assay method used, or whether the respondent was aware their statements would be validated by cotinine - though here many studies did not provide relevant information. Misclassification rates were higher in more recently published studies.

Research conclusions

Our demonstration that rates of misclassification of smoking habits are particularly high in some situations underlines the difficulty that epidemiologists have in accurately estimating the increases in risk of various diseases associated with active and passive smoking.

Research perspectives

Misclassification rates are heterogeneous, with false claims of never smoking much higher in Asian women, and false claims of having quit higher in pregnant women. A number of the rates are higher in diseased groups likely to have been advised to quit. Misclassification rates are higher in more recent studies, which exacerbates problems in determining true relationships of passive smoking with disease.

ACKNOWLEDGEMENTS

We thank Yvonne Cooper and Diane Morris for typing the various drafts of this paper and obtaining many of the references cited. We also thank Japan Tobacco International for financial support and assistance in obtaining some of the references.

Footnotes

Manuscript source: Unsolicited manuscript

Specialty type: Medicine, research and experimental

Country of origin: United Kingdom

Peer-review report classification

Grade A (Excellent): 0

Grade B (Very good): 0

Grade C (Good): C, C

Grade D (Fair): 0

Grade E (Poor): 0

P-Reviewer: He SQ, Tang Y; S-Editor: Ji FF; L-Editor: A; E-Editor: Bian YN

References

1.	Lee PN, Forey BA. Misclassification of smoking habits as a source of bias in the study of environmental tobacco smoke and lung cancer. Stat Med. 1996;15:581-605.

2.	Tzonou A, Kaldor J, Smith PG, Day NE, Trichopoulos D. Misclassification in case-control studies with two dichotomous risk factors. Rev Epidemiol Sante Publique. 1986;34:10-17.

3.	Benowitz NL. Systemic absorption and effects of nicotine from smokeless tobacco. Adv Dent Res. 1997;11:336-341.

4.	Benowitz NL, Schultz KE, Haller CA, Wu AH, Dains KM, Jacob P. Prevalence of smoking assessed biochemically in an urban public hospital: a rationale for routine cotinine screening. Am J Epidemiol. 2009;170:885-891.

5.	Jarvis MJ, Fidler J, Mindell J, Feyerabend C, West R. Assessing smoking status in children, adolescents and adults: cotinine cut-points revisited. Addiction. 2008;103:1553-1561.

6.	Marsot A, Simon N. Nicotine and Cotinine Levels With Electronic Cigarette: A Review. Int J Toxicol. 2016;35:179-185.

7.	Lee PN, Forey BA. Misclassification of smoking habits as determined by cotinine or by repeated self-report - a summary of evidence from 42 studies. J Smoking-Related Dis. 1995;6:109-129.

8.	Lee PN. Difficulties in assessing the relationship between passive smoking and lung cancer. Stat Methods Med Res. 1998;7:137-163.

9.	Etzel RA. A review of the use of saliva cotinine as a marker of tobacco smoke exposure. Prev Med. 1990;19:190-197.

10.	Patrick DL, Cheadle A, Thompson DC, Diehr P, Koepsell T, Kinne S. The validity of self-reported smoking: a review and meta-analysis. Am J Public Health. 1994;84:1086-1093.

11.	Wells AJ, English PB, Posner SF, Wagenknecht LE, Perez-Stable EJ. Misclassification rates for current smokers misclassified as nonsmokers. Am J Public Health. 1998;88:1503-1509.

12.	Connor Gorber S, Schofield-Hurwitz S, Hardt J, Levasseur G, Tremblay M. The accuracy of self-reported smoking: a systematic review of the relationship between self-reported and cotinine-assessed smoking status. Nicotine Tob Res. 2009;11:12-24.

13.	Jarvis MJ, Tunstall-Pedoe H, Feyerabend C, Vesey C, Saloojee Y. Comparison of tests used to distinguish smokers from nonsmokers. Am J Public Health. 1987;77:1435-1438.

14.	Lee PN. Passive smoking and lung cancer association: a result of bias? Hum Toxicol. 1987;6:517-524.

15.	Haddow JE, Palomaki GE, Knight GJ. Use of serum cotinine to assess the accuracy of self reported non-smoking. Br Med J (Clin Res Ed). 1986;293:1306.

16.	Pirkle JL, Flegal KM, Bernert JT, Brody DJ, Etzel RA, Maurer KR. Exposure of the US population to environmental tobacco smoke: the Third National Health and Nutrition Examination Survey, 1988 to 1991. JAMA. 1996;275:1233-1240.

17.	Wald NJ, Boreham J, Bailey A, Ritchie C, Haddow JE, Knight G. Urinary cotinine as marker of breathing other people's tobacco smoke. Lancet. 1984;1:230-231.

18.	Lee PN. "Marriage to a smoker" may not be a valid marker of exposure in studies relating environmental tobacco smoke to risk of lung cancer in Japanese non-smoking women. Int Arch Occup Environ Health. 1995;67:287-294.

19.	Office of Population Censuses and Surveys. Health survey for England 1994. Volume I: Findings. Volume II: Survey methodology documentation. Series HS no. 4. Colhoun H, Prescott-Clarke P, editors. London: HMSO, 1996: 607.

20.	Suadicani P, Hein HO, Gyntelberg F. Serum validated tobacco use and social inequalities in risk of ischaemic heart disease. Int J Epidemiol. 1994;23:293-300.

21.	Baltar VT, Xun WW, Chuang SC, Relton C, Ueland PM, Vollset SE, Midttun Ø, Johansson M, Slimani N, Jenab M, Clavel-Chapelon F, Boutron-Ruault MC, Fagherazzi G, Kaaks R, Rohrmann S, Boeing H, Weikert C, Bueno-de-Mesquita HB, Boshuizen HC, van Gils CH, Peeters PH, Agudo A, Barricarte A, Navarro C, Rodríguez L, Castaño JM, Larrañaga N, Pérez MJ, Khaw KT, Wareham N, Allen NE, Crowe F, Gallo V, Norat T, Tagliabue G, Masala G, Panico S, Sacerdote C, Tumino R, Trichopoulou A, Lagiou P, Bamia C, Rasmuson T, Hallmans G, Roswall N, Tjønneland A, Riboli E, Brennan P, Vineis P. Smoking, secondhand smoke, and cotinine levels in a subset of EPIC cohort. Cancer Epidemiol Biomarkers Prev. 2011;20:869-875.

22.	Assaf AR, Parker D, Lapane KL, McKenney JL, Carleton RA. Are there gender differences in self-reported smoking practices? Correlation with thiocyanate and cotinine levels in smokers and nonsmokers from the Pawtucket Heart Health Program. J Womens Health (Larchmt). 2002;11:899-906.

23.	Simoni M, Baldacci S, Puntoni R, Pistelli F, Farchi S, Lo Presti E, Pistelli R, Corbo G, Agabiti N, Basso S, Matteelli G, Di Pede F, Carrozzi L, Forastiere F, Viegi G. Respiratory symptoms/diseases and environmental tobacco smoke (ETS) in never smoker Italian women. Respir Med. 2007;101:531-538.

24.	Lee CY, Shin S, Lee HK, Hong YM. Validation of self-report on smoking among university students in Korea. Am J Health Behav. 2009;33:540-549.

25.	Agewall S, Persson B, Lindstedt G, Fagerberg B. Smoking and use of smokeless tobacco in treated hypertensive men at high coronary risk: utility of urinary cotinine determination. Br J Biomed Sci. 2002;59:145-149.

26.	Palmier J, Lanzrath B, Dixon A, Idowu O. Demographic predictors of false negative self-reported tobacco use status in an insurance applicant population. J Insur Med. 2014;44:110-117.

27.	Becher H, Zatonski W, Jöckel KH. Passive smoking in Germany and Poland: comparison of exposure levels, sources of exposure, validity, and perception. Epidemiology. 1992;3:509-514.

28.	Emmons KM, Abrams DB, Marshall R, Marcus BH, Kane M, Novotny TE, Etzel RA. An evaluation of the relationship between self-report and biochemical measures of environmental tobacco smoke exposure. Prev Med. 1994;23:35-39.

29.	Martinez FD, Wright AL, Taussig LM. The effect of paternal smoking on the birthweight of newborns whose mothers did not smoke. Group Health Medical Associates. Am J Public Health. 1994;84:1489-1491.

30.	Ogden MW, Davis RA, Maiolo KC, Stiles MF, Heavner DL, Hege RB and Morgan WT. Multiple measures of personal ETS exposure in a population-based survey of nonsmoking women in Columbus, Ohio. Proceedings of the 6th International Conference on Indoor Air Quality and Climate, Indoor Air, 1993 July 4-8; Helsinki, Finland. 1993;523-528.

31.	Slattery ML, Hunt SC, French TK, Ford MH, Williams RR. Validity of cigarette smoking habits in three epidemiologic studies in Utah. Prev Med. 1989;18:11-19.

32.	Coultas DB, Howard CA, Peake GT, Skipper BJ, Samet JM. Discrepancies between self-reported and validated cigarette smoking in a community survey of New Mexico Hispanics. Am Rev Respir Dis. 1988;137:810-814.

33.	Cummings KM, Markello SJ, Mahoney M, Bhargava AK, McElroy PD, Marshall JR. Measurement of current exposure to environmental tobacco smoke. Arch Environ Health. 1990;45:74-79.

34.	Haddow JE, Knight GJ, Palomaki GE, Kloza EM, Wald NJ. Cigarette consumption and serum cotinine in relation to birthweight. Br J Obstet Gynaecol. 1987;94:678-681.

35.	Haddow JE, Knight GJ, Palomaki GE, Haddow PK. Estimating fetal morbidity and mortality resulting from cigarette smoke exposure by measuring cotinine levels in maternal serum. Prog Clin Biol Res. 1988;281:289-300.

36.	Wewers ME, Dhatt RK, Moeschberger ML, Guthrie RM, Kuun P, Chen MS. Misclassification of smoking status among Southeast Asian adult immigrants. Am J Respir Crit Care Med. 1995;152:1917-1921.

37.	Laatikainen T, Vartiainen E, Puska P. Comparing smoking and smoking cessation process in the Republic of Karelia, Russia and North Karelia, Finland. J Epidemiol Community Health. 1999;53:528-534.

38.	Ford RP, Tappin DM, Schluter PJ, Wild CJ. Smoking during pregnancy: how reliable are maternal self reports in New Zealand? J Epidemiol Community Health. 1997;51:246-251.

39.	Pearce MS, Mann KD, Singh G, Davison B, Sayers SM. Prevalence and validity of self-reported smoking in Indigenous and non-Indigenous young adults in the Australian Northern Territory. BMC Public Health. 2014;14:861.

40.	Roth MA, Aitsi-Selmi A, Wardle H, Mindell J. Under-reporting of tobacco use among Bangladeshi women in England. J Public Health (Oxf). 2009;31:326-334.

41.	Lee PN, Fry JS, Forey B, Hamling JS, Thornton AJ. Environmental tobacco smoke exposure and lung cancer: a systematic review. World J Meta-Anal. 2016;4:10-43.