
Sunday Science Lesson: Bad categories, bad science

by Carl V Phillips

[Oops, I forgot to click “publish” on Sunday. So here it is on Monday. I am keeping the title as is, though. :-)] I was thinking about this topic because I just finished writing a paper in which it comes up, and I also stumbled across a paper from a couple of years ago, by my old friend Miguel Hernán, that goes into depth about some aspects of it (open access link; a wonderful and generally understandable, though slightly technical, presentation). The issue is how far you can go in agglomerating heterogeneous entities (people, behavior, conditions, etc.) into a single category in an analysis and still have meaningful results.

Consider the examples from our paper (which, of course, I will publicize once there is a public version): It is common practice, when calculating health outcomes from smoking in a population, to have a category of “former smokers”. (In person-first language that would be “people who used to smoke”, but “former smokers” is always used.) That is a heterogeneous group of people in several ways. First, it is heterogeneous in the same sense “smokers” is: there are lots of different intensities of smoking, and intensity matters for health outcomes, at least until someone is so “former” that it does not. Second, continuing the latter point, those who have quit smoking recently are at much greater risk of smoking-caused disease than those who quit a long time ago.

It turns out, however, that for purposes of calculating population statistics, just agglomerating “former smokers” is close enough. There is some average excess health risk for former smokers (never mind that this is generally overestimated — that is a topic for another day), and some average mix of the heterogeneous individuals. These stay fairly constant over time, so if an average outcome is all that is needed, then simplifying everyone in the category to being the average member of the category works.

In the paper, we contrast this with the category usually called “dual users”, specifically people who both smoke and vape regularly. This category has more dimensions of heterogeneity. It includes dedicated smokers who regularly vape a bit, dedicated vapers who occasionally smoke a cigarette, smokers who are in the midst of a dedicated taper off of cigarettes, and countless other variations. People who fall into different subcategories have very different health risks from their choice, as well as very different probabilities of transitioning to a different choice (e.g., to exclusive vaping). In this case, using the average creates an utter mess.

We do not really have any data to support estimates for those averages (the health effects or the transition probabilities). The health effects are basically the same as smoking the given quantity, so that could be used. But we do not actually know the average mix. A worse problem is that the mix changes radically over time in populations with vaping uptake. How does it change? We don’t really have any good numbers for that. But we can be sure it does, and a lot. Early adopters are probably transitioning; as vaping becomes popular, the portion of dual users who are dedicated smokers vaping only a bit increases; etc. So even if an average were measured for transition probabilities, it would not generalize even to a year in the future for the same population. In short, pretty much any result of any calculation that hinges on some estimated characteristic of the “dual user” category is going to be junk science.
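
To make the shifting-mix problem concrete, here is a minimal Python sketch. The three subtypes, their annual probabilities of switching to exclusive vaping, and both population mixes are invented for illustration; they are not estimates from any data. The only point is that identical subtype-level behavior produces very different category averages once the mix shifts.

```python
# HYPOTHETICAL dual-user subtypes with ILLUSTRATIVE annual probabilities of
# moving to exclusive vaping. None of these numbers come from real data.
SWITCH_PROB = {
    "tapering_smoker": 0.50,   # actively tapering off cigarettes
    "dedicated_smoker": 0.05,  # committed smoker who vapes a bit
    "dedicated_vaper": 0.30,   # committed vaper, occasional cigarette
}


def average_switch_prob(mix: dict[str, float]) -> float:
    """Category-level average: subtype probabilities weighted by their shares."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"mix shares sum to {total}, not 1")
    return sum(share * SWITCH_PROB[kind] for kind, share in mix.items())


# Early in vaping uptake: dual users are mostly people in transition.
early_mix = {"tapering_smoker": 0.6, "dedicated_smoker": 0.2, "dedicated_vaper": 0.2}
# Later: many dedicated smokers who vape only occasionally have joined.
later_mix = {"tapering_smoker": 0.2, "dedicated_smoker": 0.7, "dedicated_vaper": 0.1}

print(f"early average: {average_switch_prob(early_mix):.3f}")
print(f"later average: {average_switch_prob(later_mix):.3f}")
```

With these made-up inputs the early mix averages about 0.37 and the later mix about 0.165, even though no subtype changed its behavior at all. An average measured in the first year would badly misstate the second.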

In his paper, Miguel looks at the heterogeneity of an exposure category. He drills down through increased specificity of the exposure definition (and thus a reduction in heterogeneity). The 5th step on the drill-down is “Does drinking a swig of water from the Broad Street pump kill?” (For those not familiar, establishing that this water source was contaminated was one of the first examples of published quantitative epidemiology, and probably the most-taught example.) This is a fairly well-defined exposure category — far better than is typical in epidemiology studies — but Miguel then points out that a 6th step (a time period) and a 7th (an “in comparison to what alternative?” statement) are really needed.

The proposed 1st version is “Does water kill?”, which is obviously a cartoonishly bad way to define the exposure. The 4th refinement, just before getting to the specification of the Broad Street pump, is “Does drinking a swig of fresh water kill?” This, of course, would have been an utter failure as an attempt to define the exposure of interest. I will offer an additional entry in between, call it #4.5, “Does drinking a swig of water from a London neighborhood pump kill?” An estimate of the effect of that exposure might give us a clue that one or more pumps are contaminated, but would leave us woefully ignorant about what we really wanted to know, and it would obviously not give us any suggestion about what intervention might change the outcome.

Translating this into the vaping context, we can see that most exposure definitions are no better than #4.5, and some are as bad as #4. Few are as good as #5, let alone #7.

Estimates of an exposure called “vapes” (or “tried an e-cigarette” or “has vaped more than 30 times” or whatever) are used to look for associations with quitting smoking, starting smoking, becoming a lifelong vaper, and other outcomes. But the measured exposure is some mix across some or all of the heterogeneities described by: {tried vaping once out of curiosity, buying and regularly using a product, everything in between}, {actively interested in switching from smoking, giving it some passing thought, not thinking that at all}, {consuming nicotine, consuming THC, consuming neither}, {buying an open system, buying a pod system, buying a single disposable, only taking puffs offered from a friend’s vape}, etc. Sometimes a study uses a precise enough measure to narrow the coverage to only part of this space, but it is seldom anywhere close to as precise as specifying “the Broad Street pump”.

Some of these we would expect to be strong predictors of a particular outcome and others not. Throwing together a mix of them means that the observed association is some weighted average of their effects. The weighting factors (how much of each type of vaper comprises the exposed group) are unknown, let alone the level of effect contributed by each version of the heterogeneous exposure.
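
The weighted-average claim can be written out for the simplest case, a risk difference. This is a sketch under strong assumptions that are not in the text above: every vaper subtype $k$ is compared with the same unexposed group (outcome risk $p_0$), and there is no confounding. With $w_k$ the unknown share of subtype $k$ among the exposed and $p_k$ that subtype's outcome risk:

```latex
\mathrm{RD}_{\text{obs}}
  = \sum_k w_k\, p_k \;-\; p_0
  = \sum_k w_k\,(p_k - p_0)
  = \sum_k w_k\,\mathrm{RD}_k ,
  \qquad \text{since } \sum_k w_k = 1 .
```

Two studies with identical subtype effects $\mathrm{RD}_k$ can therefore report quite different overall associations, even ones of opposite sign when some $\mathrm{RD}_k$ are positive and others negative, simply because their $w_k$ differ. The study data alone reveal neither the $w_k$ nor the $\mathrm{RD}_k$.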

From the perspective of assessing the effects of interventions, about the best we ever have is the equivalent of version #4.5: Some water pump out there [some version(s) of being a vaper] causes a particular outcome. If we shut down the right pump(s) [encouraged/discouraged the particular version of vaping], we would have an outcome we like better. But we do not know which pump or which version. Blindly advocating for or against vaping in general based on such information is like randomly shutting down some of the pumps.

Thus the only thing that can really be said is “the mix of exposures fitting this definition of the exposure in this particular population at this particular time exhibits the following quantitative associations….” A strict adherence to the basic principles of epidemiology, that population and time matter, is sometimes an affectation and sometimes an attempt at intellectual nihilism (“…and therefore we cannot extrapolate this…”). But in this case it is absolutely necessary because of the huge and consequential heterogeneity.

In his paper, Miguel spins out the list of refinements to the exposure definition and notes that you could tighten it a million times and still there would be some ambiguity and thus heterogeneity. You have to know when to stop — that is part of the art of real science. In his list, #5 seems mostly good enough. Sometimes you can mostly ignore which population and time you are talking about and extrapolate a lot. But not for the “vapes” exposure or the “dual user” category.

For vaping, if you do not go clear to something like Miguel’s #7 — identifying the alternative to the exposure — you run into a new set of problems. Every time someone observes something like “but a lot of those kids who take up vaping would have smoked in the absence of vaping” they are recognizing this. Once again, much of the practical problem here is heterogeneity: the heterogeneity of the but-for option. It is easy to estimate the incidence of smoking initiation among young ever-vapers for a population. Then it is possible (not easy!) to try to sort out how many were prevented from smoking and how many were caused to smoke because of the vaping exposure. But if vapers in a different population (or cohort in the same population) have a different mix among {would have initiated smoking if vaping had not existed, would not have}, then none of these results will generalize or offer useful predictions.
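
The but-for problem can be sketched the same way. In this minimal Python example the function name, its parameters, and every number are hypothetical: the share of young vapers who would have started smoking if vaping did not exist, a probability that vaping diverted such a person from smoking, and a probability that vaping led someone who otherwise would not have smoked to start. Holding the two probabilities fixed and changing only the but-for mix is enough to flip the sign of the net effect.

```python
def net_smoking_change(n_vapers: float, share_would_smoke: float,
                       p_prevented: float, p_caused: float) -> float:
    """Net additional smokers attributable to vaping (HYPOTHETICAL toy model).

    share_would_smoke: fraction of vapers who would have started smoking
        if vaping had not existed (the but-for mix).
    p_prevented: chance vaping kept one of those would-be smokers from smoking.
    p_caused: chance vaping led one of the others to start smoking.
    A negative result means vaping prevented more smoking than it caused.
    """
    for name, value in [("share_would_smoke", share_would_smoke),
                        ("p_prevented", p_prevented), ("p_caused", p_caused)]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    would_smoke = n_vapers * share_would_smoke
    would_not_smoke = n_vapers - would_smoke
    caused = would_not_smoke * p_caused
    prevented = would_smoke * p_prevented
    return caused - prevented


# Same made-up effect probabilities, two different but-for mixes.
P_PREVENTED, P_CAUSED = 0.5, 0.05
pop_a = net_smoking_change(1000, share_would_smoke=0.6,
                           p_prevented=P_PREVENTED, p_caused=P_CAUSED)
pop_b = net_smoking_change(1000, share_would_smoke=0.05,
                           p_prevented=P_PREVENTED, p_caused=P_CAUSED)
print(f"population A: {pop_a:+.1f} smokers per 1000 vapers")
print(f"population B: {pop_b:+.1f} smokers per 1000 vapers")
```

With these invented inputs, population A comes out at -280 (vaping prevented smoking on net) and population B at +22.5 (vaping caused smoking on net), although the two effect probabilities are identical in both. Neither number says anything about a third population without knowing its but-for mix.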

One of the takeaways from this, a point I have been making for over 20 years in one way or another, is that epidemiology study results that look at “the same” exposure (or the “same” outcome, or population, or covariates) are almost never actually looking at the same relationships. Most often, the common language summary that suggests that those measures are the same hides the fact that they actually measured different versions of related phenomena (or different people, or different whatever). But even if the measurements are really the same, the heterogeneous mix of what counts as part of that will be different across time and place.

More pointedly, this implies that the common claims along the lines of “this study showed that vaping increases your chance of quitting smoking by….” are all nonsense. It is perfectly reasonable to take a guess at that generic figure, based on all available evidence. But that one study only offered an estimate for one particular population, one particular definition of the exposure, etc. Those who had the vaping exposure were a particular mix of the subtypes of vapers. The specific results are meaningless without that context. They do not generalize to a generic statement about the effects of vaping.
