The travesties that are Glantz, epidemiology modeling, and PubMed Commons
by Carl V Phillips
I was asked to rescue from the memory hole a criticism of a Glantz junk paper from a year ago. I originally covered it in this post, though I do not necessarily recommend going back to read it (it is definitely one of my less elegant posts).
I analyzed this paper from Dutra and Glantz, which claimed to assess the effect of e-cigarette availability on youth smoking. What they did would be a cute first-semester stats homework exercise, but it is beyond stupid to present it as informative. It is simple to summarize:
Dutra and Glantz took NYTS data for smoking rates among American minors. They fit a linear trend to the decline in the minor smoking prevalence between 2004 and 2009 (the latter being the start of the e-cigarette era, by their assessment; the former presumably being picked from among all candidate years based on which produced the most preferred result, as per the standard Glantz protocol). They then let the slope of the trend change at 2009 and observed that the fit slope was about the same for 2009 to 2014. From this paltry concoction, they concluded e-cigarettes were not contributing to there being less smoking. They then — again, standard protocol — told the press that this shows that e-cigarettes are causing more smoking.
My previous post includes my somewhat rambling analysis of why this is all wrong. After writing that, I accepted a request to write a tight version of it to post to PubMed Commons (presumably that was from Clive Bates, who is a big PMC fan, though I do not recall exactly). What follows, between the section break lines, is what I posted. It relies on more technical background on the part of the reader than I usually assume here, so I have added a few new notes (in brackets) to explain a few points.
Credit to Zvi Herzig for saving a copy of it before it was deleted (more on that story below), else I would no longer have a good copy. I am a year late in properly crediting Zvi for that favor by posting this.
It says something dismaying about the state of scientific thinking in this area that commentators on this paper have failed to recognize its glaring fatal flaw. This includes the journal editors and reviewers, but also the many critics of this paper, including the previous comment here and various blog and social media posts (example).
The paper’s analysis and conclusions are based entirely on the modeling assumption that U.S. teenage smoking prevalence would have declined linearly over the period 2004-2014, but for the introduction of e-cigarettes in the population. This assumption is made stronger still [i.e., more constraining and thus more speculative] by the additional assumptions that e-cigarettes introduced an inflection, with linear declines both before and after that introduction [i.e., that it consists of a line segment for each of the two periods, forced to meet at a corner, like a drinking straw with a kink folded into it], and that an estimate of whether their slopes are significantly different is a measure of the effect of e-cigarettes on smoking prevalence [i.e., the claim that because the bend in the straw is only slight, e-cigarette availability had no effect]. There is so much wrong with these assumptions that it is difficult to know where to start.
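To make the "drinking straw with a kink" model concrete, here is a minimal sketch of a continuous piecewise-linear fit with a forced corner at 2009. The prevalence figures below are purely illustrative placeholders, not the actual NYTS data, and the function is my own reconstruction of the general technique, not the authors' code.

```python
import numpy as np

# Purely illustrative prevalence figures (percent) -- NOT the actual NYTS data.
years = np.array([2004.0, 2006.0, 2009.0, 2011.0, 2014.0])
prev = np.array([22.0, 19.5, 17.0, 15.5, 13.0])

def kinked_fit(years, prev, knot):
    """Fit two line segments forced to meet at `knot` -- the
    'straw with a kink' model described above."""
    t = years - knot
    # Columns: level at the knot, pre-knot slope, change in slope after the knot.
    X = np.column_stack([np.ones_like(t), t, np.where(t > 0, t, 0.0)])
    coef, *_ = np.linalg.lstsq(X, prev, rcond=None)
    return coef

level, slope_pre, slope_change = kinked_fit(years, prev, knot=2009.0)
# The paper's logic: if slope_change is close to zero, declare that
# e-cigarettes had no effect on the smoking trend.
```

The entire conclusion of the paper reduces to the size of that one `slope_change` coefficient, which is why everything hinges on the assumptions baked into the design matrix.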
Probably the best place to start is the observation that the authors did not even attempt to justify or defend this body of assumptions. Given that the analysis is wholly dependent on them, this itself is a fatal flaw with the paper even if a reader could guess what that justification would have been. But it is very difficult to even guess. There are very few eleven-year periods in the historical NYTS data where smoking rate trends are monotonic (even ignoring the noise from individual years’ measurements), let alone linear. [Note: you can see a graph of the wildly varying historical data in my original post, and it is immediately obvious how absurd it is to try to fit a line to it.]
While teenage smoking prevalence is not nearly as unstable as many of their other consumption choices, no choice in this population can be assumed to be in the quasi-equilibrium state of, say, average adult beef consumption, where the shape of a curve fit to the gradual changes is largely immaterial. Unlike a stable adult population, the teenage population is characterized by both substantial cohort replacement between data waves [i.e., it is not the same people — the older kids from one year are no longer kids two years later, but are replaced with a new group of young kids] and fashion trends (rapid changes in collective preferences). Unlike many consumption choices, smoking is characterized by strong social pressures that sometimes rapidly change preferences. Indeed, the authors of this paper are proponents of the belief that marketing and policy interventions substantially change teenage smoking uptake. This makes their modeling assumption that the only impactful change over the course of eleven years was the introduction of e-cigarettes patently disingenuous.
In addition to the shape of the trend line, there are numerous candidates for modeling the impact of e-cigarettes, such as a one-off change in the intercept of the fit line rather than the slope [i.e., as if the straw were snipped apart and the bit after 2009 was allowed to shift up or down, but had to keep the same slope], or making it a function of e-cigarette usage or trialing prevalence [i.e., instead of forcing a linear fit on the latter period which effectively assumes that any effect increases proportional to the mere passage of time, use a curve that models the impact as proportional (or some other function) to actual e-cigarette exposure]. It is telling that the results from none of these alternative models, which are at least as plausible as the one presented, are reported. This either means that the authors ran such models but did not like the results and so suppressed them, or that the authors never even bothered to test the sensitivity of their model.
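The "snipped straw" alternative mentioned above — a one-off jump in the level with a common slope — is just as easy to fit as the published model, which underscores how cheap a basic sensitivity check would have been. Again, the data below are illustrative placeholders of my own, and this is a sketch of the general idea, not anyone's actual analysis.

```python
import numpy as np

# Illustrative placeholder data -- not the actual survey figures.
years = np.array([2004.0, 2006.0, 2009.0, 2011.0, 2014.0])
prev = np.array([22.0, 19.5, 17.0, 15.5, 13.0])

def sse(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

t = years - 2009.0
after = (t > 0).astype(float)

# Model A: slope allowed to change at the knot (the paper's choice).
sse_slope_change = sse(np.column_stack([np.ones_like(t), t, np.maximum(t, 0.0)]), prev)
# Model B: common slope, but a one-off level shift at the knot
# (the 'snipped straw' alternative).
sse_level_shift = sse(np.column_stack([np.ones_like(t), t, after]), prev)
```

If models this interchangeable produce materially different substantive conclusions, that alone tells you the conclusions are artifacts of the parameterization.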
Put simply, the analysis in this paper depends entirely on a set of assumptions that are never supported and, indeed, clearly unsupportable. While a convenient but clearly inaccurate modeling simplification may sometimes be justified for purposes of making a minor point, it is obviously a fatal flaw when it is the entire basis of a paper’s analysis and conclusions.
Despite this, critics of this paper have almost universally endorsed the authors’ unjustified assumptions rather than pointing out they are fatal flaws. In particular, they have focused on quibbling about the choice of inflection point. The authors chose 2009 as the zero-point for the introduction of e-cigarettes, while the critics have consistently argued that the next data wave (2011), when there was first measurable e-cigarette use, should have been used. They point out that the model then shows a substantially steeper linear decline for the latter half of the period, and suggest this is evidence that e-cigarettes accelerated the decline in smoking contrary to the original authors’ conclusions [i.e., if you use the original method but bend the straw at 2011 instead of 2009, the second half gets steeper after the inflection rather than staying about the same].
If the original model were defensible, one could indeed debate which of these parameterizations was more defensible (the advantage here seems to go to the original authors; generally one would choose a zero-point of the last year of approximately zero exposure, not the first year of substantially nonzero exposure). But the scientific approach is not to dispute parameterization specifics based on one’s political beliefs about e-cigarettes. It is to observe that if a genuinely debatable choice of parameters produces profoundly different model outputs, then the model cannot be a legitimate basis for drawing worldly conclusions. It is far too unstable. In other words, the critics offer an additional clear reason for dismissing the entire analysis, but rather than arguing this should be done, they endorse the core model.
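The instability the critics inadvertently demonstrate is trivial to reproduce: refit the same kinked model with the knot moved from 2009 to 2011 and see how much the post-knot slope moves. A sketch, again on illustrative stand-in data rather than the real survey numbers:

```python
import numpy as np

# Illustrative data only -- not the actual NYTS figures.
years = np.array([2004.0, 2006.0, 2009.0, 2011.0, 2014.0])
prev = np.array([22.0, 19.5, 17.0, 15.5, 13.0])

def post_knot_slope(years, prev, knot):
    """Slope of the post-`knot` segment in a continuous kinked-line fit."""
    t = years - knot
    X = np.column_stack([np.ones_like(t), t, np.maximum(t, 0.0)])
    coef, *_ = np.linalg.lstsq(X, prev, rcond=None)
    return coef[1] + coef[2]  # pre-knot slope plus the change at the knot

slope_knot_2009 = post_knot_slope(years, prev, knot=2009.0)
slope_knot_2011 = post_knot_slope(years, prev, knot=2011.0)
# If a genuinely debatable change of knot moves this slope materially,
# the model is too unstable to support any worldly conclusion.
```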
I have written more, using this paper as context, about the tendency of some commentators to get tricked into endorsing erroneous underlying claims while ostensibly offering criticism, here.
As a final observation, a serious analyst doing a quick-and-dirty fit to a percentage prevalence trend this steep would not choose a line, which predicts there will soon be a departure from the possible range [i.e., a downward linear trend will cross into negative values, which is not possible for a population prevalence]. The standard choices would be a logistic curve or exponential decay. However, it is likely that non-scientist readers (the predominant audience of papers on this topic) will not recognize this. Such readers frequently see fit lines overlaid on time series graphs, and may not recognize that these are just conveniences to help reduce the distractions from the noise in the data, not testaments that every time series can be assumed to be linear. Had the authors chosen a more appropriate shape for their fit line, it would have called more readers’ attention to the fact that they were making unjustified assumptions.
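For concreteness, a bounded alternative such as exponential decay is no harder to fit than a line: just regress the log of prevalence on time. The numbers below are, once more, illustrative placeholders rather than the real data.

```python
import numpy as np

# Illustrative prevalence series (percent) -- not the real data.
years = np.array([2004.0, 2006.0, 2009.0, 2011.0, 2014.0])
prev = np.array([22.0, 19.5, 17.0, 15.5, 13.0])

# Exponential decay: prev ~ A * exp(r * t).  Fit by ordinary least
# squares on log(prev).  The extrapolated curve approaches zero but,
# unlike a straight line, can never cross into negative prevalence.
r, log_A = np.polyfit(years - years[0], np.log(prev), 1)
halving_time = np.log(2.0) / -r  # years for prevalence to halve, given r < 0
```

A logistic curve takes a couple more lines but has the same virtue: its predictions stay inside the range a prevalence can actually occupy.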
The underlying issue here is much more important than one stupid paper. The reason that Dutra and Glantz could so easily get away with this is that epidemiology (and other associated social science, for those not inclined to call this epidemiology per se) is rife with strong modeling assumptions, which are seldom justified and often patently absurd. The reason several others wrote criticisms of the paper without ever identifying the fatal flaw (and, indeed, implicitly endorsing it) is that the problem is so pervasive that they do not even recognize it as a problem.
These modeling assumptions are sometimes so strong, as in the present case, that the analysis and results are purely artifacts of the assumptions. More often, they affect reported quantitative estimates in unknown ways, though almost certainly moving the result in the direction the authors would prefer. This problem is close to ubiquitous in epidemiology articles. I have written about this extensively, and I will summarize some of it in a Daily Vaper science lesson — riffing off of this material — shortly. So I will stop here.
Finally, there is the issue of PubMed Commons. For those not familiar, it is basically a third-party comments section which PubMed (which indexes the articles from most health science journals) attaches to the bottom of its page for each article. Anyone who has authored an article that is indexed by PubMed can comment there. For example, here is where the above comment used to appear. Yes, used to.
The story: After being asked by Clive (I am pretty sure) to write that, I posted it and shared the link. Zvi quickly wrote me a note suggesting I edit it. In the original version, the last paragraph had some bits about how the authors’ political biases obviously influenced their choice of which among the multitude of candidate models to use. Zvi pointed out that PMC has a rule against “speculation on the motives of authors”, and that they will remove a comment over that. This is a fair enough rule, though it is not actually enforced (every comment from tobacco controllers I recall ever seeing there was just attacking the authors for supposed conflicts of interest due to corporate funding, which is speculation about motives and, moreover, an aggressive accusation that someone would lie about results in exchange for funding). As anyone familiar with Glantz et al. will know, my statement was hardly mere speculation. But it was not important, so I was ok with editing the comment to remove those bits.
A few days later, presumably after Glantz complained and pressured PMC about the comment, it was deleted. (There is no chance that one of the other commentators on the Glantz paper, who I also criticized, was the one who complained. They are decent and honest people who would have replied to the comment if they thought I got something wrong, rather than insecure propagandists who know they are wrong and so try to censor disagreement.)
I later got an official notice from PMC that I had violated some unspecified rule and would have to change that if I wanted the comment to appear. I replied to ask what in my comment had violated what rule, because I genuinely had no idea. That kinda seems like a reasonable question. But I got no reply. I followed up. I suggested that perhaps they were basing their assessment on the original version, that did violate their rule, and had missed the edit. I got no response to any query.
Now if I had speculated that Glantz was a sexual harasser who stole credit for his advisees’ work, I could understand why they would have objected. Apparently I would have been right, but it would not have been relevant to the paper. But the content of that comment was fully on-point. Even someone who could not understand why it was entirely valid could at least recognize that it was not some flight-of-fancy.
Clive really wanted some version of the comment to appear. He went so far as to take the time to do a rewrite that he guessed would pass the censors, though it is impossible to know since they refused to explain what bit of it they wanted changed. I could have posted that. Or I could have tried to just repost the above version again, based on my speculation that they were working from the original version that did violate their rule.
But you know what? Fuck that.
On top of everything else, the single message I got from PMC contained a rather rude threat to pull my credentials to comment there if I again violated those vague rules they refused to clarify. That would create a rather difficult situation for someone who wanted to continue to give them free content. Fortunately for me, I am not inclined to gift them any more.
PMC is a ghost town. There are very few comments, and their content is usually little more than a tweet’s worth. Based on what I have seen, the comments are of low quality on average, and are about equally likely to criticize something that is right or inconsequential as they are to identify obvious major flaws. Thus no one ever thinks, “I should check PMC for comments on this paper I am reading.”
PMC needs people to give them free content, and a lot more of it, before it has value. They need to force authors to reply to comments, not to comply with takedown demands. So if they are going to blithely delete good content (I really doubt that they get, on any of their million pages, even one analysis per day of this quality), I am not really inclined to help them out. It is not like I or my loyal readers get any benefit from it appearing there, rather than just here or in The Daily Vaper.
During the course of my career, I have thought a lot about how to try to fix some of the problem of health science journals publishing so much obvious junk. Something like PMC, with its relatively large budget and the visibility that PubMed brings, is an approach with some potential. But if the plan is to crowdsource voluntary comments, then it needs to be free and open. Being rude and threatening to your voluntary contributors is definitely contraindicated.
A small operation that picks and chooses a very few targets, like a journal that encourages critical analysis or like Retraction Watch, can operate with tight command-and-control (if those in command are skilled and honest). But that cannot work for a system like PMC.
Trying to overlay command-and-control upon a crowdsourced system is hopeless. It is basically like Soviet planning, where someone tries to control a system (the market; the scientific debate) that is far too complicated to handle from the top down. But that is apparently what PMC is trying to do. They do not have the capacity to assess their content or, apparently, even to respond to requests for clarification. Instead, they just delete something without explanation if an author complains. Needless to say, this is an epic fail.
Moreover, given this arbitrary Soviet system, even someone who really really wanted to post to PMC would be a fool to write an analysis as detailed as mine, let alone the more involved analysis that would be required for a paper that is not so patently stupid. At least if you only write a tweet’s worth of content, you have not lost much if they delete it, and you stand some chance of guessing what to change if they make a vague demand that it be changed.
In summary: The way epidemiology models are created and accepted is a disaster, and the vast majority of the literature is suspect. This also specifically makes it very easy for intentional liars like Glantz and Dutra to concoct models to support their political positions. Not only do the academic and journal communities do nothing to resist this, but they actively work to resist any effort to respond to it, out of fear of admitting just how much rot there is. PMC is just one example of the many projects that could theoretically be part of the solution, but that actually remains part of the problem, resisting real scientific criticism.
Oh, and also, we rely on this literature to make behavioral, medical, and public policy decisions. Have a nice day.