I’ve written in the past about the tendency of some researchers to compensate for weak study design or small sample size by over-hyping their research findings, particularly with the news media. Unfortunately, we seem to have another case in point with a recent publication, entitled “Phthalates and attributable mortality: A Population-based longitudinal study and cost analysis.” The paper was published on-line on October 12 and was accompanied by a press release from NYU Langone Health, which later that day was amplified by a CNN article by an author who has, knowingly or not, helped to exacerbate this problem. Noticeably absent from the CNN story is any perspective on this new study from independent scientists, something that once was standard practice among science journalists, but now seems to be rarely done, possibly in order to meet increasingly tight deadlines. The result is less objectivity and balance in media stories about new scientific findings. 

 

For those who are unfamiliar with phthalates, they are a family of structurally related, but functionally and toxicologically diverse, chemicals that are added to plastics used in a multitude of consumer and industrial products. They can be found in a range of everyday items that consumers depend on to function properly, including food packaging, personal care products, medical devices, electrical cables, automobile interiors, flexible hoses, flooring, wall coverings, coated textiles, luggage, sports equipment, roofing membranes, pool liners and footwear. 

 

In reviewing the toxicity of phthalates, a National Academy of Sciences Panel concluded that the male reproductive system was the most sensitive target for phthalate exposures, but that “… not all phthalates are equivalent in the severity of their effects. The phthalates that are most potent in causing effects on the development of the male reproductive system are generally those with ester chains of four to six carbon atoms; phthalates with shorter or longer chains typically exhibit less severe or no effects”. 

 

The findings of Dr. Leonardo Trasande, the lead author of this new study on phthalates, and his colleagues are provocative, but their recommended policy actions are extreme and unwarranted by the available data or the overall weight of the evidence in the broader scientific literature. The authors reported finding increased mortality rates from all causes of death combined in relation to high molecular weight (HMW) phthalate metabolites, especially those of one type of phthalate, di-2-ethylhexyl phthalate (DEHP).  They also reported mortality rates for heart disease and stroke combined (CVD) that were significantly increased in relation to only one of the four DEHP metabolites they evaluated, mono-(2-ethyl-5-oxohexyl) phthalate (MEOHP).  No associations were found between phthalate exposures and deaths from all cancer types combined. 

Making numerous assumptions, including the most outrageous one, that the associations they found are indeed causal ones, Trasande et al then estimated that phthalates are responsible for causing about 100,000 deaths in the U.S. each year and more than $40 billion in annual lost productivity. They then called for further regulatory limits on the use of phthalates in food contact materials and consumer products, and even reducing or eliminating the use of plastics altogether.  

Although Trasande acknowledges that his latest study does not establish cause and effect, his admission appears disingenuous, as it is buried deep within the press release and the CNN story, in smaller font underneath headlines that the authors know all too well will grab the most attention. Moreover, additional quotes from Trasande directly contradict and/or dilute his tacit admission. For example, later in the press release he unabashedly claims “The evidence is undeniably clear that limiting exposure to toxic phthalates can help safeguard Americans’ physical and financial well-being”. For such a statement to be true, the investigators would need to have conducted a prospective, intervention-type study in which they deliberately reduced phthalate exposures in one group and then observed a consequent reduction in negative health impacts. However, neither Trasande nor any other scientist has done anything remotely like this. Thus, his proclamation is NOT scientifically supportable.

Over-hyped science: A well-documented phenomenon

That the NYU press release and the CNN story over-hype Trasande’s findings is not a surprise to me. It called to memory a 2014 study published in the British Medical Journal which compared 462 biomedical and health related science press releases issued by 20 leading UK universities to the associated peer reviewed research papers and subsequent news stories, and found that 40% of the press releases contained exaggerated advice and 33% contained exaggerated causal claims. Furthermore, when press releases contained such exaggeration, 58% of the subsequent news stories also contained such exaggeration. Academic scientists and university press offices need to do better for the public since these exaggerations, which later are found to be untrue or misleading, contribute to ever growing public distrust of science and the scientific community at large.

Implicating all phthalates is not warranted

The press release and the CNN story refer to phthalates in general and do not distinguish between the individual chemicals that comprise the class [1].  They mistakenly imply that the associations observed were either with all phthalate metabolites combined or with every type of phthalate metabolite analyzed.  In fact, the strength of the associations with all-cause mortality and CVD mortality varied considerably among the individual phthalate metabolites and the three groupings devised by the authors, so it is misleading to lump all phthalates together when discussing the findings in the press release and subsequent media reporting.

Observational studies are prone to errors and misleading results

Trasande et al have conducted an observational epidemiology study.  Such studies are prone to providing erroneous results due to chance, measurement errors and confounding by extraneous factors, as has been well-chronicled by Dr. Aaron Carroll and numerous others. As a consequence, epidemiologists are usually very cautious in interpreting the results of a single, un-replicated study, particularly if it is the first to report a new finding such as this one, because such first reports most often prove to be wrong.  Readers should therefore exercise some skepticism when interpreting the press release, the CNN story and the study findings.  To provide some context, it has taken the medical community decades, and hundreds of research studies involving hundreds of thousands of patients, to amass a sufficient body of scientific evidence to gain consensus on the half dozen or so established risk factors for CVD.  By contrast, the available evidence on phthalates from this new study, combined with prior studies, pales in comparison, both quantitatively and qualitatively. 

I’ve found the following elegant quote from an illuminating article by Julia Belluz and Steven Hoffman to be worthy of framing; it should be prominently displayed in the office of every practicing scientist and science journalist:

“That science can fail, however, shouldn’t come as a surprise to anyone. It’s a human construct, after all. And if we simply accepted that science often works imperfectly, we’d be better off. We’d stop considering science a collection of immutable facts. We’d stop assuming every single study has definitive answers that should be trumpeted in over-the-top headlines. Instead, we’d start to appreciate science for what it is: a long and grinding process carried out by fallible humans, involving false starts, dead ends, and, along the way, incorrect and unimportant studies that only grope at the truth, slowly and incrementally.”

Scientists and journalists need to exercise greater humility and recognize that a single epidemiology study of a topic contributes only incrementally to our cumulative knowledge base and could, in fact, be misleading.

 

Documented flaws, affirmed by peer-review, with Dr. Trasande’s previous research

Casual readers of this new study by Trasande, especially his efforts to ascribe cost burden estimates to chemical exposures, should be made aware of the numerous flaws noted in published critical reviews of his past work (see publications by Bolt, 2017; Bond and Dietrich, 2017; Gallagher, 2015; Middelbeek and Veuger, 2015; and Swaen, 2016; as well as a March 2015 BBC interview with Professor Richard Sharpe).  Also, see blog post by Bond. As a consequence, it's caveat emptor.

Finally, the estimates of annual numbers of additional deaths and of costs of lost economic productivity attributed by Trasande et al to DEHP metabolites are merely speculative, and thus, for the sake of fairness and transparency, estimates of zero additional deaths and costs should also have been included in the discussion.

Below I examine this particular study in greater detail and highlight the Defensible, Less Defensible and the Indefensible, and so invite the more dedicated among you to read on. Trasande and his colleagues may level accusations of “manufacturing doubt”, but as you’ll see such an accusation is entirely baseless as the doubt is inherent to the study methods they employed and limitations of the data they relied upon.  Such accusations are, in fact, antithetical to the practice of science itself.  Rather than trying to intimidate and quash legitimate scientific discussion, the authors should welcome scrutiny, as doubt in scientific enquiry is constructive, no matter how challenged one may feel by it.

What was the Study About?

The authors explored relationships between measured levels of eleven individual and three groups of phthalate metabolites found in single point in time urine specimens collected from about 5300 U.S. residents and subsequent mortality among them from all causes of death, CVD and all types of cancer. Further, restricting their analysis to mortality among 55-64 year-olds, and making the bold assumption that the relationships they observed were indeed causal ones, as well as other assumptions, the authors estimated the numbers of annual deaths in the U.S. which they assert are attributable to phthalates, and the associated annual costs due to lost productivity.

How was the Study Conducted? (Condensed here for the sake of brevity.)

The authors did not collect any new data, but instead relied upon existing publicly available data collected by the Centers for Disease Control and Prevention (CDC), specifically the National Health and Nutrition Examination Survey (NHANES) conducted between 2001 and 2010.  CDC’s sampling plan followed a complex, stratified, multistage, probability-cluster design intended to select a representative sample of the civilian, non-institutionalized population in the U.S. based on age, gender, and race/ethnicity.

NHANES combines participant interviews and physical examinations and thus has data on a host of personal factors from the participants, including age, gender, race/ethnicity, educational level, family income, smoking status, alcohol consumption, physical activity, 24 hour dietary recall, body weight and height, and laboratory analysis of blood and urine specimens.  Levels of phthalate metabolites in urine were measured in one-third of randomly selected NHANES participants by the Division of Laboratory Sciences, National Center for Environmental Health, CDC.

The authors limited their study to the 5303 NHANES participants who were 40 years or older [2] when they were enrolled in the 2001-2010 time frame, and for whom urinary phthalate metabolite measurements were available.  Deaths among the participants through December 31, 2015 were identified from the NHANES Public-Use Linked Mortality File. That file also includes coded underlying causes of death for each deceased participant.  The authors focused their analysis on deaths from all causes combined, deaths from cardiovascular disease (which includes heart disease and stroke), and from all malignant neoplasms combined.

Numerous analyses of the data were conducted, principally using Cox proportional hazards regression models to estimate hazard ratios and 95% confidence intervals, to explore relationships between various phthalate metabolites, individually and in combinations, and risk of mortality while attempting to sequentially control for potential confounding by multiple covariates (i.e., first age, race/ethnicity, and creatinine; then gender [3], educational level, family income level, smoking status, alcohol intake, physical activity, total energy intake and a measure of overall dietary quality; and finally Body Mass Index (BMI)).

The authors also conducted multiple additional analyses in an attempt to investigate the robustness of the associations they found.  These included, among others, analyses to test the possible contribution of potential confounding by risk factors for which they had no or incomplete data. 

To estimate the annual number of U.S. deaths they asserted are due to phthalates, the authors employed several steps.  They first multiplied the age-standardized all causes mortality rate for 55-64 year-olds in the U.S. in 2014 by the hazard ratio for the highest tertile [4] of metabolites for DEHP, and then subtracted the age-standardized mortality rate from the total.  The difference was then multiplied by a U.S. Census Bureau estimate (2010) of the total population of 55-64 year-olds to generate the annual estimate of numbers of deaths attributable to phthalates.  To calculate the lifetime economic productivity (LEP) loss due to death, the authors multiplied the hypothetical number of phthalate-attributable deaths by LEP estimates for 2009 published by another investigator for 55-59- and 60-64 year-olds, and then adjusted them to 2014 by using Bureau of Labor trends in general consumer prices.
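
The arithmetic behind the attributable-death estimate described above can be sketched in a few lines of Python. This is only an illustration of the mechanics: the function name and every input number below are hypothetical placeholders, not the values used by Trasande et al. Note that the calculation only makes sense if the hazard ratio is treated as a causal effect, which is exactly the assumption at issue.

```python
def attributable_deaths(baseline_rate_per_100k, hazard_ratio, population):
    """Excess annual deaths implied by treating a hazard ratio as causal."""
    # Step 1: multiply the baseline mortality rate by the hazard ratio.
    exposed_rate = baseline_rate_per_100k * hazard_ratio
    # Step 2: subtract the baseline rate to obtain the excess rate.
    excess_rate = exposed_rate - baseline_rate_per_100k
    # Step 3: scale the excess rate (per 100,000) to the population size.
    return excess_rate / 100_000 * population


# Hypothetical inputs for illustration only (NOT taken from the paper).
example = attributable_deaths(
    baseline_rate_per_100k=1_000.0, hazard_ratio=1.2, population=1_000_000
)
print(round(example))

# A hazard ratio of 1.0 (no association) yields zero attributable deaths.
null_case = attributable_deaths(1_000.0, 1.0, 1_000_000)
print(null_case)
```

Because the result scales with the hazard ratio minus one, even a small shift in a weak hazard ratio moves the estimate substantially, and a true hazard ratio of 1.0 gives zero.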

What were the Authors’ Findings and Conclusions?

Trasande et al reported finding increased mortality rates from all causes in relation to what they defined as HMW phthalate metabolites, especially those of DEHP.  They found no associations between levels of urinary phthalate metabolites and mortality from all cancer types combined.  They reported mortality rates for CVD that were significantly increased in relation to only one of four DEHP metabolites evaluated, that being mono-(2-ethyl-5-oxo-hexyl) phthalate (MEOHP).  Making numerous assumptions, Trasande et al estimated that, among the population of 55-64 year-old Americans, 90,761-107,283 annual deaths and $39.9-47.1 billion in annual lost productivity were attributable to phthalate exposure.  They then called for further regulatory limits on the use of DEHP in food contact materials and consumer products, and even reducing or eliminating the use of plastics altogether.

The Defensible

Much of the published epidemiology literature which has linked chemicals that are purported to be endocrine disruptors to various adverse health outcomes has come from cross-sectional study designs.  In a previous article, I wrote that cross-sectional studies are among the weakest of observational epidemiology study designs and the results they yield should be accorded far less weight than the results generated from studies employing cohort or even case-control designs. Supporting my contention, just recently, the European Food Safety Authority opined that they considered the evidence from cross-sectional studies as too unreliable to give them much weight in their safety assessments.

This new study from Trasande et al is a hybrid that combines some features of a cross-sectional design with others from a cohort, or longitudinal, design. The participants, as well as all of the data on putative risk factors for mortality which the authors considered, came from a series of cross-sectional surveys conducted by CDC in two-year cycles. Although values for a few of those risk factors are fixed and considered immutable over time (e.g., age at enrollment, race/ethnicity and gender (arguably)), values for most of them can and do vary over a person’s lifetime. Moreover, values for some of those factors, including the main ones of interest, i.e., levels of urinary phthalate metabolites, vary considerably within an individual throughout a single day. Further on, I’ll discuss how this may have jeopardized the validity of the reported findings.  

The longitudinal component of this study comes from follow-up (retrospective in this case) that was done through 2015 to identify any deaths that occurred among the participants and the associated underlying causes.  In a pure cross sectional design the information on putative risk factors and the health status of the participants are assessed simultaneously which raises a critical and unanswerable question about which came first, the exposure or the adverse health effect.  It is axiomatic that the exposure must precede the health effect in time in order to be a viable cause of it, so therein lies the inherent flaw with cross sectional studies.

The longitudinal design employed by Trasande et al ostensibly aspires to obviate this problem because the deaths could have occurred only after exposures had occurred and were already documented.  In point of fact, however, when dealing with mortality from chronic diseases, such as heart disease, stroke and cancer, the seeds for disease that ultimately led to death are likely to have been planted many years earlier, so establishing the sequence of exposure and onset of disease is not as straightforward as the authors might have us believe.

Atherosclerotic cardiovascular diseases (CVDs) are the biggest causes of death worldwide. Decades of research have determined that atherosclerosis develops insidiously over many years, is often advanced by the time that symptoms occur, but may then kill rapidly. Trasande et al discuss latency periods between exposure and disease, but only in the context of their assessment that they could not evaluate risks associated with metabolites from more recent HMW replacements for DEHP, i.e., diisononyl phthalate (DINP) and diisodecyl phthalate (DIDP).  This seems disingenuous on several levels.  First, the authors studiously avoided any discussion about the relevance of urinary metabolite levels measured within a few years of death from diseases that are likely to have developed many years earlier.  And second, DINP and DIDP have been in commercial use in the U.S. since the 1970’s, and have been annually consumed in the hundreds of thousands of metric tons since 1990.  Moreover, CDC has been measuring urinary metabolites of DINP since 1999 and DIDP since 2005, so the authors’ decision to exclude them from their analysis is puzzling to say the least.  It is worth noting, however, that 86.1% of samples CDC analyzed for mono-isononyl phthalate were below the lowest level of detection, so its exclusion on that basis may have been justified, but it wasn’t given as the reason by the authors.

The authors did conduct a “sensitivity analysis” after excluding participants who had evidence of CVD or cancer at the time they participated in the NHANES survey and concluded that it showed no attenuation in the risk of all cause mortality (Table S10). However, conspicuously they did not mention the clear attenuation that is evident in the risk of CVD mortality.  A strong argument could be made that it would have been a better choice to have simply restricted the study to those who did not have either CVD or cancer at baseline enrollment in NHANES, as this is standard practice in most longitudinal studies.

The NHANES sampling plan and the quality control procedures they employ to collect and report data are strengths of this study.  Temporarily suspending concerns about the underlying validity of the data, there was also information available on some of the most well-established risk factors for premature mortality, CVD and cancer, which isn’t always the case in studies such as this. The statistical analyses undertaken by the authors appear to have been thorough and competently performed (although one would like to have seen greater transparency on the rationale for sequencing variables into their models) and their providing links to the many supplemental tables is appreciated. 

 The Less Defensible

Despite these strengths, there are more than a few notable limitations to the study. First, some will question the authors’ decision to include all causes mortality and all cancer mortality as health outcomes for analysis, rather than make CVD the sole focus as it was ostensibly their primary a priori hypothesis.  Deaths from all causes combined includes a variety of common causes (e.g., motor vehicle deaths, suicides, respiratory diseases and many others) that have not been postulated as being plausibly linked to phthalate exposures.  A similar argument can be made for considering all cancer types combined.  And yes, it can be argued that by including a host of causes of death that are not likely linked to phthalates the authors probably only diluted the strength of their findings, but doing so also added numerous extra statistical tests, thus increasing the probability of spurious findings (i.e., the multiple comparison problem) and it has also made the findings more difficult to interpret biologically (the discussion of possible modes of action presented by the authors appeared ill-thought out and haphazard).  Likewise, the decision of the authors to focus exclusively on all causes mortality for estimating phthalate attributable deaths is also suspect, particularly without providing adequate justification for doing so.  Again, a strong argument could be made that they should have focused solely on CVD mortality or that, at the least, they should have also included an estimate of phthalate-attributable CVD mortality.  Had they done the latter it would have allowed the reader to understand the proportion of excess all-cause mortality the authors believed was contributed by excess CVD mortality vs. other causes of death, which would have been useful and enlightening.
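
The multiple comparison problem mentioned above can be made concrete with a short calculation. The sketch below assumes, purely for illustration, one statistical test per exposure-outcome pair (eleven individual metabolites plus three groupings, against three mortality outcomes) and treats the tests as independent. Both are simplifying assumptions of mine: the metabolites are correlated with one another, and the authors ran several models and sensitivity analyses, so the real number of tests is not known from this count.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


exposures = 11 + 3  # individual metabolites plus metabolite groupings
outcomes = 3        # all-cause, CVD, and all-cancer mortality
num_tests = exposures * outcomes  # assumed: one test per exposure-outcome pair

for m in (1, exposures, num_tests):
    rate = familywise_error_rate(m)
    print(f"{m:>3} tests -> chance of at least one false positive: {rate:.2f}")

# A Bonferroni correction would shrink the per-test significance threshold.
bonferroni_alpha = 0.05 / num_tests
print(f"Bonferroni-adjusted threshold: {bonferroni_alpha:.4f}")
```

Under these assumptions, finding at least one nominally significant association is close to a foregone conclusion, even if no true association exists.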

By epidemiology standards [5], the associations reported in the study between various measures of urinary phthalate metabolite levels and all causes mortality and cardiovascular disease mortality were often fairly weak (10-15% increase in risk overall and up to 70% for only one association) and were either not statistically significant or were of borderline statistical significance, and thus should be interpreted cautiously in light of the likelihood of measurement errors, chance and confounding.  Moreover, those findings often lacked internal consistency, something epidemiologists look for to help sort the wheat from the chaff.  For instance, the authors state that their results particularly indict metabolites of DEHP in premature death from all causes and CVD; however, a close examination of Table 3 shows no association for either category of death with mono-2-ethylhexyl phthalate (MEHP), which is considered the principal metabolite of DEHP, and which is the one metabolite Trasande et al focus on when citing animal evidence to claim biological plausibility. 

Information on some of the most important potential confounders (e.g., blood pressure, blood and urine chemistries, diet, physical activity, tobacco and alcohol use) came either from measurements done on or interviews with the participants at a single point in time.  As discussed earlier, there is a problem with extrapolating from a single point in time measurement to assume representation over a lifetime or over the critical time periods responsible for disease initiation.  Furthermore, the accuracy of self-reported information on some of these factors (e.g., physical activity, diet, and alcohol use) has been found by other studies to be suspect; thus measurement errors can be expected for nearly all of these factors, which may have diminished the success with which control of confounding was actually achieved in this study.  Such errors become especially important when the associations that were reported are relatively weak, inconsistent and of borderline statistical significance, as was the case in this particular study.

The CDC warns that:

“The measurement of an environmental chemical in a person’s blood or urine does not mean, by itself, that the chemical causes disease. Advances in analytical methods allow us to measure low levels of environmental chemicals in people, but separate studies of varying exposure levels and health effects are needed to determine whether blood or urine levels result in disease. These studies must also consider other factors such as duration of exposure.” In fact, Trasande and his colleagues did not have any measures of duration of exposure, so they could not evaluate this important metric.

The CDC has reported that the levels of DEHP metabolites have been steadily declining over the past several decades and that the levels measured are many times lower than those that have produced adverse health effects in controlled animal studies.  This is true for metabolites of other phthalates as well, where measured levels in humans have been found to be below those found to be safe in animal studies. US EPA has further reported declining trends over time in levels of DEHP, dibutyl phthalate (di-n-butyl phthalate and di-isobutyl phthalate) (DBP), and butyl benzyl phthalate (BBzP).  Trasande et al never discuss these trends and the implications for their findings going forward, nor do they address the biological plausibility of how such low levels of metabolites (µmol/L) could cause such devastating health effects.

And the Indefensible

One of the most important limitations of the study relates to the measurements of urinary phthalate metabolites.  A limitation of the NHANES data is that urine samples were collected from the participants at a single point in time, and thus Trasande et al assumed that the levels of phthalate metabolites measured represent lifetime exposures or, perhaps more accurately, the levels of exposure that triggered the disease processes that led to death.  In fact, phthalates are readily metabolized (often completely within 24 hours) and are typically non-persistent in the body, so that a single point in time urine sample does not necessarily provide an accurate assessment of lifetime exposure.  Day to day urinary phthalate levels can vary widely within individuals and they reflect only recent exposures (also see Meeker et al).  Trasande does acknowledge this as a major limitation of the study, but tries to minimize it. However, the stark reality is that no one knows how this critically important assumption may have affected the results.  It seems antithetical to science for Trasande not to have revealed this critical limitation in the NYU press release.
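
To give a feel for why a single spot sample is a shaky proxy for long-term exposure, here is a minimal simulation. It assumes, hypothetically, that each person has a stable long-term level and that any one urine sample equals that level plus random day-to-day noise on a log scale. The variance values are invented for illustration and are not estimates from NHANES or from the study, so this shows only the general mechanism, not the size of any error in the actual results.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_spot_samples(n_people, between_sd, within_sd, seed=0):
    """Return (long-term level, single spot measurement) per person."""
    rng = random.Random(seed)
    long_term = [rng.gauss(0.0, between_sd) for _ in range(n_people)]
    # One spot sample = the person's long-term level plus day-to-day noise.
    spot = [level + rng.gauss(0.0, within_sd) for level in long_term]
    return long_term, spot


# Hypothetical variance components (NOT from NHANES): day-to-day
# variability is assumed larger than variability between people.
between_sd, within_sd = 1.0, 1.7
long_term, spot = simulate_spot_samples(5000, between_sd, within_sd)

# Share of total variance that reflects stable between-person differences.
icc = between_sd**2 / (between_sd**2 + within_sd**2)
spot_correlation = pearson(long_term, spot)
print(f"assumed reliability (ICC): {icc:.2f}")
print(f"correlation of spot sample with long-term level: {spot_correlation:.2f}")
```

Under this simple model the correlation between a spot sample and the long-term level is roughly the square root of the reliability, so the lower the reliability, the more people end up sorted into the wrong exposure group.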

Unfortunately, Trasande et al presented a one-sided, biased review of the scientific literature that addresses possible links between phthalate exposure and obesity and diabetes, and ignored any contradictory evidence.  They implied these links are causal and represented them as established science; however, this is far from the truth.  For example, the CDC has issued some caution about this, warning that it is very difficult to tease out what may be intractable confounding between high caloric intake, BMI and measures of urinary phthalate levels.  Diet is the main exposure route for phthalates and thus there is a strong correlation between phthalate exposure and high caloric intake.  Higher caloric intake can contribute to a higher BMI.  And since BMI is an important risk factor for premature mortality and for CVD, it is not at all clear from the study whether the associations reported between some phthalates and mortality are due to phthalates or due to confounding by higher caloric intake, which contributes to higher BMI.  Owing to the strong correlation between the three, it might not be possible to completely sort this out at all in studies such as this one.
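
The confounding concern described above can be illustrated with a toy simulation. In the hypothetical world below, caloric intake drives both phthalate exposure and a mortality risk score, while phthalates themselves have no effect at all. All coefficients and variable names are invented for illustration, and the model is far simpler than real dietary and mortality data.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, xs):
    """Residuals of y after a simple least-squares regression on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(1)
n = 20_000
calories = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical structure: calories raise BOTH phthalate exposure and risk.
phthalate = [0.7 * c + rng.gauss(0.0, 1.0) for c in calories]
# The risk score has NO phthalate term: phthalates are harmless here.
risk = [0.5 * c + rng.gauss(0.0, 1.0) for c in calories]

crude = pearson(phthalate, risk)
adjusted = pearson(residuals(phthalate, calories), residuals(risk, calories))
print(f"crude phthalate-risk correlation:    {crude:.2f}")
print(f"after adjusting for caloric intake:  {adjusted:.2f}")
```

In this toy example the spurious association disappears only because caloric intake is measured perfectly. With a confounder measured as imprecisely as a 24-hour dietary recall, adjustment would be incomplete and some of the spurious association would remain.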

 

Conclusions

 

No single observational epidemiology study, especially the first one to report a new association, is capable of establishing a cause and effect relationship between a putative risk factor and disease.  This new study by Trasande et al is no exception. It  has some strengths, but as has been pointed out, it also has some important drawbacks that must be acknowledged.  The most important limitations relate to the single point in time measurements of urinary phthalate metabolites and of other risk factors for premature mortality, which introduce the possibility of important measurement errors that could have substantially influenced the results. Additionally, as the CDC has pointed out, the strong correlations between high caloric intake, BMI and phthalate exposures make it extremely difficult to sort out the independent contributions of each to mortality from all causes and particularly from CVD in studies such as this one.  All of these potential problems become more acute when, as is the case with this study, the associations that were reported are relatively weak, inconsistent and of borderline statistical significance.  The estimates of phthalate attributable deaths and costs reported by the authors should be considered as highly speculative and one cannot rule out that they may in fact be zero.  The extreme policy recommendations made by the authors are simply not justified based on the available scientific evidence.

[1] According to the ACC High Phthalate Panel, the term “phthalate” simply refers to a family of chemicals that happen to be structurally similar, but which are functionally and toxicologically distinct from each other. Phthalates are typically categorized as high and low, depending on their molecular weight. High molecular weight phthalates have 7 or more carbon atoms in their chemical backbone, which gives them increased permanency and durability. Furthermore, high phthalates have been thoroughly studied and reviewed by a number of government scientific agencies and regulatory bodies worldwide. These agencies have found that high phthalates are safe for existing uses.  High phthalates include: diisononyl phthalate (DINP), diisodecyl phthalate (DIDP) and dipropylheptyl phthalate (DPHP).

[2] The published article is somewhat confusing as to the age group actually studied.  According to the abstract it included 5303 adults age 20 years or older.  However, in section 2.1 on page 2 the authors claim their analysis included only adults 40 years or older (this makes the most sense as the authors report the median age of the group to be 56.6 years).  Later in section 2.6 on page 3, the authors say they quantified mortality among 55-64 year-olds, but presumably this was solely for the purposes of estimating numbers of annual deaths attributable to phthalates.  Nevertheless, it’s confusing.  One would have thought that this confusion would have been caught and cleared up during the peer-review process.  It would be helpful if the authors could work with the publisher to further edit the paper to resolve it.

[3] According to the authors’ description in section 2.5 on page 3, they included sex (gender) as a covariate they controlled for in their Model 2; however, the legends below Tables 2-3 and Tables S2-6 and S10-11 consistently omit sex (gender) in the list of covariates that were included, thus leading to confusion as to whether they indeed did control for potential confounding by sex (gender).  Once again, it’s disappointing that this wasn’t caught during peer-review and it would be helpful if the authors could work with the publisher to further edit the paper to resolve the issue.

[4] According to the authors’ descriptions in section 2.7 on page 3, and Table 4, they used the highest tertile of DEHP metabolites; however, at the bottom of page 5 they say they used the highest quartile, once again confusing the reader.  This presents yet another opportunity for the authors to further edit their paper to clarify a discrepancy. 

[5] Several prominent epidemiologists have expressed the view that hazard ratios less than 2 (see Doll and Peto) or even 3-4 (see Temple) should be regarded skeptically given likely contributions of measurement errors, chance and confounding.