Psychological studies are rarely replicated in the first place, and numerous confounding factors make successful replication even less likely. For example, the race of the participants in an experiment or the geography of where it was run can change the result, and if it's a survey of college students, forget about it.

Last year, the Reproducibility Project, a collaboration of psychologists, sought to replicate the findings of 100 previously published psychology studies. It was able to do so with only 39 percent of them, raising questions about the validity of the original scholarship. In March, a group of psychologists from Harvard University and the University of Virginia published a critique in Science challenging the Reproducibility Project's findings. They concluded that its analysis was statistically flawed and that several of the replication studies were poorly designed.

No surprise that teams of psychologists disagree on replication and design; psychology has no theoretical foundation to ground it, so statistics can mean anything in that arena. In a new PNAS paper, scholars instead focused on the nature of the research topics in the original studies, re-analyzing all 100 papers that the Reproducibility Project sought to replicate.

Specifically, they assessed the extent to which the effects reported in the original studies were likely to be influenced by contextual factors such as time (e.g., pre- vs. post-Recession), culture (e.g., Eastern vs. Western culture), location (e.g., rural vs. urban setting), or population (e.g., a racially diverse population vs. a predominantly white population). In other words, they appraised the contextual sensitivity of the topics in the original 100 studies. The coders were blind to the results of the Reproducibility Project's replication attempts for all the papers they coded.

The researchers then examined the relationship between ratings of contextual sensitivity (i.e., how likely context was to affect the chances of replicating a given study) and the findings from the Reproducibility Project.

The results showed that context ratings predicted replication success even after statistically adjusting for methodological factors such as effect size and statistical power. Specifically, studies with higher contextual sensitivity ratings (where, for instance, altering the race or geographical location of study participants could alter the results) were less likely to be reproduced by the Reproducibility Project researchers.
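The paper reports this as a statistical relationship, and the kind of adjustment described (testing whether contextual sensitivity predicts replication success while controlling for effect size and power) is typically done with a logistic regression. Here is a minimal sketch in Python using statsmodels; the data are fabricated for illustration and the variable names are assumptions, not the authors' actual dataset or analysis code.

```python
# Minimal sketch of the kind of adjusted analysis described above.
# All data here are fabricated placeholders; the real analysis used
# the 100 Reproducibility Project studies.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100  # one row per original study

# Hypothetical predictors: contextual sensitivity rating (1-5 scale),
# original effect size, and statistical power of the replication.
context = rng.uniform(1, 5, n)
effect_size = rng.uniform(0.1, 0.8, n)
power = rng.uniform(0.5, 0.99, n)

# Hypothetical outcome: 1 if the replication succeeded, 0 otherwise,
# generated so that higher contextual sensitivity lowers the odds.
logit = -1.0 * context + 2.0 * effect_size + 1.5 * power
replicated = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Logistic regression: does context still predict replication success
# after adjusting for effect size and power?
X = sm.add_constant(np.column_stack([context, effect_size, power]))
model = sm.Logit(replicated, X).fit(disp=False)
print(model.summary(xname=["const", "context", "effect_size", "power"]))
```

A negative, significant coefficient on the context term in such a model is what "predicted replication success even after statistically adjusting for methodological factors" would look like in practice.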

In a second analysis, the team examined which of the 100 replication studies had been endorsed by the original authors prior to the Reproducibility Project's data collection. Here they found that replication studies that were not endorsed by the original authors were far less likely to reproduce the results.
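Again as a hedged sketch: comparing replication rates between endorsed and non-endorsed protocols amounts to a two-proportion test. The counts below are invented placeholders, not the paper's numbers.

```python
# Sketch of an endorsed vs. non-endorsed comparison using a
# two-proportion z-test. Counts are placeholders, not the paper's data.
from statsmodels.stats.proportion import proportions_ztest

successes = [30, 9]  # hypothetical replication successes per group
totals = [69, 31]    # hypothetical group sizes (endorsed, not endorsed)

stat, pval = proportions_ztest(successes, totals)
print(f"z = {stat:.2f}, p = {pval:.4f}")
print(f"endorsed: {successes[0]/totals[0]:.0%} replicated; "
      f"not endorsed: {successes[1]/totals[1]:.0%}")
```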

Jay Van Bavel, an associate professor in NYU's Department of Psychology and the paper's lead author, says it's not just psychology, noting that Sir Isaac Newton believed his contemporaries were unable to replicate his research on the color spectrum of light because of bad prisms. After he directed them to better prisms (ones produced in London rather than Italy), they were able to reproduce his results. In modern times, studies using mice or rats may be hampered by subtle environmental differences, such as food, bedding, and light, which can affect the biological and chemical processes that determine whether experimental treatments succeed or fail.

And in psychology, survey results can vary for equally mundane reasons, such as when the "college students" in a sample are all psychology undergraduates participating for extra credit.