Defining Reproducibility

Edit: Nosek and friends had similar thoughts.

A lot of typing and markdown has already been spent on this issue, so I’ll keep it short.

A recent issue of Science contains a technical comment by Gilbert et al. (2016) on the Open Science Collaboration’s reproducibility project. Among other claims, Gilbert et al. argue that the OSC project was too pessimistic about reproducibility in psychology and that infidelities in the replication studies could be at the core of the large number of failures to replicate. While Gilbert et al. may have made some mistakes themselves [for a wonderful response see Sanjay Srivastava’s post here], many of their points are well taken, especially those about infidelities between research designs.

However, both the original authors in their response in Science and other commentators have pointed out that there is no clear definition of replication or reproducibility. This is a serious problem for the aggregation of knowledge, as we do not even have decent agreement on the metric for deciding whether a result has been repeated.

While much has already been written about this, I would like to advocate for an important criterion for replication that also focuses on overselling and generalizability of research. Reproducibility must mean that future studies are able to find similar effects not just in the population that the original authors drew their sample from, but in any population that the original authors intended their research to describe or explain. This essentially means that we have to allow “infidelities” in replication designs so long as the replications test the same hypothesis in the population of interest.

If a researcher claims that a study done on undergraduates at UCLA describes the behavior of all young people with some higher education, then replication studies should be allowed to draw from non-university samples that fit that description. If a researcher claims, on the basis of evidence from Mexico, that conditional cash transfers work in developing nations when targeted at mothers, then replication studies should be allowed to use any sample drawn from a population of mothers in developing nations anywhere else in the world.

It is up to authors to appropriately define the scope of their article, putting their money where their mouths are with respect to who they believe their effect applies to.