The Replication Crisis and Open Science


In the following, we get to know the replication crisis, its underlying causes and possible ways to tackle them by adhering to open science principles.

The present information addresses all interested readers but especially students planning to write their thesis. Our main goal is to give a short overview and point readers to further information.

What is the Replication Crisis?

Before turning to the nature of the replication crisis, let us take a step back to understand the basic idea of science and the gain of knowledge.

When trying to learn something about the world, we establish theories about it. To validate them, we create a research design which allows testing our derived hypotheses. Since we are seldom able to test our hypotheses on all entities of interest, we fall back on samples of those entities. This is where inferential statistics becomes important. Once we have our research question and know how we want to test it and on which data, we need a paradigm to guide our gain of knowledge. Whereas we are not able to prove that a hypothesis is true for all (future) observations (e.g., that all swans in the whole world and for all times are white), we can show it to be false (e.g., if we see a black swan). This principle is called falsificationism. Therefore, all knowledge only reflects our current empirical findings (“state of the art”).

In order to change the state of the art (i.e., advance our knowledge), we need solid empirical findings. At the very least, results of a study should be reproducible and replicable.

What is the Difference between Reproducibility and Replicability?

Reproducibility means that another person can get the same results as presented in the original study. This requires data (or code) and knowledge about the processing and analysis. Reproducibility is your responsibility: you have to provide the necessary resources and information so that other people can reproduce your results.

Replicability means that another person can get similar results from a new sample following the research design and analysis of the original study. This requires materials, procedures and knowledge about the processing and analysis. Replicability is not directly your responsibility: An effect may not replicate despite best scientific practice.

Both concepts are basically about transparency, which is essential for good scientific practice and collaboration. You can find more detailed information on the reproducibility policies and guidelines of certain institutions on OSF.

Oftentimes, however, previous results cannot be replicated, which happens for various reasons:

there are differences between the original and the replication study,
the original finding was a false positive (i.e., despite the positive finding, no difference actually exists), or
the replicated finding was a false negative (i.e., despite the negative finding, a difference actually exists).

Trying to replicate 100 experimental and correlational studies, the Open Science Collaboration (2015) found that only 36% of the replications had statistically significant results although a substantially higher percentage of successful replications was expected. For this reason, some people speak of a replication crisis. If we cannot replicate findings, how can we trust our knowledge that is based on these findings?

Consequences of false positive findings also spread to the general public. Fabrication of significant results damages not only scientific progress but also the general trust in science. Additionally, the public does not learn about failed replications as quickly as about the findings of the initial study. Non-replicated knowledge may persist for a long time.

Now that we have a short overview of the very basics of science and the replication crisis, let us turn to possible reasons for the replication crisis.

Possible Reasons for the Replication Crisis

In general, researchers have many degrees of freedom when it comes to the measurement, the processing, and the analysis of data. Consequently, results for the same research question can differ dramatically depending on the choices made. There exist some general attempts to overcome these problems (e.g., by optimizing retrodictive validity; for more details see here), but these may not always be feasible in practice (e.g., outside of experimental research). Much worse, beyond these general problems, there exist questionable research practices (QRPs) which we should be aware of. In the following, we get to know some of them. Note that they are not mutually exclusive.

p-hacking

This may be the most obvious example of a bad research practice. Generally, p-hacking describes the practice of performing a large number of tests within a given data set to obtain significant results. The more tests we perform (without correction), the more likely it is that we get a statistically significant result purely by chance. Furthermore, p-hacking may also include practices like deliberately dropping data from a specific analysis.
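To make the effect of uncorrected multiple testing concrete, the following minimal sketch computes the probability of at least one false positive across several tests. It assumes that the tests are independent and that all null hypotheses are true. The function names are illustrative and not taken from any library, and the Bonferroni correction is shown only as one common example of a correction.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / num_tests


if __name__ == "__main__":
    for k in (1, 5, 20):
        uncorrected = familywise_error_rate(k)
        corrected = familywise_error_rate(k, bonferroni_alpha(k))
        print(f"{k:2d} tests: uncorrected {uncorrected:.3f}, Bonferroni {corrected:.3f}")
```

Under these assumptions, 20 uncorrected tests at the 5% level already give a chance of roughly 64% of at least one "significant" result, whereas the corrected procedure stays below 5%.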

Selective Reporting of (Dependent) Variables

Although dropping (dependent) variables from the analysis may seem like a harmless practice, it decreases the likelihood of a successful replication. For example, by selecting certain items “trending in the right direction” when the complete scale failed to yield significant results, researchers are likely to capitalize on chance findings, which may not replicate (besides invalidating established test criteria, e.g., the reliability and validity of the measurement instrument).

Hypothesizing After the Results are Known (HARKing)

Formulating your hypotheses after you know the results is a particularly bad practice. Post hoc hypothesizing blurs the distinction between exploratory and confirmatory analysis. Whereas the former serves to generate hypotheses, the latter serves to test them. Testing hypotheses on the same data set that was used to generate them in the first place yields misleadingly positive results.
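The following simulation is a minimal sketch of why this is misleading. It assumes pure noise data (no true effects), 20 candidate predictors, and an approximate significance test for a correlation based on the Fisher z-transformation. All names and parameter values are hypothetical choices for illustration. The "hypothesis" is generated by picking the predictor with the strongest correlation, and is then tested once on the same data and once on a fresh sample.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, z_crit=1.96):
    """Approximate two-sided test of r = 0 via the Fisher z-transformation."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def simulate_dataset(rng, n, k):
    """Outcome and k predictors, all independent standard normal noise."""
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n)]
    predictors = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    return outcome, predictors


def harking_rates(n=40, k=20, n_sims=300, seed=7):
    """Share of significant results when the hypothesis is tested on the
    data that generated it versus on a fresh sample."""
    rng = random.Random(seed)
    same_hits = 0
    fresh_hits = 0
    for _ in range(n_sims):
        outcome, predictors = simulate_dataset(rng, n, k)
        rs = [pearson_r(p, outcome) for p in predictors]
        # "Hypothesis": the predictor that happens to correlate most strongly.
        best = max(range(k), key=lambda j: abs(rs[j]))
        same_hits += is_significant(rs[best], n)
        # Confirmatory test of the same predictor on newly collected data.
        new_outcome, new_predictors = simulate_dataset(rng, n, k)
        fresh_hits += is_significant(pearson_r(new_predictors[best], new_outcome), n)
    return same_hits / n_sims, fresh_hits / n_sims


if __name__ == "__main__":
    same_rate, fresh_rate = harking_rates()
    print(f"significant on the same data: {same_rate:.2f}")
    print(f"significant on fresh data:    {fresh_rate:.2f}")
```

Although no effect exists in the simulated data, the post hoc hypothesis looks "confirmed" far more often on the data that suggested it than on the fresh sample, where the rate should stay close to the nominal 5%.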

Only Reporting Statistically Significant Results

As mentioned before, performing multiple tests (p-hacking) or trying many different analysis methods until some result turns out significant is a bad practice. Moreover, we should always report all statistical analyses undertaken, not only the significant ones.

Collecting More Data after the Initial Data Failed to Yield Positive Results

Likewise, collecting more and more data until we finally get a significant result is not good scientific practice. Data acquisition and analysis (i.e., inferential statistics) should be separated.
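A small simulation can illustrate the problem. The sketch below assumes that no true effect exists, that the data are standard normal with a known standard deviation of 1 (so a simple z-test applies), and that the researcher tests after every 10 additional observations, stopping as soon as a result is significant. The function names, sample sizes, and number of simulation runs are hypothetical.

```python
import math
import random


def z_test_significant(sample, z_crit=1.96):
    """Two-sided z-test of mean 0, assuming a known standard deviation of 1."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return abs(z) > z_crit


def false_positive_rate(looks, n_sims=2000, seed=1):
    """Share of simulated null studies that are declared significant when the
    data are tested at each sample size in `looks` and collection stops at
    the first significant result."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = []
        for n in looks:
            # Collect more data until the next planned (or unplanned) look.
            while len(sample) < n:
                sample.append(rng.gauss(0.0, 1.0))
            if z_test_significant(sample):
                hits += 1
                break
    return hits / n_sims


if __name__ == "__main__":
    fixed_rate = false_positive_rate(looks=[50])
    peeking_rate = false_positive_rate(looks=[10, 20, 30, 40, 50])
    print(f"one test at n = 50:        {fixed_rate:.3f}")
    print(f"test after every 10 cases: {peeking_rate:.3f}")
```

With a single test at a fixed sample size, the false positive rate should stay near the nominal 5%. With repeated looks at the growing data set, it is clearly inflated, which is why the stopping rule should be fixed before data collection.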

Publication Bias

But it is not only about individual practices. To have a prosperous career as a researcher, one has to publish a lot, and fellow researchers have to cite one’s work (i.e., you need a high impact factor). Unfortunately, these incentives foster the publication of particularly unexpected or novel findings without making sure that the findings will also replicate.

For all that, what can we do to change the conditions?

Possible Solutions to the Replication Crisis

With respect to the reasons mentioned here, there is an important quote that you should keep in mind:

“The first principle is that you must not fool yourself – and you are the easiest person to fool.” (R. Feynman)

This means that although you really want to get it right, that is, you really want to get the best and most accurate result from your data, it is very easy to deceive yourself.

Think of a pilot: Surely, they know how to fly a plane. Yet, whenever they start a plane, they go through a checklist: Is the airplane fueled? Are the lights turned on? And so on. Surely, each item is trivial. But for a pilot who starts an airplane several times per day, it is easy to forget something important.

Luckily, such checklists also exist for psychology. For example, see the articles in Nature or Frontiers in Psychology. Be transparent and record what you did, so if your plane crashes, at least others can learn from your mistakes.

Furthermore, there are a number of other tools that help you avoid fooling yourself. We will get to know some of them in the following.

Open Science Framework

OSF is a free, open-source web app that manages research projects at all stages of the research lifecycle and connects the tools researchers use. In the following, we will get to know relevant features for advancing open science offered by OSF.

Preregistration

Preregistration means making your research plan openly accessible before starting data acquisition. This makes it easier to connect research projects. Further benefits are a clear outline that makes you think ahead, an easier path to publishing null results, and “protection” against reviewers asking you to adapt your hypotheses. Furthermore, it makes the distinction between exploratory and confirmatory research explicit.

OSF provides thorough documentation on how to pre-register (you find more information here). If you still need help preregistering your study, you can register for our methods consulting.

Open Data & Open Materials

Sharing your data and materials is an integral part of open science. You should make sure to provide a codebook, document your workflow (data processing and analysis) and license your data (e.g., under a Creative Commons license). Try to pre-register as much information, material and data as possible. For detailed information on the implementation, visit the OSF site of Prof. Dr. Kai T. Horstmann (e.g., for creating a codebook and a reproducible workflow).
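As a rough illustration of what a machine-generated starting point for a codebook could look like, the following sketch summarizes each variable of a small data set. The variable names, descriptions, and example data are made up for this illustration, and this is not the procedure recommended on the OSF site mentioned above. A real codebook would also document scales, value labels, and units.

```python
import csv
import io


def build_codebook(rows, descriptions):
    """Return one entry per variable with its description, the number of
    missing values (empty strings) and the number of distinct values."""
    codebook = []
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        present = [v for v in values if v != ""]
        codebook.append({
            "variable": name,
            "description": descriptions.get(name, "TODO: describe this variable"),
            "n_missing": len(values) - len(present),
            "n_unique": len(set(present)),
        })
    return codebook


# Hypothetical example data; in practice this would be read from your data file.
raw_data = (
    "participant_id,age,condition\n"
    "1,23,control\n"
    "2,,treatment\n"
    "3,31,treatment\n"
)
rows = list(csv.DictReader(io.StringIO(raw_data)))

# Hypothetical, hand-written descriptions of the variables.
descriptions = {
    "participant_id": "Anonymous participant identifier",
    "age": "Age in years (self-report)",
    "condition": "Experimental condition (control or treatment)",
}

codebook = build_codebook(rows, descriptions)
for entry in codebook:
    print(entry)
```

Generating the summary from the data itself keeps the codebook in sync with the shared file, while the descriptions still have to be written by hand.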

Badges

Despite the growing open science community, adhering to the guidelines is usually not mandatory. OSF created badges to incentivize individual behavior. Badges can be awarded to your research article by journals. You can get badges for Preregistration, Open Data and Open Materials. Here you find more detailed information.