Position Paper: Reproducibility and Research Integrity
In 2016 a Nature survey found that 70% of respondents had ‘tried and failed to reproduce another scientist’s experiments,’ although ‘73% said that they think that at least half of the papers in their field can be trusted’ (Baker 2016).
There is a clear crisis in reproducibility, but its depth is open to debate. There is ‘no generally accepted, scientific standard for determining whether previous research is reproducible/replicable’ (Duvendack et al 2017, p47). It is nonetheless useful to distinguish between research that is reproducible, replicable and generalizable:
- Reproducible research is that for which results can be duplicated using the same materials and procedures as were used by the original investigator;
- Replicable research is that for which results can be duplicated using the same procedures but new data;
- Generalizable research is that for which results can be applied to other populations, contexts, and time frames (Bollen 2015).
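The first two definitions can be made concrete with a small sketch. It is illustrative only: the 'procedure' is simply a sample mean, and the data are simulated by a hypothetical draw_sample function rather than taken from any of the studies cited here.

```python
import random
import statistics

def analyse(sample):
    """The 'procedure': here, simply the sample mean."""
    return statistics.mean(sample)

def draw_sample(seed, n=200, true_mean=0.5):
    """Hypothetical data collection: n noisy observations around true_mean."""
    rng = random.Random(seed)
    return [rng.gauss(true_mean, 1.0) for _ in range(n)]

original_data = draw_sample(seed=1)
original_result = analyse(original_data)

# Reproduction: same data, same procedure. The result should match exactly.
reproduced_result = analyse(original_data)

# Replication: same procedure, newly collected data. The result should be
# close to the original, but will not be identical.
new_data = draw_sample(seed=2)
replicated_result = analyse(new_data)

is_reproduced = reproduced_result == original_result
replication_gap = abs(replicated_result - original_result)

print("reproduced exactly:", is_reproduced)
print("gap between original and replication:", round(replication_gap, 3))
```

In this toy case a reproduction either matches or it does not, whereas a replication needs a judgement about how close is close enough, which is one reason agreed standards are hard to reach.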
These terms are sometimes used interchangeably, and the literature has not reached consensus on their definitions. In addition, attempts to reproduce research can be ‘narrow’ or ‘wide’ (Pesaran 2003), or pure, statistical or scientific (Hamermesh 2007).
Issues of definition, and the breadth of any survey, make it ‘difficult to determine replication rates within a discipline’ (Duvendack et al 2017, p47). However, psychology has been cited as particularly problematic (Stanley et al 2018), as ‘studies of a given psychological phenomenon can never be direct or exact replications of one another’ (McShane et al 2017). Economics has also been under the spotlight: Duvendack, Palmer-Jones and Reed (2015) and Chang and Li (2015) measured replication rates in economics and reported low rates of success, while Camerer et al (2016) were somewhat more successful in attempting to replicate 18 studies in experimental economics.
Underlying issues that have led to the reproducibility crisis
There are a number of reasons for the publication of findings that cannot be reproduced. Broadly, they fall into two categories: those that are instrumental and those that are accidental.
Instrumental: The number of global research outputs is doubling every nine years (Bornmann and Mutz 2015). At the same time, academic careers are increasingly precarious (OECD 2021). As a result there is a strong drive to get research noticed, funded and published, and thereby raise an academic’s profile. An important way of doing so is to produce research with innovative or interesting results that has the potential to disrupt accepted paradigms. Such work is more likely to be accepted for publication, chosen for funding, and picked up by mainstream media.
This ‘publication bias’ has led to statistically significant, and often positive, results taking precedence. At the same time, work that disproves such findings tends to be less well regarded (Dewald, Thursby and Anderson 1986). In addition, few journals publish replication studies (see Duvendack, Palmer-Jones and Reed 2015 and 2017 for a discussion of which economics journals publish replications), and researchers who undertake such work may be seen as distrustful and/or malevolent (Duvendack et al 2017), and have been characterized as “research parasites” (Longo and Drazen 2016).
That is not to say that researchers who emphasize statistically significant findings are necessarily conscious of falling prey to the ‘cult of statistical significance’ (Ziliak and McCloskey 2008). However, they are working in an environment where statistically significant, and thereby seemingly innovative, results are more likely to be viewed with interest, which can lead to ‘HARK-ing’, or hypothesizing after the results are known (Kerr 1998).
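The effect of selecting on statistical significance can be shown with a minimal simulation. The assumptions are ours and do not come from the cited literature: 2,000 hypothetical studies of 30 observations each, a true effect of exactly zero, and 'publication' only when a two-sided z-test is significant at the 5% level.

```python
import random
import statistics

def run_study(rng, n=30, true_effect=0.0):
    """Simulate one study: n observations with unit variance around true_effect.

    Returns the estimated effect and whether a two-sided z-test at the 5%
    level would call it statistically significant.
    """
    sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    estimate = statistics.mean(sample)
    z = estimate * n ** 0.5  # standard error is 1/sqrt(n) when the sd is 1
    return estimate, abs(z) > 1.96

rng = random.Random(2021)  # fixed seed so the sketch itself is reproducible
studies = [run_study(rng) for _ in range(2000)]

all_effects = [abs(est) for est, _ in studies]
published_effects = [abs(est) for est, significant in studies if significant]

share_published = len(published_effects) / len(studies)
mean_all = statistics.mean(all_effects)
mean_published = statistics.mean(published_effects)

print(f"share of studies 'published': {share_published:.3f}")
print(f"mean |effect|, all studies:   {mean_all:.3f}")
print(f"mean |effect|, published:     {mean_published:.3f}")
```

Although no effect exists, a small share of the studies clears the significance threshold by chance, and the 'published' subset reports larger effects than the full set. A later replication of any of them would be expected to find little or nothing.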
Accidental: Not all unreproducible results are deliberately fostered and favoured. The ATCC (n.d.) highlighted a number of issues around training, resources and supervision that may have led to such results. These included a lack of access to methodological details, raw data, and research materials; use of misidentified, cross-contaminated, or over-passaged cell lines and microorganisms; an inability to manage complex datasets; and poor research practices and experimental design.
The role of different organisations and agents in addressing the reproducibility crisis
Research funders, including public funding bodies
Research funders can play a central part in addressing the crisis, and many are already taking important steps towards doing so. The open access and open data movements have helped increase transparency in the research process, and the introduction of ‘data management plans’ (DMPs) by UKRI has enabled others to interrogate findings more fully and to attempt to reproduce results.
However, as with publishers, there is a danger of favouring counter-intuitive research proposals in the initial selection process. There is also little interest in funding studies that attempt to reproduce results, as these are perceived to be less novel than original research, which discourages researchers from undertaking reproduction studies. This feeds into how publishers view reproducibility: they often fear that replications will be cited less frequently, lowering a journal’s impact factor, which is calculated from the number of citations received by articles published in that journal.
In addition, there is often an unconscious bias in the peer review process in favour of particular individuals, research groups, universities, fields of research or methodologies, which may mean that some proposals get less rigorous scrutiny. Wellcome and UKRI are taking good steps to address this in their people and culture strategies, and the San Francisco Declaration on Research Assessment (DORA) is being adopted increasingly widely. DORA identifies an increased ‘momentum toward more sophisticated and meaningful approaches to research evaluation that can now be built upon and adopted by all of the key constituencies involved.’
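For reference, the widely used two-year form of the impact factor can be written as follows. The notation is ours, and other variants (for example a five-year window) exist.

```latex
\mathrm{IF}_{y} = \frac{C_{y}(y-1) + C_{y}(y-2)}{N_{y-1} + N_{y-2}}
```

Here $C_{y}(t)$ is the number of citations received in year $y$ by items the journal published in year $t$, and $N_{t}$ is the number of citable items it published in year $t$. An article that attracts fewer citations than the journal's average adds one to the denominator but less than the average to the numerator, so it lowers the ratio.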
There is also the problem of enforcement. Although all applicants have to complete DMPs, the checks on whether data has been deposited in appropriate repositories are weak. There is also a need to deposit code as well as data; the latter is of limited value without the former.
Research institutions and groups
The higher education sector has become increasingly marketised. As a result, research institutions are having to position themselves positively against the competition, including through their performance in league tables. This has had a number of unintended consequences, including a push to get research funding and increase citation rates.
This is particularly apparent in the way that institutions position themselves for the Research Excellence Framework (REF). Addressing the perverse incentives that result from this is partly the responsibility of government (see below), but it is also the responsibility of individual institutions to resist these pressures and give people the space and freedom to undertake robust research that may be slow in producing results, citations and grants.
Individual researchers
As Duvendack et al (2017) point out, ‘while there are increasing calls for journals to improve data sharing and transparency, there is also significant resistance among researchers, as evidenced by opposition to the adoption of a [data access policy] at top finance journals…and the online petition against the data access and research transparency (DA-RT) initiative in political science’ (p48).
Such attitudes are changing, and it is important that individual researchers are willing to change their behaviours to support openness and transparency in their research. We outline some of the ways they can do this in the section on policies and schemes, below.
Publishers
Like funders, publishers have a central role to play by changing their frameworks and processes to embed good practice in reproducibility. In some areas, few journals publish replications or make data and code available (Duvendack et al 2015 & 2017). However, there are significant changes happening, such as pre-registration of research and, in some cases, journals accepting articles for publication before they have been written, based on an outline of the proposed research. This helps to address the issue of ‘publication bias’, discussed above.
It should be recognised that some progress has been made in these areas. For instance, a ‘TOP Factor’ has been created that scores journals on ten different criteria, including availability of data and policies on pre-registration.
Governments and the need for a unilateral response / action
Government policy and the resultant legislation set the framework in which research takes place, and are therefore crucial to putting in place measures that will bring about a change of culture. Most current policies, strategies, white papers and roadmaps, including the R&D Roadmap, Innovation Strategy, Plan for Growth, Integrated Review and Data Saves Lives Strategy, do not address reproducibility.
Beyond policy and legislation, the government can set the tone and encourage good behaviour. The current consultation on the future of research assessment needs to recognise the unintended consequences and perverse incentives of the existing system, and consider working with individual universities and research organisations, with publishers and with funders to develop a system that will change the framework and embed a culture that favours transparent and robust research.
The policies or schemes that could have a positive impact on academia’s approach to reproducible research
There is a wide range of possible policies, actions, schemes and incentives to help change academia’s approach to reproducible research. These include:
- Funders providing discrete funding for reproduction studies. Some funders are already doing so, such as the International Initiative for Impact Evaluation (3ie), but there needs to be a wider adoption of such policies.
- Funders developing a database of underused software and hardware, which may be necessary for the analysis of specific data as part of a reproduction study;
- Funders and publishers making the reviewing and enforcement of DMPs and DAPs more robust;
- Publishers mandating pre-registration and accepting articles for publication based on an outline of research;
- Publishers employing staff and/or students to routinely run the code on data submitted, as is currently undertaken by a small number of journals, such as The American Economic Review. Some journals, such as The International Journal for Re-Views in Empirical Economics, exist entirely to replicate findings.
- Institutions being encouraged to work together to produce common policies and monitoring, and being required to integrate open and reproducible research practices into their incentive structures at all career levels. This should be embedded into their research ethics and also involve other staff, including technicians and data managers.
- Individuals changing the way that postgraduate students and early career researchers are trained in research methodologies and publication strategies. The Berkeley Initiative for Transparency in the Social Sciences (BITSS) has developed a textbook that is intended to train people in undertaking open science, and other resources exist to support those teaching students about replication.
- Government working to improve the scientific literacy of politicians, policymakers and civil servants. Myers and Coffé (2021) highlight the fact that, ‘of the 541 MPs with higher education degrees in the 2015-2017 Parliament, only 93 (17%) held degrees in STEM subjects; for comparison, 46% of UK students in 2019 graduated in STEM subjects.’ Such a STEM grounding would enable a better understanding of the ‘science’ underlying research findings; without it there is a tendency to accept the results at face value and act accordingly. There is a need to ‘embrace uncertainty’ and accept that results are not necessarily clear cut;
- Representative bodies for research disciplines mandating the use of systematic reviews, such as those produced by the Cochrane or Campbell Collaborations, or the Open Synthesis group, to review, rate, synthesise and publish best available evidence.
This position paper is based on Eastern Arc’s response to the S&T Select Committee’s call for evidence on research integrity and reproducibility, submitted in September 2021.