Reproducible Research
Motivation and Rationale
If you are going to take the time to build computational tools for humanistic research, you want others to be able to reproduce what you did. You want others to be able to verify your results, but also to use the tools you built to further their own projects.
Reproducible research may be an unfamiliar concept to many humanists. In the Digital Humanities, it is motivated by several questions:
- Will others be able to run my code?
- Will others know that my code works and does what I say it does?
- Will others be able to adapt, modify and contribute to my code?
There are several factors that contribute to producing reproducible research. They include:
- Organizing your project so others can understand it
- Using a version control system to facilitate collaboration on the project
- Recording your environment so others can run your code on their machine
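As a rough illustration of the last factor, here is a minimal sketch of how an environment might be recorded using only the Python standard library. The function name describe_environment is hypothetical and chosen for this example; in practice many projects rely on tools such as pip freeze or conda instead, and this sketch does not replace them.

```python
import json
import platform
from importlib import metadata


def describe_environment():
    """Return a dictionary describing the interpreter and installed packages.

    Hypothetical helper for illustration; not part of any library.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with missing metadata
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the record as JSON so it can be saved alongside the project.
    print(json.dumps(describe_environment(), indent=2))
```

Saving this output in the project repository gives others a record of the Python version and package versions the code was run with.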
Reproducible research
Reproducible research is a moving target in the humanities. While the social sciences have recently reckoned with a so-called “replication crisis,” the humanities are only beginning to think about how their research can be made reproducible. As the humanities increasingly work with large data sets and computational tools whose output cannot be manually verified by a third party, we need to agree upon best practices that will ensure our peers can trust the validity of our results.
As digital humanists, we can learn several lessons from the social sciences and hard sciences to avoid a “replication crisis” in the humanities. A big step towards producing more reproducible research is writing better code that others can reuse to produce the same results.
Reproducibility comes in multiple forms:
- Someone else wants to download my data and code to verify my results independently
- Someone else wants to use my code on new data to produce their own research
- Someone wants to modify my data and code to test edge cases in my results
Some key aspects of reproducible research include:
- publication of the raw underlying data used to achieve the results;
- clear documentation of the steps taken to achieve the results;
- open source release of the code used for data gathering, analysis, and other steps;
- separation of code by function (i.e. modular code development) so others can interpret and reuse your code;
- documentation of key decisions and changes in the project (e.g. through version control);
- means to ensure the tools do what they are supposed to do (e.g. tests, code review).
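To make the points about modular code and tests more concrete, here is a minimal sketch, assuming a small word-counting task. The function names tokenize, word_frequencies, and test_word_frequencies are hypothetical and invented for this example, and the tokenizer is deliberately simplistic.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of word characters.

    A simplification for illustration: real projects usually need more
    careful handling of punctuation, hyphens, and apostrophes.
    """
    return re.findall(r"\w+", text.lower())


def word_frequencies(text):
    """Count how often each token appears in the text."""
    return Counter(tokenize(text))


def test_word_frequencies():
    """A small test that documents and checks the expected behavior."""
    counts = word_frequencies("The whale, the sea.")
    assert counts["the"] == 2
    assert counts["whale"] == 1
    assert counts["harpoon"] == 0  # a Counter returns 0 for unseen tokens


if __name__ == "__main__":
    test_word_frequencies()
    print("all tests passed")
```

Because tokenizing and counting live in separate functions, someone else could swap in a different tokenizer for their own texts and rerun the test to see whether the counts still behave as described.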
In addition, reproducible research follows community-defined best practices that ensure your project can be understood and used by others. These practices may evolve over time, but together they provide a set of basic principles that can guide the development of reproducible research in the Digital Humanities.
Resources
- Rik Peels, “Replicability and replication in the humanities.” Research Integrity and Peer Review 4 (2019). https://doi.org/10.1186/s41073-018-0060-4.
- Joseph Flanagan, “Reproducible research: Strategies, tools, and workflows.” Studies in Variation, Contacts and Change in English, eds. Turo Hiltunen, Joe McVeigh, Tanja Säily (Helsinki: Research Unit for Variation, Contacts and Change in English, 2017). https://varieng.helsinki.fi/series/volumes/19/flanagan/