Proceedings paper

Title:
R4R: Reproducibility for R
Authors:
P. Donat-Bouillud, F. Křikava, S. Krynski, J. Vitek
Publication:
Proceedings of the 3rd ACM Conference on Reproducibility and Replicability
DOI:
10.1145/3736731.3746156
Year:
2025
ISBN:
9798400719585
Link:
https://doi.org/10.1145/3736731.3746156

Abstract:
Ensuring reproducibility is a fundamental challenge in computational research. Reproducing results often requires reconstructing complex software environments involving data files, external tools, system libraries, and language-specific packages. While various tools aim to simplify this process, they often rely on user-provided metadata, overlook system dependencies, or produce unnecessarily large environments. We present r4r, a tool that automates the creation of minimal, user-inspectable, self-contained execution environments through dynamic program analysis techniques. r4r captures all runtime dependencies of a data analysis pipeline and produces a Docker image capable of reproducing the original execution. Although designed with first-class support for the R programming language, r4r also includes a generic fallback mechanism applicable to other languages. We evaluate r4r on a collection of R Markdown notebooks from Kaggle and find that it achieves exact reproducibility for 97.5% of deterministic notebooks.
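The abstract describes capturing a pipeline's runtime dependencies through dynamic analysis and emitting a Docker image from them. The sketch below illustrates only that general idea. It is not r4r's implementation, and it swaps in Python's in-process audit hooks (PEP 578) for system-level tracing. It records which files a pipeline function opens, then writes a Dockerfile that copies only those files from a project directory. The names `trace_opened_files` and `dockerfile_for`, the `python:3.12-slim` base image and the `/work` directory are hypothetical choices. System libraries and language packages, which r4r also captures, are deliberately out of scope.

```python
import json
import os
import sys

# Hypothetical illustration: Python audit hooks see only opens made inside
# this interpreter, unlike the system-level tracing a real tool would need.
_recorder = None  # set of absolute paths while tracing, None otherwise


def _audit_hook(event, args):
    # The "open" audit event fires for built-in open(); args[0] is the
    # path that was passed (str, bytes, a path-like object, or an int fd).
    if _recorder is not None and event == "open":
        path = args[0]
        if isinstance(path, (str, bytes, os.PathLike)):
            _recorder.add(os.path.abspath(os.fsdecode(path)))


sys.addaudithook(_audit_hook)  # audit hooks cannot be removed once installed


def trace_opened_files(func, *args, **kwargs):
    """Run func(*args, **kwargs); return (result, set of paths it opened)."""
    global _recorder
    recorded = set()
    _recorder = recorded
    try:
        result = func(*args, **kwargs)
    finally:
        _recorder = None  # stop recording even if func raises
    return result, recorded


def _inside(path, root):
    """True if path lies under root (both absolute)."""
    try:
        return os.path.commonpath([path, root]) == root
    except ValueError:  # e.g. paths on different drives on Windows
        return False


def dockerfile_for(opened, project_root, command):
    """Emit a Dockerfile that copies only the project files the run opened."""
    root = os.path.abspath(project_root)
    rel = sorted(
        os.path.relpath(p, root).replace(os.sep, "/")
        for p in opened
        if _inside(p, root) and os.path.isfile(p)
    )
    lines = ["FROM python:3.12-slim", "WORKDIR /work"]
    # JSON-array form of COPY tolerates spaces in file names.
    lines += [f"COPY {json.dumps([r, '/work/' + r])}" for r in rel]
    lines.append("CMD " + json.dumps(command))  # exec-form CMD
    return "\n".join(lines) + "\n"
```

For example, wrapping a function that reads `data.csv` from a project directory yields a Dockerfile with a single `COPY` line for that file. An untouched `unused.csv` in the same directory is left out. This pruning of files the run never used is one sense in which such an environment is minimal.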

BibTeX:
@inproceedings{donatbouillud_r4r_2025,
    title = {{R4R: Reproducibility for R}},
    author = {Donat-Bouillud, Pierre and Křikava, Filip and Krynski, Sebastian and Vitek, Jan},
    year = {2025},
    booktitle = {{Proceedings of the 3rd ACM Conference on Reproducibility and Replicability}},
    publisher = {Association for Computing Machinery},
    series = {{ACM REP '25}},
    address = {New York, NY, USA},
    doi = {10.1145/3736731.3746156},
    isbn = {9798400719585},
    pages = {132--142},
    url = {https://doi.org/10.1145/3736731.3746156},
}