Reproducibility and Traceability

A note about how bbr.bayes enables reproducible and traceable research.

Reproducibility

We consider modeling results to be reproducible if running a model repeatedly with the same data, software, and hardware yields the same results. Modeling with MCMC methods is inherently stochastic, so exact reproducibility is difficult. However, the Stan developers have built a system that enables reproducible research under certain conditions (see the Stan documentation on reproducibility). bbr.bayes facilitates specifying the model in a reproducible way.
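The role of the random seed can be illustrated with a toy sampler. The sketch below is a minimal random-walk Metropolis sampler written in plain Python; it is not Stan and not part of bbr.bayes, and the function name metropolis and its parameters are hypothetical. It only shows that, with everything else held fixed, the same seed yields the same draws.

```python
import math
import random


def metropolis(seed, n_draws=1000, step=1.0):
    """Toy random-walk Metropolis sampler targeting a standard normal.

    Hypothetical illustration only: this is not how Stan samples.
    """
    rng = random.Random(seed)  # private generator, isolated from global state
    x = 0.0
    draws = []
    for _ in range(n_draws):
        proposal = x + rng.gauss(0.0, step)
        # Log acceptance ratio for a N(0, 1) target with a symmetric proposal.
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        draws.append(x)
    return draws


# Same seed, same inputs: identical chains. Different seed: different chains.
same = metropolis(seed=123) == metropolis(seed=123)
different = metropolis(seed=123) != metropolis(seed=456)
```

In a real analysis the draws also depend on the software versions and hardware, which is why the seed alone is necessary but not sufficient.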

Aspects beyond the model specification (e.g., R package versions and hardware environment) are also important to control and track for reproducibility purposes, but these are considered outside the scope of bbr.bayes. There are many possible approaches to this. At MetrumRG, we use our Metworx platform as a validated and stable high-performance computing environment, and we use Metrum Package Network (MPN) and pkgr (a tool to create and manage curated, reproducible R package environments) for managing R packages.

bbr.bayes does the following to promote the generation of reproducible posterior samples:

  1. Requires that a random seed be specified in order to run a model.
  2. Requires that the model definition, cmdstanr method arguments, data preparation code, and initial values specification be tracked in dedicated files on disk.
  3. Provides special handling to ensure reproducibility when the user specifies a function to generate the initial values, because passing the function as is to CmdStanModel$sample() can compromise reproducibility.
  4. Records the hashes of these inputs. (The check_up_to_date helper can be used to determine if any of these files have changed since the last run of the model.)
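
The idea behind recording input hashes can be sketched in a few lines. The example below is a language-agnostic illustration written in Python, not the bbr.bayes implementation: the function names (file_hash, record_hashes, changed_files) are hypothetical, and the choice of SHA-256 is an assumption made for the sketch. It stores a hash per input file at run time and later reports which files no longer match.

```python
import hashlib
from pathlib import Path


def file_hash(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_hashes(paths):
    """Map each input file path to its current hash (done at run time)."""
    return {str(p): file_hash(p) for p in paths}


def changed_files(recorded):
    """List recorded files that are missing or whose contents have changed."""
    return [
        path
        for path, old_hash in recorded.items()
        if not Path(path).exists() or file_hash(path) != old_hash
    ]
```

An empty list from changed_files() means the inputs on disk still match what was recorded for the last run; any path in the list marks an input that was edited or removed afterwards.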

Traceability

We consider a model to be ‘traceable’ if its provenance can be tracked back to its source. bbr and bbr.bayes allow (and encourage) basing one model on another by using the copy_model_from() function. In addition to copying the model files, this function updates the model's YAML file to include a based_on field, which records the model that was copied to create the current model.

The bbr and bbr.bayes packages enable traceable modeling by:

  1. Making it possible to define one model as the child of another (copy_model_from).
  2. Providing various helpers (e.g., get_based_on, get_model_ancestry, and run_log) to inspect the modeling history and development.
  3. Recording the inputs that produced the outputs by storing the hashes of inputs as mentioned above.
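
How a based_on field makes a model's history recoverable can be sketched as follows. This is a conceptual illustration in Python rather than the bbr implementation: the function model_ancestry and the dictionary layout (model name mapped to the list of models it was based on) are assumptions made for the example.

```python
def model_ancestry(model_id, based_on):
    """Return all ancestors of model_id, nearest first, without duplicates.

    based_on maps a model name to the list of models it was copied from
    (a hypothetical stand-in for the based_on field in each model's YAML).
    """
    ancestors = []
    seen = set()
    queue = list(based_on.get(model_id, []))
    while queue:
        parent = queue.pop(0)
        if parent in seen:
            continue  # already visited via another path
        seen.add(parent)
        ancestors.append(parent)
        queue.extend(based_on.get(parent, []))
    return ancestors


# Hypothetical history: mod2 was copied from mod1, and mod3 from mod2.
history = {"mod2": ["mod1"], "mod3": ["mod2"]}
lineage = model_ancestry("mod3", history)  # ["mod2", "mod1"]
```

Because each model stores only its direct parents, the full lineage is obtained by following the links until a model with no based_on entry is reached.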

Key principle: write the essential elements for reproducing and tracking models to disk

To facilitate reproducible and traceable modeling, bbr.bayes writes to disk all of the model elements that are required to reproduce a model run and track its provenance.

More information about the essential elements of a bbr.bayes model object can be found in the bbr.bayes Getting Started vignette.