Reproducibility
We consider modeling results to be reproducible if running a model repeatedly with the same data, software, and hardware yields the same results. Because MCMC methods are inherently stochastic, exact reproducibility is difficult to achieve. However, the Stan developers have built a system that enables reproducible research under certain conditions (see the Stan documentation on reproducibility). bbr.bayes facilitates specifying the model in a reproducible way.
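The role of the random seed can be illustrated with a toy sampler. The following is a minimal sketch in Python, not Stan, cmdstanr, or bbr.bayes code. It implements a random-walk Metropolis sampler for a standard normal target, and the function name metropolis_normal is hypothetical. All randomness flows from a generator created from an explicit seed, so two runs with the same seed on the same software and hardware return identical draws.

```python
import math
import random


def metropolis_normal(n_draws, seed, step=1.0, init=0.0):
    """Toy random-walk Metropolis sampler targeting a standard normal.

    All randomness comes from a dedicated generator seeded with `seed`,
    so repeated calls with identical arguments return identical draws.
    """
    rng = random.Random(seed)  # isolated RNG: no dependence on global state
    x = init
    log_p = -0.5 * x * x  # unnormalized log density of N(0, 1)
    draws = []
    for _ in range(n_draws):
        proposal = x + rng.gauss(0.0, step)
        log_p_prop = -0.5 * proposal * proposal
        # Metropolis rule: accept with probability min(1, p(proposal) / p(x))
        if rng.random() < math.exp(min(0.0, log_p_prop - log_p)):
            x, log_p = proposal, log_p_prop
        draws.append(x)
    return draws


run_a = metropolis_normal(1000, seed=2024)
run_b = metropolis_normal(1000, seed=2024)
print(run_a == run_b)  # True: same seed, same draws
```

Changing the seed produces a different, equally valid chain. The sketch isolates only the seed. As noted above, matching results from Stan also depend on keeping the software and hardware the same.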
Aspects beyond the model specification (e.g., R package versions and the hardware environment) are also important to control and track for reproducibility, but they are outside the scope of bbr.bayes. There are many possible approaches to managing them. At MetrumRG, we use our Metworx platform as our validated and stable high-performance computing environment. We use Metrum Package Network (MPN) and pkgr to create and manage curated, reproducible R package environments.
bbr.bayes does the following to promote the generation of reproducible posterior samples:
- Requires that a random seed be specified in order to run a model.
- Requires that the model definition, cmdstanr method arguments, data preparation code, and initial values specification be tracked in dedicated files on disk.
- Provides special handling to ensure reproducibility when the user specifies a function to generate the initial values. Passing such a function as is to CmdStanModel$sample() can compromise reproducibility, because the function is evaluated in R. Any random draws it makes therefore depend on the state of R's random number generator rather than on the Stan seed.
- Records the hashes of these inputs. (The check_up_to_date() helper can be used to determine whether any of these files have changed since the model was last run.)
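The idea behind recording input hashes, and later checking them as check_up_to_date() does, can be sketched generically. The Python below is an illustration only, not bbr.bayes's implementation. The function names (file_digest, record_hashes, changed_files), the use of MD5, and the file names are all hypothetical. The sketch records a digest of each tracked input file at run time, then reports which files no longer match.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the MD5 hex digest of a file's bytes (hash choice is illustrative)."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def record_hashes(paths):
    """Snapshot a digest for every tracked input file, keyed by file name."""
    return {Path(p).name: file_digest(p) for p in paths}


def changed_files(paths, recorded):
    """List tracked files whose current digest differs from the recorded one."""
    current = record_hashes(paths)
    return sorted(name for name, digest in current.items()
                  if recorded.get(name) != digest)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for the files a model run depends on
    model = Path(tmp) / "model.stan"
    data = Path(tmp) / "data_prep.R"
    model.write_text("parameters { real mu; } model { mu ~ normal(0, 1); }\n")
    data.write_text("list(N = 10)\n")
    inputs = [model, data]

    recorded = record_hashes(inputs)         # snapshot taken when the model runs
    before = changed_files(inputs, recorded)

    # Edit the model file after the run
    model.write_text("parameters { real mu; } model { mu ~ normal(0, 2); }\n")
    after = changed_files(inputs, recorded)

print(before)  # []
print(after)   # ['model.stan']
```

Comparing digests rather than timestamps means that re-saving an unchanged file is not reported as a change. Only an actual content edit is flagged.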
Traceability
We consider a model to be ‘traceable’ if its provenance can be tracked back to its source. bbr and bbr.bayes allow (and encourage) basing one model on another with the copy_model_from() function. In addition to copying the model files, this function updates the new model's YAML file to include a based_on field that records which model was copied to create it.
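Conceptually, copying a model while recording its parent means duplicating the parent's metadata and setting a based_on field on the copy. The sketch below stands in for the YAML files with Python dictionaries. The copy_model function and every field other than based_on (model_type, seed) are hypothetical. This is not how bbr or bbr.bayes implements copy_model_from().

```python
import copy


def copy_model(models, parent, child):
    """Create `child` from `parent`, recording the parent in `based_on`.

    `models` maps a model name to its metadata (a stand-in for a YAML file).
    """
    if child in models:
        raise ValueError(f"model {child!r} already exists")
    meta = copy.deepcopy(models[parent])  # copy settings without aliasing the parent
    meta["based_on"] = [parent]           # record provenance
    models[child] = meta
    return meta


models = {"1000": {"model_type": "stan", "seed": 123, "based_on": []}}
copy_model(models, "1000", "1001")
print(models["1001"]["based_on"])  # ['1000']
```

The deep copy ensures that later edits to the child cannot silently alter the parent's recorded settings.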
The bbr and bbr.bayes packages enable traceable modeling by:
- Making it possible to define one model as the child of another (copy_model_from()).
- Providing various helpers (e.g., get_based_on(), get_model_ancestry(), and run_log()) to inspect the modeling history and development.
- Recording the inputs that produced the outputs by storing the hashes of those inputs, as mentioned above.
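A helper such as get_model_ancestry() must follow based_on links back through successive parents. The following is a generic breadth-first sketch of that traversal, not bbr's implementation. The function and model names are hypothetical. It operates on a mapping from each model to the models it was based on, and it allows more than one parent per model.

```python
from collections import deque


def model_ancestry(based_on, model):
    """Return every ancestor of `model`, nearest first, following based_on links.

    `based_on` maps a model name to the list of models it was copied from.
    A visited set ensures shared ancestors are reported only once.
    """
    ancestors, seen = [], set()
    queue = deque(based_on.get(model, []))
    while queue:
        parent = queue.popleft()  # breadth-first: closer ancestors come first
        if parent in seen:
            continue
        seen.add(parent)
        ancestors.append(parent)
        queue.extend(based_on.get(parent, []))
    return ancestors


based_on = {"1000": [], "1001": ["1000"], "1002": ["1001"], "1003": ["1002", "1001"]}
print(model_ancestry(based_on, "1003"))  # ['1002', '1001', '1000']
```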
Key principle: write the essential elements for reproducing and tracking models to disk
To facilitate reproducible and traceable modeling, bbr.bayes writes to disk all of the model elements required to reproduce a model run and track its provenance.
More information about the essential elements of a bbr.bayes model object can be found in the bbr.bayes Getting Started vignette.