Bootstrap generate and collect

Create bootstrapped data sets to obtain parameter estimates

bbr

1 Introduction


During the model evaluation phase of this Expo, we run a non-parametric bootstrap procedure to quantify the uncertainty of a model’s parameter estimates. This page walks through:

  1. Resampling the data to create bootstrapped data sets
  2. Creating & submitting models using bbr that will re-estimate the model on the bootstrap data
  3. Reviewing & documenting raw bootstrap parameter estimates

2 Tools used

2.1 MetrumRG packages

bbr Manage, track, and report modeling activities, through simple R objects, with a user-friendly interface between R and NONMEM®.

mrgmisc Format, manipulate and summarize data in the field of pharmacometrics.

2.2 CRAN packages

dplyr A grammar of data manipulation.

3 Outline


We’re provided with a control file for a model we derived on Model Management, 106.ctl, and the data set on which it was originally estimated.

We’ll resample the original estimation data set 1000 times and write the data sets out. At the same time, we’ll use a control stream template derived from 106.ctl to create 1000 control streams, one for each bootstrap data set.

Once the bootstrap models are estimated, we will collect the bootstrap estimates along with some run diagnostics to assess how the runs finished.

Our final step will be to save the estimates and relevant information from each of the bootstrap model runs.

4 Set up


Before we begin, we’ll take a look at the required packages, set up the model directories, and create a control file template.

4.1 Required packages

library(data.table)
library(bbr)
library(tidyverse)
library(glue)
library(mrgmisc)
library(whisker)
library(here)

4.2 Model directories

We’ll specify our pk model directory, pk_model_dir, where the control files, yaml files, and outputs are located for each model.

We can specify a bootstrap directory path boot_dir within the pk model directory.

pk_model_dir <- here("model/pk")
boot_dir <- here(pk_model_dir, "boot")
boot_data_dir <- here(boot_dir, "data")

Before creating the boot directories at the paths defined above, we check that each one does not already exist.

if(!dir.exists(boot_dir)) dir.create(boot_dir)
if(!dir.exists(boot_data_dir)) dir.create(boot_data_dir)

4.3 Control file template

As we will be creating models from bootstrap data sets, we’ll need a control file template.

template_ctl <- readLines(
  file.path(pk_model_dir, "boot-106-template.ctl")
)

The boot-106-template.ctl is located in the pk model directory and is read into template_ctl.

The control file was derived from 106.ctl which was created on Model Management.

A few important modifications were made to derive boot-106-template.ctl. To start, the $DATA section was changed to

$DATA ../data/{{run_num}}.csv

We use this placeholder because we will create many bootstrap data sets; {{run_num}} is replaced with a unique run name for each one.
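For instance, rendering that line with whisker (which make_boot_run does later on this page) substitutes the placeholder. The run name 0001 here is purely illustrative:

whisker.render("$DATA ../data/{{run_num}}.csv", list(run_num = "0001"))
[1] "$DATA ../data/0001.csv"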

The other modification is optional, but suggested. We add OID at the end of the $INPUT section to track the original subject ID number.

Finally, we removed the $COVARIANCE and $TABLE blocks since they are not needed to collect the bootstrap estimates.

5 Create bootstrap runs

For each bootstrap run, we need to sample the original data set with replacement by ID and also generate a modified control file for re-estimating the model on that data.

For demonstration purposes we will be creating 1000 bootstrap data sets. In a real project the number of bootstrap runs could be larger.

RUN <- seq(1000)

We create bbr models for each of the 1000 bootstrap runs and store them all in the models list. Every model is created from the code shown below, as the RUN variable is an integer sequence from 1 to 1000.

set.seed(12345)

models <- map(
  RUN,
  .f = make_boot_run,
  data = nmdata,
  template = template_ctl,
  boot_dir = boot_dir,
  strat_cols = c("RF", "CP"),
  overwrite = TRUE
)

In order to create the models list, we provide the:

  • full data set as nmdata
  • template control file as template_ctl
  • boot directory as boot_dir (created in the setup section of this page)
  • variables for stratification (for this example, we are stratifying on renal function and Child-Pugh score)

The make_boot_run function collects this information in its arguments

make_boot_run <- function(n, data, template, boot_dir, strat_cols = NULL, 
                          overwrite = FALSE,  seed = 21181)

and generates a single bootstrap run with it; mapping the function over RUN produces all 1000 runs.
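Each element of the resulting models list is a bbr model object. As a quick sanity check, we can look up where the first run’s control stream lives using bbr’s get_model_path accessor:

# Inspect the first bootstrap model object and its control stream path
mod1 <- models[[1]]
get_model_path(mod1)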

5.1 make_boot_run function

For each run, the function:

  • creates a bootstrap data set using stratified resampling
  • writes the bootstrap data set to a csv file on disk
  • creates a control file from the template that points to that data set

When we call the function, it first creates a model path for a single bootstrap run; mod_name is a unique run name that updates for each bootstrap run.

mod_path <- glue("{boot_dir}/{mod_name}")

The function then resamples the data to create a bootstrap data set, data_new, stratifying on the specified columns.

data_new <- resample_df(
  data,
  key_cols = "ID",
  strat_cols = strat_cols
)
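As a quick, optional check, we can confirm that the stratified resampling roughly preserves the subject-level distribution of the stratification variables (RF and CP in this example):

# Subject-level counts per stratum: resampled vs. original data
count(distinct(data_new, KEY, RF, CP), RF, CP)
count(distinct(data, ID, RF, CP), RF, CP)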

Now that we have the resampled data, a few modifications are needed before writing it to a csv file. resample_df returns a KEY column that assigns each resampled subject a new unique identifier, so we store the original ID as OID, promote KEY to ID, and restore the original column order.

data_new <- rename(data_new, OID = ID, ID = KEY)
data_new <- select(data_new, unique(c("C", names(data), "OID")))

csv_file <- glue("{boot_dir}/data/{mod_name}.csv")
fwrite(data_new, csv_file, na = ".", quote = FALSE)

Each bootstrap data set is saved to a unique model path inside the boot directory. A modified control file (which points to the bootstrap data set) is saved to the same model path as well.

new_ctl <- whisker.render(template, list(run_num = mod_name))
write_file(new_ctl, file = paste0(mod_path, ".ctl"))

The function then creates and returns a new model object with new_model which points to the model path and provides a description of the model with the unique bootstrap ID.

mod <- new_model(
  mod_path,
  .description = glue("bootstrap {mod_name}"), 
  .overwrite = overwrite
)
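Putting the pieces together, a minimal sketch of the whole function could look like this. The zero-padded naming scheme for mod_name and the omission of seed handling are simplifying assumptions for illustration:

make_boot_run <- function(n, data, template, boot_dir, strat_cols = NULL,
                          overwrite = FALSE, seed = 21181) {
  # (seed handling omitted from this sketch)
  mod_name <- sprintf("%04d", n)  # assumed naming scheme, e.g. 1 -> "0001"
  mod_path <- glue("{boot_dir}/{mod_name}")

  # resample subjects with replacement, stratified on the given columns
  data_new <- resample_df(data, key_cols = "ID", strat_cols = strat_cols)
  data_new <- rename(data_new, OID = ID, ID = KEY)
  data_new <- select(data_new, unique(c("C", names(data), "OID")))

  # write the bootstrap data set to boot/data
  csv_file <- glue("{boot_dir}/data/{mod_name}.csv")
  fwrite(data_new, csv_file, na = ".", quote = FALSE)

  # fill in the {{run_num}} placeholder and write the control stream
  new_ctl <- whisker.render(template, list(run_num = mod_name))
  write_file(new_ctl, file = paste0(mod_path, ".ctl"))

  # create and return the bbr model object for this run
  new_model(
    mod_path,
    .description = glue("bootstrap {mod_name}"),
    .overwrite = overwrite
  )
}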

5.2 Submit models

After running our data through the make_boot_run function, we have 1000 resampled data sets and corresponding control streams. Each pair has a unique model path, which allows us to leverage bbr to submit the models.

out <- submit_models(
  models,
  .config_path = here("model/pk/bbi.yaml"),
  .bbi_args = list(overwrite = TRUE, threads = 1)
)
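If you’d rather not launch all 1000 runs at once, one option is to submit a single run first as a smoke test; this is a workflow suggestion rather than part of the original script:

# Optional smoke test: submit one bootstrap run before the full batch
test_out <- submit_models(
  models[1],
  .config_path = here("model/pk/bbi.yaml"),
  .bbi_args = list(overwrite = TRUE, threads = 1)
)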

With the models submitted, we can now focus on examining the parameter estimates.

6 Parameter estimates

With the models completed, we now have parameter estimates for each bootstrap run. We want to organize these into an easily digestible and interactive format. Since a bootstrap can involve hundreds to thousands of runs, a tibble is an ideal format for the estimates.

Fortunately, the param_estimates_batch function from bbr makes this easy and efficient.

boot <- param_estimates_batch(boot_dir)

The function parses each model’s output to pull the estimates together. We can confirm that every model’s output was captured by checking whether bbr reported any error messages; an error message indicates that bbr failed to parse that model’s output.

err <- filter(boot, !is.na(error_msg))
nrow(err)
[1] 0

Using the filter above, we can list the models where error messages were reported. In this case the result has 0 rows, meaning we captured all model outputs. If there were errors, this process makes it easy to identify which models to investigate manually.
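Had any rows been returned, a quick look at the run and error_msg columns would show which models need attention:

# List the runs (if any) that bbr could not parse, with their messages
select(err, run, error_msg)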

6.1 Compare bootstrap to original

With the parameter estimates in tibble format, another bbr function (param_estimates_compare) makes it easy to compare the new estimates to the original model estimates.

We indicate the original model ID with mod_id and use the read_model function to load the original model; param_estimates_compare pulls the original estimates from it.

mod_id <- "106"

orig_mod <- read_model(here("model", "pk", mod_id))

param_estimates_compare(boot, orig_mod)
# A tibble: 15 × 5
   parameter_names original_estimate   `50%`  `2.5%` `97.5%`
   <chr>                       <dbl>   <dbl>   <dbl>   <dbl>
 1 THETA1                     0.443   0.449   0.328   0.578 
 2 THETA2                     4.12    4.12    4.07    4.18  
 3 THETA3                     1.17    1.17    1.12    1.23  
 4 THETA4                     4.21    4.21    4.17    4.25  
 5 THETA5                     1.28    1.28    1.22    1.35  
 6 THETA6                     0.485   0.484   0.408   0.558 
 7 THETA7                    -0.0378 -0.0386 -0.167   0.0876
 8 THETA8                     0.419   0.420   0.294   0.587 
 9 OMEGA(1,1)                 0.219   0.218   0.130   0.331 
10 OMEGA(2,1)                 0.0668  0.0656  0.0328  0.108 
11 OMEGA(2,2)                 0.0824  0.0821  0.0643  0.101 
12 OMEGA(3,1)                 0.121   0.121   0.0805  0.173 
13 OMEGA(3,2)                 0.0704  0.0696  0.0525  0.0882
14 OMEGA(3,3)                 0.114   0.112   0.0896  0.140 
15 SIGMA(1,1)                 0.0399  0.0400  0.0376  0.0424

The second column of the output (original_estimate) shows the parameter estimates from the original model. The remaining columns provide the bootstrap confidence intervals for the parameter estimates.
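The 2.5% and 97.5% columns give the default 95% interval. If you need other quantiles, they are easy to compute directly from the boot tibble with dplyr and tidyr; for example, a 90% interval:

# 90% bootstrap intervals computed directly from the estimates tibble
boot %>%
  select(starts_with(c("THETA", "OMEGA", "SIGMA"))) %>%
  pivot_longer(everything(), names_to = "parameter_names") %>%
  group_by(parameter_names) %>%
  summarise(
    `5%`  = quantile(value, 0.05, na.rm = TRUE),
    `50%` = quantile(value, 0.50, na.rm = TRUE),
    `95%` = quantile(value, 0.95, na.rm = TRUE)
  )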

6.2 Write estimates to file

After the comparison, we can save the bootstrap parameter estimates by writing them to a csv file. The file name indicates that the values were generated by the bootstrap (the boot- prefix) and records the original model name (in this case, 106).

boot %>%
  select(run, starts_with(c("THETA", "OMEGA", "SIGMA"))) %>%
  write_csv(here("data", "boot", glue("boot-{mod_id}.csv")))

This file can be found at data/boot/boot-106.csv.

7 Extract NONMEM® heuristics

Now we can run a full summary for each model to better understand whether any models had problems during estimation. We use the summary_log function from bbr and run it on the entire boot directory (boot_dir).

The function extracts the full summary for each model within the directory.

smry <- summary_log(boot_dir)
heu <- smry %>% 
  filter(any_heuristics) %>% 
  select(run, where(is.logical), -needed_fail_flags)

We can also tally how often each heuristic was triggered.

heu %>%
  select(-run, -any_heuristics) %>%
  summarise_all(sum, na.rm = TRUE) %>% 
  pivot_longer(everything())

After examining these, we can save them to a csv file.

write_csv(heu, here("data", "boot", glue("boot-{mod_id}-heuristics.csv")))

Saving the model heuristics could be useful for a parameter table footer, for example if any runs were filtered out or we need to catalog runs that didn’t minimize.

7.1 Save termination codes

It’s also valuable to save the termination codes for the bootstrap models. Saving these to a csv file will make it easier to review them in the future.

# Get termination codes and save
term <- select(boot, run, termination_code)

count(term, termination_code)

write_csv(term, here("data", "boot", glue("boot-{mod_id}-term.csv")))

8 Other Resources

The following scripts from the GitHub repository are discussed on this page. If you’re interested in running this code, visit the About the GitHub Repo page first.

Bootstrap generate script: boot-generate.R
Bootstrap collect script: boot-collect.R