Introduction to yspec

An overview of yspec and the creation of data specification objects.


1 Introduction

The yspec package is designed to help you document your analysis data sets and then use this documentation throughout your project to:

  • Guide and document the data assembly process
  • Efficiently annotate figures and tables
  • Manage decode values for numerically encoded discrete data items
  • Make submission-ready define documents

yspec assists in managing these activities and more by acting as a single, central location for maintaining the metadata for your analysis data sets.

2 Basic idea

The yspec workflow starts with writing data set metadata in a standalone file in YAML format. This file lists the columns in a derived data set, along with the relevant attributes of each column. YAML files are well suited to data specification because they are easy to write and read, and they can be used programmatically in R.
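As an illustration, a minimal sketch of what two entries in such a file might look like is given below. The column names and attributes are taken from the output shown later in this page; the layout itself is an assumption, since the project's actual pk.yml is not reproduced here and may instead pull these definitions from a lookup file.

```yaml
# Hypothetical excerpt of a data specification file (not the project's actual pk.yml)
AGE:
  short: Age            # short description used when annotating figures and tables
  unit: years
  type: numeric
CP:
  short: Child-Pugh score
  type: numeric
  values: [0, 1, 2, 3]                         # numeric codes stored in the data set
  decode: [normal, score=1, score=2, score=3]  # label for each code, in the same order
```

Each top-level key names one column in the data set, and the indented fields beneath it hold that column's attributes.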

The yspec package reads the data specification file contents and creates a representation of that information as an R object (referred to as the data specification object). This list-like object is structured so its contents can be easily extracted.

For example, a spec object can be created from the YAML file with the ys_load() function.

spec <- yspec::ys_load(here::here("data/derived/pk.yml"))
head(spec)
   name info  unit                 short source
1     C  cd-     .        Commented rows lookup
2   NUM  ---     .            Row number lookup
3    ID  ---     .      NONMEM ID number lookup
4  TIME  ---  hour Time after first dose lookup
5   SEQ  -d-     .             Data type lookup
6   CMT  -d-     .           Compartment lookup
7  EVID  -d-     .      Event identifier lookup
8   AMT  ---    mg           Dose amount lookup
9    DV  ---     .    Dependent variable lookup
10  AGE  --- years                   Age lookup

The attributes for the AGE column (a continuous variable) can be obtained as follows.

spec$AGE
 name  value  
 col   AGE    
 type  numeric
 short Age    
 unit  years  
 range .      

Attributes such as type, short and unit are examples of the information recorded for each column in the data specification YAML file.

The attributes for the CP column (Child-Pugh score; a discrete variable) can be obtained in the same way.

spec$CP
 name  value           
 col   CP              
 type  numeric         
 short Child-Pugh score
 value 0 : normal      
       1 : score=1     
       2 : score=2     
       3 : score=3     
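The attributes shown above can also be pulled out programmatically, for example to label a plot axis or to turn numeric codes into labeled factors. The following is a minimal sketch, not this project's code: it assumes the yspec package is installed, and it writes a hypothetical two-column specification to a temporary file so that it runs on its own. The data frame and the object names (yaml_text, spec_file, age_label, data) are made up for illustration.

```r
library(yspec)

# Assumption: a small two-column specification written to a temporary file so
# the example is self-contained; a real project would load its own YAML file.
yaml_text <- c(
  "AGE:",
  "  short: Age",
  "  unit: years",
  "CP:",
  "  short: Child-Pugh score",
  "  values: [0, 1, 2, 3]",
  "  decode: [normal, score=1, score=2, score=3]"
)
spec_file <- tempfile(fileext = ".yml")
writeLines(yaml_text, spec_file)
spec <- ys_load(spec_file)

# Named lists of short names and units, keyed by column name
short <- ys_get_short(spec)
unit <- ys_get_unit(spec)

# Build an axis label for AGE from the spec rather than typing it by hand
age_label <- paste0(short$AGE, " (", unit$AGE, ")")

# Hypothetical data; ys_add_factors() adds a factor column (CP_f by default)
# whose labels come from the decode values in the spec
data <- data.frame(AGE = c(34, 51, 47), CP = c(0, 2, 1))
data <- ys_add_factors(data, spec, CP)
```

Because the labels and decodes are read from the spec, a change made once in the YAML file carries through to every figure and table that uses it.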

3 Create define.pdf

Using this data specification object, we can create a define.pdf file with a single command.

yspec::ys_document(spec, type = "regulatory")

The resulting define.pdf document is suitable for regulatory submission.

4 Other resources

  • Definitions: a detailed description of the data specification used for this project
  • yspec: creating and customizing a spec object
  • YAML: information on YAML files