Using valaddin

Eugene Ha

2017-08-10

valaddin is a lightweight R package that enables you to transform an existing function into a function with input validation checks. It does so without requiring you to modify the body of the function, in contrast to doing input validation using stop or stopifnot, and is therefore suitable for both programmatic and interactive use.

This document illustrates the use of valaddin, by example. For usage details, see the main documentation page, ?firmly.

Use cases

The workhorse of valaddin is the function firmly, which applies input validation to a function, in situ. It can be used to:

Enforce types for arguments

For example, to require that all arguments of the function

f <- function(x, h) (sin(x + h) - sin(x)) / h

are numerical, apply firmly with the check formula ~is.numeric1:

ff <- firmly(f, ~is.numeric)

ff behaves just like f, but with a constraint on the type of its arguments:

ff(0.0, 0.1)
#> [1] 0.9983342

ff("0.0", 0.1)
#> Error: ff(x = "0.0", h = 0.1)
#> FALSE: is.numeric(x)

Enforce constraints on argument values

For example, use firmly to put a cap on potentially long-running computations:

fib <- function(n) {
  if (n <= 1L) return(1L)
  Recall(n - 1L) + Recall(n - 2L)
}

capped_fib <- firmly(fib, list("n capped at 30" ~ ceiling(n)) ~ {. <= 30L})

capped_fib(10)
#> [1] 89
capped_fib(50)
#> Error: capped_fib(n = 50)
#> n capped at 30

The role of each part of the value-constraining formula is evident:

Warn about pitfalls

If the default behavior of a function is problematic, or unexpected, you can use firmly to warn you. Consider the function as.POSIXct, which creates a date-time object:

Sys.setenv(TZ = "CET")
(d <- as.POSIXct("2017-01-01 09:30:00"))
#> [1] "2017-01-01 09:30:00 CET"

The problem is that d is a potentially ambiguous object (with hidden state), because it’s not assigned a time zone, explicitly. If you compute the local hour of d using as.POSIXlt, you get an answer that interprets d according to your current time zone; another user—or you, in another country, in the future—may get a different result.

```r
Sys.setenv(TZ = "EST")
d <- as.POSIXct("2017-01-01 09:30:00")
as.POSIXlt(d, tz = "EST")$hour
#> [1] 9
```

To warn yourself about this pitfall, you can modify as.POSIXct to complain when you’ve forgotten to specify a time zone:

as.POSIXct <- firmly(as.POSIXct, .warn_missing = "tz")

Now when you call as.POSIXct, you get a cautionary reminder:

as.POSIXct("2017-01-01 09:30:00")
#> Warning: Argument(s) expected but not specified in call as.POSIXct(x =
#> "2017-01-01 09:30:00"): `tz`
#> [1] "2017-01-01 09:30:00 CET"

as.POSIXct("2017-01-01 09:30:00", tz = "CET")
#> [1] "2017-01-01 09:30:00 CET"

NB: The missing-argument warning is implemented by wrapping functions. The underlying function base::as.POSIXct is called unmodified.

Use loosely to access the original function

Though reassigning as.POSIXct may seem risky, it is not, for the behavior is unchanged (aside from the extra precaution), and the original as.POSIXct remains accessible:

  • With a namespace prefix: base::as.POSIXct
  • By applying loosely to strip input validation: loosely(as.POSIXct)
loosely(as.POSIXct)("2017-01-01 09:30:00")
#> [1] "2017-01-01 09:30:00 CET"

identical(loosely(as.POSIXct), base::as.POSIXct)
#> [1] TRUE

Decline handouts

R tries to help you express your ideas as concisely as possible. Suppose you want to truncate negative values of a vector w:

w <- {set.seed(1); rnorm(5)}

ifelse(w > 0, w, 0)
#> [1] 0.0000000 0.1836433 0.0000000 1.5952808 0.3295078

ifelse assumes (correctly) that you intend the 0 to be repeated 5 times, and does that for you, automatically.

Nonetheless, R’s good intentions have a darker side:

z <- rep(1, 6)
pos <- 1:5
neg <- -6:-1

ifelse(z > 0, pos, neg)
#> [1] 1 2 3 4 5 1

This smells like a coding error. Instead of complaining that pos is too short, ifelse recycles it to line it up with z. The result is probably not what you wanted.

In this case, you don’t need a helping hand, but rather a firm one:

chk_length_type <- list(
  "'yes', 'no' differ in length" ~ length(yes) == length(no),
  "'yes', 'no' differ in type" ~ typeof(yes) == typeof(no)
) ~ isTRUE

ifelse_f <- firmly(ifelse, chk_length_type)

ifelse_f is more pedantic than ifelse. But it also spares you the consequences of invalid inputs:

ifelse_f(w > 0, w, 0)
#> Error: ifelse_f(test = w > 0, yes = w, no = 0)
#> 'yes', 'no' differ in length
ifelse_f(w > 0, w, rep(0, length(w)))
#> [1] 0.0000000 0.1836433 0.0000000 1.5952808 0.3295078

ifelse(z > 0, pos, neg)
#> [1] 1 2 3 4 5 1
ifelse_f(z > 0, pos, neg)
#> Error: ifelse_f(test = z > 0, yes = pos, no = neg)
#> 'yes', 'no' differ in length

ifelse(z > 0, as.character(pos), neg)
#> [1] "1" "2" "3" "4" "5" "1"
ifelse_f(z > 0, as.character(pos), neg)
#> Error: ifelse_f(test = z > 0, yes = as.character(pos), no = neg)
#> 1) 'yes', 'no' differ in length
#> 2) 'yes', 'no' differ in type

Reduce the risks of a lazy evaluation-style

When R make a function call, say, f(a), the value of the argument a is not materialized in the body of f until it is actually needed. Usually, you can safely ignore this as a technicality of R’s evaluation model; but in some situations, it can be problematic if you’re not mindful of it.

Consider a bank that waives fees for students. A function to make deposits might look like this2:

deposit <- function(account, value) {
  if (is_student(account)) {
    account$fees <- 0
  }
  account$balance <- account$balance + value
  account
}

is_student <- function(account) {
  if (isTRUE(account$is_student)) TRUE else FALSE
}

Suppose Bob is an account holder, currently not in school:

bobs_acct <- list(balance = 10, fees = 3, is_student = FALSE)

If Bob were to deposit an amount to cover an future fee payment, his account balance would be updated to:

deposit(bobs_acct, bobs_acct$fees)$balance
#> [1] 13

Bob goes back to school and informs the bank, so that his fees will be waived:

bobs_acct$is_student <- TRUE

But now suppose that, somewhere in the bowels of the bank’s software, the type of Bob’s account object is converted from a list to an environment:

bobs_acct <- list2env(bobs_acct)

If Bob were to deposit an amount to cover an future fee payment, his account balance would now be updated to:

deposit(bobs_acct, bobs_acct$fees)$balance
#> [1] 10

Becoming a student has cost Bob money. What happened to the amount deposited?

The culprit is lazy evaluation and the modify-in-place semantics of environments. In the call deposit(account = bobs_acct, value = bobs_acct$fee), the value of the argument value is only set when it’s used, which comes after the object fee in the environment bobs_acct has already been zeroed out.

To minimize such risks, forbid account from being an environment:

err_msg <- "`acccount` should not be an environment"
deposit <- firmly(deposit, list(err_msg ~ account) ~ Negate(is.environment))

This makes Bob a happy customer, and reduces the bank’s liability:

bobs_acct <- list2env(list(balance = 10, fees = 3, is_student = TRUE))

deposit(bobs_acct, bobs_acct$fees)$balance
#> Error: deposit(account = bobs_acct, value = bobs_acct$fees)
#> `acccount` should not be an environment

deposit(as.list(bobs_acct), bobs_acct$fees)$balance
#> [1] 13

Prevent self-inflicted wounds

You don’t mean to shoot yourself, but sometimes it happens, nonetheless:

x <- "An expensive object"
save(x, file = "my-precious.rda")

x <- "Oops! A bug or lapse has tarnished your expensive object"

# Many computations later, you again save x, oblivious to the accident ...
save(x, file = "my-precious.rda")

firmly can safeguard you from such mishaps: implement a safety procedure

# Argument `gear` is a list with components:
# fun: Function name
# ns : Namespace of `fun`
# chk: Formula that specify input checks

hardhat <- function(gear, env = .GlobalEnv) {
  for (. in gear) {
    safe_fun <- firmly(getFromNamespace(.$fun, .$ns), .$chk)
    assign(.$fun, safe_fun, envir = env)
  }
}

gather your safety gear

protection <- list(
  list(
    fun = "save",
    ns  = "base",
    chk = list("Won't overwrite `file`" ~ file) ~ Negate(file.exists)
  ),
  list(
    fun = "load",
    ns  = "base",
    chk = list("Won't load objects into current environment" ~ envir) ~
      {!identical(., parent.frame(2))}
  )
)

then put it on

hardhat(protection)

Now save and load engage safety features that prevent you from inadvertently destroying your data:

x <- "An expensive object"
save(x, file = "my-precious.rda")

x <- "Oops! A bug or lapse has tarnished your expensive object"
#> Error: save(x, file = "my-precious.rda")
#> Won't overwrite `file`

save(x, file = "my-precious.rda")

# Inspecting x, you notice it's changed, so you try to retrieve the original ...
x
#> [1] "Oops! A bug or lapse has tarnished your expensive object"
load("my-precious.rda")
#> Error: load(file = "my-precious.rda")
#> Won't load objects into current environment

# Keep calm and carry on
loosely(load)("my-precious.rda")

x
#> [1] "An expensive object"

NB: Input validation is implemented by wrapping functions; thus, if the arguments are valid, the underlying functions base::save, base::load are called unmodified.

Toolbox of input checkers

valaddin provides a collection of over 50 pre-made input checkers to facilitate typical kinds of argument checks. These checkers are prefixed by vld_, for convenient browsing and look-up in editors and IDE’s that support name completion.

For example, to create a type-checked version of the function upper.tri, which returns an upper-triangular logical matrix, apply the checkers vld_matrix, vld_boolean (here “boolean” is shorthand for “logical vector of length 1”):

upper_tri <- firmly(upper.tri, vld_matrix(~x), vld_boolean(~diag))

# upper.tri assumes you mean a vector to be a column matrix
upper.tri(1:2)
#>       [,1]
#> [1,] FALSE
#> [2,] FALSE

upper_tri(1:2)
#> Error: upper_tri(x = 1:2, diag = FALSE)
#> Not matrix: x

# But say you actually meant (1, 2) to be a diagonal matrix
upper_tri(diag(1:2))
#>       [,1]  [,2]
#> [1,] FALSE  TRUE
#> [2,] FALSE FALSE

upper_tri(diag(1:2), diag = "true")
#> Error: upper_tri(x = diag(1:2), diag = "true")
#> Not boolean: diag

upper_tri(diag(1:2), TRUE)
#>       [,1] [,2]
#> [1,]  TRUE TRUE
#> [2,] FALSE TRUE

Check anything with vld_true

Any input validation can be expressed as an assertion that “such and such must be true”; to apply it as such, use vld_true (or its complement, vld_false).

For example, the above hardening of ifelse can be redone as:

chk_length_type <- vld_true(
  "'yes', 'no' differ in length" ~ length(yes) == length(no),
  "'yes', 'no' differ in type" ~ typeof(yes) == typeof(no)
)
ifelse_f <- firmly(ifelse, chk_length_type)

z <- rep(1, 6)
pos <- 1:5
neg <- -6:-1

ifelse_f(z > 0, as.character(pos), neg)
#> Error: ifelse_f(test = z > 0, yes = as.character(pos), no = neg)
#> 1) 'yes', 'no' differ in length
#> 2) 'yes', 'no' differ in type
ifelse_f(z > 0, c(pos, 6), neg)
#> Error: ifelse_f(test = z > 0, yes = c(pos, 6), no = neg)
#> 'yes', 'no' differ in type
ifelse_f(z > 0, c(pos, 6L), neg)
#> [1] 1 2 3 4 5 6

Make your own input checker with localize

A check formula such as ~ is.numeric (or "Not number" ~ is.numeric, if you want a custom error message) imposes its condition “globally”:

difference <- firmly(function(x, y) x - y, ~ is.numeric)

difference(3, 1)
#> [1] 2
difference(as.POSIXct("2017-01-01", "UTC"), as.POSIXct("2016-01-01", "UTC"))
#> Error: difference(x = as.POSIXct("2017-01-01", "UTC"), y = as.POSIXct("2016-01-01", "UTC"))
#> 1) FALSE: is.numeric(x)
#> 2) FALSE: is.numeric(y)

With localize, you can concentrate a globally applied check formula to specific expressions. The result is a reusable custom checker:

chk_numeric <- localize("Not numeric" ~ is.numeric)
secant <- firmly(function(f, x, h) (f(x + h) - f(x)) / h, chk_numeric(~x, ~h))

secant(sin, 0, .1)
#> [1] 0.9983342
secant(sin, "0", .1)
#> Error: secant(f = sin, x = "0", h = 0.1)
#> Not numeric: x

(In fact, chk_numeric is equivalent to the pre-built checker vld_numeric.)

Conversely, apply globalize to impose your localized checker globally:

difference <- firmly(function(x, y) x - y, globalize(chk_numeric))

difference(3, 1)
#> [1] 2
difference(as.POSIXct("2017-01-01", "UTC"), as.POSIXct("2016-01-01", "UTC"))
#> Error: difference(x = as.POSIXct("2017-01-01", "UTC"), y = as.POSIXct("2016-01-01", "UTC"))
#> 1) Not numeric: `x`
#> 2) Not numeric: `y`

  1. The inspiration to use ~ as a quoting operator came from the vignette Non-standard evaluation, by Hadley Wickham.↩︎

  2. Adapted from an example in Section 6.3 of Chambers, Extending R, CRC Press, 2016. For the sake of the example, ignore the fact that logic to handle fees does not belong in a function for deposits!↩︎