Introduction to multibias

Multibias makes it easy to simultaneously adjust for multiple biases in causal inference research. This document walks through the key steps in performing the analysis.

1. Create data_observed

Represent your observed data as a data_observed object. Here you include the dataframe, specify the key variables in the data, and identify the bias impacting the data. All epidemiological biases can be generalized into four main groups:

- Uncontrolled confounding (uc)
- Exposure misclassification (em)
- Outcome misclassification (om)
- Selection bias (sel)

Multibias is capable of handling bias adjustment for most of the combinations of the above four biases.

For demonstration purposes, multibias includes datasets corresponding to different bias combinations. For a given bias or combination of biases, it provides both the biased data (e.g., df_uc_sel, which is missing a confounder and excludes the unselected subjects) and the source data used to derive it (e.g., df_uc_sel_source, which includes the missing confounder and the subjects who were not selected).
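To see the difference, you can compare the columns of the two versions (this assumes the multibias package is attached, which also provides the datasets):

library(multibias)

# biased data: only the exposure, outcome, and measured confounders
names(df_uc_sel)

# source data: additionally includes the unmeasured confounder U and
# the selection indicator S
names(df_uc_sel_source)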

Each dataset defines its variables similarly:

- X: binary exposure
- Y: binary outcome
- C1, C2, C3: binary measured confounders
- U: binary unmeasured confounder (source data only)
- S: binary selection indicator (source data only)

Here we create the data_observed object for the data affected by uncontrolled confounding and selection bias:

df_observed <- data_observed(
  df_uc_sel,
  bias = c("uc", "sel"),
  exposure = "X",
  outcome = "Y",
  confounders = c("C1", "C2", "C3")
)

2. Create source for bias adjustment

Next, you need information that can be used to quantify the assumed bias or biases. There are two options.

Option 1: Bias parameters

One option is to supply the assumed parameters for the corresponding bias equations. The necessary bias equations are provided in the documentation of each adjust() function, and they can be derived using the methods outlined in the article here. In our example, we have bias parameters to predict the missing binary confounder U (the u vector of coefficients) and the study selection indicator S (the s vector of coefficients).

bp <- bias_params(
  coef_list = list(
    u = c(-0.19, 0.61, 0.72, -0.09, 0.10, -0.15),
    s = c(-0.01, 0.92, 0.94)
  )
)
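Based on the lengths of these vectors, one natural reading (an interpretation here, not a package specification; consult the adjust() documentation for the exact bias equations) is that each vector holds an intercept followed by slope terms in logistic models for the missing variables:

logit(P(U = 1)) = u[1] + u[2]*X + u[3]*Y + u[4]*C1 + u[5]*C2 + u[6]*C3
logit(P(S = 1)) = s[1] + s[2]*X + s[3]*Y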

Option 2: Validation data

A second option is to specify a validation data source as a data_validation object. To adjust for a given bias, the validation data must contain the corresponding missing information. In this example, the validation data has additional columns for the missing binary confounder U and the indicator S of whether a given individual was selected into the study.

df_validation <- data_validation(
  df_uc_sel_source,
  true_exposure = "X",
  true_outcome = "Y",
  confounders = c("C1", "C2", "C3", "U"),
  selection = "S"
)

3. Run the bias adjustment

Finally, we can run the multi-bias adjustment!

Single run

multibias_adjust(
  data_observed = df_observed,
  bias_params = bp
)
#> $estimate
#> [1] 2.001141
#> 
#> $ci
#> [1] 1.948436 2.055271

or

multibias_adjust(
  data_observed = df_observed,
  data_validation = df_validation
)
#> $estimate
#> [1] 2.009484
#> 
#> $ci
#> [1] 1.956566 2.063833

We observe that the bias-adjusted odds ratio of the effect of X on Y is approximately 2. This can be compared with the effect observed in the biased data to understand whether the systematic error is pulling the observed effect toward the null or away from it.
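For that comparison, one simple way to obtain the observed (biased) estimate is an ordinary logistic regression on the biased data. This is a sketch using base R, not a multibias function:

# observed (biased) odds ratio of X on Y, adjusting only for the
# measured confounders, for comparison with the adjusted estimate above
biased_fit <- glm(
  Y ~ X + C1 + C2 + C3,
  family = binomial(link = "logit"),
  data = df_uc_sel
)
exp(coef(biased_fit)["X"])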

Bootstrapping

Multibias performs bias adjustment via imputation, regression weighting, or a combination of the two. When imputation is involved, there is inherent randomness in the sampling, so a single run will not produce an exactly reproducible result unless a seed is specified. Bootstrapping is therefore recommended to quantify the random error, and computational performance can be improved via parallelization (see the sketch after the bootstrap example below).
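For instance, a reproducible single run only requires setting a seed beforehand (set.seed() is base R; the seed value is arbitrary):

# fix the random number generator so repeated runs match exactly
set.seed(1234)
multibias_adjust(
  data_observed = df_observed,
  bias_params = bp
)

The bootstrap itself resamples the observed data with replacement and re-runs the adjustment on each replicate: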

n <- nrow(df_uc_sel)
nreps <- 100
est <- numeric(nreps) # preallocate the vector of bootstrap estimates

for (i in seq_len(nreps)) {
  # resample the observed data with replacement
  df_bootstrap <- df_uc_sel[sample(seq_len(n), n, replace = TRUE), ]
  df_observed <- data_observed(
    df_bootstrap,
    bias = c("uc", "sel"),
    exposure = "X",
    outcome = "Y",
    confounders = c("C1", "C2", "C3")
  )
  results <- multibias_adjust(
    data_observed = df_observed,
    data_validation = df_validation
  )
  est[i] <- results$estimate
}

# odds ratio estimate
round(median(est), 2)
#> [1] 1.99

# confidence interval
round(quantile(est, c(.025, .975)), 2)
#>  2.5% 97.5% 
#>  1.94  2.04
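The loop above can be parallelized with the base parallel package. This is a sketch assuming a Unix-like system (mclapply uses forking; on Windows, use parLapply with a cluster), reusing n, nreps, and df_validation from above:

library(parallel)

# use a parallel-safe random number generator for reproducibility
RNGkind("L'Ecuyer-CMRG")
set.seed(1234)

boot_once <- function(i) {
  df_bootstrap <- df_uc_sel[sample(seq_len(n), n, replace = TRUE), ]
  df_observed_i <- data_observed(
    df_bootstrap,
    bias = c("uc", "sel"),
    exposure = "X",
    outcome = "Y",
    confounders = c("C1", "C2", "C3")
  )
  multibias_adjust(
    data_observed = df_observed_i,
    data_validation = df_validation
  )$estimate
}

# run the replicates across two cores and collect the estimates
est <- unlist(mclapply(seq_len(nreps), boot_once, mc.cores = 2))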

When adjusting via bias parameters, the user can alternatively provide each bias parameter as a probability distribution rather than a single value. The resulting bootstrap confidence interval then quantifies uncertainty from both the random error and the systematic error. See the vignette article “Multibias Validation” for a demonstration of this approach.
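As a rough manual illustration of the idea (the package's supported interface is demonstrated in that vignette; the normal distributions and sd = 0.1 below are arbitrary assumptions), each bootstrap replicate could draw its own bias parameters around the assumed values:

# hypothetical sketch: re-draw the bias parameters inside each bootstrap
# replicate so the interval also reflects uncertainty in the parameters
bp_i <- bias_params(
  coef_list = list(
    u = rnorm(6, mean = c(-0.19, 0.61, 0.72, -0.09, 0.10, -0.15), sd = 0.1),
    s = rnorm(3, mean = c(-0.01, 0.92, 0.94), sd = 0.1)
  )
)

Passing bp_i (in place of a fixed bp) to multibias_adjust() within the loop would then propagate that parameter uncertainty into the bootstrap distribution of estimates.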