Multibias makes it easy to simultaneously adjust for multiple biases in causal inference research. This document walks you through the key steps of performing the analysis.
Represent your observed data as a `data_observed` object. Here you include the data frame, specify the key variables in the data, and identify the bias or biases impacting the data. All epidemiological biases can be generalized into four main groups:

- Uncontrolled confounding (`uc`)
- Exposure misclassification (`em`)
- Outcome misclassification (`om`)
- Selection bias (`sel`)
Multibias can handle bias adjustment for most combinations of the above four biases.
For purposes of demonstration, multibias includes datasets with different bias combinations. For a given bias or biases, it includes the biased data (e.g., `df_uc_sel`: missing a confounder and excluding the non-selected subjects) and the source used to derive the biased data (e.g., `df_uc_sel_source`: including data on the missing confounder and on the subjects who were not selected).
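As a sketch, one of these biased datasets can be wrapped in a `data_observed` object as follows (this mirrors the call used in the bootstrap example later in this article; the variables are defined below):

```r
library(multibias)

# Wrap the biased data, declaring the key variables and the biases
# present: an uncontrolled confounder ("uc") and selection bias ("sel").
df_observed <- data_observed(
  df_uc_sel,
  bias = c("uc", "sel"),
  exposure = "X",
  outcome = "Y",
  confounders = c("C1", "C2", "C3")
)
```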
Each dataset has variables defined similarly:

- `X` = binary exposure
- `Y` = binary outcome
- `C1`-`C3` = binary confounders

Next, you need some information that can be used to quantify the assumed bias or biases. There are two options here.
One option is to list the assumed parameters for the corresponding bias equations. The necessary bias equations are provided in the documentation of each `adjust()` function. These equations can be derived using the methods outlined in the article here. In our example, we have bias parameters to predict the missing binary confounder `U` (as the vector `u_coefs`) and the study selection indicator `S` (as the vector `s_coefs`).
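As a minimal sketch of this option, the two coefficient vectors can be bundled into the `bias_params` object that is passed to `multibias_adjust()` below as `bp`. The coefficient values here are hypothetical, and the `coef_list` naming is our assumption; check `?bias_params` for the exact interface.

```r
# Hypothetical values for illustration only: intercept and slopes of the
# models predicting U and S, per the bias equations in the adjust() docs.
u_coefs <- c(-0.32, 0.59, 0.69)
s_coefs <- c(0.00, 0.74, 0.19)

# Assumed naming: "u" for the confounder model, "s" for the selection model.
bp <- bias_params(coef_list = list(u = u_coefs, s = s_coefs))
```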
A second option is to specify a validation data source as a `data_validation` object. In order to adjust for a given bias, the validation data must have the corresponding missing data. In the example here, the validation data has additional columns for the missing binary confounder `U` and the indicator `S` for whether a given individual was selected into the study.
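A sketch of wrapping the validation source follows; the argument names are our assumption and should be verified against `?data_validation`.

```r
# Assumed interface: declare the true exposure and outcome, the full set
# of confounders (including the otherwise missing U), and the selection
# indicator S available in the validation source.
df_validation <- data_validation(
  df_uc_sel_source,
  true_exposure = "X",
  true_outcome = "Y",
  confounders = c("C1", "C2", "C3", "U"),
  selection = "S"
)
```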
Finally, we can run the multi-bias adjustment!
```r
multibias_adjust(
  data_observed = df_observed,
  bias_params = bp
)
#> $estimate
#> [1] 2.001141
#>
#> $ci
#> [1] 1.948436 2.055271
```
or
```r
multibias_adjust(
  data_observed = df_observed,
  data_validation = df_validation
)
#> $estimate
#> [1] 2.009484
#>
#> $ci
#> [1] 1.956566 2.063833
```
We observe that the bias-adjusted odds ratio of the effect of X on Y is approximately 2. This effect can be compared to the effect observed in the biased data to understand whether systematic error is bringing the observed effect closer to the null or further from the null.
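For that comparison, the unadjusted (biased) estimate can come from a standard logistic regression on the observed data; this is a generic sketch using base R, not a multibias function.

```r
# Naive odds ratio from the biased data, ignoring the uncontrolled
# confounder U and the selection mechanism.
naive_fit <- glm(
  Y ~ X + C1 + C2 + C3,
  family = binomial(link = "logit"),
  data = df_uc_sel
)
exp(coef(naive_fit)["X"])
```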
Multibias performs bias adjustment via imputation and/or regression weighting. When imputation is involved, there is some inherent randomness in the sampling, so a single run will not produce an exactly consistent result unless a seed is specified. Bootstrapping is therefore recommended to quantify the random error. Computational performance can be improved via parallelization.
```r
n <- nrow(df_uc_sel)
nreps <- 100
est <- numeric(nreps)

# A seed (via set.seed()) can be specified here for reproducibility.
for (i in 1:nreps) {
  # Resample the biased data with replacement.
  df_bootstrap <- df_uc_sel[sample(seq_len(n), n, replace = TRUE), ]
  df_observed <- data_observed(
    df_bootstrap,
    bias = c("uc", "sel"),
    exposure = "X",
    outcome = "Y",
    confounders = c("C1", "C2", "C3")
  )
  results <- multibias_adjust(
    df_observed,
    df_validation
  )
  est[i] <- results$estimate
}

# odds ratio estimate
round(median(est), 2)
#> [1] 1.99

# confidence interval
round(quantile(est, c(.025, .975)), 2)
#> 2.5% 97.5%
#> 1.94 2.04
```
When adjusting via bias parameters, the user can alternatively provide each bias parameter value as a probability distribution. The resulting confidence interval from bootstrapping can then quantify uncertainty in both the random error and the systematic error. Check out the vignette article “Multibias Validation” for a demonstration of this approach.
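As a conceptual sketch of that approach (the distributions and the `bias_params` usage here are illustrative assumptions; the vignette shows the package's actual interface), the bias parameters can be redrawn from assumed distributions on each bootstrap replicate:

```r
# Conceptual sketch only: draw each bias parameter from an assumed normal
# distribution inside the bootstrap loop, so the percentile interval
# reflects systematic as well as random error. The means and spreads
# below are hypothetical.
for (i in 1:nreps) {
  df_bootstrap <- df_uc_sel[sample(seq_len(n), n, replace = TRUE), ]
  bp_i <- bias_params(coef_list = list(
    u = rnorm(3, mean = u_coefs, sd = 0.1),
    s = rnorm(3, mean = s_coefs, sd = 0.1)
  ))
  results <- multibias_adjust(
    data_observed(
      df_bootstrap,
      bias = c("uc", "sel"),
      exposure = "X",
      outcome = "Y",
      confounders = c("C1", "C2", "C3")
    ),
    bias_params = bp_i
  )
  est[i] <- results$estimate
}
```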