Bootstrap confidence intervals address a long-standing methodological
limitation of Relative Weights Analysis (RWA): the lack of a framework
for testing whether a relative weight is statistically significant.
This vignette explains how to use the bootstrap methods in the rwa
package to test the statistical significance of predictor importance.
As noted by Tonidandel et al. (2009):
“The difficulty in determining the statistical significance of relative weights stems from the fact that the exact (or small sample) sampling distribution of relative weights is unknown.”
Traditional RWA provides point estimates of relative importance but lacks a framework for statistical inference. Bootstrap methods solve this by empirically estimating the sampling distribution of relative weights.
Bootstrap resampling:
1. Creates multiple samples from your original data
2. Calculates RWA for each bootstrap sample
3. Estimates confidence intervals from the distribution of bootstrap results
4. Enables significance testing by examining whether CIs include zero
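The four steps can be sketched in base R for a single statistic. This is a minimal illustration, not the rwa implementation: it bootstraps the model R-squared rather than relative weights, and it uses a simple percentile interval, whereas the output in this vignette reports BCa intervals. The names rsq, boot_rsq, and excludes_zero are hypothetical.

```r
# Percentile bootstrap for one statistic, using base R only.
# Assumption: model R-squared stands in for a relative weight.
set.seed(42)

n_boot <- 1000
n_obs  <- nrow(mtcars)

# Statistic to bootstrap: R-squared of a fixed regression
rsq <- function(d) {
  summary(lm(mpg ~ cyl + disp + hp + gear, data = d))$r.squared
}

# Steps 1 and 2: resample rows with replacement, recompute the statistic
boot_rsq <- replicate(n_boot, {
  idx <- sample.int(n_obs, size = n_obs, replace = TRUE)
  rsq(mtcars[idx, ])
})

# Step 3: 95% percentile interval from the bootstrap distribution
ci <- quantile(boot_rsq, probs = c(0.025, 0.975))

# Step 4: does the interval exclude zero?
excludes_zero <- ci[[1]] > 0
print(ci)
print(excludes_zero)
```

The same pattern applies to any statistic that can be recomputed on a resampled data frame; only the body of rsq() would change.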
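# Packages used throughout this vignette
library(rwa)
library(dplyr)    # %>%, filter(), select(), mutate(), pull(), sample_n()
library(ggplot2)  # diamonds dataset
# Bootstrap analysis with 1000 samples
result_bootstrap <- mtcars %>%
  rwa(outcome = "mpg",
      predictors = c("cyl", "disp", "hp", "gear"),
      bootstrap = TRUE,
      n_bootstrap = 1000,
      conf_level = 0.95)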
# View results with confidence intervals
result_bootstrap$result
#> Variables Raw.RelWeight Rescaled.RelWeight Sign Raw.RelWeight.CI.Lower
#> 1 hp 0.2321744 29.79691 - 0.17792555
#> 2 cyl 0.2284797 29.32274 - 0.18093071
#> 3 disp 0.2221469 28.50999 - 0.15732865
#> 4 gear 0.0963886 12.37037 + 0.04372106
#> Raw.RelWeight.CI.Upper Raw.Significant
#> 1 0.2796847 TRUE
#> 2 0.2828046 TRUE
#> 3 0.2750820 TRUE
#> 4 0.1749625 TRUE
The bootstrap analysis enhances the standard RWA output with:
# Bootstrap-specific information
cat("Bootstrap samples used:", result_bootstrap$bootstrap$n_bootstrap, "\n")
#> Bootstrap samples used: 1000
# Detailed CI information
print(result_bootstrap$bootstrap$ci_results$raw_weights)
#> # A tibble: 4 × 6
#> variable weight_index ci_lower ci_upper ci_method ci_type
#> <chr> <int> <dbl> <dbl> <chr> <chr>
#> 1 cyl 1 0.181 0.283 bca raw
#> 2 disp 2 0.157 0.275 bca raw
#> 3 hp 3 0.178 0.280 bca raw
#> 4 gear 4 0.0437 0.175 bca raw
# Identify significant predictors
significant_vars <- result_bootstrap$result %>%
  filter(Raw.Significant == TRUE) %>%
  pull(Variables)
cat("Significant predictors:", paste(significant_vars, collapse = ", "))
#> Significant predictors: hp, cyl, disp, gear
For detailed analysis including focal variable comparisons:
# Comprehensive bootstrap with focal variable comparison
result_comprehensive <- mtcars %>%
  rwa(outcome = "mpg",
      predictors = c("cyl", "disp", "hp", "gear", "wt"),
      bootstrap = TRUE,
      comprehensive = TRUE,
      focal = "wt",        # Compare other variables to weight
      n_bootstrap = 500)   # Fewer samples for speed
# Access all bootstrap results
names(result_comprehensive$bootstrap$ci_results)
#> [1] "raw_weights" "random_comparison" "focal_comparison"
Key parameters for bootstrap analysis:
n_bootstrap: Number of bootstrap samples (default: 1000)
conf_level: Confidence level (default: 0.95)
focal: Focal variable for comparative analysis
comprehensive: Enable additional bootstrap tests
# Example with different parameters
custom_bootstrap <- mtcars %>%
  rwa(outcome = "mpg",
      predictors = c("cyl", "disp"),
      bootstrap = TRUE,
      n_bootstrap = 2000,  # More samples for precision
      conf_level = 0.99)   # 99% confidence intervals
custom_bootstrap$result
#> Variables Raw.RelWeight Rescaled.RelWeight Sign Raw.RelWeight.CI.Lower
#> 1 cyl 0.3837012 50.51586 - 0.2944963
#> 2 disp 0.3758646 49.48414 - 0.2309624
#> Raw.RelWeight.CI.Upper Raw.Significant
#> 1 0.4553279 TRUE
#> 2 0.4596469 TRUE
Rescaled weight confidence intervals should be interpreted with caution due to compositional data constraints. They are not recommended for formal statistical inference.
# Rescaled CIs (use with caution)
result_rescaled_ci <- mtcars %>%
  rwa(outcome = "mpg",
      predictors = c("cyl", "disp", "hp"),
      bootstrap = TRUE,
      include_rescaled_ci = TRUE,
      n_bootstrap = 500)
# Note the warning message about interpretation
result_rescaled_ci$result
#> Variables Raw.RelWeight Rescaled.RelWeight Sign Raw.RelWeight.CI.Lower
#> 1 disp 0.2793550 36.37966 - 0.2143641
#> 2 cyl 0.2723144 35.46279 - 0.2103480
#> 3 hp 0.2162184 28.15755 - 0.1469883
#> Raw.RelWeight.CI.Upper Raw.Significant Rescaled.RelWeight.CI.Lower
#> 1 0.3521790 TRUE 29.66272
#> 2 0.3283791 TRUE 29.55396
#> 3 0.2650235 TRUE 20.72190
#> Rescaled.RelWeight.CI.Upper
#> 1 42.46211
#> 2 42.58727
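#> 3 36.93194
Rescaled weights are compositional data (they sum to 100%), which creates dependencies between variables: an increase in one predictor's share must be offset by a decrease in the others. This violates the assumptions needed to interpret each confidence interval independently.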
Recommendation: Focus on raw weight confidence intervals for statistical inference.
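The dependency can be seen with simple arithmetic. The sketch below takes the three raw weights printed in the output above and assumes that a rescaled weight is the raw weight divided by the sum of raw weights, times 100 (this reproduces the printed rescaled values). The perturbation of 0.05 and the names rescale and raw_shifted are hypothetical, chosen only to illustrate the constraint.

```r
# Raw weights copied from the three-predictor output above
raw <- c(disp = 0.2793550, cyl = 0.2723144, hp = 0.2162184)

# Assumption: rescaled weight = share of the summed raw weights, in percent
rescale <- function(w) w / sum(w) * 100
rescaled <- rescale(raw)
print(round(rescaled, 5))
print(sum(rescaled))  # the shares add up to 100 by construction

# Hypothetical perturbation: increase only the raw weight of disp
raw_shifted <- raw
raw_shifted["disp"] <- raw["disp"] + 0.05
rescaled_shifted <- rescale(raw_shifted)

# cyl and hp are unchanged in raw terms, yet their rescaled shares drop
print(rescaled_shifted - rescaled)
```

Because every rescaled weight depends on all the raw weights, a bootstrap interval for one rescaled weight partly reflects sampling variation in the other predictors.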
# Analyze diamond price drivers
diamonds_subset <- diamonds %>%
  select(price, carat, depth, table, x, y, z) %>%
  sample_n(1000)  # Sample for faster computation
diamond_rwa <- diamonds_subset %>%
  rwa(outcome = "price",
      predictors = c("carat", "depth", "table", "x", "y", "z"),
      bootstrap = TRUE,
      applysigns = TRUE,
      n_bootstrap = 500)
print(diamond_rwa$result)
#> Variables Raw.RelWeight Rescaled.RelWeight Sign Sign.Rescaled.RelWeight
#> 1 carat 0.275798498 32.3605816 + 32.3605816
#> 2 x 0.226692851 26.5988124 + 26.5988124
#> 3 z 0.215436341 25.2780395 + 25.2780395
#> 4 y 0.127891838 15.0060798 + 15.0060798
#> 5 depth 0.003592696 0.4215459 - -0.4215459
#> 6 table 0.002854589 0.3349408 - -0.3349408
#> Raw.RelWeight.CI.Lower Raw.RelWeight.CI.Upper Raw.Significant
#> 1 0.2426626893 0.316240918 TRUE
#> 2 0.2022522207 0.256183433 TRUE
#> 3 0.2011090617 0.234801281 TRUE
#> 4 0.0479587661 0.188165367 TRUE
#> 5 0.0009965771 0.004973382 TRUE
#> 6 0.0002526790 0.004088556 TRUE
# Focus on significant predictors (results are already sorted by importance)
significant_drivers <- diamond_rwa$result %>%
  filter(Raw.Significant == TRUE) %>%
  select(Variables, Rescaled.RelWeight, Sign.Rescaled.RelWeight)
cat("Significant diamond price drivers (sorted by importance):\n")
#> Significant diamond price drivers (sorted by importance):
print(significant_drivers)
#> Variables Rescaled.RelWeight Sign.Rescaled.RelWeight
#> 1 carat 32.3605816 32.3605816
#> 2 x 26.5988124 26.5988124
#> 3 z 25.2780395 25.2780395
#> 4 y 15.0060798 15.0060798
#> 5 depth 0.4215459 -0.4215459
#> 6 table 0.3349408 -0.3349408
cat("\nModel R-squared:", round(diamond_rwa$rsquare, 3))
#>
#> Model R-squared: 0.852
# Check your sample size
n_obs <- mtcars %>%
  select(mpg, cyl, disp, hp, gear) %>%
  na.omit() %>%
  nrow()
cat("Sample size:", n_obs)
#> Sample size: 32
# Apply a floor of 1000 so the recommendation matches the rule of thumb below
cat("\nRecommended bootstrap samples:", max(1000, min(2000, n_obs * 10)))
#>
#> Recommended bootstrap samples: 1000
# Rule of thumb: At least 1000 bootstrap samples, more for smaller datasets
# Examine CI characteristics
ci_data <- result_bootstrap$bootstrap$ci_results$raw_weights
print(head(ci_data))
#> # A tibble: 4 × 6
#> variable weight_index ci_lower ci_upper ci_method ci_type
#> <chr> <int> <dbl> <dbl> <chr> <chr>
#> 1 cyl 1 0.181 0.283 bca raw
#> 2 disp 2 0.157 0.275 bca raw
#> 3 hp 3 0.178 0.280 bca raw
#> 4 gear 4 0.0437 0.175 bca raw
# Assess precision
ci_analysis <- ci_data %>%
  mutate(
    significant = ci_lower > 0 | ci_upper < 0,
    ci_width = ci_upper - ci_lower,
    precision = case_when(
      ci_width < 0.05 ~ "High precision",
      ci_width < 0.15 ~ "Medium precision",
      TRUE ~ "Low precision"
    )
  )
print(ci_analysis)
#> # A tibble: 4 × 9
#> variable weight_index ci_lower ci_upper ci_method ci_type significant ci_width
#> <chr> <int> <dbl> <dbl> <chr> <chr> <lgl> <dbl>
#> 1 cyl 1 0.181 0.283 bca raw TRUE 0.102
#> 2 disp 2 0.157 0.275 bca raw TRUE 0.118
#> 3 hp 3 0.178 0.280 bca raw TRUE 0.102
#> 4 gear 4 0.0437 0.175 bca raw TRUE 0.131
#> # ℹ 1 more variable: precision <chr>
The package automatically selects the best available bootstrap CI method:
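A fallback of this kind could look like the following sketch. It assumes the boot package (the boot_object printed in this vignette has class "boot") and tries a BCa interval first, dropping back to a percentile interval if the BCa computation fails. The statistic (model R-squared), the helper rsq_stat, and the fallback order are assumptions made for illustration, not the package's documented behaviour.

```r
library(boot)

# Statistic in the form boot() expects: (data, indices) -> numeric value.
# Assumption: model R-squared stands in for a relative weight.
rsq_stat <- function(data, idx) {
  fit <- lm(mpg ~ cyl + disp, data = data[idx, ])
  summary(fit)$r.squared
}

set.seed(123)
boot_out <- boot(data = mtcars, statistic = rsq_stat, R = 1000)

# Try BCa first; if it errors, fall back to a percentile interval
ci <- tryCatch(
  boot.ci(boot_out, conf = 0.95, type = "bca"),
  error = function(e) boot.ci(boot_out, conf = 0.95, type = "perc")
)

# boot.ci() stores limits in columns 4 and 5 of the method's matrix
if (!is.null(ci$bca)) {
  ci_method <- "bca"
  ci_limits <- ci$bca[4:5]
} else {
  ci_method <- "perc"
  ci_limits <- ci$percent[4:5]
}

cat("Method:", ci_method, "\n")
print(ci_limits)
```

BCa intervals need the number of bootstrap replicates to be at least the number of observations, which is one reason a percentile fallback can be useful for small n_bootstrap values.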
# For large datasets or many predictors, consider:
# 1. Reduce bootstrap samples for initial exploration
quick_result <- mtcars %>%
  rwa(outcome = "mpg",
      predictors = c("cyl", "disp"),
      bootstrap = TRUE,
      n_bootstrap = 500)  # Faster
# 2. Use comprehensive analysis only when needed
# comprehensive = TRUE adds computational overhead
# 3. Consider parallel processing for very large analyses
#    (not currently implemented, but could be a future enhancement)
# Bootstrap objects can be large - access specific components
str(result_bootstrap$bootstrap, max.level = 1)
#> List of 6
#> $ boot_object :List of 11
#> ..- attr(*, "class")= chr "boot"
#> ..- attr(*, "boot_type")= chr "boot"
#> $ ci_results :List of 1
#> $ n_bootstrap : num 1000
#> $ conf_level : num 0.95
#> $ comprehensive: logi FALSE
#> $ focal : NULL
# For memory efficiency, extract only needed results
ci_summary <- result_bootstrap$bootstrap$ci_results$raw_weights %>%
  select(variable, ci_lower, ci_upper, ci_method)
print(ci_summary)
#> # A tibble: 4 × 4
#> variable ci_lower ci_upper ci_method
#> <chr> <dbl> <dbl> <chr>
#> 1 cyl 0.181 0.283 bca
#> 2 disp 0.157 0.275 bca
#> 3 hp 0.178 0.280 bca
#> 4 gear 0.0437 0.175 bca
# 1. Check for perfect multicollinearity
cor_check <- mtcars %>%
  select(cyl, disp, hp, gear) %>%
  cor()
# Look for off-diagonal correlations that are numerically +1 or -1
# (a tolerance avoids relying on exact floating-point equality)
perfect_cor <- which(abs(cor_check) > 1 - 1e-8 & row(cor_check) != col(cor_check), arr.ind = TRUE)
if (nrow(perfect_cor) > 0) {
  cat("Perfect multicollinearity detected - remove redundant variables")
} else {
  cat("No perfect multicollinearity detected")
}
#> No perfect multicollinearity detected
# 2. Ensure adequate sample size
min_sample_size <- 5 * length(c("cyl", "disp", "hp", "gear")) # 5 obs per predictor
actual_sample_size <- nrow(na.omit(mtcars[c("mpg", "cyl", "disp", "hp", "gear")]))
cat("\nMinimum recommended sample size:", min_sample_size)
#>
#> Minimum recommended sample size: 20
cat("\nActual sample size:", actual_sample_size)
#>
#> Actual sample size: 32
When reporting bootstrap RWA results, include:
# Generate a summary report
report_data <- result_bootstrap$result %>%
  filter(Raw.Significant == TRUE) %>%
  arrange(desc(Rescaled.RelWeight)) %>%
  select(Variables, Rescaled.RelWeight, Raw.RelWeight.CI.Lower, Raw.RelWeight.CI.Upper)
cat("Relative Weights Analysis Results\n")
#> Relative Weights Analysis Results
cat("=================================\n")
#> =================================
cat("Sample size:", result_bootstrap$n, "\n")
#> Sample size: 32
cat("Bootstrap samples:", result_bootstrap$bootstrap$n_bootstrap, "\n")
#> Bootstrap samples: 1000
cat("Model R-squared:", round(result_bootstrap$rsquare, 3), "\n\n")
#> Model R-squared: 0.779
cat("Significant Predictors:\n")
#> Significant Predictors:
print(report_data)
#> Variables Rescaled.RelWeight Raw.RelWeight.CI.Lower Raw.RelWeight.CI.Upper
#> 1 hp 29.79691 0.17792555 0.2796847
#> 2 cyl 29.32274 0.18093071 0.2828046
#> 3 disp 28.50999 0.15732865 0.2750820
#> 4 gear 12.37037 0.04372106 0.1749625
Bootstrap Methods in RWA:
General Bootstrap Theory:
Compositional Data Analysis:
Bootstrap confidence intervals provide a robust solution for statistical inference in Relative Weights Analysis. By following the guidelines in this vignette, researchers can:
The bootstrap functionality in the rwa package
represents a significant advancement in making RWA a complete tool for
both exploratory analysis and confirmatory research.