Assume you trained a dimensionality-reduction model (PCA, PLS, …) on p variables but, at prediction time, only a subset of them is available.
You still want the latent scores in the same component space, so that downstream models, dashboards, alarms, … keep running.
That’s exactly what
partial_project(model, new_data_subset, colind = which.columns)
does:
new_data_subset (n × q) ─► project into latent space (n × k)
with q ≤ p. If the loading vectors are orthonormal this is a simple dot product; otherwise a ridge-regularised least-squares solve is used.
library(multivarious)  # bi_projector(), partial_project(), fit(), center()
library(tibble)
library(ggplot2)

set.seed(1)
n <- 100
p <- 8
X <- matrix(rnorm(n * p), n, p)
# Fit a centred 3-component PCA (via SVD)
# Manually center the data and create fitted preprocessor
Xc <- scale(X, center = TRUE, scale = FALSE)
svd_res <- svd(Xc, nu = 0, nv = 3)
# Create a fitted centering preprocessor
preproc_fitted <- fit(center(), X)
pca <- bi_projector(
  v = svd_res$v,
  s = Xc %*% svd_res$v,
  sdev = svd_res$d[1:3] / sqrt(n - 1),  # correct scaling for sdev
  preproc = preproc_fitted
)

Suppose columns 7 and 8 are unavailable for a new batch.
X_miss <- X[, 1:6]  # keep only first 6 columns
col_subset <- 1:6   # their positions in the **original** X

scores_full <- pca$s  # full-data scores, used as the reference below
scores_part <- partial_project(pca, X_miss, colind = col_subset)
# How close are the results?
plot_df <- tibble(
  full = scores_full[, 1],
  part = scores_part[, 1]
)
ggplot(plot_df, aes(full, part)) +
  geom_point() +
  geom_abline(col = "red") +
  coord_equal() +
  labs(title = "Component 1: full vs. partial projection") +
  theme_minimal()

Even with two variables missing, the ridge LS step recovers latent scores that lie almost on the 1:1 line.
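That ridge step can be sketched in a few lines of base R. This is an illustrative reimplementation, not the package internals; the names `ridge_scores` and `lambda` are ours:

```r
# Given loadings V (p x k) and rows X_obs (n x q) observed only at columns idx,
# solve min_t ||x_obs - V[idx, ] %*% t||^2 + lambda * ||t||^2 for each row
ridge_scores <- function(V, X_obs, idx, lambda = 1e-6) {
  Vs <- V[idx, , drop = FALSE]                  # q x k sub-loadings
  A  <- crossprod(Vs) + lambda * diag(ncol(V))  # k x k regularised normal matrix
  t(solve(A, crossprod(Vs, t(X_obs))))          # n x k recovered scores
}
```

When all p columns are observed and V has orthonormal columns, A is (up to lambda) the identity, and this collapses to the plain dot product X_obs %*% V.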
If you expect many rows with the same subset of features, create a specialised projector once and reuse it:
# Assuming partial_projector is available
pca_1to6 <- partial_projector(pca, 1:6) # keeps a reference + cache
# project 1000 new observations that only have the first 6 vars
new_batch <- matrix(rnorm(1000 * 6), 1000, 6)
scores_fast <- project(pca_1to6, new_batch)
dim(scores_fast)  # 1000 × 3
#> [1] 1000    3

Internally, partial_projector() stores the mapping v[1:6, ] and a pre-computed inverse, so calls to project() are as cheap as a matrix multiplication.
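The caching idea can be sketched like this (our own illustrative names, not the package's internals):

```r
# Pre-compute the sub-loadings and the regularised inverse once,
# so that each later projection is a single matrix product
make_partial <- function(V, idx, lambda = 1e-6) {
  Vs   <- V[idx, , drop = FALSE]
  Ainv <- solve(crossprod(Vs) + lambda * diag(ncol(V)))  # k x k, computed once
  function(X_new) X_new %*% Vs %*% Ainv                  # n x q -> n x k scores
}
```

Because Ainv is symmetric, X_new %*% Vs %*% Ainv is exactly the transposed ridge solution from before, now without any per-call solve.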
For multiblock fits (created with
multiblock_projector()), project_block()
provides a convenient wrapper around partial_project():
# Create a multiblock projector from our PCA
# Suppose columns 1-4 are "Block A" (block 1) and columns 5-8 are "Block B" (block 2)
block_indices <- list(1:4, 5:8)
mb <- multiblock_projector(
  v = pca$v,
  preproc = pca$preproc,
  block_indices = block_indices
)
# Now we can project using only Block 2's data (columns 5-8)
X_block2 <- X[, 5:8]
scores_block2 <- project_block(mb, X_block2, block = 2)
# Compare to full projection
head(round(cbind(full = scores_full[,1], block2 = scores_block2[,1]), 2))
#>       full block2
#> [1,] -0.02  -0.30
#> [2,] -2.77  -3.28
#> [3,]  0.57  -0.09
#> [4,] -0.36   0.03
#> [5,]  0.91   1.24
#> [6,]  1.81   1.85

This is equivalent to calling partial_project(mb, X_block2, colind = 5:8) but reads more naturally when working with block structures.
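Continuing the running example, that equivalence is easy to verify directly:

```r
# Both calls should return the same latent scores for Block 2's columns
all.equal(
  project_block(mb, X_block2, block = 2),
  partial_project(mb, X_block2, colind = 5:8)
)
```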
Partial projection is handy even when all measurements exist: you can score a region of interest (ROI) on its own and compare it with the full-data scores. Assume columns 1–5 of X (5 instead of, say, 50 for brevity) form our ROI.
roi_cols <- 1:5 # pretend these are the ROI voxels
X_roi <- X[, roi_cols] # same matrix from Section 2
roi_scores <- partial_project(pca, X_roi, colind = roi_cols)
# Compare component 1 from full vs ROI
df_roi <- tibble(
  full = scores_full[, 1],
  roi  = roi_scores[, 1]
)
ggplot(df_roi, aes(full, roi)) +
  geom_point(alpha = .6) +
  geom_abline(col = "red") +
  coord_equal() +
  labs(title = "Component 1 scores: full data vs ROI") +
  theme_minimal()

Interpretation: if the two sets of scores align tightly, the ROI variables are driving this component. A strong deviation would reveal that other variables dominate the global pattern.
Using the multiblock projector from Section 4, we can see how individual observations score when viewed through just one block:
# Get scores for observation 1 using only Block 1 variables (columns 1-4)
subject1_block1 <- project_block(mb, X[1, 1:4, drop = FALSE], block = 1)
# Get scores for the same observation using only Block 2 variables (columns 5-8)
subject1_block2 <- project_block(mb, X[1, 5:8, drop = FALSE], block = 2)
# Compare: do both blocks tell the same story about this observation?
cat("Subject 1 scores from Block 1:", round(subject1_block1, 2), "\n")
#> Subject 1 scores from Block 1: 3.43 0.72 -1.06
cat("Subject 1 scores from Block 2:", round(subject1_block2, 2), "\n")
#> Subject 1 scores from Block 2: -0.3 1.36 0.9
cat("Subject 1 scores from full data:", round(scores_full[1,], 2), "\n")
#> Subject 1 scores from full data: -0.02 1.2 0.49

This lets you assess whether an observation's position in the latent space is consistent across blocks, or whether one block tells a different story.
Typical partial_project() scenarios:

| Scenario | What you pass | Typical call |
|---|---|---|
| Sensor outage / missing features | matrix with observed cols only | partial_project(mod, X_obs, colind = idx) |
| Region of interest (ROI) | ROI columns of the data | partial_project(mod, X[, ROI], ROI) |
| Block-specific latent scores | full block matrix | project_block(mb, blkData, block = b) |
| "What-if": vary a single variable set | varied cols + zeros elsewhere | partial_project() with matching colind |
The component space stays identical throughout, so downstream analytics, classifiers, or control charts continue to work with no re-training.
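As a concrete instance of the "what-if" row, one could perturb a single variable and re-project only the varied columns. A sketch, reusing pca and X from the running example (the names X_whatif, scores_whatif, and delta are ours):

```r
# Shift variable 1 by +1 unit while keeping variable 2 as observed
X_whatif <- X[, 1:2]
X_whatif[, 1] <- X_whatif[, 1] + 1
scores_whatif <- partial_project(pca, X_whatif, colind = 1:2)

# Per-observation shift in component 1 attributable to variable 1
delta <- scores_whatif[, 1] -
  partial_project(pca, X[, 1:2], colind = 1:2)[, 1]
```

Because both projections live in the same component space, delta isolates the latent-score effect of the perturbation, with no re-training.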
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sonoma 14.3
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
#>
#> time zone: America/Toronto
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] glmnet_4.1-10 Matrix_1.7-3 knitr_1.51 tibble_3.3.1
#> [5] dplyr_1.1.4 ggplot2_4.0.1 multivarious_0.3.1
#>
#> loaded via a namespace (and not attached):
#> [1] GPArotation_2025.3-1 utf8_1.2.6 sass_0.4.10
#> [4] future_1.68.0 generics_0.1.4 shape_1.4.6.1
#> [7] lattice_0.22-7 listenv_0.10.0 digest_0.6.39
#> [10] magrittr_2.0.4 evaluate_1.0.5 grid_4.5.1
#> [13] RColorBrewer_1.1-3 iterators_1.0.14 fastmap_1.2.0
#> [16] foreach_1.5.2 jsonlite_2.0.0 ggrepel_0.9.6
#> [19] RSpectra_0.16-2 survival_3.8-3 mgcv_1.9-3
#> [22] scales_1.4.0 pls_2.8-5 codetools_0.2-20
#> [25] jquerylib_0.1.4 cli_3.6.5 crayon_1.5.3
#> [28] rlang_1.1.7 chk_0.10.0 parallelly_1.45.1
#> [31] future.apply_1.20.0 splines_4.5.1 withr_3.0.2
#> [34] cachem_1.1.0 yaml_2.3.12 otel_0.2.0
#> [37] tools_4.5.1 parallel_4.5.1 corpcor_1.6.10
#> [40] globals_0.18.0 rsvd_1.0.5 assertthat_0.2.1
#> [43] vctrs_0.7.0 R6_2.6.1 matrixStats_1.5.0
#> [46] proxy_0.4-27 lifecycle_1.0.5 MASS_7.3-65
#> [49] irlba_2.3.5.1 pkgconfig_2.0.3 pillar_1.11.1
#> [52] bslib_0.9.0 geigen_2.3 gtable_0.3.6
#> [55] glue_1.8.0 Rcpp_1.1.1 xfun_0.55
#> [58] tidyselect_1.2.1 svd_0.5.8 farver_2.1.2
#> [61] nlme_3.1-168 htmltools_0.5.9 labeling_0.4.3
#> [64] rmarkdown_2.30 compiler_4.5.1 S7_0.2.1