% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cor_df.R
\name{cor_df}
\alias{cor_df}
\title{Compute a dataframe of signed pairwise correlations}
\usage{
cor_df(df = NULL, predictors = NULL, quiet = FALSE, ...)
}
\arguments{
\item{df}{(required; dataframe, tibble, or sf) A dataframe with responses
(optional) and predictors. Must have at least 10 rows for pairwise
correlation analysis, and at least \code{10 * (length(predictors) - 1)}
rows for VIF analysis. Default: NULL.}

\item{predictors}{(optional; character vector or NULL) Names of the
predictors in \code{df}. If NULL, all columns except \code{responses} and
constant/near-zero-variance columns are used. Default: NULL.}

\item{quiet}{(optional; logical) If FALSE, messages are printed. Default: FALSE.}

\item{...}{(optional) Internal args (e.g. \code{function_name} for
\code{\link{validate_arg_function_name}}, a precomputed correlation matrix
\code{m}, or cross-validation args for \code{\link{preference_order}}).}
}
\value{
A dataframe with columns:
\itemize{
\item \code{x}: character, first predictor name.
\item \code{y}: character, second predictor name.
\item \code{correlation}: numeric, Pearson correlation (numeric vs. numeric and numeric vs. categorical) or Cramer's V (categorical vs. categorical).
}
}
\description{
Computes pairwise correlations between predictors, using a method suited to the types of the two variables in each pair:
\itemize{
\item \strong{Numeric vs. Numeric}: Pearson correlation via \code{stats::cor()}.
\item \strong{Numeric vs. Categorical}: Target-encodes the categorical variable using the numeric variable as reference via \code{\link[=target_encoding_lab]{target_encoding_lab()}} with the leave-one-out method, then computes the Pearson correlation.
\item \strong{Categorical vs. Categorical}: Cramer's V via \code{\link[=cor_cramer]{cor_cramer()}} as a measure of association. See \code{\link[=cor_cramer]{cor_cramer()}} for important notes on mixing Pearson correlation and Cramer's V in multicollinearity analysis.
}

Parallelization via \code{future::plan()} and progress bars via \code{progressr::handlers()} are supported, but they are only beneficial for large datasets with categorical predictors. Numeric-only correlations use neither parallelization nor progress bars. For example, with 16 workers, 30k rows (dataframe \link{vi}), and 49 numeric and 12 categorical predictors (see \link{vi_predictors}), parallelization achieves a 5.4x speedup (147s → 27s).
}
\examples{
data(vi_smol)

## OPTIONAL: parallelization setup
## irrelevant when all predictors are numeric
## only worth it for large data with many categoricals
# future::plan(
#   future::multisession,
#   workers = future::availableCores() - 1
# )

## OPTIONAL: progress bar
# progressr::handlers(global = TRUE)

## predictors
predictors <- c(
  "koppen_zone", # character
  "soil_type", # factor
  "topo_elevation", # numeric
  "soil_temperature_mean" # numeric
)

x <- cor_df(
  df = vi_smol,
  predictors = predictors
)

x
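
## OPTIONAL: a minimal sketch, not part of the package API.
## Keeps the pairs whose absolute correlation reaches a threshold.
## abs() is used because the "correlation" column is signed.
## The threshold 0.5 and the name strong_pairs are arbitrary
## choices made for illustration only.
strong_pairs <- x[abs(x$correlation) >= 0.5, ]

strong_pairs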

## OPTIONAL: disable parallelization
# future::plan(future::sequential)
}
\seealso{
Other multicollinearity_assessment: 
\code{\link{collinear_stats}()},
\code{\link{cor_clusters}()},
\code{\link{cor_cramer}()},
\code{\link{cor_matrix}()},
\code{\link{cor_stats}()},
\code{\link{vif}()},
\code{\link{vif_df}()},
\code{\link{vif_stats}()}
}
\concept{multicollinearity_assessment}
