% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/disparity_resample.R
\name{disparity_resample}
\alias{disparity_resample}
\title{Resampling-based estimates (bootstrap or rarefaction) of disparity
/ morphospace occupation}
\usage{
disparity_resample(
  Data,
  group = NULL,
  n_resamples = 1000,
  statistic = "multivariate_variance",
  CI = 0.95,
  bootstrap_rarefaction = "bootstrap",
  sample_size = NULL
)
}
\arguments{
\item{Data}{A data frame, matrix, vector, or 3D array.
Observations (specimens) must be in rows (if a 3D array is
supplied, the third dimension is assumed to index specimens).}

\item{group}{A factor or a vector indicating group membership (will
be coerced to factor). If `NULL` (default) a single analysis is
performed on the full dataset.}

\item{n_resamples}{Number of resampling replicates (default 1000).}

\item{statistic}{Character string identifying the statistic to
compute. One of `"multivariate_variance"`,
`"mean_pairwise_euclidean_distance"`, `"convex_hull_volume"`,
`"claramunt_proper_variance"`. Default is
`"multivariate_variance"`. Ignored for univariate data (vector
input).}

\item{CI}{Desired two-sided confidence interval level (default 0.95)
used for percentile confidence intervals. Use CI=1 to display the
full range (minimum to maximum) of resampled values.}

\item{bootstrap_rarefaction}{Either `"bootstrap"` (default) for
resampling with replacement or `"rarefaction"` for resampling
without replacement.}

\item{sample_size}{Either `NULL` (default), a positive integer
indicating the number of rows to sample, or the character
`"smallest"` to use the size of the smallest group (all groups 
resampled to that size). For `bootstrap_rarefaction=="rarefaction"`,
sampling is without replacement and this parameter is required
(not `NULL`). For `bootstrap_rarefaction=="bootstrap"`, sampling
is with replacement; if `NULL`, uses original group sizes,
otherwise uses the specified sample size. If `"smallest"` is
supplied but no groups are defined, an error is returned.}
}
\value{
A list containing:
 \describe{
   \item{chosen_statistic}{Character vector of length 1 with the
   human-readable name of the statistic used.}
   \item{results}{A data frame with columns `group`, `average`,
   `CI_min`, `CI_max`. One row per group.
     When CI=1, `CI_min` and `CI_max` represent the minimum and
     maximum of resampled values rather than confidence intervals.}
   \item{resampled_values}{If a single group: numeric vector of
   length `n_resamples` with the resampled values.
     If multiple groups: a named list with one numeric vector
     (length `n_resamples`) per group.}
   \item{CI_level}{The CI level used (between 0 and 1). When CI=1,
   ranges are computed instead of confidence intervals.}
}

The returned object has class "disparity_resample" and comes with
associated S3 methods for convenient display and visualization:
\itemize{
  \item \code{\link{print.disparity_resample}}: Prints a formatted
  summary of results including confidence interval overlap
  assessment for multiple groups
  \item \code{\link{plot.disparity_resample}}: Creates a confidence
  interval plot using ggplot2
}
}
\description{
Provides a unified interface to obtain resampled (bootstrap or
rarefied) estimates of several disparity / morphospace occupation
statistics.
}
\details{
The function allows choosing among the following multivariate
statistics:
\itemize{
 \item Multivariate variance (trace of the covariance matrix; sum
 of variances)
 \item Mean pairwise Euclidean distance
 \item Convex hull volume (n-dimensional)
 \item Claramunt proper variance (Claramunt 2010) based on a linear
 shrinkage covariance matrix
}

If the input `Data` is univariate (i.e., a vector), the analysis
defaults to computing the (univariate) variance within each group
(the attribute `statistic` is ignored).

If `bootstrap_rarefaction=="bootstrap"`, the function performs
resampling with replacement (i.e., classical bootstrap) by sampling
rows of the data matrix / data frame.
Optionally, the user can specify a custom sample size via the
`sample_size` argument.
This allows comparison of bootstrap confidence intervals at the
same sample size (essentially, this is rarefaction sampling with
replacement), which can be useful to compare bootstrapped
confidence intervals across different groups for statistics which
are sensitive to sample size (at the expense of broader than
necessary confidence intervals for groups that are larger).

If `bootstrap_rarefaction=="rarefaction"`, the function performs
resampling without replacement at the sample size indicated in
`sample_size` (numeric) or, if `sample_size=="smallest"`, at the
size of the smallest group (all groups are resampled to that size).
Rarefaction requires specifying a valute to the attribute
`sample_size`; an error is returned otherwise.


This function uses the future framework for parallel processing,
requiring packages future and future.apply.
Users should set up their preferred parallelization strategy using
\code{future::plan()} before calling this function.
For example:
\itemize{
  \item \code{future::plan(future::sequential)} for sequential
  processing
  \item \code{future::plan(future::multisession, workers = 4)} for
  parallel processing with 4 workers (works on all platforms
  including Windows)
  \item \code{future::plan(future::multicore, workers = 4)} for
  forked processes (Unix-like systems)
  \item \code{future::plan(future::cluster, workers =
  c("host1", "host2"))} for cluster computing
}
If no plan is set or the future packages are not available,
the function will use sequential processing.
}
\note{
Because of how the computation works, convex hull volume
computation requires the number of observations (specimens) to be
(substantially) greater than the number of variables (dimensions).
In case of shape or similar, consider using the scores along the
first (few/several) principal components.
Sometimes errors are thrown due to near-zero components, in this
case try reducing the number of principal components used.
Examples of use of this statistic with geometric morphometric data
include Drake & Klingenberg 2010 (American Naturalist), Fruciano
et al. 2012 (Environmental Biology of Fishes) and Fruciano et al.
2014 (Biological Journal of the Linnean Society).
Because of the sensitivity of this statistic to outliers, usually
rarefaction is preferred to bootstrapping.

"Multivariate variance" is also called "total variance",
"Procrustes variance" (in geometric morphometrics) and "sum of
univariate variances".
Note how the computation here does not divide variance by sample
size (other than the normal division performed in the computation
of variances).
}
\section{Parallelization}{

This function automatically uses parallel processing via the future
framework, when the packages future and future.apply are installed.
This is particularly useful for large datasets, large number of
resamples or computationally intensive statistics (e.g., convex
hull volume).
The parallelization strategy is determined by the user's choice of
future plan, providing flexibility across different computing
environments (local multicore, cluster, etc.). 
The function performs parallelization at the level of individual
bootstrap/rarefaction replicates within each group. The future plan
should be set up by the user before calling this function using
\code{future::plan()} (see examples). If no plan is set or the
future packages are not available, the function will use sequential
processing.
}

\section{Average estimate}{

For bootstrap resampling, the average estimate reported is the mean
of the bootstrap resampled values (for consistency across all
bootstrap scenarios). For rarefaction, because the purpose is to
make groups comparable at a common (smaller) sample size, the
average estimate reported is the mean of the rarefied resampled
values for that group (i.e., the mean across all rarefaction
replicates).
}

\section{Input data types}{

`Data` can be a data frame, a matrix, a vector, or a 3D array of
landmark coordinates (e.g., p landmarks x k dimensions x n
specimens). In the latter case, the array is converted internally
to a 2D matrix with specimens in rows and (landmark * dimension)
variables in columns.
}

\examples{
set.seed(123)
# Simulate two groups with different means but same covariance
if (requireNamespace("MASS", quietly = TRUE)) {
  X1 = MASS::mvrnorm(20, mu=rep(0, 10), Sigma=diag(10))
  X2 = MASS::mvrnorm(35, mu=rep(2, 10), Sigma=diag(10))
  Data = rbind(X1, X2)
  grp = factor(c(rep("A", nrow(X1)), rep("B", nrow(X2))))

  # Sequential processing
  # future::plan(future::sequential)  # Default sequential processing
  
  # Parallel processing (uncomment to use)
  # future::plan(future::multisession, workers = 2)  # Use 2 workers

  # Bootstrap multivariate variance
  boot_res = disparity_resample(
    Data, group=grp, n_resamples=200,
    statistic="multivariate_variance",
    bootstrap_rarefaction="bootstrap"
  )
  # Direct access to results table
  boot_res$results
  
  # Using the print method for formatted output
  print(boot_res)
  
  # Using the plot method to visualize results
  # plot(boot_res)  # Uncomment to create confidence interval plot

  # Rarefaction (to the smallest group size) of mean pairwise
  # Euclidean distance
  rar_res = disparity_resample(
    Data, group=grp, n_resamples=200,
    statistic="mean_pairwise_euclidean_distance",
    bootstrap_rarefaction="rarefaction", sample_size="smallest"
  )
# Now simulate a third group with larger variance
 X3 = MASS::mvrnorm(15, mu=rep(0, 10), Sigma=diag(10)*1.5)
 grp2 = factor(
   c(rep("A", nrow(X1)), rep("B", nrow(X2)), rep("C", nrow(X3)))
 )
 boot_res2 = disparity_resample(
   Data=rbind(X1, X2, X3), group=grp2, n_resamples=1000,
   statistic="multivariate_variance",
   bootstrap_rarefaction="bootstrap"
 )
 print(boot_res2)
 # plot(boot_res2)
 # Plot of the obtained (95\%) confidence intervals (uncomment to plot)
 
 # Reset to sequential processing when done (optional)
 # future::plan(future::sequential)
}

}
\references{
Drake AG, Klingenberg CP. 2010. Large-scale
diversification of skull shape in domestic dogs: disparity and
modularity. American Naturalist 175:289-301.

Claramunt S. 2010. Discovering exceptional
diversifications at continental scales: the case of the endemic
families of Neotropical Suboscine passerines. Evolution
64:2004-2019.

Fruciano C, Tigano C, Ferrito V. 2012. Body shape
variation and colour change during growth in a protogynous fish.
Environmental Biology of Fishes 94:615-622.

Fruciano C, Pappalardo AM, Tigano C, Ferrito V. 2014.
Phylogeographical relationships of Sicilian brown trout and the
effects of genetic introgression on morphospace occupation.
Biological Journal of the Linnean Society 112:387-398.

Fruciano C, Franchini P, Raffini F, Fan S, Meyer A.
2016. Are sympatrically speciating Midas cichlid fish special?
Patterns of morphological and genetic variation in the closely
related species Archocentrus centrarchus. Ecology and Evolution
6:4102-4114.
}
\seealso{
\code{\link{disparity_test}},
\code{\link{print.disparity_resample}},
\code{\link{plot.disparity_resample}}
}
