% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/impute_randomforest.R
\name{impute_randomforest}
\alias{impute_randomforest}
\title{Imputation of Missing Values Using Random Forests}
\usage{
impute_randomforest(
  data,
  sample,
  grouping,
  intensity_log2,
  retain_columns = NULL,
  ...
)
}
\arguments{
\item{data}{A data frame that contains the input variables. This should include columns for
the sample names, precursor or peptide identifiers, and intensity values.}

\item{sample}{A character column in the \code{data} data frame that contains the sample names.}

\item{grouping}{A character column in the \code{data} data frame that contains the precursor or
peptide identifiers.}

\item{intensity_log2}{A numeric column in the \code{data} data frame that contains the
log2-transformed intensity values.}

\item{retain_columns}{A character vector indicating which columns should be retained from the
input data frame. These columns will be preserved in the output alongside the imputed values.
By default, no additional columns are retained (\code{retain_columns = NULL}), but specific
columns can be retained by providing their names as a vector.}

\item{...}{Additional arguments passed to the \code{missForest} function. These can control,
for example, the number of trees grown in each forest (\code{ntree}) or the maximum number of
iterations (\code{maxiter}).}
}
\value{
A data frame that contains an \code{imputed_intensity} column with the imputed values
and an \code{imputed} column indicating whether each value was imputed (\code{TRUE}) or not
(\code{FALSE}), in addition to any columns retained via \code{retain_columns}.
}
\description{
\code{impute_randomforest} imputes missing values in the data using the random forest-based
method implemented in the \code{missForest} package.
}
\details{
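The function imputes missing values by building random forests that predict each missing value
from the other values available in the dataset. Missing entries are first filled with an initial
guess (the mean of the observed values of the respective variable). Then, for each variable with
missing data, a random forest is trained with the observed values of that variable as the
response and the remaining variables as predictors, and the fitted model predicts the missing
entries. This cycle over all variables is repeated until the difference between two successive
imputations increases for the first time or the maximum number of iterations (\code{maxiter})
is reached.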
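To make this procedure more concrete, the following is a simplified sketch of the missForest
iteration described by Stekhoven and Bühlmann (2012): mean initialisation, one random forest per
variable with missing values, and repetition until the change between successive imputations
increases. It uses the \code{randomForest} package directly. \code{rf_impute_sketch()} is a
hypothetical helper written for illustration; it is not the code used by
\code{impute_randomforest} or \code{missForest}, handles only numeric columns, and omits options
such as \code{mtry}, categorical variables, and parallelisation.

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(randomForest)

# Hypothetical helper: simplified missForest-style imputation for an
# all-numeric data frame (illustration only, not the missForest code)
rf_impute_sketch <- function(x, maxiter = 10, ntree = 100) {
  miss <- is.na(x)
  n_miss <- colSums(miss)
  # Initial guess: replace missing entries by the column mean
  for (j in seq_along(x)) {
    x[miss[, j], j] <- mean(x[[j]], na.rm = TRUE)
  }
  # Visit variables with missing values in order of increasing missingness
  vars <- order(n_miss)
  vars <- vars[n_miss[vars] > 0]
  diff_old <- Inf
  for (iter in seq_len(maxiter)) {
    x_old <- x
    for (j in vars) {
      obs <- !miss[, j]
      # Response: observed part of variable j; predictors: all other variables
      fit <- randomForest(x = x[obs, -j, drop = FALSE], y = x[obs, j],
                          ntree = ntree)
      x[!obs, j] <- predict(fit, x[!obs, -j, drop = FALSE])
    }
    # Relative change between two successive imputations
    diff_new <- sum((as.matrix(x) - as.matrix(x_old))^2) / sum(as.matrix(x)^2)
    # Stop once the change no longer decreases and keep the previous result
    if (diff_new >= diff_old) return(x_old)
    diff_old <- diff_new
  }
  x
}

# Usage: five correlated columns with one missing value each
set.seed(1)
base <- rnorm(40, mean = 20)
m <- sapply(1:5, function(k) base + rnorm(40, sd = 0.5))
m[cbind(c(3, 8, 15, 22, 30), 1:5)] <- NA
df_miss <- as.data.frame(m)
imputed <- rf_impute_sketch(df_miss, ntree = 50)
}\if{html}{\out{</div>}}

Only the missing entries are overwritten; observed values are returned unchanged.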

In addition to the imputed values, users can retain columns from the original input data frame
that are not part of the imputation process (see \code{retain_columns}).
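A minimal sketch of retaining a column, reusing the synthetic data from the examples. It assumes
that \code{create_synthetic_data} returns a \code{protein} column and that this function is
provided by the protti package; both are assumptions made for illustration.

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(protti) # assumed: package providing impute_randomforest()

set.seed(123)
data <- create_synthetic_data(n_proteins = 10, frac_change = 0.5, n_replicates = 4,
  n_conditions = 2, method = "effect_random", additional_metadata = FALSE)

# Keep the (assumed) protein column next to the imputed values
data_imputed <- impute_randomforest(
  data,
  sample = sample,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing,
  retain_columns = c("protein")
)
}\if{html}{\out{</div>}}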

Additional arguments are passed on to the underlying \code{missForest} function, for example to
set the number of trees grown in each forest (\code{ntree}) or the maximum number of iterations
(\code{maxiter}). For a full list of arguments, refer to the \code{missForest} documentation.
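As a sketch of forwarding arguments, the call below passes \code{ntree} and \code{maxiter}
(both \code{missForest} arguments, with defaults of 100 and 10) to reduce run time. The specific
values are arbitrary, and \code{library(protti)} assumes this function is provided by the protti
package.

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(protti) # assumed: package providing impute_randomforest()

set.seed(123)
data <- create_synthetic_data(n_proteins = 10, frac_change = 0.5, n_replicates = 4,
  n_conditions = 2, method = "effect_random", additional_metadata = FALSE)

data_imputed <- impute_randomforest(
  data,
  sample = sample,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing,
  ntree = 50,  # forwarded to missForest(): trees per forest
  maxiter = 5  # forwarded to missForest(): maximum number of iterations
)

# Count imputed (TRUE) and observed (FALSE) values
table(data_imputed$imputed)
}\if{html}{\out{</div>}}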

To enable parallelisation, ensure that the \code{doParallel} package is installed and loaded:

\if{html}{\out{<div class="sourceCode">}}\preformatted{install.packages("doParallel")
library(doParallel)
}\if{html}{\out{</div>}}

Then register the desired number of cores for parallel processing:

\if{html}{\out{<div class="sourceCode">}}\preformatted{registerDoParallel(cores = 6)
}\if{html}{\out{</div>}}

To parallelise the imputation, pass \code{parallelize = "variables"} to
\code{impute_randomforest}, which forwards it to the \code{missForest} function.
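The steps above can be combined as in the following sketch, which reuses the synthetic data from
the examples. The number of cores (2) is an arbitrary choice, \code{stopImplicitCluster()} is
called afterwards to release the workers registered by \code{registerDoParallel()}, and
\code{library(protti)} assumes this function is provided by the protti package.

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(protti) # assumed: package providing impute_randomforest()
library(doParallel)

set.seed(123)
data <- create_synthetic_data(n_proteins = 10, frac_change = 0.5, n_replicates = 4,
  n_conditions = 2, method = "effect_random", additional_metadata = FALSE)

# Register two workers for the foreach backend used by missForest()
registerDoParallel(cores = 2)

# parallelize is forwarded to missForest()
data_imputed <- impute_randomforest(
  data,
  sample = sample,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing,
  parallelize = "variables"
)

# Release the implicitly created workers
stopImplicitCluster()
}\if{html}{\out{</div>}}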
}
\examples{
set.seed(123) # Makes example reproducible

# Create example data
data <- create_synthetic_data(
  n_proteins = 10,
  frac_change = 0.5,
  n_replicates = 4,
  n_conditions = 2,
  method = "effect_random",
  additional_metadata = FALSE
)

head(data, n = 24)

# Perform imputation
data_imputed <- impute_randomforest(
  data,
  sample = sample,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing
)

head(data_imputed, n = 24)
}
\references{
Stekhoven, D.J., & Bühlmann, P. (2012). MissForest—non-parametric missing value imputation
for mixed-type data. Bioinformatics, 28(1), 112-118. https://doi.org/10.1093/bioinformatics/btr597
}
\author{
Elena Krismer
}
