% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ddbs_interpolate_aw.R
\name{ddbs_interpolate_aw}
\alias{ddbs_interpolate_aw}
\title{Areal-Weighted Interpolation using DuckDB}
\usage{
ddbs_interpolate_aw(
  target,
  source,
  tid,
  sid,
  extensive = NULL,
  intensive = NULL,
  weight = "sum",
  output = "sf",
  keep_NA = TRUE,
  na.rm = FALSE,
  join_crs = NULL,
  conn = NULL,
  name = NULL,
  crs = NULL,
  crs_column = "crs_duckspatial",
  overwrite = FALSE,
  quiet = FALSE
)
}
\arguments{
\item{target}{An \code{sf} object or the name of a persistent table in the DuckDB connection
representing the destination geometries.}

\item{source}{An \code{sf} object or the name of a persistent table in the DuckDB connection
containing the data to be interpolated.}

\item{tid}{Character. The name of the column in \code{target} that uniquely identifies features.}

\item{sid}{Character. The name of the column in \code{source} that uniquely identifies features.}

\item{extensive}{Character vector. Names of columns in \code{source} to be treated as
spatially extensive (e.g., population counts).}

\item{intensive}{Character vector. Names of columns in \code{source} to be treated as
spatially intensive (e.g., population density).}

\item{weight}{Character. Determines the denominator calculation for extensive variables.
Either \code{"sum"} (default) or \code{"total"}. See \strong{Mass Preservation} in Details.}

\item{output}{Character. One of \code{"sf"} (default) or \code{"tibble"}.
\itemize{
\item \code{"sf"}: The result includes the geometry column of the target.
\item \code{"tibble"}: The result \strong{excludes the geometry column}. This is significantly faster
and consumes less storage.
}
\strong{Note:} This argument also controls the schema of the created table if \code{name} is provided.}

\item{keep_NA}{Logical. If \code{TRUE} (default), returns all features from the target,
even those that do not overlap with the source (values will be NA). If \code{FALSE},
performs an inner join, dropping non-overlapping target features.}

\item{na.rm}{Logical. If \code{TRUE}, source features with \code{NA} values in the
interpolated variables are completely removed from the calculation (area calculations
will behave as if that polygon did not exist). Defaults to \code{FALSE}.}

\item{join_crs}{Numeric or Character (optional). EPSG code or WKT for the CRS to use
for area calculations. If provided, both \code{target} and \code{source} are transformed
to this CRS within the database before interpolation.}

\item{conn}{A connection object to a DuckDB database. If \code{NULL}, the function
runs on a temporary DuckDB database.}

\item{name}{A character string of length one specifying the name of the table,
or a character vector of length two specifying the schema and table
names. If \code{NULL} (the default), the function returns the result to R as an
\code{sf} object or a tibble, depending on \code{output}.}

\item{crs}{The coordinate reference system of the data. Specify it if the data
has no \code{crs_column} and you know the CRS.}

\item{crs_column}{A character string of length one specifying the column
storing the CRS (created automatically by \code{\link{ddbs_write_vector}}).
Set to \code{NULL} if absent.}

\item{overwrite}{Logical. Whether to overwrite the table if it already exists. Defaults
to \code{FALSE}. This argument is ignored when \code{name} is \code{NULL}.}

\item{quiet}{A logical value. If \code{TRUE}, suppresses any informational messages.
Defaults to \code{FALSE}.}
}
\value{
\itemize{
\item If \code{name} is \code{NULL} (default): Returns an \code{sf} object (if \code{output="sf"})
or a \code{tibble} (if \code{output="tibble"}).
\item If \code{name} is provided: Returns \code{TRUE} invisibly and creates a persistent table
in the DuckDB database.
\itemize{
\item If \code{output="sf"}, the table \strong{includes} the geometry column.
\item If \code{output="tibble"}, the table \strong{excludes} the geometry column (pure attributes).
}
}
}
\description{
Transfers attribute data from a source spatial layer to a target spatial layer based
on the area of overlap between their geometries. This function executes all spatial
calculations within DuckDB, enabling efficient processing of large datasets without
loading all geometries into R memory.
}
\details{
Areal-weighted interpolation is used when the source and target geometries are incongruent (they do not align). It relies on the assumption of \strong{uniform distribution}: values in the source polygons are assumed to be spread evenly across the polygon's area.

\strong{Coordinate Systems:}
Area calculations are highly sensitive to the Coordinate Reference System (CRS).
While the function can run on geographic coordinates (lon/lat), it is strongly recommended
to use a \strong{projected CRS}, ideally an equal-area one (e.g., Albers Equal Area such as EPSG:5070,
or the appropriate UTM zone for small regions), to ensure accurate area measurements. Avoid Web Mercator
(EPSG:3857), which distorts areas increasingly with latitude.
Use the \code{join_crs} argument to project data on the fly during the interpolation.

\strong{Extensive vs. Intensive Variables:}
\itemize{
\item \strong{Extensive} variables are counts or absolute amounts (e.g., total population,
number of voters). When a source polygon is split, the value is divided proportionally
to the area.
\item \strong{Intensive} variables are ratios, rates, or densities (e.g., population density,
cancer rates). When a source polygon is split, the value remains constant for each piece.
}
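
As a worked illustration with hypothetical numbers (not taken from any dataset), consider a source
polygon of 20 sq. km holding 1,000 people, i.e. a density of 50 people per sq. km. If it is split
into pieces of 5 sq. km and 15 sq. km:
\itemize{
\item Treated as \strong{extensive}, the count is divided by area share:
\eqn{1000 \times 5/20 = 250}{1000 * 5/20 = 250} and \eqn{1000 \times 15/20 = 750}{1000 * 15/20 = 750}.
\item Treated as \strong{intensive}, the density stays at 50 people per sq. km in both pieces.
}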

\strong{Mass Preservation (The \code{weight} argument):}
For extensive variables, the choice of weight determines the denominator used in calculations:
\itemize{
\item \code{"sum"} (default): The denominator is the sum of all overlapping areas
for that source feature. This preserves the "mass" of the variable \emph{relative to the target's coverage}:
the weights for each source feature sum to one, so its full value is distributed among the targets it
overlaps, even if they do not completely cover it. This matches \code{areal::aw_interpolate(weight="sum")}.
\item \code{"total"}: The denominator is the full geometric area of the source feature.
This assumes the source value is distributed over the entire source polygon. If the targets
cover only 50\% of the source, only 50\% of the value is transferred; the remainder falls outside
the target area and is not allocated. The source total is therefore reproduced only where the targets
fully cover the source. This matches \code{sf::st_interpolate_aw(extensive=TRUE)}.
}
\emph{Note:} Intensive variables are always calculated using the \code{"sum"} logic (averaging
based on intersection areas) regardless of this parameter.
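
As a worked illustration with hypothetical numbers, take a source polygon with area 10 and an
extensive value of 100, overlapped by target A (overlap area 3) and target B (overlap area 2), so
that the targets cover 50\% of the source:
\itemize{
\item \code{weight = "sum"}: the denominator is \eqn{3 + 2 = 5}, so A receives
\eqn{100 \times 3/5 = 60}{100 * 3/5 = 60} and B receives \eqn{100 \times 2/5 = 40}{100 * 2/5 = 40}.
\item \code{weight = "total"}: the denominator is 10, so A receives
\eqn{100 \times 3/10 = 30}{100 * 3/10 = 30} and B receives \eqn{100 \times 2/10 = 20}{100 * 2/10 = 20};
together they receive 50, i.e. 50\% of the source value.
}
For an intensive variable, a target overlapped by one source with value 10 (overlap area 3) and
another with value 20 (overlap area 1) would receive the area-weighted mean
\eqn{(10 \times 3 + 20 \times 1) / (3 + 1) = 12.5}{(10 * 3 + 20 * 1) / (3 + 1) = 12.5},
assuming the averaging based on intersection areas described in the note above.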
}
\examples{
\donttest{
library(sf)

# 1. Prepare Data
# Load NC counties (Source) and project to Albers (EPSG:5070)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 5070)
nc$sid <- seq_len(nrow(nc)) # Create Source ID

# Create a target grid
g <- st_make_grid(nc, n = c(10, 5))
g_sf <- st_as_sf(g)
g_sf$tid <- seq_len(nrow(g_sf)) # Create Target ID

# 2. Extensive Interpolation (Counts)
# Use weight = "total" for strict mass preservation (e.g., total births)
res_ext <- ddbs_interpolate_aw(
  target = g_sf, source = nc,
  tid = "tid", sid = "sid",
  extensive = "BIR74",
  weight = "total"
)

# Check mass preservation
sum(res_ext$BIR74, na.rm = TRUE) / sum(nc$BIR74) # Should be ~1

# 3. Intensive Interpolation (Density/Rates)
# Computes an area-weighted average, assuming uniform density within each source polygon
res_int <- ddbs_interpolate_aw(
  target = g_sf, source = nc,
  tid = "tid", sid = "sid",
  intensive = "BIR74"
)

# 4. Quick Visualization
par(mfrow = c(1, 2))
plot(res_ext["BIR74"], main = "Extensive (Total Count)", border = NA)
plot(res_int["BIR74"], main = "Intensive (Weighted Avg)", border = NA)
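
# 5. Attribute-only output with the default weighting (illustrative sketch)
# Uses only arguments documented above: weight = "sum" normalises by the
# overlapped area of each source feature, and output = "tibble" drops the
# geometry column. The object name res_sum is chosen here for illustration.
res_sum <- ddbs_interpolate_aw(
  target = g_sf, source = nc,
  tid = "tid", sid = "sid",
  extensive = "BIR74",
  weight = "sum",
  output = "tibble"
)

# The grid is built from the bounding box of nc, so every county should be
# fully covered and this ratio is expected to be close to 1 as well
sum(res_sum$BIR74, na.rm = TRUE) / sum(nc$BIR74)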
}

}
\references{
Prener, C. and Revord, C. (2019). \emph{areal: An R package for areal weighted interpolation}.
\emph{Journal of Open Source Software}, 4(37), 1221.
Available at: \doi{10.21105/joss.01221}
}
\seealso{
\code{\link[areal:aw_interpolate]{areal::aw_interpolate()}} — reference implementation.
}
