% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sdid.R
\name{sdid}
\alias{sdid}
\title{Fit a staggered difference-in-differences model}
\usage{
sdid(
  formula,
  df,
  weights = NULL,
  cohort_var = NULL,
  cohort_ref = NULL,
  cohort_time_refs = NULL,
  time_var = NULL,
  time_ref = NULL,
  intervention_var,
  .vcov = stats::vcov,
  ...
)
}
\arguments{
\item{formula}{An object of class "formula" (or one that can be coerced to
that class): a symbolic description of the model to be fitted. The details of
model specification are given under 'Details'.}

\item{df}{A data frame containing the variables in the model.}

\item{weights}{An optional vector of weights to be passed to \code{stats::lm()} to
be used in the fitting process. Should be NULL or a numeric vector.}

\item{cohort_var}{Name of the variable in \code{df} that contains cohort
assignments. If NULL, this is assumed to be the first column named in the
right hand side of \code{formula}.}

\item{cohort_ref}{Value of \code{cohort_var} that serves as the referent for main
effects for cohorts. If NULL, this is assumed to be the first value in
the set of values for \code{cohort_var}.}

\item{cohort_time_refs}{A list, whose elements are named to match levels of
\code{cohort_var}, specifying the value of \code{time_var} that serves as the referent
for each time interaction with values of \code{cohort_var}. See 'Details.'}

\item{time_var}{Name of the variable in \code{df} that contains time periods. If
NULL, this is assumed to be the second column named in the right hand side of
\code{formula}.}

\item{time_ref}{Value of \code{time_var} that serves as the referent for main
effects for time periods. If NULL, this is assumed to be the first value
in the set of values for \code{time_var}.}

\item{intervention_var}{Name of the cohort-level variable in \code{df} that
specifies which values in \code{time_var} correspond to the first
post-intervention time period for each cohort.}

\item{.vcov}{Function to be used to estimate the variance-covariance matrix.
Defaults to \code{stats::vcov}.}

\item{...}{Additional arguments to be passed to \code{.vcov}.}
}
\value{
Returns an object of class \code{sdid}, which is a list containing the
following components:

\describe{
\item{\code{mdl}}{The \code{lm} object returned from the call to \code{stats::lm()} in \code{sdid()}.}
\item{\code{formula}}{A list object containing both the original formula specified in the call to \code{sdid()} and the generated formula, with all cohort-time interactions, passed to \code{stats::lm()} to fit the model.}
\item{\code{vcov}}{The variance-covariance matrix used to estimate standard errors.}
\item{\code{tsi}}{The time-since-intervention dataset used to enumerate time periods relative to the intervention period for each cohort.}
\item{\code{obs_cnt}}{Counts of observations within each cohort-time interaction.}
\item{\code{cohort}}{A list object containing details about cohorts. \code{var} contains the name of the column in \code{df} that identifies cohorts; \code{ref} contains the value of the cohort column that functions as the referent for main effects; and \code{time_refs} contains the referent time values within each cohort for each set of cohort-time interactions.}
\item{\code{time}}{A list object containing \code{var}, which is the name of the column in \code{df} identified by the \code{sdid()} argument \code{time_var}, and \code{ref}, the referent value of \code{time_var} for main effects.}
\item{\code{intervention_var}}{Name of the column in \code{df} that contains the time period during which each cohort implemented the intervention of interest.}
\item{\code{covariates}}{A character vector containing the terms in \code{formula} other than those corresponding to cohorts and time periods.}
}
}
\description{
Fits a linear staggered difference-in-differences model, following the
Abraham and Sun (2018) approach. It supports optional weighting and a
user-specified variance-covariance function.
}
\details{
Fitting a staggered difference-in-differences model requires deliberate
attention to two specific independent variables:
\itemize{
\item The intervention cohort column assigns a cohort name to all individuals or groups that started the intervention during the same time period. For example, if the longitudinal data are at the year level, ranging from 2010 to 2020, and contain 15 counties, 3 of which implemented the intervention of interest in 2015, those 3 counties would be assigned to the same cohort. Similarly, if 2 more counties implemented the intervention in 2016, those 2 counties would be assigned to the next cohort.
\item The time period column assigns each observation to a time period at the most granular level of the longitudinal data. In the example described above, these values would correspond to the years 2010, ..., 2020.
}
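
As a rough illustration of the layout described above, the following base R sketch builds a toy panel with a cohort column, a time column, and a cohort-level intervention column. The data, the county labels, and the column names (\code{county}, \code{yr}, \code{intervention_yr}, \code{cohort}) are hypothetical and chosen only for illustration; they are not required by \code{sdid()}.

\preformatted{
# Toy panel: 4 hypothetical counties observed yearly from 2013 to 2017
panel <- expand.grid(county = c("A", "B", "C", "D"), yr = 2013:2017)

# Hypothetical first post-intervention year for each county
first_yr <- c(A = 2015, B = 2015, C = 2016, D = 2016)
panel$intervention_yr <- unname(first_yr[as.character(panel$county)])

# Counties that share an intervention year are assigned to the same cohort
panel$cohort <- paste0("cohort_", panel$intervention_yr)
}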

To specify a model, a formula is passed following the format \code{response ~ cohort_var + time_var + covariates}. This, however, is not the formula used to fit the model; \code{sdid()} expands this formula to include main effects and every possible interaction between \code{cohort_var} and \code{time_var}, excluding referents for identification:
\itemize{
\item Referents for main effects are either the first levels of \code{cohort_var} and \code{time_var} or the referents specified in \code{cohort_ref} and \code{time_ref}.
\item Referents for cohort-time interactions are either the factor level of \code{time_var} that immediately precedes the value of \code{intervention_var} within each cohort or the referents specified in \code{cohort_time_refs}.
}
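
The default rule for cohort-time referents stated above can be sketched in base R as follows. This is only an illustration of the rule, not the code \code{sdid()} uses internally. The cohort names and years are hypothetical, and the assumption that \code{cohort_time_refs} takes values of the same type as \code{time_var} should be checked against your own data.

\preformatted{
# Hypothetical time periods and first post-intervention year per cohort
yrs <- 2010:2020
intervention_yr <- c(cohort_2015 = 2015, cohort_2016 = 2016)

# Default rule: the time level immediately preceding the intervention period
default_refs <- lapply(intervention_yr, function(x) max(yrs[yrs < x]))

# An override has the same shape: a list named by cohort level
# (the years chosen here are arbitrary and for illustration only)
cohort_time_refs <- list(cohort_2015 = 2013, cohort_2016 = 2014)
}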

\code{sdid()} also accommodates aggregated data through the \code{weights} argument.
}
\examples{
# Fit a staggered difference-in-differences model
sdid_hosp <- sdid(hospitalized ~ cohort + yr + age + sex + comorb,
                  df = hosp,
                  intervention_var = "intervention_yr")
summary(sdid_hosp)
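
# A sketch of supplying a user-specified variance-covariance function.
# Assumptions: the 'sandwich' package is installed, and sdid() applies .vcov
# to the fitted lm object, forwarding arguments in ... (here, type) to it.
if (requireNamespace("sandwich", quietly = TRUE)) {
  sdid_hosp_hc <- sdid(hospitalized ~ cohort + yr + age + sex + comorb,
                       df = hosp,
                       intervention_var = "intervention_yr",
                       .vcov = sandwich::vcovHC,
                       type = "HC1")
  summary(sdid_hosp_hc)
}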
}
\references{
Abraham S, Sun L. Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects. MIT; 2018.
}
