| Type: | Package | 
| Title: | Near-Far Matching | 
| Version: | 1.3 | 
| Date: | 2024-01-22 | 
| Author: | Joseph Rigdon <jrigdon@wakehealth.edu> | 
| Maintainer: | Joseph Rigdon <jrigdon@wakehealth.edu> | 
| Imports: | GenSA, MASS, car, stats | 
| Description: | Near-far matching is a study design technique for preprocessing observational data to mimic a pair-randomized trial. Individuals are matched to be near on measured confounders and far on levels of an instrumental variable. Methods outlined in further detail in Rigdon, Baiocchi, and Basu (2018) <doi:10.18637/jss.v086.c05>. | 
| License: | GPL-3 | 
| Depends: | nbpMatching | 
| NeedsCompilation: | no | 
| Packaged: | 2024-01-22 14:18:48 UTC; joerigdon | 
| Repository: | CRAN | 
| Date/Publication: | 2024-01-23 13:00:02 UTC | 
Near-Far Matching
Description
Near-far matching is a study design technique for preprocessing observational data to mimic a pair-randomized trial. Individuals are matched to be near on measured confounders and far on levels of an instrumental variable.
Details
| Package: | nearfar | 
| Type: | Package | 
| Version: | 1.3 | 
| Date: | 2024-01-15 | 
| License: | GPL-3 | 
Author(s)
Joseph Rigdon jrigdon@wakehealth.edu
References
Rigdon J, Baiocchi M, Basu S (2018). Near-far matching in R: The nearfar package. Journal of Statistical Software, 86(5), 1-21.
Baiocchi M, Small D, Lorch S, Rosenbaum P (2010). Building a stronger instrument in an observational study of perinatal care for premature infants. Journal of the American Statistical Association, 105(492), 1285-1296.
Baiocchi M, Small D, Yang L, Polsky D, Groeneveld P (2012). Near-far matching: a study design approach to instrumental variables. Health Services and Outcomes Research Methodology, 12(4), 237-253.
Angrist data set for education and wages
Description
A random sample of 1000 observations from the data set used by Angrist and Krueger in their investigation of the impact of education on future wages.
Format
A data frame with 1000 observations on the following 7 variables.
wage | a numeric vector | 
educ | a numeric vector | 
qob | a numeric vector | 
IV | a numeric vector | 
age | a numeric vector | 
married | a numeric vector | 
race | a numeric vector | 
Details
This data set is a random sample of 1000 observations from the URL listed below.
Source
https://economics.mit.edu/people/faculty/josh-angrist/angrist-data-archive
References
Angrist JD, Krueger AB (1991). Does Compulsory School Attendance Affect Schooling and Earnings? The Quarterly Journal of Economics, 106(4), 979-1014.
Examples
library(nearfar)
str(angrist)
Matching priority function
Description
Updates a given distance matrix to prioritize specified measured
confounders in a pair match.  Used in concert with the
matches function to prioritize specific measured
confounders in a near-far match in the opt_nearfar function.
Usage
calipers(distmat, variable, tolerance = 0.2)
Arguments
distmat | 
 An object of class distance matrix  | 
variable | 
 Named variable from list of measured confounders  | 
tolerance | 
 Penalty to apply to mismatched observations; values near 0 penalize mismatches more  | 
Value
Returns an updated distance matrix
Examples
dd = mtcars[1:4, 2:3]
cc = calipers(distmat=smahal(dd), variable=dd$cyl, tolerance=0.2)
cc
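The idea of a caliper penalty can be sketched in base R. This is an illustration only, not the package's implementation: the helper name caliper_penalty_sketch, the penalty scaling, and the use of Euclidean distances in place of smahal are all assumptions made for the example.

```r
# Hypothetical helper (not part of nearfar): add a penalty to every pair whose
# values of `variable` differ by more than `tolerance` standard deviations.
caliper_penalty_sketch <- function(distmat, variable, tolerance = 0.2) {
  distmat <- as.matrix(distmat)
  # Pairwise gaps on the prioritized variable, in standard-deviation units
  gap <- abs(outer(variable, variable, "-")) / sd(variable)
  too_far <- gap > tolerance
  # Assumed scaling: half the spread of the distances themselves
  penalty_scale <- 0.5 * sd(as.vector(distmat))
  distmat[too_far] <- distmat[too_far] + gap[too_far] * penalty_scale
  distmat
}

dd <- mtcars[1:4, 2:3]
d0 <- as.matrix(dist(dd))   # Euclidean distances stand in for smahal(dd)
d1 <- caliper_penalty_sketch(d0, dd$cyl, tolerance = 0.2)
d1 - d0                     # nonzero only for pairs that differ on cyl
```

A smaller tolerance flags more pairs as mismatched, which agrees with the note above that values near 0 penalize mismatches more.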
Inference for effect ratio
Description
Conducts inference on the effect ratio as described in Section 3.3 of Baiocchi et al. (2010), resulting in an estimate and a permutation-based confidence interval for the effect ratio.
Usage
eff_ratio(dta, match, outc, trt, alpha)
Arguments
dta | 
 The name of the data frame object  | 
match | 
 Data frame where the first column contains indices for those
individuals encouraged into treatment by the instrumental variable and
the second column contains indices for those individuals discouraged
from treatment by the instrumental variable; returned by both
matches and opt_nearfar  | 
outc | 
 The name of the outcome variable in quotes, e.g., “wages”  | 
trt | 
 The name of the treatment variable, e.g., “educ”  | 
alpha | 
 Level of confidence interval  | 
Value
est.emp | 
 Empirical estimate of effect ratio  | 
est.HL | 
 Hodges-Lehmann type estimate of effect ratio  | 
lower | 
 Lower limit to 1-alpha/2 confidence interval for effect ratio  | 
upper | 
 Upper limit to 1-alpha/2 confidence interval for effect ratio  | 
Author(s)
Joseph Rigdon jrigdon@wakehealth.edu
References
Baiocchi M, Small D, Lorch S, Rosenbaum P (2010). Building a stronger instrument in an observational study of perinatal care for premature infants. Journal of the American Statistical Association, 105(492), 1285-1296.
Examples
k2 = matches(dta=mtcars, covs=c("cyl", "disp"), sinks=0.2, iv="carb",
    cutpoint=2, imp.var=c("cyl"), tol.var=0.03)
eff_ratio(dta=mtcars, match=k2, outc="wt", trt="gear", alpha=0.05)
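As a rough guide to what the empirical estimate measures, the effect ratio in Baiocchi et al. (2010) is the sum of encouraged-minus-discouraged outcome differences divided by the sum of the corresponding treatment differences. The sketch below computes that ratio by hand on toy data; the data frame toy, its columns, and the match matrix m are made up for illustration, and eff_ratio may differ in computational detail.

```r
# Toy data (hypothetical): outcome y and treatment d for six individuals
toy <- data.frame(y = c(10, 7, 12, 8, 9, 6),
                  d = c(3, 1, 4, 2, 3, 2))

# Hypothetical match: column 1 = encouraged row, column 2 = discouraged row
m <- cbind(c(1, 3, 5), c(2, 4, 6))

# Within-pair differences, encouraged minus discouraged
y_diff <- toy$y[m[, 1]] - toy$y[m[, 2]]   # 3, 4, 3
d_diff <- toy$d[m[, 1]] - toy$d[m[, 2]]   # 2, 2, 1

# Ratio of summed outcome differences to summed treatment differences
effect_ratio <- sum(y_diff) / sum(d_diff)
effect_ratio                               # 10 / 5 = 2
```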
Function to find pair matches using a distance matrix.  Called by opt_nearfar to discover optimal near-far matches.
Description
Given values of percent sinks and cutpoint, this function finds the corresponding near-far match.
Usage
matches(dta, covs, iv = NA, imp.var = NA, tol.var = NA, sinks = 0,
    cutpoint = NA)
Arguments
dta | 
 The name of the data frame on which to do the matching  | 
covs | 
 A vector of the names of the covariates to make “near”, e.g., covs=c("age", "sex", "race")  | 
iv | 
 The name of the instrumental variable, e.g., iv="QOB"  | 
imp.var | 
 A list of (up to 5) named variables to prioritize in the “near” matching  | 
tol.var | 
 A list of (up to 5) tolerances attached to the prioritized variables where 0 is highest penalty for mismatch  | 
sinks | 
 Percentage of the data to match to sinks (and thus remove) if desired; default is 0  | 
cutpoint | 
 Value below which individuals are too similar on iv; increase to make individuals more “far” in match  | 
Details
Default settings yield a "near" match on only observed confounders in X; add IV, sinks, and cutpoint to get near-far match.
Value
A two-column matrix of row indices of paired matches
Author(s)
Joseph Rigdon jrigdon@wakehealth.edu
References
Lu B, Greevy R, Xu X, Beck C (2011). Optimal nonbipartite matching and its statistical applications. The American Statistician, 65(1), 21-30.
Examples
k2 = matches(dta=mtcars, covs=c("cyl", "disp"), sinks=0.2, iv="carb",
    cutpoint=2, imp.var=c("cyl"), tol.var=0.03)
k2[1:5, ]
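The roles of cutpoint and sinks can be illustrated on a plain distance matrix. The sketch below is an assumption-laden illustration rather than the package's code: near_far_distance_sketch, its penalty value, and the direct n_sinks argument are hypothetical. It follows the general idea from the nonbipartite matching literature that pairs too close on the instrument are penalized, and that sinks sit at zero distance from every observation and at a very large distance from each other.

```r
# Hypothetical helper (not part of nearfar)
near_far_distance_sketch <- function(distmat, iv, cutpoint, n_sinks,
                                     penalty = 1000) {
  distmat <- as.matrix(distmat)
  n <- nrow(distmat)
  # Penalize pairs whose instrument values differ by less than the cutpoint
  too_close <- abs(outer(iv, iv, "-")) < cutpoint
  distmat[too_close] <- distmat[too_close] + penalty
  # A large finite value stands in for "infinite" distance between sinks
  big <- max(distmat) * 10
  out <- matrix(0, n + n_sinks, n + n_sinks)
  out[seq_len(n), seq_len(n)] <- distmat
  if (n_sinks > 0) {
    s <- n + seq_len(n_sinks)
    out[s, s] <- big          # sinks should never be paired with each other
  }
  diag(out) <- 0              # self-distances stay at zero
  out
}

d <- as.matrix(dist(mtcars[1:6, c("cyl", "disp")]))
a <- near_far_distance_sketch(d, iv = mtcars$carb[1:6], cutpoint = 2,
                              n_sinks = 2)
dim(a)   # two sink rows and columns appended to the 6 x 6 matrix
```

An observation paired with a sink by the matching algorithm is effectively dropped from the matched sample, which is why a larger sinks value removes more of the data.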
Finds optimal near-far match
Description
Discovers optimal near-far matches using the partial F statistic (for continuous treatments) or partial deviance (for binary treatments)
Usage
opt_nearfar(dta, trt, covs, iv, trt.type = "cont", imp.var = NA,
tol.var = NA, adjust.IV = TRUE, sink.range = c(0, 0.5), cutp.range = NA,
max.time.seconds = 300)
Arguments
dta | 
 The name of the data frame on which to perform the matching  | 
trt | 
 The name of the treatment variable, e.g., “educ”  | 
iv | 
 The name of the instrumental variable, e.g., iv="QOB"  | 
covs | 
 A vector of the names of the covariates to make “near”, e.g., covs=c("age", "sex", "race")  | 
trt.type | 
 Treatment variable type: “cont” for continuous, or “bin” for binary  | 
imp.var | 
 A list of (up to 5) named variables to prioritize in the “near” matching  | 
tol.var | 
 A list of (up to 5) tolerances attached to the prioritized variables where 0 is highest penalty for mismatch  | 
adjust.IV | 
 If TRUE, include measured confounders in the treatment~IV model that is optimized; if FALSE, exclude them  | 
sink.range | 
 A two-element vector of (min, max) for the range of sinks over which to optimize in the near-far match; default (0, 0.5), so that at most 50% of observations can be removed  | 
cutp.range | 
 A two-element vector of (min, max) for the range of cutpoints (how far apart the IV will become) over which to optimize in the near-far match; default is (one SD of IV, range of IV)  | 
max.time.seconds | 
 How long to let the optimization algorithm run; default is 300 seconds = 5 minutes  | 
Value
n.calls | 
 Number of calls made to the objective function  | 
sink.range | 
 A two-element vector of (min, max) for the range of sinks over which to optimize in the near-far match; default (0, 0.5), so that at most 50% of observations can be removed  | 
cutp.range | 
 A two-element vector of (min, max) for the range of cutpoints (how far apart the IV will become) over which to optimize in the near-far match; default is (one SD of IV, range of IV)  | 
pct.sink | 
 Optimal percent sinks  | 
cutp | 
 Optimal cutpoint  | 
maxF | 
 Highest value of partial F-statistic (continuous treatment) or residual deviance (binary treatment) found by simulated annealing optimizer  | 
match | 
 A two column matrix where the first column is the index of an “encouraged” individual and the second column is the index of the corresponding “discouraged” individual from the pair matching  | 
summ | 
 A table of mean variable values for both the “encouraged” and “discouraged” groups across all variables plus absolute standardized differences for each variable  | 
Author(s)
Joseph Rigdon jrigdon@wakehealth.edu
References
Lu B, Greevy R, Xu X, Beck C (2011). Optimal nonbipartite matching and its statistical applications. The American Statistician, 65(1), 21-30.
Xiang Y, Gubian S, Suomela B, Hoeng J (2013). Generalized Simulated Annealing for Efficient Global Optimization: the GenSA Package for R. The R Journal, 5(1). URL http://journal.r-project.org/.
Examples
k = opt_nearfar(dta=mtcars, trt="drat", covs=c("cyl", "disp"),
    trt.type="cont", iv="carb", imp.var=NA, tol.var=NA, adjust.IV=TRUE,
    max.time.seconds=2)
summary(k)
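For readers unfamiliar with the objective, a partial F-statistic for an instrument compares a model of the treatment on the covariates with a model that adds the instrument. The base R sketch below computes it on the full, unmatched mtcars data purely to show what the quantity is; how opt_nearfar codes the instrument and which matched subsample it uses are not reproduced here.

```r
# Reduced model: treatment regressed on the measured confounders only
reduced <- lm(drat ~ cyl + disp, data = mtcars)

# Full model: the same covariates plus the instrument
full <- lm(drat ~ cyl + disp + carb, data = mtcars)

# Partial F-statistic for adding the instrument to the model
partial_F <- anova(reduced, full)$F[2]
partial_F

# With a single added term, the partial F equals the squared t-statistic
summary(full)$coefficients["carb", "t value"]^2
```

Dropping the covariates from both models gives the unadjusted analogue, which corresponds in spirit to adjust.IV=FALSE.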
Compute rank-based Mahalanobis distance matrix between each pair
Description
This function computes the rank-based Mahalanobis distance matrix
between each pair of observations in the data set.  Called by the
matches function (and ultimately by opt_nearfar)
to set up a distance matrix used to create pair matches.
Usage
smahal(X)
Arguments
X | 
 A matrix of observed confounders with n rows (observations) and p columns (variables)  | 
Value
Returns the rank-based Mahalanobis distance matrix between every pair of observations
Examples
smahal(mtcars[1:4, 2:3])
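A rank-based Mahalanobis distance can be sketched as follows: replace each column by its ranks, rescale the rank covariance so that heavily tied columns are not over-weighted, and then compute Mahalanobis distances on the ranks. The helper rank_mahal_sketch is hypothetical and written only for illustration; it assumes the MASS package (shipped with standard R installations) for the generalized inverse, and it is not guaranteed to match smahal in every detail.

```r
# Hypothetical re-implementation for illustration; use smahal() in practice
rank_mahal_sketch <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  R <- apply(X, 2, rank)                   # column-wise ranks; ties get average ranks
  cv <- cov(R)
  # Rescale each column to the variance that untied ranks 1..n would have
  rat <- sqrt(var(seq_len(n)) / diag(cv))
  scale_mat <- diag(rat, nrow = length(rat))
  cv <- scale_mat %*% cv %*% scale_mat
  icov <- MASS::ginv(cv)                   # generalized inverse of the covariance
  out <- matrix(NA_real_, n, n)
  for (i in seq_len(n)) {
    # mahalanobis() returns squared distances from row i to every row
    out[i, ] <- mahalanobis(R, R[i, ], icov, inverted = TRUE)
  }
  out
}

dm <- rank_mahal_sketch(mtcars[1:4, 2:3])
dm
```

Working on ranks limits the influence of outliers and skewed covariates on the distances.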
Computes table of absolute standardized differences
Description
Computes absolute standardized differences for both
continuous and binary variables.  Called by opt_nearfar to
summarize results of near-far match.
Usage
summ_matches(dta, iv, covs, match)
Arguments
dta | 
 The name of the data frame on which matching was performed  | 
iv | 
 The name of the instrumental variable, e.g., iv="QOB"  | 
covs | 
 A vector of the names of the covariates to make “near”, e.g., covs=c("age", "sex", "race")  | 
match | 
 A two-column matrix of row indices of paired matches  | 
Value
A table of mean variable values for both the “encouraged” and “discouraged” groups across all variables plus absolute standardized differences for each variable
Author(s)
Joseph Rigdon jrigdon@wakehealth.edu
Examples
k2 = matches(dta=mtcars, covs=c("cyl", "disp"), sinks=0.2, iv="carb",
     cutpoint=2, imp.var=c("cyl"), tol.var=0.03)
summ_matches(dta=mtcars, iv="carb", covs=c("cyl", "disp"), match=k2)
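An absolute standardized difference is commonly defined as the absolute difference in group means divided by a pooled standard deviation. The sketch below uses that common definition on a made-up match matrix; the helper abs_std_diff and the matrix m are hypothetical, and summ_matches may use a different denominator, particularly for binary variables.

```r
# Hypothetical helper: absolute standardized difference between two groups
abs_std_diff <- function(x_enc, x_dis) {
  pooled_sd <- sqrt((var(x_enc) + var(x_dis)) / 2)
  abs(mean(x_enc) - mean(x_dis)) / pooled_sd
}

# Hypothetical match: column 1 = encouraged rows, column 2 = discouraged rows
m <- cbind(c(1, 3, 5), c(2, 4, 6))

# Balance on each covariate across the two matched groups
asd <- sapply(c("cyl", "disp"), function(v) {
  abs_std_diff(mtcars[m[, 1], v], mtcars[m[, 2], v])
})
asd
```

Values close to 0 indicate that the encouraged and discouraged groups are well balanced on that variable.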
Summary method for object of class “nf”
Description
Displays key information, e.g., the number of matches tried
and post-match balance, for an object returned by the opt_nearfar function
Usage
## S3 method for class 'nf'
summary(object, ...)
Arguments
object | 
 Object of class “nf” returned by opt_nearfar  | 
... | 
 additional arguments affecting the summary produced  | 
Value
Returns a summary of results from opt_nearfar function
Author(s)
Joseph Rigdon jrigdon@wakehealth.edu
Examples
k = opt_nearfar(dta=mtcars, trt="drat", covs=c("cyl", "disp"),
    trt.type="cont", iv="carb", imp.var=NA, tol.var=NA, adjust.IV=TRUE,
    max.time.seconds=1)
summary(k)