Type: | Package |
Title: | Bootstrap Algorithms for Finite Population Inference |
Version: | 0.4.6 |
Date: | 2024-03-08 |
Description: | Finite population bootstrap algorithms to estimate the variance of the Horvitz-Thompson estimator for single-stage sampling. For a survey of bootstrap methods for finite populations, see Mashreghi et al. (2016) <doi:10.1214/16-SS113>. |
License: | GPL-3 |
Encoding: | UTF-8 |
BugReports: | https://github.com/rhobis/bootstrapFP/issues |
RoxygenNote: | 7.3.1 |
Imports: | sampling |
NeedsCompilation: | no |
Packaged: | 2024-03-08 22:29:45 UTC; Roberto |
Author: | Roberto Sichera [aut, cre] |
Maintainer: | Roberto Sichera <rob.sichera@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-03-08 23:00:02 UTC |
bootstrapFP: Bootstrap Algorithms for Finite Population Inference
Description
Perform bootstrap variance estimation of the Horvitz-Thompson total estimator in finite population sampling with equal or unequal probabilities.
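As a reminder of the quantity whose variance is being estimated, the following base-R sketch computes the Horvitz-Thompson estimate of a population total. The function name ht_total and the toy data are illustrative assumptions and are not part of the package.

```r
# Horvitz-Thompson estimate of a population total: each sampled value is
# weighted by the inverse of its first-order inclusion probability.
# `ht_total` is a hypothetical helper, not a bootstrapFP function.
ht_total <- function(y, pik) {
  stopifnot(length(pik) %in% c(1L, length(y)), all(pik > 0), all(pik <= 1))
  sum(y / pik)
}

# Toy sample of n = 3 units from a population of N = 12, equal probabilities
y_s   <- c(10, 20, 30)
pik_s <- rep(3 / 12, 3)
ht_total(y_s, pik_s)   # 240
```

With equal probabilities n/N the estimate reduces to N times the sample mean.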
Author(s)
Maintainer: Roberto Sichera rob.sichera@gmail.com
References
Mashreghi Z.; Haziza D.; Léger C., 2016. A survey of bootstrap methods in finite population sampling. Statistics Surveys 10 1-52.
See Also
Useful links:
Report bugs at https://github.com/rhobis/bootstrapFP/issues
Antal and Tillé (2011) Bootstrap for Unequal Probability Sampling without replacement
Description
Draw B bootstrap samples according to the direct bootstrap method of Antal and Tillé (2011) for unequal probability sampling. Note that this method does not need a double bootstrap.
Usage
AntalTille2011_ups(
  ys,
  pks,
  B,
  smplFUN,
  approx_method = c("Hajek", "DevilleTille")
)
Arguments
ys |
values of the variable of interest for the original sample |
pks |
vector of first-order inclusion probabilities for sampled units |
B |
integer scalar, number of bootstrap resamples to draw from the pseudo-population |
smplFUN |
a function that takes as input a vector of length N of inclusion probabilities and returns a vector of length N, either logical or a vector of 0s and 1s, where TRUE or 1 indicate sampled units and FALSE or 0 indicate non-sample units |
approx_method |
method used to approximate the variance Dkk. |
Value
a list of two elements, a vector of K average bootstrap totals and a vector of K variance estimates.
References
Antal, E.; Tillé, Y., 2011. A Direct Bootstrap Method for Complex Sampling Designs From a Finite Population. Journal of the American Statistical Association, 106:494, 534-543, doi: 10.1198/jasa.2011.tm09767
Antal, E.; Tillé, Y., 2014. A new resampling method for sampling designs without replacement: the doubled half bootstrap. Computational Statistics, 29(5), 1345-1363. doi: 10.1007/s00180-014-0495-0
Bootstrap algorithms for Finite Population sampling
Description
Bootstrap variance estimation for finite population sampling.
Usage
bootstrapFP(
  y,
  pik,
  B,
  D = 1,
  method,
  design,
  x = NULL,
  s = NULL,
  distribution = "uniform"
)
Arguments
y |
vector of sample values |
pik |
vector of sample first-order inclusion probabilities |
B |
scalar, number of bootstrap replications |
D |
scalar, number of replications for the double bootstrap (when applicable) |
method |
a string indicating the bootstrap method to be used; see section "Details" for more information |
design |
sampling procedure to be used for sample selection. Either a string indicating the name of the sampling design or a function; see section "Details" for more information. |
x |
vector of length N with values of the auxiliary variable for all population units, only required if method "ppHotDeck" is chosen |
s |
logical vector of length N, TRUE for units in the sample, FALSE otherwise. Alternatively, a vector of length n with the indices of the sample units. Only required for "ppHotDeck" method. |
distribution |
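required only for method "wGeneralised"; the distribution from which the weight adjustments are generated, one of "uniform", "normal", "exponential", "lognormal" |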
Details
Argument design accepts either a string indicating the sampling design to use to draw samples or a function. Accepted designs are "brewer", "tille", "maxEntropy", "poisson", "sampford", "systematic", "randomSystematic". The user may also pass a function as argument; such a function should take as input the parameters passed to argument design_pars and return either a logical vector or a vector of 0s and 1s, where TRUE or 1 indicate sampled units and FALSE or 0 indicate non-sample units. The length of such a vector must be equal to the length of x if units is not specified, otherwise it must have the same length as units.
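The contract for a user-supplied design function can be illustrated with a small base-R sketch. The function poisson_design below is a hypothetical example of Poisson sampling; that a function with exactly this one-argument signature is accepted by the design argument is an assumption based on the description of smplFUN elsewhere in this manual.

```r
# Hypothetical user-defined design: Poisson sampling.
# Input:  a vector of first-order inclusion probabilities.
# Output: a 0/1 vector of the same length (1 = unit selected).
poisson_design <- function(pik) {
  stopifnot(all(pik >= 0), all(pik <= 1))
  as.integer(runif(length(pik)) < pik)
}

set.seed(1)
pik_demo <- c(0.2, 0.5, 0.9, 1, 0)
s_demo   <- poisson_design(pik_demo)
s_demo   # units with pik = 1 are always selected, units with pik = 0 never
```

Returning a logical vector (dropping as.integer) would satisfy the same description.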
method must be a string indicating the bootstrap method to use. A list of the currently available methods follows; the sampling design they should be used with is indicated in square brackets. The prefix "pp" indicates a pseudo-population method, the prefix "d" represents a direct method, and the prefix "w" indicates a weights method. For more details on these methods see Mashreghi et al. (2016).
"ppGross" [SRSWOR]
"ppBooth" [SRSWOR]
"ppChaoLo85" [SRSWOR]
"ppChaoLo94" [SRSWOR]
"ppBickelFreedman" [SRSWOR]
"ppSitter" [SRSWOR]
"ppHolmberg" [UPSWOR]
"ppChauvet" [UPSWOR]
"ppHotDeck" [UPSWOR]
"dEfron" [SRSWOR]
"dMcCarthySnowden" [SRSWOR]
"dRaoWu" [SRSWOR]
"dSitter" [SRSWOR]
"dAntalTille_UPS" [UPSWOR]
"wRaoWuYue" [SRSWOR]
"wChipperfieldPreston" [SRSWOR]
"wGeneralised" [any]
Value
The bootstrap variance of the Horvitz-Thompson estimator.
References
Mashreghi Z.; Haziza D.; Léger C., 2016. A survey of bootstrap methods in finite population sampling. Statistics Surveys 10 1-52.
Examples
library(bootstrapFP)
### Generate population data ---
N <- 20; n <- 5
x <- rgamma(N, scale=10, shape=5)
y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )
pik <- n * x/sum(x)
### Draw a dummy sample ---
s <- sample(N, n)
### Estimate bootstrap variance ---
bootstrapFP(y = y[s], pik = n/N, B=100, method = "ppSitter")
bootstrapFP(y = y[s], pik = pik[s], B=10, method = "ppHolmberg", design = 'brewer')
bootstrapFP(y = y[s], pik = pik[s], B=10, D=10, method = "ppChauvet")
bootstrapFP(y = y[s], pik = n/N, B=10, method = "dRaoWu")
bootstrapFP(y = y[s], pik = n/N, B=10, method = "dSitter")
bootstrapFP(y = y[s], pik = pik[s], B=10, method = "dAntalTille_UPS", design='brewer')
bootstrapFP(y = y[s], pik = n/N, B=10, method = "wRaoWuYue")
bootstrapFP(y = y[s], pik = n/N, B=10, method = "wChipperfieldPreston")
bootstrapFP(y = y[s], pik = pik[s], B=10, method = "wGeneralised", distribution = 'normal')
Bootstrap with Adjusted Weights
Description
Compute bootstrap estimates according to the bootstrap weights procedures by Rao et al. (1992) and Chipperfield and Preston (2007).
Usage
bootstrap_weights(ys, N, B, method = c("RaoWuYue", "ChipperfieldPreston"))
Arguments
ys |
values of the variable of interest for the original sample |
N |
scalar, representing the population size |
B |
integer scalar, number of bootstrap resamples to draw from the pseudo-population |
method |
a string indicating the bootstrap method to be used; available methods are "RaoWuYue" and "ChipperfieldPreston". |
Value
a list of two elements, a vector of K average bootstrap totals and a vector of K variance estimates.
References
Rao J. N. K.; Wu C. F. J.; Yue K. (1992). Some recent work on resampling methods for complex surveys. Survey Methodology, 18(2), 209-217.
Chipperfield J.; Preston J. (2007). Efficient bootstrap for business surveys. Survey Methodology, 33(2), 167-172.
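The idea behind bootstrap weights can be sketched in base R for simple random sampling without replacement. This is a minimal illustration of a rescaled-weights bootstrap, assuming a resample size of n - 1 and the usual rescaling factor that makes the bootstrap variance match the standard variance estimator N^2 (1 - n/N) s^2 / n. The function name and all internals are assumptions; the code is not the implementation of bootstrap_weights().

```r
# Sketch of a rescaled-weights bootstrap for SRSWOR (illustrative only).
rescaled_weights_variance <- function(ys, N, B) {
  n <- length(ys)
  stopifnot(n >= 2, N >= n)
  f      <- n / N                      # sampling fraction
  n_star <- n - 1                      # assumed bootstrap resample size
  lambda <- sqrt(n_star * (1 - f) / (n - 1))
  w      <- rep(N / n, n)              # original design weights
  totals <- replicate(B, {
    # m[i] = number of times unit i is drawn with replacement
    m <- tabulate(sample.int(n, n_star, replace = TRUE), nbins = n)
    a <- 1 - lambda + lambda * (n / n_star) * m   # weight adjustments
    sum(w * a * ys)                    # bootstrap estimate of the total
  })
  var(totals)
}

set.seed(123)
ys_demo <- c(3, 5, 8, 13, 21)
rescaled_weights_variance(ys_demo, N = 50, B = 2000)
```

The adjustments sum to n in every replicate, so each bootstrap weight set preserves the estimated population size.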
Define the phi vector
Description
Define the phi vector used to select the first sample in the Antal and Tillé (2011) bootstrap (Algorithm 4, first step). If the sum of the elements of \phi is not an integer, \phi is decomposed into a convex combination of two vectors \phi_1 and \phi_2, such that the sum of the \phi_1i is the integer part of \sum \phi_i and the sum of the \phi_2i is the integer part of \sum \phi_i plus 1 (see the Antal and Tillé (2011) bootstrap procedure for unequal probability sampling, p. 539, Algorithm 4, Case 1).
The procedure used to decompose the vector \phi is described in the answer to this question: https://math.stackexchange.com/questions/2700483/vector-decomposition-into-a-convex-combination-of-two-vectors-with-constraints-o
Usage
define_phi(phi)
Arguments
phi |
vector of inclusion probabilities for Antal and Tillé (2011) bootstrap, given by 1 - D_kk |
Value
a list with the two vectors into which phi is decomposed
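To make the decomposition concrete, here is one possible construction in base R. It is a sketch under stated assumptions: the function name decompose_phi, the returned element names, and the particular way the unit of mass is spread across elements are all illustrative choices, and may differ from what define_phi() or the linked answer actually do. It only guarantees the properties stated above: both vectors lie in [0, 1], their sums are the integer part of the sum of phi and that integer plus 1, and phi is their convex combination.

```r
# One possible decomposition of phi (illustrative, not the package's code).
decompose_phi <- function(phi, tol = 1e-10) {
  stopifnot(all(phi >= 0), all(phi <= 1))
  total <- sum(phi)
  q <- total - floor(total)                 # fractional part of sum(phi)
  if (q < tol || q > 1 - tol) {
    # The sum is already (numerically) an integer: nothing to decompose
    return(list(lambda = 1, phi1 = phi, phi2 = phi))
  }
  # Largest shift each element tolerates while keeping both vectors in [0, 1]
  u <- pmin(phi / q, (1 - phi) / (1 - q))
  d <- u / sum(u)                           # shifts, summing to exactly 1
  list(
    lambda = 1 - q,                         # weight given to phi1
    phi1   = phi - q * d,                   # sums to floor(sum(phi))
    phi2   = phi + (1 - q) * d              # sums to floor(sum(phi)) + 1
  )
}

phi_demo <- c(0.2, 0.7, 0.9, 0.5)           # sum is 2.3
res <- decompose_phi(phi_demo)
sum(res$phi1)                               # 2
sum(res$phi2)                               # 3
```

Here phi equals lambda * phi1 + (1 - lambda) * phi2, so a sample can be drawn by picking phi1 with probability lambda and phi2 otherwise.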
Direct bootstrap methods for simple random sampling
Description
Direct bootstrap methods for simple random sampling
Usage
directBS_srs(y, N, B, method)
Arguments
y |
vector of sample values |
N |
scalar, representing the population size |
B |
scalar, number of bootstrap replications |
method |
a string indicating the bootstrap method to be used, available methods are: 'Efron', 'McCarthySnowden', 'RaoWu', 'Sitter'. |
Details
See Mashreghi et al. (2016) for details about the algorithm.
References
Mashreghi Z.; Haziza D.; Léger C., 2016. A survey of bootstrap methods in finite population sampling. Statistics Surveys 10 1-52.
Select a doubled-half sampling (Antal and Tillé, 2014)
Description
Select a doubled-half sampling (Antal and Tillé, 2014)
Usage
doubled_half(n)
Arguments
n |
integer scalar representing sample size |
Value
an integer vector of size n, indicating how many times each unit is present in the sample
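For intuition, the even-n case of doubled half sampling can be sketched in base R: half of the units are selected without replacement and each is taken twice, so every count is 0 or 2 with probability 1/2, giving expectation 1 and variance 1. The function doubled_half_even is a hypothetical illustration; the odd-n case needs an additional step that is not reproduced here, and the sketch makes no claim about how doubled_half() is implemented.

```r
# Doubled half sampling, even sample size only (illustrative sketch).
doubled_half_even <- function(n) {
  stopifnot(n >= 2, n %% 2 == 0)
  counts <- integer(n)
  # Select n/2 units without replacement and include each of them twice
  counts[sample.int(n, n / 2)] <- 2L
  counts
}

set.seed(10)
doubled_half_even(6)   # a vector of 0s and 2s that sums to 6
```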
Generalised Bootstrap
Description
Compute bootstrap estimates according to Generalised Bootstrap procedure by Beaumont and Patak (2012)
Usage
generalised(
  ys,
  pks,
  B,
  distribution = c("uniform", "normal", "exponential", "lognormal")
)
Arguments
ys |
values of the variable of interest for the original sample |
pks |
inclusion probabilities for units in the sample |
B |
integer scalar, number of bootstrap resamples to draw from the pseudo-population |
distribution |
the distribution from which to generate the weight adjustments. One of "uniform", "normal", "exponential", "lognormal" |
Value
a list of two elements, a vector of K average bootstrap totals and a vector of K variance estimates.
References
Bertail, P., & Combris, P. (1997). Bootstrap généralisé d'un sondage. Annales d'Economie et de Statistique, 49-83.
Beaumont, J. F., & Patak, Z. (2012). On the generalized bootstrap for sample surveys with special attention to Poisson sampling. International Statistical Review, 80(1), 127-148.
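The mechanism of a generalised bootstrap can be sketched in base R as follows. The sketch assumes normally distributed weight adjustments with mean 1 and variance 1 - pi_k, a choice under which the bootstrap variance of the weighted total matches the Poisson-sampling variance estimator sum((1 - pi_k) * y_k^2 / pi_k^2). The function name, the restriction to the normal case, and the returned structure are assumptions; this is not the code of generalised().

```r
# Generalised bootstrap sketch with normal weight adjustments (illustrative).
generalised_sketch <- function(ys, pks, B) {
  stopifnot(length(ys) == length(pks), all(pks > 0), all(pks <= 1))
  w <- 1 / pks                                   # design weights
  totals <- replicate(B, {
    # Random adjustments: mean 1, variance 1 - pi_k (assumed moments)
    a <- rnorm(length(ys), mean = 1, sd = sqrt(1 - pks))
    sum(w * a * ys)                              # bootstrap total
  })
  c(mean_total = mean(totals), variance = var(totals))
}

set.seed(99)
ys_demo  <- c(12, 7, 30, 18)
pks_demo <- c(0.3, 0.2, 0.6, 0.4)
generalised_sketch(ys_demo, pks_demo, B = 2000)
```

Normal adjustments can produce negative bootstrap weights, which is one reason other distributions may be preferred in practice.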
Check if a number is integer
Description
Check whether x is a whole number, unlike is.integer, which checks the type of the object x
Usage
is_wholenumber(x, tol = .Machine$double.eps^0.5)
Arguments
x |
a scalar or a numeric vector |
tol |
a scalar, indicating the tolerance |
Note
Taken from the help page of function is.integer
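The difference between a type check and a value check can be seen in a short base-R example. The definition below mirrors the is.wholenumber example given in the help page of is.integer, using the signature shown under Usage; it is assumed, not confirmed, that the package's function has the same body, so the sketch uses a separate name.

```r
# Value-based check, following the example in ?is.integer
is_wholenumber_sketch <- function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol
}

is.integer(2)                          # FALSE: 2 is stored as a double
is_wholenumber_sketch(2)               # TRUE:  its value is a whole number
is.integer(2L)                         # TRUE
is_wholenumber_sketch(c(1, 2.5, 3))    # TRUE FALSE TRUE
```

The tolerance guards against floating-point error in values that are whole numbers only up to rounding.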
Select a one-one sampling
Description
A one-one sampling is a design for which the random variables Sk, representing the number of times unit k is included in the sample, have expectation and variance equal to 1. Proposed by Antal and Tillé (2011, 2014).
Usage
one_one(n, method = c("doubled-half", "over-replacement"))
Arguments
n |
integer, the sample size |
method |
algorithm to be used, either doubled half sampling or srs with over-replacement. See the Details section. |
Details
Antal and Tillé proposed two procedures that lead to one-one samplings. The first one (Antal and Tillé, 2011a) is more complex and makes use of simple random sampling with over-replacement (Antal and Tillé, 2011b); it is called by setting method = "over-replacement". The second one (Antal and Tillé, 2014) is the doubled half sampling, which is simpler and quicker to compute, and can be employed by setting method = "doubled-half"; this is the default option.
Value
an integer vector of size n, indicating how many times each unit is present in the sample
Select a simple random sampling with over-replacement
Description
Used for resampling procedures. Proposed by Antal and Tillé (2011).
Usage
over_replacement(N, n)
Arguments
N |
integer, the population size |
n |
integer, the sample size |
Value
an integer vector of size n, indicating how many times each unit is present in the sample
References
Antal, E.; Tillé, Y. (2011). Simple random sampling with over-replacement. Journal of Statistical Planning and Inference, 141(1), 597-601.
Pseudo-population bootstrap for simple random sampling
Description
Pseudo-population bootstrap for simple random sampling
Usage
ppBS_srs(y, N, B, D = 1, method)
Arguments
y |
vector of sample values |
N |
scalar, represents the population size |
B |
scalar, number of bootstrap replications |
D |
scalar, number of replications for the double bootstrap (when applicable) |
method |
a string indicating the bootstrap method to be used, available methods are: 'Gross', 'Booth', 'ChaoLo85', 'ChaoLo94', 'BickelFreedman', 'Sitter' |
Details
See Mashreghi et al. (2016) for details about these bootstrap methods.
References
Mashreghi Z.; Haziza D.; Léger C., 2016. A survey of bootstrap methods in finite population sampling. Statistics Surveys 10 1-52.
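The pseudo-population idea can be illustrated with the simplest case, where N/n is an integer: the sample is replicated N/n times to form a pseudo-population, and bootstrap samples are drawn from it with the original design. The base-R sketch below follows that description; the function name pp_gross_sketch and its internals are assumptions and do not reproduce ppBS_srs(), which also covers methods for non-integer N/n.

```r
# Pseudo-population bootstrap for SRSWOR when N/n is an integer (sketch).
pp_gross_sketch <- function(y, N, B) {
  n <- length(y)
  k <- N / n
  stopifnot(n >= 2, abs(k - round(k)) < 1e-8)
  pseudo_pop <- rep(y, times = round(k))        # each sampled unit copied N/n times
  totals <- replicate(B, {
    # Draw n units without replacement from the pseudo-population
    y_star <- pseudo_pop[sample.int(N, n)]
    N * mean(y_star)                            # bootstrap estimate of the total
  })
  var(totals)                                   # bootstrap variance estimate
}

set.seed(321)
y_demo <- c(3, 5, 8, 13, 21)
pp_gross_sketch(y_demo, N = 20, B = 2000)
```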
Pseudo-population bootstrap for unequal probability sampling
Description
Pseudo-population bootstrap for unequal probability sampling
Usage
ppBS_ups(y, pik, B, D = 1, method, smplFUN, x = NULL, s = NULL)
Arguments
y |
vector of sample values |
pik |
vector of sample first-order inclusion probabilities |
B |
scalar, number of bootstrap replications |
D |
scalar, number of replications for the double bootstrap |
method |
a string indicating the bootstrap method to be used, available methods are: 'Holmberg', 'Chauvet', 'HotDeck' |
smplFUN |
a function that takes as input a vector of length N of inclusion probabilities and returns a vector of length N, either logical or a vector of 0s and 1s, where TRUE or 1 indicate sampled units and FALSE or 0 indicate non-sample units |
x |
vector of length N with values of the auxiliary variable for all population units, only required if method "HotDeck" is chosen |
s |
logical vector of length N, TRUE for units in the sample, FALSE otherwise. Alternatively, a vector of length n with the indices of the sample units. Only required for "HotDeck" method. |
References
Mashreghi Z.; Haziza D.; Léger C., 2016. A survey of bootstrap methods in finite population sampling. Statistics Surveys 10 1-52.
Select the random part of a pseudo-population
Description
Helper function that selects the random part of a pseudo-population in function ppBS_srs().
Usage
select_Uc(..., method)
Arguments
... |
parameters of the function, depending on the bootstrap method chosen. |
method |
string indicating the bootstrap method |