Generates n bootstrap samples of data and returns the bootstrapped data frames as a list-variable.
bootstrap(data, n, size)
data: A data frame.
n: Number of bootstraps to be generated.
size: Optional, size of the bootstrap samples. May either be a number between 1 and nrow(data) or a value between 0 and 1 to sample a proportion of observations from data (see 'Examples').
A data frame with one column: a list-variable strap, which contains resample-objects of class sj_resample.
These resample-objects are lists with three elements:
- the original data frame, data
- the row numbers id, i.e. row numbers of data, indicating the resampled rows (drawn with replacement unless size is given)
- the resample.id, indicating the index of the resample (i.e. the position of the sj_resample-object in the list strap)
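To make the structure above concrete, the following base-R sketch builds objects shaped like the documented resample-objects and turns one back into a data frame. It is an illustration under assumptions, not the package's implementation: make_resample() and materialize() are hypothetical helpers, and mtcars stands in for data.

```r
# Hypothetical helper (not part of the package): builds a list shaped like
# the documented sj_resample object (original data, resampled row numbers,
# index of the resample).
make_resample <- function(data, resample_id) {
  structure(
    list(
      data = data,
      id = sample(nrow(data), size = nrow(data), replace = TRUE),
      resample.id = resample_id
    ),
    class = "sj_resample"
  )
}

# Hypothetical helper: recovers the resampled data frame by indexing the
# original rows with the stored row numbers.
materialize <- function(x) {
  x$data[x$id, , drop = FALSE]
}

set.seed(1)
strap <- lapply(1:3, function(i) make_resample(mtcars, i))
boot_df <- materialize(strap[[2]])
nrow(boot_df)            # 32, the same number of rows as mtcars
strap[[2]]$resample.id   # 2, the position of the object in the list
```

Storing only row numbers next to one shared copy of the data keeps each resample-object small; the resampled rows are built only when they are needed.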
By default, each bootstrap sample has the same number of observations as data. To generate bootstrap samples without resampling the same observations (i.e. sampling without replacement), use size to get bootstrapped data with a specific number of observations. However, specifying the size-argument is much less memory-efficient than the bootstrap with replacement. Hence, it is recommended to omit the size-argument unless it is really needed.
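The size rules described above can be sketched in base R as follows. The helper draw_ids() is hypothetical, and whether the package rounds or truncates a proportion to a row count is an assumption here (this sketch rounds).

```r
# Hypothetical helper (not part of the package) mimicking the documented
# size rules: no size -> nrow(data) rows drawn with replacement;
# size in (0, 1) -> proportion of nrow(data); size >= 1 -> row count;
# any given size -> rows drawn without replacement.
draw_ids <- function(data, size = NULL) {
  n_obs <- nrow(data)
  if (is.null(size)) {
    return(sample(n_obs, size = n_obs, replace = TRUE))
  }
  if (size <= 0 || size > n_obs) stop("`size` is out of range.")
  if (size < 1) size <- round(n_obs * size)  # assumption: rounding
  sample(n_obs, size = size, replace = FALSE)
}

set.seed(42)
with_repl <- draw_ids(mtcars)
without_repl <- draw_ids(mtcars, size = 0.7)
length(with_repl)            # 32 rows, duplicates allowed
length(without_repl)         # round(32 * 0.7) = 22 rows
anyDuplicated(without_repl)  # 0: no row is drawn twice
```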
This function applies nonparametric bootstrapping, i.e. the function draws samples with replacement (unless size is specified, see 'Details').
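As a minimal illustration of nonparametric bootstrapping (resampling the observed values with replacement), the sketch below estimates the standard error of a mean in base R and compares it with the textbook formula sd(x) / sqrt(n). It uses mtcars$mpg as stand-in data and does not call this package.

```r
# Nonparametric bootstrap of the mean (base R only, mtcars as stand-in data):
# resample the observed values with replacement, recompute the mean each
# time, and use the standard deviation of the replicates as the SE estimate.
set.seed(123)
x <- mtcars$mpg
n_boot <- 1000

boot_means <- replicate(n_boot, mean(sample(x, size = length(x), replace = TRUE)))

se_boot <- sd(boot_means)                  # bootstrap standard error
se_formula <- sd(x) / sqrt(length(x))      # textbook formula, for comparison
c(bootstrap = se_boot, formula = se_formula)
```

The two values should be close; the bootstrap estimate varies slightly with the seed and the number of replicates.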
There are as.data.frame- and print-methods to get or print the resampled data frames (see 'Examples'). The as.data.frame-method is applied automatically whenever coercion is needed because a data frame is required as input. See 'Examples' in boot_ci.
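The automatic coercion mentioned above relies on standard S3 dispatch: model.frame(), which lm() uses internally, calls as.data.frame() on a data argument that has a class but is not a data frame. The sketch below shows this with a hypothetical class toy_resample; it mimics the idea of an as.data.frame-method but is not the package's own method.

```r
# as.data.frame() method for the hypothetical class "toy_resample":
# returns the resampled rows of the stored original data.
as.data.frame.toy_resample <- function(x, ...) {
  x$data[x$id, , drop = FALSE]
}

set.seed(7)
res <- structure(
  list(data = mtcars, id = sample(nrow(mtcars), replace = TRUE)),
  class = "toy_resample"
)

resampled <- as.data.frame(res)   # explicit coercion
fit <- lm(mpg ~ wt, data = res)   # implicit coercion inside model.frame()
nobs(fit) == nrow(resampled)      # TRUE: the model used the resampled rows
```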
boot_ci to calculate confidence intervals from bootstrap samples.
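One common way to turn bootstrap replicates into a confidence interval is the percentile method, sketched below in base R. This is an assumption for illustration: boot_ci may use a different method, and the column names only mirror the output shown in 'Examples'.

```r
# Percentile bootstrap CI for a mean (base R only, mtcars as stand-in data).
# Assumption: percentile method; boot_ci may compute its interval differently.
set.seed(2024)
x <- mtcars$hp
replicates <- replicate(2000, mean(sample(x, replace = TRUE)))

# lower and upper limits = empirical 2.5% and 97.5% quantiles of the replicates
ci <- quantile(replicates, probs = c(0.025, 0.975), names = FALSE)
data.frame(term = "hp", conf.low = ci[1], conf.high = ci[2])
```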
data(efc)
bs <- bootstrap(efc, 5)
# now run models for each bootstrapped sample
lapply(bs$strap, function(x) lm(neg_c_7 ~ e42dep + c161sex, data = x))
#> [[1]]
#>
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#>
#> Coefficients:
#> (Intercept) e42dep c161sex
#> 7.58282 1.49628 0.02491
#>
#>
#> [[2]]
#>
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#>
#> Coefficients:
#> (Intercept) e42dep c161sex
#> 5.825 1.679 0.622
#>
#>
#> [[3]]
#>
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#>
#> Coefficients:
#> (Intercept) e42dep c161sex
#> 6.8755 1.5939 0.1626
#>
#>
#> [[4]]
#>
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#>
#> Coefficients:
#> (Intercept) e42dep c161sex
#> 7.4220 1.4319 0.1228
#>
#>
#> [[5]]
#>
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#>
#> Coefficients:
#> (Intercept) e42dep c161sex
#> 6.8990 1.3165 0.6587
#>
#>
# generate bootstrap samples with 600 observations for each sample
bs <- bootstrap(efc, 5, 600)
# generate bootstrap samples with 70% observations of the original sample size
bs <- bootstrap(efc, 5, .7)
# compute standard error for a simple vector from bootstraps
# use the `as.data.frame()`-method to get the resampled
# data frame
bs <- bootstrap(efc, 100)
bs$c12hour <- unlist(lapply(bs$strap, function(x) {
  mean(as.data.frame(x)$c12hour, na.rm = TRUE)
}))
# bootstrapped standard error
boot_se(bs, "c12hour")
#> term std.err
#> 1 c12hour 1.643594
# bootstrapped CI
boot_ci(bs, "c12hour")
#> term conf.low conf.high
#> 1 c12hour 39.34469 45.86718
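The first example prints one model per bootstrap sample. A possible next step, sketched here in base R with mtcars instead of efc so it runs without extra packages, is to collect the coefficients from all refits and use their column-wise standard deviations as bootstrapped standard errors. With this package, the list of models could be obtained from bs$strap in the same way as in the lapply() call above.

```r
# Base-R sketch (mtcars as stand-in data): refit a model on each bootstrap
# sample, stack the coefficients into a matrix (one row per resample) and
# take the column-wise standard deviation as the bootstrapped standard error.
set.seed(99)
n_boot <- 200
fits <- lapply(seq_len(n_boot), function(i) {
  rows <- sample(nrow(mtcars), replace = TRUE)  # resample rows with replacement
  lm(mpg ~ wt + hp, data = mtcars[rows, ])
})

coef_mat <- t(sapply(fits, coef))   # 200 x 3 matrix of coefficients
se_coef <- apply(coef_mat, 2, sd)   # one standard error per coefficient
se_coef
```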