Generates n bootstrap samples of data and returns the bootstrapped data frames as a list-variable.

bootstrap(data, n, size)

Arguments

data

A data frame.

n

Number of bootstraps to be generated.

size

Optional, size of the bootstrap samples. May either be a number between 1 and nrow(data) or a value between 0 and 1 to sample a proportion of observations from data (see 'Examples').

Value

A data frame with one column: a list-variable strap, which contains resample-objects of class sj_resample. These resample-objects are lists with three elements:

  1. the original data frame, data

  2. the row numbers id, i.e. the row numbers of data that were drawn for this resample (with replacement by default)

  3. the resample.id, indicating the index of the resample (i.e. the position of the sj_resample-object in the list strap)
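The three elements listed above can be pictured with a small base-R sketch. It is only an illustration of the documented structure, not the package's implementation: the constructor `make_resample()`, the class name `toy_resample` and the data frame `toy` are hypothetical names introduced here.

```r
set.seed(123)  # only to make the sketch reproducible

# Toy data standing in for `data`
toy <- data.frame(x = 1:6, y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2))

# Hypothetical constructor that mimics the three documented elements
make_resample <- function(data, resample.id) {
  structure(
    list(
      data = data,                                   # 1. the original data frame
      id = sample(nrow(data), replace = TRUE),       # 2. row numbers drawn with replacement
      resample.id = resample.id                      # 3. position of this resample in the list
    ),
    class = "toy_resample"
  )
}

# A list of three resample objects, analogous to the `strap` column
strap <- lapply(1:3, function(i) make_resample(toy, i))

# Materialize the rows of the second resample from the stored row numbers
second <- strap[[2]]
resampled <- second$data[second$id, , drop = FALSE]
nrow(resampled)
```

In this sketch each object stores the original data plus a vector of row numbers, and the resampled rows are only built when they are needed.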

Details

By default, each bootstrap sample has the same number of observations as data and is drawn with replacement. To generate bootstrap samples without resampling the same observations (i.e. sampling without replacement), use size to get bootstrapped data with a specific number of observations. However, specifying the size-argument is much less memory-efficient than the bootstrap with replacement. Hence, it is recommended to omit the size-argument unless it is really needed.
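The difference between the two sampling modes described above can be sketched with base R's `sample()`. The rule used here to turn a proportion into a row count (`as.integer(n_rows * size)`) is an assumption made for illustration, and the variable names are hypothetical.

```r
set.seed(42)   # only to make the sketch reproducible
n_rows <- 20   # stands in for nrow(data)

# Default described above: as many draws as rows, with replacement,
# so the same row number may appear more than once
with_repl <- sample(n_rows, size = n_rows, replace = TRUE)

# With `size`: fewer draws, without replacement
size <- 0.5
# Assumed conversion rule: values below 1 are treated as a proportion
n_draw <- if (size < 1) as.integer(n_rows * size) else as.integer(size)
without_repl <- sample(n_rows, size = n_draw, replace = FALSE)

length(with_repl)             # same as n_rows
anyDuplicated(without_repl)   # 0, since no row is drawn twice
```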

Note

This function applies nonparametric bootstrapping, i.e. by default the function draws samples with replacement (see 'Details' for the effect of size).

There are as.data.frame- and print-methods to get or print the resampled data frames. See 'Examples'. The as.data.frame-method is applied automatically whenever a resample-object is coerced because a data frame is required as input. See 'Examples' in boot_ci.
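The automatic coercion can be illustrated with a toy S3 class in base R. This is a sketch under assumptions, not the package's code: the class `toy_resample` and the method `as_df_toy()` are hypothetical. It relies on the fact that `lm()` builds its model frame via `model.frame()`, which calls `as.data.frame()` on a classed object that is not already a data frame.

```r
# Toy resample object: original data plus sampled row numbers (illustrative names)
toy <- data.frame(x = 1:8, y = c(1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1))
set.seed(1)  # only to make the sketch reproducible
obj <- structure(
  list(data = toy, id = sample(nrow(toy), replace = TRUE)),
  class = "toy_resample"
)

# Hypothetical coercion method: build the resampled rows on demand
as_df_toy <- function(x, ...) x$data[x$id, , drop = FALSE]

# Register the method explicitly so that dispatch also works when this
# snippet is evaluated outside the global environment
registerS3method("as.data.frame", "toy_resample", as_df_toy)

# Explicit coercion
df <- as.data.frame(obj)

# Implicit coercion: lm() needs a data frame, so `obj` is coerced for us
fit_implicit <- lm(y ~ x, data = obj)
fit_explicit <- lm(y ~ x, data = df)
all.equal(coef(fit_implicit), coef(fit_explicit))
```

Both fits use the same resampled rows, which is why a resample-object can be passed directly as the `data` argument of a modelling function.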

See also

boot_ci to calculate confidence intervals from bootstrap samples.

Examples

library(sjstats)
data(efc)
bs <- bootstrap(efc, 5)

# now run models for each bootstrapped sample
lapply(bs$strap, function(x) lm(neg_c_7 ~ e42dep + c161sex, data = x))
#> [[1]]
#> 
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#> 
#> Coefficients:
#> (Intercept)       e42dep      c161sex  
#>      6.7217       1.4854       0.5294  
#> 
#> 
#> [[2]]
#> 
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#> 
#> Coefficients:
#> (Intercept)       e42dep      c161sex  
#>      6.8244       1.5494       0.4017  
#> 
#> 
#> [[3]]
#> 
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#> 
#> Coefficients:
#> (Intercept)       e42dep      c161sex  
#>      6.1121       1.6200       0.6238  
#> 
#> 
#> [[4]]
#> 
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#> 
#> Coefficients:
#> (Intercept)       e42dep      c161sex  
#>      6.5247       1.5744       0.2904  
#> 
#> 
#> [[5]]
#> 
#> Call:
#> lm(formula = neg_c_7 ~ e42dep + c161sex, data = x)
#> 
#> Coefficients:
#> (Intercept)       e42dep      c161sex  
#>       5.294        1.791        0.694  
#> 
#> 

# generate bootstrap samples with 600 observations for each sample
bs <- bootstrap(efc, 5, 600)

# generate bootstrap samples with 70% of the observations of the original sample
bs <- bootstrap(efc, 5, .7)

# compute standard error for a simple vector from bootstraps
# use the `as.data.frame()`-method to get the resampled
# data frame
bs <- bootstrap(efc, 100)
bs$c12hour <- unlist(lapply(bs$strap, function(x) {
  mean(as.data.frame(x)$c12hour, na.rm = TRUE)
}))

# or using a tidyverse approach
if (require("dplyr") && require("purrr")) {
  bs <- efc %>%
    bootstrap(100) %>%
    mutate(
      c12hour = map_dbl(strap, ~mean(as.data.frame(.x)$c12hour, na.rm = TRUE))
    )

  # bootstrapped standard error
  boot_se(bs, c12hour)
}
#>      term  std.err
#> 1 c12hour 1.663036