Sample complete groups (“blocks”) from a grouped data frame. This package implements a simple block bootstrap style sampler: instead of sampling individual rows, you sample entire groups preserving the intra-group structure.
Motivation
When observations belonging to the same experimental unit are spread across multiple rows (e.g. multiple measurements per subject, dose combinations, time series segments), ordinary row-wise sampling breaks these units apart. A block sampler keeps units intact by sampling at the group level.
Core function
slice_block() works on a grouped data frame. If you call it on an ungrouped data frame, it throws a helpful error.
Key arguments:
-
n: number of groups (blocks) to sample. -
replace: sample with replacement? Needed whennexceeds number of groups. -
weight_by: optional expression (unquoted) evaluated per-group to weight sampling probabilities. -
...: passed to basesample()
Basic example
We use the built-in ToothGrowth dataset and treat each supplement-dose combination as a block.
library(dplyr)
library(blockstrap)
set.seed(1)
ToothGrowth |>
group_by(supp, dose) |>
slice_block(n = 2)## # A tibble: 20 × 3
## # Groups: supp, dose [2]
## len supp dose
## <dbl> <fct> <dbl>
## 1 15.2 OJ 0.5
## 2 21.5 OJ 0.5
## 3 17.6 OJ 0.5
## 4 9.7 OJ 0.5
## 5 14.5 OJ 0.5
## 6 10 OJ 0.5
## 7 8.2 OJ 0.5
## 8 9.4 OJ 0.5
## 9 16.5 OJ 0.5
## 10 9.7 OJ 0.5
## 11 4.2 VC 0.5
## 12 11.5 VC 0.5
## 13 7.3 VC 0.5
## 14 5.8 VC 0.5
## 15 6.4 VC 0.5
## 16 10 VC 0.5
## 17 11.2 VC 0.5
## 18 11.2 VC 0.5
## 19 5.2 VC 0.5
## 20 7 VC 0.5Sampling with replacement
If you want to sample more groups than exist, or allow repeats:
ToothGrowth |>
group_by(supp, dose) |>
slice_block(n = 10, replace = TRUE) |>
count(supp, dose)## # A tibble: 5 × 3
## # Groups: supp, dose [5]
## supp dose n
## <fct> <dbl> <int>
## 1 OJ 0.5 20
## 2 OJ 1 20
## 3 OJ 2 30
## 4 VC 1 20
## 5 VC 2 10Repeated blocks will appear multiple times (row counts summed accordingly).
Weighted sampling
Weight blocks by a statistic, e.g. mean tooth length, to favor larger mean response groups:
set.seed(42)
weighted <- ToothGrowth |>
group_by(supp, dose) |>
slice_block(n = 3, weight_by = mean(len))You can verify weighting bias by repeating and tallying frequencies: