第 21 章 功效分析

CRAN 上有很多功效计算和分析的 R 包,我们针对不同的混合效应模型和统计检验,提供对应的 R 实现。

MKpower 包提供 Welch 和 Hsu(许宝騄)t 检验、Wilcoxon 秩和检验、符号秩检验的功效分析和样本量计算,经验功效和第一类错误的计算方法是蒙特卡罗模拟。Superpower 基于模拟的方法分析三因素方差分析实验设计的功效,开发者写了本书介绍,详见 https://aaroncaldwell.us/SuperpowerBook/,也开发了两个 Shiny 应用。powerlmm 可用于计算两、三个水平的纵向多水平/线性混合效应模型的功效。pwrAB Welch 两样本 t 检验的功效分析,常用于 A/B 测试。Metin Bulus 开发 PowerUpR 计算响应变量是连续型的多水平随机对照实验统计功效,最小可检测的效应大小,最小样本量要求。simr 通过模拟方法分析广义线性混合效应模型的功效。WebPower 提供相关性、比例、t 检验、单因素方差分析、两因素方差分析、线性回归、逻辑回归、泊松回归、纵向数据分析、结构方程模型和多水平模型等的功效分析,详见网站 https://webpower.psychstat.org/,包含书籍和功效分析的工具。PowerAnalysisIL 功效分析的 shiny 应用 http://daniellakens.blogspot.com/2015/01/always-use-welchs-t-test-instead-of.html

此外,还有 lmerTest [51] 和 lmtest [52]。 试验设计 [53] 可以视为一种组织形式,包括各类检验, R 语言实战 [54] 作者 Robert I. Kabacoff 创建了网站 Quick-R,实战这本书第 10 章功效分析主要基于 pwr 包来介绍,Jacob Cohen 的著作《Statistical Power Analysis for the Behavioral Sciences》第二版 [55]

https://powerandsamplesize.com/ 功效和样本量计算器

pbkrtest 提供 parametric bootstrap test、Kenward-Roger-type F-test、Satterthwaite-type F-test 用于线性混合效应模型,parametric bootstrap test 用于广义线性混合效应模型

21.1 方差分析检验的功效

power.anova.test() 计算平衡的单因素方差分析检验的功效

usage(power.anova.test)
power.anova.test(groups = NULL, n = NULL, between.var = NULL, within.var = NULL,
    sig.level = 0.05, power = NULL)
power.anova.test(
  groups = 4,       #  4 个组  
  between.var = 1,  # 组间方差为 1
  within.var = 3,   # 组内方差为 3
  power = 0.95      # 1 - 犯第二类错误的概率
)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##          groups = 4
##               n = 18.18245
##     between.var = 1
##      within.var = 3
##       sig.level = 0.05
##           power = 0.95
## 
## NOTE: n is number in each group

21.2 比例检验的功效

power.prop.test() 计算两样本比例检验的功效

usage(power.prop.test)
power.prop.test(n = NULL, p1 = NULL, p2 = NULL, sig.level = 0.05, power = NULL,
    alternative = c("two.sided", "one.sided"), strict = FALSE,
    tol = .Machine$double.eps^0.25)

功效可以用来计算实验所需要的样本量,检验统计量的功效越大/高,检验方法越好,实验所需要的样本量越少

# p1 >= p2 的检验 单边和双边检验
power.prop.test(
  p1 = .65, p2 = 0.6, sig.level = .05,
  power = 0.90, alternative = "one.sided"
)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 1603.846
##              p1 = 0.65
##              p2 = 0.6
##       sig.level = 0.05
##           power = 0.9
##     alternative = one.sided
## 
## NOTE: n is number in *each* group
power.prop.test(
  p1 = .65, p2 = 0.6, sig.level = .05,
  power = 0.90, alternative = "two.sided"
)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 1968.064
##              p1 = 0.65
##              p2 = 0.6
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

pwrpwr.2p.test() 函数提供了类似 power.prop.test() 函数的功能

library(pwr)
# 明确 p1 > p2 的检验
# 单边检验拆分更加明细,分为大于和小于
pwr.2p.test(
  h = ES.h(p1 = 0.65, p2 = 0.6),
  sig.level = 0.05, power = 0.9, alternative = "greater"
)
## 
##      Difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.1033347
##               n = 1604.007
##       sig.level = 0.05
##           power = 0.9
##     alternative = greater
## 
## NOTE: same sample sizes

已知两样本的样本量不等,检验 H0: \(p_1 = p_2\) H1: \(p_1 \neq p_2\) 的功效

library(pwr)
pwr.2p2n.test(
  h = 0.30, n1 = 80, n2 = 245,
  sig.level = 0.05, alternative = "greater"
)
## 
##      difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.3
##              n1 = 80
##              n2 = 245
##       sig.level = 0.05
##           power = 0.7532924
##     alternative = greater
## 
## NOTE: different sample sizes

h 表示两个样本的差异,计算得到的功效是 0.75

21.3 t 检验的功效

power.t.test() 计算单样本或两样本的 t 检验的功效,或者根据功效计算参数,如样本量

Cohen’s d 单样本/配对 t 检验的功效分析

n = 30  # 样本量(只是一个例子)
x = seq(0, 12, 0.01)
library(ggplot2)
dat <- data.frame(xx = x/sqrt(n), yy = 2 * (1 - pt(x, n - 1)))
ggplot(data = dat, aes(x = xx, y = yy)) +
  geom_line() +
  geom_vline(xintercept = c(0.01, 0.2, 0.5, 0.8, 1.2, 2), linetype = 2) +
  theme_minimal() +
  labs(x = "d = t / sqrt(n)", y = "2 * (1 - pt(x, n - 1))")
t 检验的功效

图 21.1: t 检验的功效

usage(power.t.test)
power.t.test(n = NULL, delta = NULL, sd = 1, sig.level = 0.05, power = NULL,
    type = c("two.sample", "one.sample", "paired"),
    alternative = c("two.sided", "one.sided"), strict = FALSE,
    tol = .Machine$double.eps^0.25)
power.t.test(
  n = 100, delta = 2.2,
  sd = 1, sig.level = 0.05,
  type = "two.sample",
  alternative = "two.sided"
)
## 
##      Two-sample t test power calculation 
## 
##               n = 100
##           delta = 2.2
##              sd = 1
##       sig.level = 0.05
##           power = 1
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
表 21.1: 函数 power.t.test() 的参数表
参数 含义
n 每个组的样本量
delta 两个组的均值之差
sd 标准差,默认值 1
sig.level 显著性水平,默认是 0.05 (犯第 I 类错误的概率)
power 检验的功效(1 - 犯第 II 类错误的概率)
type t 检验的类型 "two.sample" 两样本、"one.sample" 单样本或 "paired" 配对样本
alternative 单边或双边检验,取值为 "two.sided""one.sided"

参数 ndeltapowersdsig.level 必须有一个值为 NULL,为 NULL 的参数是由其它参数决定的。

Jacob Cohen 提出的 Cohen’s d 和 Cohen’s f 详见书籍 [55],他的代表性文章,地球是圆的 [35]

# 前面 t 检验和方差分析检验的等价功效计算
library(pwr)
pwr.t.test(
  d = 2.2 / 6.4,
  n = 100,
  sig.level = 0.05,
  type = "two.sample",
  alternative = "two.sided"
)
## 
##      Two-sample t test power calculation 
## 
##               n = 100
##               d = 0.34375
##       sig.level = 0.05
##           power = 0.6768572
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
# f 是如何和上面的组间/组内方差等价指定的
pwr.anova.test(
  k = 4, # 组数
  f = 0.5,
  power = 0.95 # 检验的效
)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 18.18244
##               f = 0.5
##       sig.level = 0.05
##           power = 0.95
## 
## NOTE: n is number in each group
with(
  aggregate(
    data = PlantGrowth, weight ~ group,
    FUN = function(x) c(dist_mean = mean(x), dist_sd = sd(x))
  ),
  cbind.data.frame(weight, group)
)

R 3.5.0 以后,函数 aggregate 的参数 drop 默认设置为 TRUE 表示扔掉未用来分组的变量,聚合返回的是一个矩阵类型的数据对象。

ggsignif 添加显著性注释

library(ggplot2)
library(ggsignif)

ggplot(data = PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot() +
  geom_signif(comparisons = list(c("ctrl", "trt1"), c("trt1", "trt2")), 
              map_signif_level = function(p) sprintf("p = %.2g", p), 
              textsize = 6, test = "t.test") +
  theme_minimal()

无条件 \(2 \times 2\) 列联表

fisher.test https://en.wikipedia.org/wiki/Fisher's_exact_test

Exact https://en.wikipedia.org/wiki/Barnard's_test exact.test power.exact.test

exact2x2

21.4 运行环境

## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.3.5  lme4_1.1-27.1  Matrix_1.3-4   pwr_1.3-0      formatR_1.11  
## [6] magrittr_2.0.1
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.1 xfun_0.25        bslib_0.3.0      purrr_0.3.4     
##  [5] splines_4.1.1    lattice_0.20-44  colorspace_2.0-2 vctrs_0.3.8     
##  [9] generics_0.1.0   htmltools_0.5.2  yaml_2.2.1       utf8_1.2.2      
## [13] rlang_0.4.11     jquerylib_0.1.4  pillar_1.6.2     nloptr_1.2.2.2  
## [17] glue_1.4.2       withr_2.4.2      DBI_1.1.1        lifecycle_1.0.0 
## [21] stringr_1.4.0    munsell_0.5.0    gtable_0.3.0     evaluate_0.14   
## [25] labeling_0.4.2   knitr_1.33       fastmap_1.1.0    curl_4.3.2      
## [29] fansi_0.5.0      highr_0.9        Rcpp_1.0.7       scales_1.1.1    
## [33] jsonlite_1.7.2   farver_2.1.0     fs_1.5.0         digest_0.6.27   
## [37] stringi_1.7.4    bookdown_0.24    dplyr_1.0.7      grid_4.1.1      
## [41] tools_4.1.1      sass_0.4.0       tibble_3.1.4     crayon_1.4.1    
## [45] pkgconfig_2.0.3  downlit_0.2.1    MASS_7.3-54      ellipsis_0.3.2  
## [49] xml2_1.3.2       assertthat_0.2.1 minqa_1.2.4      rmarkdown_2.10  
## [53] R6_2.5.1         boot_1.3-28      nlme_3.1-152     compiler_4.1.1

参考文献

[35]
J. Cohen, “The earth is round (\(p < .05\)),” American Psychologist, vol. 49, no. 12, pp. 997–1003, 1994, doi: 10.1037/0003-066x.49.12.997.
[51]
A. Kuznetsova, P. B. Brockhoff, and R. H. B. Christensen, lmerTest package: Tests in linear mixed effects models,” Journal of Statistical Software, vol. 82, no. 13, pp. 1–26, 2017, doi: 10.18637/jss.v082.i13.
[52]
A. Zeileis and T. Hothorn, “Diagnostic checking in regression relationships,” R News, vol. 2, no. 3, pp. 7–10, 2002,Available: https://CRAN.R-project.org/doc/Rnews/
[53]
茆诗松, 周纪芗, and 陈颖, 试验设计, 1st ed. 北京: 中国统计出版社, 2004.
[54]
R. I. Kabacoff, R in action: Data analysis and graphics with r, 2nd ed. Shelter Island, NY: Manning PUblications Co., 2015.Available: https://github.com/kabacoff/RiA2
[55]
J. Cohen, Statistical power analysis for the behavioral sciences, 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates, 1988.Available: https://www.utstat.toronto.edu/~brunner/oldclass/378f16/readings/CohenPower.pdf