This post is based on the paper “A simple method of sample size calculation for linear and logistic regression” by F. Y. Hsieh et al., which is available at http://dx.doi.org/10.1002/(SICI)1097-0258(19980730)17:14<1623::AID-SIM871>3.0.CO;2-S .
We consider the case where we want to calculate the sample size for a multiple logistic regression with a binary response variable and continuous covariates.
Formula (1) in the paper computes the required sample size for a simple logistic regression, given the effect size to be tested, the event rate at the mean of the (single) covariate, the level of significance, and the required power of the test. This formula is implemented in the function SSizeLogisticCon() from the R package “powerMediation” and can be applied directly.
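To make the ingredients of that calculation explicit, here is a minimal base-R sketch of Formula (1) as I read it in the paper: the squared sum of the two normal quantiles divided by p1 * (1 - p1) * beta*^2. The function name ssize.simple.sketch() is made up for this illustration, and the two-sided test (alpha/2) is an assumption; for real work, SSizeLogisticCon() is the function to use.

```r
## Hypothetical helper, not part of powerMediation.
## p1: event rate at the mean of the covariate X
## OR: odds ratio for a one-unit increase in X, so beta* = log(OR)
ssize.simple.sketch <- function(p1, OR, alpha = 0.05, power = 0.8) {
  beta.star <- log(OR)          # effect size to be tested
  za <- qnorm(1 - alpha / 2)    # assumes a two-sided test
  zb <- qnorm(power)
  n <- (za + zb)^2 / (p1 * (1 - p1) * beta.star^2)
  ceiling(n)                    # round up to whole subjects
}

ssize.simple.sketch(p1 = 0.5, OR = 1.5)
```

The denominator shows why rare events and small effects are expensive: the required n grows as p1 moves away from 0.5 and as log(OR) approaches zero.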
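For the multiple case, Hsieh et al. introduce the variance inflation factor (VIF), 1/(1 - rho^2), where rho^2 is the squared multiple correlation of the covariate of interest with the remaining covariates. Multiplying the sample size for the simple case by the VIF gives the sample size for the multiple case. I have implemented it as an R function: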
library(powerMediation)  # provides SSizeLogisticCon()

## p1: the event rate at the mean of the predictor X
## OR: expected odds ratio. log(OR) is the change in log odds
##     for an increase of one unit in X.
##     beta* = log(OR) is the effect size to be tested
## r2: r2 = rho^2 = R^2, for X_1 ~ X_2 + ... + X_p
ssize.multi <- function(p1, OR, r2, alpha=0.05, power=0.8) {
  n1 <- SSizeLogisticCon(p1, OR, alpha, power)  # sample size, simple case
  np <- n1 / (1-r2)                             # inflate by VIF = 1/(1-r2)
  return(np)
}
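To get a feel for how strongly the inflation step drives the result, the following small sketch tabulates the VIF for a few values of r2. The helper vif() and the starting value n1 = 200 are assumptions chosen purely for illustration; they are not taken from the paper.

```r
## Hypothetical helper: variance inflation factor for a given r2,
## the squared multiple correlation of X_1 with the other covariates.
vif <- function(r2) {
  stopifnot(r2 >= 0, r2 < 1)
  1 / (1 - r2)
}

n1 <- 200                     # assumed sample size for the simple case
r2 <- c(0, 0.1, 0.25, 0.5)    # assumed values of r2
np <- ceiling(n1 * vif(r2))   # round up to whole subjects

print(data.frame(r2 = r2, vif = vif(r2), np = np))
```

With r2 = 0 nothing changes, while r2 = 0.5 doubles the required sample size, so a rough prior estimate of r2 is worth the effort.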
Another approximation for the simple case is given in Formula (4), and is based on formulae given by A. Whittemore in “Sample size for logistic regression with small response probability”. I have implemented the simple case,
## p1: as above
## p2: event rate at one SD above the mean of X
ssize.whittemore <- function (p1, p2, alpha = 0.05, power = 0.8) {
  beta.star <- log(p2*(1-p1)/(p1*(1-p2)))
  za <- qnorm(1 - alpha/2)
  zb <- qnorm(power)
  V0 <- 1
  Vb <- exp(-beta.star^2 / 2)
  delta <- (1+(1+beta.star^2)*exp(5*beta.star^2 / 4)) * (1+exp(-beta.star^2 / 4))^(-1)
  n <- (V0^(1/2)*za + Vb^(1/2)*zb)^2 * (1+2*p1*delta) / (p1*beta.star^2)
  n.int <- ceiling(n)
  return(n.int)
}
and the multiple case.
## all parameters as above
ssize.whittemore.multi <- function(p1, p2, r2, alpha=0.05, power=0.8) {
  n1 <- ssize.whittemore(p1, p2, alpha, power)
  np <- n1 / (1-r2)
  return(np)
}
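For readers who prefer notation to code, the computation carried out by ssize.whittemore() and ssize.whittemore.multi() can be written out as follows. This is transcribed from the two functions above rather than copied from the paper, so the symbols are my own choice: V(0) and V(beta*) stand for V0 and Vb, and z_power stands for qnorm(power).

```latex
\begin{aligned}
\beta^* &= \log\frac{p_2\,(1-p_1)}{p_1\,(1-p_2)}, \qquad
V(0) = 1, \qquad
V(\beta^*) = \exp\!\left(-\frac{\beta^{*2}}{2}\right), \\
\delta &= \frac{1 + (1+\beta^{*2})\,\exp\!\left(\frac{5\beta^{*2}}{4}\right)}
               {1 + \exp\!\left(-\frac{\beta^{*2}}{4}\right)}, \\
n_1 &= \frac{\left(V(0)^{1/2}\, z_{1-\alpha/2} + V(\beta^*)^{1/2}\, z_{\mathrm{power}}\right)^2
             \,(1 + 2\,p_1\,\delta)}{p_1\,\beta^{*2}}, \\
n_p &= \frac{n_1}{1-\rho^2}.
\end{aligned}
```

Note that the code rounds n_1 up to the next integer before dividing by 1 - rho^2, so n_p itself is not necessarily a whole number.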
The complete R script, including the examples from the paper, can be found at http://rpubs.com/candrea/ssizelogreg .