Monthly Archives: March 2015

A Simple Method of Sample Size Calculation for Logistic Regression

This posting is based on the paper “A simple method of sample size calculation for linear and logistic regression” by F. Y. Hsieh et al., which can be found under<1623::AID-SIM871>3.0.CO;2-S .

We consider  the case where we want to calculate the sample size for a multiple  logistic regression with continous response variable and with continous covariates.

Fomula (1) in the paper computes the required sample size for a simple logistic regression, given the effect size to be tested, the event rate at the mean of the (single) covariate, level of significance, and required power for the test. This formula is implemented in the function SSizeLogisticCon()  from R package “powerMediation” and can easily be applied.

For the multiple case, Hsieh et al. introduce the variance inflation factor (VIF), with which the sample size for the simple case can be inflated to get the sample size for the multiple case. I have implemented it as R function:

## p1: the event rate at the mean of the predictor X
## OR: expected odds ratio. log(OR) is the change in log odds 
##     for an increase of one unit in X.
##     beta*=log(OR) is the effect size to be tested
## r2: r2 = rho^2 = R^2, for X_1 ~ X_2 + ... + X_p
ssize.multi <- function(p1, OR, r2, alpha=0.05, power=0.8) {
	n1 <- SSizeLogisticCon(p1, OR, alpha, power)
	np <- n1 / (1-r2)

Another approximation for the simple case is given in Formula (4), and is based on formulae given by A. Whittemore in “Sample size for logistic regression with small response probability”. I have implemente the simple case,

## p1: as above
## p2: event rate at one SD above the mean of X
ssize.whittemore <- function (p1, p2, alpha = 0.05, power = 0.8) { <- log(p2*(1-p1)/(p1*(1-p2)))
    za <- qnorm(1 - alpha/2)
    zb <- qnorm(power)
    V0 <- 1
    Vb <- exp(^2 / 2)
    delta <- (1+(^2)*exp(5*^2 / 4)) * (1+exp(^2 / 4))^(-1)
    n <- (V0^(1/2)*za + Vb^(1/2)*zb)^2 * (1+2*p1*delta) / (p1*^2) <- ceiling(n)

and the multiple case.

## all parameters as above
ssize.whittemore.multi <- function(p1, p2, r2, alpha=0.05, power=0.8) {
	n1 <- ssize.whittemore(p1, p2, alpha, power)
	np <- n1 / (1-r2)

The complete R script, the examples from the paper included, can be found under .


New R Package “betas”

In social science it is often required to compute standardized regression coefficients – called beta coefficients, or simply betas. These betas can be interpreted as effects, and thus are independent of the original scale.

There are two approaches how you can obtain betas. You Z-standardize the data before fitting or you compute the betas after fitting the model.

The first approach has flaws if you have non-numeric variables in your data or if you intent to incorporate interaction terms in your model.

The second approach is a way more convenient, but until now there was no R package helping you compute betas for as many kinds of models as you needed. For example with lm.beta() from “QuantPsyc” Package you cannot handle models with factors with more than two levels.

The features of the “betas” R package are (so far for v0.1.1):

Compute standardized beta coefficients and corresponding standard errors for the following models:

  • linear regression models with numerical covariates only
  • linear regression models with numerical and factorial covariates
  • weighted linear regression models
  • all these linear regression models with interaction terms
  • robust linear regression models with numerical covariates only

You can install the package from CRAN (


The package is maintained on GitHub: .

Feel free to report issues: .

Enjoy hassle-free computations of betas in R!