{"id":158,"date":"2015-03-31T15:04:00","date_gmt":"2015-03-31T13:04:00","guid":{"rendered":"http:\/\/candrea.ch\/blog\/?p=158"},"modified":"2015-03-31T15:05:02","modified_gmt":"2015-03-31T13:05:02","slug":"a-simple-method-of-sample-size-calculation-for-logistic-regression","status":"publish","type":"post","link":"https:\/\/candrea.ch\/blog\/a-simple-method-of-sample-size-calculation-for-logistic-regression\/","title":{"rendered":"A Simple Method of Sample Size Calculation for Logistic Regression"},"content":{"rendered":"<p>This post is based on the paper &#8220;A simple method of sample size calculation for linear and logistic regression&#8221; by F. Y. Hsieh et al., which is available at <a href=\"http:\/\/dx.doi.org\/10.1002\/(SICI)1097-0258(19980730)17:14&lt;1623::AID-SIM871&gt;3.0.CO;2-S\" target=\"_blank\">http:\/\/dx.doi.org\/10.1002\/(SICI)1097-0258(19980730)17:14&lt;1623::AID-SIM871&gt;3.0.CO;2-S<\/a>.<\/p>\n<p>We consider the case where we want to calculate the sample size for a multiple logistic regression with a binary response variable and continuous covariates.<\/p>\n<p>Formula (1) in the paper computes the required sample size for a simple logistic regression, given the effect size to be tested, the event rate at the mean of the (single) covariate, the level of significance, and the required power of the test. This formula is implemented in the function <span class=\"lang:default decode:true  crayon-inline\">SSizeLogisticCon()<\/span> from the R package &#8220;powerMediation&#8221; and is easy to apply.<\/p>\n<p>For the multiple case, Hsieh et al. use the variance inflation factor (VIF), 1 \/ (1 &minus; &rho;<sup>2<\/sup>), by which the sample size for the simple case is multiplied to obtain the sample size for the multiple case. Here &rho;<sup>2<\/sup> is the squared multiple correlation between the covariate of interest, X<sub>1<\/sub>, and the remaining covariates X<sub>2<\/sub>, &hellip;, X<sub>p<\/sub>. I have implemented this as an R function:<\/p>\n<pre class=\"lang:r decode:true\">library(powerMediation)  # provides SSizeLogisticCon()\r\n\r\n## p1: the event rate at the mean of the predictor X\r\n## OR: expected odds ratio. 
log(OR) is the change in log odds \r\n##     for an increase of one unit in X.\r\n##     beta*=log(OR) is the effect size to be tested\r\n## r2: squared multiple correlation rho^2, i.e. the R^2 of the\r\n##     regression X_1 ~ X_2 + ... + X_p\r\nssize.multi &lt;- function(p1, OR, r2, alpha=0.05, power=0.8) {\r\n\t# sample size for the simple case, Formula (1)\r\n\tn1 &lt;- SSizeLogisticCon(p1, OR, alpha, power)\r\n\t# inflate by the VIF 1\/(1-r2) and round up to a whole subject\r\n\tnp &lt;- ceiling(n1 \/ (1-r2))\r\n\treturn(np)\r\n}<\/pre>\n<p>Another approximation for the simple case is given by Formula (4), which is based on formulae by A. Whittemore in &#8220;Sample size for logistic regression with small response probability&#8221;. I have implemented the simple case,<\/p>\n<pre class=\"lang:r decode:true \">## p1: as above\r\n## p2: event rate at one SD above the mean of X\r\nssize.whittemore &lt;- function (p1, p2, alpha = 0.05, power = 0.8) {\r\n    # effect size: log odds ratio for a one-SD increase in X\r\n    beta.star &lt;- log(p2*(1-p1)\/(p1*(1-p2)))\r\n    za &lt;- qnorm(1 - alpha\/2)  # two-sided critical value\r\n    zb &lt;- qnorm(power)\r\n    # variance factors at beta = 0 (V0) and at beta = beta.star (Vb)\r\n    V0 &lt;- 1\r\n    Vb &lt;- exp(-beta.star^2 \/ 2)\r\n    delta &lt;- (1+(1+beta.star^2)*exp(5*beta.star^2 \/ 4)) * (1+exp(-beta.star^2 \/ 4))^(-1)\r\n    n &lt;- (V0^(1\/2)*za + Vb^(1\/2)*zb)^2 * (1+2*p1*delta) \/ (p1*beta.star^2)\r\n    n.int &lt;- ceiling(n)  # round up to a whole subject\r\n    return(n.int)\r\n}<\/pre>\n<p>and the multiple case.<\/p>\n<pre class=\"lang:r decode:true \">## all parameters as above\r\nssize.whittemore.multi &lt;- function(p1, p2, r2, alpha=0.05, power=0.8) {\r\n\tn1 &lt;- ssize.whittemore(p1, p2, alpha, power)\r\n\t# inflate by the VIF 1\/(1-r2) and round up\r\n\tnp &lt;- ceiling(n1 \/ (1-r2))\r\n\treturn(np)\r\n}<\/pre>\n<p>The complete R script, including the examples from the paper, is available at <a href=\"http:\/\/rpubs.com\/candrea\/ssizelogreg\" target=\"_blank\">http:\/\/rpubs.com\/candrea\/ssizelogreg<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post is based on the paper &#8220;A simple method of sample size calculation for linear and logistic regression&#8221; by F. Y.
Hsieh et al., which is available at http:\/\/dx.doi.org\/10.1002\/(SICI)1097-0258(19980730)17:14&lt;1623::AID-SIM871&gt;3.0.CO;2-S. We consider the case where we want to calculate the sample size for a multiple logistic regression with a binary response variable and continuous covariates. Formula &hellip; <a href=\"https:\/\/candrea.ch\/blog\/a-simple-method-of-sample-size-calculation-for-logistic-regression\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">A Simple Method of Sample Size Calculation for Logistic Regression<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-158","post","type-post","status-publish","format-standard","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/posts\/158"}],"collection":[{"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/comments?post=158"}],"version-history":[{"count":4,"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/posts\/158\/revisions"}],"predecessor-version":[{"id":162,"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/posts\/158\/revisions\/162"}],"wp:attachment":[{"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/media?parent=158"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/categories?post=158"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/tags?post=158"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templa
ted":true}]}}