CFA with a single factor

Fit statistics for a CFA with a single factor can only be computed for models with more than three observed variables. With three observed variables and one factor you have to estimate six free parameters: three loadings, three residual variances, and the factor variance, one of which is fixed for identification. But your covariance matrix also has only six non-redundant elements. Hence you get df = 0 degrees of freedom. A chi-squared distribution with df = 0 is constant at 0, so we can expect the test statistic to be 0.

[Figure: CFA with a single factor]
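A quick illustration with simulated data (a minimal sketch assuming the lavaan package; the data and variable names are made up): a single-factor model with three indicators is just-identified, so lavaan reports df = 0 and a chi-square of (numerically) 0.

# single-factor CFA with three simulated indicators
library(lavaan)

set.seed(1)
f   <- rnorm(200)
dat <- data.frame(x1 = f + rnorm(200),
                  x2 = f + rnorm(200),
                  x3 = f + rnorm(200))

fit <- cfa("F =~ x1 + x2 + x3", data = dat)
fitMeasures(fit, c("df", "chisq"))  # df = 0, chisq ~ 0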


Little letters are also important

In the PISA 2012 Technical Report on page 312 (chapter 16), formula (16.3) for the Partial Credit Model, used for computing scale parameters, is printed. Looking at it twice, one can see that something must be wrong with the subscripts used in the summations.

P_{x_i}(\theta_n) = \frac{\displaystyle\exp\sum_{k=0}^x(\theta_n-\delta_i+\tau_{ij})}{\displaystyle\sum_{h=0}^{m_i}\exp\displaystyle\sum_{k=0}^h(\theta_n-\delta_i+\tau_{ik})}


By consulting the chapter “Polytomous Rasch Models and their Estimation” in the book “Rasch Models” (Fischer, 1995), and combining formulas (15.3) and (15.8) there, we get

P(X_{vi}=h) = \frac{\displaystyle\exp(\phi_h\theta_v + \sum_{l=0}^{h}\alpha_{il})}{\displaystyle\sum_{l=0}^m \exp(\phi_h\theta_v + \sum_{j=0}^{l}\alpha_{ij})}


Now, by comparing the meaning of all the symbols and applying this to the formula from the PISA Technical Report – keeping as much as possible of their notation – we get

P_{x_i}(\theta_n) = \frac{\displaystyle\exp\sum_{k=0}^{x_i}(\theta_n-\delta_i+\tau_{ik})}{\displaystyle\sum_{h=0}^{m_i}\exp\displaystyle\sum_{k=0}^h(\theta_n-\delta_i+\tau_{ik})}


We see that we have to change \tau_{ij} in the numerator of the very first formula above to \tau_{ik}, and the upper summation limit x to x_i.

That makes sense!
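For illustration, here is a small R sketch of the corrected formula (a hypothetical helper of my own, not taken from the report); tau is the vector of step parameters \tau_{i1}, ..., \tau_{im_i}, and \tau_{i0} = 0 by convention.

# category probabilities P_{x_i}(theta_n) of the Partial Credit Model
pcm_prob <- function(theta, delta, tau) {
  tau <- c(0, tau)                         # prepend tau_{i0} = 0
  num <- exp(cumsum(theta - delta + tau))  # exp of sum_{k=0}^{h} (theta - delta + tau_{ik})
  num / sum(num)                           # normalize over h = 0, ..., m_i
}

pcm_prob(theta = 0.5, delta = 0, tau = c(-1, 1))  # probabilities for x_i = 0, 1, 2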

A Simple Method of Sample Size Calculation for Logistic Regression

This posting is based on the paper “A simple method of sample size calculation for linear and logistic regression” by F. Y. Hsieh et al.

We consider the case where we want to calculate the sample size for a multiple logistic regression with a binary response variable and continuous covariates.

Formula (1) in the paper computes the required sample size for a simple logistic regression, given the effect size to be tested, the event rate at the mean of the (single) covariate, the level of significance, and the required power of the test. This formula is implemented in the function SSizeLogisticCon() from the R package “powerMediation” and can easily be applied.
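An illustrative call (a sketch only: the argument names – p1 for the event rate at the mean of the covariate, OR for the odds ratio associated with a one-SD increase, plus alpha and power – are as I recall them from the package, and the values are made up, not taken from the paper):

# sample size for a simple logistic regression with one continuous covariate
library(powerMediation)
SSizeLogisticCon(p1 = 0.5, OR = 1.5, alpha = 0.05, power = 0.8)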

For the multiple case, Hsieh et al. introduce a variance inflation factor (VIF), with which the sample size for the simple case can be inflated to obtain the sample size for the multiple case. I have implemented it as an R function:
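A sketch of what this can look like (my own wording and a hypothetical function name, not necessarily the original code); the substance is the inflation rule n_multiple = n_simple / (1 - rho^2), where rho is the multiple correlation between the covariate of interest and the remaining covariates:

# inflate the simple-case sample size by the variance inflation factor 1 / (1 - rho^2)
ssize_logistic_multiple <- function(n_simple, rho) {
  ceiling(n_simple / (1 - rho^2))
}

ssize_logistic_multiple(100, rho = 0.4)  # made-up numbers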

Another approximation for the simple case is given in Formula (4), which is based on formulae by A. Whittemore in “Sample size for logistic regression with small response probability”. I have implemented the simple case as well as the multiple case.

The complete R script, including the examples from the paper, can be found online.


New R Package “betas”

In the social sciences it is often required to compute standardized regression coefficients – called beta coefficients, or simply betas. Betas are independent of the original scales of the variables and can thus be interpreted and compared as effects.

There are two approaches to obtaining betas: either you Z-standardize the data before fitting the model, or you compute the betas from the model after fitting it.

The first approach has flaws if you have non-numeric variables in your data or if you intend to include interaction terms in your model.

The second approach is way more convenient, but until now there was no R package that helps you compute betas for as many kinds of models as you might need. For example, lm.beta() from the “QuantPsyc” package cannot handle models containing factors with more than two levels.

The features of the “betas” R package are (so far for v0.1.1):

Compute standardized beta coefficients and corresponding standard errors for the following models:

  • linear regression models with numerical covariates only
  • linear regression models with numerical and factorial covariates
  • weighted linear regression models
  • all these linear regression models with interaction terms
  • robust linear regression models with numerical covariates only

You can install the package from CRAN.
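For completeness, the standard way to do this from within R:

install.packages("betas")
library(betas)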

The package is maintained on GitHub.

Feel free to report issues there.

Enjoy hassle-free computations of betas in R!

Robust Standardized Beta Coefficients

Standardized beta coefficients are defined as

beta = b * sd_x/sd_y

where b are the coefficients from OLS linear regression, and sd_x and sd_y are standard deviations of each x variable and of y.

In the case where you perform a robust linear regression, sd_x and sd_y do not seem very meaningful anymore, because variances, and hence standard deviations, are not robust. The R package “robust” provides the function covRob() to compute a robust covariance estimator.

I have written the following function to compute standardized beta coefficients for a robust linear regression. Setting the parameter classic=TRUE gives you the classic estimation of the beta coefficients. For very bad data, the covRob() function cannot compute the covariance due to singularities; in this case the classical estimator is returned.
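A sketch of such a function (my own reconstruction of the idea described above, not necessarily the original code; it assumes numeric covariates and fits the robust model with robust::lmRob()):

# standardized betas for a robust linear regression, based on covRob();
# falls back to the classical covariance if covRob() fails (e.g. singularities)
library(robust)

robust_betas <- function(formula, data, classic = FALSE) {
  fit <- lmRob(formula, data = data)
  y   <- model.response(model.frame(formula, data))
  X   <- model.matrix(formula, data)[, -1, drop = FALSE]  # drop intercept column
  d   <- cbind(y, X)
  covm <- if (classic) cov(d) else
    tryCatch(covRob(d)$cov, error = function(e) cov(d))
  sds <- sqrt(diag(covm))
  coef(fit)[-1] * sds[-1] / sds[1]  # beta = b * sd_x / sd_y
}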

 UPDATE — 2014-07-23

Computing standard deviations for (dummy-coded) factors makes sense, because the variance of a binary variable is well defined (binomial distribution). So I have removed the num variable.


Select Pupils by Number of Pupils per Group

Selecting rows conditioned on values in columns is easy, for example selecting people aged over 33. But what about selecting rows conditioned on statistics computed over multiple rows of the data frame, for example selecting pupils that belong to groups with a certain number of pupils per group?

That is where the very nice dplyr package comes in.

We build and print the data frame:
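A minimal sketch of such a data frame (the names and group labels are made up for illustration):

# hypothetical example data: pupils and their group/class
pupils <- data.frame(
  name  = c("Anna", "Ben", "Cleo", "Dan", "Eva", "Finn"),
  class = c("a", "a", "a", "b", "b", "c")
)
pupils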

Now, we want to select – i.e. “filter” in terms of the dplyr package – pupils that are part of groups/classes with more than two pupils per class. In dplyr there are three different syntaxes to achieve this.
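Presumably something like the following three variants (a sketch using the hypothetical pupils data from above; the core is group_by() combined with filter(n() > 2)):

library(dplyr)

# 1) intermediate objects
grouped <- group_by(pupils, class)
res1    <- filter(grouped, n() > 2)

# 2) nested function calls
res2 <- filter(group_by(pupils, class), n() > 2)

# 3) the pipe operator
res3 <- pupils %>% group_by(class) %>% filter(n() > 2)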

The result is the same in all three cases.

Of course, you can do this the pure-R way:
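One possible base-R equivalent (again using the hypothetical pupils data), keeping only rows whose class occurs more than twice:

pupils[ave(rep(1, nrow(pupils)), pupils$class, FUN = length) > 2, ]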

But I think with dplyr it looks quite a bit nicer.

Happy dplyr!

Installing JAGS 3.4.0 under OS X 10.9 Maverick

First of all see the excellent installation manual by Martyn Plummer and Bill Northcott (JAGS Version 3.4.0 installation manual) at section 2 “Mac OS X”.

For the rest of us (me included) who think that ./configure without any options will do the job: you will get the following error:

(It will not work even with the   --with-lapack='-framework vecLib' option.)

Do not follow that instruction! You do not have to install the LAPACK library, because on OS X 10.9 an optimized version of LAPACK (accelerated by Apple engineers) is already installed; see the LAPACK(7) Mac OS X Manual Page.

From now on just type the slightly modified commands in your bash-shell (terminal):

Of course, make sure you have installed all the required tools before executing the commands.

Happy ./configuring!

Raspberry Pi Camera sends Pictures to ownCloud

Of course, there are many variants to take pictures with your raspberry pi camera and save them somewhere.

I use my Raspberry Pi camera to observe my flat when I am away. I want to do this in as secure a way as possible, i.e., I do not want to open any special ports on my router, nor do I want to send unencrypted images over the web.

My idea was to take a photo every 5 or 10 minutes and save the pictures to my ownCloud server via WebDAV with SSL encryption.

Step by step:

1.) Make sure you have a Raspberry Pi with a camera module and an ownCloud installation reachable over HTTPS somewhere out on the web (with the pictures app activated).

2.) Mount your ownCloud drive on your Raspberry Pi: create the directory /home/pi/owncloud and add the following line to /etc/fstab:

Then mount the drive by calling mount owncloud/, or by calling echo -e "y" | mount owncloud/ if your server has a self-signed certificate as mine does.

3.) My idea is to take a picture, say, every 10 minutes, and to keep them for 24 hours without any need to remove them manually. So, write a script like the following and make it executable:

4.) Set up the crontab by calling  crontab -e  and add the following line:

5.) You are finished. The Raspberry Pi will take a photo every 10 minutes and save it to your ownCloud, where you have a beautiful picture viewer that you can access wherever you are, on whatever device you like.

Compute SPSS like mean index variables

Suppose the following problem.

The point is to compute meaningful mean index variables while missing values are present. In R you have the switch na.rm to tell a function – here the mean() function – what to do with missing values. Setting it to TRUE – mean(..., na.rm=TRUE) – forces the function to use all non-missing values, even if there is only one. In the other case – mean(..., na.rm=FALSE) – the function will return NA even if there is only one missing value.
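For illustration:

mean(c(1, 2, NA), na.rm = TRUE)   # 1.5
mean(c(1, 2, NA), na.rm = FALSE)  # NA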

To handle this situation I have written a very handy function that works like the MEAN.{X}() function in SPSS, where {X} denotes the minimum number of variables that must be non-missing for the mean value to be computed.

My single-line R function looks like this:
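Something along these lines (a reconstruction; the name mean.n is my own choice, mirroring SPSS's MEAN.n):

# row-wise mean that is NA unless at least 'n' values are non-missing
mean.n <- function(df, n) ifelse(rowSums(!is.na(df)) >= n, rowMeans(df, na.rm = TRUE), NA)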

As the first argument you pass the variables (in columns), and the second argument is the minimum number of values that must be non-missing.
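A hypothetical example, requiring at least two non-missing items per row:

d <- data.frame(v1 = c(1, 2, NA), v2 = c(2, NA, NA), v3 = c(3, 4, 5))
mean.n(d, 2)  # 2  3  NA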

Have fun(ction)!


The Beauty of Unique Alphanumeric Random Codes

My challenge was to generate random codes of variable length and number. The codes should all be unique and built from alphanumeric symbols. I wrote the following function using the built-in function sample():
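The version below is a sketch in that spirit (the function name and arguments are my own, not necessarily the original): it draws characters from letters and digits and keeps sampling until the requested number of unique codes is reached.

# generate 'n' unique random alphanumeric codes of length 'len'
random_codes <- function(n, len) {
  codes <- character(0)
  while (length(codes) < n) {
    new   <- replicate(n - length(codes),
                       paste(sample(c(LETTERS, letters, 0:9), len, replace = TRUE),
                             collapse = ""))
    codes <- unique(c(codes, new))
  }
  codes
}

random_codes(5, 8)  # e.g. five unique codes of length 8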