Structural Equation Modelling in R

Structural Equation Modelling (SEM) software is frequently used in psychology. This post discusses the exciting prospect of greater support for SEM in R.

My use of SEM

I have used SEM to:

Run confirmatory factor analyses to examine the measurement structure of multi-factor psychological scales
Compare the factor structure of a scale across multiple groups
Examine the plausibility of various structural and mediation models. It's particularly useful when the mediation is more complex than the standard three variable scenario.
Estimate correlations or regression models of the latent variables (i.e., adjusting for reliability).
Determine parsimonious descriptions of a correlation matrix by exploring how fit changes when various equality constraints are imposed or removed. For an example, see: Grant, S., Langan-Fox, J., & Anglim, J. (2009). Big Five Traits as predictors of subjective and psychological well-being. Psychological Reports, 105, 201-231.
I have some materials on Structural Equation Modelling available online. There are of course many good books and online resources.

A little history:

I was originally taught to do structural equation modelling in Amos (which was bought out by SPSS, which was bought out by IBM). Among other things, Amos attempted to bring SEM to the masses. The main mode of creating models in Amos is to draw them graphically. This makes it fairly easy to set up simple confirmatory factor analysis models and simple structural models, and there are many drawing tools designed to make diagramming more efficient. However, the switch from drawing models to testing them programmatically is a big jump, especially considering that you have to learn a programming language for something you might only use occasionally. And even drawing a single model can eventually become quite time consuming and error prone. For this and several other reasons, I have been excited about the idea of running structural equation models in R.

Reasons why SEM and R are a good fit

1. Model comparison

Bad SEM style involves a researcher saying "this is my model", testing only that model, and ticking the Hu and Bentler fit statistics boxes. Good SEM style typically involves adopting a model comparison approach. A series of models are specified: e.g., a simple baseline model, a hypothesised model, a series of plausible alternative models, and one or more models based on post-hoc, theoretically justifiable refinements. R is well-suited to such a model comparison approach. Each model can be stored in a list. Fit statistics can be extracted using code. Tables for comparing models in terms of fit and nested chi-square tests can easily be obtained.
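As a rough sketch of this workflow, here is how the list-based approach might look. It assumes the lavaan package (discussed further down) and its bundled HolzingerSwineford1939 dataset; the two models and the names `models`, `fits`, and `fit_table` are illustrative choices of mine, not a recommended analysis.

```r
library(lavaan)  # assumption: lavaan is installed

# Example data shipped with lavaan: nine ability tests, x1 to x9
data(HolzingerSwineford1939)

# A named list of competing measurement models (illustrative only)
models <- list(
  one_factor = '
    g =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9
  ',
  three_factor = '
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    speed   =~ x7 + x8 + x9
  '
)

# Fit every model in the list to the same data
fits <- lapply(models, cfa, data = HolzingerSwineford1939)

# Build a table with one row of fit statistics per model
fit_table <- t(sapply(fits, function(fit) {
  fitMeasures(fit, c("chisq", "df", "cfi", "rmsea", "srmr"))
}))
print(round(fit_table, 3))

# Nested chi-square difference test between the two models
print(anova(fits$one_factor, fits$three_factor))
```

Adding another candidate model is then just a matter of adding another element to `models`; the fitting and tabulating code stays the same.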

2. Specification of models in R

The challenge is to provide a way of specifying models that is easy and efficient. It should then be easy to adjust models further by, for example, specifying equality constraints, constraining relationships to zero, and so on.
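To give a flavour of what such a specification can look like, here is a minimal sketch using lavaan's text-based model syntax, where a shared label imposes an equality constraint and a `0*` prefix fixes a parameter to zero. The particular constraints are arbitrary and chosen only to show the mechanics; the data are again lavaan's bundled HolzingerSwineford1939 set.

```r
library(lavaan)  # assumption: lavaan is installed
data(HolzingerSwineford1939)

# Baseline: three correlated factors
base_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Variant 1: the shared label "a" forces the x2 and x3 loadings to be equal
equal_model <- '
  visual  =~ x1 + a*x2 + a*x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Variant 2: append one line fixing the visual-speed covariance to zero
zero_model <- paste(base_model, "visual ~~ 0*speed", sep = "\n")

fit_base  <- cfa(base_model,  data = HolzingerSwineford1939)
fit_equal <- cfa(equal_model, data = HolzingerSwineford1939)
fit_zero  <- cfa(zero_model,  data = HolzingerSwineford1939)

# Each constraint frees up one degree of freedom relative to the baseline
dfs <- vapply(
  list(base = fit_base, equal = fit_equal, zero = fit_zero),
  function(fit) as.numeric(fitMeasures(fit, "df")),
  numeric(1)
)
print(dfs)
```

Because the model is just a character string, variants can be built by editing or pasting text, which is much quicker than redrawing a diagram.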

3. Extracting model information in R

SEM produces a lot of output. This is well suited to R where this information can be stored in a list structure. This information can then be selectively extracted as needed.
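For instance, with a fitted lavaan model (one possible package; the same idea applies to others), selective extraction might look like the sketch below. The model and the choice of columns and fit indices are illustrative assumptions.

```r
library(lavaan)  # assumption: lavaan is installed
data(HolzingerSwineford1939)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# All parameter estimates as an ordinary data frame
est <- parameterEstimates(fit, standardized = TRUE)

# Keep only the factor loadings (rows where the operator is "=~")
loadings <- est[est$op == "=~", c("lhs", "rhs", "est", "se", "std.all")]
print(loadings)

# Pull out just the fit indices of interest
print(fitMeasures(fit, c("cfi", "tli", "rmsea")))

# Correlations among the latent variables
latent_cor <- lavInspect(fit, "cor.lv")
print(latent_cor)
```

Since `est` is a plain data frame, the usual R tools for subsetting, sorting, and exporting tables apply directly.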

4. Writing code for SEM in R

SEM tends to be a niche statistical task. I might use it 3 or 4 times per year. Thus, learning a whole new scripting environment just for SEM is annoying. Being able to stay in R makes a lot of sense. Scripts can more easily be shared to highlight common analyses, and those with more knowledge of SEM can lead the way in how to program more advanced models.

5. Graphically representing models in R

R is great for graphics. It would be great to be able to specify an SEM model and simply run a plot function to graphically represent it with options for what information is represented and how it is presented.
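One possible route to this, offered as an assumption rather than something covered in this post, is the separate semPlot package, whose `semPaths()` function draws a path diagram from a fitted model object. A minimal sketch, assuming both lavaan and semPlot are installed:

```r
library(lavaan)   # assumption: lavaan is installed
library(semPlot)  # assumption: semPlot is installed

data(HolzingerSwineford1939)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

# Draw the path diagram with standardised estimates as edge labels
semPaths(fit,
         what = "std",          # edge appearance reflects standardised estimates
         whatLabels = "std",    # label edges with standardised estimates
         layout = "tree",       # factors on one level, indicators on another
         edge.label.cex = 0.8)  # slightly smaller edge labels
```

The arguments shown control what information is displayed and how it is laid out; other options are documented in the package's help pages.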

6. Implementation of various preparatory processes in R

R should make it easier to do various common preparatory activities, such as item parcelling and calculating alternative estimates of correlations (e.g., polychoric correlations). The beauty of this is that the analyst could quickly examine the effect of tweaking various initial conditions on the final results.
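Both steps can be sketched in a few lines. The example below uses simulated 5-point items (the data, cut points, and parcel allocation are all made up for illustration), builds parcels with base R, and uses lavaan's `lavCor()` with the `ordered` argument to obtain polychoric correlations for comparison with ordinary Pearson correlations.

```r
library(lavaan)  # assumption: lavaan is installed

set.seed(1)
n <- 500
f <- rnorm(n)  # a single simulated latent factor

# Twelve 5-point items driven by the factor (simulated, illustrative only)
items <- as.data.frame(sapply(1:12, function(i) {
  continuous <- 0.7 * f + rnorm(n, sd = 0.7)
  as.integer(cut(continuous, breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)))
}))
names(items) <- paste0("i", 1:12)

# Item parcelling: average each block of four items into one parcel
parcel_id <- rep(1:3, each = 4)
parcels <- as.data.frame(sapply(1:3, function(p) {
  rowMeans(items[, parcel_id == p])
}))
names(parcels) <- paste0("p", 1:3)

# Pearson correlations treat the item scores as continuous
pearson <- cor(items[, 1:4])

# Polychoric correlations treat the items as ordered categories
polychoric <- lavCor(items[, 1:4], ordered = names(items)[1:4])

print(round(pearson, 2))
print(polychoric)
```

Changing `parcel_id` or swapping one correlation matrix for the other is a one-line change, which is what makes it practical to check how sensitive the final results are to these choices.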

7. Incremental improvement

SEM practice is constantly evolving. R packages typically adopt a modular orientation that allows for the incorporation of additional procedures, such as new fit measures, new estimation algorithms, and so on.

Status of SEM in R

1. `sem` package

John Fox wrote the sem package. An article called Structural Equation Modeling with the sem package in R provides an overview. As far as I am aware, it was the first structural equation modelling package for R. It's a great package. There's less hand-holding than with Amos, and specifying models efficiently takes some getting used to. It also does not have all the fit statistics and features of some of the bigger commercial packages. It also does not appear to be under active development. There's a further discussion on a psychology wiki.

2. `OpenMx` package

OpenMx is a newer SEM package for R. There appears to be considerable programming and development effort going into producing a powerful SEM package for R. There's also a lot of documentation available for OpenMx.

3. `lavaan` package

lavaan is another new SEM package for R, developed by Yves Rosseel from Ghent University in Belgium. It already supports many of the most common structural equation modelling tasks. More information and user-friendly documentation are available on the website.

Jeromy Anglim's Blog: Psychology and Statistics

Monday, September 21, 2009