Jeromy Anglim's Blog: Psychology and Statistics


Monday, September 21, 2009

Structural Equation Modelling in R

Structural equation modelling (SEM) software is frequently used in psychology. This post discusses the exciting prospect of greater support for SEM in R.

My use of SEM

I have used SEM to:
  • Run confirmatory factor analyses to examine the measurement structure of multi-factor psychological scales
  • Compare the factor structure of a scale across multiple groups
  • Examine the plausibility of various structural and mediation models. It's particularly useful when the mediation is more complex than the standard three variable scenario.
  • Estimate correlations or regression models of the latent variables (i.e., adjusting for reliability).
  • Determine parsimonious descriptions of a correlation matrix by exploring the fit that results from placing and removing various equality constraints. For an example, see: Grant, S., Langan-Fox, J., & Anglim, J. (2009). Big Five Traits as predictors of subjective and psychological well-being. Psychological Reports, 105, 201-231.

I have some materials on structural equation modelling available online. There are, of course, many good books and online resources.

A little history:

I was originally taught to do structural equation modelling in Amos (which was bought out by SPSS, which was bought out by IBM). Among other things, Amos attempted to bring SEM to the masses. The main mode of creating models in Amos is to draw them graphically. This makes it fairly easy to draw simple confirmatory factor analysis models and simple structural models. There are also many drawing tools designed to make drawing diagrams more efficient. However, the switch from drawing simple models to testing models programmatically is a big jump, especially considering that you have to learn a programming language for something you might only use occasionally. And even drawing a single model can eventually become quite time-consuming and error-prone. For this and several other reasons, I have been excited about the idea of running structural equation models in R.

Reasons why SEM and R are a good fit

1. Model comparison

Bad SEM style involves a researcher saying "this is my model", testing only that model, and ticking the Hu and Bentler fit statistics boxes. Good SEM style typically involves adopting a model comparison approach. A series of models are specified: e.g., a simple baseline model, a hypothesised model, a series of plausible alternative models, and one or more models based on post-hoc, theoretically justifiable refinements. R is well-suited to such a model comparison approach. Each model can be stored in a list. Fit statistics can be extracted using code. Tables for comparing models in terms of fit and nested chi-square tests can easily be obtained.
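As a rough sketch of what this workflow can look like, here is an example that assumes the lavaan package (discussed later in this post) is installed and uses its bundled HolzingerSwineford1939 data. The two models and their names are purely illustrative; they are not taken from any analysis of mine.

```r
library(lavaan)

# Candidate models stored by name in a list (names are illustrative).
models <- list(
  one_factor = '
    g =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9
  ',
  three_factor = '
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    speed   =~ x7 + x8 + x9
  '
)

# Fit every model to the same data.
fits <- lapply(models, function(m) cfa(m, data = HolzingerSwineford1939))

# Build a table with one row of fit statistics per model.
wanted <- c("chisq", "df", "cfi", "rmsea", "bic")
fit_table <- t(sapply(fits, function(f) fitMeasures(f, wanted)))
print(round(fit_table, 3))

# Nested chi-square difference test. The one-factor model is the
# three-factor model with all factor correlations constrained to 1.
print(anova(fits$one_factor, fits$three_factor))
```

Adding another candidate model is then just a matter of adding another element to the list.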

2. Specification of models in R:

The challenge is to provide a way of specifying models that is easy and efficient. It should then be easy to adjust models further by, for example, specifying equality constraints, constraining relationships to zero, and so on.
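To give a feel for what such a specification can look like, here is a minimal sketch using lavaan's model syntax (one of the packages discussed later in this post), again assuming lavaan is installed and using its bundled HolzingerSwineford1939 data. The particular constraints are arbitrary illustrations, not substantively motivated.

```r
library(lavaan)

# Baseline: three correlated factors with freely estimated loadings
# (by default lavaan fixes the first loading of each factor to 1).
free_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Adjusted model: the same structure plus two constraints.
constrained_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + a*x5 + a*x6   # shared label "a" forces equal loadings
  speed   =~ x7 + x8 + x9
  visual ~~ 0*speed             # factor covariance fixed to zero
'

fit_free        <- cfa(free_model, data = HolzingerSwineford1939)
fit_constrained <- cfa(constrained_model, data = HolzingerSwineford1939)

# Each constraint frees up one degree of freedom.
df_gain <- fitMeasures(fit_constrained, "df") - fitMeasures(fit_free, "df")
print(df_gain)
```

Because the models are just text, a constrained variant is a small edit to the baseline rather than a redrawn diagram.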

3. Extracting model information in R

SEM produces a lot of output. This is well suited to R where this information can be stored in a list structure. This information can then be selectively extracted as needed.
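For instance, the following sketch pulls out only the factor loadings and a couple of fit indices from a fitted model. It assumes the lavaan package and its bundled HolzingerSwineford1939 data; the two-factor model is just a placeholder.

```r
library(lavaan)

# Placeholder model: two correlated factors.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'
fit <- cfa(model, data = HolzingerSwineford1939)

# All parameter estimates as a data frame, including standardized values.
est <- parameterEstimates(fit, standardized = TRUE)

# Keep only the factor loadings (operator "=~") and a few columns.
loadings <- est[est$op == "=~", c("lhs", "rhs", "est", "se", "std.all")]
print(loadings)

# Extract just the fit indices of interest.
print(fitMeasures(fit, c("cfi", "rmsea")))
```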

4. Writing code for SEM and R

SEM tends to be a niche statistical task. I might use it 3 or 4 times per year. Thus, learning a whole new scripting environment just for SEM is annoying. Doing SEM in R, the same language used for other analyses, makes a lot of sense. Scripts can more easily be shared to highlight common analyses, and those with more knowledge of SEM can lead the way in how to program more advanced models.

5. Graphically representing models in R

R is great for graphics. It would be great to be able to specify an SEM model and simply run a plot function to graphically represent it with options for what information is represented and how it is presented.

6. Implementation of various preparatory processes in R:

R should make it easier to do various common preparatory activities, such as item parcelling and calculating alternative estimates of correlations (e.g., polychoric correlations). The beauty of this is that analysts could quickly examine the effect of tweaking various initial conditions on the final results.
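As a small illustration of the item parcelling case, here is a base R sketch. The helper make_parcels, the simulated data, and the two parcelling schemes are all hypothetical and exist only for this example.

```r
# Hypothetical helper: average the items in each parcel.
# parcel_map is a named list of item-name vectors, one per parcel.
make_parcels <- function(data, parcel_map) {
  as.data.frame(lapply(parcel_map, function(items) {
    rowMeans(data[, items, drop = FALSE])
  }))
}

set.seed(1)
# Simulated six-item scale driven by a single latent trait
# (illustrative data only).
trait <- rnorm(200)
items <- as.data.frame(sapply(1:6, function(i) trait + rnorm(200)))
names(items) <- paste0("item", 1:6)

# Two alternative parcelling schemes to compare.
schemes <- list(
  adjacent  = list(p1 = c("item1", "item2"),
                   p2 = c("item3", "item4"),
                   p3 = c("item5", "item6")),
  alternate = list(p1 = c("item1", "item4"),
                   p2 = c("item2", "item5"),
                   p3 = c("item3", "item6"))
)

# Parcel correlation matrix under each scheme.
parcel_cors <- lapply(schemes, function(map) cor(make_parcels(items, map)))
print(lapply(parcel_cors, round, 2))
```

The resulting parcel data frames could then be passed to whichever SEM package is being used, making it cheap to see how sensitive the results are to the parcelling choice.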

7. Incremental improvement

SEM practice is constantly evolving. R packages typically adopt a modular orientation that allows additional procedures to be incorporated, e.g., new fit measures, new estimation algorithms, and so on.

Status of SEM in R

1. sem package

John Fox wrote the sem package. An article called Structural Equation Modeling with the sem package in R provides an overview. As far as I am aware, it was the first structural equation modelling package for R. It's a great package. There's less hand-holding than with Amos, and specifying models efficiently takes some getting used to. It also does not have all the fit statistics and features of some of the bigger commercial packages. It also does not appear to be under active development. There's a further discussion on a psychology wiki.

2. OpenMx package

OpenMx is a newer SEM package for R. There appears to be considerable programming and development effort going into producing a powerful SEM package for R. There's also a lot of documentation available for OpenMx.

3. lavaan package

lavaan is another new SEM package for R, developed by Yves Rosseel from Ghent University in Belgium. It already supports many of the most common structural equation modelling tasks. More information is available on the website. User-friendly documentation is available.

6 comments:

  1. Hey thanks a lot for this post. I got some more details now. However I am still struggling with my data set and thus, I think it is best to discuss this on email. will it be feasible for you?

  2. Hi Amol,
    Unfortunately, I do not have time to answer stats questions by email.
    If you wish to ask me a written stats question, follow the protocol set out here:
    http://jeromyanglim.blogspot.com/2011/03/how-to-ask-me-statistics-question.html

  3. Dear Jeromy,

    I would kindly like to ask about the for loop within the lavaan syntax since I am facing some problems.

    1. I imagine the lavaan discussion forum would be the best place to ask:

      https://groups.google.com/forum/#!forum/lavaan

  4. Hey Jeromy! I just read your article on the Big Five and subjective / psychological well-being...It's exciting! I have very similar data myself that I'm preparing for publication using SEM in R (based on my dissertation: http://www.escholarship.org/uc/item/3t34c68w). I think I'll have to cite you and your colleagues in the final version! Later, I'll be sure to see what I can replicate of your results. It should be very interesting to compare your Australian managers to my southern Californian undergraduates!

    As for the general topic of SEM in R, I've posted some tips recently over on Cross Validated about latent common factor modeling in R (http://stats.stackexchange.com/a/77314/32036) and SEM in general (http://stats.stackexchange.com/a/83545/32036). I've been studying these topics in depth lately, so I've touched briefly on a few advanced ideas and references about choice of estimator and how to relax model constraints when using the same data to both explore and confirm latent structure...but I've also laid out a sort of basic walkthrough for SEM in general (particularly for analysis of Likert scale ratings) that people who are very new to using SEM (like I was a few months ago) might find useful. I hope these are helpful, and would appreciate any feedback!

  5. Hi Jeromy,

    I have a data set containing 30 Binary Variables (1,0) . I would like to perform SEM on these 30 Binary Variables against 4 Dependent Variables. Can you please explain the step by step procedure in R to perform the SEM analysis (R Code)
