Jeromy Anglim's Blog: Psychology and Statistics

Friday, October 9, 2009

Introduction to SPSS Syntax | Advice for Conducting Reproducible Research

This post provides an overview of SPSS syntax for researchers using SPSS. It sets out (1) why it is important to use syntax, (2) tips on how to use and learn syntax, (3) tips on dealing with errors, (4) tips on organising your syntax, (5) additional resources to learn more, and (6) where to go after you have reached the limits of SPSS syntax.

First, I must confess that I rarely use SPSS these days. I tend to do almost all analyses in R (I have a post on learning R here). Nonetheless, most of the psychology researchers with whom I interact use SPSS. Thus, this post aims to provide guidance on the use of syntax in SPSS.

1) Why use Syntax?
  • Syntax facilitates reproducible research. To state this more strongly, it could be argued that it is unethical to report results from research where the process for generating the results cannot be audited by an independent third party.
  • Errors can be diagnosed.
  • Repeated analyses can be done more efficiently.
  • The procedure for generating the data and results can be communicated more efficiently to others. In particular, other factors permitting, researchers who publish their raw data and the code for generating their analyses at the time of publication provide a greater benefit to the world than those who do not.
  • If you need to return to a project months or years later, syntax provides a better record of what was done.
  • Syntax encourages a better mental model for thinking about data analysis.
2) Tips on Using Syntax
  • Pressing F1 when the cursor is over some syntax will bring up the syntax help for that command. It's important to learn to read these syntax help templates.
  • Using the menus to set up a command then pressing "Paste" is an excellent way to learn syntax. This can often be used to set up the basic syntax and then you can alter the variables used or various attribute values to suit your purposes.
  • Single-line comments can be added by placing a "*" at the start of the line and a "." at the end of the line.
  • There are several ways to run syntax (by line, the whole file, the selection). Highlighting the syntax you want to run and pressing run is generally the best option.
  • Control + A will highlight all of the syntax in a file. Control + R will run the highlighted syntax; where no syntax is highlighted, it will run the current command.
  • Since version 17, the syntax editor has improved substantially. It now has colour coding, auto-completion, consistent use of a mono-spaced font, an error-pane, and a navigation pane.
  • Save your syntax files.
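As a rough sketch of the tips above, the following shows a comment followed by the kind of syntax that the "Paste" button might produce for a frequencies analysis, and then a hand-edited variation. The variable names (age, income) are hypothetical and stand in for variables in your own data file.

```spss
* Summarise age. A comment starts with an asterisk and ends with a period.
FREQUENCIES VARIABLES=age
  /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM
  /ORDER=ANALYSIS.

* Edited copy of the pasted command: same analysis for a second variable.
FREQUENCIES VARIABLES=income
  /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM
  /ORDER=ANALYSIS.
```

To try it, highlight one of the commands together with its comment and press Control + R.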
3) Common Syntax Errors
  • Common errors include:

    • Leaving off the period (".") at the end of a command
    • Just pressing run instead of highlighting the syntax that you want to run.
    • Leaving out or not highlighting the command "EXECUTE." when performing transformations and certain other data manipulations.
    • Incorrectly writing variable names: this includes typos and putting spaces where there should be none.

  • Tips on debugging syntax

    • Error messages are displayed in the Output window; read them.
    • Delete all messages in the output window before running syntax. This makes it easier to determine which error messages apply to your most recently run syntax.
    • Run the syntax one line at a time. This helps to show which part of the syntax is causing problems.

  • Raynald offers further tips for debugging SPSS syntax.
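Two of the common errors listed above, the missing period and the missing EXECUTE, can be illustrated with a minimal sketch. The variable names (item1 to item3, gender, total, female) are assumptions for illustration only.

```spss
* Every command ends with a period, and so does every comment.
COMPUTE total = item1 + item2 + item3.
RECODE gender (1=0) (2=1) INTO female.

* Transformations stay pending until a procedure or EXECUTE reads the data.
EXECUTE.

* A procedure such as DESCRIPTIVES would also have run the pending transformations.
DESCRIPTIVES VARIABLES=total female.
```

If the period after the COMPUTE command were left off, SPSS would read the RECODE line as a continuation of COMPUTE and report an error. If only the COMPUTE and RECODE lines were highlighted and run, the new variables would not be filled in until a later command read the data.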
4) Tips on Organising Syntax
There are many ways to organise SPSS syntax. This is my advice on one way to get researchers new to syntax started.
  • Have at least three syntax files: 

    • 1) Data input, metadata, and permanent manipulation: This would typically include a data import command, metadata commands (e.g., value labels and missing values), and commands for transforming the data (e.g., compute, if, do repeat, recode, etc.).
    • 2) Analyses and temporary manipulation: In a typical example this would include all the syntax for generating your descriptive statistics, correlations, regressions, and graphs.
    • 3) Temporary: analyses that are not retained. For example, if you try out a number of analyses in the process of finding the right one, you might choose to file the analyses that you do not retain in a temporary file.
    • The rationale for this separation is as follows. Data input and permanent manipulations need to be done correctly. Having these in one place allows for easy auditing of accuracy by yourself or others. Analyses depend on the data but should not alter the data. Thus, analyses logically need to be run after input and permanent manipulations have been performed.
    • I define temporary manipulations as manipulations to the data used for a specific analysis. Examples of temporary manipulations include filtering cases for a particular analysis, creating a log transformed version of a variable for a single analysis, and so on. It seems effective to place these just before the particular analysis. This should make analyses clearer.

  • Regularly use comments in a line or two above syntax to document your decisions.
  • Use white space strategically. Sets of commands that can be logically grouped together can be written with no line between. When a new set starts, one or two lines can be placed in between.
  • While SPSS syntax is NOT case sensitive, code can be easier to read if you follow the convention of writing commands in CAPITALS and variable names in lower case.
  • Thus, the aim is to keep your syntax as a record of what you have done. Writing and organising syntax is a skill like writing standard prose. It has its own conventions. Clear writing facilitates clear communication.
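The organisation described in this section might look something like the sketch below, shown here as two files in one listing. The file names (study1_raw.sav, study1_clean.sav) and variable names are hypothetical, and using TEMPORARY for the temporary manipulation is one option among several rather than the only way to do it.

```spss
* ===== File 1: data input, metadata, and permanent manipulation =====.
GET FILE='study1_raw.sav'.

* Metadata.
VARIABLE LABELS age 'Age in years'.
VALUE LABELS gender 1 'Male' 2 'Female'.
MISSING VALUES age (-99).

* Permanent manipulation: scale score used by all later analyses.
COMPUTE scale_mean = MEAN(item1, item2, item3).

* SAVE reads the data, so the pending COMPUTE is carried out here.
SAVE OUTFILE='study1_clean.sav'.


* ===== File 2: analyses and temporary manipulation =====.
GET FILE='study1_clean.sav'.

* Temporary manipulation: restrict the next procedure to adults only.
TEMPORARY.
SELECT IF (age >= 18).
CORRELATIONS /VARIABLES=age scale_mean.

* TEMPORARY has expired, so this procedure uses the full sample again.
DESCRIPTIVES VARIABLES=age scale_mean.
```

Commands are written in capitals and variable names in lower case, related commands are grouped without blank lines, and each group is introduced by a short comment, in line with the conventions suggested above.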
5) Additional Resources
6) Final Note
  • If you are a researcher who only uses SPSS and you are getting good at SPSS syntax and want to move up to the next level of control, you may want to check out R.