Jeromy Anglim's Blog: Psychology and Statistics

Monday, July 19, 2010

How to Process Inquisit Raw Data in SPSS

This post provides advice for processing and importing raw data from Inquisit into SPSS. It is also relevant to importing Inquisit data into other data analysis packages.


Inquisit is a tool for conducting computerised psychological experiments. It is particularly useful when precise timing of stimuli and responses is important. The Inquisit website provides a trial download, sample scripts, and useful documentation. I've previously posted about the benefits of Inquisit. Ron Dotsch also has an introductory tutorial.

This post sets out how to process the raw data that Inquisit generates. My intended audience is psychology researchers who want to import raw Inquisit data into SPSS and process it there. This is a common scenario in many psychology departments, although I apply the same general logic when I import and process the data in R.

Before presenting my own tutorial, it's worth noting the resources already available. Ron Dotsch has a tutorial on processing Inquisit data. Also, a lot can be learnt by inspecting existing SPSS data processing scripts, such as the one for processing the IAT.

Overview of the Process

In summary, I divide the process into the following steps:
  1. Import raw data
  2. Remove unwanted rows
  3. Remove incomplete participant data and duplicate logins
  4. Further processing in long format
  5. Restructure data file from long to wide format
  6. Merge additional wide format data into existing wide format data

1. Import raw data

The raw Inquisit data file is usually a tab-delimited text file. Each row is one observation (i.e., a trial) on one individual. Variable names are in the first row. Raw data files typically include data for multiple participants. SPSS has the Read Text Wizard for importing such files. It's fairly self-explanatory, but here is a tutorial.
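If you process the data outside SPSS, the same import step is short. Below is a minimal Python sketch using only the standard library `csv` module. It assumes the layout described above: tab-delimited, with the variable names in the first row. The sample rows and the file name in the comment are hypothetical.

```python
import csv
import io


def read_inquisit_raw(f):
    """Parse a tab-delimited Inquisit raw data file into row dicts.

    The first row supplies the variable names. QUOTE_NONE treats any
    quote characters in stimulus text as ordinary characters.
    All values are returned as strings.
    """
    return list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))


# Hypothetical two-trial excerpt. With a real file you would pass
# open("yourfile.dat", newline="") (a hypothetical name) instead.
sample = io.StringIO(
    "time\tsubject\ttrialcode\tlatency\n"
    "14:21\t12\tpractice\t532\n"
    "14:21\t12\titem\t611\n"
)
rows = read_inquisit_raw(sample)
print(len(rows), rows[0]["trialcode"])  # prints: 2 practice
```

Every value comes back as a string, as it would if SPSS read each column as text. Convert numeric columns such as latency explicitly when you need them.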

2. Remove unwanted rows

The raw data file is likely to have many rows that need to be removed. There may be items that you do not want to analyse. Also, when the file contains data for multiple participants, the row of variable names is repeated at the start of each new participant's data. For example, the following SPSS syntax retains only rows where the variable time does not equal the text 'time', which removes these repeated header rows. The equivalent menu option is Data - Select Cases: If condition is satisfied.
COMPUTE filter_$= (time ~= 'time').
FILTER BY filter_$.
Once you're satisfied that the filter has worked, you can delete the unselected rows instead of filtering them:
SELECT IF (time ~= 'time').
The same logic can be extended to exclude particular trialcodes, blockcodes, and so on.
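For readers working in a general-purpose language, here is the same row removal as a minimal Python sketch. It drops the repeated header rows (where time equals the text 'time') and, as the extension mentioned above, any rows whose trialcode is in a set you choose. The rows and the `UNWANTED_TRIALCODES` set are hypothetical.

```python
# Hypothetical rows as produced by a tab-delimited reader (all strings).
# The third row is a repeated header marking the next participant.
rows = [
    {"time": "14:21", "subject": "12", "trialcode": "instructions"},
    {"time": "14:21", "subject": "12", "trialcode": "item"},
    {"time": "time", "subject": "subject", "trialcode": "trialcode"},
    {"time": "15:22", "subject": "13", "trialcode": "item"},
]

# Hypothetical trial codes that you do not want to analyse.
UNWANTED_TRIALCODES = {"instructions"}


def keep_row(row):
    """Mirror SELECT IF (time ~= 'time'), extended to trial codes."""
    if row["time"] == "time":  # repeated header row
        return False
    return row["trialcode"] not in UNWANTED_TRIALCODES


cleaned = [row for row in rows if keep_row(row)]
print([r["subject"] for r in cleaned])  # prints: ['12', '13']
```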

3. Remove incomplete participant data and duplicate logins

With Inquisit, participants sometimes log on to the experiment more than once. For example, they may click the start button twice or, in online settings, complete the whole experiment a second time.

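There are several ways to check for duplicate logins. One is to select the first trial of each login and then get a frequency count on the subject ID. Any subject ID with a count greater than one has logged in more than once. In SPSS syntax this might look like this:
COMPUTE filter_$=(blocknum = 1 & trialnum = 1).
FILTER BY filter_$.
FREQUENCIES VARIABLES=subject.
FILTER OFF.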
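The same check can be sketched in Python with `collections.Counter`. The sketch counts first trials (blocknum 1, trialnum 1) per subject and lists subjects whose count is greater than one. The rows are hypothetical, and the sketch assumes blocknum and trialnum are still strings as read from the text file.

```python
from collections import Counter

# Hypothetical cleaned rows; subject 12 has two first trials.
rows = [
    {"subject": "12", "blocknum": "1", "trialnum": "1"},
    {"subject": "12", "blocknum": "1", "trialnum": "2"},
    {"subject": "12", "blocknum": "1", "trialnum": "1"},  # second login
    {"subject": "13", "blocknum": "1", "trialnum": "1"},
]

# Equivalent of filtering on the first trial and running FREQUENCIES.
first_trials = Counter(
    r["subject"]
    for r in rows
    if int(r["blocknum"]) == 1 and int(r["trialnum"]) == 1
)
duplicates = sorted(s for s, n in first_trials.items() if n > 1)
print(duplicates)  # prints: ['12']
```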

If you have multiple logins for the one subject ID, you need to determine the valid login. In general, the valid login is the first login that involved completion of the full experiment.

Create a new variable called login that combines subject and the start time of the experiment. You could use the Inquisit variable time for this. However, I typically tell Inquisit to save an additional variable called script.starttime, which is accurate to the second rather than the minute. Thus, if a participant logs in more than once within a minute, each login can still be uniquely identified. To save this more precise start time to your Inquisit raw data file, add script.starttime to the /columns attribute in your Inquisit script, as in the following example:

<data>
/columns=[date, time, build, subject, trialcode,
    blockcode, blocknum, trialnum, latency, response,
    pretrialpause, posttrialpause, trialtimeout, blocktimeout,
    correct, stimulusitem, stimulusnumber,
    display.height, display.width,
    computer.cpuspeed, computer.os,
    script.starttime, script.elapsedtime]
</data>

You can create a variable to represent a unique login with SPSS syntax like the following:
STRING  login (A50).
COMPUTE login=CONCAT(ltrim(string(subject, F12.0)),".", script.starttime).
The above code declares a string variable with a width of 50 characters. It then computes login as the concatenation of subject, a full stop, and the script start time. To concatenate in SPSS, subject must first be converted to a string (assuming it is numeric). F12.0 specifies a number of width 12 with no decimal places. LTRIM removes the leading blanks from the resulting string. You might need to tweak the above to meet your needs.
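In Python, the same login key takes one line. This is a minimal sketch of the CONCAT logic above. Like the SPSS code, it assumes subject is numeric: the subject is written as a whole number with no leading blanks, followed by a full stop and the script start time.

```python
def make_login(subject, starttime):
    """Combine subject ID and script start time into a login key.

    Mirrors CONCAT(LTRIM(STRING(subject, F12.0)), ".", script.starttime):
    subject is rendered with no decimal places and no surrounding blanks.
    Assumes subject is numeric (e.g. "12", 12 or 12.0).
    """
    return f"{int(float(subject))}.{starttime.strip()}"


print(make_login("12", "14:21:08"))  # prints: 12.14:21:08
print(make_login(13.0, "15:22:23"))  # prints: 13.15:22:23
```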

You can now go through your data file and determine which logins are unwanted. A frequency table of login usually reveals which logins are incomplete, because an incomplete login has fewer rows (trials) than a complete one.
Sometimes you'll also have to determine which of two logins from the same participant occurred earlier or is otherwise the valid login.

This should result in a list of logins that you wish to exclude. You can use the previously mentioned selection code to do this. For example, the following could be used to remove the specified logins.
SELECT IF (login ~= '12.14:21:08').
SELECT IF (login ~= '13.15:22:23').
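Rather than compiling the exclusion list by hand, you can apply the rule stated earlier (the valid login is the first login that completed the full experiment) in code. Below is a minimal Python sketch under two hypothetical assumptions, both labelled in the code. First, a login is complete if it has at least `expected_trials` rows. Second, all of a subject's logins fall on one day, so the HH:MM:SS start time alone orders them. Logins use the subject.starttime format created above.

```python
from collections import Counter, defaultdict
from datetime import datetime


def select_logins(rows, expected_trials):
    """Split logins into (keep, exclude) sets.

    Rule: keep each subject's first complete login.
    Hypothetical assumptions:
    - a login is complete if it has at least expected_trials rows;
    - a subject's logins all fall on one day, so the HH:MM:SS
      start time is enough to order them.
    """
    trials_per_login = Counter(r["login"] for r in rows)  # frequency table
    complete = defaultdict(list)
    for login, n in trials_per_login.items():
        if n >= expected_trials:
            subject, start = login.split(".", 1)
            start_time = datetime.strptime(start, "%H:%M:%S")
            complete[subject].append((start_time, login))
    keep = {min(candidates)[1] for candidates in complete.values()}
    exclude = set(trials_per_login) - keep
    return keep, exclude


# Hypothetical data: subject 12 has an incomplete login, a complete
# login, and a later repeat run; subject 13 has one complete login.
logins = ["12.14:21:08",
          "12.14:21:40", "12.14:21:40",
          "12.14:25:00", "12.14:25:00",
          "13.15:22:23", "13.15:22:23"]
rows = [{"login": login} for login in logins]

keep, exclude = select_logins(rows, expected_trials=2)
rows = [r for r in rows if r["login"] in keep]  # like SELECT IF
print(sorted(exclude))  # prints: ['12.14:21:08', '12.14:25:00']
```

Subjects with no complete login have all their logins excluded, which also covers removing incomplete participant data.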

4. Further processing in long format

The Inquisit raw data file is in long format, which is to say that each row is a participant by trial combination. Often the aim is to convert the data file into wide format, where each row is a single participant. Many of the following steps can be performed either while the data file is in long format or after it has been transformed to wide format.

The details of subsequent steps vary substantially between studies. I'll just discuss a couple of common tasks.
4.1 Recoding responses
You might apply a filter based on trial code or stimulus property and then use Transform - Recode to convert responses. For example, you could:
  • Reverse code responses to selected self-report items
  • Remove or adjust certain latencies (e.g., dealing with outliers)
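The two recoding tasks above reduce to simple functions. This Python sketch shows reverse coding on a rating scale, using the identity reversed = min + max - response, and shows latency trimming that sets out-of-range values to missing. The 1 to 5 scale, the 200 and 3000 ms cut-offs, and the set of reverse-keyed items are all hypothetical illustrations, not recommendations.

```python
def reverse_code(response, scale_min=1, scale_max=5):
    """Reverse-code a rating: on a 1-5 scale, 1 -> 5, 2 -> 4, etc.

    The 1-5 default is a hypothetical example; pass your own bounds.
    """
    return scale_min + scale_max - response


def trim_latency(latency, lower=200, upper=3000):
    """Return None (missing) for latencies outside [lower, upper] ms.

    The cut-offs are illustrative assumptions only.
    """
    return latency if lower <= latency <= upper else None


# Hypothetical reverse-keyed items, identified by stimulus number.
REVERSED_ITEMS = {2, 5}
rows = [
    {"stimulusnumber1": 1, "response": 4, "latency": 850},
    {"stimulusnumber1": 2, "response": 4, "latency": 150},
]
for r in rows:
    if r["stimulusnumber1"] in REVERSED_ITEMS:  # the filter step
        r["response"] = reverse_code(r["response"])
    r["latency"] = trim_latency(r["latency"])
```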
4.2 Aggregation
You might apply a filter and use Data - Aggregate to get a summary of a set of items. The break variable is typically subject ID. Common examples include:
  • Mean reaction time over a set of items
  • Sum of errors
  • Mean for a set of items on a scale
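As a rough Python counterpart to Data - Aggregate with subject ID as the break variable, the sketch below filters on one trial code and then computes each subject's mean latency and number of errors. It assumes the Inquisit correct column is coded 1 for correct and 0 for an error. The rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format rows (one per trial).
rows = [
    {"subject": 12, "trialcode": "item", "latency": 600, "correct": 1},
    {"subject": 12, "trialcode": "item", "latency": 800, "correct": 0},
    {"subject": 13, "trialcode": "item", "latency": 500, "correct": 1},
    {"subject": 13, "trialcode": "practice", "latency": 900, "correct": 0},
]


def aggregate(rows, trialcode):
    """Per-subject mean latency and error count for one trial code.

    Assumes correct is coded 1 (correct) / 0 (error).
    """
    by_subject = defaultdict(list)
    for r in rows:
        if r["trialcode"] == trialcode:  # the filter step
            by_subject[r["subject"]].append(r)  # subject = break variable
    return {
        subject: {
            "mean_latency": mean(r["latency"] for r in trials),
            "errors": sum(1 - r["correct"] for r in trials),
        }
        for subject, trials in by_subject.items()
    }


summary = aggregate(rows, "item")
```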

5. Restructure data file from long to wide format

As mentioned earlier, the aim is often to get the data into wide format with one row per participant. This can be achieved by using the Data - Restructure tool in SPSS. It is often necessary to do this in several steps.

The general process is as follows:
  1. Prepare long format data. You typically want only three variables: ID, VARIABLE, and RESPONSE.
    • ID will typically be called subject in the Inquisit data file. It represents the participant ID.
    • VARIABLE is a string variable that uniquely identifies what will become a new variable in wide format. It is often necessary to create this by concatenating strings from variables such as trialcode, trialnum, or stimulusnumber1. Go to Transform - Compute and see the various string functions, particularly CONCAT; the earlier login example also uses CONCAT. For example, I might concatenate trialcode and stimulusnumber1 for a personality test where stimulusnumber1 records the item number. It's essential that each value of VARIABLE occurs only once for each participant ID. It's also best if the values of VARIABLE do not include spaces, because they will become variable names.
    • RESPONSE is the actual value of the variable that you want to extract. This might be the actual response or it might be the latency.
  2. You may need to filter out rows that are not part of the current export. You'll also need to delete, in a working copy of the file, the many raw Inquisit variables that are not needed.
  3. Run Data - Restructure: Restructure selected cases into variables. The identifier variable is ID and the index variable is VARIABLE.
This process should result in the conversion from long to wide format. You may need to repeat this process a few times as you extract different information (e.g., latency, response data, etc.).
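The restructure can also be sketched in Python to make the ID / VARIABLE / RESPONSE logic explicit. This minimal sketch builds VARIABLE by concatenating the chosen columns, replaces spaces with underscores, and raises an error if a VARIABLE value occurs twice for one ID, since each cell of the wide file must hold exactly one value. The function name and the sample rows are hypothetical.

```python
def long_to_wide(rows, id_key, variable_keys, value_key):
    """Restructure long rows into one dict per participant.

    VARIABLE = concatenation of the values in variable_keys
    (e.g. trialcode and stimulusnumber1), like CONCAT in SPSS.
    A participant with no row for some VARIABLE simply lacks that key.
    """
    wide = {}
    for r in rows:
        variable = "".join(str(r[k]) for k in variable_keys).replace(" ", "_")
        record = wide.setdefault(r[id_key], {id_key: r[id_key]})
        if variable in record:
            raise ValueError(f"duplicate {variable!r} for ID {r[id_key]!r}")
        record[variable] = r[value_key]  # RESPONSE
    return list(wide.values())


# Hypothetical personality-test rows; stimulusnumber1 is the item number.
rows = [
    {"subject": 12, "trialcode": "item", "stimulusnumber1": 1, "response": 4},
    {"subject": 12, "trialcode": "item", "stimulusnumber1": 2, "response": 2},
    {"subject": 13, "trialcode": "item", "stimulusnumber1": 1, "response": 5},
]
wide = long_to_wide(rows, "subject", ["trialcode", "stimulusnumber1"], "response")
```

Running the function once with `"response"` and again with `"latency"` corresponds to repeating the restructure for different information, as described above.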

6. Merge additional wide format data into existing wide format data

If you have existing wide format data on participants, created wide format data through aggregation, or have multiple restructured files from the previous step, you'll probably want to merge the files. This is straightforward in SPSS using Data - Merge Files - Add Variables. UCLA has a tutorial on merging in SPSS, but here are a few basic tips:
  • Ensure that ID variables are named the same in the two data files.
  • Sort both data files by the ID variable before merging.
  • Ensure that the ID variables have the same format (e.g., make them both numeric, or if they are strings, ensure that they have the same width).
  • Ensure that variables other than ID have distinct names across the data files.
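To show how the tips above work together, here is a minimal Python sketch of an add-variables merge on an ID. It normalises IDs to one type (integers here, an assumption that suits numeric subject IDs). It treats any shared non-ID variable name as an error, and it returns rows sorted by ID. This sketch keeps every ID that appears in either file; the function name and data are hypothetical.

```python
def merge_wide(left, right, id_key):
    """Add variables from right to left, matching on id_key.

    IDs are normalised to int (assumes numeric IDs such as "12",
    12 or 12.0). Raises ValueError if a non-ID variable name occurs
    in both files. Keeps every ID found in either file.
    """
    def norm(value):
        return int(float(value))

    merged = {}
    for source in (left, right):
        for row in source:
            key = norm(row[id_key])
            record = merged.setdefault(key, {id_key: key})
            clash = (set(record) & set(row)) - {id_key}
            if clash:
                raise ValueError(f"variable names not distinct: {sorted(clash)}")
            record.update({k: v for k, v in row.items() if k != id_key})
    return [merged[k] for k in sorted(merged)]  # sorted by ID


# Hypothetical wide files whose IDs differ in type and order.
left = [{"subject": "12", "item1": 4}, {"subject": "13", "item1": 5}]
right = [{"subject": 13.0, "mean_latency": 500},
         {"subject": 12, "mean_latency": 700}]
merged = merge_wide(left, right, "subject")
```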

Additional Resources

Of course, there are many other steps you might take when preparing your data. The preceding steps simply address some common issues in processing Inquisit raw data. The following resources are also relevant to the present context: