Jeromy Anglim's Blog: Psychology and Statistics

Monday, October 26, 2009

Scale Construction | Item Reversal, Scale Scores, Reliability, and Metadata

This post discusses how to: (a) use exploratory factor analysis output to determine items to retain on a test; (b) run syntax that reverses items and produces scale scores for a test; (c) calculate reliabilities for scales using retained items. Various tips are provided to make the process more efficient and less error-prone. The example uses SPSS, but many of the ideas would generalise to other statistics packages.
The context:
Research context: The research example is based on a set of items informally produced by a class that I teach. Students responded to all items on a five-point scale where 5 means "strongly like me". The aim of the exercise was to determine the factor structure of this class test, remove undesirable items, calculate scale scores for factors using retained items, calculate reliabilities for the scales, and correlate the class test scales with scales on an established personality test called the IPIP.
Note: the process of item development was rather ad hoc. Thus, the point of the exercise was NOT to highlight best practice in item development. Rather, the aim was to show how to calculate scale scores and apply some basic principles of item and factor assessment to refining a scale.
Relevance of context: The above scenario is very common in psychology. Computing scale scores and reliabilities is common to any setting where psychological tests are used. The additional step of refining the scale, determining the factor structure, and making decisions about items to retain pertains more to settings where the scale is in some sense novel. Such novel settings include: (a) when a researcher is developing a new scale from scratch; (b) when a researcher has adapted an existing scale; (c) when a researcher believes an existing scale should be modified on factor analytic or theoretical grounds; (d) when a scale is being applied in a novel setting or there are other reasons to believe that the typical factor structure is inappropriate.

Outline of Procedure
1. Determine factor structure
2. Calculate scale scores
3. Calculate reliabilities
4. Use scale scores

1. Determine factor structure
Aim: This step aims to determine: (a) the number of factors to retain; (b) whether an item should be included in the calculation for a given scale; (c) whether an item should be reversed.
Assumptions: I assume that each item will be assigned to only one factor and that each item will be equally weighted in the composite (i.e., a simple mean of items).
Process: There are several ways to complete this task. The following is one procedure that I find relatively efficient.

1.1 Create an item database: An item database has one row per item and a series of columns. As part of the above example, I set up an Excel table with the 64 items. The database initially includes the item number, item text, a label for SPSS, and the variable name in the data file.

Figure 1. The picture below shows a partial screen shot of the item database.

1.2 Fill in additional information in the item database: Specifically, additional columns are required to indicate what scale the item is assigned to, whether the item is reversed, and whether the item is retained. Much could be, and has been, said about the process of determining the number of factors and determining which items to retain. It is not my aim to discuss these issues here. If you want to learn more, see my post on procedures for scale development and the role of exploratory factor analysis, where I list a number of theoretical and procedural resources. I assume here that a set of rules has been adopted and that the application of these rules has resulted in a final factor loading matrix which captures the number of factors, item assignment to factors, and whether an item should be reversed.

Basic SPSS syntax for running a factor analysis might look like this, where "personality1" to "etcetera" is replaced by the variables representing each item in the test.
FACTOR
  /VARIABLES personality1 personality2 etcetera.

I find it useful to update the retained list of variables in the Excel item database. Then, if I want to rerun the analysis with just the items that are retained, I can use the Excel item database to filter only the rows of retained items. I can then select just the variable names required and paste this string of variable names into the syntax. I discuss this idea of efficient variable selection in two other posts, one on R and one on SPSS.
Thus, I start with a column called "retain" with 1s in all rows. As the decision is made to exclude items, the value for excluded items in the "retain" column is changed to 0.

For example, in this class constructed personality test, a couple of items were accidentally included twice. These duplicates can be identified and flagged to be excluded:

Figure 2. The picture below shows a partial screen shot of the item database with duplicate items flagged for deletion.

I can use this database to select rows where "retain" = 1, copy and paste the variable names into an empty area of the Excel file, and then paste this column of variable names into the appropriate section of the factor analysis syntax template above (for details, see this post). This procedure is iterated a number of times. The factor analysis is run, and items are identified as problematic and flagged for removal. The status of discarded items is changed in the "retain" column; the Excel table is filtered again; the retained variable names are copied and pasted into SPSS. The procedure ends once all items are deemed to be acceptable. The final result should be a factor loading matrix which indicates which items are to be retained, which factor each item is assigned to, and whether the item should be reversed.
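For readers who would rather script the filter-and-paste step than do it by hand, the following is a minimal Python sketch of the same idea. It is not part of the SPSS and Excel workflow described in this post: the `item_db` rows, the retain flags, and the function names are hypothetical and purely illustrative.

```python
# Hypothetical item database: one dict per item (values are illustrative only).
item_db = [
    {"item": 1, "variable": "personality1", "retain": 1},
    {"item": 2, "variable": "personality2", "retain": 0},  # e.g. flagged for exclusion
    {"item": 3, "variable": "personality3", "retain": 1},
    {"item": 4, "variable": "personality4", "retain": 1},
]


def retained_variables(items):
    """Return the variable names of retained items, ordered by item number."""
    kept = [row for row in items if row["retain"] == 1]
    return [row["variable"] for row in sorted(kept, key=lambda r: r["item"])]


def variables_subcommand(items):
    """Build a /VARIABLES line that could be pasted into the factor analysis syntax."""
    return "  /VARIABLES " + " ".join(retained_variables(items))


print(variables_subcommand(item_db))
```

Changing a retain flag to 0 and rerunning the script plays the same role as re-filtering the Excel table and pasting the variable names again.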

Figure 3. The image below shows what an item database could look like in Excel and illustrates the idea of filtering items.

Figure 4. The image below shows partial output from SPSS showing the factor loadings for the class personality test after excluding several particularly poor items. 

Clearly this ad hoc class personality test is far from perfect. Several items have moderate loadings on more than one factor. Some items lack large loadings on any factor. However, for present purposes I assume it is satisfactory.

The next step is to fill in the remaining information in the item database. There are several ways to do this; the following is just one.
First, labels are assigned to each of the extracted factors. The first factor included positive loadings for items like "I am the life of the party" and negative loadings for "I prefer to stay at home". This factor might loosely be described as extraversion, although a more accurate definition might be gregariousness. The second factor has all positive loadings with the largest loadings coming from the following items: "I love facing challenges", "I am willing to try new things", "I appreciate and frequent exhibitions". The factor might be called openness to experience, possibly with elements of sensation seeking. The third factor is all about preferences to work in teams. The fourth factor includes items, "I am always on time", "I am rarely late", and negative loadings for items like "I am always in a hurry". This factor might loosely be called conscientiousness or more specifically seems to reflect punctuality. The fifth and final factor has negative loadings for items like "I tend to avoid conflicts with friends" and "I tend to go along with what other people expect of me", and positive loadings for "I find it hard to control my anger" and "I am a bit aggressive". The factor is thus measuring disagreeableness or possibly aggressiveness.

Factor names are then entered into the item database. This can be done by creating a separate table in Excel with three columns: "item number", "factor" and "reverse".
Step 1: paste the factor loading matrix into Excel (Paste special - unicode; alt + e + s - unicode);
Step 2: insert two columns between the variable names and the factor loadings;
Step 3: record the factor name for each item in the first new column (use copy and paste to make this quick);
Step 4: record whether the item should be reversed in the second new column; I use a "1" to indicate a non-reversed item and a "-1" for a reversed item. Item reversal is indicated by a negative loading on the main factor.
Step 5: extract the item numbers into a column to the left of the scale and reverse columns. In this case I copied and pasted the item text; highlighted the copy of the item text; ran the Text to Columns wizard, choosing delimited with "Other" set to "Q"; and then repeated the process, this time delimited by ".". This left me with just the item numbers, which could be copied and pasted back into the first column in the factor loading table in Excel.

Figure 5. The image below shows a partial screen shot of the factor loading matrix in Excel after applying the above steps.

Step 6: Convert the three columns (item number, scale, and reverse) of the factor loading data in Excel into a named range. I called mine "classPersonalityLoadings".

Step 7: run vlookup functions (see here for an example) in the original item database to incorporate the information.
I placed this function in my "scale" column of the item database:
=VLOOKUP(A2, classPersonalityLoadings,2,FALSE)

And this one in my "reverse" column of the item database
=VLOOKUP(A2, classPersonalityLoadings,3,FALSE)

In words, the above functions use the item number (A2) to look up the table "classPersonalityLoadings". If the item number is found in the first column of "classPersonalityLoadings" (i.e., there is an exact match; hence the FALSE argument), then the value in the designated column (2nd in the first example; 3rd in the second example) of the matching row is returned.
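The exact-match lookup can also be sketched in Python for readers who want to see the logic outside Excel. This is only an illustration: the `vlookup` helper, the dictionary layout, and the three example rows are assumptions of mine, not something taken from the Excel file.

```python
# Hypothetical stand-in for the named range: item number -> (scale, reverse).
class_personality_loadings = {
    5: ("Gregariousness", 1),
    7: ("Disagreeable", -1),
    24: ("Punctuality", -1),
}


def vlookup(key, table, col_index):
    """Mimic =VLOOKUP(key, table, col_index, FALSE), i.e. exact match only.

    Column 1 is the key itself; columns 2 and up index into the stored tuple.
    Returns None where Excel would show #N/A.
    """
    if key not in table:
        return None
    row = (key,) + tuple(table[key])
    return row[col_index - 1]


# Hypothetical item database rows; item 11 has no entry in the loadings table.
item_db = [{"item": 5}, {"item": 7}, {"item": 11}]
for row in item_db:
    row["scale"] = vlookup(row["item"], class_personality_loadings, 2)
    row["reverse"] = vlookup(row["item"], class_personality_loadings, 3)

print(item_db)
```

Items that were dropped from the factor analysis have no row in the loadings table, so they come back empty here, just as they would show #N/A in Excel.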

Figure 6. The result is an Excel item database like this.

This ends step 1.

2. Calculate scale scores
This is where we see some of the major benefits of the item database.
The following provides a basic template for the syntax for computing scale scores for the five scales. The basic point is that items are first reversed and then scale scores are calculated.
* reverse items before computing scale scores (assumes 1 to 5 scale).
DO REPEAT x = [originalVariables] /
 xReversed = [reversedVariables] /
 xMultiplier = [1s and -1s depending on reverse status].
compute xReversed = x.
if (xMultiplier = -1) xReversed = 6 - x.
END REPEAT.

* compute scale scores.
COMPUTE [factorName] = mean([variable names separated by commas]).

Applying the above template to the present example, I sort the items by item number, filter on retained items in the item database, and then extract:
(a) the reverse column, which supplies the list of 1s and -1s for xMultiplier;
(b) the original variable names, which go where it says [originalVariables];
(c) the reversed variable names, which go where it says [reversedVariables]. These are created by adding an extra column to the item database with suitable names such as "personalityReversed1", "personalityReversed2", etc.
Note: When copying and pasting out of the item database, it may be easiest to follow this procedure: (1) copy the cells; (2) paste the cells into an empty part of the Excel database; (3) paste special into Word (alt + e, s - Unformatted); (4) run replace "^p" with " " (i.e., replace new lines with a space); (5) copy and paste into the syntax.
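To make the reverse-then-average logic concrete, here is a minimal Python sketch of the same two operations. It is an illustration under stated assumptions rather than a translation of the SPSS syntax: the function names and the single respondent's answers are hypothetical, missing responses are represented as None, and the mean is taken over non-missing items, which is how I understand the SPSS mean() function to behave.

```python
def reverse_score(value, multiplier, scale_min=1, scale_max=5):
    """Reverse a response when multiplier is -1.

    (scale_min + scale_max) - value reduces to 6 - x on a 1 to 5 scale.
    """
    if value is None:
        return None
    return (scale_min + scale_max) - value if multiplier == -1 else value


def scale_mean(values):
    """Mean of the non-missing values; None marks a missing response."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# Hypothetical answers from one respondent, with reverse flags from the item database.
responses = {"personality7": 2, "personality33": 3, "personality14": None}
multipliers = {"personality7": -1, "personality33": 1, "personality14": -1}

reversed_items = {
    name: reverse_score(value, multipliers[name]) for name, value in responses.items()
}
disagreeable_mean = scale_mean(reversed_items.values())
print(reversed_items, disagreeable_mean)
```

Writing the reversal as (minimum + maximum) - x makes it easy to reuse on a scale with a different range, for example 7 - x on a 1 to 6 scale.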

Here's what the reversal syntax could look like after copying and pasting data from the item database as set out above.
* reverse items before computing scale scores (assumes 1 to 5 scale).
DO REPEAT x = personality1 personality3 personality4 personality5 personality6 personality7 
 personality8 personality9 personality10 personality12 personality14 personality16 
 personality17 personality19 personality20 personality21 personality22 personality24 
 personality25 personality27 personality29 personality32 personality33 personality34 
 personality35 personality37 personality38 personality39 personality40 personality41 
 personality42 personality44 personality45 personality46 personality47 personality48 
 personality49 personality50 personality52 personality53 personality54 personality55 
 personality56 personality57 personality58 personality59 personality60 personality61 
 personality62 personality63 personality64 /
xReversed = personalityReversed1 personalityReversed3 personalityReversed4 personalityReversed5
 personalityReversed6 personalityReversed7 personalityReversed8 personalityReversed9 
 personalityReversed10 personalityReversed12 personalityReversed14 personalityReversed16 
 personalityReversed17 personalityReversed19 personalityReversed20 personalityReversed21 
 personalityReversed22 personalityReversed24 personalityReversed25 personalityReversed27 
 personalityReversed29 personalityReversed32 personalityReversed33 personalityReversed34 
 personalityReversed35 personalityReversed37 personalityReversed38 personalityReversed39 
 personalityReversed40 personalityReversed41 personalityReversed42 personalityReversed44 
 personalityReversed45 personalityReversed46 personalityReversed47 personalityReversed48 
 personalityReversed49 personalityReversed50 personalityReversed52 personalityReversed53 
 personalityReversed54 personalityReversed55 personalityReversed56 personalityReversed57 
 personalityReversed58 personalityReversed59 personalityReversed60 personalityReversed61 
 personalityReversed62 personalityReversed63 personalityReversed64 /
xMultiplier = 1 1 1 1 1 -1 1 -1 1 1  -1 1 1 1 1 1 1 -1 1 1 1 -1 1 1 -1 -1 1 -1 1 
-1 1 1 1 1 1 1 1 1 1 1 -1 1 1  -1 1 1 1 1 -1 -1 -1.
compute xReversed = x.
if (xMultiplier = -1) xReversed = 6 - x.
END REPEAT.

The compute statements can then also be created using the item database. (1) Filter the item database so that only retained items are shown and sort by scale; (2) for each factor, copy the reversed variable names [copy into a spare area in Excel]; (3) copy and paste into Word; (4) replace "^p" with ", " (i.e., the separator which is required for the compute statement); (5) paste into the compute syntax; (6) give the factor a variable name.

COMPUTE personalityDisaggreeableMean = mean(personalityReversed7, personalityReversed9, personalityReversed14, personalityReversed33, personalityReversed45, personalityReversed60).

COMPUTE personalityGregariousnessMean = mean(personalityReversed5, personalityReversed21, personalityReversed25, personalityReversed27, personalityReversed29, personalityReversed32, personalityReversed37, personalityReversed39, personalityReversed41, personalityReversed42, personalityReversed46, personalityReversed48, personalityReversed49, personalityReversed52, personalityReversed53, personalityReversed57, personalityReversed59, personalityReversed62, personalityReversed63).

COMPUTE personalityOpennessMean = mean(personalityReversed4, personalityReversed6, personalityReversed12, personalityReversed16, personalityReversed17, personalityReversed19, personalityReversed20, personalityReversed38, personalityReversed40, personalityReversed47, personalityReversed55, personalityReversed58).

COMPUTE personalityPunctualityMean = mean(personalityReversed24, personalityReversed34, personalityReversed44, personalityReversed54, personalityReversed56, personalityReversed61, personalityReversed64).

COMPUTE personalityTeamworkMean = mean(personalityReversed1, personalityReversed3, personalityReversed8, personalityReversed10, personalityReversed22, personalityReversed35, personalityReversed50).
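The copy, paste, and replace steps used to build these statements could also be automated. The following Python sketch generates COMPUTE statements from a small item database; the rows, the column names, and the naming pattern for the scale variables are hypothetical, and the output would still need to be pasted into SPSS and run there.

```python
from collections import defaultdict

# Hypothetical item database rows (illustrative values only).
item_db = [
    {"item": 9, "scale": "Disagreeable", "retain": 1},
    {"item": 7, "scale": "Disagreeable", "retain": 1},
    {"item": 24, "scale": "Punctuality", "retain": 1},
    {"item": 2, "scale": "Punctuality", "retain": 0},  # excluded item
]


def compute_statements(items, prefix="personality"):
    """Return one SPSS COMPUTE statement per scale, using retained items only."""
    by_scale = defaultdict(list)
    for row in sorted(items, key=lambda r: r["item"]):
        if row["retain"] == 1:
            by_scale[row["scale"]].append(f"{prefix}Reversed{row['item']}")
    return [
        f"COMPUTE {prefix}{scale}Mean = mean({', '.join(names)})."
        for scale, names in sorted(by_scale.items())
    ]


for statement in compute_statements(item_db):
    print(statement)
```

Because the statements are rebuilt from the retain and scale columns each time, dropping an item only requires changing one cell in the item database.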


The statements above show what the syntax might look like. When the reversal syntax and these COMPUTE statements are run, they generate the required scale scores.

3. Calculate reliabilities
The combination of the item database and the reversed items can then be used to calculate reliabilities.
I discuss this extensively here.

But here's one example: (1) set up the basic options under Analyze - Scale - Reliability Analysis and press Paste; (2) copy the variable list from the corresponding compute statement into the section of the reliability syntax that expects the variable list; (3) remove all the commas, either manually or by using find and replace on the highlighted selection; (4) run the reliability syntax.
RELIABILITY
  /VARIABLES=personalityReversed7 personalityReversed9 personalityReversed14 personalityReversed33 personalityReversed45 personalityReversed60
  /SCALE('Disagreeable') ALL.

Figure 7. This then yields the output below (i.e., a mediocre scale with one item (item 9) that probably should be removed).
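As a rough check on what the reliability procedure reports, the sketch below computes Cronbach's alpha in Python. It assumes that alpha is the statistic of interest (it is the default model for the SPSS RELIABILITY command), that items have already been reversed, and that there are no missing responses. The function name and the tiny data set are hypothetical and are not the class data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: one list of (already reversed) item scores per respondent.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: four respondents, two items.
data = [
    [1, 2],
    [2, 1],
    [3, 4],
    [4, 3],
]
print(round(cronbach_alpha(data), 2))  # 0.75
```

Recomputing alpha after leaving out one item at a time is one way to see why a particular item might be flagged for removal.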

4. Use scale scores
Then comes the easy part. We can run an analysis. For example, we could correlate the five new scales with some established IPIP scales.

Figure 8. The image below shows a partial extract of a correlation matrix from SPSS (Analyze - Correlate - Bivariate)

Further formatting would substantially improve the readability of this correlation matrix. See this post for more details.

Concluding comments
Data analysis can be done in many different ways. The above procedure for refining and using a scale is just one way. The example hopefully highlights: (a) the importance of learning little shortcuts when doing data analysis; (b) the importance of methodically representing metadata; and (c) the way that data manipulation is often far more time-consuming than the analysis itself.

Additional Resources


  1. Hello Jeromy!

    Thanks for providing an interesting blog! I’d appreciate it greatly if you could give me some advice or direct me to relevant resources concerning scale development.
    Being a somewhat overambitious third-year student (psychology), I was/am interested in researching the relationship between personality in first-year students and their future academic performance (i.e. study pace, GPA, drop-out and motivation). I was afraid social desirability would influence the responses, as the questionnaire was administered during their admission to my university. Thus, I created my own item pool, inspired by IPIP scales. The questionnaire contains 160 items and is answered on a 6-point Likert scale.
    I have now finished gathering my sample, resulting in approx. 500 respondents. The next procedure for me, which I sadly understand too little about, is to perform dimension reduction (I use SPSS):

    a) Decide which factor solution and rotation would be appropriate
    b) Remove items with low factor loadings. Back to Step a.
    c) Decide on the numbers of factors to extract
    d) Convert raw scores into new scores, based on their factor loadings, that are appropriate for further regression analyses on my dependent variables.

    1. Do you have any ideas or advice, which factor-solution & rotation would be appropriate for each step?
    2. Which method would you recommend to compute factor scores?

    Greetings from Finland,

  2. Hi Christopher,

    If you are using an established psychological scale like the IPIP, then you would generally just score the test as the test designers intended it to be scored.
    This allows your results to be easily comparable to other studies that have used the same scale.
    In such cases the exploratory factor analysis is generally just a guide to check that such an approach is reasonable.

    However, if you've modified the items a lot, then yes, you would need to perform exploratory factor analysis. In general, creating a scale is a big job, and I would advise against modifying personality tests.

    That said, you have adapted the test and perhaps you have good reasons for doing it.
    In terms of resources, check out:

    The EFA lecture and tutorial as part of this course:

    And here are some general readings on EFA: