Jeromy Anglim's Blog: Psychology and Statistics


Thursday, February 19, 2009

Formatting Correlation Matrices in Psychology

Researchers in psychology often want to present a correlation matrix of the main variables in a study. This post sets out one way of producing a formatted correlation matrix that conforms to APA style.
I have written this post assuming that you are using:
  • SPSS to compute the correlation matrix
  • Excel to format the matrix
  • Microsoft Word to present the matrix
This is the most common scenario in my statistics consulting, although I have written R code that automates the entire process, which I use for my own analyses. You could also apply the steps below substituting a different statistics package to generate the correlation matrix.

The data for the example come from the world95.sav file available in SPSS 17. The data are used for illustration only. In a real analysis, you would want to consider whether any of the variables need transforming.

1. Decide on variables to include
Correlation matrices can include binary variables, numeric variables, and ordinal variables that are being treated as numeric, but not nominal variables with more than two categories. There is also the practical issue of how many variables to include. On a landscape page, it is difficult to fit more than about 17 variables. Thus, it is useful to choose the most interesting variables that summarise the main variables in a study. It may also be useful to present additional correlation matrices for particular subsets of variables. If more than 17 variables need to be included, make the full correlation matrix available in an accessible online repository. Make sure that variable names are worded so that the name reflects what a high score means. For example, if a variable is called poverty, then high scores on the variable should mean more poverty, not more income. In the case of binary variables like gender, either name the variable after the higher-scored category or add a note to the table that states how the variable is coded (e.g., 0 = male, 1 = female).


2. Inspect the data
The main issue is whether Pearson’s correlation is a good summary of the relationship between each pair of variables. Univariate outliers, bivariate outliers, and nonlinear relationships can make Pearson’s correlation problematic. The absence of bivariate normality may also make p-values somewhat inaccurate. However, I usually find that this is a secondary issue. In many psychological studies based on self-report measures, most pairs of variables correlate, and the absence of a significant correlation is often just a sign of inadequate sample size. Thus, the main story concerns the relative size of the correlations and what the pattern indicates about the phenomena of interest.
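As a rough illustration of why this inspection matters, the sketch below shows how a single bivariate outlier can change Pearson’s correlation. It is plain Python rather than SPSS, the data are made up, and the function name pearson_r exists only for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: a perfectly negative relationship across five cases.
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
r_clean = pearson_r(x, y)

# The same data plus one extreme case that is high on both variables.
r_outlier = pearson_r(x + [20], y + [20])

print(f"without outlier: r = {r_clean:.2f}")
print(f"with outlier:    r = {r_outlier:.2f}")
```

In this contrived case one extra observation flips the sign of the correlation, which is the kind of problem a scatterplot would reveal immediately.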


3. Computing the Correlation Matrix
There are several ways of getting a correlation matrix in SPSS.
3.a Analyze - Correlate - Bivariate
3.b Analyze - Correlate - Distances [between variables; similarities]
3.c Analyze - Dimension Reduction - Factor [Descriptives=Coefficients; Exclude cases=pairwise]
Use 3.a if you have missing data and want to know the number of participants used in each correlation.
Use 3.a if you want to know the exact p-value of each correlation.
If you just want to extract the correlations for a document without the superfluous information (i.e., sample sizes and exact p-values), use 3.c. Alternatively, use 3.a, then double-click the correlation matrix in the SPSS output and play around with the pivoting trays, moving the statistics into the layer.
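For readers working outside SPSS, the following is a minimal Python sketch of what option 3.a reports: a correlation and the number of cases behind it, computed with pairwise exclusion of missing values. The variable names (var_a, var_b, var_c), the data, and the function names are all hypothetical, and missing values are assumed to be coded as None.

```python
import math

def pairwise_r(xs, ys):
    """Return (r, n) using only cases where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    # Note: a variable with zero variance would cause a division by zero here.
    return sxy / math.sqrt(sxx * syy), n

def correlation_matrix(data):
    """Map each pair of variable names to its (r, n) tuple."""
    names = list(data)
    return {a: {b: pairwise_r(data[a], data[b]) for b in names} for a in names}

# Made-up data; None marks a missing value.
data = {
    "var_a": [1, 2, 3, 4, None],
    "var_b": [2, 4, 6, 8, 10],
    "var_c": [4, 3, 2, 1, 0],
}
matrix = correlation_matrix(data)

for a in data:
    for b in data:
        r, n = matrix[a][b]
        print(f"{a} with {b}: r = {r:.2f}, n = {n}")
```

Because each correlation uses whatever cases are available for that pair, the n values can differ across cells, which is why it is worth reporting them when data are missing.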


4. Format the correlation matrix
One way of formatting a correlation matrix involves pasting the SPSS output into Excel.
It might look something like this:

4.a Add numbers to the first column (1 to the number of variables)
4.b Clear any formatting on the table
4.c Replace column names with numbers (1 to the number of variables)
4.d Remove data from the diagonal and from either the upper or lower triangle

4.e Format the correlations; the usual format in psychology omits the leading zero and shows two or three decimal places (e.g., .25 rather than 0.25). This can be achieved in Excel by highlighting the cells that contain correlations and going to Format > Cells (Ctrl+1); Number tab >> Custom >> Type = .00 or .000

4.f Add any additional table information
You may wish to add additional columns between the variable name and the first variable. Most commonly this is the mean, standard deviation, and alpha. Alpha is sometimes displayed on the diagonal.
4.g Set column widths, fonts, alignment, and lines

It should now be ready for Word.
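If you would rather script steps 4.a to 4.e than do them by hand in Excel, the same layout can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names and correlations are made up, the function names format_r and lower_triangle_rows are hypothetical, and the output is tab-separated text that can be pasted into a spreadsheet or a Word table.

```python
def format_r(r, places=2):
    """Format a correlation without the leading zero (like Excel's .00)."""
    text = f"{r:.{places}f}"
    if text.startswith("0."):
        text = text[1:]            # '0.46' becomes '.46'
    elif text.startswith("-0."):
        text = "-" + text[2:]      # '-0.12' becomes '-.12'
    return text

def lower_triangle_rows(names, matrix, places=2):
    """Numbered rows and columns; diagonal and upper triangle left blank."""
    size = len(names)
    rows = [["Variable"] + [str(j + 1) for j in range(size)]]
    for i, name in enumerate(names):
        cells = [format_r(matrix[i][j], places) if j < i else ""
                 for j in range(size)]
        rows.append([f"{i + 1}. {name}"] + cells)
    return rows

# Made-up correlations for three placeholder variables.
names = ["Variable A", "Variable B", "Variable C"]
matrix = [
    [1.0, 0.456, -0.12],
    [0.456, 1.0, 0.3],
    [-0.12, 0.3, 1.0],
]
rows = lower_triangle_rows(names, matrix)

for row in rows:
    print("\t".join(row))
```

Means, standard deviations, and other columns from step 4.f could be added by inserting extra cells after the variable name in each row.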


5. Paste into Word
If the correlation matrix has more than around 8 variables, you may need to use landscape format. Inserting a single landscape page into a Word document requires the use of section breaks before and after the insertion point and converting the page layout between the breaks to landscape (see here for details).


6. Notes about statistical significance
Some people like the star system, whereby correlations are given *, **, or ***, depending on whether the p-value is less than .05, .01, or .001 respectively. I prefer to display a note below the correlation matrix. The note states that correlations larger in absolute terms than a particular value are statistically significant at the .05 or .01 level. Displaying a note produces a less cluttered table, allows for the presentation of more variables, and focuses the reader on the relative size of the correlations as opposed to statistical significance. It can also be nice to present the sample size at the bottom of the table.

To determine which correlations define the thresholds for statistical significance:
6.a Use an online calculator. For the 95% confidence (p < .05) scenario, use this. I haven’t found a general calculator for all confidence levels, but you can use the following site to obtain the p-value for a given r. Thus, you can use it like The Price Is Right: plug in your sample size and then gradually increase the correlation by .01 until the p-value is just less than .05, and then just less than .01. For example, with N = 100: r = .19 gives two-tailed p = .0583; r = .20 gives p = .046; r = .26 gives p = .009 (i.e., p < .01). Thus, you could say that |r| ≥ .20 has p < .05 and |r| ≥ .26 has p < .01. Note the use of the bars “| |” to indicate that it is the absolute size of the correlation that matters.
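The trial-and-error search described in 6.a can also be done without a website. The following is a minimal Python sketch, not the calculator referred to above. It assumes the usual test that the population correlation is zero, obtains the two-tailed p-value by numerically integrating the null distribution of r (equivalent to the t test with N - 2 degrees of freedom), and then steps r upward by .01. The function names r_p_value and critical_r are made up for this illustration.

```python
import math

def r_p_value(r, n, steps=2000):
    """Two-tailed p-value for a Pearson r from n cases, under H0: rho = 0."""
    if n < 4:
        raise ValueError("this sketch needs n >= 4")
    r = abs(r)
    if r >= 1.0:
        return 0.0
    half = (n - 4) / 2
    # Normalising constant 1 / B(1/2, (n - 2) / 2) of the null density of r.
    c = math.exp(math.lgamma((n - 1) / 2)
                 - math.lgamma(0.5) - math.lgamma((n - 2) / 2))

    def density(x):
        return c * max(0.0, 1.0 - x * x) ** half

    # Simpson's rule on [r, 1]; steps must be even.
    h = (1.0 - r) / steps
    total = density(r) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(r + i * h)
    return min(1.0, 2.0 * total * h / 3.0)

def critical_r(n, alpha):
    """Smallest r, in steps of .01, whose two-tailed p-value is below alpha."""
    for k in range(1, 100):
        r = k / 100
        if r_p_value(r, n) < alpha:
            return r
    return None  # no correlation below 1.00 reaches significance

print(critical_r(100, 0.05))
print(critical_r(100, 0.01))
```

With N = 100 the search stops at the same thresholds as the worked example above (.20 for the .05 level and .26 for the .01 level), so the table note can be written without visiting a calculator.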


Examples from the literature