Jeromy Anglim's Blog: Psychology and Statistics


Saturday, June 6, 2009

Normality and Transformations: A few thoughts

A lot of researchers ask me about normality and transformations. This post sets out some of my advice on assessing normality and dealing with violations of the normality assumption.

A common decision process that I encounter is the following:
1) assess whether variables are normally distributed
2) if variables are not normally distributed, either transform them to make them normally distributed or use non-parametric tests.

While such a decision rule might keep you out of trouble, I tend to find it a bit restrictive. I also find that it goes with a general mindset in which statistics is a checkbox on the way to predetermined conclusions, rather than a means of using data to inform understanding. The following are a few thoughts I have about normality and transformations.

First, the shape of the distribution of variables in your dataset is an interesting question in its own right. I see many researchers who treat verifying normality as a checkbox to tick. I prefer to see examining the distribution of variables as a chance to learn something about my data. Micceri once likened the normal curve to a unicorn: he essentially showed that normality is not so 'normal' in psychology. For example, from my experience: average reaction times are often positively skewed; Agreeableness scores on IPIP personality measures are often negatively skewed; clinical measures like depression are typically positively skewed in non-clinical populations. All these findings say something about the measures and about the phenomena of interest.

I often meet researchers who spend a lot of time thinking about normality and how they can find the right transformation to achieve normality. The following is some general advice:
  • Follow the golden rule of statistics: Try the analyses both ways (i.e., before and after transformation, or using parametric and nonparametric approaches) and see whether your substantive findings differ. If they do not, then whichever option you choose, your conclusions will be the same (the first sketch after this list illustrates this).
  • Consider other measures of central tendency and spread: When you have non-normal data, there is often a broader question of whether the mean and standard deviation are appropriate summary measures of central tendency and spread. For example, in highly skewed distributions the median may be a better reflection of what is typical (see the second sketch after this list).
  • The actual assumptions of the model: Some researchers I speak to ask whether the fact that one of their variables is not normal is a problem. It is important to be clear about how it might be a concern. It is usually not the normality of the variable itself that matters; normality matters only in relation to the assumptions of particular statistical techniques. For example, in multiple regression the assumption is that the residuals are normally distributed, not that the dependent variable itself is normally distributed (the third sketch after this list demonstrates this).
  • Bootstrapping: Often the concern about non-normality is that the p-values and confidence intervals of the model will not be accurate. Bootstrapping provides a useful tool for estimating p-values and confidence intervals when standard distributional assumptions are not satisfied (the fourth sketch after this list shows the basic idea).
  • Statistical tests of normality: Several books recommend using statistical tests to assess normality in your data, such as the Kolmogorov-Smirnov test, or decision rules such as checking whether skew divided by its standard error (or kurtosis divided by its standard error) exceeds about 2.5 or 3 in absolute value. While there is nothing wrong with these tests per se, they are problematic as a component in a decision rule. From a pragmatic researcher's perspective, the issue is usually not whether normality holds exactly in the population; the issue is whether normality is a reasonable enough assumption that the results of the model (regression coefficients, confidence intervals, p-values, etc.) are sufficiently accurate. Thus, the question is not whether normality is satisfied, but the degree to which it is not satisfied. It is for the same reason that effect size estimates with confidence intervals are more useful than p-values. As samples get larger, you get more power, and therefore small departures from normality in the population are more likely to be detected by such tests. I recommend learning to interpret plots of distributions, and thinking more about the absolute size of the skewness and kurtosis values (the final sketch after this list illustrates the sample-size problem).
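
To make the 'try it both ways' advice concrete, here is a minimal sketch in Python (the language, the simulated reaction-time data, and the group labels are my own choices for illustration): the same group comparison is run on raw scores, on log-transformed scores, and with a nonparametric test, so the substantive conclusions can be compared.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated positively skewed reaction times (seconds) for two groups
group_a = rng.lognormal(mean=-0.6, sigma=0.4, size=80)
group_b = rng.lognormal(mean=-0.5, sigma=0.4, size=80)

# 1. Parametric test on the raw scores
t_raw = stats.ttest_ind(group_a, group_b)

# 2. Parametric test after a log transformation
t_log = stats.ttest_ind(np.log(group_a), np.log(group_b))

# 3. Nonparametric alternative on the raw scores
mw = stats.mannwhitneyu(group_a, group_b)

print(f"raw t-test:     p = {t_raw.pvalue:.3f}")
print(f"log t-test:     p = {t_log.pvalue:.3f}")
print(f"Mann-Whitney U: p = {mw.pvalue:.3f}")
# If all three approaches lead to the same substantive conclusion,
# the choice of transformation or test is unlikely to matter here.
```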
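A small illustration of the point about summary measures, again using simulated skewed data (the variable is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
# Positively skewed variable, e.g., number of days absent in a year
x = rng.gamma(shape=1.5, scale=4.0, size=500)

print(f"mean = {x.mean():.2f}, sd = {x.std(ddof=1):.2f}")
print(f"median = {np.median(x):.2f}, "
      f"IQR = {np.percentile(x, 75) - np.percentile(x, 25):.2f}")
# In a distribution like this the mean sits noticeably above the
# median, so the median is often a better summary of a 'typical' case.
```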
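The point about model assumptions can be checked directly. In the hypothetical sketch below (using statsmodels on simulated data), the dependent variable is clearly skewed, yet the regression residuals are roughly symmetric, which is what the regression model actually assumes.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)

# A skewed predictor can produce a skewed dependent variable even
# though the regression errors themselves are perfectly normal.
x = rng.exponential(scale=2.0, size=300)        # skewed predictor
y = 1.0 + 0.8 * x + rng.normal(0, 1, size=300)  # normal errors

model = sm.OLS(y, sm.add_constant(x)).fit()

print(f"skew of y:         {stats.skew(y):.2f}")
print(f"skew of residuals: {stats.skew(model.resid):.2f}")
# y is noticeably skewed, but the residuals are roughly symmetric.
```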
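As a sketch of the bootstrapping idea, the following hand-rolled percentile bootstrap computes a confidence interval for a mean from simulated skewed scores; dedicated routines (e.g., scipy.stats.bootstrap) do the same job more carefully.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated positively skewed scores, e.g., a depression scale in a
# non-clinical sample
x = rng.gamma(shape=1.2, scale=3.0, size=120)

# Percentile bootstrap confidence interval for the mean
n_boot = 10_000
boot_means = np.array([
    rng.choice(x, size=x.size, replace=True).mean()
    for _ in range(n_boot)
])
lower, upper = np.percentile(boot_means, [2.5, 97.5])

print(f"sample mean = {x.mean():.2f}")
print(f"95% bootstrap CI = [{lower:.2f}, {upper:.2f}]")
# The interval is built from resampling the observed data rather than
# from a normality assumption.
```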
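Finally, to show why significance tests of normality become less informative as samples grow, the sketch below draws two samples of different sizes from the same mildly skewed population and applies a Shapiro-Wilk test (used here in place of Kolmogorov-Smirnov) along with the skew / standard-error-of-skew ratio; the departure from normality is identical, but only the large sample is likely to be flagged.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def normality_summary(x):
    """Shapiro-Wilk p-value plus the skew / SE-of-skew ratio."""
    n = x.size
    se_skew = np.sqrt(6.0 / n)   # common approximation for SE of skew
    ratio = stats.skew(x) / se_skew
    p = stats.shapiro(x).pvalue
    return p, ratio

# The same, mildly skewed population at two sample sizes
small = rng.gamma(shape=20, scale=1.0, size=50)
large = rng.gamma(shape=20, scale=1.0, size=2000)

for label, x in [("n = 50", small), ("n = 2000", large)]:
    p, ratio = normality_summary(x)
    print(f"{label}: Shapiro-Wilk p = {p:.3f}, skew/SE ratio = {ratio:.1f}")
# The population shape is identical in both cases, yet the large sample
# is far more likely to be flagged as 'non-normal'; the practical
# question is how big the departure is, not whether p < .05.
```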
For further discussion of normality and transformation: