Jeromy Anglim's Blog: Psychology and Statistics


Friday, October 30, 2009

How to Reason about Causes in Psychology | When does Correlation Co-occur with Causation?

This post is on causal inference. Errors in causal inference in the social and behavioural sciences are prevalent both in the scientific literature and in the media. This post (a) suggests articles on causality to read, particularly for researchers in psychology; (b) discusses and critiques material on the Internet on causality; and (c) provides further links to material on causality.


INTRODUCTION
I have casually observed students, researchers, and lay people alike evolve in their understanding of causal inference. At level 1, association is assumed to be causal. For example, a negative correlation between IQ and smacking is seen as evidence of the causal effect of smacking. At level 2, it is believed that correlation says nothing about causation. This is a little bit like a child who says the word "brang" instead of "brought": a rule has been learned, but it is applied too broadly. The idea that correlation does not imply causation may keep someone out of trouble most of the time, but it does not help much in trying to determine what does imply causation. Another incorrect idea at this level is that a statistical correlation does not mean causation, but that ANOVA or SEM or some other statistical method does mean causation (WRONG!). Level 3 involves having a more nuanced understanding of causality. At the risk of preaching to the converted, this post presents ways of acquiring this more sophisticated understanding of causality.

IMPORTANT PERSPECTIVES
APA Statistical Task Force (1999, Statistical Methods in Psychology Journals: Guidelines and Explanations): In addition to the quoted section below, there are relevant sections on random and nonrandom assignment.
"Causality. Inferring causality from nonrandomized designs is a risky enterprise. Researchers using nonrandomized designs have an extra obligation to explain the logic behind covariates included in their designs and to alert the reader to plausible rival hypotheses that might explain their results. Even in randomized experiments, attributing causal effects to any one aspect of the treatment condition requires support from additional experimentation." APA Task Force
The article then goes on to elaborate on these principles. It recommends setting up the problem of causal inference in terms of missing data, following the approach of Rubin (1976, Inference and Missing Data) (i.e., Rubin's Causal Model). This article is essential reading for any researcher in psychology, both for what it says about causality and for what it says about other aspects of conducting and reporting research.
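
To make the "causal inference as missing data" idea concrete, here is a minimal simulation sketch. It is my own illustration, not code from the Task Force article or from Rubin. Every person is given two potential outcomes, only one of which would ever be observed in practice. The function names, the sample size, and the constant effect of 0.5 are all hypothetical choices made for the example.

```python
import random
from statistics import mean

def make_units(n=20000, effect=0.5, seed=1):
    """Give each unit two potential outcomes (a hypothetical setup).

    y0 is the outcome without treatment, y1 the outcome with treatment.
    In real data only one of the two is ever observed per unit.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(0, 1)
        y1 = y0 + effect  # assumed constant treatment effect
        units.append((y0, y1))
    return units, rng

def difference_in_means(units, treated_flags):
    """Compare observed outcomes: y1 for treated units, y0 for the rest."""
    treated = [y1 for (y0, y1), t in zip(units, treated_flags) if t]
    control = [y0 for (y0, y1), t in zip(units, treated_flags) if not t]
    return mean(treated) - mean(control)

units, rng = make_units()

# Only knowable inside a simulation, because both outcomes are visible here.
true_effect = mean(y1 - y0 for y0, y1 in units)

# Random assignment: treatment is unrelated to the potential outcomes.
random_assignment = [rng.random() < 0.5 for _ in units]

# Self-selection: units with a high untreated outcome opt in to treatment.
self_selection = [y0 > 0 for y0, _ in units]

randomized_estimate = difference_in_means(units, random_assignment)
self_selected_estimate = difference_in_means(units, self_selection)

print(f"true effect:            {true_effect:.2f}")
print(f"randomized estimate:    {randomized_estimate:.2f}")
print(f"self-selected estimate: {self_selected_estimate:.2f}")
```

Under these assumptions the randomized comparison should land close to the true effect, while the self-selected comparison should be far too large, because the missing potential outcomes are no longer missing at random.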


Wright (2006, Causal and Associative Hypotheses in Psychology): An excellent and accessible article for students and researchers in psychology wanting to learn about causal inference. I'd recommend this as a starting point.

Holland (1986, Statistics and Causal Inference): This article describes Rubin's Causal Model in a relatively accessible form.

Pearl (2009, Causal Inference in Statistics)

ADDITIONAL PERSPECTIVES



WEB RESOURCES: DESCRIPTION AND CRITIQUE
The following discusses some sites that come up on Google when you search for "correlation is not causation" (October, 2009).

Statistics-help-online: DESCRIPTION: The site provides examples where our intuition tells us that an observed correlation is or is not causation. It suggests that an experiment or a clear explanation of a process may be sufficient to infer that a correlation is causal in a particular direction. CRITIQUE: (1) The site does not set out principles in much detail in order to know whether a correlation does mean causation. (2) It somewhat implies in its discussion of the smoking example that public consensus is sufficient to accept a causal claim.

Cambridge 2000 memos: DESCRIPTION: The site mentions that precise mathematical definitions are available, but for accessibility purposes it adopts dictionary definitions of correlation and causation. The site gives several interesting examples of correlations and shows how these correlations do not necessarily mean causation. The site then goes on to give alternative interpretations which it finds more reasonable. CRITIQUE: (1) The site provides nice examples and does a good job of highlighting how incorrect inferences about causation from correlation can be used for political purposes. (2) While the alternative explanations provided on the site for the correlations do seem more persuasive, they are also based on intuition and a basic theory of how the world works. Thus, the site does not get to the heart of the issue of how to infer whether a correlation is causal and, if it is not, how to infer the reason.

Information Works! DESCRIPTION: The site notes some of the alternative explanations for an association (coincidence, third variables, partial causation, mutual change over time). The site suggests that causation is likely when (a) there is a reasonable causal explanation; (b) the connection exists under varying conditions; and (c) confounding variables are ruled out. The site then asserts that controlled experiments involving manipulation of a variable are the best way to infer causal effects. The site presents causal inference in correlational settings as the combination of correlation and additional knowledge about the processes operating in the domain. CRITIQUE: (1) Coincidence can explain some correlations. That's what null hypothesis testing is all about. But if the sample size is large, as is the case with many correlations in psychology (and it helps if the correlation is of at least moderate size), coincidence can be reasonably ruled out. (2) The post does not answer the question of what is a reasonable causal explanation. (3) If a correlation exists under varying conditions, it suggests that the correlation generalises. It still says nothing about whether there is a causal effect and, if one exists, in what direction it operates. (4) The post does not discuss how to rule out confounding variables. In practice this is quite difficult. Does inclusion of a confounding variable in a regression equation suffice? My answer is "probably not".
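
The third-variable explanation and the case for manipulation can be illustrated with a small simulation. This is only a sketch under assumptions I have made up for the purpose: a single unmeasured variable z drives both x and y, x has no effect on y at all, and all variables are standard normal with unit noise. The helper pearson and the variable names are hypothetical.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(2009)
n = 20000

# z is the third variable. y depends on z only; x never enters the equation.
z = [rng.gauss(0, 1) for _ in range(n)]
y = [zi + rng.gauss(0, 1) for zi in z]

# Observational setting: x is also driven by z, so x and y move together.
x_observed = [zi + rng.gauss(0, 1) for zi in z]

# Experimental setting: the researcher sets x at random, ignoring z.
x_assigned = [rng.gauss(0, 1) for _ in range(n)]

r_observed = pearson(x_observed, y)
r_assigned = pearson(x_assigned, y)

print(f"correlation when x is merely observed:  {r_observed:.2f}")
print(f"correlation when x is randomly assigned: {r_assigned:.2f}")
```

With a sample this large, coincidence is not a plausible explanation for the observed correlation, yet the correlation still reflects no causal effect of x on y. Once x is assigned at random the association should vanish, which is the sense in which manipulation rules out third variables that are never measured.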

Jonathon Mueller: DESCRIPTION: The author provides a large set of links on headlines to give students practice in evaluating causal claims in the media.

Stats and George Mason: DESCRIPTION: The site claims "If one action causes another, then they are most certainly correlated." CRITIQUE: It depends what is meant by this. I can think of examples where there is causation but no correlation. For example, perhaps instructions not to fake a personality test make some people more honest and some people more dishonest. The effects could cancel out: No correlation, but a causal effect.
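
The cancelling-out example can be sketched as a simulation. The setup is hypothetical: half the sample becomes more honest after the instruction and half becomes less honest, with effects of equal size. The variable names, effect size, and noise level are illustrative assumptions, not estimates from any study.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(30)
n = 20000

instruction = []  # 1 = told not to fake, 0 = no instruction
direction = []    # +1 = instruction raises honesty, -1 = lowers it
honesty = []
for _ in range(n):
    x = rng.choice([0, 1])
    d = rng.choice([-1, 1])
    # The instruction changes honesty for everyone, but in opposite directions.
    y = d * 1.0 * x + rng.gauss(0, 1)
    instruction.append(x)
    direction.append(d)
    honesty.append(y)

def correlation_within(group):
    """Correlation between instruction and honesty within one subgroup."""
    xs = [x for x, d in zip(instruction, direction) if d == group]
    ys = [y for y, d in zip(honesty, direction) if d == group]
    return pearson(xs, ys)

r_overall = pearson(instruction, honesty)
r_more_honest = correlation_within(+1)
r_less_honest = correlation_within(-1)

print(f"overall:              {r_overall:.2f}")
print(f"more-honest subgroup: {r_more_honest:.2f}")
print(f"less-honest subgroup: {r_less_honest:.2f}")
```

In this setup the overall correlation should sit near zero even though the instruction affects every individual, while the correlation within each subgroup should be clearly positive or clearly negative. Whether a correlation "exists" therefore depends on whether the moderating variable is known and conditioned on.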




Other posts on causation

1 comment:

  1. Two points, first we can think of correlation as necessary, but not sufficient to signify causation. Second, related to your example of causation without correlation, there is correlation, but you've simply made the problem conditional on "reactivity" or whatever factor compels people to not follow directions, e.g.,

    cor(x,y)

    vs.

    cor(x,y|R)

    Where R is "Reactivity" or some such stand in.

    (I enjoy the posts and resources you pass along, by the way!)

    Scot
