Jeromy Anglim's Blog: Psychology and Statistics


Friday, March 26, 2010

Causal Inference from Aggregate Data: The Facebook Syphilis Case Study

The Age has an article on the reported connection between using Facebook and syphilis that is doing the rounds on the Internet. The idea that Facebook facilitates social connections seems plausible, and Facebook may well play a role in facilitating risky sexual encounters. However, the evidence offered for the connection (at least as presented in the news reports) is poor, and it provides a good case study in how NOT to reason from data. The following quote seems to capture the gist of the claim:
"Data published by several British newspapers this week indicated that cases of syphilis had increased fourfold in Sunderland, Durham and Teesside - the areas of Britain where Facebook is most popular." Source: The Age

Critical Points

I just wanted to make three quick points, less about the specifics of this case and more about things to consider when drawing inferences from data.

1. What about random sampling?

While I haven't seen the original analyses, it is possible that the association between county syphilis levels and Facebook use is NO MORE than you would expect from random sampling variation. I wonder whether there is a complete set of data showing the correlation between syphilis levels and Facebook usage across all counties. Showing that a few cases that are high on one variable are also high on another is inadequate.
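To make this concrete, here is a minimal simulation sketch in Python. It does not use the real data: the number of regions (100), the number of "top" regions picked out (3), and the normal distributions are all assumptions chosen for illustration, and the function name is hypothetical. The two variables are generated independently, so any pattern is pure chance. The sketch counts how often the three regions with the highest simulated Facebook use all happen to sit above the median on simulated syphilis growth.

```python
import random
import statistics


def spurious_pattern_rate(n_regions=100, top_k=3, n_runs=2000, seed=1):
    """Share of simulated datasets in which the top_k regions on one
    variable are all above the median on a second, unrelated variable."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        # Two independent variables: by construction there is no association.
        facebook = [rng.gauss(0, 1) for _ in range(n_regions)]
        syphilis = [rng.gauss(0, 1) for _ in range(n_regions)]
        median_syphilis = statistics.median(syphilis)
        # Indices of the regions with the highest simulated Facebook use.
        top = sorted(range(n_regions), key=lambda i: facebook[i], reverse=True)[:top_k]
        if all(syphilis[i] > median_syphilis for i in top):
            hits += 1
    return hits / n_runs


if __name__ == "__main__":
    print(spurious_pattern_rate())
```

Under these assumptions the pattern should turn up in roughly one run in eight, which is far too often for a handful of matching cases to count as evidence. A correlation computed over the full set of regions would be much more informative.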

2. Correlation is not causation

Several people have noted that the claim confuses correlation with causation (see my earlier post on the reported link between spanking and IQ). While it is possible that increasing Facebook use in a community causes its syphilis levels to rise, a common cause seems more plausible to me. For example, communities with certain social and demographic features may be both more likely to use Facebook and more likely to engage in lifestyles conducive to contracting syphilis.
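The common-cause idea can be sketched with simulated data. In the Python sketch below, a single hypothetical community-level factor (called `demographics` here, an assumed stand-in for whatever social and demographic features matter) drives both simulated Facebook use and simulated syphilis levels, and there is no direct effect of one on the other. All effect sizes, variable names, and function names are assumptions for illustration only. The sketch compares the raw correlation with the partial correlation after controlling for the common cause.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate_common_cause(n=5000, seed=1):
    rng = random.Random(seed)
    # Hypothetical common cause, e.g. a composite of demographic features.
    demographics = [rng.gauss(0, 1) for _ in range(n)]
    # Each outcome depends on the common cause plus its own noise;
    # neither outcome depends on the other.
    facebook = [d + rng.gauss(0, 1) for d in demographics]
    syphilis = [d + rng.gauss(0, 1) for d in demographics]
    raw = pearson(facebook, syphilis)
    controlled = partial_corr(facebook, syphilis, demographics)
    return raw, controlled


if __name__ == "__main__":
    raw, controlled = simulate_common_cause()
    print(f"raw r = {raw:.2f}, partial r = {controlled:.2f}")
```

With these assumed effect sizes the raw correlation should be substantial while the partial correlation should sit near zero, even though Facebook use has no causal effect on syphilis anywhere in the simulation.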

3. Beware of the ecological fallacy

The ecological fallacy involves drawing inferences about individuals from group-level data. Thus, just because counties with more Facebook users have more syphilis cases (group level) does not mean that people who use Facebook are more likely to have syphilis (individual level). Even if the same association holds at the individual level, in my experience it tends to be much weaker.
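A small simulation can show how a strong group-level correlation can coexist with a weak individual-level one. In the Python sketch below, each simulated region has a shared regional effect that nudges both variables, while most of the variation between individuals is unrelated noise. The numbers (50 regions, 200 people per region, noise standard deviation of 3) and the continuous "Facebook use" and "risk" scores are assumptions for illustration, not estimates from real data, and the function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_ecological(n_regions=50, n_people=200, seed=1):
    rng = random.Random(seed)
    people_fb, people_risk = [], []
    region_fb, region_risk = [], []
    for _ in range(n_regions):
        # Shared regional effect that shifts both variables for everyone there.
        region_effect = rng.gauss(0, 1)
        # Individual scores: regional effect plus large, independent noise.
        fb = [region_effect + rng.gauss(0, 3) for _ in range(n_people)]
        risk = [region_effect + rng.gauss(0, 3) for _ in range(n_people)]
        people_fb.extend(fb)
        people_risk.extend(risk)
        # Averaging within a region cancels most of the individual noise.
        region_fb.append(sum(fb) / n_people)
        region_risk.append(sum(risk) / n_people)
    group_r = pearson(region_fb, region_risk)
    individual_r = pearson(people_fb, people_risk)
    return group_r, individual_r


if __name__ == "__main__":
    group_r, individual_r = simulate_ecological()
    print(f"region-level r = {group_r:.2f}, individual-level r = {individual_r:.2f}")
```

Because averaging within regions removes most of the individual noise, the region-level correlation in this setup should be much stronger than the correlation computed across individuals. That is why an association seen in aggregate data says little about how strongly the two variables are related from person to person.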

Update

Since posting this, I've noticed a few other bloggers commenting on the topic.