Jeromy Anglim's Blog: Psychology and Statistics

Saturday, October 24, 2009

Data Mining and R

This post lists a few data mining resources in R. I also provide a few observations on the distinction between data mining, data analysis, and statistics as it pertains to the analysis work that I do in psychology.

Online Resources
Some Casual Observations
  • Data mining seems more concerned with prediction using observed variables than with understanding the causal system of latent variables; psychology is typically more concerned with the latter.
  • Data mining typically involves massive datasets (e.g., 10,000+ rows) collected for a purpose other than the data mining itself. Psychological datasets are typically small (e.g., fewer than 1,000 or even 100 rows) and collected explicitly to explore a research question.
  • Psychological analysis typically involves testing specific models, so automated model-building approaches tend not to be theoretically interesting.


  1. Thanks Jeromy, I have been wondering for a while what the difference between data mining and data analysis was. More importantly, you helped clarify how interested I should be in it as a psychologist (in training ;P).



  2. An R Reference Card for Data Mining is available as a free download. It lists many useful R functions and packages for data mining applications.