Jeromy Anglim's Blog: Psychology and Statistics

Friday, February 26, 2010

Clustered Samples and Assuming Independence of Observations

I sometimes speak to researchers who have a design where units are nested within clusters (e.g., 200 employees nested within 50 stores). While this is often called cluster sampling, the designs that this post addresses usually arise from convenience rather than from a rigorous sampling plan. At some point, the researchers discover that this clustering has implications for the assumption of independence of observations, which in turn has implications for the validity of standard statistical techniques that assume independence, such as t-tests and regressions. This post discusses what to do in such a situation and when, if ever, it is appropriate to ignore the clustered nature of the sampling.
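
To give a rough sense of why clustering matters, here is a small worked example using the standard design effect approximation for equal-sized clusters. It is only a sketch: the intraclass correlation of 0.10 is a made-up value for illustration, and this formula is not necessarily the approach taken in the full post.

```latex
% Design effect for equal cluster sizes, where
%   m    = number of units per cluster
%   \rho = intraclass correlation (ICC); 0.10 is an assumed value
\[
  \mathrm{DEFF} = 1 + (m - 1)\rho
\]
% Example from the text: 200 employees in 50 stores, so m = 200 / 50 = 4
\[
  \mathrm{DEFF} = 1 + (4 - 1)(0.10) = 1.30,
  \qquad
  n_{\mathrm{eff}} = \frac{n}{\mathrm{DEFF}} = \frac{200}{1.30} \approx 154
\]
```

Under these assumptions, the 200 clustered observations carry about as much information as 154 independent ones, so analyses that ignore the clustering would tend to understate standard errors.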

Tuesday, February 23, 2010

Getting Started with Sweave: R, LaTeX, Eclipse, StatET, & TeXlipse

Being able to press a single button that runs all your statistical analyses and integrates the output into your final report is a beautiful thing. If you have not already heard, this is what Sweave can do for you. However, getting your computer to run Sweave can be a little fiddly. Thus, this post: (1) sets out the benefits of Sweave; (2) explains how to install and configure R, Sweave, and Eclipse on Windows; (3) lists resources for people wanting to learn more about how to use LaTeX and Sweave; and (4) lists some specific resources relevant to researchers in psychology wanting to use these tools.

Thursday, February 18, 2010

Analysis of Winter Olympic Medal Data Using R

The Winter Olympics are on. The Guardian's DataBlog has graciously compiled a database on Winter Olympic Medals. Thus, I thought I'd run a few quick analyses on the data in R. In this post I aim to show how one can quickly churn out some basic analyses (and answer some interesting questions) using R.

Tuesday, February 16, 2010

A Case Study in Optimising Code in R

This post presents an experience I had optimising the efficiency of code for a data analysis task in R. I'm not an expert in programming or code optimisation. However, I thought my experience might make an interesting case study for others at a similar level in their R programming development. The post sets out: (1) the context of the problem; (2) the strategies and tools in R that I used to diagnose the problem and optimise the code; (3) some lessons learnt from the experience; and (4) some links to additional resources.