Jeromy Anglim's Blog: Psychology and Statistics

Friday, February 26, 2010

Clustered Samples and Assuming Independence of Observations

I sometimes speak to researchers who have a design where units are nested within clusters (e.g., 200 employees nested within 50 stores). While this is often called cluster sampling, the research that this post addresses is often more about convenience than about following a rigorous sampling plan. At some point, the researchers discover that this clustering has implications for the assumption of independence of observations, which in turn has implications for the validity of standard statistical techniques, such as t-tests and regressions, that assume independence. This post discusses what to do in such a situation and when, if ever, it is appropriate to ignore the clustered nature of the sampling.

The Context:
Common examples that I encounter include:
  • employees nested within stores,
  • students nested within schools,
  • students nested within class rooms, and
  • participants nested within geographic regions.
The typical examples that I see are largely convenience samples at both the cluster level and the participant level. The clusters are not necessarily a random sample of possible clusters, and the samples within clusters are not necessarily random either. For example, a researcher may have collected data from 300 participants across 80 stores, and the distribution of employees per store is skewed such that some stores provide only one or two participants while other stores provide 15 to 20. Such a sampling design is not trying to be a representative sample of a defined population, as might be encountered in a national poll. Rather, these social science studies use the limited pool of available participants to maximise sample size for analyses.

The Problem:
The main issue is that statistics that assume independence of observations will yield standard errors that are smaller than they should be. Thus, the researcher is more likely to conclude that an effect (e.g., a difference between group means; a correlation; a regression coefficient) is statistically significant, even when no effect is present in the population; in other words, the actual Type I error rate exceeds the nominal alpha level.
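To see why clustering shrinks naive standard errors, here is a small simulation sketch. It is in Python purely for illustration (the post's own examples use SPSS and R), and the function name `simulate_clustered_means` and all parameter values are my own. It repeatedly draws a clustered sample and compares the empirical standard error of the sample mean against the naive independence-based formula; the inflation is captured by the design effect, DEFF = 1 + (m - 1) * ICC, where m is the cluster size.

```python
import random
import statistics

def simulate_clustered_means(n_clusters=50, per_cluster=10, icc=0.2,
                             reps=2000, seed=1):
    """Repeatedly draw a clustered sample and record the sample mean.

    Each observation is cluster_effect + individual_error, with the two
    variances chosen so that total variance is 1 and the intraclass
    correlation equals `icc`.
    """
    rng = random.Random(seed)
    var_cluster = icc          # between-cluster variance
    var_error = 1.0 - icc      # within-cluster variance
    means = []
    for _ in range(reps):
        values = []
        for _ in range(n_clusters):
            u = rng.gauss(0.0, var_cluster ** 0.5)   # shared cluster effect
            values.extend(u + rng.gauss(0.0, var_error ** 0.5)
                          for _ in range(per_cluster))
        means.append(statistics.fmean(values))
    return means

n_clusters, per_cluster, icc = 50, 10, 0.2
means = simulate_clustered_means(n_clusters, per_cluster, icc)
empirical_se = statistics.stdev(means)

# Naive SE assumes independence: sqrt(total variance / N), with variance 1.
n_total = n_clusters * per_cluster
naive_se = (1.0 / n_total) ** 0.5

# Design effect: how much the true sampling variance is inflated.
deff = 1 + (per_cluster - 1) * icc
print(f"naive SE (independence) : {naive_se:.4f}")
print(f"empirical SE (clustered): {empirical_se:.4f}")
print(f"naive SE * sqrt(DEFF)   : {naive_se * deff ** 0.5:.4f}")
```

With these (hypothetical) settings the clustered standard error comes out noticeably larger than the naive one, which is exactly the gap that inflates significance rates when clustering is ignored.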

However, introducing a multilevel modelling framework or some other standard-error adjustment procedure often introduces greater complexity into the modelling task. In addition, having a large number of clusters, some of which contain only a few cases, may introduce estimation issues. Thus, researchers sometimes want to justify the use of standard techniques that assume independence.

General Discussion:
The following three pages provide useful information on the topic.

A Few Thoughts on How to Deal with This Situation:
1. Think about the Cluster Effect:  Are there reasons to expect participants to be more similar within clusters on the outcome variable? One way to approach this is to think about what correlates with the outcome variables and think about whether the clusters are likely to differ in their mean levels on these predictors. For example, if all stores tend to be fairly similar, the participants do not interact, and the outcome variable has little to do with geography or the workplace itself, then the effect of store may be close to zero. 

2. Estimate the Cluster Effect: Assess the extent to which cluster membership explains variance in the outcome variable. Most assessments are based on first estimating some form of intraclass correlation (ICC). SPSS has the MIXED procedure. R has the multilevel package and the psychometric package, both of which use the nlme package. The typical procedure involves running a model with cluster as a random effect on the outcome variable, potentially with additional predictors in the model. 
ICC = var(cluster) / [var(cluster) + var(error)] (see Wikipedia or UCLA or Hedges and Hedberg for details)
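To make the formula above concrete, here is a minimal sketch of the classic one-way ANOVA estimator of the ICC, written in Python for illustration (the post itself points to SPSS MIXED and R's nlme-based packages, which estimate the variance components by maximum likelihood instead; the function name `icc1` is my own, and the estimator shown assumes a balanced design with equal cluster sizes).

```python
import statistics

def icc1(groups):
    """ANOVA-based ICC(1) for a balanced design.

    `groups` is a list of equal-sized lists of scores, one per cluster.
    Estimates ICC = var(cluster) / [var(cluster) + var(error)] from the
    between-cluster and within-cluster mean squares.
    """
    k = len(groups)        # number of clusters
    n = len(groups[0])     # observations per cluster
    grand = statistics.fmean(x for g in groups for x in g)
    group_means = [statistics.fmean(g) for g in groups]
    # Mean square between clusters
    msb = n * sum((m - grand) ** 2 for m in group_means) / (k - 1)
    # Mean square within clusters
    msw = sum((x - m) ** 2
              for g, m in zip(groups, group_means)
              for x in g) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)

# Toy data: three "stores", three employees each. The store means differ
# substantially, so the estimated ICC should be large.
stores = [[4, 5, 6], [1, 2, 3], [7, 8, 9]]
print(round(icc1(stores), 3))
```

In real data with skewed cluster sizes, as described earlier, a mixed-model estimate of the variance components is preferable to this balanced-design shortcut.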

3. Adopt a Procedure: 
If researchers choose to ignore the clustering, they need to make a strong argument from theory and from the data that doing so is appropriate. This argument should include some or all of the following points, where applicable:

  1. The ICC is close to zero. It should be noted that the UCLA post suggests that ICCs as low as .01 are still likely to bias standard errors.
  2. Theory suggests that there is no reason to expect clusters to affect the dependent variable.
  3. Prior research suggests little to no clustering effect.
  4. The intended audience is less likely to comprehend the more sophisticated techniques.
  5. It is standard in the literature to ignore the clustering.
  6. p values are sufficiently small that results would be robust anyway.
  7. The purpose of the analyses is exploratory.
  8. The number of participants per cluster is small.
  9. A multilevel model or some other more sophisticated procedure was tried and yielded the same substantive results.
  10. A multilevel model or some other more sophisticated procedure was tried and could not be run due to estimation issues.
The most important element of this argument is that the ICC is close to zero. However, even with all this, a reviewer may still not accept the argument, and may expect a more sophisticated approach to be adopted.

If the ICC does suggest that cluster explains variance in the outcome variable, then the above links (under General Discussion) suggest ways of modelling the data (e.g., multilevel modelling, adjusting standard errors, etc.).
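As one concrete flavour of "adjusting standard errors," a simple back-of-the-envelope correction (my own sketch, not a recommendation from the linked pages, and the function name `adjusted_se` is hypothetical) is to inflate the independence-based standard error by the square root of the design effect:

```python
def adjusted_se(naive_se, avg_cluster_size, icc):
    """Inflate an independence-based standard error by sqrt(DEFF),
    where DEFF = 1 + (m - 1) * ICC and m is the average cluster size.

    This is a rough approximation; a multilevel model or proper
    cluster-robust standard errors are preferable when feasible.
    """
    deff = 1 + (avg_cluster_size - 1) * icc
    return naive_se * deff ** 0.5

# e.g., a naive SE of 0.05 with 10 employees per store and ICC = .10
print(adjusted_se(0.05, 10, 0.10))
```

Note that when the cluster size is 1 or the ICC is 0, the design effect is 1 and the standard error is unchanged, which matches the intuition that the problem disappears as the clustering effect vanishes.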


Comments:

  1. "It is standard in the literature to ignore the clustering"

    Be careful distinguishing between "a strong argument...that it is appropriate" and "convincing reviewers." Literature standards can just as easily be based on poor analytic techniques, and while saying "well, everyone else does it!" may get your piece published, it does no one any good in the long run.

  2. Hi Richard,
    Great comment. I considered not including the point about standards in the literature for the reasons that you outline.

    "Everybody does it" can be shorthand for justifiable reasons (e.g., ICCs tend to be low in the field), or it can represent a form of superficial analysis that perpetuates poor statistical practice.