The Context:
Common examples of clustered data that I encounter include:
- employees nested within stores,
- students nested within schools,
- students nested within classrooms, and
- participants nested within geographic regions.
The typical examples that I see are largely convenience samples at both the cluster level and the participant level. The clusters are not necessarily a random sample of possible clusters, and the participants within a cluster are not necessarily a random sample from that cluster. For example, a researcher may have collected data from 300 participants from 80 stores. The distribution of employees per store is skewed such that some stores provide only one or two participants and other stores provide 15 to 20 participants. Such a sampling design is not trying to obtain a representative sample of a defined population, as might be encountered in a national poll. Rather, these social science studies use the limited pool of available participants to maximise sample size for analyses.
The Problem:
The main issue is that statistics that assume independence of observations will yield standard errors that are smaller than they should be. Thus, the researcher is more likely to conclude that an effect (e.g., a difference between group means; a correlation; a regression; etc.) is statistically significant regardless of whether an effect is actually present in the population.
However, introducing a multilevel modelling framework or some other standard error adjustment procedure often introduces greater complexity into the modelling task. In addition, having a large number of clusters some of which only have a few cases may introduce estimation issues. Thus, researchers sometimes want to justify the use of standard techniques that assume independence.
General Discussion:
The following three pages provide useful information on the topic.
- UCLA discusses the issue on a page called Analyzing correlated data. The page discusses options for adjusting standard errors. It also shows the effect of sample size and intraclass correlation (ICC) on p values.
- Hedges and Hedberg (2007, Intraclass correlation for planning Group Randomized Experiments) discuss the implications of cluster sampling in experiments with and without pretest scores.
- Gene Shackman discusses how to calculate the Design Effect from the ICC and the average size of groups.
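As a rough illustration of the design effect calculation mentioned in the last item, here is a minimal sketch. It assumes the usual approximation DEFF = 1 + (m - 1) * ICC, where m is the average cluster size; with very unequal cluster sizes this is only an approximation. The function names are my own, and the ICC of .05 is a made-up value for illustration, not an estimate from any data.

```python
import math


def design_effect(icc, avg_cluster_size):
    """Approximate design effect: DEFF = 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n, icc, avg_cluster_size):
    """Sample size of independent observations carrying similar information."""
    return n / design_effect(icc, avg_cluster_size)


def inflated_standard_error(naive_se, icc, avg_cluster_size):
    """Scale a standard error that assumed independence by sqrt(DEFF)."""
    return naive_se * math.sqrt(design_effect(icc, avg_cluster_size))


# Example from the Context section: 300 participants from 80 stores.
n_participants = 300
n_clusters = 80
avg_size = n_participants / n_clusters  # 3.75 participants per store

hypothetical_icc = 0.05  # assumed value, for illustration only

deff = design_effect(hypothetical_icc, avg_size)  # about 1.14
n_eff = effective_sample_size(n_participants, hypothetical_icc, avg_size)  # about 264

print(f"design effect: {deff:.4f}")
print(f"effective sample size: {n_eff:.1f}")
```

Under the same formula, the impact depends heavily on cluster size: an ICC of .01 gives a design effect of about 1.03 with 3.75 participants per cluster, but about 1.99 with 100 participants per cluster.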
A Few Thoughts on How to Deal with This Situation:
1. Think about the Cluster Effect: Are there reasons to expect participants to be more similar within clusters on the outcome variable? One way to approach this is to think about what correlates with the outcome variable and whether the clusters are likely to differ in their mean levels on these predictors. For example, if all stores tend to be fairly similar, the participants do not interact, and the outcome variable has little to do with geography or the workplace itself, then the effect of store may be close to zero.
2. Estimate the Cluster Effect: Assess the extent to which cluster explains variance in the outcome variable. Most assessments are based on first estimating some form of intraclass correlation (ICC). SPSS has the MIXED procedure. R has the multilevel package and the psychometric package, which both use the nlme package. The typical procedure involves running a model with cluster as a random effect on the outcome variable, potentially with additional predictors in the model.
ICC = var(cluster) / [var(cluster) + var(error)] (see Wikipedia or UCLA or Hedges and Hedberg for details)
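To make the ratio concrete, here is a minimal sketch that estimates the two variance components with a one-way ANOVA estimator rather than the random-effects model described above. This is a swapped-in technique chosen so the example needs only the Python standard library; it is not what SPSS MIXED or nlme do internally, and the function name and toy data are my own. The adjusted cluster size n0 handles unequal cluster sizes.

```python
from collections import defaultdict


def anova_icc(values, clusters):
    """One-way ANOVA estimate of ICC = var(cluster) / [var(cluster) + var(error)].

    Requires at least two clusters and at least one cluster with
    more than one observation. The estimate can be negative when
    clusters differ less than chance alone would produce.
    """
    groups = defaultdict(list)
    for y, c in zip(values, clusters):
        groups[c].append(y)

    k = len(groups)          # number of clusters
    n_total = len(values)    # total number of observations
    grand_mean = sum(values) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for g in groups.values():
        g_mean = sum(g) / len(g)
        ss_between += len(g) * (g_mean - grand_mean) ** 2
        ss_within += sum((y - g_mean) ** 2 for y in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)  # estimate of var(error)

    # Adjusted average cluster size for unbalanced designs.
    n0 = (n_total - sum(len(g) ** 2 for g in groups.values()) / n_total) / (k - 1)

    var_cluster = (ms_between - ms_within) / n0  # estimate of var(cluster)
    return var_cluster / (var_cluster + ms_within)


# Toy data (made up): three stores whose means differ a lot.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
stores = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
print(f"ICC estimate: {anova_icc(scores, stores):.3f}")
```

With many clusters that contribute only one or two participants, as in the store example, such estimates are likely to be imprecise, so the result is better treated as a rough guide than as a precise value.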
3. Adopt a Procedure:
If researchers choose to ignore the clustering, they need to make a strong argument from theory and from the data that it is appropriate. This argument should include some or all of the following points if they are applicable:
- ICC is close to zero. It should be noted that the UCLA post suggests that ICCs equal to .01 are still likely to bias standard errors.
- Theory suggests that there is no reason to expect clusters to affect the dependent variable.
- Prior research suggests little to no clustering effect.
- The intended audience is less likely to comprehend the more sophisticated techniques.
- It is standard in the literature to ignore the clustering.
- The p values are sufficiently small that results would be robust anyway.
- The purpose of the analyses is exploratory.
- The number of participants per cluster is small.
- A multilevel model or some other more sophisticated procedure was tried and yielded the same substantive results.
- A multilevel model or some other more sophisticated procedure was tried and could not be run due to estimation issues.
If the ICC does suggest that cluster explains variance in the outcome variable, then the above links (under General Discussion) suggest ways of modelling the data (e.g., multilevel modelling, adjusting standard errors, etc.).