Great Measurement but Small Sample Size: Case Study of Videotaped Families

The New York Times reports on an interesting UCLA study that involved videotaping 32 Los Angeles families over the course of a week. The study generated rich data for analysis. It's great to see researchers moving beyond self-report measures towards real-world, well-coded behavioural observations. However, great measurement does not overcome the problems of a small sample size.

Overview

In reporting the results of the study, the New York Times states that:
"Mothers still do most of the housework, spending 27 percent of their time on it, on average, compared with 18 percent for fathers and 3 percent for children (giving an allowance made no difference)." (New York Times, 2010)

However, this is relatively poor data for making claims about:

What percentage of housework men and women do
Whether an allowance is effective at increasing housework

Assuming there is a reasonable standard deviation in the percent of time spent doing housework, the confidence intervals around the estimated mean housework for mothers and fathers, and the confidence interval around the difference between them, will be wide. While the difference seems large enough to be statistically significant, a bigger sample would give a more precise estimate of its size.
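To give a rough sense of how interval width depends on sample size, here is a minimal sketch. The standard deviation of 10 percentage points is purely an assumption for illustration (the article does not report one), and the interval uses a normal approximation rather than the t distribution, so it slightly understates the width at small n.

```python
import math
from statistics import NormalDist

# Hypothetical value: the article does not report a standard deviation.
ASSUMED_SD = 10.0  # percentage points of time spent on housework


def ci_half_width(n, sd=ASSUMED_SD, confidence=0.95):
    """Approximate half-width of a confidence interval for a mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / math.sqrt(n)


for n in (32, 128, 512):
    print(f"n = {n:3d}: mean +/- {ci_half_width(n):.2f} percentage points")
```

Because the half-width scales with one over the square root of n, quadrupling the sample only halves the interval.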

Statistical Power for a Difference Between Group Means

There are also issues with the claim that an allowance is not effective at increasing the amount of housework children do. First, the study is observational, making it difficult to assess causal effects. A randomised controlled trial would be more appropriate. However, regardless of this, the sample size issue is even more problematic. Presumably something like an independent groups t-test was done at the family level to see whether children in families that give an allowance did more housework, on average, than children in families that do not.

The statistical power for testing such a difference with a sample size of 32 is very low. Assuming that an allowance has a medium effect (i.e., a half standard deviation increase) on household work, a standard alpha level of .05, and equal numbers of families that did and did not give an allowance (16 in each), statistical power (i.e., the probability of correctly rejecting the null hypothesis of no difference) would be .277 (see Figure 1). That is to say, even if the assumed effect were real, the study had only a 27.7% chance of rejecting the null hypothesis. Or more broadly, the study was more likely to fail to reject the null hypothesis than to reject it.

Figure 1. Screen shot of G*Power 3 showing statistical power of reported study given the above stated assumptions
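For readers without G*Power, the power figure can be roughly checked by simulation. The following is a minimal sketch rather than the calculation G*Power performs: it assumes normally distributed scores with equal variances, uses a hard-coded two-sided critical t value for 30 degrees of freedom, and the function and variable names are my own.

```python
import math
import random

# Two-sided critical t value for alpha = .05 with df = 16 + 16 - 2 = 30.
T_CRIT_DF30 = 2.042


def mean_and_variance(xs):
    """Return the sample mean and the unbiased sample variance."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, var


def simulate_power(n_per_group=16, effect_size=0.5, t_crit=T_CRIT_DF30,
                   n_sims=20000, seed=1):
    """Estimate power of an independent groups t-test by Monte Carlo simulation."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Scores are in standard deviation units; the allowance group is shifted by d.
        no_allowance = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        allowance = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        m0, v0 = mean_and_variance(no_allowance)
        m1, v1 = mean_and_variance(allowance)
        pooled_var = (v0 + v1) / 2  # equal group sizes, so a simple average
        se = math.sqrt(pooled_var * 2 / n_per_group)
        if abs((m1 - m0) / se) > t_crit:
            rejections += 1
    return rejections / n_sims


power = simulate_power()
print(f"Estimated power: {power:.3f}")
```

With enough simulated studies the estimate should land close to the .277 reported above, give or take simulation error.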

In order to have reasonable statistical power (i.e., 80% power) given the above assumptions, a sample size of 128 families (64 in each group) would be required (see Figure 2).

Figure 2. Screen shot of G*Power 3 showing sample size required to have 80% power given above stated assumptions
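The required sample size can also be approximated by hand. The sketch below uses the usual normal-approximation formula plus a commonly used small-sample correction term; it is an approximation under the same assumptions as above (two-sided test, alpha of .05, equal group sizes), not the exact noncentral t calculation that G*Power performs, and the function name is my own.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided independent groups t-test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    # Normal-approximation sample size for a difference between two means.
    n_normal = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    # Small correction for using the t rather than the normal distribution.
    n_corrected = n_normal + z_alpha ** 2 / 4
    return math.ceil(n_corrected)


n = n_per_group(0.5)
print(f"{n} families per group, {2 * n} in total")
```

For a medium effect of half a standard deviation this gives 64 families per group, 128 in total.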

Concluding Points

My main points are that:

Improving the validity of social science measurement is important. The use of coded videos of real-world settings creates exciting measurement possibilities.
Even with improved measurement, a small sample size will result in large confidence intervals when estimating effects of interest.

Jeromy Anglim's Blog: Psychology and Statistics

Monday, May 24, 2010