Sweave Tutorial 3: Console Input and Output - Multiple Choice Test Analysis

This post provides an example of using Sweave to perform an item analysis of a multiple choice test. It is designed as a tutorial for learning more about using Sweave in a mode where console input and output are displayed. Copies of all source code and the final PDF report are provided.

Overview

The repository with all source files is available at:

https://github.com/jeromyanglim/Sweave_Item_Analysis/.

A copy of the resulting PDF can be viewed here.

For general information on program requirements and running the code, see this earlier post.

Sweave Documents that Display Console Input and Output

I find it useful to distinguish between different kinds of Sweave documents. One key distinction is between reports that display the console and reports that do not display the console.

Reports that display the console are suited to distinct applications, including:

R tutorials
Personal analyses
Analyses provided to experts who understand R and the project

It would be further possible to distinguish between reports that do and do not show the console input (echo=true).

Sweave reports that display the console still benefit from thoughtful variable names, selective display of output, and so forth. However, less time is naturally invested in polishing figures, tables, and inline text.

The previous Sweave Tutorials were examples of Sweave documents that do not display the console. Tutorial 1 was a data-driven document; Tutorial 2 was a set of batch-generated, polished reports.

The present tutorial is an example of a Sweave document that displays console input and output.

The remainder of the post discusses various aspects of the source code.

Source Code

Folder and File Structure

.gitignore records the folder where derived files are stored by make.
makefile is similar to that explained and used in Sweave Tutorial 1. This similarity has been obtained through (a) the use of variables in the makefile and (b) the fact that both projects are driven by the Rnw file; thus, make calls Rnw, which in turn imports data, and so forth.
README.md is a file in markdown. Markdown is the markup language used on GitHub. The file is automatically displayed on the repository home page.
data: This folder contains a file with the responses to the 50 multiple choice questions.
meta: This folder contains a file with information about each of the 50 multiple choice questions including the text, response options, and the supposedly correct response.
backup: This folder contains a copy of the resulting PDF. Although this is a derived file and as such should not generally be monitored by Git, it's helpful to include a copy for easy access.
Sweave.sty: I find it easier and more portable to just include this LaTeX style file required by Sweave in with the project.

Item_Analysis_Report.Rnw

Library loading and data import

<<initial_settings, echo=false>>=
options(stringsAsFactors=FALSE)
options(width=80)
library(psych) # used for scoring and alpha
library(CTT) # used for spearman brown prophecy
@


<<import_data, echo=false>>=
cases <- read.delim("data/cases.tsv")
items <- read.delim("meta/items.tsv")
items$variable <- paste("item", items$item, sep="")
@

options(width=80) ensures that the console width is suitable for the printed page.
options(stringsAsFactors=FALSE) means that character variables imported using read.delim are left as character variables and not converted into factors. In general I find this a more useful default behaviour. In particular I often use the actual text, particularly in metadata, to generate variable names, print text and so forth. Leaving variables as character is better for this.
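As a small illustration of the difference, the following sketch reads a made-up tab-separated table from an inline string rather than from meta/items.tsv (the column names and values are hypothetical). Note that from R 4.0.0 onwards, stringsAsFactors=FALSE is already the default, so the option mainly matters on older versions of R.

```r
# Hypothetical metadata; the real file in the project is meta/items.tsv.
tsv <- "item\ttext\n1\tWhat is 2 + 2?\n2\tWhat is 3 * 3?"

# Read the same text twice, once with each setting.
as_char   <- read.delim(text = tsv, stringsAsFactors = FALSE)
as_factor <- read.delim(text = tsv, stringsAsFactors = TRUE)

class(as_char$text)    # "character"
class(as_factor$text)  # "factor"

# Character columns can be pasted together directly,
# e.g., to build variable names or to print item text.
labels <- paste("item", as_char$item, ": ", as_char$text, sep = "")
print(labels)
```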

Using data before it has apparently been generated

...
The example involves performing an item analysis of 
responses of \Sexpr{nrow(cases)} students 
to a set of \Sexpr{nrow(items)} multiple choice test items.
...

<<>>=
<<initial_settings>>
<<import_data>>
@

In the above code I wanted to be able to write the number of cases before showing the code for importing settings and data. Thus, I first ran the code chunks with echo=false to prevent display. Then, afterwards, these code chunks were rerun inside a code chunk using the syntax <<name_of_code_chunk>> (i.e., without the = sign at the end of the opening). This time they were displayed.
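To try the chunk-reuse idea in isolation, here is a minimal sketch that writes a tiny Rnw file to a temporary directory and weaves it with Sweave() from the utils package. The file name chunk_reuse_demo.Rnw, the chunk name make_data, and the variable x are all made up for the example; they are not part of the report.

```r
# A tiny, hypothetical Rnw document: the chunk make_data is first run
# silently (echo=false), used in \Sexpr{}, and then reused with echo on.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<make_data, echo=false>>=",
  "x <- c(2, 4, 6)",
  "@",
  "There are \\Sexpr{length(x)} values.",
  "<<>>=",
  "<<make_data>>",
  "@",
  "\\end{document}"
)

# Sweave writes its .tex output to the working directory,
# so work inside a temporary directory and restore it afterwards.
old_wd <- setwd(tempdir())
writeLines(rnw, "chunk_reuse_demo.Rnw")
Sweave("chunk_reuse_demo.Rnw", quiet = TRUE)
tex <- readLines("chunk_reuse_demo.tex")
setwd(old_wd)

# The woven file contains the evaluated \Sexpr{} text, followed later
# by the echoed code from the reused chunk.
cat(tex, sep = "\n")
```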

Scoring multiple choice tests

<<score_test>>=
itemstats <- score.multiple.choice(key = items$correct, 
            data = cases[,items$variable])
@

score.multiple.choice is a function in the psych package for scoring multiple choice tests. key is a vector of integers representing the correct response. data is a matrix or data.frame of responses from a set of respondents.
The example shows how metadata can be used to simplify code: items$variable contains the names of the 50 multiple choice test items, and items$correct contains the vector of correct responses.
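To make the role of the metadata concrete, the following base R sketch scores a toy data set against a key. It is only an illustration of the idea, not the implementation used by score.multiple.choice, and the three items and three respondents are invented.

```r
# Hypothetical metadata: one row per item, with the correct option.
items <- data.frame(item = 1:3, correct = c(2, 4, 1))
items$variable <- paste("item", items$item, sep = "")

# Hypothetical responses: one row per respondent, one column per item.
cases <- data.frame(item1 = c(2, 2, 3),
                    item2 = c(4, 1, 4),
                    item3 = c(1, 1, 1))

# Select the item columns by name, then compare each column to its key.
responses <- as.matrix(cases[, items$variable])
scored <- sweep(responses, 2, items$correct, FUN = "==") * 1

item_means <- colMeans(scored)  # proportion correct per item
total <- rowSums(scored)        # number correct per respondent

print(item_means)
print(total)
```

Because the column names and the key both come from the items table, adding or removing an item only requires changing the metadata file, not the scoring code.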

...

Figures in Sweave

<<plot_mean_by_r, fig=true>>=
plot(r ~ mean , itemstats$item.stats, type="n")
text(itemstats$item.stats$mean, itemstats$item.stats$r, 1:50)
abline(h=.2, v=c(.5, .9))
@

Code chunks can produce single figures. The fig=true key-value pair is required.
type="n" is used to not show points and then text(...) is used to plot the item numbers on the plot.
Because the document is an informal document designed to display the console, the figure is not wrapped in a figure float. A float would involve more typing and might even be annoying if it moved around the document.

...

Using Sweave to Better Follow the DRY (Don't Repeat Yourself) Principle

<<flag_bad_items>>=
rules <- list(
        tooEasy = .95,
        tooHard = .3,
        lowR = .15)
oritemstats$item.stats$tooEasy <- 
    oritemstats$item.stats$mean > rules$tooEasy
...
@

\begin{itemize}
\item \emph{Too Easy}: mean correct $>$
\Sexpr{rules$tooEasy}.
\Sexpr{sum(oritemstats$item.stats$tooEasy)}
items were bad by this definition.
... 
\end{itemize}

The above abbreviated version of the actual code highlights how Sweave can be used to prevent repetition and facilitate modifiability.
The code flags items as too easy if more than 95% of participants get the item correct. This value (.95) is stored in a variable. It is then used both in the code that flags items as too easy and in the text where the rule is described (i.e., \Sexpr{rules$tooEasy}).
This is a particularly powerful use of Sweave: any text in a document that might be repeated, or that describes details of a data analytic algorithm, is a good candidate for simplification using Sweave.
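The same idea can be tried outside Sweave. In the sketch below, sprintf plays the role that \Sexpr{} plays in the report; the item means are invented, and only the rules list mirrors the code above.

```r
# The threshold is defined exactly once.
rules <- list(tooEasy = .95)

# Hypothetical proportions correct for three items.
item_means <- c(item1 = .99, item2 = .60, item3 = .97)

# Use 1: flag items with the rule.
too_easy <- item_means > rules$tooEasy

# Use 2: describe the rule in text, drawing on the same stored value.
sentence <- sprintf(
  "Too Easy: mean correct > %s. %d items were bad by this definition.",
  rules$tooEasy, sum(too_easy))
print(sentence)
```

Changing rules$tooEasy to, say, .9 updates both the flags and the sentence, which is the behaviour the report gets from \Sexpr{}.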

...

\Sexpr{} and formatting

The formula suggests  that in order to obtain
an alpha of \Sexpr{sbrown$targetAlpha},
\Sexpr{round(sbrown$multiple, 2)} times as many items are required.
Thus, the final scale would need around
\Sexpr{ceiling(sbrown$refinedItemCount)} items.
Assuming a similar number of good and bad items,
this would require an initial pool of around
\Sexpr{ceiling(sbrown$totalItemCount)} items.

The above code highlights a couple of examples of how numbers can be formatted inline, which is often required when including computed values in running text. In this case, the ceiling and round functions were used.
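The code that builds sbrown is not shown in this post. As a rough sketch of where such numbers could come from, the standard Spearman-Brown prophecy formula gives the lengthening factor as target * (1 - alpha) / (alpha * (1 - target)). Everything below is an assumption made for illustration: the current alpha, the target alpha, and the item counts are invented, and the report itself uses the CTT package rather than this hand-written formula.

```r
# Hypothetical inputs; none of these values come from the report.
currentAlpha  <- .70  # reliability of the current set of good items
goodItemCount <- 40   # number of items retained as good
poolItemCount <- 50   # number of items originally written

sbrown <- list(targetAlpha = .80)

# Spearman-Brown prophecy: factor by which the test must be lengthened.
sbrown$multiple <- (sbrown$targetAlpha * (1 - currentAlpha)) /
  (currentAlpha * (1 - sbrown$targetAlpha))

# Items needed in the final scale, and in the initial pool if the
# same proportion of written items turn out to be good.
sbrown$refinedItemCount <- goodItemCount * sbrown$multiple
sbrown$totalItemCount <- sbrown$refinedItemCount *
  poolItemCount / goodItemCount

# Format the numbers for inline text, as \Sexpr{} would.
round(sbrown$multiple, 2)
ceiling(sbrown$refinedItemCount)
ceiling(sbrown$totalItemCount)
```

Rounding is applied only at the point of display, so the stored values keep full precision for any later calculations.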

Sweave Tutorial Series

This post is the third installment in a Sweave Tutorial Series.

Jeromy Anglim's Blog: Psychology and Statistics

Tuesday, November 30, 2010