Overview
The repository with all source files is available at:
A copy of the resulting PDF can be viewed here.
For general information on program requirements and on running the code, see this earlier post.
Sweave Documents that Display Console Input and Output
I find it useful to distinguish between different kinds of Sweave documents. One key distinction is between reports that display the console and reports that do not.
Reports that display the console are suited to distinct applications, including:
- R tutorials
- Personal analyses
- Analyses provided to experts who understand R and the Project
A further distinction could be drawn between reports that do and do not show the console input (i.e., whether chunks use echo=true).
Sweave reports that display the console still benefit from thoughtful variable names, selective display of output, and so forth. However, less time is naturally invested in polishing figures, tables, and inline text.
The previous Sweave Tutorials were examples of Sweave documents that do not display the console. Tutorial 1 was a data-driven document, and Tutorial 2 was a set of polished reports generated in batch.
The present tutorial is an example of a Sweave document that displays console input and output.
The remainder of the post discusses various aspects of the source code.
Source Code
Folder and File Structure
- .gitignore: records the folder where files derived by make are stored.
- makefile: is similar to the one explained and used in Sweave Tutorial 1. This similarity was obtained through (a) the use of variables in the makefile and (b) the fact that both projects are driven by the Rnw file; that is, make runs the Rnw file, which in turn imports data, and so forth.
- README.md: a file in Markdown, the markup language used on GitHub. The file is automatically displayed on the repository home page.
- data: This folder contains a file with the responses to the 50 multiple choice questions.
- meta: This folder contains a file with information about each of the 50 multiple choice questions, including the text, response options, and the supposedly correct response.
- backup: This folder contains a copy of the resulting PDF. Although this is a derived file and as such should generally not be tracked by Git, it is helpful to include a copy for easy access.
- Sweave.sty: I find it easier and more portable to include this LaTeX style file, which Sweave requires, within the project.
Item_Analysis_Report.Rnw
Library loading and data import
<<initial_settings, echo=false>>=
options(stringsAsFactors=FALSE)
options(width=80)
library(psych) # used for scoring and alpha
library(CTT) # used for Spearman-Brown prophecy
@
<<import_data, echo=false>>=
cases <- read.delim("data/cases.tsv")
items <- read.delim("meta/items.tsv")
items$variable <- paste("item", items$item, sep="")
@
options(width=80) ensures that the console width is suitable for the printed page. options(stringsAsFactors=FALSE) means that character variables imported using read.delim are left as character variables rather than converted into factors. In general I find this a more useful default behaviour. In particular, I often use the actual text, especially in metadata, to generate variable names, print text, and so forth, and leaving variables as character is better for this.
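To make the difference concrete, the sketch below reads a small tab-separated string with read.delim, once keeping text as character and once converting it to factors. The inline data and its column names (item, correct, text) are hypothetical stand-ins for meta/items.tsv. Note that since R 4.0.0 stringsAsFactors already defaults to FALSE, so the global option mainly matters for older R versions; passing the argument explicitly, as here, makes the behaviour independent of global options.

```r
# Hypothetical stand-in for meta/items.tsv (columns are illustrative only)
tsv <- "item\tcorrect\ttext\n1\t2\tWhat is 2 + 2?\n2\t4\tWhich number is prime?\n"

# Keep text columns as character, as options(stringsAsFactors=FALSE) does
items_chr <- read.delim(text = tsv, stringsAsFactors = FALSE)
# For comparison, convert text columns to factors
items_fac <- read.delim(text = tsv, stringsAsFactors = TRUE)

class(items_chr$text)   # "character"
class(items_fac$text)   # "factor"

# Character metadata can be used directly to build variable names,
# mirroring the paste() call in the import_data chunk
items_chr$variable <- paste("item", items_chr$item, sep = "")
items_chr$variable      # "item1" "item2"
```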
Using data before it has apparently been generated
...
The example involves performing an item analysis of
responses of \Sexpr{nrow(cases)} students
to a set of \Sexpr{nrow(items)} multiple choice test items.
...
<<>>=
<<initial_settings>>
<<import_data>>
@
- In the above code I wanted to be able to report the number of cases before showing the code for the initial settings and data import. Thus, I first ran the code chunks with echo=false to prevent display. Afterwards, these code chunks were rerun inside another code chunk using the syntax <<name_of_code_chunk>> (i.e., without the = sign at the end of the opening), and this time they were displayed.
Scoring multiple choice tests
<<score_test>>=
itemstats <- score.multiple.choice(key = items$correct,
data = cases[,items$variable])
@
- score.multiple.choice is a function in the psych package for scoring multiple choice tests. key is a vector of integers representing the correct responses, and data is a matrix or data.frame of responses from a set of respondents.
- The example shows how metadata can be used to simplify code: items$variable includes the names of the 50 multiple choice test items, and items$correct includes the vector of correct responses.
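For readers curious about what scoring against a key involves, here is a base-R sketch on a tiny hypothetical data set. It is not psych's implementation, and the item-total correlation below (each item against the total of the remaining items) may not be computed the same way as the r column that score.multiple.choice reports.

```r
# Hypothetical responses: 4 respondents x 3 items, options coded as integers
responses <- data.frame(item1 = c(2, 2, 1, 2),
                        item2 = c(1, 3, 1, 1),
                        item3 = c(4, 4, 4, 2))
key <- c(2, 1, 4)  # hypothetical correct option for each item

# 1 where a response matches that item's key, 0 otherwise
scored <- sweep(as.matrix(responses), 2, key, FUN = "==") * 1

total <- rowSums(scored)        # number correct per respondent
item_mean <- colMeans(scored)   # proportion correct per item (difficulty)

# Corrected item-total correlation: each item vs. the total of the other items
item_r <- sapply(seq_along(key),
                 function(j) cor(scored[, j], total - scored[, j]))

item_mean  # 0.75 for every item
```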
...
Figures in Sweave
<<plot_mean_by_r, fig=true>>=
plot(r ~ mean, itemstats$item.stats, type="n")
text(itemstats$item.stats$mean, itemstats$item.stats$r, 1:50)
abline(h=.2, v=c(.5, .9))
@
- Code chunks can produce single figures; the fig=true key-value pair is required.
- type="n" is used to suppress the points, and text(...) is then used to plot the item numbers.
- Because the document is an informal document designed to display the console, the figure is not wrapped in a figure float. A float would involve more typing and might even be annoying if it moved around the document.
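With fig=true, Sweave opens the graphics device and includes the resulting file in the LaTeX output. Outside Sweave, the same labelled scatterplot can be sketched by opening a device explicitly. The item statistics below are hypothetical, and seq_len(nrow(...)) is used instead of a hard-coded 1:50 so the labels adapt to the number of items.

```r
# Hypothetical item statistics standing in for itemstats$item.stats
item_stats <- data.frame(mean = c(.45, .62, .88, .93, .97),
                         r    = c(.10, .25, .32, .18, .05))

out_file <- file.path(tempdir(), "mean_by_r.pdf")
pdf(out_file)  # Sweave would open the device itself when fig=true

plot(r ~ mean, data = item_stats, type = "n")   # set up axes, draw no points
text(item_stats$mean, item_stats$r,
     seq_len(nrow(item_stats)))                 # item numbers as plot labels
abline(h = .2, v = c(.5, .9))                   # same reference lines as the chunk above

invisible(dev.off())
```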
...
Using Sweave to Better Follow the DRY (Don't Repeat Yourself) Principle
<<flag_bad_items>>=
rules <- list(
tooEasy = .95,
tooHard = .3,
lowR = .15)
oritemstats$item.stats$tooEasy <-
oritemstats$item.stats$mean > rules$tooEasy
...
@
\begin{itemize}
\item \emph{Too Easy}: mean correct $>$
\Sexpr{rules$tooEasy}.
\Sexpr{sum(oritemstats$item.stats$tooEasy)}
items were bad by this definition.
...
\end{itemize}
- The above abbreviated version of the actual code highlights how Sweave can be used to prevent repetition and facilitate modifiability.
- The code flags items as too easy if more than 95% of participants get the item correct. This value (.95) is stored in a variable, which is then used both in the code that flags items as too easy and in the text where the rule is described in plain English (i.e., \Sexpr{rules$tooEasy}).
- This is a particularly powerful use of Sweave: any text in a document that might be repeated, or that describes details of a data analytic algorithm, is a good candidate for simplification using Sweave.
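The same single-source-of-truth idea can be sketched in plain R: one threshold drives both the flagging and the sentence that describes it, so changing the rule in one place updates both. The item means and the helper describe_too_easy are hypothetical; only the rules list mirrors the chunk above.

```r
# Hypothetical item statistics (proportion correct per item)
item_stats <- data.frame(mean = c(0.97, 0.99, 0.25, 0.60))

# Single source of truth for the flagging rules (values as in the chunk above)
rules <- list(tooEasy = .95, tooHard = .3, lowR = .15)

# Hypothetical helper: flag items and describe the rule from the same threshold
describe_too_easy <- function(stats, threshold) {
  flagged <- stats$mean > threshold
  paste0("Too Easy: mean correct > ", threshold, ". ",
         sum(flagged), " items were bad by this definition.")
}

item_stats$tooEasy <- item_stats$mean > rules$tooEasy
describe_too_easy(item_stats, rules$tooEasy)
# "Too Easy: mean correct > 0.95. 2 items were bad by this definition."

# Changing the threshold updates both the count and the text
describe_too_easy(item_stats, 0.98)
# "Too Easy: mean correct > 0.98. 1 items were bad by this definition."
```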
...
\Sexpr{} and formatting
The formula suggests that in order to obtain
an alpha of \Sexpr{sbrown$targetAlpha},
\Sexpr{round(sbrown$multiple, 2)} times as many items are required.
Thus, the final scale would need around
\Sexpr{ceiling(sbrown$refinedItemCount)} items.
Assuming a similar number of good and bad items,
this would require an initial pool of around
\Sexpr{ceiling(sbrown$totalItemCount)} items.
- The above code gives a couple of examples of the inline formatting of numbers, which is often required when numbers are included in running text. In this case, the ceiling and round functions were used.
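The post does not show how the sbrown object was built (CTT was loaded for this step), so the following is only a base-R sketch of the Spearman-Brown prophecy with hypothetical inputs. It assumes that totalItemCount scales the refined count by the share of good items observed in the original pool. It also shows one formatting caveat: round() drops trailing zeros when the number is converted to text, whereas sprintf() keeps a fixed number of decimals.

```r
# Hypothetical inputs
current_alpha <- 0.70   # alpha of the retained items
target_alpha  <- 0.85   # desired alpha
good_items    <- 20     # items kept after flagging
pool_items    <- 50     # items in the original pool

# Spearman-Brown prophecy: factor by which the test must be lengthened
multiple <- (target_alpha * (1 - current_alpha)) /
            (current_alpha * (1 - target_alpha))

sbrown <- list(
  targetAlpha      = target_alpha,
  multiple         = multiple,
  refinedItemCount = multiple * good_items,
  # Assumption: new items turn out good at the same rate as in this pool
  totalItemCount   = multiple * good_items / (good_items / pool_items)
)

round(sbrown$multiple, 2)          # 2.43
ceiling(sbrown$refinedItemCount)   # 49
ceiling(sbrown$totalItemCount)     # 122

# Formatting caveat for inline text
as.character(round(2.5, 2))        # "2.5"
sprintf("%.2f", 2.5)               # "2.50"
```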
Sweave Tutorial Series
This post is the third installment in a Sweave Tutorial Series:
- Using Sweave, R, and Make to Generate a PDF of Multiple Choice Questions
- Batch Individual Personality Reports using R, Sweave, and LaTeX