Jeromy Anglim's Blog: Psychology and Statistics


Saturday, November 27, 2010

Sweave Tutorial 1: Using Sweave, R, and Make to Generate a PDF of Multiple Choice Questions

In this post I present an example of using Sweave to prepare a PDF of formatted multiple choice questions. More broadly, the example shows how to use Sweave to incorporate elements of a database into a formatted LaTeX document. It aims to be useful to anyone wanting to learn more about the almost magical powers of make, Sweave, and R.

Overview

The repository with all source files is available at:

The repository allows you to download all files as an archive or view the files individually on the web. A copy of the final PDF generated from the process is available here

I ran the code on Windows with the following programs installed.

  • R: For the R code
  • Rtools: For make and the sh commands in Make and for Sweave to run on the command line
  • MiKTeX: For compilation of the PDF using texify and for automatically downloading the exam document class

It should run on Mac and Linux with appropriate R, make, and LaTeX tools installed.

Assuming you have the above installed, to run the code:

  1. Download the repository from GitHub
  2. Unzip to a directory
  3. Open the shell in that directory
  4. Type: make

The remainder of this post explains the code in each of the main files in the repository.


The Makefile

The makefile is used to build the PDF from the Rnw source. It also performs other useful tasks. A copy of the makefile is shown below:

output = .output
rnwfile = Sweave_MCQ
backup = .backup

all:
    R CMD Sweave $(rnwfile).Rnw
    -mkdir $(output)
    -cp *.sty $(output)
    -mv *.tex *.pdf *.eps $(output)
    cd $(output); texify --run-viewer --pdf $(rnwfile).tex 

tex:
    cd $(output); texify --run-viewer --pdf $(rnwfile).tex

clean:
    -rm $(output)/*

backup:
    -mkdir $(backup)
    cp  $(output)/$(rnwfile).pdf $(backup)/$(rnwfile).pdf

I recently posted on the benefits of makefiles when developing Sweave documents.

The makefile starts with three variables.

  • output stores the name of the folder where all derivative files are placed.
  • rnwfile stores the name of the Rnw source file without the .Rnw extension. This is also the base of the resulting .tex and .pdf files.
  • backup stores the name of the folder where a copy of the PDF is placed if this is desired.

The file then has four goals.

The default goal is called all:. If make is called without arguments from the command line in the project directory, the recipe immediately below all: is run. Note that all apparent indentations in the recipes are tab indentations (a set of spaces would cause an error).

  • The first line runs Sweave on the Rnw file from the command line. On Windows, R CMD Sweave requires installation of Rtools. Note how $(rnwfile).Rnw will actually be Sweave_MCQ.Rnw after variable substitution.
  • The second line creates a directory that corresponds to the value in the variable output. The hyphen at the start of the line ensures that any errors, such as if the folder already exists, do not stop make from running. Note that commands such as mkdir, cp, cd, and rm are based on the sh shell. These commands are supported on Windows if you have Rtools installed.
  • The third line copies any .sty files in the project directory (here, Sweave.sty) into the output folder.
  • The fourth line moves the tex, pdf, and eps files (i.e., those generated by the Sweave command) into the output folder. This ensures that the root directory only includes source files, which has several benefits: (a) it makes version control easier; (b) it makes it easy to see the source files; and (c) it reduces the risk of accidentally deleting source files when deleting derived files.
  • The fifth line changes the working directory to the output directory and then runs texify on the tex file generated by Sweave. The flags ensure that a PDF is generated and that the default viewer is launched. This command could be replaced with something like pdflatex or some other LaTeX program.
  • Because the directory is changed before texify runs, all the derived LaTeX files are kept in the output folder.
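
For readers without make, the same five steps can be approximated from within R. This is only a sketch and not part of the original repository: the function name build_pdf is hypothetical, it uses tools::texi2pdf in place of texify, and it does not launch a PDF viewer.

```r
# Hypothetical R approximation of the `all` recipe (not from the repository).
build_pdf <- function(rnwfile = "Sweave_MCQ", output = ".output") {
    # Step 1: run Sweave on the Rnw source; writes <rnwfile>.tex
    # to the current working directory
    utils::Sweave(paste0(rnwfile, ".Rnw"))

    # Step 2: create the output folder; stay quiet if it already exists
    dir.create(output, showWarnings = FALSE)

    # Step 3: copy any style files into the output folder
    sty <- list.files(pattern = "\\.sty$")
    if (length(sty) > 0) file.copy(sty, output, overwrite = TRUE)

    # Step 4: move the derived tex, pdf, and eps files out of the root
    derived <- list.files(pattern = "\\.(tex|pdf|eps)$")
    if (length(derived) > 0) file.rename(derived, file.path(output, derived))

    # Step 5: compile inside the output folder so that auxiliary
    # LaTeX files stay there; restore the working directory on exit
    old <- setwd(output)
    on.exit(setwd(old))
    tools::texi2pdf(paste0(rnwfile, ".tex"))

    invisible(file.path(output, paste0(rnwfile, ".pdf")))
}
```

Calling build_pdf() from the project directory would then play the role of typing make, assuming a working LaTeX installation is on the path.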

The tex: goal can be called by running make tex at the command line. I use it in case there is an error when compiling the PDF from the tex file. Sometimes it's easier to work out where the bug is by manipulating the intermediate tex file. Of course, once the problem has been identified, the fix needs to be incorporated into the Rnw source.

The clean: goal removes all files in the output directory (i.e., all the derived files).

The backup: goal copies the resulting pdf into the backup folder. I figured this might be useful in order to include a copy of the final product in the repository.


.gitignore

/.output
.project

The .gitignore file prevents all files in the /.output directory (i.e., the derived files) and the file .project from being placed under version control in git.

I'm preparing a post on version control, git, and github which will be posted shortly.


Sweave_MCQ.Rnw

Sweave_MCQ.Rnw is the R noweb file that contains chunks of LaTeX and R code. When Sweave is run on this file, the R code chunks are executed and replaced by their output in the resulting tex file and, potentially, image files are generated.

LaTeX Preamble

\documentclass[12pt, a4paper]{exam}
\usepackage[OT1]{fontenc}
\usepackage{Sweave}
\SweaveOpts{echo=FALSE}
\usepackage{hyperref}            
\hypersetup{pdfpagelayout=SinglePage} % http://www.tug.org/applications/hyperref/ftp/doc/manual.html
\setkeys{Gin}{width=0.8\textwidth}
\pagestyle{headandfoot} % every page has a header and footer
\header{}{Sample Multiple Choice Questions}{}
\footer{}{Page \thepage\ of \numpages}{}

The LaTeX preamble is mostly general code that ensures proper display of the resulting document.

  • The exam document class is great for writing a variety of exam style documents in LaTeX. See CTAN - exam for documentation.
  • hyperref is used to display hyperlinks and allows the resulting pdf to open in SinglePage format.
  • \setkeys... controls the width of Sweave figures relative to the paragraph width. There are no figures in this document, so it is not really required.
  • \pagestyle{headandfoot}... These three lines ensure the display of header and footer information on each page.

First R Code Chunk

<<prepare_data>>=
items <- read.csv("data/items.csv", stringsAsFactors = FALSE)

writeQuestion <- function(x){
    c("\\filbreak",
            paste("\\question\n", x["itemText"]),
            "\\begin{choices}",
            paste("\\choice", x["optionA"]),
            paste("\\choice", x["optionB"]),
            paste("\\choice", x["optionC"]), 
            paste("\\choice", x["optionD"]), 
            "\\vspace{10 mm}",
            "\\end{choices}\n\n")
}

itemText <- apply(items, 1, function(X)  writeQuestion(x = X))

answers <- paste(items$item, "=",
        LETTERS[as.numeric(items$correctAnswer)],
        sep ="")
answersText <- paste(answers, collapse = "; ")
@
  • R code chunks in Sweave are commenced by <<>>= and ended by @. These need to appear in the first column of the text file.
  • <<prepare_data>>=: The first non-keyword placed in the opening tags provides a name for the R code chunk. A short descriptive title is useful both when reading the source and when debugging Sweave compilation.
  • items...: this line reads in a csv file into a data frame with 40 cases. Each case is a multiple choice question with fields such as the question text, the text for the four response options, and the correct answer.
  • writeQuestion...: This is a function which is designed to take a row of data from the items data frame and return a LaTeX-formatted character vector, where each element is ultimately printed on its own line of the tex file. Note how, in order to produce one backslash in LaTeX, two backslashes need to be written in the R string. The \question and \choice commands and the choices environment are part of the exam document class and are used for formatting multiple choice questions.
  • apply... takes the items data frame and for each row (1=rows) runs the function writeQuestion on the row.
  • answers... and answersText create a formatted string that shows item numbers and letters for correct answers, all drawn from the items data frame.
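
To see what this chunk produces, here is a small worked sketch. The two-row data frame is made up for illustration (the real data live in data/items.csv, and the column names are inferred from the code above); writeQuestion is copied from the chunk.

```r
# Hypothetical stand-in for data/items.csv (two items instead of 40)
items <- data.frame(
    item = 1:2,
    itemText = c("Which program converts an Rnw file into a tex file?",
                 "Which document class provides the question command?"),
    optionA = c("make", "article"),
    optionB = c("Sweave", "book"),
    optionC = c("git", "report"),
    optionD = c("texify", "exam"),
    correctAnswer = c(2, 4),
    stringsAsFactors = FALSE
)

# Same function as in the chunk above
writeQuestion <- function(x){
    c("\\filbreak",
            paste("\\question\n", x["itemText"]),
            "\\begin{choices}",
            paste("\\choice", x["optionA"]),
            paste("\\choice", x["optionB"]),
            paste("\\choice", x["optionC"]),
            paste("\\choice", x["optionD"]),
            "\\vspace{10 mm}",
            "\\end{choices}\n\n")
}

# apply() converts the data frame to a character matrix, so each row
# arrives in writeQuestion as a named character vector. The result is a
# matrix with one column per item and one row per line of LaTeX.
itemText <- apply(items, 1, function(X) writeQuestion(x = X))
dim(itemText)   # 9 rows (lines per question) by 2 columns (items)

# Inspect the LaTeX generated for the first item
cat(itemText[, 1], sep = "\n")

# Answer key: item number, "=", and the letter of the correct option
answers <- paste(items$item, "=",
        LETTERS[as.numeric(items$correctAnswer)],
        sep = "")
answersText <- paste(answers, collapse = "; ")
answersText     # "1=B; 2=D"
```

Because R stores matrices column by column, printing itemText with cat emits all the lines of item 1 before those of item 2, which is why the matrix can be passed to cat without further reshaping.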

Remaining code

\begin{questions}
<<print_items, results=tex>>=
cat(itemText, sep = "\n")
@
\newpage
\section*{Answers}
<<print_answers, results=tex>>=
cat(answersText) 
@

\end{questions}
  • The second R code chunk has a descriptive name print_items. It uses the key-value pair results=tex. This ensures that Sweave interprets the text output using cat as raw tex.
  • cat... prints the long character vector itemText containing all the latex for the questions. sep="\n" means that each element is printed on a new line which makes the resulting tex file easier to read.
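
The effect of sep = "\n" can be checked outside Sweave. The following sketch uses a short made-up character vector (tex_lines is a hypothetical name) and capture.output to collect what cat prints, line by line.

```r
# Made-up LaTeX lines standing in for itemText
tex_lines <- c("\\question What is 1 + 1?",
               "\\begin{choices}",
               "\\choice 2",
               "\\end{choices}")

# With sep = "\n", each element is printed on its own line
printed <- capture.output(cat(tex_lines, sep = "\n"))
length(printed)    # one output line per element

# With the default separator (a single space), everything is
# printed on one long line
one_line <- capture.output(cat(tex_lines))
length(one_line)
```

LaTeX would accept either form in this case, so the newline separator mainly helps when reading or debugging the intermediate tex file.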

Summary and Related Resources

The combination of make, R, Sweave, and LaTeX is tremendously powerful. Hopefully, this post encourages a few more people to have a play. To learn more check out some of the following posts and pages: