This post documents an example of using Sweave
to generate individualised personality reports based on
responses to a personality test.
Each report provides information on both the responses of the general
sample and responses of the specific respondent.
All source code is provided, and selected aspects are discussed,
including makefiles, use of \Sexpr, figures, and LaTeX tables using Sweave.
Overview
All source code is available on GitHub:
Three examples of compiled PDF reports can be viewed as follows: ID1, ID2, and ID4.
The resulting report is a simple proof-of-concept example.
Discussion of Source Code
makefile
outputDir = .output
backupDir = .backup
test:
	-mkdir $(outputDir)
	Rscript --verbose run1test.R
test5:
	-mkdir $(outputDir)
	Rscript --verbose run5test.R
runall:
	-mkdir $(outputDir)
	Rscript --verbose runAll.R
clean:
	-rm $(outputDir)/*
backup:
	-mkdir $(backupDir)
	cp $(outputDir)/Report_Template_ID*[0123456789].pdf --target-directory=$(backupDir)
- outputDir stores the name of the folder used to store derived files (e.g., tex files, images, and compiled document PDFs).
- backupDir stores the name of the folder where document PDFs are to be stored.
- test: is the default goal. Running make in the project directory will run run1test.R, which will build one report.
- The --verbose option shows the progress of R when run as a script. It's useful for seeing progress and debugging.
- test5: compiles five reports.
- runall: compiles all reports. Further information on each of the Run... .R files can be obtained by inspecting these files. In general they source Run.R and specify which ids to run reports on.
- clean: removes all the files from the output directory (i.e., all the derived files).
- backup: copies reports to the backup folder; i.e., it separates the finished documents from all the other derived files.
- To run test5, runall, etc., type make test5 or make runall, etc.
main.R
main.R loads external functions and packages, imports data,
imports metadata and processes the data.
# Import Data
ipip <- read.delim("data/ipip.tsv")
ipipmeta <- read.delim("meta/ipipmeta.tsv")
ipipscales <- read.delim("meta/ipipscales.tsv")
- When importing the data, I have adopted the useful convention (which I observed in John Myles White's ProjectTemplate package) of giving objects the same names as their data files. The file extension also clearly indicates the file format (i.e., tab-separated values).
- I often have separate data and meta folders. Importing metadata often makes for more manageable code than hard coding the metadata into the R script.
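The naming convention described above can also be automated. The following is a minimal sketch, assuming every file in a folder is tab-separated and named after the object it should become. The helper importTsvFolder, the temporary folder, and the toy file contents are all hypothetical and are not part of the project.

```r
# Hypothetical helper: read every .tsv file in a folder into a named list,
# using the file name (minus its extension) as the element name.
importTsvFolder <- function(dir) {
  files <- list.files(dir, pattern = "\\.tsv$", full.names = TRUE)
  objects <- lapply(files, read.delim)
  names(objects) <- tools::file_path_sans_ext(basename(files))
  objects
}

# Toy files written to a temporary folder (illustrative only).
dataDir <- file.path(tempdir(), "data_demo")
dir.create(dataDir, showWarnings = FALSE)
write.table(data.frame(id = 1:3, ipip1 = c(4, 2, 5)),
            file.path(dataDir, "ipip.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(variable = "ipip1", text = "Example item text"),
            file.path(dataDir, "ipipmeta.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)

imported <- importTsvFolder(dataDir)
names(imported)
```

Returning a list rather than assigning into the global environment is a design choice made for this sketch; the post itself assigns each data frame to its own object.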
Test scores are calculated using the function score.items.
ipipstats <- psych::score.items(ipipmeta[, ipipscales$scale],
                                ipip[, ipipmeta[, "variable"]],
                                min = 1, max = 5)
- The psych package has a number of useful functions for psychological research. score.items is particularly good. It enables the creation of means and totals for multiple scales. It handles item reversal. It also returns information related to the reliability of the scales.
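To make the idea of keyed scoring with item reversal concrete, here is a base R sketch. It is not the psych implementation and omits the reliability statistics; the function scoreScales, the toy responses, and the scale names are assumptions made for illustration.

```r
# Toy responses on a 1-5 scale (illustrative only).
items <- data.frame(q1 = c(5, 4, 2), q2 = c(1, 2, 4), q3 = c(4, 4, 3))

# Keys: one column per scale; 1 = scored as is, -1 = reversed, 0 = not used.
keys <- matrix(c(1, -1, 0,
                 0,  0, 1),
               ncol = 2,
               dimnames = list(names(items), c("scaleA", "scaleB")))

# Hypothetical helper: scale means with negatively keyed items reversed.
scoreScales <- function(items, keys, min = 1, max = 5) {
  items <- as.matrix(items)
  sapply(colnames(keys), function(scale) {
    key <- keys[, scale]
    used <- key != 0
    x <- items[, used, drop = FALSE]
    reversed <- key[used] == -1
    if (any(reversed)) {
      # Reverse negatively keyed items: 1 <-> 5, 2 <-> 4, and so on.
      x[, reversed] <- (min + max) - x[, reversed]
    }
    rowMeans(x)
  })
}

scores <- scoreScales(items, keys)
scores
```

Each row of scores is a respondent and each column a scale, which is the same shape of information that the post later draws on when marking a respondent's score on each scale.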
Run.R
source("main.R", echo = TRUE)
id <- NULL
exportReport <- function(x) {
  # Store the respondent id globally so the Rnw code chunks can see it
  id <<- x
  fileStem <- "Report_Template"
  # Make an individualised copy of the template in the output folder
  file.copy("Report_Template.Rnw",
            paste(".output/", fileStem, "_ID", id, ".Rnw", sep = ""),
            overwrite = TRUE)
  file.copy("Sweave.sty", ".output/Sweave.sty", overwrite = TRUE)
  # Run Sweave and LaTeX from inside the output folder
  setwd(".output")
  Sweave(paste(fileStem, "_ID", id, ".Rnw", sep = ""))
  tools::texi2dvi(paste(fileStem, "_ID", id, ".tex", sep = ""), pdf = TRUE)
  setwd("..")
}
- The above code provides the function to run Sweave on each individualised report.
- The code is a little bit messy, contains a few hacks, and is not especially robust.
- The exportReport function takes an id value as an argument x. Note the use of the alternative assignment operator <<- (see ?assignOps), which updates the id object defined outside the function.
- The code is designed to keep derived files away from source files by copying files into the .output folder and even changing the working directory to that directory.
- The code creates an individualised copy of the Rnw file, runs Sweave on the report to produce a tex file, and then runs texi2dvi with pdf=TRUE to produce the final pdf.
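Two of the points above can be illustrated in isolation: the effect of the <<- operator, and a somewhat more robust way of changing the working directory temporarily. This is a sketch only. The helpers setId and withWorkingDir are hypothetical names that do not appear in the project, and tempdir() stands in for the .output folder.

```r
# A global placeholder, as in Run.R.
id <- NULL

setId <- function(x) {
  # `<<-` assigns to the existing `id` in the enclosing environment
  # instead of creating a variable local to the function.
  id <<- x
  invisible(x)
}

# Hypothetical helper: evaluate an expression in another directory and
# restore the original working directory afterwards, even after an error.
withWorkingDir <- function(dir, expr) {
  old <- setwd(dir)
  on.exit(setwd(old), add = TRUE)
  force(expr)
}

setId(10)
before <- getwd()
fileName <- withWorkingDir(tempdir(),
                           paste("Report_Template", "_ID", id, ".Rnw", sep = ""))
after <- getwd()
fileName
```

Using on.exit means a failed Sweave or LaTeX run would not leave the R session stranded in the output folder, which is one of the ways the original function is not especially robust.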
Report_Template.Rnw
- The Rnw file contains interspersed chunks of LaTeX and R code.
- Because the Rnw file is called from within R, the R objects and data processing code do not need to be loaded at the start of the Rnw file. This approach is one way of reducing the time it takes to run a set of Sweave reports all based on a common data source.
- The \Sexpr{} command is used to incorporate in-line text (e.g., ... sample of \Sexpr{nrow(ipip)} students ...). In this example, it prints the actual number of cases in the ipip data.frame (i.e., the sample size).
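As a rough mental model of what \Sexpr{} does, the following toy emulation finds the expression, evaluates it, and substitutes the result as text. It is not how Sweave is implemented, it handles only one \Sexpr{} per line, and the 25-row ipip data frame is made up for the example.

```r
# Toy data standing in for the real ipip data frame (illustrative only).
ipip <- data.frame(id = 1:25)

texLine <- "... sample of \\Sexpr{nrow(ipip)} students ..."

# Regular expression for \Sexpr{...}; the content of the braces is captured.
pattern <- "\\\\Sexpr\\{([^}]*)\\}"

# Extract the R code, evaluate it, and convert the result to text.
rCode <- regmatches(texLine, regexec(pattern, texLine))[[1]][2]
value <- as.character(eval(parse(text = rCode)))

# Substitute the value back into the line.
result <- sub(pattern, value, texLine)
result
```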
Incorporating a figure using Sweave
\begin{figure}
<<plot_scale_distributions, fig=true>>=
plotScale <- function(ipipscale) {
  ggplot(ipip, aes_string(x = ipipscale["scale"])) +
    scale_x_continuous(limits = c(1, 5),
                       name = ipipscale["name"]) +
    scale_y_continuous(name = "", labels = "", breaks = 0) +
    geom_density(fill = "green", alpha = .5) +
    geom_vline(xintercept = ipip[ipip$id %in% id, ipipscale["scale"]],
               size = 1)
}
scaleplots <- apply(ipipscales, 1, function(X) plotScale(X))
arrange(scaleplots[[1]],
        scaleplots[[2]],
        scaleplots[[3]],
        scaleplots[[4]],
        scaleplots[[5]],
        ncol = 3)
@
\caption{Figures show distributions of scores of each personality factor
in the norm sample.
Higher scores mean greater levels of the factor.
The black vertical line indicates your score.}
\end{figure}
- The first R code chunk produces a figure using ggplot2.
- The code above takes a while to run (perhaps around 10 seconds on my machine). But the resulting plot is more attractive than what I could easily get with base graphics.
- <<plot_scale_distributions, fig=true>>= indicates the start of an R code chunk. fig=true lets Sweave know that it has to produce code to include a figure.
- The R code chunk is substituted with \includegraphics{Report_Template_ID10-plot_scale_distributions} in the tex file and the pdf and eps figures are created. Thus, if you want a float with captions and labels, you have to add them around the R code chunk.
- The plotScale function is used to generate a ggplot2 figure of the distribution of scores on each personality scale along with a marking of the respondent's score on each scale.
- The arrange function is used to lay out multiple ggplot2 figures on a single plot. The source code is in lib/vp.layout.R and was taken from a [post by Stephen Turner](http://gettinggeneticsdone.blogspot.com/2010/03/arrange-multiple-ggplot2-plots-in-same.html).
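One detail of the plotting code that may not be obvious is why ipipscale["scale"] and ipipscale["name"] work inside plotScale. The sketch below shows the mechanism with base R only. The two-row ipipscales data frame and its values are assumptions made for illustration, not the real metadata.

```r
# Toy scale metadata (illustrative only; the real file has five scales).
ipipscales <- data.frame(scale = c("extraversion", "agreeableness"),
                         name = c("Extraversion", "Agreeableness"),
                         stringsAsFactors = FALSE)

# apply() coerces the data frame to a character matrix and passes each row
# to the function as a named character vector, so X["scale"] and X["name"]
# select elements by column name.
labels <- apply(ipipscales, 1, function(X) {
  paste(X["scale"], "->", X["name"])
})
labels
```

In the report, the function applied to each row is plotScale, so the call builds one plot object per scale, and these are then passed to arrange.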
Preparing a formatted table in R for LaTeX
<<prepare_table>>=
ipiptable <- list()
ipiptable$colnames <- c("item", "scaleF", "text", "meanF",
"sdF", "is1F", "is2F", "is3F", "is4F", "is5F")
ipiptable$cells <- ipipsummary[,ipiptable$colnames ]
ipiptable$cells$item <- paste(ipiptable$cells$item, ".", sep="")
# assign actual responses to table
ipiptable$cells[, c("is1F", "is2F", "is3F", "is4F", "is5F")] <-
  sapply(1:5, function(X)
    ifelse(as.numeric(ipip[ipip$id %in% id, ipipmeta$variable]) == X,
           paste("*", ipiptable$cells[[paste("is", X, "F", sep = "")]], sep = ""),
           ipiptable$cells[[paste("is", X, "F", sep = "")]]))
ipiptable$cellsF <- as.matrix(ipiptable$cells)
ipiptable$cellsF <- ipiptable$cellsF[order(ipiptable$cellsF[, "scaleF"]), ]
ipiptable$row1 <- c("", "Scale", "Item Text",
"M", "SD", "VI\\%", "MI\\%", "N\\%", "MA\\%", "VA\\%")
ipiptable$table <- rbind(ipiptable$row1, ipiptable$cellsF)
ipiptable$tex <- paste(
  apply(ipiptable$table, 1, function(X) paste(X, collapse = " & ")),
  "\\\\")
for (i in c(41, 31, 21, 11, 1)) {
  ipiptable$tex <- append(ipiptable$tex, "\\midrule", after = i)
}
ipiptable$tex1 <- ipiptable$tex[c(1:34)]
ipiptable$tex2 <- ipiptable$tex[c(1,35:56)]
- I often find it useful to split R code chunks for table preparation and table presentation. In general this allows any text that appears before the table to include \Sexpr{} commands incorporating figures from the analyses which generate the table. In the present case, it was useful because the table was split over two pages.
- The code shows some of the general logic I use for customised table creation. In hindsight I could probably refactor it into a function so that I don't have to always type ipiptable, which would make things a little more concise.
- The general process of table creation involves:
(a) extracting information on cells, with cells often grouped into types which will receive common formatting treatment;
(b) formatting cells (e.g., rounding, decimals, and so on);
(c) assembling the cells, typically using a combination of the functions rbind and cbind;
(d) inserting tex column and end-of-row separators with something like: paste(apply(x, 1, function(X) paste(X, collapse = " & ")), "\\\\") where x is the matrix of table cells.
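Steps (a) to (d) can be sketched on a small self-contained example. The cells data frame and its values are invented for illustration; only the paste, apply, and append pattern comes from the code above.

```r
# Toy item statistics (illustrative only).
cells <- data.frame(item = 1:4,
                    scale = c("A", "A", "C", "C"),
                    mean = c(3.456, 2.1, 4.02, 3.3),
                    stringsAsFactors = FALSE)

# (a) and (b): extract the cells and format them as character strings.
cellsF <- cbind(item = paste(cells$item, ".", sep = ""),
                scale = cells$scale,
                mean = formatC(cells$mean, format = "f", digits = 2))

# (c): assemble the table by adding the header row.
tableMatrix <- rbind(c("", "Scale", "M"), cellsF)

# (d): join cells with " & " and end every row with a LaTeX row break.
tex <- paste(apply(tableMatrix, 1, function(X) paste(X, collapse = " & ")),
             "\\\\")

# Insert rules after the header and between the two scales, working from
# the bottom up so that earlier insertions do not shift later positions.
for (i in c(3, 1)) {
  tex <- append(tex, "\\midrule", after = i)
}

cat(tex, sep = "\n")
```

Inserting the rules from the bottom up is the same reason the loop in the report code runs through 41, 31, 21, 11, 1 in descending order.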
Don't Repeat Yourself Principle using R and \Sexpr{}
ipiptable$caption <-
"Response options were
1 = (V)ery (I)naccurate,
2 = (M)oderately (I)naccurate,
3 = (N)either Inaccurate nor Accurate,
4 = (M)oderately (A)ccurate,
5 = (V)ery (A)ccurate.
Thus, VI\\\\% indicates the percentage of the norm sample
giving a response indicating that the item is a Very Inaccurate
description of themselves.
Your response is indicated with an asterisk (*)."
- This text was used in both tables. Thus, this text can then be called using \Sexpr{ipiptable[["caption"]]}. This follows the DRY principle (Don't Repeat Yourself). Thus, if the caption needs to be modified, it only needs to be modified in one place.
Incorporating the tex formatted table using R Code chunks
\begin{table}
\begin{adjustwidth}{-1cm}{-1cm}
\caption{Table of results for (A)greeableness, (C)onscientiousness
and (E)motional (S)tability items.
\Sexpr{ipiptable[["caption"]]}}
\begin{center}
\begin{tabular}{rrp{4cm}rrrrrrr}
\toprule
<<table_part1, results=tex>>=
cat(ipiptable$tex1, sep="\n")
@
\bottomrule
\end{tabular}
\end{center}
\end{adjustwidth}
\end{table}
- The tables are then incorporated into the tex file.
- The R code only generated some of the required tex for the table. Thus all the other desired elements such as the table environment and captions are written either side of the R code chunk.
- The R code chunk uses the option results=tex in order to enter the output from the cat function verbatim into the resulting tex file.
- cat(ipiptable$tex1, sep="\n") writes out a vector of tex lines, with the newline separator simply making the resulting tex more readable.
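The reason cat is used rather than printing the vector can be seen by capturing both kinds of output. This is a small sketch; texLines is a made-up vector standing in for ipiptable$tex1.

```r
# A made-up vector of tex lines (illustrative only).
texLines <- c("1. & A & 3.46 \\\\", "\\midrule", "2. & C & 4.02 \\\\")

# cat() writes the strings unmodified, one per line, with no quotes and no
# index prefix. This is the form a results=tex chunk needs.
catOutput <- capture.output(cat(texLines, sep = "\n"))

# print() shows the R representation instead, with an index prefix, quotes,
# and escaped backslashes, which would not be valid tex.
printOutput <- capture.output(print(texLines[2]))

catOutput
printOutput
```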