This post documents an example of using Sweave
to generate individualised personality reports based on
responses to a personality test.
Each report provides information on both the responses of the general
sample and responses of the specific respondent.
All source code is provided, and selected aspects are discussed,
including makefiles, use of \Sexpr, figures, and LaTeX tables using Sweave.
Overview
All source code is available on GitHub:
Three examples of compiled PDF reports can be viewed as follows: ID1, ID2, and ID4.
The resulting report is a simple proof of concept example.
Discussion of Source Code
makefile
outputDir = .output
backupDir = .backup
test:
	-mkdir $(outputDir)
	Rscript --verbose run1test.R
test5:
	-mkdir $(outputDir)
	Rscript --verbose run5test.R
runall:
	-mkdir $(outputDir)
	Rscript --verbose runAll.R
clean:
	-rm $(outputDir)/*
backup:
	-mkdir $(backupDir)
	cp $(outputDir)/Report_Template_ID*[0123456789].pdf --target-directory=$(backupDir)
- outputDir stores the name of the folder used to store derived files (e.g., tex files, images, and compiled document PDFs).
- backupDir stores the name of the folder where the finished document PDFs are to be stored.
- test: is the default goal. Running make in the project directory will run run1test.R, which will build one report.
- The --verbose option shows the progress of R when run as a script. It's useful for seeing progress and debugging.
- test5: compiles five reports.
- runall: compiles all reports. Further information on each of the run scripts (run1test.R, run5test.R, and runAll.R) can be obtained by inspecting these files. In general they source Run.R and specify which ids to run reports on.
- clean: removes all the files from the output directory (i.e., all the derived files).
- backup: copies reports to the backup folder; i.e., it separates the finished documents from all the other derived files.
- To run test5, runall, etc., type make test5 or make runall, etc.
main.R
main.R loads external functions and packages, imports data,
imports metadata, and processes the data.
# Import Data
ipip <- read.delim("data/ipip.tsv")
ipipmeta <- read.delim("meta/ipipmeta.tsv")
ipipscales <- read.delim("meta/ipipscales.tsv")
- When importing the data, I have adopted the useful convention (which I observed from John Myles White's ProjectTemplate package) of giving objects the same names as their data files. The file extension also clearly indicates the file format (i.e., tab-separated values).
- I often have separate data and meta folders. Importing metadata often makes for more manageable code than hard coding it into the R script.
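The naming convention can be sketched as a small helper that reads every .tsv file in a folder and names each result after its file stem. This is only an illustration: importTsvDir is a hypothetical name that appears neither in the project source nor in ProjectTemplate, and the demonstration data are made up.

```r
# Hypothetical helper (not part of the project source): read every .tsv file
# in a directory into a list whose element names match the file stems.
importTsvDir <- function(dir) {
    files <- list.files(dir, pattern = "\\.tsv$", full.names = TRUE)
    objs <- lapply(files, read.delim)
    names(objs) <- tools::file_path_sans_ext(basename(files))
    objs
}

# Demonstration using a throwaway directory and made-up data
demoDir <- file.path(tempdir(), "demo_data")
dir.create(demoDir, showWarnings = FALSE)
write.table(data.frame(id = 1:3, x = c(2, 4, 5)),
            file.path(demoDir, "ipip.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)

imported <- importTsvDir(demoDir)
names(imported)  # element names follow the file names
```

With only three files, the explicit read.delim calls used in main.R are arguably clearer; a helper like this mainly pays off as the number of data files grows.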
Test scores are calculated using the function score.items.
ipipstats <- psych::score.items(ipipmeta[,ipipscales$scale],
                                ipip[,ipipmeta[,"variable"]],
                                min = 1, max = 5)
- The psych package has a number of useful functions for psychological research. score.items is particularly good. It enables the creation of means and totals for multiple scales. It handles item reversal. It also returns information related to the reliability of the scales.
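To make item reversal concrete, here is a minimal base R sketch of what scoring a single scale involves. It is not how psych implements scoring internally, and it reports no reliability information; scoreScale, the responses, and the key are all made up for illustration.

```r
# Made-up responses to three items on a 1-5 scale
items <- data.frame(q1 = c(1, 4, 5),
                    q2 = c(5, 2, 1),
                    q3 = c(2, 4, 4))

# Scoring key: 1 = positively keyed item, -1 = reverse keyed item
key <- c(q1 = 1, q2 = -1, q3 = 1)

# Hypothetical helper: reverse the negatively keyed items, then average
scoreScale <- function(items, key, min = 1, max = 5) {
    scored <- items
    reverseKeyed <- names(key)[key < 0]
    # A response x on a reversed item becomes (max + min) - x
    scored[reverseKeyed] <- (max + min) - items[reverseKeyed]
    rowMeans(scored[names(key)])
}

scores <- scoreScale(items, key)
scores
```

For the second respondent, the reversed q2 response becomes 6 - 2 = 4, so the scale mean is 4.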
Run.R
source("main.R", echo = TRUE)
id <- NULL
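exportReport <- function(x) {
    # Store the id outside the function so the Rnw code chunks can use it
    id <<- x
    fileStem <- "Report_Template"
    # Create an individualised copy of the template in the output folder
    file.copy("Report_Template.Rnw",
              paste(".output/", fileStem, "_ID", id, ".Rnw", sep =""),
              overwrite = TRUE)
    file.copy("Sweave.sty", ".output/Sweave.sty", overwrite = TRUE)
    # Work inside the output folder so derived files stay away from the source
    setwd(".output")
    Sweave(paste(fileStem, "_ID", id, ".Rnw", sep =""))
    tools::texi2dvi(paste(fileStem, "_ID", id, ".tex", sep =""), pdf = TRUE)
    setwd("..")
}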
- The above code provides the function to run Sweave on each individualised report.
- The code is a little bit messy, contains a few hacks, and is not especially robust.
- The exportReport function takes an id value as an argument x. Note the use of the alternative assignment operator <<- (see ?assignOps). It assigns to the id variable defined outside the function, which the R code chunks in the Rnw file then use.
- The code is designed to keep derived files away from source files by copying files into the .output folder and even changing the working directory to that directory.
- The code creates an individualised copy of the Rnw file, runs Sweave on the report to produce a tex file, and then runs texi2dvi with pdf=TRUE to produce the final pdf.
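The behaviour of the <<- operator can be seen in isolation with a small sketch. The function names setIdLocal and setIdOuter are hypothetical and exist only for this demonstration.

```r
id <- NULL

# Ordinary assignment creates a local variable that vanishes on return
setIdLocal <- function(x) {
    id <- x
    invisible(NULL)
}

# <<- instead modifies the id variable found in the enclosing environment
setIdOuter <- function(x) {
    id <<- x
    invisible(NULL)
}

setIdLocal(4)
afterLocal <- id   # still NULL: the outer id was not touched

setIdOuter(4)
afterOuter <- id   # now 4: the outer id was updated
```

Relying on a shared variable like this keeps the Rnw template simple, at the cost of hidden state. That is one of the hacks mentioned above.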
Report_Template.Rnw
- The Rnw file contains interspersed chunks of LaTeX and R code.
- Because the Rnw file is called from within R, the R objects do not need to be created, and the data processing code does not need to be run, at the start of the Rnw file. This approach is one way of reducing the time it takes to run a set of Sweave reports all based on a common data source.
- The \Sexpr{} command is used to incorporate in-line text (e.g., ... sample of \Sexpr{nrow(ipip)} students ...). In this example, it prints the actual number of cases in the ipip data.frame (i.e., the sample size).
Incorporating a figure using Sweave
\begin{figure}
<<plot_scale_distributions, fig=true>>=
plotScale <- function(ipipscale) {
    ggplot(ipip, aes_string(x=ipipscale["scale"])) +
        scale_x_continuous(limits=c(1, 5),
                           name = ipipscale["name"]) +
        scale_y_continuous(name = "", labels ="", breaks = 0) +
        geom_density(fill="green", alpha = .5) +
        geom_vline(xintercept = ipip[ipip$id %in% id, ipipscale["scale"]],
                   size=1)
}
scaleplots <- apply(ipipscales, 1, function(X) plotScale(X))
arrange(scaleplots[[1]],
        scaleplots[[2]],
        scaleplots[[3]],
        scaleplots[[4]],
        scaleplots[[5]],
        ncol=3)
@
\caption{Figures show distributions of scores of each personality factor
in the norm sample.
Higher scores mean greater levels of the factor.
The black vertical line indicates your score.}
\end{figure}
- The first R code chunk produces a figure using ggplot2.
- The code above takes a while to run (perhaps around 10 seconds on my machine). But the resulting plot is more attractive than what I could easily get with base graphics.
- <<plot_scale_distributions, fig=true>>= indicates the start of an R code chunk. fig=true lets Sweave know that it has to produce code to include a figure.
- The R code chunk is substituted with \includegraphics{Report_Template_ID10-plot_scale_distributions} in the tex file, and the pdf and eps figures are created. Thus, if you want a float with captions and labels, you have to add them around the R code chunk.
- The plotScale function is used to generate a ggplot2 figure of the distribution of scores on each personality scale, along with a marking of the respondent's score on each scale.
- The arrange function is used to lay out multiple ggplot2 figures on a single plot. The source code is in lib/vp.layout.R and was taken from a [post by Stephen Turner](http://gettinggeneticsdone.blogspot.com/2010/03/arrange-multiple-ggplot2-plots-in-same.html).
Preparing a formatted table in R for LaTeX
<<prepare_table>>=
ipiptable <- list()
ipiptable$colnames <- c("item", "scaleF", "text", "meanF",
"sdF", "is1F", "is2F", "is3F", "is4F", "is5F")
ipiptable$cells <- ipipsummary[,ipiptable$colnames ]
ipiptable$cells$item <- paste(ipiptable$cells$item, ".", sep="")
# assign actual responses to table
ipiptable$cells[,c("is1F", "is2F", "is3F", "is4F", "is5F")] <-
    sapply(1:5, function(X)
        ifelse(as.numeric(ipip[ipip$id %in% id, ipipmeta$variable]) == X,
               paste("*", ipiptable$cells[[paste("is", X, "F", sep ="")]], sep =""),
               ipiptable$cells[[paste("is", X, "F", sep ="")]]))
ipiptable$cellsF <- as.matrix(ipiptable$cells)
ipiptable$cellsF <- ipiptable$cellsF[order(ipiptable$cellsF[, "scaleF"]), ]
ipiptable$row1 <- c("", "Scale", "Item Text",
                    "M", "SD", "VI\\%", "MI\\%", "N\\%", "MA\\%", "VA\\%")
ipiptable$table <- rbind(ipiptable$row1, ipiptable$cellsF)
ipiptable$tex <- paste(
    apply(ipiptable$table, 1, function(X) paste(X, collapse = " & ")),
    "\\\\")
for(i in c(41, 31, 21, 11, 1)) {
    ipiptable$tex <- append(ipiptable$tex, "\\midrule", after=i)
}
ipiptable$tex1 <- ipiptable$tex[c(1:34)]
ipiptable$tex2 <- ipiptable$tex[c(1,35:56)]
- I often find it useful to split R code chunks for table preparation and table presentation. In general this allows any text that appears before the table to include \Sexpr{} commands incorporating figures from the analyses which generate the table. In the present case, it was useful because the table was split over two pages.
- The code shows some of the general logic I use for customised table creation. In hindsight I could probably refactor it into a function so that I don't have to always type ipiptable, which would make things a little more concise.
- The general process of table creation involves:
(a) extracting information on cells, with cells often grouped into types which will receive common formatting treatment;
(b) formatting cells (e.g., rounding, decimals, and so on);
(c) assembling the cells, typically using a combination of the functions rbind and cbind;
(d) inserting tex column and end-of-row separators with something like paste(apply(x, 1, function(X) paste(X, collapse = " & ")), "\\\\") where x is the matrix of table cells.
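The four steps can be sketched end to end on a tiny made-up table. The item text and statistics below are placeholders rather than values from the ipip data, and the object names are chosen only for this illustration.

```r
# (a) extract the cells: made-up summary statistics for two items
cells <- data.frame(item = c("1.", "2."),
                    text = c("First item", "Second item"),
                    mean = c(2.654, 1.987),
                    sd = c(1.204, 0.95),
                    stringsAsFactors = FALSE)

# (b) format the numeric cells to two decimal places
cells$mean <- formatC(cells$mean, format = "f", digits = 2)
cells$sd <- formatC(cells$sd, format = "f", digits = 2)

# (c) assemble the header row and the body into one character matrix
header <- c("", "Item Text", "M", "SD")
tab <- rbind(header, as.matrix(cells))

# (d) insert tex column separators and end-of-row markers
tex <- paste(apply(tab, 1, function(X) paste(X, collapse = " & ")), "\\\\")

cat(tex, sep = "\n")
```

Each element of tex is one row of the tabular body, ready to be written into the tex file from a results=tex chunk.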
Don't Repeat Yourself Principle using R and \Sexpr{}
ipiptable$caption <-
"Response options were
1 = (V)ery (I)naccurate,
2 = (M)oderately (I)naccurate,
3 = (N)either Inaccurate nor Accurate,
4 = (M)oderately (A)ccurate,
5 = (V)ery (A)ccurate.
Thus, VI\\\\% indicates the percentage of the norm sample
giving a response indicating that the item is a Very Inaccurate
description of themselves.
Your response is indicated with an asterisk (*)."
- This text was used in both tables. Thus, this text can then be called using \Sexpr{ipiptable[["caption"]]}. This follows the DRY principle (Don't Repeat Yourself): if the caption needs to be modified, it only needs to be modified in one place.
Incorporating the tex formatted table using R Code chunks
\begin{table}
\begin{adjustwidth}{-1cm}{-1cm}
\caption{Table of results for (A)greeableness, (C)onscientiousness
and (E)motional (S)tability items.
\Sexpr{ipiptable[["caption"]]}}
\begin{center}
\begin{tabular}{rrp{4cm}rrrrrrr}
\toprule
<<table_part1, results=tex>>=
cat(ipiptable$tex1, sep="\n")
@
\bottomrule
\end{tabular}
\end{center}
\end{adjustwidth}
\end{table}
- The tables are then incorporated into the tex file.
- The R code only generated some of the required tex for the table. Thus all the other desired elements, such as the table environment and captions, are written either side of the R code chunk.
- The R code chunk uses the option results=tex in order to enter the output from the cat function verbatim into the resulting tex file.
- cat(ipiptable$tex1, sep="\n") writes out a vector of tex lines, with the newline separator simply making the resulting tex more readable.