ProjectTemplate package.
Update (24th August 2016)
Over the last two years, I have been refining this customised version of ProjectTemplate.
I have more detailed information about the latest version here.
Video at Melbourne R Users July 4th 2017
Overview of ProjectTemplate
ProjectTemplate is an R package which facilitates data analysis, encourages good data analysis habits, and standardises many data analytic steps. After many years of refining a data analysis workflow in R, I realised that I'd basically converged on something similar to ProjectTemplate anyway. However, my approach was not quite as systematic, and it took more effort than necessary to get started on a new project. Thus, since late 2013, I've been using ProjectTemplate to organise my R data analysis projects.
While I have found ProjectTemplate to be an excellent tool, I realised that when I created a new data analysis project based on ProjectTemplate, I was repeatedly making a large number of customisations to the initial set of files and folders. Thus, I've now set up a repository to store these customisations so that I can get started on a new data analysis project more efficiently. The purpose of this post is to document these modifications.
This post assumes a reasonable knowledge of R and ProjectTemplate. If you're not familiar with ProjectTemplate, you could check out the ProjectTemplate website, focusing particularly on the Getting Started section. If you're really keen, you could also watch an hour-long video on ProjectTemplate, RStudio, and GitHub.
General setup
I have a copy of my customised version of the ProjectTemplate directory and file structure on GitHub in the AnglimModifiedProjectTemplate repository. Specifically, it has:
- modifications to `global.dcf` as described below
- a blank `readme.md`
- a couple of directories removed that I don't use (e.g., `diagnostics`, `logs`, `profiling`)
- an initial `rmd` file in the `reports` directory with the customisations mentioned below
- an `.Rproj` RStudio project file to enable easy launching of RStudio
- an additional `output` directory for storing tabular, text, and other output
Thus, after creating a project folder, the following steps can be skipped when using my customised template.
- Open RStudio and create an RStudio project in the existing directory
- Create the ProjectTemplate folder structure with `library(ProjectTemplate); create.project()`
- Move the ProjectTemplate files into the folder
- Modify `global.dcf`
- Set up rmd reports
- Set up the data directory
- Update the readme file
- Set up the git repository
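As a rough illustration of what using the template looks like in place of those steps, here is a minimal base R sketch that copies a local copy of the template into a new project folder. The folder names (`my-new-analysis` and the location of the template) are hypothetical, and a tiny stand-in template is created in a temporary directory so that the code runs on its own; in practice the source would be a downloaded or cloned copy of the repository.

```r
# Stand-in for a local copy of the customised template (hypothetical path).
template_dir <- file.path(tempdir(), "AnglimModifiedProjectTemplate")
for (d in c("config", "data", "reports", "output")) {
  dir.create(file.path(template_dir, d), recursive = TRUE, showWarnings = FALSE)
}
writeLines("as_factors: off", file.path(template_dir, "config", "global.dcf"))

# Create the new project folder (hypothetical name).
new_project <- file.path(tempdir(), "my-new-analysis")
dir.create(new_project, showWarnings = FALSE)

# Copy every top-level file and folder of the template into the new project.
copied <- file.copy(
  from = list.files(template_dir, full.names = TRUE),
  to = new_project,
  recursive = TRUE
)

# Inspect what ended up in the new project.
print(list.files(new_project, recursive = TRUE, include.dirs = TRUE))
```

Copying the files rather than cloning the repository keeps the new project free of the template's own git history, which is one possible reason to prefer this approach.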
Modifying global.dcf
My preferred starting `global.dcf` settings are:

    data_loading: on
    cache_loading: off
    munging: on
    logging: off
    load_libraries: on
    libraries: psych, lattice, Hmisc
    as_factors: off
    data_tables: off

A little explanation:
- `as_factors`: I do quite a bit of string processing, particularly on meta data and on output tables. I find the automatic conversion of strings into factors to be a really annoying feature. Thus, setting this to `off` is my preferred setting.
- `load_libraries`: I always have additional libraries so it makes sense to have this `on`.
- `libraries`: There are many common packages that I use, but I almost always make use of the above comma-separated list of packages.
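ProjectTemplate reads `global.dcf` itself when the project is loaded, so nothing extra is needed to apply these settings. Purely as a way of checking a configuration by hand, the sketch below uses base R's `read.dcf()` to parse the same settings. Writing them to a temporary file is an assumption made so that the example is self-contained; in a real project the file would be `config/global.dcf`.

```r
# The settings from the post, written to a temporary file for illustration.
dcf_lines <- c(
  "data_loading: on",
  "cache_loading: off",
  "munging: on",
  "logging: off",
  "load_libraries: on",
  "libraries: psych, lattice, Hmisc",
  "as_factors: off",
  "data_tables: off"
)
config_path <- tempfile(fileext = ".dcf")
writeLines(dcf_lines, config_path)

# read.dcf() returns a character matrix: one row per record, one column per field.
config <- read.dcf(config_path)
settings <- setNames(as.character(config[1, ]), colnames(config))

# Split the comma-separated package list into a character vector.
packages <- trimws(strsplit(settings[["libraries"]], ",")[[1]])

print(settings[["as_factors"]])
print(packages)
```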
Setup rmd files
Basics of such files
The first line in the first chunk is always:
```{r}
library(ProjectTemplate); load.project()
```
This loads everything required to get started with the project.
Setup data folder
ProjectTemplate automatically names the resulting data.frames based on the file name. This is convenient. However, the file names of raw data as supplied often need to be changed, or the original data format may not be well suited for importing. In that case, I store the raw data in a separate folder called `raw-data` and then export or create a copy in the desired format with the desired name in the `data` folder.
Overriding default data import options
Some data files cannot be imported using the default data import rules. Of course, you can change the file to comply with the rules. Alternatively, I think the standard solution is to add a file in the `lib` directory (e.g., `data-override.r`) that imports the data files. Give the imported data the same name that ProjectTemplate would.
Update readme
I rename the file to README.md to make it clear that it is a markdown-formatted file. I can then add a little information about the project.
Setup git repository
If using GitHub, I create a new repository on GitHub.
Output folder
A common workflow for me is to generate tables, text, and figure output from the script which is then incorporated into a manuscript document. While I really like Sweave and RMarkdown, I often find it more practical to write a manuscript in Microsoft Word. I use the `output` folder to store tabular output, standard text output, and figures.
In the case of tabular output, there is the task of ensuring the table is formatted appropriately (e.g., desired number of decimal places, cell alignment, cell borders, font, cell merging, etc.). I typically find this easiest to do in Excel. Thus, I have a file called `output-processing.xlsx`. I import the tabular data into this file and apply relevant formatting. This can then be incorporated into the manuscript. Here are a few more notes about Table conversion in MS Word.
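To make the first half of that workflow concrete, here is a minimal sketch of writing tabular and text output into an `output` folder using base R only. The file names (`descriptives.csv`, `model-summary.txt`) and the use of the built-in `mtcars` data are illustrative assumptions, and a temporary directory stands in for the project's `output` folder.

```r
# A temporary directory stands in for the project's output folder.
output_dir <- file.path(tempdir(), "output")
dir.create(output_dir, showWarnings = FALSE)

# Example table: means and standard deviations for two built-in variables.
vars <- c("mpg", "wt")
desc <- data.frame(
  variable = vars,
  mean = sapply(mtcars[vars], mean),
  sd = sapply(mtcars[vars], sd),
  row.names = NULL
)

# Write the table as CSV so it can be opened in Excel and formatted there.
table_path <- file.path(output_dir, "descriptives.csv")
write.csv(desc, table_path, row.names = FALSE)

# Plain text output, such as a model summary, can go in the same folder.
text_path <- file.path(output_dir, "model-summary.txt")
capture.output(summary(lm(mpg ~ wt, data = mtcars)), file = text_path)
```

Leaving the values unrounded in the CSV fits the approach described above, where decimal places and other formatting are applied afterwards in the spreadsheet.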