Jeromy Anglim's Blog: Psychology and Statistics


Showing posts with label Literate programming. Show all posts

Tuesday, February 23, 2010

Getting Started with Sweave: R, LaTeX, Eclipse, StatET, & TeXlipse

Being able to press a single button that runs all your statistical analyses and integrates the output into your final report is a beautiful thing. If you have not already heard, this is what Sweave can do for you. However, getting Sweave running on your computer can be a little fiddly. This post therefore: (1) sets out the benefits of Sweave; (2) explains how to install and configure R, Sweave, and Eclipse on Windows; (3) lists resources for people wanting to learn more about using LaTeX and Sweave; and (4) lists resources specifically relevant to psychology researchers wanting to use these tools.
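To make the "single button" idea concrete, here is a minimal sketch of a Sweave round trip that uses only base R (`Sweave()` ships in the `utils` package). The file name `demo.Rnw`, the helper `run_sweave_demo()`, and the example scores are hypothetical. The sketch stops at the generated `.tex` file; turning that into a PDF would additionally require a LaTeX installation.

```r
# Minimal Sweave round trip, assuming only base R (Sweave is in the utils package).
# The file name, helper name, and data below are illustrative assumptions.
run_sweave_demo <- function(dir = tempdir()) {
  old_wd <- setwd(dir)          # Sweave writes its .tex output to the working directory
  on.exit(setwd(old_wd))

  # A tiny .Rnw file: LaTeX text, one hidden R code chunk, and one inline \Sexpr{}
  writeLines(c(
    "\\documentclass{article}",
    "\\begin{document}",
    "<<data, echo=FALSE>>=",
    "scores <- c(12, 15, 11, 18, 14)",
    "@",
    "The mean score was \\Sexpr{mean(scores)}.",
    "\\end{document}"
  ), "demo.Rnw")

  utils::Sweave("demo.Rnw", quiet = TRUE)  # runs the R code and writes demo.tex
  readLines("demo.tex")                    # return the generated LaTeX lines
}

tex <- run_sweave_demo()
cat(tex, sep = "\n")
```

If the data in the chunk change, rerunning Sweave updates the number in the sentence automatically, which is the whole appeal: the reported value can never drift out of sync with the analysis.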

Monday, September 21, 2009

Linking text, results, and analyses: Increasing transparency and efficiency

I have recently been thinking about the relationship between text in a final report and data analysis. The broader concern is with making the conduct and reporting of statistical analyses more transparent. I am inspired by the ideas of literate programming, Sweave, and open access to data.

Something to aspire to:
  • Raw data is shared (ethics, copyright, and other considerations permitting).
  • Code is shared that shows how the data was imported, transformed, and analysed. This code is well written, commented, and documented.
  • The report is freely available rather than locked behind a paid subscription.
  • Report output, including tables, figures, and some text, is linked directly to the analyses in code.

While these aspirations transcend R, I like the prospect of integrating analyses in R with a final report. Including tables and figures is, at least conceptually, straightforward. However, including text in a results section is a little fuzzier. Text in a results section (I'll call it "results text" for short) surely varies in how closely it relates to the actual analyses. Thus, I had the following questions: (1) What is the unit of results text? (2) How does results text vary, and what should be supplied automatically by R? (3) For results text that should not be supplied by R, how should it be integrated into the analysis process?
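One possible answer to question (2), sketched under my own assumptions: let R supply the numbers in a sentence (test statistic, degrees of freedom, p-value) while the author writes the interpretive wording around them. The helper `report_t_test()` and the example data are hypothetical illustrations, not an established package function; only `t.test()` and `sprintf()` are standard R.

```r
# Hypothetical helper: format a t.test() result as a fragment of results text.
# Only the statistics come from R; the interpretive wording stays with the author.
report_t_test <- function(tt) {
  stopifnot(inherits(tt, "htest"))       # expects an object returned by t.test()
  sprintf("t(%.0f) = %.2f, p = %.3f",
          tt$parameter, tt$statistic, tt$p.value)
}

# Illustrative (made-up) data for two groups
treatment <- c(12, 15, 11, 18, 14)
control   <- c(10, 9, 13, 11, 12)
fit <- t.test(treatment, control, var.equal = TRUE)   # pooled-variance t-test

# The sentence the author writes; the statistics are filled in from `fit`
results_text <- paste0(
  "The treatment group scored higher than the control group, ",
  report_t_test(fit), "."
)
cat(results_text, "\n")
```

In a Sweave document the same helper could be called inline as `\Sexpr{report_t_test(fit)}`, so the author-written part of the sentence sits in the LaTeX source while the numeric part is regenerated on every run.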

Initial thoughts: After a little reflection, I arrived at the following: