Overview:
Leland Wilkinson wrote an influential book called The Grammar of Graphics. Wilkinson argues that just as language has a grammar, so do graphics. Hence, just as training in grammar helps us express ourselves verbally, understanding the grammar of graphics helps us express ourselves graphically (see ggplot2 for an R implementation of the Grammar of Graphics).
This got me thinking: what is the Grammar of Tables? What follows are my thoughts on the conceptual structure of a table. My aim is to refine a set of concepts that would help me (and possibly others) design better tables, particularly for journal articles in psychology. I'm also starting to program customised tables in R, which requires greater conceptual clarity than relying on standard software to produce tables.
Grammar of Tables:
- A table is a matrix of cells arranged in rows and columns. Any cell can be referenced by its (row, column) position, where the top-left cell is (1, 1) and the bottom-right cell is (number of rows, number of columns).
- In APA style at least, a table section includes a Table Caption (e.g., Table 1), a Table Title (e.g., "Means, Standard Deviations, and Intercorrelations for Predictor Variables"), the Table itself, and optionally a Note (a note that applies to the whole table) and a Specific Note (i.e., a note that applies to a particular part of the table, typically referenced by superscript symbols such as a, b, and c).
- A Table Description is the accompanying text that refers to the table and discusses the results it presents.
- Cells in a table can be empty cells or content cells. Empty cells contain no content and content cells contain content, such as text or numbers.
- Content cells contain either names or values. Names label rows and columns (i.e., they identify the levels of a factor). Values report the quantity for a particular combination of row and column levels.
- Rows and columns are defined by factors, each with a set of levels. One way to describe a table's structure borrows from the notation for experimental ANOVA designs: (ROW FACTORS) x (COLUMN FACTORS).
- Some examples:
  - 2 x 2: e.g., counts of gender by age group (older versus younger)
  - (2 x 2) x 3: e.g., gender and condition (treatment versus control) by three statistics (mean, sd, and n)
- Factor types describe the different ways that factors can be divided into levels. These include:
  - Variable Set: Each row or column represents a different variable. Examples include correlation matrices and tables presenting means on a series of variables.
  - Categorical Variable: Each row or column represents a different level of a categorical variable.
  - Statistics Set: Each row or column represents a different statistic (in the broad sense of the word), for example, mean, sd, n, correlation, beta coefficient, p-value, or t value.
- Levels can be:
  - Ordered: Levels of a factor are always presented in some order, even if that order is arbitrary, and the order should aid comprehension of the table. If the factor is ordinal, levels will typically be presented from lowest to highest or from highest to lowest. Levels of a row factor can also be ordered by the values in one of the columns.
  - Grouped: Grouping can be explicit or implicit. Explicit grouping can be represented in various ways, including lines between groups, a subheading marking each group, and additional space between groups. Implicit grouping occurs when the levels are ordered by the grouping dimension without any explicit marker.
  - Removed: Levels of a factor can be removed from the table.
- Tables and APA Style: See the Publication Manual of the APA for further discussion of table presentation. Docstyles also discusses APA style for tables.
- reshape: The reshape package for R has an interesting set of ideas about restructuring data that are relevant to thinking about tables (see here for an article on the package).
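To make a few of these concepts more concrete, here is a minimal Python sketch (not R, and not how reshape itself works). It assumes the data start in long format, with one record per value, similar in spirit to the "molten" data that reshape works with. The hypothetical helper `build_table` casts those records into the (ROW FACTORS) x (COLUMN FACTORS) layout: name cells for the levels, a value cell for each combination of row and column levels, and empty cells (represented as `None`) in the top-left corner. Another hypothetical helper, `cell`, uses the 1-based (row, column) referencing described above. Each factor's list of levels sets its order, and leaving a level out removes it. The summary statistics in the example are made up purely for illustration.

```python
from itertools import product


def build_table(records, row_factors, col_factors, levels, value_key="value"):
    """Cast long-format records into a grid of cells (hypothetical helper).

    records: list of dicts mapping each factor name to a level, plus value_key.
    row_factors, col_factors: factor names, outermost first.
    levels: dict mapping each factor name to its levels in display order;
            leaving a level out removes it from the table.
    Empty cells are represented as None.
    """
    # Every combination of row levels (and of column levels), in display order.
    row_keys = list(product(*(levels[f] for f in row_factors)))
    col_keys = list(product(*(levels[f] for f in col_factors)))

    # Index each value by its (row levels, column levels) combination.
    lookup = {}
    for rec in records:
        rk = tuple(rec[f] for f in row_factors)
        ck = tuple(rec[f] for f in col_factors)
        lookup[(rk, ck)] = rec[value_key]

    grid = []
    # Header rows hold name cells for column levels; the top-left block is empty.
    for depth in range(len(col_factors)):
        grid.append([None] * len(row_factors) + [ck[depth] for ck in col_keys])
    # Body rows: name cells for row levels, then one value cell per column.
    # A combination with no matching record becomes an empty cell.
    for rk in row_keys:
        grid.append(list(rk) + [lookup.get((rk, ck)) for ck in col_keys])
    return grid


def cell(grid, row, col):
    """Return the cell at 1-based (row, col); (1, 1) is the top-left cell."""
    return grid[row - 1][col - 1]


# A (2 x 2) x 3 table: gender and condition by mean, sd, and n.
# The numbers are hypothetical, purely for illustration.
stats = {
    ("female", "treatment"): (5.1, 1.2, 30),
    ("female", "control"): (4.3, 1.1, 28),
    ("male", "treatment"): (4.9, 1.4, 31),
    ("male", "control"): (4.0, 1.3, 29),
}
# Long format: one record per value, tagged with every factor's level.
records = [
    {"gender": g, "condition": c, "statistic": s, "value": v}
    for (g, c), values in stats.items()
    for s, v in zip(("mean", "sd", "n"), values)
]
levels = {
    "gender": ["female", "male"],
    "condition": ["treatment", "control"],
    "statistic": ["mean", "sd", "n"],
}

table = build_table(records, ["gender", "condition"], ["statistic"], levels)
for row in table:
    print(row)
```

Running this prints a 5 x 5 grid whose first row is `[None, None, 'mean', 'sd', 'n']`, so `cell(table, 1, 1)` is an empty cell and `cell(table, 5, 5)` is the n for males in the control condition. Reordering a level list (e.g., `["control", "treatment"]`) reorders the rows, and dropping `"sd"` from the statistic levels removes that column. Because gender is the outer row factor, the rows end up implicitly grouped by gender; explicit grouping (lines, subheadings, or extra space) is a presentation step this sketch does not attempt.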
NOTE: This is a work in progress, which I will refine over time.