Jeromy Anglim's Blog: Psychology and Statistics

Tuesday, August 4, 2020

A Publication Workflow for Organising Files and Directories

The following describes my workflow for publishing journal articles. It defines a set of rules for organising the files and directories associated with writing and publishing a peer-reviewed journal article.
It covers issues of file organisation, version control, and collaboration. It embodies a number of lessons that I've learnt while publishing journal articles. It also helps to have a standardised approach.

In this context, the project is the publication of a journal article.

Short Project Name
Every project needs a short name that uniquely identifies the project. This name is used in several settings including the parent directory name, the data analysis directory name, the manuscript name, and when talking to colleagues about the project. 

A good project name is short, descriptive, and uniquely identifies the project. Two words is usually best, but three words is okay. 8 to 15 characters is usually about right. It's a little bit like thinking of a good running head, but even shorter and more for private consumption. 

Examples: Some recent short project names for my papers include: "hexaco-ei", "hexaco-wellbeing", "employee-facets", "hexaco-applicants", "subtask-learning", and "dynamic-wellbeing".

Project names can be bad for a range of reasons. 
  • Too long. This makes file names hard to read. It makes it difficult to talk to colleagues about the paper. It makes it more mentally taxing to think about the project by name.
  • Conflicts with other projects. If you have multiple projects in an area, it's important to distinguish the focal project from other similar projects. 
  • Not descriptive enough. It is best to think about the defining feature of the study. A bad name does not bring the project to mind.
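
The naming guidelines above can be sketched as a small checker. This is only an illustration: the function name `check_short_name`, the lowercase-with-hyphens rule (inferred from the example names), and the messages are my assumptions, and the limits come from the "usually" in the text, so treat the output as hints rather than hard rules. Several of the example names above run a character or two over 15.

```python
import re


def check_short_name(name, existing=()):
    """Return a list of hints about a candidate short project name.

    Hypothetical helper: thresholds follow the guidelines in the text
    (two or three words, roughly 8 to 15 characters, no clashes).
    """
    hints = []
    # Assumed convention: lowercase words joined by hyphens, as in "hexaco-ei".
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name):
        hints.append("use lowercase words separated by hyphens")
    if not 2 <= len(name.split("-")) <= 3:
        hints.append("aim for two or three words")
    if not 8 <= len(name) <= 15:
        hints.append("aim for 8 to 15 characters")
    if name in existing:
        hints.append("conflicts with an existing project")
    return hints


if __name__ == "__main__":
    print(check_short_name("hexaco-ei"))
    print(check_short_name("personality-and-emotional-intelligence-study"))
```

Descriptiveness cannot be checked mechanically, so that part still needs human judgement.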
Parent Directory Name
Every project has a parent directory. This is the directory that contains all the core files of the project.
The parent directory is named:
"short-name-year" or "short-name-year-storagemode"

By storage mode, I mean where the files live: a tool like Dropbox or OneDrive, or "local" for your local computer. E.g.,
"short-name-year-dropbox" or "short-name-year-onedrive" or "short-name-year-local"

Appending the year the project commenced is helpful as an additional identifier for the project. In particular, the short name might be good initially, but may become less identifying over time (e.g., you do a lot more similar research). So when you're searching for the project in years to come, the year becomes particularly useful.

Appending the storage mode is particularly helpful when you are collaborating with colleagues on a project using a tool like Dropbox or OneDrive, but you also need to maintain some files on your local computer. Examples include confidential data, private brainstorming, files you don't want colleagues interfering with, and files that get corrupted on data sharing platforms. In this case, appending "local" to the directory of local files and "dropbox" to the directory of shared files helps distinguish the two.

Examples: "hexaco-ei-2017-dropbox", "hexaco-wellbeing-2017-dropbox"
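
As a minimal sketch, the naming rule can be written as a one-line join. The function name `parent_dir_name` is hypothetical. It simply assembles the pieces described above.

```python
def parent_dir_name(short_name, year, storage_mode=None):
    """Build "short-name-year" or "short-name-year-storagemode"."""
    parts = [short_name, str(year)]
    if storage_mode:
        # e.g. "dropbox", "onedrive", or "local"
        parts.append(storage_mode.lower())
    return "-".join(parts)


if __name__ == "__main__":
    print(parent_dir_name("hexaco-ei", 2017, "dropbox"))  # hexaco-ei-2017-dropbox
    print(parent_dir_name("hexaco-ei", 2017))             # hexaco-ei-2017
```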

The following are the core directories of the project:
  • manuscript: Stores the authoritative version of the manuscript, the reference manager database, and any online supplement files that will be submitted to the journal.
  • archive: Stores old versions of files, chiefly the manuscript. It provides a simple form of version control.
  • submissions: This contains one directory for each journal submission and each journal submission includes folders for each step of the publication process. 

Additional directories
  • notes: Stores any preliminary analyses, literature reviews, and files that involve analysis or reflection.
  • resources: Stores any files related to the study, e.g., metadata, scoring instructions, details about surveys and tasks, and raw data.
  • analysis: Stores the data analysis files.
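
The directory layout above could be created in one step with a short script. This is a sketch, not the template mentioned later in the post. The names `CORE_DIRS`, `EXTRA_DIRS`, and `create_project` are assumptions for illustration.

```python
from pathlib import Path

CORE_DIRS = ("manuscript", "archive", "submissions")
EXTRA_DIRS = ("notes", "resources", "analysis")


def create_project(root, parent_name, extras=EXTRA_DIRS):
    """Create the parent directory with its core and additional directories."""
    project = Path(root) / parent_name
    for sub in (*CORE_DIRS, *extras):
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        project = create_project(tmp, "hexaco-ei-2017-dropbox")
        print(sorted(p.name for p in project.iterdir()))
```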
Manuscript directory

The manuscript file name is in the following format:

Examples: if the date of last editing was 4th August 2020 and the short name was "hexaco-values", the manuscript file would be called:

Note that the file never has words like "draft", "rough draft", "final", "absolutely-final", "final2", etc. The date is all that is required to indicate that it is the latest version. 

The word "manuscript" is placed at the start of the file name for several reasons. First, it clearly denotes this file as the manuscript file as opposed to some other file (e.g., a supplemental file). Second, it is easier to identify it as the manuscript file than if the file were called "shortname-manuscript-date". This, in turn, leads to fewer errors when uploading files to the manuscript submission system.

When should you update the date in the manuscript file name? The general idea is that whenever the manuscript reaches a key stage, a copy of the manuscript is placed in the "archive" directory and the date in the filename is updated. Key stages include: whenever the manuscript moves between authors, when submitting it to a journal, after a revise and resubmit, when it has been a long time since you've touched the manuscript, and when you're about to make some substantial edits. This essentially implements a basic form of version control. It enables you to recover any deleted content should you need to. It also makes it more comfortable to implement edits, knowing that things can be restored.
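
The archive-then-redate step can be sketched in a few lines. Note the assumptions: the exact file name pattern is not reproduced here, so the code uses a hypothetical "manuscript-shortname-YYYY-MM-DD" pattern with the file's existing extension, and the function names are made up for illustration.

```python
import shutil
from datetime import date
from pathlib import Path


def manuscript_name(short_name, on, ext=".docx"):
    # Assumed pattern (not confirmed by the text): manuscript-<shortname>-<YYYY-MM-DD><ext>
    return f"manuscript-{short_name}-{on.isoformat()}{ext}"


def archive_and_redate(project, current, short_name, today=None):
    """Copy the current manuscript to archive/, then rename it with today's date."""
    project = Path(project)
    today = today or date.today()
    src = project / "manuscript" / current
    archive = project / "archive"
    archive.mkdir(exist_ok=True)
    # The archived copy keeps its old dated name, so versions sort by date.
    shutil.copy2(src, archive / current)
    new_path = src.with_name(manuscript_name(short_name, today, src.suffix))
    if new_path != src:
        src.rename(new_path)
    return new_path
```

For example, calling `archive_and_redate(project, "manuscript-hexaco-values-2020-08-01.docx", "hexaco-values")` before a round of substantial edits would leave the old file in "archive" and a freshly dated file in "manuscript".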

Other files in the manuscript directory: 
Files associated with the reference manager: I use EndNote to manage references, with a database that is project specific. In theory, EndNote can experience issues if multiple collaborators use Dropbox to work with the same EndNote folder. That said, often there are no issues. One solution is to designate one person to manage EndNote, while the other authors just leave comments about references.

Other files: Quite often, there are online supplement files that get submitted to the journal. These often provide additional methodological details or additional analyses. It makes sense to keep these in the manuscript folder as they will need to be submitted to the journal.

Submissions directory

The submissions directory is where all the submissions to journals are stored. The general folder structure is that there is a directory for each journal that you submit to with the prefix 1, 2, 3, etc. Obviously, you only submit to journal 2 after journal 1 has rejected you. 

Example: if the first submission was to Journal of Personality, it would be called "1-jopy"; if that was rejected, and we tried Australian Psychologist, the second folder would be called "2-apsych".
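
The numbering convention is easy to automate. The following sketch, with a hypothetical helper `next_numbered_dir`, finds the highest existing numeric prefix and creates the next directory. The same helper would work for the numbered stage directories inside a journal folder.

```python
import re
from pathlib import Path


def next_numbered_dir(parent, label):
    """Create "<n>-<label>" in parent, where n is one more than the highest prefix."""
    parent = Path(parent)
    numbers = [
        int(m.group(1))
        for p in parent.iterdir()
        if p.is_dir() and (m := re.match(r"(\d+)-", p.name))
    ]
    new_dir = parent / f"{max(numbers, default=0) + 1}-{label}"
    new_dir.mkdir()
    return new_dir
```

Called on an empty "submissions" directory with the label "jopy", this creates "1-jopy". A later call with "apsych" creates "2-apsych".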

Within each journal submission directory are numbered directories for each stage of the submission. Here is one example set of folders:
  • 1-initial-submission (cover letter, manuscript with anonymised title page, non-anonymised title page, online supplement, confirmation of submission email, pdf of submission)
  • 2-first-revision (email with revision requests, updated manuscript/supplement, response notes, confirmation of resubmission email, pdf of resubmission)
  • 3-acceptance (email confirming acceptance)
  • 4-licence (copy of copyright agreement)
  • 5-proofs (files associated with proofing)
  • 6-formatted-online-first (copy of online first version)
  • 7-preprint (preparing post-print for PsyArXiv)
  • 8-page-numbers (final journal pdf with page numbers)
Other common folders include:
  • 3-second-revision (same files as first revision but just updated)
  • 4-third-revision (same files as first revision but just updated)
  • 5-rejection (copy of rejection email; optionally, brainstorming notes and reflections)
The general principle is that these submission directories include (a) a read-only copy of the manuscript (often split up into title page and body) and related files (e.g., online supplement, figures, etc.), (b) any journal submission specific files (e.g., cover letter, responses to reviewer comments, journal specific information such as highlights), (c) any journal correspondence, and (d) PDFs generated by the submission system.

An important principle here is that everything has one authoritative source. So, you never edit the actual manuscript in the submissions folder. These edits belong in the "manuscript" folder. The only edits to the manuscript that occur in the submissions folder are things like: anonymising the title page, making the manuscript conform to journal requirements (e.g., putting tables/figures in specific places).

That said, things like cover letters and response to reviewer comments do live in their respective submission directories. And that is their authoritative home.
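
One way to reinforce the "one authoritative source" rule is to make the submitted copy read-only at the file-system level. This is a sketch under that assumption, not part of the workflow as described. The name `freeze_copy` is hypothetical, and you would run it only after any submission-specific changes (such as anonymising the title page) are done.

```python
import shutil
import stat
from pathlib import Path


def freeze_copy(source, submission_dir):
    """Copy a file into a submission directory and remove its write permissions."""
    submission_dir = Path(submission_dir)
    submission_dir.mkdir(parents=True, exist_ok=True)
    dest = submission_dir / Path(source).name
    shutil.copy2(source, dest)
    # Clear the write bits for user, group, and others.
    mode = dest.stat().st_mode
    dest.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return dest
```

The file in the "manuscript" folder stays editable. Only the copy in the submission folder is locked.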

Resources and Notes Directories
Journal articles have lots of assorted resources (details on measures and procedure, literature searches, data analysis notes, brainstorming of ideas, etc.). The main point here is that these materials are organised in directories of the project. 

Linked Directories
In some instances, not all files are contained in the project directory. Resources may be relevant to more than one project, or there may be files that need to be stored elsewhere. In this case, I place an alias or shortcut link to these resources in the parent directory.
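
On systems that support symbolic links, such a link can also be created from a script. This is a minimal sketch with a hypothetical helper name. Note that a symbolic link is not identical to a macOS alias or a Windows shortcut, and creating one on Windows may need extra privileges.

```python
from pathlib import Path


def link_external(project_dir, target, link_name=None):
    """Place a symbolic link to an external file or directory in the project."""
    target = Path(target).resolve()
    link = Path(project_dir) / (link_name or target.name)
    link.symlink_to(target, target_is_directory=target.is_dir())
    return link
```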

Workings File
I often have a file called "workings-shortname.docx" in the root folder of the project. This is used to store all project related brainstorming and notes. 

Template of Project
I store a template version of a new project on GitHub:

I have a bookmark in my browser which downloads a zipped up copy of the template:

This makes starting a new project very efficient. I update it from time to time to reflect changing conventions and so on (e.g., APA 7).