Project Organization

Overview

Project organization is one of those topics that is–maybe–not exciting but is absolutely critical to the success of synthesis working groups. In individual projects conducted over short timescales, an absence of an overarching strategy for organization can be survivable but in a project that will last for several years and will have many actively contributing colleagues this lack quickly becomes an insurmountable hurdle.

When Should We Plan?

It is never too early to draft your organization plan! The longer you wait to discuss an organization strategy and decide on a system the worse it will be to go back and retroactively sort the content your group has produced into the structure you agree upon. If you make a plan early on it should be clear where a given file should ‘live’ as it is created which completely removes the labor-intensive reorganization that is inherent to organizing plans made later in a project’s lifecycle.

Broad Considerations

There is no single “best” mode of organizing a project. However, there are some decision points that most groups consider even if not all groups reach the same conclusions at those junctures. Here are some guiding questions that might prove helpful as your group discusses your organizing plan:

  • What “modules” or “silos” are likely to be necessary for your project?
    • Semi-discrete subcomponents of the project are likely to exist and should (probably) be separated into different folders
    • For example, groups almost always have at least the following four major folders: “data”, “notes”, “publications”, and “presentations”
  • What level of organization can your group easily maintain for 2-4 years?
    • Ideally your chosen structure will require little to no maintenance after it is initially set-up
  • How hard will it be to onboard new members to navigate your chosen system?
    • Your group will likely need to onboard new members and if they don’t know where they should add their contributions they may add files in incorrect places or refrain from contributing at all–either outcome would be a sad loss for your group
    • Note this question also applies to ‘future you’ if you focus on other work for a time and then need to remind yourself how to navigate this project

SciComp Recommendation Example

While there is no “one size fits all” solution, our team has identified a structure that has worked quite well for past groups because it is relatively simple to maintain and easily extensible as project questions evolve. In addition, it avoids an overly nested folder structure which makes it easier for new members (or ‘future you’) to become familiar with the overall schema.

Key properties of this structure:

  • Everything nested within a project-wide folder
  • Limited use of sub-folders
  • Dedicated “README” files containing high-level information about each folder
  • Consistent folder/file naming conventions
    • Good names should be be both human and machine-readable
    • Avoid spaces and special characters
    • Consistent use of delimeters (e.g., “-”, “_“, etc.)
  • Shared file prefix (a.k.a. “slug”) connecting code files with files they create
    • Allows for easy tracing of errors because the file with issues has an explicit tie to the script that likely introduced that error

SciComp Suggested Project Structure:

synthesis_project
 |–  README.md
 |–  code
 |       |–  README.md
 |       |–  01_harmonize.R
 |        L  02_quality-control.R
 |–  data
 |       |–  README.md
 |       |–  data-log.csv
 |       |–  raw
 |        L  tidy
 |             |–  01_data-harmonized.csv
 |              L  02_data-wrangled.csv
 |–  notes
 |–  presentations
  L  publications
        |–  README.md
        |–  community-composition
         L  biogeochemistry

GitHub versus Google Drive

Ideally your chosen organization system will work for all platforms that your group uses. For many groups, this includes–at a minimum–a GitHub repository and a Shared Google Drive. While the same general organizing structure will work in both locations, there are some slight differences.

GitHub should not track the “data” folder (see our tutorial on using the “.gitignore” file to regulate what Git will/won’t track) and likely the facets of your project that are not directly related to data/code (e.g., “notes”, etc.) won’t be stored in GitHub either.

Similarly, you should not store your code in the Drive. Trust that GitHub does a better job of tracking code files than Drive can and also remember that storing duplicate code files is often a recipe for heartache as there is no “single source of truth” upon which to depend.

Additional Resources

If you’d like more resources on project organization and reproducibility in general, check out the following: