Project Organization
Overview
Project organization is one of those topics that is–maybe–not exciting but is absolutely critical to the success of synthesis working groups. In individual projects conducted over short timescales, an absence of an overarching strategy for organization can be survivable but in a project that will last for several years and will have many actively contributing colleagues this lack quickly becomes an insurmountable hurdle.
When Should We Plan?
It is never too early to draft your organization plan! The longer you wait to discuss an organization strategy and decide on a system the worse it will be to go back and retroactively sort the content your group has produced into the structure you agree upon. If you make a plan early on it should be clear where a given file should ‘live’ as it is created which completely removes the labor-intensive reorganization that is inherent to organizing plans made later in a project’s lifecycle.
Broad Considerations
There is no single “best” mode of organizing a project. However, there are some decision points that most groups consider even if not all groups reach the same conclusions at those junctures. Here are some guiding questions that might prove helpful as your group discusses your organizing plan:
- What “modules” or “silos” are likely to be necessary for your project?
- Semi-discrete subcomponents of the project are likely to exist and should (probably) be separated into different folders
- For example, groups almost always have at least the following four major folders: “data”, “notes”, “publications”, and “presentations”
- What level of organization can your group easily maintain for 2-4 years?
- Ideally your chosen structure will require little to no maintenance after it is initially set-up
- How hard will it be to onboard new members to navigate your chosen system?
- Your group will likely need to onboard new members and if they don’t know where they should add their contributions they may add files in incorrect places or refrain from contributing at all–either outcome would be a sad loss for your group
- Note this question also applies to ‘future you’ if you focus on other work for a time and then need to remind yourself how to navigate this project
SciComp Recommendation Example
While there is no “one size fits all” solution, our team has identified a structure that has worked quite well for past groups because it is relatively simple to maintain and easily extensible as project questions evolve. In addition, it avoids an overly nested folder structure which makes it easier for new members (or ‘future you’) to become familiar with the overall schema.
Google Drive Structure
Shared Drive
|– data
| |– data-log.csv
| |– raw
| |– tidy
| | |– 01_data-harmonized.csv
| | |– 02_data-wrangled.csv
| | L 03_data-filtered.csv
| |– climate
| L land-cover
|– notes
|– presentations
L publications
|– community-composition
| L graphs
L synchrony
GitHub Structure
GitHub Repository
|– README.md
|– .gitignore
|– scripts
| |– 00_download-data.R
| |– 01_harmonize.R
| |– 02_quality-control.R
| |– 03_filter.R
| |– 04_stats.R
| L 05_graph.R
|– tools
| |– README.md
| |– fxn_calc-beta-diversity.R
| L fxn_bookmark-graph.R
L explore
|– README.md
|– downs-stats.py
L lyon-graphs.R
Check out the tabs below for highlights of these structures!
- Limited use of sub-folders
- Consistent folder/file naming conventions
- Good names should be be both human and machine-readable
- Avoid spaces and special characters
- Consistent use of delimeters (e.g., “-”, “_“, etc.)
- Shared file prefix (a.k.a. “slug”) connecting code files with files they create
- Allows for easy tracing of errors because the file with issues has an explicit tie to the script that likely introduced that error
- Contains inputs to code and outputs from code but not code itself
- Trust that GitHub does a better job of tracking code files than Drive can
- Also remember that duplicating code files and storing them in multiple places is a recipe for heartache as there is no “single source of truth” upon which to depend
- Dedicated place for notes / presentations
- Shared
data/
folder agnostic to specific product- Will let your group start with the same data product for each paper (just filtered/analyzed differently for each product)
- Contains code but not inputs/outputs
- Use the
.gitignore
file to regulate what Git will/won’t track (for more info, see here)
- Use the
- Dedicated “README” files containing high-level information about each folder
- Numbered script names making workflow order explicit
- Use of custom functions for repeated operations
- This is a great way of ensuring reproducibility–and if you create enough functions, a software package might be a nice ‘bonus’ product for your group!
explore/
folder for ‘rough draft’ scripts developed by particular members- Including this can be a great way of making everyone feel more comfortable contributing–even if they are not completely confident in their coding skills
- Including surnames in file names here can be a nice way of avoiding merge conflicts
- You can always rename these files and move them to the
scripts/
folder if they seem valuable for the core workflow!
Additional Resources
If you’d like more resources on project organization and reproducibility in general, check out the following:
- LTER’s Synthesis Skills for Early Career Researchers (SSECR) Reproducibility module
- LTER Scientific Computing Team’s Tips for File Naming