Team Coding: 5 Essentials
Have you ever had trouble running someone else’s code or re-running your own old code? Working in a synthesis group can bring up challenges like these as members try to run workflows written by others or written by themselves at the last in-person meeting months ago. To make the process easier and more reproducible, here is a list of our top 5 best practices to follow as you collaborate on scripts that address your group’s scientific questions. These are merely suggestions but we hope that they help facilitate seamless synthesis science!
1. Prioritize ‘Future You’
If something takes more time now but will work better in the long run, invest that time now to save yourself heartache in the future. This can mean writing a document that lists the scripts in a given workflow, adding descriptive comments to an existing script, or any number of other small investments. By spending this time now, you will save ‘future you’ from unnecessary labor.
2. Always Leave Comments
Leave enough comments in your code so that other members of your team (and ‘future you’!) can understand what your script does. This is a crucial habit that will benefit you immensely. By making your code more human-readable, you open the door for improved communication among team members. It becomes easier for people who weren’t involved in your workflow to jump in and give feedback, and it becomes much smoother to onboard new team members who join later in the project. Plus, it is less of a hassle to edit and maintain well-commented code in the future; you can make changes without spending too much time deciphering each line of code.
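As a small illustration of the level of commenting we mean (the object names and data here are purely hypothetical):

# Simulated site data: three sites and the number of years each was sampled
site_df <- data.frame(site = c("A", "B", "C"), n_years = c(10, 7, 10))

# Keep only sites sampled in every year so the time series are comparable
complete_sites <- subset(site_df, n_years == max(site_df$n_years))

Notice that the comments explain the goal of each step, not just the mechanics; that is what makes them useful to a teammate reading the script cold.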
3. Use Relative File Paths
When coding collaboratively, accounting for the differences between your folder structure and those of your colleagues becomes critical. For example, if you read in a data file using its absolute file path (e.g. “/users/my_name/documents/project/data/raw_data/example.csv”), only you will be able to successfully run that line of code and, by extension, your entire script! Also, the slashes between folder names differ by operating system, which means that even a relative file path with hard-coded slashes may only work for teammates whose computers run the same operating system as yours.
If you’re an R user, there are two quick things you can do in your code to avoid these problems:
Relative Paths – Use the dir.create function in your script to create any necessary folders. Need a data folder? Use dir.create("data") and you’ll create an empty data folder. Anyone else running your code will create the same folder, and you can safely assume that part of the file path going forward (see the short sketch after this list).
Operating System Differences – Use the file.path function with folder names and no slashes. Reading in data? Use file.path("data", "raw_data", "site_a.csv") and file.path will automatically sense the computer’s operating system and insert the correct slashes for each user.
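Here is a minimal sketch of the folder-creation tip; wrapping dir.create in a dir.exists check (both are base R functions) avoids a warning when the folder already exists:

# Create a "data" folder, but only if it doesn't exist yet
if (!dir.exists("data")) {
  dir.create("data")
}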
For example, if you are already working in the directory called “project”, then you can access example.csv using this relative file path: data/raw_data/example.csv. You can improve beyond even that by using the file.path function to automatically detect the computer operating system and insert the correct slash for you and anyone else running the code. We recommend using this function and assigning your file path to an object so you can use it anytime.
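Putting that together, it might look like this:

# Build the path to the raw data folder once and reuse it anywhere
my_path <- file.path("data", "raw_data")

# Read in the data using that path object
my_raw_data <- read.csv(file = file.path(my_path, "example.csv"))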
4. Store Raw Data in the Cloud
If you’re a GitHub user, you may be tempted to store your data files there, but GitHub limits the size of files allowed in repositories. Adding a file larger than 50 MB triggers a warning, and files larger than 100 MB are blocked outright. If you’re working with big datasets or spatial data, you can exceed this limit pretty fast.
To avoid this, we recommend instead that you store your raw data files in the cloud and make them available to everyone in your group. For example, you can create a folder for raw data in a Shared Google Drive (which we can create for you!). Then, you can download the data using the googledrive R package or with any other Google Drive API in your preferred language.
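Here is a minimal sketch of that download step using the googledrive package; the folder ID and the local destination folder are hypothetical placeholders you would replace with your own:

library(googledrive)

# Authenticate with the Google account that can see the Shared Drive
drive_auth()

# List the files in the shared raw data folder (swap in your folder's ID)
raw_files <- drive_ls(path = as_id("your-shared-folder-id"))

# Make a local destination folder if needed, then download each file into it
dir.create(file.path("data", "raw_data"), recursive = TRUE, showWarnings = FALSE)
for (i in seq_len(nrow(raw_files))) {
  drive_download(file = raw_files[i, ],
                 path = file.path("data", "raw_data", raw_files$name[i]),
                 overwrite = TRUE)
}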
5. Meta-Document
Documenting every individual script is important, but it’s also well worth the time and effort to document the big picture of your workflow. As you continue to build on your workflow, it can be hard to keep track of each script’s role and how the scripts relate to each other. You might need to update a script upstream and then figure out which downstream scripts need to be updated next to account for the new edits. If you’re not using workflow management software, then it’s best to thoroughly document how each script fits into the larger workflow. The README is a great place to document each step along the way.
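For example, a workflow overview in your README might look something like this (the script names and steps are purely hypothetical):

Workflow overview:
1. 01_harmonize.R – combines the raw data files into one tidy table (writes data/tidy_data.csv)
2. 02_analyze.R – fits models to data/tidy_data.csv (writes outputs/model_results.csv)
3. 03_visualize.R – makes figures from outputs/model_results.csv (saves them to figures/)

Even a short list like this tells a teammate where to start, what order to run things in, and which files connect one script to the next.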