RStudio

Module Learning Objectives

By the end of this module, you will be able to:

  • Describe the computer-to-GitHub order of operations
  • Define fundamental Git vocabulary
  • Create a local version-controlled repository that is connected to GitHub

Overview of Git Workflow

Before we get into using Git and GitHub through RStudio, it will be helpful to review the major steps of including version control as you work on code.

Beginning on your local computer, you make changes to a file in a folder that you have previously marked for version control tracking (i.e., a working directory). Once those changes are made you can stage changes within your local computer. After staging, it is best to retrieve the latest file versions from the cloud. You likely will already be up-to-date but this preemptive step can save you a lot of heartache down the line. Once you’ve confirmed that you have the latest file versions, you can shift the revised file(s) to the cloud where any GitHub users with access to your project can access the most recent file and look at the history of all previous changes.

Graphic of a white rectangle on top of a blue square. The white rectangle has a happy cloud image and is labeled 'GitHub' while the blue square has an emoji-style laptop. Numbered steps start at the bottom left and work towards the top right as follows: '1-make changes', '2-stage changes', '3-retrieve latest from GitHub', and '4-put in GitHub'

Git Vocabulary

Finally, it will be helpful to introduce four key pieces of vocabulary before we dive into the interactive component of this workshop.

  • Clone = copy the entire contents of a GitHub repository to your local computer (done once per computer)

  • Commit = save a snapshot of your staged changes to your local repository (this follows staging in step 2 of the above diagram)

  • Pull = get file(s) from the cloud to your local computer – opposite of a “push” (step 3)

  • Push = move file(s) to the cloud from your local computer – opposite of a “pull” (step 4)

Graphics demonstrating a clone (copy entire folder from GitHub to a computer), a commit (putting local changes into the staging area), a pull (overwriting local copies with GitHub versions of the same), and a push (overwriting GitHub versions of files with the committed local versions)
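For readers who like to see the commands behind the buttons, here is a minimal command-line sketch of the four vocabulary words. It assumes Git is installed, and it uses a local "bare" repository (hub.git) as a stand-in for GitHub so that it runs without a network connection or an account. The folder names, the demo identity, and my-script.R are hypothetical.

```shell
set -eu

# Demo identity for commits (hypothetical values, not your real account).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for GitHub: a bare repository that already holds a README.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "# git-practice" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Add README"
git clone -q --bare seed hub.git

# Clone: copy the whole repository to this computer (done once).
git clone -q hub.git git-practice
cd git-practice

# Step 1: make a change. Step 2: stage it, then commit the snapshot.
echo 'print("hello")' > my-script.R
git add my-script.R
git commit -q -m "Add my-script.R"

# Step 3 (pull): get the latest from the remote (nothing new in this sketch).
git pull -q --no-rebase --no-edit

# Step 4 (push): send the committed snapshot to the remote.
git push -q origin main

# The history now holds the README commit and our new commit.
git log --oneline
```

With a real project, the only difference would be that the clone step points at the HTTPS link copied from GitHub instead of hub.git.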

Cloning a Repository

Now, the first step in using Git with RStudio is cloning the repository from GitHub. Note for clarity that in the screenshots below, GitHub is in dark mode while RStudio is in light mode. To clone a repository, follow these steps:

Navigate to the repository on GitHub and click on Code. Select “HTTPS” and copy the link.

Screenshot of the menu that appears in a GitHub repository when someone clicks the 'code' button

Now, return to (or open!) RStudio.

Screenshot of RStudio when no project or scripts are open

Go to the Project tab on the top right corner and click New Project…

Screenshot of the menu that appears when someone clicks 'Project: (None)' in RStudio and hovers over the 'New Project...' button

Select Version Control.

Screenshot of a menu with three options: 'new directory', 'existing directory', and 'version control'

Select Git.

Screenshot of a menu with two options: 'git' and 'subversion'

Paste the repository URL that you just copied from GitHub. Choose a file path to save your project to.

Screenshot of a menu with three open text fields, one for the 'repository URL', one for 'project directory name' and one for 'create project as subdirectory of'

Now we have finished cloning the repository to our computer with RStudio! Notice that we are working in our git-practice project and that our README.md file shows up under the list of files, just like in our GitHub repository.

Screenshot of RStudio after a project has been selected. The project name--in the top right corner--is circled in red
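The menu sequence above roughly corresponds to a single git clone command, where the first argument plays the role of the "repository URL" field and the second the "project directory name" field. The sketch below is an illustration only: it clones from a local stand-in repository (hub.git) rather than a real GitHub link, and all names in it are hypothetical.

```shell
set -eu

# Demo identity for commits (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for the GitHub repository: a bare repository with a README.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "# git-practice" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Add README"
git clone -q --bare seed hub.git

# git clone <repository URL> <project directory name>
# With a real project, the first argument would be the HTTPS link from GitHub.
git clone -q "$sandbox/hub.git" git-practice
cd git-practice

ls              # the README.md from the remote is now on this computer
git remote -v   # the source of the clone is remembered under the name "origin"
```

The name "origin" is the label Git gives by default to the place you cloned from, which is why it appears later in messages such as "origin/main".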

Workflow Refresher

The typical workflow with Git goes like this:

Step 1: You modify files in your working directory and save them as usual.

Step 2: You stage files to mark your intention to “commit” them and then commit that version of those files.

- In RStudio, "staging" is done by checking the box next to a given file in the "Git" tab
- Committing files permanently stores them as snapshots to your Git directory

Step 3: You pull the most recent changes to make sure you’ve been editing the latest versions.

Step 4: You push the version of your files that you committed to GitHub.

Here is the infographic from the start of this chapter again, which shows the same workflow:

Graphic of a white rectangle on top of a blue square. The white rectangle has a happy cloud image and is labeled 'GitHub' while the blue square has an emoji-style laptop. Numbered steps start at the bottom left and work towards the top right as follows: '1-make changes', '2-stage changes', '3-retrieve latest from GitHub', and '4-put in GitHub'

Stage versus Commit

The functional difference between “staging” a file and “committing” one can be a little tough to grasp at first so let’s explore that briefly here. We can make an analogy with taking a family picture, where each family member would represent a file.

  • Staging files is like deciding which family member(s) are going to be in your next picture
  • Committing is like taking the picture

This 2-step process enables you to flexibly group files into a specific commit. Those groupings can be helpful to you later if you’re trying to find what you changed for a specific task (because those changes likely are all in the same commit).
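To make the family picture analogy concrete, here is a small command-line sketch. It assumes Git is installed; the folder and file names are hypothetical. Three "family members" exist, two are staged, and only those two end up in the commit.

```shell
set -eu

# Demo identity for commits (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q family
cd family

# Three family members, each represented by a file.
echo "a" > parent.txt
echo "b" > child.txt
echo "c" > cousin.txt

# Staging: decide who is going to be in the next picture.
git add parent.txt child.txt

# Committing: take the picture.
git commit -q -m "Add parent and child"

git ls-tree --name-only HEAD   # files in the snapshot: child.txt and parent.txt
git status --short             # cousin.txt is still untracked ("??")
```

The cousin can join a later commit, which is how related changes end up grouped together.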

Creating a New File

Let’s try out a simple Git workflow by first creating a new file. This is Step 1 of the process. We can add new R scripts and files to our repository through RStudio. Try creating a new script by going to File > New File > R Script. Feel free to type anything you want into this script as an example. Name this script after yourself. In the screenshot below, I have named my script angel-script.R.

Once you are done, navigate to the Git tab (in the upper right pane in RStudio's default layout). You should see your new script show up there along with a .gitignore and git-practice.Rproj file. Do not worry about the .gitignore file for now; it was created by RStudio to make sure that some temporary files are not tracked by Git. The git-practice.Rproj file will save your settings and open tabs when you close the project, and will restore these settings the next time you open it.

Screenshot of RStudio with an open script and several uncommitted changes identified in the 'Git' pane. Lack of commit is identified by double yellow squares containing a question mark

Notice that there are color-coded icons next to the files in the “Git” tab. These icons are shorthand for the status, according to Git, of each file in your working directory. Strictly speaking this is not every file, because files that are tracked but haven’t been modified are not listed. See below for definitions.

A legend of five icons RStudio uses to indicate Git status of a file. Yellow question mark = untracked (file not tracked by Git). Green 'A' = added (file marked to start tracking). Blue 'M' = modified (tracked file with changes). Red 'D' = deleted (tracked file that was deleted). Purple 'R' = renamed (tracked file that was renamed)

In our case, the icons mean that our R script, .gitignore, and git-practice.Rproj files have never been tracked by Git (since these files were just created). Note also that the README.md file is not listed, even though it exists (check the Files pane). This is because files that are tracked but have no modifications since the last commit are not listed.
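On the command line, the same statuses appear as short codes from git status --short, and the letters match the icons in the legend above (with "??" for the yellow question marks). The sketch below assumes Git is installed and uses hypothetical file names to produce one file in each state.

```shell
set -eu

# Demo identity for commits (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q icons
cd icons

# Start from a commit with four tracked files.
printf 'one\n' > keep.txt
printf 'two\n' > edit.txt
printf 'three\n' > drop.txt
printf 'four\nfive\nsix\n' > old-name.txt
git add .
git commit -q -m "Initial snapshot"

echo "draft" > untracked.txt          # ?? untracked: Git is not tracking it
echo "new" > added.txt
git add added.txt                     # A  added: marked to start tracking
echo "more" >> edit.txt               # M  modified: tracked file with changes
git rm -q drop.txt                    # D  deleted: tracked file was removed
git mv old-name.txt new-name.txt      # R  renamed: tracked file was renamed

# One line per file with a status; keep.txt is unchanged, so it is not listed.
git status --short
```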

Adding our Script to the Next Commit

Let us look at the diff of our script. Click on the Diff button.

Screenshot of the 'diff' button--circled in red--in the 'git' pane of RStudio

Checking our script, we can see the new lines that we just typed are in green, which indicates that Git sees these lines as additions. We would like to save a snapshot of this version of our script. Since we’ve just done Step 1, here are the rest of the steps we will need to do to get our script to show up on our GitHub repository:

Step 2: Add the file to the next commit by checking the box in front of the file name. Note that the two ? icons will change to a single A on the left to show you that this file is now staged to be part of the next commit.

Step 3: In the right pane, type a short but descriptive commit message detailing what you have done so far. Then click on the Commit button to save this version of the script in the Git database.

Screenshot of the commit menu of RStudio where one file has been staged and an informative commit message has been entered

If all of the above steps went well, you should see something like this:

Screenshot of the success message after a commit has been made

Notice that Git tells us that 1 file changed because we’ve just added a new file to our commit. Now close the window. Before sending our changes back to GitHub, we should make sure that the copy of the repository on RStudio is completely up-to-date with the one on GitHub to avoid any conflicts.
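For comparison, the stage-and-commit steps we just clicked through look roughly like this on the command line. This is a sketch under the assumption that Git is installed; the repository is created locally, and my-script.R and the commit message are hypothetical.

```shell
set -eu

# Demo identity for commits (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"
cd "$sandbox"

# A small repository that already has a committed README.
git init -q git-practice
cd git-practice
echo "# git-practice" > README.md
git add README.md
git commit -q -m "Add README"

# Step 1: create a new script.
printf 'x <- 1\nprint(x)\n' > my-script.R

# Stage the script (the equivalent of checking its box).
git add my-script.R

# Preview what the next commit will contain.
git diff --cached --stat

# Commit with a short but descriptive message.
# The summary Git prints should report one changed file.
git commit -m "Add my-script.R with a first example"
```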

Getting the Latest Updates

There are two Git commands for exchanging changes between the local and remote versions of a repository:

  • Pull: Git will get the latest remote version and try to merge it with your local version

  • Push: Git will send your local version to the remote version of the repository (in our case GitHub)

Before sending your local version to the remote, you should always get the latest remote version first. In other words, you should pull first and push second. This is the way Git protects the remote version against incompatibilities with the local version. You always deal with potential problems on your local machine. Therefore your sequence will always be:

  1. Commit
  2. Pull
  3. Push

Of course RStudio has icons for that on top of the “Git” tab, with the blue arrow down being for pull and the green arrow up being for push. Remember the icons are organized in sequence!

Let us pull and then push to synchronize the local and remote repositories. Click on the Pull button to pull changes (if any) from the GitHub repository to the copy on RStudio. Our local copy (on our computer) now includes the latest changes from the remote copy (on GitHub). You may have noticed that all of our preceding graphics use blue for pull-related content and green for push-related information. Hopefully that helps cement the two ideas in your mind!

Screenshot of the blue 'pull' button and green 'push' button in RStudio

In my case, it turns out that a new script, lyon-script, was added to the GitHub repository by a collaborator while I was making my own script. Since I have just pulled, lyon-script now shows up in my RStudio files.

Screenshot of the message returned when you pull files you didn't have locally from GitHub

Screenshot of RStudio with the 'git' pane underlined in red where it says 'your branch is ahead of origin/main by 2 commits'

A new message has popped up for me: “Your branch is ahead of ‘origin/main’ by two commits”. This means that I have two additional commits on my local machine that I never shared back to the remote repository on GitHub. If I look at the content of my repository on GitHub, I will see just the README.md and lyon-script. My changes are NOT in the cloud yet. You might be seeing a similar message as well.
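The situation described above can be reproduced in a small sandbox, which also shows one way to end up two commits ahead: your own commit plus the merge commit that Git creates when it combines a collaborator's work with yours. This is a sketch, assuming Git is installed and that the pull merges rather than rebases. A local bare repository (hub.git) stands in for GitHub, and the folder names and the .R extension on lyon-script are hypothetical.

```shell
set -eu

# Demo identity for commits (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for GitHub: a bare repository that already holds a README.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "# git-practice" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Add README"
git clone -q --bare seed hub.git

# Two people clone the same repository.
git clone -q hub.git mine
git clone -q hub.git collaborator

# The collaborator adds a script and pushes it first.
cd "$sandbox/collaborator"
echo "# collaborator's script" > lyon-script.R
git add lyon-script.R
git commit -q -m "Add lyon-script.R"
git push -q origin main

# Meanwhile, I commit my own script locally...
cd "$sandbox/mine"
echo "# my script" > angel-script.R
git add angel-script.R
git commit -q -m "Add angel-script.R"

# ...and then pull. Git merges the collaborator's commit with mine.
git pull -q --no-rebase --no-edit

ls                           # lyon-script.R now exists locally
git status -sb | head -n 1   # the branch is reported as ahead of origin/main

# Count commits that exist locally but not yet on the remote.
ahead="$(git rev-list --count origin/main..HEAD)"
echo "ahead by $ahead commits"   # my commit plus the merge commit

# Pushing shares both commits, after which nothing is left to send.
git push -q origin main
```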

Sending Changes back to GitHub

So how do we send our changes back to GitHub? Locate the Push button on the “Git” tab and click on it. Now your script should show up in the GitHub repository!

Screenshot of the blue 'pull' button and green 'push' button in RStudio

Once you click that button you should get a success screen like the one pictured below.

Screenshot of the message received when a push is performed successfully from RStudio

Navigate back to the GitHub website and find your repository. Check to see if your script has been added correctly. In my case, angel-script.R finally shows up in my repository.

Screenshot of a GitHub repository with the commits made in earlier screenshots shown in its history

If RStudio ever asks for a “password”…

If your personal access token (PAT) was not set up correctly with RStudio or if it expired, then RStudio will ask for your GitHub username and password in a pop-up when you try to push. Please be aware that when it asks for a “password”, it actually means your token! Enter your token in the field and you should be able to push. Make sure to run gitcreds::gitcreds_set() afterwards to set a valid token so you don’t have to enter it manually every time!

Rinse and Repeat

Great! Now that your script has been added to the group repository, you should try to repeat the same workflow over again just to get a feel for how it works. Go back to RStudio and edit your own script. Save those edits, add your edited file to the staging area, write a commit message, then commit your changes. After committing, make sure to pull first then push after! When you pull, you might notice that scripts from your group members/collaborators will show up in your RStudio files.

Make sure to work on your own script. If you and another group member work on the same script at the same time, this may lead to merge conflicts with Git. If two people were to work on the same script, they may be making different edits on the same lines, and Git would not know which edits to keep. To avoid merge conflicts, be mindful of what files you are working on and always communicate this to your group members!
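If you are curious what a merge conflict looks like, the sketch below provokes one on purpose in a throwaway sandbox. It assumes Git is installed and uses a local bare repository (hub.git) as a stand-in for GitHub. The two collaborators (alice and bob) and the file shared-script.R are hypothetical.

```shell
set -eu

# Demo identity for commits (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for GitHub: a bare repository holding one shared script.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "x <- 1" > seed/shared-script.R
git -C seed add shared-script.R
git -C seed commit -q -m "Add shared script"
git clone -q --bare seed hub.git

git clone -q hub.git alice
git clone -q hub.git bob

# Alice edits the first line and pushes.
cd "$sandbox/alice"
echo "x <- 2" > shared-script.R
git add shared-script.R
git commit -q -m "Set x to 2"
git push -q origin main

# Bob edits the same line differently and commits locally.
cd "$sandbox/bob"
echo "x <- 3" > shared-script.R
git add shared-script.R
git commit -q -m "Set x to 3"

# Bob pulls before pushing. Git cannot decide which edit to keep.
if git pull -q --no-rebase --no-edit; then
  echo "merged cleanly"
else
  echo "merge conflict in shared-script.R:"
  cat shared-script.R   # conflict markers surround the two competing edits
fi
```

Because Bob pulled before pushing, the conflict shows up on his computer, where he can fix it by editing the file, rather than on the remote.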