Connecting R & Google Drive

Overview

The googledrive R package lets R users interact directly with files on Google Drive. This can be extremely useful because all members of a team can share the same source data file(s), and updates to “living” documents reach every group member the next time they run their R script. This package is technically part of the Tidyverse but is not loaded by running library(tidyverse).

Because this package requires access to an R user’s Google Drive, you must “authenticate” the googledrive package. This essentially tells Google that it is okay for an R package to use your credentials to access and (potentially) modify your Drive content. There are only a few steps to this process, so follow along with the tutorial below and we’ll get you ready to integrate Google Drive into your code workflows using the googledrive package in no time!

Prerequisites

To follow along with this tutorial you will need to take the following steps:

  • Download R
  • Download RStudio
  • Create a Gmail account

Feel free to skip any steps that you have already completed!

Authorize googledrive

In order to connect R with Google Drive, we’ll need to authorize googledrive to act on our behalf. This only needs to be done once (per computer).

First, install the googledrive and httpuv R packages. We need googledrive for obvious reasons, while httpuv makes the following authentication steps a little smoother than googledrive manages on its own. Be sure to load the googledrive package after you install it!

# Install packages
install.packages(c("googledrive", "httpuv"))

# Load them
library(googledrive)
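
If you rerun a script often, you may not want to reinstall the packages every time. As a minimal sketch (the object names needed, installed, and missing_pkgs are our own and not part of googledrive), base R’s requireNamespace can check which packages are already available so that only the missing ones get installed:

```r
# Packages this tutorial relies on
needed <- c("googledrive", "httpuv")

# Check which are already installed (TRUE = available on this computer)
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Keep only the names of the packages that still need installing
missing_pkgs <- needed[!installed]

# Install the missing ones; guarded so nothing is downloaded in non-interactive runs
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```

The interactive() guard is a design choice for this sketch only; drop it if you want scripted runs to install packages as well.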

Once you’ve installed the packages, we can begin authenticating in R using the drive_auth function in the googledrive package.

googledrive::drive_auth(email = "enter your gmail here!")

If this is your first time using googledrive, drive_auth will open a new tab in your browser (see below for a screenshot of that screen) where you can pick which Gmail account you’d like to connect to R.

Click the Gmail account you want to use and you will get a second screen where Google tells you that “Tidyverse API” wants access to your Google Account. This message is followed by three checkboxes: the first two are grayed out but the third is unchecked.

NOTE

This next bit is vitally important so carefully read and follow the next instruction!

In this screen, you must check the unchecked box to be able to use the googledrive R package. If you do not check this box, all attempts to use googledrive functions will fail with an error that says “insufficient permissions”.

While granting access to “see, edit, create, and delete all of your Google Drive files” sounds like a significant security risk, those powers are actually why you’re using the googledrive package in the first place! You want to be able to download existing Drive files, change them in R on your computer, and then put them back in Google Drive, which is exactly what is meant by “see, edit, create, and delete”.

Also, this power only applies to the computer you’re currently working on! Granting access on your work computer allows only that computer to access your Drive files. So don’t worry about giving the whole world access to your Drive; it is protected by the same safeguards you rely on when you let your computer remember a password to a website you frequent.

After you’ve checked the authorization box, scroll down and click the “Continue” button.

This should result in a plain text page that tells you to close this window and return to R. If you see this message you are ready to use the googledrive package!
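
To double-check from R that the authorization worked, a short sketch like the following may help. It assumes a reasonably recent version of googledrive, where drive_has_token reports whether a token is currently loaded in the session and drive_user shows which account that token belongs to.

```r
# Is a Drive token currently loaded in this R session?
if (googledrive::drive_has_token()) {
  # Print the Google account that googledrive is acting as
  print(googledrive::drive_user())
} else {
  message("No Drive token loaded yet; try running googledrive::drive_auth() first.")
}
```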

Problems with Authorization

If you have tried to use drive_auth and did not check the box indicated above, you need to make the googledrive package ask you again. Annoyingly, running drive_auth again will not return you to the screen it sent you to the first time. However, running the following code chunk should give you another chance to check the needed box.

The gargle R package referenced below is required for interacting with Google Application Programming Interfaces (APIs). This package does the heavy lifting of secure password and token management and is necessary for the googledrive authentication chunk below.

googledrive::drive_auth(
  email = gargle::gargle_oauth_email(),
  path = NULL,
  scopes = "https://www.googleapis.com/auth/drive",
  cache = gargle::gargle_oauth_cache(),
  use_oob = gargle::gargle_oob_default(),
  token = NULL)

Unfortunately, to use the googledrive package you must check the box that empowers the package to function as designed. If you’re uncomfortable giving googledrive that much power, you will need to pivot your workflow away from using Google Drive directly. However, NCEAS does offer access to an internal server called “Aurora” where data can be securely saved and shared among group members without special authentication like what googledrive requires. Reach out to our team if this seems like a more attractive option for your working group and we can offer training on how to use this powerful tool!

Find and Download Files

Now that you’ve authorized the googledrive package, you can start downloading the Google Drive files you need through R! Let’s say that you want to download a csv file from a folder or shared drive. You can save the URL of that folder/shared drive to a variable.

The googledrive package makes it straightforward to access Drive folders and files with the as_id function. This function allows the full link to a file or folder to serve as a direct connection to that file/folder. Most of the other googledrive functions will require a URL that is wrapped with as_id in this way. You would replace “your url here” with the actual link but make sure it is in quotation marks.

drive_url <- googledrive::as_id("your url here")
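
To see what as_id does with a link, here is a small sketch using a made-up folder URL (the ID in it is a placeholder and does not point to a real folder). As far as we understand, as_id only parses the text of the link at this stage, so the example should run without contacting Google.

```r
# A made-up folder link; the last part of the URL is the folder's ID
example_url <- "https://drive.google.com/drive/folders/1AbCdEfGhIjKlMnOpQrStUvWxYz012345"

# as_id pulls the ID out of the link and marks it as a Drive ID
example_id <- googledrive::as_id(example_url)

# Inspect the class and the extracted ID
class(example_id)
as.character(example_id)
```

Because the wrapped object carries only the ID, you can paste either the full link or the bare ID into as_id.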

To list all the contents of this folder, we can use the drive_ls function. You will get a dataframe-like object of the files back as the output. An example is shown below in the screenshot. Here, this Google Drive folder contains 4 csv files: ingredients.csv, favorite_soups.csv, favorite_fruits.csv, and favorite_desserts.csv.

drive_folder <- googledrive::drive_ls(path = drive_url)
drive_folder

If it has been a while since you’ve used googledrive, it will prompt you to refresh your token. Simply enter the number that corresponds to the correct Google Drive account.

If you only want to list files of a certain type, you can specify this in the type argument. Now let’s say that the folder contains a bunch of csv files, but you only want to download the one named “favorite_desserts.csv”. In that case, you can also put a matching string in the pattern argument to filter down to one file.

drive_folder <- googledrive::drive_ls(path = drive_url,
                                      type = "csv", 
                                      pattern = "favorite_desserts")
drive_folder
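
The pattern argument is treated as a regular expression matched against file names. The sketch below imitates that filtering locally with base R, using a stand-in data frame (mock_folder, with placeholder IDs) instead of a real drive_ls result, so it runs without a Drive connection.

```r
# Stand-in for what drive_ls returns: one row per file (IDs are placeholders)
mock_folder <- data.frame(
  name = c("ingredients.csv", "favorite_soups.csv",
           "favorite_fruits.csv", "favorite_desserts.csv"),
  id = c("id-1", "id-2", "id-3", "id-4"),
  stringsAsFactors = FALSE
)

# Keep only rows whose name matches the regular expression
one_file <- subset(mock_folder, grepl("favorite_desserts", name))
one_file

# A broader pattern matches several files at once
favorites <- subset(mock_folder, grepl("^favorite_", name))
favorites$name
```

Checking how many rows a pattern leaves behind is a useful habit before downloading, since a loose pattern can match more files than intended.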

Once we’ve narrowed down to the file we want, we can download it using drive_download. This function takes the file identifier as an argument, which we can access using drive_folder$id.

googledrive::drive_download(file = drive_folder$id)

This will automatically download the file to our working directory. If you want, you can specify a different path to download to. Just put the new file path into the path argument, replacing the “your path here”, but keep the quotation marks.

googledrive::drive_download(file = drive_folder$id, 
                            path = "your path here")

If you’ve downloaded the file before, and you want to overwrite it, there’s a handy overwrite argument that you can set to TRUE. Note that the default is FALSE.

googledrive::drive_download(file = drive_folder$id, 
                            path = "your path here",
                            overwrite = TRUE)
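
Downloading is usually only the first step, since the file still has to be read into R. As a rough sketch, the hypothetical helper below (read_drive_csv is our own name, not a googledrive function) downloads a csv to a temporary file and reads it with base R. It assumes the session is already authorized and that file_id identifies a single csv file.

```r
# Hypothetical helper: download a Drive csv to a temporary file and read it
read_drive_csv <- function(file_id) {
  # Temporary destination so the working directory stays uncluttered
  tmp <- tempfile(fileext = ".csv")

  # Download; overwrite = TRUE is harmless here because tmp is freshly named
  googledrive::drive_download(file = googledrive::as_id(file_id),
                              path = tmp,
                              overwrite = TRUE)

  # Read the downloaded file into a data frame
  utils::read.csv(tmp)
}

# Usage (needs an authorized session and a one-row drive_folder as above):
# desserts <- read_drive_csv(drive_folder$id)
```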

If there are multiple files in the Drive folder and you want to download them all, you can use a loop like so:

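# For each file:
for(focal_file in drive_folder$name){
  
  # Find the file identifier for that file
  file_id <- subset(drive_folder, name == focal_file)

  # Download that file into the chosen folder under its own name
  # (a single shared path would make each download overwrite the last)
  drive_download(file = file_id$id, 
                 path = file.path("your folder here", focal_file),
                 overwrite = TRUE)
}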
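
One thing to watch in a loop like this is that every file needs its own destination path, otherwise later downloads replace earlier ones. The sketch below shows one way to prepare those paths ahead of time with base R. The folder name drive_downloads and the object names are hypothetical, and the file names stand in for what drive_folder$name might contain.

```r
# File names as they might appear in drive_folder$name
file_names <- c("ingredients.csv", "favorite_soups.csv",
                "favorite_fruits.csv", "favorite_desserts.csv")

# Hypothetical local folder for the downloads (created if it does not exist)
download_dir <- file.path(tempdir(), "drive_downloads")
dir.create(download_dir, showWarnings = FALSE, recursive = TRUE)

# One destination path per file, suitable for drive_download's path argument
dest_paths <- file.path(download_dir, file_names)
basename(dest_paths)

# Duplicate names would collide on disk, so it is worth checking for them
anyDuplicated(file_names) == 0
```

Google Drive allows several files with the same name in one folder, which is why the duplicate check can be worth the extra line.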