Overview

Welcome!

This workshop provides an overview of many of the packages included in the Tidyverse suite of packages for the R programming language. The Tidyverse is a veritable universe of tools though that no single workshop could hope to cover so we are focusing here on an introductory approach that focuses primarily on some fundamentals to tidying data in R. We are always happy to improve workshop content so please don’t hesitate to post an Issue on our GitHub repository if you see clear areas for improvement!

Hex logo for the Tidyverse R package

To maximize the value of this workshop to you, we recommend that you take the following steps before the day of the workshop. If anything is unclear, feel free to reach out to us; our contact information can be found here

Workshop Preparation

For those of you with a dedicated IT team that has sole power to install software on your computer: you may need to contact them before the workshop to do the installation bits of the prep steps we outline below.

1. Install R

Install R. If you already have R, check that you have at least version 4.0.0 by running the following code:

version$version.string

If your version starts with a 3 (e.g., the above code returns “R version 3…”), please update R to make sure all packages behave as expected.

2. Install RStudio

Once you have R (ver. ≥4.0), install RStudio. If you already have RStudio installed, you may want to make sure that you’re using a recent version to take advantage of some quality of life improvements that are broadly useful.

3. Install Some R Packages

Install the tidyverse and palmerpenguins R packages using the following code:

install.packages("librarian")
librarian::shelf(tidyverse, palmerpenguins)

Please run the above code even if you already have these packages to update these packages and ensure that your code aligns with the examples and challenges introduced during the workshop.

4. Celebrate!

After following all the previous preparation steps, your setup should now be complete.

Example Data - Penguins

The data we’ll be using for this workshop comes from the palmerpenguins package, maintained by Allison Horst. The “penguins” dataset from this package contains size measurements for adult foraging penguins near Palmer Station, Antarctica. Data were collected and made available by Dr. Kristen Gorman and the Palmer Station Long Term Ecological Research (LTER) Program. Let’s take a look at it!

penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>

The “penguins” dataset has 344 rows and 8 columns.

The columns are as follows:

  • species: a factor denoting penguin species (Adélie, Chinstrap and Gentoo)
  • island: a factor denoting island in Palmer Archipelago, Antarctica (Biscoe, Dream or Torgersen)
  • bill_length_mm: a number denoting bill length (millimeters)
  • bill_depth_mm: a number denoting bill depth (millimeters)
  • flipper_length_mm: an integer denoting flipper length (millimeters)
  • body_mass_g: an integer denoting body mass (grams)
  • sex: a factor denoting penguin sex (female, male)
  • year: an integer denoting the study year (2007, 2008, or 2009)

This dataset is an example of tidy data, which means that each variable is in its own column and each observation is in its own row. Generally speaking, functions from packages in the Tidyverse expect tidy data though they can be used in some cases to help get data into tidy format! Regardless, the penguins dataset is what we’ll use for all examples in this workshop so be sure that you install the palmerpenguins R package. The examples on this page were adapted from Allison Horst’s dplyr tutorial!

Websites to Visit

Supplemental Material

While not technically necessary to attend the workshop, if you’d like you can see the content that created the workshop website you are viewing by visiting our GitHub repository here.

Also, check out NCEAS’ Learning Hub for a complete list of workshops and trainings offered by NCEAS.