Reproducible Reports

Overview

At this point in the course, we anticipate that you’re likely approaching the end of your team’s synthesis project (see our suggested milestones page for more information). As the end of your project and the course as a whole nears, it might be valuable for your group to consider how you can reproducibly document all of the work you’ve been doing for the last several months. “Computational notebooks” (e.g., Quarto documents, Jupyter Notebooks, R Markdown files, etc.) can be a reproducible way of documenting your results in a format that allows you to leverage both your technical skills and your scientific communication skills.

This module focuses on the structure and content of these notebooks from a primarily technical lens, so please consult the communicating findings module for the team science perspective.

Learning Objectives

After completing this module you will be able to:

  • Describe contexts where computational notebooks are useful
  • Identify the three fundamental elements of a typical notebook
  • Use Markdown syntax to accomplish text styling
  • Create notebooks that include a blend of plain text and embedded code chunks
  • Make a Quarto website
  • Define the purpose of GitHub Actions
  • Deploy a notebook with GitHub Pages

Preparation

TBD (To Be Determined)

Networking Session

To Be Determined!

Notebook Structure & Value

Files that combine plain text with embedded code chunks are an excellent way of reproducibly documenting workflows and faciliating conversations about said workflows (or their outputs). Examples of notebook files include Quarto documents, Jupyter Notebooks, and R Markdown files. Regardless of the specific type, all of them function in the same way. Each of them allows you to use code chunks in the same way that you might use a typical script but between the code chunks you can add–and format–plain, human-readable text. Arguably you could do this with comments in a script but this format is built around the idea that this plain text is intended to be interpretable without any prior coding experience. The plain text can be formatted with Markdown syntax (we’ll discuss that in greater depth later) but even unformatted text outside of code chunks is visually ‘easier on the eyes’ than comment lines in scripts.

Another shared facet of notebook interfaces is that they are meant to be “rendered” (a.k.a. “knit”) to produce a different file type that translates code chunks and Markdown-formatted text into something that looks much more similar to what you might produce in Microsoft Word or a Google Doc. Typically such files are rendered into PDFs or HTMLs though there are other output options. These rendered files can then be shared (and opened) outside of coding platforms and thus make their content even more accessible to non-coders.

In synthesis work these reports can be especially valuable because your team may include those with a wealth of discipline insight but not necessarily coding literacy. Creating reports with embedded code can enable these collaborators to engage more fully than they might otherwise be able to if there was essentially a minimum threshold of coding literacy required in order to contribute. These reports can also be useful documentation of past coding decisions and serve as reminders for judgement calls for which no one in the team remembers the rationale.

Structural Elements

Typically, these notebooks have three structural components:

  1. YAML
    • Pronounced ‘YEAH-mull
  2. Plain text
    • Possibly formatted with Markdown syntax
  3. Embedded code chunks

The YAML (Yet Another Markup Language) defines document-level metadata. Most fundamentally, the YAML defines what file type will be produced when the report is rendered. It can also be used to define the top-level title, author, and date information. Finally, it can change the default options for code chunks throughout the document (more on code chunk options elsewhere).

Different notebook file types will specify the YAML differently but in both Quarto documents and R Markdown files, the YAML is defined in the first few lines of the report and starts/ends with a line containing three hyphens. This looks something like this:

---
title: "Reproducible Reports"
output: html_document
---

The text outside of the YAML and code chunks is plain text that accepts Markdown syntax to accomplish text format tweaks. Dedicated text-formatting software (e.g., Microsoft Word, Gmail, etc.) provides buttons and hot keys for these sorts of format alterations but many programming IDEs do not provide such user interface elements.1 Markdown syntax is used to support this same functionality.

Some fundamental Markdown options include:

  • **bold text** bold text
  • _italic text_ italic text
  • `code text` code text
  • [hyperlinked text](https://lter.github.io/ssecr/mod_reports.html) hyperlinked text

For a more complete glossary of fundamental Markdown syntax options see here. You may also want to explore the ‘back end’ of this course’s website as every page is built using computational notebook files.

The code chunks embedded in notebooks are essentially a fragmented script containing runable code. These chunks may contain code and/or comments and share an environment with one another when rendered (i.e., if you load a particular library in one chunk you’ll be able to use functions from that library in subsequent chunks). When used in concert with the Markdown text in a given notebook the code chunks can be used to effectively demonstrate a workflow while providing as much human-readable context as is desired.

In Quarto documents, code chunks look like this2:

\```{r demo-chunk}
#| echo: true

# Round pi to 2 digits
round(x = pi, digits = 2)
\```
1
You may specify chunk-specific options using this syntax (i.e., #| option_name: option_setting). Options you want to apply to all chunks across a notebook should be specified once in the YAML and can exclude the leading #| bit of the format.
2
If your Markdown text provides sufficient context you may exclude comments in code chunks but opinions differ on which is “more” appropriate
[1] 3.14

Script vs. Report Decision

See here for more information.

Applications

  • Static PDF/HTML files
  • Full manuscripts
  • Deployed website

Deployment & GitHub

Additional Resources

Papers & Documents

Workshops & Courses

Websites

Footnotes

  1. Note that with the addition of the “Visual” tab in RStudio there are button options for many text format changes. Markdown syntax is still useful to know for general knowledge reasons though!↩︎

  2. Normally code chunks start and end with three backticks (```) but to embed this code chunk example we need to “escape” the first backtick (using a backslash) so that the notebook interprets it correctly.↩︎