Standardize a Single Dataset via a Column Key

A "column key" is meant to streamline harmonization of disparate datasets. The key must include three columns containing: (1) the name of each raw data file to be harmonized, (2) the names of all of the columns in each of those files, and (3) the "tidy name" that corresponds to each raw column name. This function accepts that key along with a named list of datasets and standardizes the single dataset specified by focal_file, regardless of how many other datasets appear in the key or the list. While usable on its own, this function is primarily intended to support the internal operations of ltertools::harmonize, which is the recommended tool for key-based harmonization.

Usage

standardize(focal_file = NULL, key = NULL, df_list = NULL)

Arguments

focal_file: (character) filename matching one value in the "source" column of "key" and one name in "df_list".
key: (dataframe) key object including "source", "raw_name", and "tidy_name" columns. Additional columns are allowed but ignored.
df_list: (list) named list of dataframe-like objects where each name is the name of the file that originally contained that data.

Value

(dataframe) single standardized dataframe including all columns defined in the "tidy_name" column of the key object

Examples

# Generate two simple tables
## Dataframe 1
df1 <- data.frame("xx" = c(1:3),
                  "unwanted" = c("not", "needed", "column"),
                  "yy" = letters[1:3])
## Dataframe 2
df2 <- data.frame("LETTERS" = letters[4:7],
                  "NUMBERS" = c(4:7),
                  "BONUS" = c("plantae", "animalia", "fungi", "protista"))

# Generate a local folder for exporting
temp_folder <- tempdir()

# Export both files to that folder
utils::write.csv(x = df1, file = file.path(temp_folder, "df1.csv"), row.names = FALSE)
utils::write.csv(x = df2, file = file.path(temp_folder, "df2.csv"), row.names = FALSE)

# Read in list of these data files
data_list <- ltertools::read(raw_folder = temp_folder, data_format = "csv")
 
# Generate a column key object manually
key_obj <- data.frame("source" = c(rep("df1.csv", 3),
                                   rep("df2.csv", 3)),
                      "raw_name" = c("xx", "unwanted", "yy",
                                     "LETTERS", "NUMBERS", "BONUS"),
                      "tidy_name" = c("numbers", NA, "letters",
                                      "letters", "numbers", "kingdom"))

# Standardize one dataset
ltertools::standardize(focal_file = "df1.csv", key = key_obj, df_list = data_list)      
#>    source numbers   <NA> letters
#> 1 df1.csv       1    not       a
#> 2 df1.csv       2 needed       b
#> 3 df1.csv       3 column       c
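
For intuition, the key-driven renaming described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `rename_with_key` is a hypothetical helper, and it assumes that raw columns without a tidy name should be dropped, which may differ from what ltertools::standardize returns.

```r
# Hypothetical helper (NOT part of ltertools): rename one dataset's columns via a key
rename_with_key <- function(focal_file, key, df_list) {
  # Key rows for the focal file that have a usable tidy name
  # (assumption: rows with a missing tidy name are skipped)
  keep <- which(key$source == focal_file & !is.na(key$tidy_name))
  key_sub <- key[keep, , drop = FALSE]

  # Pull the focal dataset and keep only the raw columns listed in the key
  df <- df_list[[focal_file]][, key_sub$raw_name, drop = FALSE]

  # Swap raw names for tidy names; order matches because we subset by raw_name
  names(df) <- key_sub$tidy_name

  # Record which file the rows came from
  data.frame(source = focal_file, df, stringsAsFactors = FALSE)
}

# Small in-memory example (no files are read or written)
demo_key <- data.frame(source = rep("a.csv", 3),
                       raw_name = c("xx", "unwanted", "yy"),
                       tidy_name = c("numbers", NA, "letters"),
                       stringsAsFactors = FALSE)
demo_list <- list("a.csv" = data.frame(xx = 1:3,
                                       unwanted = c("not", "needed", "column"),
                                       yy = letters[1:3],
                                       stringsAsFactors = FALSE))
demo_out <- rename_with_key(focal_file = "a.csv", key = demo_key, df_list = demo_list)
```

The sketch filters the key to the focal file first so that a single key can describe many datasets while only one of them is touched per call.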