Lab 02 - CS01 Data Wrangling

Published

October 17, 2024

Introduction

The main goals of this lab are for you to 1) get more familiar with the CS01 data and 2) have one clean dataset by the end of lab, combining the whole blood, oral fluid, and breath datasets.

Getting started

To get started, accept the lab02 assignment (link on Canvas) and clone the repo (using SSH) into RStudio on datahub. Then you’re ready to go!

Packages

The only package required for completion of this lab is tidyverse, as dplyr (which you’ll be using a lot in this lab) is one of the packages in the tidyverse. Be sure to import the tidyverse prior to completing the lab.
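Loading the package might look like the following, typically in a setup chunk near the top of your .Rmd. This is a minimal sketch that assumes the tidyverse is already installed on datahub.

```r
# Attach the core tidyverse packages (dplyr, readr, forcats, tidyr, ...)
library(tidyverse)
```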

Data

The three data files you’ll be using in this lab have been provided in the data/ folder. You’ll see the following three lines in the provided .Rmd template for this lab, which, when run, will read each of the files into R for you:

WB <- read_csv("data/WB")
OF <- read_csv("data/OF.csv")
BR <- read_csv("data/Breath.csv")

Exercises

Exercise 1: Whole Blood

Using the CS01 wrangling code from the dplyr lecture, wrangle the whole blood data as we did in class.

Exercise 2: Oral Fluid

In a similar manner to how the whole blood data were wrangled and cleaned, carry out wrangling on the oral fluid dataset, accomplishing the following:

  • Treatment re-coded and re-leveled as was done with WB
  • Group re-coded to match the levels in WB (Frequent user; Occasional user)
  • Column names cleaned using clean_names() as with WB
  • The following columns renamed: thcoh = x11_oh_thc; thcv = thc_v; fluid_type = fluid
  • The following timepoints (note these are different than WB): "pre-smoking","0-30 min","31-90 min","91-180 min", "181-210 min", "211-240 min","241-270 min", "271+ min"
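The re-coding, re-leveling, and renaming steps above can be sketched on a toy tibble. This is only an illustration, not the lecture code: the raw column names and the raw Treatment and Group values below are assumptions, so check the real values (for example with count()) before adapting anything. Note also that clean_names() comes from the janitor package, which the sketch calls as janitor::clean_names() on the assumption that it is installed.

```r
library(tidyverse)

# Toy stand-in for the oral fluid data. The raw column names and the raw
# Treatment/Group values are ASSUMPTIONS for illustration only.
of_toy <- tibble(
  Treatment   = c("Placebo", "5.90%", "13.40%"),
  Group       = c("Not experienced user", "Experienced user", "Experienced user"),
  Fluid       = c("OF", "OF", "OF"),
  `11-OH-THC` = c(0, 1.2, 3.4),
  `THC-V`     = c(0, 0.1, 0.3)
)

of_clean <- of_toy |>
  mutate(
    # fct_recode() takes "new label" = "old label"
    Treatment = fct_recode(Treatment,
                           "5.9% THC (low dose)"   = "5.90%",
                           "13.4% THC (high dose)" = "13.40%"),
    # fct_relevel() moves the named levels to the front, in this order
    Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)"),
    Group = fct_recode(Group,
                       "Occasional user" = "Not experienced user",
                       "Frequent user"   = "Experienced user")
  ) |>
  # Convert column names to snake_case (e.g. `11-OH-THC` becomes x11_oh_thc)
  janitor::clean_names() |>
  # rename() takes new_name = old_name
  rename(thcoh = x11_oh_thc, thcv = thc_v, fluid_type = fluid)

levels(of_clean$treatment)
names(of_clean)
```

Re-coding before clean_names() means the mutate() step refers to the original capitalized column names, while rename() afterwards refers to the cleaned ones.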

Exercise 3: Breath

In a similar manner to both previous datasets, carry out wrangling on the breath dataset, accomplishing the following:

  • Treatment re-coded and re-leveled as was done with WB/OF
  • Group re-coded to match the levels in WB (Frequent user; Occasional user)
  • Column names cleaned using clean_names() as with WB/OF
  • The following columns renamed: thc = thc_pg_pad; fluid_type = fluid
  • The following timepoints (note these are different than WB/OF): "pre-smoking","0-40 min","41-90 min", "91-180 min", "181-210 min", "211-240 min", "241-270 min", "271+ min"
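One way to assign timepoint labels like those listed above is with case_when(). The sketch below is a hedged illustration on toy data: the helper name assign_breath_timepoint(), the column name time_from_start, and the exact cut-offs are all assumptions, so follow the approach from the lecture code when working with the real data.

```r
library(tidyverse)

# Hypothetical helper: bins minutes relative to the start of smoking into the
# breath timepoint labels. Name and cut-offs are ASSUMPTIONS for illustration.
assign_breath_timepoint <- function(minutes) {
  # case_when() checks conditions top to bottom and uses the first match
  case_when(
    minutes < 0    ~ "pre-smoking",
    minutes <= 40  ~ "0-40 min",
    minutes <= 90  ~ "41-90 min",
    minutes <= 180 ~ "91-180 min",
    minutes <= 210 ~ "181-210 min",
    minutes <= 240 ~ "211-240 min",
    minutes <= 270 ~ "241-270 min",
    minutes > 270  ~ "271+ min"
  )
}

breath_levels <- c("pre-smoking", "0-40 min", "41-90 min", "91-180 min",
                   "181-210 min", "211-240 min", "241-270 min", "271+ min")

# Toy data with a hypothetical time_from_start column (in minutes)
br_toy <- tibble(time_from_start = c(-15, 10, 65, 200, 300)) |>
  mutate(
    timepoint = assign_breath_timepoint(time_from_start),
    # Store as a factor so timepoints sort chronologically, not alphabetically
    timepoint = factor(timepoint, levels = breath_levels)
  )

br_toy
```

Because only the labels and cut-offs differ between fluids, the same pattern could be reused for the oral fluid timepoints by swapping in that list.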

Exercise 4: Combine (time-permitting)

Combine all three datasets into a single dataframe. Note that there is a function in dplyr called bind_rows() for accomplishing such a task.

Save the combined dataframe as a CSV file using the write_csv() function so that you have this file to start from in the future. (You do not need to submit this file to GitHub; we’re doing it so that you have it.)

You’ll likely have to look up the documentation for each of these functions as they have not been directly covered in class yet.
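As a starting point for reading the documentation, here is a minimal sketch of how bind_rows() and write_csv() behave on toy data. The tibbles, their column names, and the output file name are hypothetical and stand in for your three cleaned datasets.

```r
library(tidyverse)

# Toy stand-ins for the three cleaned datasets (columns are ASSUMPTIONS)
wb_toy <- tibble(id = 1, fluid_type = "WB", thc = 2.5, thcoh = 0.4)
of_toy <- tibble(id = 1, fluid_type = "OF", thc = 30.1, thcoh = 0)
br_toy <- tibble(id = 1, fluid_type = "BR", thc = 110)

# bind_rows() stacks data frames and matches columns by name; a column that
# is missing from one input (thcoh in br_toy) is filled with NA for its rows
cs01_toy <- bind_rows(wb_toy, of_toy, br_toy)

# Write the combined data to disk; this file name and location are hypothetical
out_path <- file.path(tempdir(), "cs01_combined.csv")
write_csv(cs01_toy, out_path)

cs01_toy
```

Columns that share a name must have compatible types across the inputs, which is one reason the earlier exercises rename columns and re-code factors consistently before combining.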

Exercise 5: Summarize

In your own words (no code needed), describe the wrangling that was done overall, focusing on why it was done or what it achieved, rather than on specific lines of code.

Submit

Important

You’ll always want to knit your RMarkdown document to HTML and review that HTML document to ensure it includes all the information you want and looks as you intended, as we grade from the knit HTML.

Yay, you’re done! To finish up and submit, first knit your file to HTML. Be sure to select both your .Rmd and .html documents when choosing what to commit! Then, commit all remaining changes, use the commit message “Done with Lab 2! 💪”, and push. Before you wrap up the assignment, make sure all documents are updated on your GitHub repo.