Lab 02 - CS01 Data Wrangling
Introduction
The main goals of this lab are for you to 1) get more familiar with the CS01 data and 2) have one clean dataset by the end of lab, combining the whole blood, oral fluid, and breath datasets.
Getting started
To get started, accept the lab02 assignment (link on Canvas) and clone the repo (using SSH) into RStudio on datahub. Then you’re ready to go!
Packages
The only package required for completion of this lab is tidyverse, as dplyr (which you’ll be using a lot in this lab) is one of the packages in the tidyverse. Be sure to import the tidyverse prior to completing the lab.
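As a minimal sketch, loading the tidyverse at the top of your .Rmd could look like the following. The check on the last line is only there for illustration and is not part of the lab template.

```r
# Attach the core tidyverse packages (dplyr, readr, forcats, and others)
library(tidyverse)

# Illustrative check that dplyr was attached along with the tidyverse
dplyr_attached <- "package:dplyr" %in% search()
dplyr_attached
```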
Data
The three data files you’ll be using in this lab have been provided in the data/ folder. You’ll see the following three lines in the provided .Rmd template for this lab, which, when run, will read each of the files into R for you:
WB <- read_csv("data/WB")
OF <- read_csv("data/OF.csv")
BR <- read_csv("data/Breath.csv")
Exercises
Exercise 1: Whole Blood
Using the CS01 wrangling code from the dplyr lecture here, wrangle the whole blood data as we did in class.
Exercise 2: Oral Fluid
In a similar manner to how the whole blood data were wrangled and cleaned, carry out wrangling on the oral fluid dataset, accomplishing the following:
- Treatment re-coded and re-leveled as was done with WB
- Group re-coded to match the levels in WB (Frequent user; Occasional user)
- Column names cleaned using clean_names() as with WB
- The following columns renamed: thcoh = x11_oh_thc; thcv = thc_v; fluid_type = fluid
- The following timepoints (note these are different from WB): "pre-smoking", "0-30 min", "31-90 min", "91-180 min", "181-210 min", "211-240 min", "241-270 min", "271+ min"
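The column cleaning, renaming, and timepoint steps for oral fluid could be sketched roughly as below. This is not the lecture code: the toy rows, the raw column names, and the `time_from_start` column are assumptions made for illustration, and `cut()` is just one possible way to assign timepoints. Note that `clean_names()` lives in the janitor package, which is assumed to be installed here.

```r
library(tidyverse)

# Hypothetical toy rows standing in for the oral fluid file. The raw column
# names and values are assumptions for illustration, not the real CS01 data.
of_toy <- tibble(
  ID = c(1, 1, 2),
  Fluid = c("OF", "OF", "OF"),
  `11-OH-THC` = c(0, 1.2, 0.4),
  `THC-V` = c(0, 0.3, 0.1),
  `Time from start` = c(-10, 20, 100)
)

# Oral fluid timepoint labels, in order, as listed in the exercise
of_labels <- c("pre-smoking", "0-30 min", "31-90 min", "91-180 min",
               "181-210 min", "211-240 min", "241-270 min", "271+ min")

of_clean <- of_toy |>
  janitor::clean_names() |>   # e.g. `11-OH-THC` becomes x11_oh_thc
  rename(thcoh = x11_oh_thc, thcv = thc_v, fluid_type = fluid) |>
  mutate(
    # Left-closed bins: [-Inf, 0) is pre-smoking, [0, 31) is "0-30 min", etc.
    # This assumes times are recorded in whole minutes.
    timepoint = cut(
      time_from_start,
      breaks = c(-Inf, 0, 31, 91, 181, 211, 241, 271, Inf),
      labels = of_labels,
      right = FALSE
    )
  )

of_clean
```

Because `cut()` returns a factor whose levels follow the order of `labels`, the timepoints stay in chronological order when you later summarize or plot them.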
Exercise 3: Breath
In a similar manner to both previous datasets, carry out wrangling on the breath dataset, accomplishing the following:
- Treatment re-coded and re-leveled as was done with WB/OF
- Group re-coded to match the levels in WB (Frequent user; Occasional user)
- Column names cleaned using clean_names() as with WB/OF
- The following columns renamed: thc = thc_pg_pad; fluid_type = fluid
- The following timepoints (note these are different from WB/OF): "pre-smoking", "0-40 min", "41-90 min", "91-180 min", "181-210 min", "211-240 min", "241-270 min", "271+ min"
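The re-coding and re-leveling items in the list above could be sketched with forcats as follows. The raw labels in the toy data ("5.90%", "Not experienced user", and so on) and the new Treatment labels are hypothetical placeholders. Check the actual values in your data (for example with `count()`) and the lecture code before adapting this.

```r
library(tidyverse)

# Hypothetical raw labels; the real values in Breath.csv may differ.
br_toy <- tibble(
  Treatment = c("5.90%", "Placebo", "13.40%"),
  Group = c("Not experienced user", "Experienced user", "Experienced user")
)

br_recoded <- br_toy |>
  mutate(
    # fct_recode() takes new_label = "old_label" pairs
    Treatment = fct_recode(Treatment,
      "5.9% THC (low dose)" = "5.90%",
      "13.4% THC (high dose)" = "13.40%"
    ),
    # fct_relevel() moves the named levels to the front, in the order given
    Treatment = fct_relevel(Treatment, "Placebo", "5.9% THC (low dose)"),
    Group = fct_recode(Group,
      "Occasional user" = "Not experienced user",
      "Frequent user" = "Experienced user"
    )
  )

levels(br_recoded$Treatment)
```

Re-leveling matters because the level order controls how groups are ordered in tables and plots, so putting the placebo first keeps it as the reference group.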
Exercise 4: Combine (time-permitting)
Combine all three datasets into a single dataframe. Note that there is a function in dplyr called bind_rows() for accomplishing such a task.
Save the combined dataframe as a CSV file using the write_csv() function so that you have this file to start from in the future. (You do not need to submit this to GitHub; we’re doing it so that you have it.)
You’ll likely have to look up the documentation for each of these functions as they have not been directly covered in class yet.
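As a rough sketch of how these two functions fit together, the example below stacks three small toy tibbles and writes the result to disk. The toy columns and the output path are hypothetical; in the lab you would pass your three wrangled dataframes and pick a file name of your own.

```r
library(tidyverse)

# Hypothetical one-row stand-ins for the three wrangled datasets
wb_toy <- tibble(id = 1, fluid_type = "WB", thc = 2.5, thcoh = 0.4)
of_toy <- tibble(id = 1, fluid_type = "OF", thc = 10.1, thcv = 0.2)
br_toy <- tibble(id = 1, fluid_type = "BR", thc = 50)

# bind_rows() matches columns by name; columns missing from a dataset
# are filled with NA in that dataset's rows
combined <- bind_rows(wb_toy, of_toy, br_toy)

# Hypothetical output location; a temporary directory is used here only so
# that the sketch runs anywhere
out_path <- file.path(tempdir(), "cs01_combined.csv")
write_csv(combined, out_path)

combined
```

Since `bind_rows()` matches by column name, giving shared measurements the same name across datasets (as the renaming steps in the earlier exercises do) is what lets them line up in one column.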
Exercise 5: Summarize
In your own words (no code needed), describe the wrangling that was done overall, focusing on why it was done or what it achieved, rather than on specific lines of code.
Submit
You’ll always want to knit your RMarkdown document to HTML and review that HTML document to ensure it includes all the information you want and looks as you intended, as we grade from the knit HTML.
Yay, you’re done! To finish up and submit, first knit your file to HTML. Be sure to select both your .Rmd and .html documents when choosing what to commit! Then, commit all remaining changes, use the commit message “Done with Lab 2! 💪”, and push. Before you wrap up the assignment, make sure all documents are updated on your GitHub repo.