Lecture 00
UC San Diego
COGS 137 - Fall 2024
Practical Data Science in R
Please take one green sticky and one pink sticky as they come around. If you’re able, try to save these. We’ll use them most classes. (But, I’ll always have extra!)
R is a statistical programming language.
While R has most/all of the functionality of YFPL (your favorite programming language), it was designed for the specific use of analyzing data.
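As a small illustration of that data-analysis focus, the sketch below uses only functions that ship with base R. The temperature values are made up for the example.

```r
# Made-up daily high temperatures (degrees F), for illustration only.
temps <- c(67, 72, 74, 62, 56, 66)

# Summary statistics are built in; no extra packages are needed.
temp_mean <- mean(temps)
temp_sd <- sd(temps)
temp_summary <- summary(temps)

print(temp_mean)
print(temp_sd)
print(temp_summary)
```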
Data science is the scientific process of using data to answer interesting questions and/or solve important problems.
Shannon Ellis: Associate Teaching Professor, Mom & wife, volleyball-obsessed, and baking & cooking lover
sellis@ucsd.edu
shannon-ellis.com
CSB 002
Tu/Th 2-3:20PM (Lab: Fri 2-2:50 (RWAC 0103) or Mon 5-5:50 (DIB 121))
Quirine (Q) van-Engen (TA) | Eric Song (IA)
Everything you want to know about the course, and everything you will need for the course will be posted at: https://cogs137-fa24.github.io/cogs137-fa24/
Nope! The first few weeks of the course will be all about getting comfortable using the R programming language!
After that, we’ll focus on delving into interesting statistical analyses through case studies.
Artwork by @allison_horst
Note: This course has historically been back-loaded, since that’s when group work has happened. I’ve tried to fix that this quarter by teaching from case studies from the beginning.
Class Meetings
In-person, synchronous learning
The (Dreaded) Waitlist
Lab & Office Hours
Course Materials
Q: Which Q&A Platform should we use this quarter? Discuss with your neighbor!
A: Piazza, ClassQuestion, Slack, Canvas? Something else?
Put a green sticky on the front of your computer when you’re done discussing. Put a pink if you have a question.
Goal: every student is well served by this course
Philosophy: The diversity of students in this class is a huge asset to our learning community; our differences provide opportunities for learning and understanding.
Plan: Present course materials that are conscious of and respectful to diversity (gender identity, sexuality, disability, age, socioeconomic status, ethnicity, race, nationality, religion, politics, and culture)
But… if I ever fall short or if you ever have suggestions for improvement, please do share with me! There is also an anonymous Google Form if you’re more comfortable there.
(based on feedback):
A few (Q&A) guidelines:
1. No duplicates.
2. Public posts are best.
3. Posts should include your question, what you've tried so far, & resources used.
4. Helping others is encouraged.
5. No assignment code in public posts.
6. We're not robots.
Don’t cheat.
Teamwork is allowed, but you should be able to answer “Yes” to each of the following:
The Internet (including LLMs/ChatGPT) is a great resource. Cite your sources.
For anything in this course.
Probably never first or right away.
To learn: Think first. Try first. Then use external resources.
Always read/think about/understand the output.
Your final grade will be composed of the following:
Assignment (#) | % of grade |
---|---|
Labs (7) | 21% (3pt each) |
Homework (3) | 24% (8pt each) |
Case Study Projects* (2) | 30% (15pt each) |
Final Project Proposal* (1) | 3% |
Peer Review* (1) | 3% |
Final Report* (1) | 11% |
Final Presentation* (1) | 4% |
Team Evaluation Surveys (3) | 3% (1pt each) |
Homework and case study projects: accepted up to 3 days (72 hours) after the assigned deadline for a 25% deduction
No late deadlines for labs or the final project
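To make the late-policy arithmetic concrete, here is a minimal sketch. It rests on two assumptions that the policy above does not spell out: the 25% deduction is taken from the score earned, and work submitted after the 72-hour window earns zero. The function name `score_after_late_policy` is hypothetical.

```r
# Assumed reading of the policy (check with the instructor):
# 25% off the earned score, within 72 hours of the deadline.
late_window_hours <- 72
late_penalty <- 0.25

# Hypothetical helper: adjusts a homework or case study score for lateness.
score_after_late_policy <- function(raw_score, hours_late) {
  if (hours_late <= 0) {
    return(raw_score)                       # on time: no deduction
  }
  if (hours_late <= late_window_hours) {
    return(raw_score * (1 - late_penalty))  # late but inside the window
  }
  0                                         # assumed: not accepted after 72 hours
}

# An 8-point homework with a perfect score, turned in one day late.
print(score_after_late_policy(8, hours_late = 24))
```

Labs and the final project have no late window, so this helper would not apply to them.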
Note: Prof Ellis is a reasonable person; reach out to her if you have an extenuating circumstance at any point in the quarter.
Datahub is a platform hosted by UCSD that gives students access to computational resources.
This means that while you’ll be typing on your keyboard, you’ll be using UCSD’s computers in this class.
Website: https://datahub.ucsd.edu/
Launch Environment
When working on “stuff” for this course, select the COGS 137 environment.
Q: Do I have to use datahub?
A: Nope. You could download and install all the packages we use and complete the course locally! However, many packages have already been installed for you on datahub, so working locally will be a tiny bit more work up front…but you won’t be dependent on the internet/datahub!
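If you do go the local route, a sketch like the following can help you see what still needs installing. The package names here are placeholders, not the official course list.

```r
# Hypothetical package list; replace with the packages the course actually uses.
course_packages <- c("tidyverse", "rmarkdown")

# Check which packages are already available on this machine.
is_installed <- vapply(course_packages, requireNamespace,
                       logical(1), quietly = TRUE)
missing_packages <- course_packages[!is_installed]

# Uncomment to install whatever is missing (needs an internet connection).
# install.packages(missing_packages)

print(missing_packages)
```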
Scriptability \(\rightarrow\) R
Literate programming (code, narrative, output in one place) \(\rightarrow\) R Markdown
Version control \(\rightarrow\) Git / GitHub
The Internet (Google/ChatGPT/etc.)
R & RStudio
Concepts introduced:
Your Turn
airquality dataframe
Put a green sticky on the front of your computer when you’re done. Put a pink if you want help/have a question.
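The exercise prompts themselves are not reproduced here, so the following is only a sketch of common ways to take a first look at the built-in airquality dataframe, not the exercise solution.

```r
# airquality ships with base R (the datasets package), so nothing needs installing.
data(airquality)

print(dim(airquality))    # 153 rows, 6 columns
print(names(airquality))  # column names
print(head(airquality))   # first six rows
str(airquality)           # column types at a glance

# Count missing values per column before computing any summaries.
missing_per_column <- colSums(is.na(airquality))
print(missing_per_column)
```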
Keep the R Markdown cheat sheet and Markdown Quick Reference (Help -> Markdown Quick Reference) handy; we’ll refer to them often as the course progresses
The workspace of your R Markdown document is separate from the Console
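A loose analogy for this separation, using plain R environments rather than an actual knit: an object created in one place is invisible from a separate workspace until it is defined there too. The names `console_value` and `doc_env` are made up for the illustration.

```r
# An object created "at the Console" (here: in the current environment).
console_value <- 42

# A fresh environment standing in for the document's workspace.
# parent = baseenv() lets it see base R functions but not console_value.
doc_env <- new.env(parent = baseenv())

# Looking up console_value from the document's side fails...
found_before <- exists("console_value", envir = doc_env)

# ...until the document defines the object itself.
assign("console_value", 42, envir = doc_env)
found_after <- exists("console_value", envir = doc_env)

print(c(before = found_before, after = found_after))
```

The practical takeaway is the same either way: load data and create every object inside the document's own code chunks.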
with human readable messages
We’ll cover this time permitting; you’ll see it again in lab next week
Concepts introduced:
There is a bit more of GitHub that we’ll use in this class, but for today this is enough.
Consider ggplot2
(a package we’ll learn a lot about)
Imagine: You’ve been asked to carry out a number of wrangling operations on a dataset and make a plot…
Can you answer these questions?
git Resources
git from the command line
git (Part 1), by COGS 108 TA Ganesh (youtube, 22min tutorial)
git with GitHub Desktop, by COGS 108 TA Sidharth Suresh (youtube, 13min tutorial)
Note: This code will not run for you because you don’t have access to the roster for this course.
(required) Student Survey - complete by Fri 10/4 at 11:59 PM.
This is required and completion will be used for CAA/#finaid. DO complete this even if you’re on the waitlist, please.
(optional) Daily Post-Lecture Feedback