Final Project

Published

December 10, 2024

Introduction

For the final project, you and your group mates (groups of 4-5 people) will carry out a data analysis on a topic of your choosing.

Each group will be provided with a private repo that all group members and the course instructional staff can access. Final projects will be “submitted” by pushing the required deliverables to the group repo by the deadline.

Presentations will be either submitted on Canvas or given in class during week 10. Groups that opt to present in class will 1) need to have all group members present and 2) receive extra credit.

Written, visual, and presented content will be graded on its technical merits as well as on how effectively it communicates.

The Project

Each group will carry out a full data science project. This will include question formation, finding the data, doing background research, wrangling the data, doing EDA, analyzing the data, and answering your question of interest.

You can think of this as a mini case study: the process is the same, but we do not expect the work to be as extensive as what you did for a case study in this course. That said, we want to see you demonstrate the skills you’ve learned in the class, so we will be looking for some data wrangling in your project. If you have a single dataset that requires no wrangling, consider whether additional datasets could be incorporated to answer your question(s) of interest more deeply.

You are strongly encouraged to think of your topic/question before looking for datasets. More interesting case studies start with the topic/question. Boring case studies look for the dataset first.

Deliverable: Report (.Rmd + HTML)

Your analysis will be submitted as an .Rmd document and rendered to HTML (both of which should be pushed to GitHub).

The report will probably be shorter than a case study in this course, but it should have the same sections.

Deliverable: Presentation

Students must present their case study in a presentation that is 3-5 minutes long. What you use to visually support this presentation (slides or something else) is up to you, but it should follow the principles of effective communication discussed in class. The presentation will be either 1) given in class during week 10 (all group members must be present) or 2) pre-recorded and submitted on Canvas (at least one group member must present the project; in other words, not everyone has to “speak,” but everyone in the group is responsible for the contents).

Deliverable: General Communication

This will be a piece of communication aimed at the general public (a non-technical audience, not data scientists) that conveys the most important finding(s) from your project.

Group Feedback

When you submit the final project, you will also fill out a form to provide feedback about working with your group mates. As with the case studies, this is meant to motivate, not scare. Most groups work out really well, and everyone contributes to the best of their ability. However, if and when that doesn’t happen, I want to be sure I’m aware of the circumstances so I can follow up as necessary.