05-viz

Author
Affiliation

Professor Shannon Ellis

UC San Diego
COGS 137 - Fall 2024

Published

October 17, 2024

Effective Data Visualization

Slides modified from datascienceinabox.org

Q&A

Q: Does ggplot2 allow access to the same color schemes that we use in Python? I’m familiar with using Matplot.lib, but it seems slightly different for ggplot2.
A: You can directly control color in R. We’ll discuss that, but you can also use a ton of available color palletes. See here.You’ll find what you largely have in matplotlib but also additional palettes.

Q: What are some common aesthetics options for graphs?
A: Discussing this today!

Q: Why in the world anyone would think that %>% is an acceptable operator
A: Hadley Wickham (the developer) has made SO many incredible updates to the consistency and syntax in R. This was just NOT one of them.

Course Announcements

Due Dates:

  • Lab 02 due tonight (11:59 PM)
    • Group re-coded as with WB/OF <- removed from instructions
  • Lecture Participation survey “due” after class

Notes:

  • Lab03 now available
  • Lab01 scores posted; Scores on Canvas; feedback as “issue” on GH
  • Case Study Groups will be sent tomorrow; will discuss case study in class on Tues.

Suggested Reading

Keep it simple

Use color to draw attention

Tell a story

Credit: Angela Zoss and Eric Monson, Duke DVS

Principles for effective visualizations

Principles for effective visualizations

  • Order matters
  • Put long categories on the y-axis
  • Keep scales consistent
  • Select meaningful colors
  • Use meaningful and nonredundant labels

Data

In September 2019, YouGov survey asked 1,639 GB adults the following question:

In hindsight, do you think Britain was right/wrong to vote to leave EU?

  • Right to leave
  • Wrong to leave
  • Don’t know

Source: YouGov Survey Results, retrieved Oct 7, 2019

The Data: Code

brexit <- tibble(
  opinion = c(
    rep("Right", 664), rep("Wrong", 787), rep("Don't know", 188)
  ),
  region = c(
    rep("london", 63), rep("rest_of_south", 241), rep("midlands_wales", 145), rep("north", 176), rep("scot", 39),
    rep("london", 110), rep("rest_of_south", 257), rep("midlands_wales", 152), rep("north", 176), rep("scot", 92),
    rep("london", 24), rep("rest_of_south", 49), rep("midlands_wales", 57), rep("north", 48), rep("scot", 10)
  )
)

Order matters

Alphabetical is rarely ideal

ggplot(brexit, aes(x = opinion)) +
  geom_bar()

Order by frequency

fct_infreq: Reorder factors’ levels by frequency

ggplot(brexit, aes(x = fct_infreq(opinion))) + 
  geom_bar()

Clean up labels

ggplot(brexit, aes(x = opinion)) +
  geom_bar() +
  labs( 
    x = "Opinion", 
    y = "Count" 
  ) 

Avoiding Alphabetical Order

ggplot(brexit, aes(x = region)) +
  geom_bar()

Use inherent level order

fct_relevel: Reorder factor levels using a custom order

brexit <- brexit |>
  mutate(
    region = fct_relevel( 
      region,
      "london", "rest_of_south", "midlands_wales", "north", "scot"
    )
  )


Clean up labels

fct_recode: Change factor levels by hand

brexit <- brexit |>
  mutate(
    region = fct_recode( 
      region,
      London = "london",
      `Rest of South` = "rest_of_south",
      `Midlands / Wales` = "midlands_wales",
      North = "north",
      Scotland = "scot"
    )
  )

Put long categories on the y-axis

Long categories can be hard to read

Move them to the y-axis

ggplot(brexit, aes(y = region)) + 
  geom_bar()

And reverse the order of levels

fct_rev: Reverse order of factor levels

ggplot(brexit, aes(y = fct_rev(region))) + 
  geom_bar()

Clean up labels

ggplot(brexit, aes(y = fct_rev(region))) +
  geom_bar() +
  labs( 
    x = "Count", 
    y = "Region" 
  ) 

Pick a purpose

Segmented bar plots can be hard to read

ggplot(brexit, aes(y = region, fill = opinion)) + 
  geom_bar()

Use facets

ggplot(brexit, aes(y = opinion, fill = region)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1) 

Avoid redundancy?

Redundancy can help tell a story

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1)

Be selective with redundancy

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1) +
  guides(fill = "none") 

Use informative labels

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1) +
  guides(fill = "none") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?", 
    x = NULL, y = NULL
  )

A bit more info

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1) +
  guides(fill = "none") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?",
    subtitle = "YouGov Survey Results, 2-3 September 2019", 
    caption = "Source: https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/x0msmggx08/YouGov%20-%20Brexit%20and%202019%20election.pdf", 
    x = NULL, y = NULL
  )

Let’s do better

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1) +
  guides(fill = "none") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?",
    subtitle = "YouGov Survey Results, 2-3 September 2019",
    caption = "Source: bit.ly/2lCJZVg", 
    x = NULL, y = NULL
  )

Fix up facet labels

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region,
    nrow = 1,
    labeller = label_wrap_gen(width = 12) 
  ) + 
  guides(fill = "none") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?",
    subtitle = "YouGov Survey Results, 2-3 September 2019",
    caption = "Source: bit.ly/2lCJZVg",
    x = NULL, y = NULL
  )

Select meaningful colors

Rainbow colors not always best

Manually choose colors when needed

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1, labeller = label_wrap_gen(width = 12)) +
  guides(fill = "none") +
  labs(title = "Was Britain right/wrong to vote to leave EU?",
       subtitle = "YouGov Survey Results, 2-3 September 2019",
       caption = "Source: bit.ly/2lCJZVg",
       x = NULL, y = NULL) +
  scale_fill_manual(values = c( 
    "Wrong" = "red", 
    "Right" = "green", 
    "Don't know" = "gray" 
  )) 

Choosing better colors

colorbrewer2.org

Use better colors

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1, labeller = label_wrap_gen(width = 12)) +
  guides(fill = "none") +
  labs(title = "Was Britain right/wrong to vote to leave EU?",
       subtitle = "YouGov Survey Results, 2-3 September 2019",
       caption = "Source: bit.ly/2lCJZVg",
       x = NULL, y = NULL) +
  scale_fill_manual(values = c(
    "Wrong" = "#ef8a62", 
    "Right" = "#67a9cf", 
    "Don't know" = "gray" 
  ))

Select theme

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1, labeller = label_wrap_gen(width = 12)) +
  guides(fill = "none") +
  labs(title = "Was Britain right/wrong to vote to leave EU?",
       subtitle = "YouGov Survey Results, 2-3 September 2019",
       caption = "Source: bit.ly/2lCJZVg",
       x = NULL, y = NULL) +
  scale_fill_manual(values = c("Wrong" = "#ef8a62",
                               "Right" = "#67a9cf",
                               "Don't know" = "gray")) +
  theme_minimal() 

ggthemes described here

Customize theme

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1, labeller = label_wrap_gen(width = 12)) +
  guides(fill = "none") +
  labs(title = "Was Britain right/wrong to vote to leave EU?",
       subtitle = "YouGov Survey Results, 2-3 September 2019",
       caption = "Source: bit.ly/2lCJZVg",
       x = NULL, y = NULL) +
  scale_fill_manual(values = c("Wrong" = "#ef8a62",
                               "Right" = "#67a9cf",
                               "Don't know" = "gray")) +
  theme_minimal(base_size = 16) + 
  theme(plot.title.position = "plot", 
        panel.grid.major.y = element_blank()) 

Your Turn

  • Read in the data (Data)
  • Think of at least three different ways to tell slightly different stories with these data
  • Try to implement at least one of these ideas!

Recap

  • Can you determine what needs to be done to improve the effectiveness of your visualizations?
  • Can you execute said improvements using ggplot2?
  • Can you tell a story with data?