# load required library
library(tidyverse)
library(colorspace)
<- read_csv("https://wilkelab.org/SDS375/datasets/tempnormals.csv") %>%
temperatures mutate(
location = factor(
levels = c("Death Valley", "Houston", "San Diego", "Chicago")
location,
)%>%
) select(location, day_of_year, month, temperature)
<- read_csv("https://wilkelab.org/SDS375/datasets/tempnormals.csv") %>%
temps_months group_by(location, month_name) %>%
summarize(mean = mean(temperature)) %>%
mutate(
month = factor(
month_name,levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
),location = factor(
levels = c("Death Valley", "Houston", "San Diego", "Chicago")
location,
)%>%
) select(-month_name)
Customizing data visualizations using colorspace
, ggplot2
, and patchwork
Learn how to change ggplot’s default choices for color and style and take control of final figure presentation.
Color Scales
Coding exercise 7.1
In this worksheet, we will discuss how to change and customize color scales.
We will be using the R package tidyverse, which includes ggplot()
and related functions. We will also be using the R package colorspace for the scale functions it provides.
We will be working with the dataset temperatures
that we have used in previous worksheets. This dataset contains the average temperature for each day of the year for four different locations.
temperatures
# A tibble: 1,464 × 4
location day_of_year month temperature
<fct> <dbl> <chr> <dbl>
1 Death Valley 1 01 51
2 Death Valley 2 01 51.2
3 Death Valley 3 01 51.3
4 Death Valley 4 01 51.4
5 Death Valley 5 01 51.6
6 Death Valley 6 01 51.7
7 Death Valley 7 01 51.9
8 Death Valley 8 01 52
9 Death Valley 9 01 52.2
10 Death Valley 10 01 52.3
# ℹ 1,454 more rows
We will also be working with an aggregated version of this dataset called temps_months
, which contains the mean temperature for each month for the same locations.
temps_months
# A tibble: 48 × 3
# Groups: location [4]
location mean month
<fct> <dbl> <fct>
1 Chicago 50.4 Apr
2 Chicago 74.1 Aug
3 Chicago 29 Dec
4 Chicago 28.9 Feb
5 Chicago 24.8 Jan
6 Chicago 75.8 Jul
7 Chicago 71.0 Jun
8 Chicago 38.8 Mar
9 Chicago 60.9 May
10 Chicago 41.6 Nov
# ℹ 38 more rows
As a challenge, try to create this above table yourself using group_by()
and summarize()
like we learned about Wednesday., and then make a month
column which is a factor with levels froing from “Jan” to “Dec”, and make the location column a factor with levels “Death Valley”, “Houston”, “San Diego”, “Chicago”. If you are having trouble, the solution is at the end of this page, make sure you copy it into your code so the rest of the exercise works.
# check solution at the end before moving on!
<- read_csv("https://wilkelab.org/SDS375/datasets/tempnormals.csv") %>%
temps_months group_by(___) %>%
summarize(___) %>%
mutate(
month = factor(
month_name,
___
),location = factor(
location, ___
)%>%
) select(-month_name)
Built in ggplot2 color scales
We will start with built-in ggplot2 color scales, which require no additional packages. The scale functions are always named scale_color_*()
or scale_fill_*()
, depending on whether they apply to the color
or fill
aesthetic. The *
indicates some other words specifying the type of the scale, for example scale_color_brewer()
or scale_color_distiller()
for discrete or continuous scales from the ColorBrewer project, respectively. You can find all available built-in scales here: https://ggplot2.tidyverse.org/reference/index.html#section-scales
Now consider the following plot.
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE)
If you wanted to change the color scale to one from the ColorBrewer project, which scale function would you have to add? scale_color_brewer()
, scale_color_distiller()
, scale_fill_brewer()
, scale_fill_distiller()
?
# answer the question above to yourself
Now try this out.
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
___
Most color scale functions have additional customizations. How to use them depends on the specific scale function. For the ColorBrewer scales you can set direction = 1
or direction = -1
to set the direction of the scale (light to dark or dark to light). You can also set the palette via a numeric argument, e.g. palette = 1
, palette = 2
, palette = 3
etc.
Try this out by setting the direction of the scale from light to dark and using palette #4.
# build all the code for this exercise
A popular set of scales are the viridis scales, which are provided by scale_*_viridis_c()
for continuous data and scale_*_viridis_d()
for discrete data. Change the above plot to use a viridis scale.
# build all the code for this exercise
The viridis scales can be customized with direction
(as before), option
(which can be "A"
, "B"
, "C"
, "D"
, or "E"
), and begin
and end
which are numerical values between 0 and 1 indicating where in the color scale the data should begin or end. For example, begin = 0.2
means that the lowest data value is mapped to the 20th percentile in the scale.
Try different choices for option
, begin
, and end
to see how they change the plot.
# build all the code for this exercise
Customizing scale title and labels
In a previous worksheet, we used arguments such as name
, breaks
, labels
, and limits
to customize the axis. For color scales, instead of an axis we have a legend, and we can use the same arguments inside the scale function to customize how the legend looks.
Try this out. Set the scale limits from 10 to 110 and set the name of the scale and the breaks as you wish.
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_c(
name = ___,
breaks = ___,
limits = ___
)
Note: Color scales ignore the expand
argument, so you cannot use it to expand the scale beyond the data values as you can for position scales.
Binned scales
Research into human perception has shown that continuous coloring can be difficult to interpret. Therefore, it is often preferable to use a small number of discrete colors to indicate ranges of data values. You can do this in ggplot with binned scales. For example, scale_fill_viridis_b()
provides a binned version of the viridis scale. Try this out.
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
___
You can change the number of bins by providing the n.breaks
argument or alternatively by setting breaks explicitly. Try this out.
# build all the code for this exercise
Scales from the colorspace package
The color scales provided by the colorspace package follow a simple naming scheme of the form scale_<aesthetic>_<datatype>_<colorscale>()
, where <aesthetic>
is the name of the aesthetic (fill
, color
, colour
), <datatype>
indicates the type of variable plotted (discrete
, continuous
, binned
), and colorscale
stands for the type of the color scale (qualitative
, sequential
, diverging
, divergingx
).
For the mean temperature plot we have been using throughout this worksheet, which 2 color scales from the colorspace package is/are appropriate?
scale_fill_binned_sequential()
, scale_fill_discrete_qualitative()
, scale_fill_continuous_sequential()
, scale_color_continuous_sequential()
, scale_color_continuous_diverging()
>
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
___
You can customize the colorspace scales with the palette
argument, which takes the name of a palette (e.g., "Inferno"
, "BluYl"
, "Lajolla"
). Try this out. Also try reversing the scale direction with rev = TRUE
or rev = FALSE
. (The colorspace scales use rev
instead of direction
.) You can find the names of all supported scales here (consider specifically single-hue and multi-hue sequential palettes).
# build all the code for this exercise
You can also use begin
and end
just like in the viridis scales.
Manual scales
For discrete data with a small number of categories, it’s usually best to set colors manually. This can be done with the scale functions scale_*_manual()
. These functions take an argument values
that specifies the color values to use.
To see how this works, let’s go back to this plot of temperatures over time for four locations.
ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
geom_line(size = 1.5)
Let’s use the following four colors: "gold2"
, "firebrick"
, "blue3"
, "springgreen4"
. We can visualize this palette using the function swatchplot()
from the colorspace package.
::swatchplot(c("gold2", "firebrick", "blue3", "springgreen4")) colorspace
Now apply this color palette to the temperatures plot, by using the manual color scale. Hint: use the values
argument to provide the colors to the manual scale.
ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
geom_line(size = 1.5) +
___
One problem with this approach is that we can’t easily control which data value gets assigned to which color. What if we wanted San Diego to be shown in green and Chicago in blue? The simplest way to resolve this issue is to use a named vector. A named vector in R is a vector where each value has a name. Named vectors are created by writing c(name1 = value1, name2 = value2, ...)
. See the following example.
# regular vector
c("cat", "mouse", "house")
[1] "cat" "mouse" "house"
# named vector
c(A = "cat", B = "mouse", C = "house")
A B C
"cat" "mouse" "house"
The names in the second example are A
, B
, and C
. Notice that the names are not in quotes. However, if you need a name containing a space (such as Death Valley
), you need to enclose the name in backticks. Thus, our named vector of colors could be written like so:
c(`Death Valley` = "gold2", Houston = "firebrick", Chicago = "blue3", `San Diego` = "springgreen4")
Death Valley Houston Chicago San Diego
"gold2" "firebrick" "blue3" "springgreen4"
Now try to use this color vector in the figure showing temperatures over time.
# build all the code for this exercise
Try some other colors also. For example, you could use the Okabe-Ito colors:
# Okabe-Ito colors
::swatchplot(c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7", "#000000")) colorspace
Alternatively, you can find a list of all named colors here. You can also run the command colors()
in your R console to get a list of all available color names.
Hint: It’s a good idea to never use the colors "red"
, "green"
, "blue"
, "cyan"
, "magenta"
, "yellow"
. They are extreme points in the RGB color space and tend to look unnatural and too crazy. Try this by making a swatch plot of these colors, and compare for example to the color scale containing the colors "firebrick"
, "springgreen4"
, "blue3"
, "turquoise3"
, "darkorchid2"
, "gold2"
. Do you see the difference?
# build all the code for this exercise
Solution to the challenge to make the summary table of mean temperature by month:
# paste this below the "temperatures" code-chunk
<- read_csv("https://wilkelab.org/SDS375/datasets/tempnormals.csv") %>%
temps_months group_by(location, month_name) %>%
summarize(mean = mean(temperature)) %>%
mutate(
month = factor(
month_name,levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
),location = factor(
levels = c("Death Valley", "Houston", "San Diego", "Chicago")
location,
)%>%
) select(-month_name)
Figure Design
Coding exercise 7.2
In this worksheet, we will discuss how to change and customize themes.
We will be using the R package tidyverse, which includes ggplot()
and related functions. We will also be using the packages cowplot for themes and the package palmerpenguins for the penguins
dataset.
# load required library
library(tidyverse)
library(cowplot)
library(palmerpenguins)
We will be working with the dataset penguins
containing data on individual penguins on Antarctica.
penguins
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
Ready-made themes
Let’s start with this simple plot with no specific styling.
ggplot(penguins, aes(flipper_length_mm, body_mass_g, color = species)) +
geom_point(na.rm = TRUE) # na.rm = TRUE prevents warning about missing values
The default ggplot theme is theme_gray()
. Verify that adding this theme to the plot makes no difference in the output. Then change the overall font size by providing the theme function with a numeric font size argument, e.g. theme_gray(16)
.
ggplot(penguins, aes(flipper_length_mm, body_mass_g, color = species)) +
geom_point(na.rm = TRUE) +
___
The ggplot2 package has many built-in themes, including theme_minimal()
, theme_bw()
, theme_void()
, theme_dark()
. Try these different themes on the above plot. Also try again changing the font size. You can see all themes provided by ggplot2 here: https://ggplot2.tidyverse.org/reference/ggtheme.html
# build all the code for this exercise
Many other packages also provide themes. For example, the cowplot package provides themes theme_half_open()
, theme_minimal_grid()
, theme_minimal_hgrid()
, and theme_minimal_vgrid()
. You can see all cowplot themes here: https://wilkelab.org/cowplot/articles/themes.html
# build all the code for this exercise
Compare the visual appearance of theme_minimal()
from ggplot2 to theme_minimal_grid()
from cowplot. What similarities and differences to you notice? Which do you prefer? (There is no correct answer here, just be aware of the differences and of your preferences.)
# build all the code for this exercise
Modifying theme elements
You can modify theme elements by adding a theme()
call to the plot. Inside the theme()
call you specify which theme element you want to modify (e.g., axis.title
, axis.text.x
, panel.background
, etc) and what changes you want to make. For example, to make axis titles blue, you would write:
theme(
axis.title = element_text(color = "blue")
)
There are many theme settings, and for each one you need to know what type of an element it is (element_text()
, element_line()
, element_rect()
for text, lines, or rectangles, respectively). A complete description of the available options is available at the ggplot2 website: https://ggplot2.tidyverse.org/reference/theme.html
Here, we will only try a few simple things. For example, see if you can make the legend title blue and the legend text red.
# make the legend title blue and the legend text red
ggplot(penguins, aes(flipper_length_mm, body_mass_g, color = species)) +
geom_point(na.rm = TRUE)
Now color the area behind the legend in "aliceblue"
. Hint: The theme element you need to change is called legend.background
. There is also an element legend.box.background
but it is only visible if legend.background
is not shown, and in the default ggplot2 themes that is not the case.
# build all the code for this exercise
Another commonly used feature in themes are margins. Many parts of the plot theme can understand customized margins, which control how much spacing there is between different parts of a plot. Margins are typically specified with the function margin()
, which takes four numbers specifying the margins in points, in the order top, right, bottom, left. So, margin(10, 5, 5, 10)
would specify a top margin of 10pt, a right margin of 5pt, a bottom margin of 5pt, and a left margin of 10pt.
Try this out by setting the legend margin (element legend.margin
) such that there is no top and no bottom margin but 10pt left and right margin.
# build all the code for this exercise
There are many other things you can do. Try at least some of the following:
- Change the horizontal or vertical justification of text with
hjust
andvjust
. - Change the font family with
family
.1 - Change the panel grid. For example, create only horizontal lines, or only vertical lines.
- Change the overall margin of the plot with
plot.margin
. - Move the position of the legend with
legend.position
andlegend.justification
. - Turn off some elements by setting them to
element_blank()
.
1 Getting fonts to work well can be tricky in R. Which specific fonts work depends on the graphics device and the operating system. You can try some of the following fonts and see if they work on app.terra.bio: "Palatino"
, "Times"
, "Helvetica"
, "Courier"
, "ITC Bookman"
, "ITC Avant Garde Gothic"
, "ITC Zapf Chancery"
.
Writing your own theme
You can write a theme by
<-
theme_colorful theme_bw() +
theme(
text = element_text(color = "mediumblue"),
axis.text = element_text(color = "springgreen4"),
legend.text = element_text(color = "firebrick4")
)
Hint: Do you have to add theme_colorful
or theme_colorful()
? Do you understand which option is correct and why? If you are unsure, try both and see what happens.
# build all the code for this exercise
Now write your own theme and then add it to the penguing plot.
# build all the code for this exercise
Compound Figures
Coding exercise 7.3
In this worksheet, we will discuss how to combine several ggplot2 plots into one compound figure.
We will be using the R package tidyverse, which includes ggplot()
and related functions. We will also be using the package patchwork for plot composition.
# load required library
library(tidyverse)
library(patchwork)
We will be working with the dataset mtcars
, which contains fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
Combining plots
First we set up four different plots that we will subsequently combine. The plots are stored in variables p1
, p2
, p3
, p4
.
<- ggplot(mtcars) +
p1 geom_point(aes(mpg, disp))
p1
<- ggplot(mtcars) +
p2 geom_boxplot(aes(gear, disp, group = gear))
p2
<- ggplot(mtcars) +
p3 geom_smooth(aes(disp, qsec))
p3
<- ggplot(mtcars) +
p4 geom_bar(aes(carb))
p4
To show plots side-by-side, we use the operator |
, as in p1 | p2
. Try this by making a compound plot of plots p1
, p2
, p3
side-by-side.
# build all the code for this exercise
To show plots on top of one-another, we use the operator /
, as in p1 / p2
. Try this by making a compound plot of plots p1
, p2
, p3
on top of each other.
# build all the code for this exercise
We can also use parentheses to group plots with respect to the operators |
and /
. For example, we can place several plots side-by-side and then place this entire row of plots on top of another plot. Try putting p1
, p2
, p3
, on the top row, and p4
on the bottom row.
/ ___ (___)
Plot annotations
The patchwork package provides a powerful annotation system via the plot_annotation()
function that can be added to a plot assembly. For example, we can add plot tags (the labels in the upper left corner identifying the plots) via the plot annotation tag_levels
. You can set tag_levels = "A"
to generate tags A, B, C, etc. Try this out.
| p2 | p3 ) / p4 (p1
Try also tag levels such as "a"
, "i"
, or "1"
.
You can also add elements such as titles, subtitles, and captions, by setting the title
, subtitle
, or caption
argument in plot_annotation()
. Try this out by adding an overall title to the figure from the previous exercise.
# build all the code for this exercise
Also set a subtitle and a caption.
Finally, you can change the theme of all plots in the plot assembly via the &
operator, as in (p1 | p2) & theme_bw()
. Try this out.
# build all the code for this exercise
What happens if you write this expression without parentheses? Do you understand why?
(Big) Challenge Problem:
If you have time this morning, or if you want to work on it this afternoon, try analyzing a new dataset to test your R skills. First, learn what the columns mean, what missing values you see, and then start to ask questions about patterns in the data by making plots.
You can browse the datasets at https://github.com/rfordatascience/tidytuesday/tree/master/data (click on a year folder and go to the README to read about the datasets). Or pick from one of the ones below:
# fish consumption in different countries
<- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-12/fish-and-seafood-consumption-per-capita.csv')
consumption
# world cup Cricket matches from 1996 to 2005
<- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-11-30/matches.csv')
matches
# malaria deaths by age across the world and time.
<- readr::read_csv('https://github.com/rfordatascience/tidytuesday/blob/master/data/2018/2018-11-13/malaria_deaths_age.csv')
malaria_deaths
#meteorites and/or volcanos:
# note to plot a map, try the following:
<- ggplot2::map_data("world")
countries_map <-ggplot() +
world_mapgeom_map(data = countries_map,
map = countries_map,aes(x = long, y = lat, map_id = region, group = group),
fill = "white", color = "black", size = 0.1) # then add geom_point() to this map to add lat/long points
<- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-11/meteorites.csv")
meteorites <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-12/volcano.csv")
volcano
# or pick any dataset from https://github.com/rfordatascience/tidytuesday/tree/master/data ,
# just click on a year folder and go to the README to read about the datasets
# if you have trouble loading a dataset there, ask for help!