Introduction
In this worksheet, we will discuss how to change and customize color scales.
We will be using the R package tidyverse, which
includes ggplot()
and related functions. We will also be
using the R package colorspace for the scale functions
it provides.
```{r library-calls}
# load required library
library(tidyverse)
library(colorspace)
temperatures <- read_csv("https://wilkelab.org/SDS375/datasets/tempnormals.csv") %>%
mutate(
location = factor(
location, levels = c("Death Valley", "Houston", "San Diego", "Chicago")
)
) %>%
select(location, day_of_year, month, temperature)
temps_months <- read_csv("https://wilkelab.org/SDS375/datasets/tempnormals.csv") %>%
group_by(location, month_name) %>%
summarize(mean = mean(temperature)) %>%
mutate(
month = factor(
month_name,
levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
),
location = factor(
location, levels = c("Death Valley", "Houston", "San Diego", "Chicago")
)
) %>%
select(-month_name)
```
We will be working with the dataset temperatures
that we
have used in previous worksheets. This dataset contains the average
temperature for each day of the year for four different locations.
```{r temperatures}
temperatures
```
# A tibble: 1,464 × 4
location day_of_year month temperature
<fct> <dbl> <chr> <dbl>
1 Death Valley 1 01 51
2 Death Valley 2 01 51.2
3 Death Valley 3 01 51.3
4 Death Valley 4 01 51.4
5 Death Valley 5 01 51.6
6 Death Valley 6 01 51.7
7 Death Valley 7 01 51.9
8 Death Valley 8 01 52
9 Death Valley 9 01 52.2
10 Death Valley 10 01 52.3
# … with 1,454 more rows
We will also be working with an aggregated version of this dataset
called temps_months
, which contains the mean temperature
for each month for the same locations.
```{r temps_months}
temps_months
```
# A tibble: 48 × 3
# Groups: location [4]
location mean month
<fct> <dbl> <fct>
1 Chicago 50.4 Apr
2 Chicago 74.1 Aug
3 Chicago 29 Dec
4 Chicago 28.9 Feb
5 Chicago 24.8 Jan
6 Chicago 75.8 Jul
7 Chicago 71.0 Jun
8 Chicago 38.8 Mar
9 Chicago 60.9 May
10 Chicago 41.6 Nov
# … with 38 more rows
As a challenge, try to create this above table yourself using
group_by()
and summarize()
like we learned
about Wednesday., and then make a month
column which is a
factor with levels froing from “Jan” to “Dec”, and make the location
column a factor with levels “Death Valley”, “Houston”, “San Diego”,
“Chicago”. If you are having trouble, the solution is at the end of this
page, make sure you copy it into your code so the rest of the exercise
works.
```{r temp_months_by_user}
# check solution at the end before moving on!
temps_months <- read_csv("https://wilkelab.org/SDS375/datasets/tempnormals.csv") %>%
group_by(___) %>%
summarize(___) %>%
mutate(
month = factor(
month_name,
___
),
location = factor(
location, ___
)
) %>%
select(-month_name)
# solution
# paste this below the "temperatures" code-chunk
temps_months <- read_csv("https://wilkelab.org/SDS375/datasets/tempnormals.csv") %>%
group_by(location, month_name) %>%
summarize(mean = mean(temperature)) %>%
mutate(
month = factor(
month_name,
levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
),
location = factor(
location, levels = c("Death Valley", "Houston", "San Diego", "Chicago")
)
) %>%
select(-month_name)
```
Built in ggplot2 color scales
We will start with built-in ggplot2 color scales,
which require no additional packages. The scale functions are always
named scale_color_*()
or scale_fill_*()
,
depending on whether they apply to the color
or
fill
aesthetic. The *
indicates some other
words specifying the type of the scale, for example
scale_color_brewer()
or
scale_color_distiller()
for discrete or continuous scales
from the ColorBrewer project, respectively. You can find all available
built-in scales here: https://ggplot2.tidyverse.org/reference/index.html#section-scales
Now consider the following plot.
```{r temps-months-default, fig.height = 3}
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE)
```
If you wanted to change the color scale to one from the ColorBrewer
project, which scale function would you have to add?
scale_color_brewer()
, scale_color_distiller()
,
scale_fill_brewer()
,
scale_fill_distiller()
?
# answer question above to yourself
#solution:
scale_fill_distiller()
Now try this out.
```{r temps-months-distiller, fig.height = 3}
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
___
# solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_distiller()
```
Most color scale functions have additional customizations. How to use
them depends on the specific scale function. For the ColorBrewer scales
you can set direction = 1
or direction = -1
to
set the direction of the scale (light to dark or dark to light). You can
also set the palette via a numeric argument,
e.g. palette = 1
, palette = 2
,
palette = 3
etc.
Try this out by setting the direction of the scale from light to dark and using palette #4.
```{r temps-months-distiller2, fig.height = 3}
# build all the code for this exercise
# solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_distiller(direction = 1, palette = 4)
```
A popular set of scales are the viridis scales, which are provided by
scale_*_viridis_c()
for continuous data and
scale_*_viridis_d()
for discrete data. Change the above
plot to use a viridis scale.
```{r temps-months-viridis, fig.height = 3}
# build all the code for this exercise
# solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_c()
```
The viridis scales can be customized with direction
(as
before), option
(which can be "A"
,
"B"
, "C"
, "D"
, or
"E"
), and begin
and end
which are
numerical values between 0 and 1 indicating where in the color scale the
data should begin or end. For example, begin = 0.2
means
that the lowest data value is mapped to the 20th percentile in the
scale.
Try different choices for option
, begin
,
and end
to see how they change the plot.
```{r temps-months-viridis2, fig.height = 3}
# build all the code for this exercise
# solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_c(option = "B", begin = 0.15)
```
Customizing scale title and labels
In a previous worksheet, we used arguments such as name
,
breaks
, labels
, and limits
to
customize the axis. For color scales, instead of an axis we have a
legend, and we can use the same arguments inside the scale function to
customize how the legend looks.
Try this out. Set the scale limits from 10 to 110 and set the name of the scale and the breaks as you wish.
```{r temps-months-viridis3, fig.height = 3}
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_c(
name = ___,
breaks = ___,
limits = ___
)
# solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_c(
name = "temperature (F)",
breaks = c(25, 50, 75, 100),
limits = c(10, 110)
)
```
Note: Color scales ignore the expand
argument, so you cannot use it to expand the scale beyond the data
values as you can for position scales.
Binned scales
Research into human perception has shown that continuous coloring can
be difficult to interpret. Therefore, it is often preferable to use a
small number of discrete colors to indicate ranges of data values. You
can do this in ggplot with binned scales. For example,
scale_fill_viridis_b()
provides a binned version of the
viridis scale. Try this out.
```{r temps-months-viridis4, fig.height = 3}
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
___
# solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_b()
```
You can change the number of bins by providing the
n.breaks
argument or alternatively by setting breaks
explicitly. Try this out.
```{r temps-months-viridis5, fig.height = 3}
# build all the code for this exercise
# solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_b(
n.breaks = 7
)
```
Scales from the colorspace package
The color scales provided by the colorspace package
follow a simple naming scheme of the form
scale_<aesthetic>_<datatype>_<colorscale>()
,
where <aesthetic>
is the name of the aesthetic
(fill
, color
, colour
),
<datatype>
indicates the type of variable plotted
(discrete
, continuous
, binned
),
and colorscale
stands for the type of the color scale
(qualitative
, sequential
,
diverging
, divergingx
).
For the mean temperature plot we have been using throughout this worksheet, which 2 color scales from the colorspace package is/are appropriate?
scale_fill_binned_sequential()
,
scale_fill_discrete_qualitative()
,
scale_fill_continuous_sequential()
,
scale_color_continuous_sequential()
,
scale_color_continuous_diverging()
>
# answer question above to yourself
# solution
scale_fill_binned_sequential()
scale_fill_continuous_sequential()
Try both options out.
```{r temps-months-colorspace, fig.height = 3}
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
___
# solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_continuous_sequential()
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_binned_sequential()
```
You can customize the colorspace scales with the palette
argument, which takes the name of a palette (e.g.,
"Inferno"
, "BluYl"
, "Lajolla"
).
Try this out. Also try reversing the scale direction with
rev = TRUE
or rev = FALSE
. (The
colorspace scales use rev
instead of
direction
.) You can find the names of all supported scales
here
(consider specifically single-hue and multi-hue sequential
palettes).
```{r temps-months-colorspace2, fig.height = 3}
# build all the code for this exercise
# solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_continuous_sequential(
palette = "Lajolla"
)
```
You can also use begin
and end
just like in
the viridis scales.
Manual scales
For discrete data with a small number of categories, it’s usually
best to set colors manually. This can be done with the scale functions
scale_*_manual()
. These functions take an argument
values
that specifies the color values to use.
To see how this works, let’s go back to this plot of temperatures over time for four locations.
```{r temps-locations, echo = TRUE}
ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
geom_line(size = 1.5)
```
Let’s use the following four colors: "gold2"
,
"firebrick"
, "blue3"
,
"springgreen4"
. We can visualize this palette using the
function swatchplot()
from the colorspace
package.
```{r swatchplot, fig.height = 3.5}
colorspace::swatchplot(c("gold2", "firebrick", "blue3", "springgreen4"))
```
Now apply this color palette to the temperatures plot, by using the
manual color scale. Hint: use the values
argument
to provide the colors to the manual scale.
```{r temps-locations-manual, exercise = TRUE}
ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
geom_line(size = 1.5) +
___
# solution
ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
geom_line(size = 1.5) +
scale_color_manual(
values = c("gold2", "firebrick", "blue3", "springgreen4")
)
```
One problem with this approach is that we can’t easily control which
data value gets assigned to which color. What if we wanted San Diego to
be shown in green and Chicago in blue? The simplest way to resolve this
issue is to use a named vector. A named vector in R is a vector where
each value has a name. Named vectors are created by writing
c(name1 = value1, name2 = value2, ...)
. See the following
example.
```{r named-vector}
# regular vector
c("cat", "mouse", "house")
# named vector
c(A = "cat", B = "mouse", C = "house")
```
[1] "cat" "mouse" "house"
A B C
"cat" "mouse" "house"
The names in the second example are A
, B
,
and C
. Notice that the names are not in quotes. However, if
you need a name containing a space (such as Death Valley
),
you need to enclose the name in backticks. Thus, our named vector of
colors could be written like so:
```{r named-vector2}
c(`Death Valley` = "gold2", Houston = "firebrick", Chicago = "blue3", `San Diego` = "springgreen4")
```
Death Valley Houston Chicago San Diego
"gold2" "firebrick" "blue3" "springgreen4"
Now try to use this color vector in the figure showing temperatures over time.
```{r temps-locations-manual2, exercise = TRUE}
# build all the code for this exercise
# solution
ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
geom_line(size = 1.5) +
scale_color_manual(
values = c(
`Death Valley` = "gold2",
Houston = "firebrick",
Chicago = "blue3",
`San Diego` = "springgreen4"
)
)
```
```{r swatchplot2, fig.height = 3.5}
# Okabe-Ito colors
colorspace::swatchplot(c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7", "#000000"))
```
Alternatively, you can find a list of all named colors here.
You can also run the command colors()
in your R console to
get a list of all available color names.
Hint: It’s a good idea to never use the colors
"red"
, "green"
, "blue"
,
"cyan"
, "magenta"
, "yellow"
. They
are extreme points in the RGB color space and tend to look unnatural and
too crazy. Try this by making a swatch plot of these colors, and compare
for example to the color scale containing the colors
"firebrick"
, "springgreen4"
,
"blue3"
, "turquoise3"
,
"darkorchid2"
, "gold2"
. Do you see the
difference?
```{r swatchplot3, fig.height = 3.5}
# build all the code for this exercise
# solution
colorspace::swatchplot(c("red", "green", "blue", "cyan", "magenta", "yellow"))
colorspace::swatchplot(c("firebrick", "springgreen4", "blue3", "turquoise3", "darkorchid2", "gold2"))
```
Solution to the challenge to make the summary table of mean temperature by month:
```{r solution_to_challenge}
# paste this below the "temperatures" code-chunk
temps_months <- read_csv("https://wilkelab.org/SDS375/datasets/tempnormals.csv") %>%
group_by(location, month_name) %>%
summarize(mean = mean(temperature)) %>%
mutate(
month = factor(
month_name,
levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
),
location = factor(
location, levels = c("Death Valley", "Houston", "San Diego", "Chicago")
)
) %>%
select(-month_name)
```