tidyverse
remixed from Claus O. Wilke’s SDS375 course
Workshop materials are at:
https://elsherbini.github.io/durban-data-science-for-biology/
Get the big picture of data visualization
Learn how to wrangle data and make plots with the tidyverse
data wrangling (n.) - the art of taking data in one format and filtering, reshaping, and deriving values to make the data format you need.
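As a minimal sketch of the three steps in that definition, the following uses dplyr and tidyr (both part of the tidyverse) on six rows copied from the clinical measurements table shown later in these slides. The object names and the Nugent cutoff of 7 are chosen here for illustration and are not taken from the workshop materials.

```r
library(dplyr)
library(tidyr)

# Six rows copied from the clinical measurements table (pid_01 and pid_02)
visits <- data.frame(
  pid          = rep(c("pid_01", "pid_02"), each = 3),
  time_point   = rep(c("baseline", "week_1", "week_7"), times = 2),
  arm          = "placebo",
  nugent_score = c(8, 7, 7, 7, 7, 4),
  crp_blood    = c(0.44, 1.66, 1.44, 1.55, 0.75, 1.17),
  ph           = c(5.7, 5.2, 5.4, 5.2, 4.8, 4.2)
)

# Filtering: keep only the baseline visits
baseline <- visits %>%
  filter(time_point == "baseline")

# Deriving values: flag visits at or above an illustrative cutoff of 7
flagged <- visits %>%
  mutate(high_nugent = nugent_score >= 7)

# Reshaping: one row per participant, one pH column per time point
ph_wide <- visits %>%
  select(pid, time_point, ph) %>%
  pivot_wider(names_from = time_point, values_from = ph)
```

Each step returns a new data frame and leaves `visits` unchanged, so the steps can be chained in whatever order the target format needs.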
Ask questions at #workshop-questions on https://discord.gg/UDAsYTzZE.
During an activity, place a yellow sticky on your laptop if you’re good to go and a pink sticky if you want help.
Image by Megan Duffy
WiFi:
Network: KTB Free Wifi (no password needed)
Network: AHRI, Password: @hR1W1F1!17
Network: CAPRISA-Corp, Password: corp@caprisa17
Bathrooms are out the lobby to your left
Get with your group. Go to the activity
Have one member from your group present the plot to everyone! 3 minute limit!
02_visit_clinical_measurements_UKZN_workshop_2023.csv
pid | time_point | arm | nugent_score | crp_blood | ph |
---|---|---|---|---|---|
pid_01 | baseline | placebo | 8 | 0.44 | 5.7 |
pid_01 | week_1 | placebo | 7 | 1.66 | 5.2 |
pid_01 | week_7 | placebo | 7 | 1.44 | 5.4 |
pid_02 | baseline | placebo | 7 | 1.55 | 5.2 |
pid_02 | week_1 | placebo | 7 | 0.75 | 4.8 |
pid_02 | week_7 | placebo | 4 | 1.17 | 4.2 |
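A table like this could be loaded with `readr::read_csv()`. This sketch assumes the file is comma-separated and sits in the current working directory, neither of which is confirmed here. If the file is not found, it falls back to the six rows shown above so that the code still runs.

```r
library(readr)

# Assumed location: the CSV sits in the current working directory
csv_path <- "02_visit_clinical_measurements_UKZN_workshop_2023.csv"

# Fallback: the six preview rows from the slide, passed as literal data
preview_rows <- I("pid,time_point,arm,nugent_score,crp_blood,ph
pid_01,baseline,placebo,8,0.44,5.7
pid_01,week_1,placebo,7,1.66,5.2
pid_01,week_7,placebo,7,1.44,5.4
pid_02,baseline,placebo,7,1.55,5.2
pid_02,week_1,placebo,7,0.75,4.8
pid_02,week_7,placebo,4,1.17,4.2
")

# Read the real file when it exists, otherwise the preview rows
clinical <- read_csv(
  if (file.exists(csv_path)) csv_path else preview_rows,
  show_col_types = FALSE
)

# Inspect column names and types
str(clinical)
```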
ggplot
aes()
Long form, all arguments are named:
Abbreviated form, common arguments remain unnamed:
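The two calling styles named above can be sketched as follows. The data frame holds six rows copied from the clinical measurements table. The choice of `crp_blood` and `ph` as the plotted columns is an assumption made for illustration.

```r
library(ggplot2)

# Six rows copied from the clinical measurements table
visits <- data.frame(
  crp_blood = c(0.44, 1.66, 1.44, 1.55, 0.75, 1.17),
  ph        = c(5.7, 5.2, 5.4, 5.2, 4.8, 4.2)
)

# Long form, all arguments are named
p_long <- ggplot(
  data = visits,
  mapping = aes(x = crp_blood, y = ph)
) +
  geom_point()

# Abbreviated form, common arguments remain unnamed:
# the first argument of ggplot() is the data, the second is the mapping,
# and the first two arguments of aes() are x and y
p_short <- ggplot(visits, aes(crp_blood, ph)) +
  geom_point()
```

Both objects describe the same plot. The abbreviated form relies on argument position, so it is best kept to the arguments that are conventionally left unnamed.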
color and fill apply to different elements
color: Applies color to points, lines, text, borders
fill: Applies color to any filled areas
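A short sketch of the distinction, using six rows copied from the clinical measurements table (pid_01 and pid_05). The plot types and the color names `"navy"` and `"lightblue"` are illustrative choices, not taken from the slides.

```r
library(ggplot2)

# Six rows copied from the clinical measurements table
visits <- data.frame(
  arm       = rep(c("placebo", "treatment"), each = 3),
  crp_blood = c(0.44, 1.66, 1.44, 0.95, 0.19, 0.45),
  ph        = c(5.7, 5.2, 5.4, 4.9, 3.2, 3.5)
)

# color: points have no filled area by default, so map arm to color
p_points <- ggplot(visits, aes(crp_blood, ph, color = arm)) +
  geom_point()

# fill: boxplots have a filled area, so map arm to fill
p_box <- ggplot(visits, aes(arm, ph, fill = arm)) +
  geom_boxplot()

# Both set to constants: color draws the outline, fill paints the box
p_both <- ggplot(visits, aes(arm, ph)) +
  geom_boxplot(color = "navy", fill = "lightblue")
```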
color and fill aesthetics
Time to try it yourself. Go to the first coding exercise.
During an activity, place a yellow sticky on your laptop if you’re good to go and a pink sticky if you want help.
Example: Highest grossing movies 2023 to date
rank | title | amount |
---|---|---|
1 | Barbie | 1437.8 |
2 | The Super Mario Bros Movie | 1361.9 |
3 | Oppenheimer | 939.3 |
4 | Guardians of the Galaxy 3 | 845.5 |
5 | The Little Mermaid | 569.6 |
Millions USD. Data source: Box Office Mojo
bar lengths do not accurately represent the data values
key features of the data are obscured
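One way the first problem arises is a bar chart whose value axis does not start at zero. The arithmetic below uses two amounts from the movie table above and a hypothetical axis start of 500, which is an assumption made for illustration.

```r
# Amounts in millions USD, copied from the movie table
amount <- c(Barbie = 1437.8, `The Little Mermaid` = 569.6)

# Hypothetical truncated axis: bars are drawn from 500 instead of 0
axis_start <- 500

# Ratio of the actual data values
true_ratio <- amount[["Barbie"]] / amount[["The Little Mermaid"]]

# Ratio of the visible bar lengths on the truncated axis
drawn_ratio <- (amount[["Barbie"]] - axis_start) /
  (amount[["The Little Mermaid"]] - axis_start)

round(c(true = true_ratio, drawn = drawn_ratio), 1)
# true = 2.5, drawn = 13.5
```

Under that assumption the longer bar looks about 13.5 times the shorter one, while the underlying amounts differ by a factor of about 2.5.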
Go to the event on wooclap
M3. Do you think it makes sense to truncate the axes for the life expectancy data?
ggplot2
rank | title | amount |
---|---|---|
1 | Barbie | 1437.8 |
2 | The Super Mario Bros Movie | 1361.9 |
3 | Oppenheimer | 939.3 |
4 | Guardians of the Galaxy 3 | 845.5 |
5 | The Little Mermaid | 569.6 |
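A table like this already contains the value for each bar, so `geom_col()` can draw it directly. This is a minimal sketch. The object name `boxoffice`, the ordering by amount, and the axis labels are choices made here, not confirmed by the slides.

```r
library(ggplot2)

# The five rows of the movie table (amounts in millions USD)
boxoffice <- data.frame(
  rank   = 1:5,
  title  = c(
    "Barbie", "The Super Mario Bros Movie", "Oppenheimer",
    "Guardians of the Galaxy 3", "The Little Mermaid"
  ),
  amount = c(1437.8, 1361.9, 939.3, 845.5, 569.6)
)

# geom_col() uses the y values as bar heights;
# reorder() sorts the titles from highest to lowest amount
p_movies <- ggplot(boxoffice, aes(x = reorder(title, -amount), y = amount)) +
  geom_col() +
  labs(x = NULL, y = "Amount (millions USD)")
```

With titles this long, swapping the `x` and `y` mappings to get horizontal bars is a common way to keep the labels readable.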
pid | time_point | arm | nugent_score | crp_blood | ph |
---|---|---|---|---|---|
pid_01 | baseline | placebo | 8 | 0.44 | 5.7 |
pid_01 | week_1 | placebo | 7 | 1.66 | 5.2 |
pid_01 | week_7 | placebo | 7 | 1.44 | 5.4 |
pid_02 | baseline | placebo | 7 | 1.55 | 5.2 |
pid_02 | week_1 | placebo | 7 | 0.75 | 4.8 |
pid_02 | week_7 | placebo | 4 | 1.17 | 4.2 |
pid_03 | baseline | placebo | 6 | 1.78 | 4.8 |
pid_03 | week_1 | placebo | 10 | 0.57 | 5.3 |
pid_03 | week_7 | placebo | 7 | 1.79 | 5.2 |
pid_04 | baseline | placebo | 5 | 1.76 | 4.8 |
pid_04 | week_1 | placebo | 9 | 2.58 | 5.1 |
pid_04 | week_7 | placebo | 7 | 5.68 | 5.4 |
pid_05 | baseline | treatment | 8 | 0.95 | 4.9 |
pid_05 | week_1 | treatment | 3 | 0.19 | 3.2 |
pid_05 | week_7 | treatment | 2 | 0.45 | 3.5 |
pid_06 | baseline | placebo | 10 | 4.03 | 5.3 |
pid_06 | week_1 | placebo | 8 | 1.72 | 5.6 |
pid_06 | week_7 | placebo | 8 | 3.19 | 5.0 |
pid_07 | baseline | placebo | 7 | 0.10 | 5.2 |
pid_07 | week_1 | placebo | 7 | 1.36 | 4.9 |
pid_07 | week_7 | placebo | 5 | 0.38 | 5.1 |
pid_08 | baseline | placebo | 9 | 3.18 | 5.4 |
pid_08 | week_1 | placebo | 5 | 1.55 | 4.8 |
pid_08 | week_7 | placebo | 7 | 1.77 | 5.0 |
pid_09 | baseline | treatment | 5 | 2.13 | 4.9 |
pid_09 | week_1 | treatment | 3 | 0.27 | 3.6 |
pid_09 | week_7 | treatment | 4 | 1.04 | 4.2 |
pid_10 | baseline | treatment | 8 | 0.98 | 4.9 |
pid_10 | week_1 | treatment | 0 | 0.01 | 3.5 |
pid_10 | week_7 | treatment | 1 | 2.87 | 2.9 |
pid_11 | baseline | treatment | 7 | 0.31 | 5.0 |
pid_11 | week_1 | treatment | 1 | 0.10 | 3.3 |
pid_11 | week_7 | treatment | 4 | 1.15 | 5.1 |
pid_12 | baseline | placebo | 8 | 2.42 | 5.0 |
pid_12 | week_1 | placebo | 6 | 0.64 | 4.5 |
pid_12 | week_7 | placebo | 9 | 4.36 | 5.2 |
pid_13 | baseline | placebo | 8 | 2.69 | 5.1 |
pid_13 | week_1 | placebo | 7 | 2.57 | 5.5 |
pid_13 | week_7 | placebo | 8 | 1.98 | 4.8 |
pid_14 | baseline | placebo | 7 | 0.34 | 5.3 |
pid_14 | week_1 | placebo | 5 | 2.07 | 4.2 |
pid_14 | week_7 | placebo | 7 | 5.06 | 5.1 |
pid_15 | baseline | treatment | 7 | 0.29 | 4.8 |
pid_15 | week_1 | treatment | 3 | 0.84 | 3.4 |
pid_15 | week_7 | treatment | 3 | 0.68 | 3.5 |
pid_16 | baseline | treatment | 6 | 1.91 | 5.7 |
pid_16 | week_1 | treatment | 0 | 0.03 | 3.7 |
pid_16 | week_7 | treatment | 2 | 0.50 | 3.2 |
pid_17 | baseline | treatment | 5 | 1.39 | 4.8 |
pid_17 | week_1 | treatment | 2 | 0.00 | 3.3 |
pid_17 | week_7 | treatment | 3 | 0.90 | 3.7 |
pid_18 | baseline | treatment | 6 | 0.45 | 4.3 |
pid_18 | week_1 | treatment | 1 | 1.81 | 3.6 |
pid_18 | week_7 | treatment | 6 | 0.41 | 3.9 |
pid_19 | baseline | placebo | 7 | 1.34 | 5.3 |
pid_19 | week_1 | placebo | 5 | 2.91 | 4.3 |
pid_19 | week_7 | placebo | 5 | 1.27 | 4.5 |
pid_20 | baseline | placebo | 4 | 0.86 | 4.3 |
pid_20 | week_1 | placebo | 8 | 1.45 | 5.2 |
pid_20 | week_7 | placebo | 5 | 3.95 | 4.9 |
pid_21 | baseline | treatment | 5 | 0.50 | 4.6 |
pid_21 | week_1 | treatment | 1 | 1.60 | 3.4 |
pid_21 | week_7 | treatment | 4 | 1.23 | 4.8 |
pid_22 | baseline | treatment | 6 | 1.10 | 4.0 |
pid_22 | week_1 | treatment | 3 | 0.58 | 4.2 |
pid_22 | week_7 | treatment | 6 | 1.67 | 5.1 |
pid_23 | baseline | placebo | 8 | 0.99 | 5.4 |
pid_23 | week_1 | placebo | 8 | 0.80 | 5.5 |
pid_23 | week_7 | placebo | 3 | 3.67 | 3.1 |
pid_24 | baseline | placebo | 5 | 4.91 | 3.8 |
pid_24 | week_1 | placebo | 7 | 0.94 | 5.1 |
pid_24 | week_7 | placebo | 4 | 1.03 | 4.5 |
pid_25 | baseline | treatment | 3 | 2.84 | 3.9 |
pid_25 | week_1 | treatment | 4 | 3.52 | 4.7 |
pid_25 | week_7 | treatment | 2 | 0.49 | 3.7 |
pid_26 | baseline | treatment | 7 | 0.94 | 5.6 |
pid_26 | week_1 | treatment | 0 | 0.11 | 3.0 |
pid_26 | week_7 | treatment | 4 | 0.29 | 4.8 |
pid_27 | baseline | placebo | 7 | 1.17 | 5.5 |
pid_27 | week_1 | placebo | 5 | 1.62 | 4.7 |
pid_27 | week_7 | placebo | 8 | 0.76 | 4.7 |
pid_28 | baseline | treatment | 3 | 0.67 | 2.9 |
pid_28 | week_1 | treatment | 1 | 0.05 | 3.3 |
pid_28 | week_7 | treatment | 1 | 0.22 | 3.5 |
pid_29 | baseline | placebo | 7 | 2.39 | 5.8 |
pid_29 | week_1 | placebo | 4 | 4.09 | 4.5 |
pid_29 | week_7 | placebo | 3 | 3.13 | 3.5 |
pid_30 | baseline | placebo | 7 | 0.85 | 4.8 |
pid_30 | week_1 | placebo | 8 | 2.56 | 5.1 |
pid_30 | week_7 | placebo | 7 | 1.62 | 5.2 |
pid_31 | baseline | treatment | 6 | 1.78 | 4.4 |
pid_31 | week_1 | treatment | 2 | 0.41 | 3.5 |
pid_31 | week_7 | treatment | 2 | 1.36 | 2.8 |
pid_32 | baseline | treatment | 5 | 4.83 | 4.9 |
pid_32 | week_1 | treatment | 1 | 0.03 | 3.3 |
pid_32 | week_7 | treatment | 3 | 0.21 | 3.8 |
pid_33 | baseline | treatment | 6 | 5.26 | 4.6 |
pid_33 | week_1 | treatment | 1 | 0.07 | 3.6 |
pid_33 | week_7 | treatment | 2 | 1.92 | 3.3 |
pid_34 | baseline | placebo | 8 | 3.16 | 5.4 |
pid_34 | week_1 | placebo | 4 | 1.12 | 4.7 |
pid_34 | week_7 | placebo | 7 | 2.34 | 5.3 |
pid_35 | baseline | placebo | 8 | 0.74 | 5.3 |
pid_35 | week_1 | placebo | 5 | 0.16 | 4.4 |
pid_35 | week_7 | placebo | 3 | 1.97 | 3.9 |
pid_36 | baseline | placebo | 8 | 1.21 | 5.1 |
pid_36 | week_1 | placebo | 5 | 2.28 | 4.3 |
pid_36 | week_7 | placebo | 8 | 1.10 | 4.8 |
pid_37 | baseline | treatment | 5 | 1.16 | 4.8 |
pid_37 | week_1 | treatment | 1 | 0.07 | 3.6 |
pid_37 | week_7 | treatment | 2 | 0.70 | 3.2 |
pid_38 | baseline | placebo | 8 | 0.41 | 5.1 |
pid_38 | week_1 | placebo | 5 | 1.55 | 4.8 |
pid_38 | week_7 | placebo | 4 | 3.22 | 4.5 |
pid_39 | baseline | treatment | 6 | 1.61 | 4.6 |
pid_39 | week_1 | treatment | 2 | 0.09 | 3.6 |
pid_39 | week_7 | treatment | 5 | 0.77 | 4.7 |
pid_40 | baseline | treatment | 3 | 1.48 | 3.1 |
pid_40 | week_1 | treatment | 2 | 0.17 | 3.1 |
pid_40 | week_7 | treatment | 6 | 0.21 | 4.5 |
pid_41 | baseline | treatment | 4 | 1.51 | 4.3 |
pid_41 | week_1 | treatment | 2 | 0.64 | 3.4 |
pid_41 | week_7 | treatment | 4 | 0.78 | 4.4 |
pid_42 | baseline | placebo | 6 | 0.91 | 4.7 |
pid_42 | week_1 | placebo | 5 | 0.88 | 4.3 |
pid_42 | week_7 | placebo | 7 | 3.06 | 5.3 |
pid_43 | baseline | placebo | 6 | 1.08 | 4.7 |
pid_43 | week_1 | placebo | 6 | 0.94 | 4.1 |
pid_43 | week_7 | placebo | 6 | 1.79 | 4.1 |
pid_44 | baseline | treatment | 6 | 0.48 | 4.4 |
pid_44 | week_1 | treatment | 1 | 1.67 | 3.5 |
pid_44 | week_7 | treatment | 3 | 0.60 | 3.4 |
Use geom_bar() to count before plotting
position = "dodge": Place bars for subgroups side-by-side
position = "stack": Place bars for subgroups on top of each other
position = "fill": Like "stack", but scale to 100%
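The counting behavior and the three position options can be sketched with nine rows from the clinical measurements table (pid_01, pid_02 and pid_05). Counting visits per time point is an illustrative choice, not necessarily the example used in the workshop.

```r
library(ggplot2)

# Nine rows copied from the clinical measurements table
visits <- data.frame(
  pid        = rep(c("pid_01", "pid_02", "pid_05"), each = 3),
  time_point = rep(c("baseline", "week_1", "week_7"), times = 3),
  arm        = rep(c("placebo", "placebo", "treatment"), each = 3)
)

# geom_bar() counts the rows in each x category before plotting,
# so no y aesthetic is supplied
p_count <- ggplot(visits, aes(x = time_point)) +
  geom_bar()

# Subgroups side-by-side
p_dodge <- ggplot(visits, aes(time_point, fill = arm)) +
  geom_bar(position = "dodge")

# Subgroups on top of each other (the default for geom_bar())
p_stack <- ggplot(visits, aes(time_point, fill = arm)) +
  geom_bar(position = "stack")

# Stacked, with each bar rescaled so its total height is 1 (100%)
p_fill <- ggplot(visits, aes(time_point, fill = arm)) +
  geom_bar(position = "fill")
```

When the data already hold one value per bar, as in the movie table, `geom_col()` is the matching geom because it plots the values as given instead of counting rows.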
Go to the event on wooclap
2 questions: M3. What’s the difference between geom_col and geom_bar? and M3. What patterns did you see in the smoker CRP data (slide 49)?
Time to try it yourself. Go back to the module.
During an activity, place a yellow sticky on your laptop if you’re good to go and a pink sticky if you want help.