GGALLUVIAL ---- Cheatsheet
Introduction to ggalluvial
The ggalluvial package is a ggplot2 extension for producing alluvial plots.
Alluvial plots use variable-width ribbons and stacked bar plots to represent multi-dimensional or repeated-measures data with categorical or ordinal variables.
ggalluvial accepts data in two formats:
Alluvia (Wide) Format & Lodes (Long) Format
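As a quick check of the wide format, the sketch below uses the built-in UCBAdmissions table (introduced later in this cheatsheet) and assumes that ggalluvial is installed. The variable names wide and is_wide are illustrative only.

```r
library(ggalluvial)

# Wide format: one row per combination of categories, one column per axis,
# plus a Freq column holding the count for that combination.
wide <- as.data.frame(UCBAdmissions)
head(wide, n = 3)

# is_alluvia_form() tests whether the chosen columns can serve as axes.
is_wide <- is_alluvia_form(wide, axes = 1:3, silent = TRUE)
is_wide
```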
Five Essential Components
An axis is a dimension (variable) along which the data are vertically grouped at a fixed horizontal position.
Horizontal (x-) splines called alluvia span the width of the plot.
The groups at each axis are depicted as opaque blocks called strata.
The alluvia intersect the strata at lodes.
The segments of the alluvia between pairs of adjacent axes are flows.
Basic Alluvial Wide Format
library(ggalluvial)
gg <- ggplot(as.data.frame(UCBAdmissions),
             aes(y = Freq, axis1 = Gender,
                 axis2 = Dept, axis3 = Admit))
gg + geom_alluvium() + geom_stratum() +  # draw the alluvia and the strata
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  ggtitle("UC Berkeley admissions and rejections")
The dataset "UCBAdmissions" contains aggregate data on applicants to graduate school at Berkeley for the six largest departments in 1973, classified by admission and sex.
It is a 3-dimensional array resulting from cross-tabulating 4526 observations on 3 variables.
No  Name    Levels
1   Admit   Admitted, Rejected
2   Gender  Male, Female
3   Dept    A, B, C, D, E, F
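The structure described above can be inspected with base R alone. This is a small sketch; the name ucb is only an illustrative variable.

```r
# UCBAdmissions ships with base R (package datasets).
dim(UCBAdmissions)        # 2 x 2 x 6: Admit x Gender x Dept
dimnames(UCBAdmissions)   # the levels of each variable
sum(UCBAdmissions)        # total number of applicants

# ggplot() needs a data frame: one row per cell of the table, with a Freq column.
ucb <- as.data.frame(UCBAdmissions)
nrow(ucb)
names(ucb)
```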
The fill of alluvium and the fill of stratum
Setting different fills for the alluvia and the strata lets analysts examine the data from different angles.
Graph after changing Color & Fill by Dept
With "fill = Dept", the colors are grouped by department.
This helps the analyst see the composition of each department: how many males and females applied to it.
It also shows how many applicants to each department were admitted and rejected.
Graph after changing Color & Fill by Gender
With "fill = Gender", the colors are grouped by gender.
This helps the analyst see how many males and females applied to each department and were finally admitted or rejected.
Graph after changing Color & Fill by Admit
With "fill = Admit", the colors are grouped by admission outcome (admitted or rejected).
This helps the analyst see the composition of the admitted students: how many of them come from each department and from each gender.
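The three fill choices described above might be produced as follows. This is a sketch rather than the code behind the original graphs: the names ucb, base, labels, p_dept, p_gender and p_admit are illustrative, and the white stratum fill is an arbitrary choice.

```r
library(ggalluvial)

ucb <- as.data.frame(UCBAdmissions)
base <- ggplot(ucb, aes(y = Freq, axis1 = Gender, axis2 = Dept, axis3 = Admit))
labels <- geom_text(stat = "stratum", aes(label = after_stat(stratum)))

# Fill of alluvium: map a variable inside aes() of geom_alluvium().
# Fill of stratum: set a constant fill (or map a variable) in geom_stratum().
p_dept   <- base + geom_alluvium(aes(fill = Dept))   + geom_stratum(fill = "white") + labels
p_gender <- base + geom_alluvium(aes(fill = Gender)) + geom_stratum(fill = "white") + labels
p_admit  <- base + geom_alluvium(aes(fill = Admit))  + geom_stratum(fill = "white") + labels

p_dept  # print any of the three to draw it
```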
The width of alluvium and the width of stratum
Graph after changing Width
coord_flip() flips Cartesian coordinates so that horizontal becomes vertical and vertical becomes horizontal. This is primarily useful for converting geoms and statistics that display y conditional on x into x conditional on y.
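A minimal sketch of changing the widths and flipping the coordinates. The width 1/12 is an arbitrary example value, and p_width and p_flipped are illustrative names.

```r
library(ggalluvial)

ucb <- as.data.frame(UCBAdmissions)

# Use the same width for the alluvia and the strata so the ribbons
# meet the edges of the blocks.
p_width <- ggplot(ucb, aes(y = Freq, axis1 = Gender, axis2 = Dept, axis3 = Admit)) +
  geom_alluvium(aes(fill = Admit), width = 1/12) +
  geom_stratum(width = 1/12) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))

# coord_flip() turns the plot so the axes are stacked top to bottom.
p_flipped <- p_width + coord_flip()
p_flipped
```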
geom_alluvium vs geom_flow
This graph uses geom_flow.
It shows the difference between geom_alluvium and geom_flow.
With geom_flow, all males who applied to department A are drawn together, and the same holds for the other departments. This makes the graph much clearer than before, because fewer alluvia cross between adjacent axes.
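To compare the two geoms side by side, the same base plot can be drawn twice. This is a sketch with illustrative names (base, p_alluvium, p_flow); the fill variable is chosen only to make the difference visible.

```r
library(ggalluvial)

base <- ggplot(as.data.frame(UCBAdmissions),
               aes(y = Freq, axis1 = Gender, axis2 = Dept, axis3 = Admit))

# geom_alluvium(): each cohort is one ribbon that spans all axes.
p_alluvium <- base + geom_alluvium(aes(fill = Gender)) + geom_stratum()

# geom_flow(): ribbons are drawn separately between each pair of adjacent axes,
# so cohorts sharing the same two strata are bundled together.
p_flow <- base + geom_flow(aes(fill = Gender)) + geom_stratum()

p_flow
```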
More coding help
Adding the names of each axis
+ scale_x_discrete(limits = c("Gender", "Dept", "Admit"))
Changing the fill color palette
+ scale_fill_brewer(type = "qual", palette = "Set1")
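The two snippets above are fragments to append to an existing plot. Put together, a complete call might look like this sketch (p_scaled is an illustrative name, and the fill variable is an arbitrary choice):

```r
library(ggalluvial)

p_scaled <- ggplot(as.data.frame(UCBAdmissions),
                   aes(y = Freq, axis1 = Gender, axis2 = Dept, axis3 = Admit)) +
  geom_alluvium(aes(fill = Admit)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  # label the three axis positions along the x-axis
  scale_x_discrete(limits = c("Gender", "Dept", "Admit")) +
  # qualitative ColorBrewer palette for the fill aesthetic
  scale_fill_brewer(type = "qual", palette = "Set1")

p_scaled
```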
Basic Lodes (Long) Format
Convert data to Lodes format
to_lodes_form(as.data.frame(UCBAdmissions), axes = 1:3, id = "Cohort")
data(majors)
majors$curriculum <- as.factor(majors$curriculum)
ggplot(majors, aes(x = semester, stratum = curriculum, alluvium = student,
                   fill = curriculum, label = curriculum)) +
  geom_flow(stat = "alluvium", lode.guidance = "frontback", color = "darkgray") +
  geom_stratum() +
  ggtitle("student curricula across several semesters")
The long format requires an additional indexing column that links the rows corresponding to a common cohort.
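The indexing column can be seen by converting the UCBAdmissions data and inspecting the result. In this sketch, lodes, ok and p_lodes are illustrative names; the column names x and stratum are the defaults that to_lodes_form() assigns when key and value are not given.

```r
library(ggalluvial)

lodes <- to_lodes_form(as.data.frame(UCBAdmissions), axes = 1:3, id = "Cohort")
head(lodes, n = 6)

# Each of the 24 wide rows becomes one row per axis: 24 * 3 = 72 rows.
# Rows with the same Cohort value belong to the same alluvium.
nrow(lodes)

ok <- is_lodes_form(lodes, key = x, value = stratum, id = Cohort, silent = TRUE)

# In lodes format the aesthetics are x, stratum and alluvium instead of axis1, axis2, ...
p_lodes <- ggplot(lodes, aes(x = x, stratum = stratum, alluvium = Cohort, y = Freq)) +
  geom_alluvium() +
  geom_stratum()
```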
The "majors" dataset follows the major curricula of 10 students across 8 academic semesters. Missing values indicate undeclared majors.
A data frame with 80 rows and 3 variables:
1. student: student identifier
2. semester: character tag for odd-numbered semesters
3. curriculum: declared major program
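The description above can be checked directly. A short sketch, assuming the majors dataset that ships with ggalluvial:

```r
library(ggalluvial)

data(majors)
dim(majors)                       # rows and columns
length(unique(majors$student))    # number of students
length(unique(majors$semester))   # number of semesters
sum(is.na(majors$curriculum))     # semesters with an undeclared major
head(majors)
```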
Graph of Lodes Format
This graph clearly shows a set of students’ academic curricula over the course of several semesters.
The lodes format gives us the option to aggregate the flows between adjacent axes, which may be appropriate when the transitions between adjacent axes are of primary importance.
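One way to aggregate the flows is to call geom_flow() without stat = "alluvium", so that it falls back on its own flow statistic. This is a sketch under that assumption; p_agg is an illustrative name.

```r
library(ggalluvial)

data(majors)
majors$curriculum <- as.factor(majors$curriculum)

# Students who move between the same two curricula in consecutive semesters
# are merged into a single flow instead of being drawn as individual ribbons.
p_agg <- ggplot(majors, aes(x = semester, stratum = curriculum,
                            alluvium = student, fill = curriculum)) +
  geom_flow(color = "darkgray") +
  geom_stratum() +
  ggtitle("student curricula across several semesters")

p_agg
```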