ggalluvial Cheat Sheet

Introduction to ggalluvial
The ggalluvial package is a ggplot2 extension for producing alluvial plots.
Alluvial plots use variable-width ribbons and stacked bar plots to represent multi-dimensional or repeated-measures data with categorical or ordinal variables.
ggalluvial recognizes two data formats: alluvia (wide) format and lodes (long) format.
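As a rough sketch of how the two formats relate, the snippet below checks the built-in UCBAdmissions data against each format and converts from wide to long. The variable names `ucb_wide`, `ucb_long`, `wide_ok` and `long_ok` are illustrative and not part of the original cheat sheet.

```r
library(ggalluvial)

# UCBAdmissions ships with base R as a 3-way table; as.data.frame()
# gives one row per Admit x Gender x Dept combination plus a Freq column.
ucb_wide <- as.data.frame(UCBAdmissions)

# Wide (alluvia) format: one row per alluvium, one column per axis.
wide_ok <- is_alluvia_form(ucb_wide, axes = 1:3, silent = TRUE)

# Long (lodes) format: one row per lode, with key (x), value (stratum)
# and id (Cohort) columns created by to_lodes_form().
ucb_long <- to_lodes_form(ucb_wide, axes = 1:3, id = "Cohort")
long_ok <- is_lodes_form(ucb_long, key = x, value = stratum,
                         id = Cohort, silent = TRUE)

head(ucb_long)
```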

Five Essential Components

An axis is a dimension (variable) along which the data are vertically grouped at a fixed horizontal position.
Horizontal (x-) splines called alluvia span the width of the plot.
The groups at each axis are depicted as opaque blocks called strata.
The alluvia intersect the strata at lodes.
The segments of the alluvia between pairs of adjacent axes are flows.

Basic Alluvial Wide Format

# load packages
library(ggplot2)
library(ggalluvial)
# basic ggplot
gg <- ggplot(as.data.frame(UCBAdmissions),
             aes(y = Freq, axis1 = Gender,
                 axis2 = Dept, axis3 = Admit)) +
  # add alluvium
  geom_alluvium() +
  # add stratum
  geom_stratum() +
  # add text
  geom_text(stat = "stratum", aes(label = paste(after_stat(stratum)))) +
  # add title
  ggtitle("UC Berkeley admissions and rejections")
gg
The UCBAdmissions dataset contains aggregate data on applicants to graduate school at Berkeley for the six largest departments in 1973, classified by admission and sex.
It is a 3-dimensional array resulting from cross-tabulating 4526 observations on 3 variables.
No  Name    Levels
1   Admit   Admitted, Rejected
2   Gender  Male, Female
3   Dept    A, B, C, D, E, F

Graph of Wide Format


Change Color

the border of alluvium
the border of stratum
the fill of alluvium
the fill of stratum
Mapping the fill of the alluvia and the strata to different variables lets analysts examine the data from different perspectives.
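A minimal sketch of the four settings listed above, assuming the same UCBAdmissions plot as in the wide-format section. The specific colors ("black", "grey90", "grey30") and the names `ucb` and `p_color` are illustrative assumptions, not values taken from the original cheat sheet.

```r
library(ggplot2)
library(ggalluvial)

ucb <- as.data.frame(UCBAdmissions)

p_color <- ggplot(ucb,
                  aes(y = Freq, axis1 = Gender, axis2 = Dept, axis3 = Admit)) +
  # fill of alluvium mapped to a variable; border of alluvium set to a constant
  geom_alluvium(aes(fill = Dept), color = "black") +
  # fill and border of stratum set to constants
  geom_stratum(fill = "grey90", color = "grey30") +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))

p_color
```

Swapping `fill = Dept` for `fill = Gender` or `fill = Admit` gives the other colorings discussed in this cheat sheet.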

Graph after changing Color & Fill by Dept

With fill = Dept, the colors are grouped by department.
This helps an analyst see the composition of each department: how many males and females applied to it.
It also shows how many applicants in each department were admitted and rejected.

Graph after changing Color & Fill by Gender

With fill = Gender, the colors are grouped by gender.
This helps an analyst see how many males and females applied to each department and were ultimately admitted or rejected.

Graph after changing Color & Fill by Admit

With fill = Admit, the colors are grouped by admission outcome (admitted or rejected).
This helps an analyst see the composition of the admitted students: how many came from each department and of which gender.

Change Width

the width of alluvium
the width of stratum
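A minimal sketch of changing both widths, assuming the UCBAdmissions plot from the wide-format section. The value 1/12 is illustrative (both layers default to 1/3), and the names `ucb` and `p_width` are hypothetical.

```r
library(ggplot2)
library(ggalluvial)

ucb <- as.data.frame(UCBAdmissions)

p_width <- ggplot(ucb,
                  aes(y = Freq, axis1 = Gender, axis2 = Dept, axis3 = Admit)) +
  # the width of alluvium: how wide each ribbon is where it meets an axis
  geom_alluvium(aes(fill = Admit), width = 1/12) +
  # the width of stratum: kept equal so ribbons line up with the blocks
  geom_stratum(width = 1/12, fill = "black", color = "grey") +
  geom_label(stat = "stratum", aes(label = after_stat(stratum)))

p_width
```

Using the same width in both layers keeps the ribbons flush with the strata.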

Graph after changing Width

Flip Coordinates

Adding coord_flip()
Flips Cartesian coordinates so that horizontal becomes vertical and vertical becomes horizontal. This is primarily useful for converting geoms and statistics that display y conditional on x to x conditional on y.
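As a sketch, assuming the UCBAdmissions plot from the wide-format section, flipping only requires appending coord_flip() to the layers. The names `ucb` and `p_flip` are illustrative.

```r
library(ggplot2)
library(ggalluvial)

ucb <- as.data.frame(UCBAdmissions)

p_flip <- ggplot(ucb,
                 aes(y = Freq, axis1 = Gender, axis2 = Dept, axis3 = Admit)) +
  geom_alluvium(aes(fill = Admit)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  # axes are now stacked top to bottom and the alluvia run vertically
  coord_flip()

p_flip
```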

Adding lode

Adding geom_lode() draws the lodes, where the alluvia intersect the strata.
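A minimal sketch of a lode layer, assuming the UCBAdmissions plot from the wide-format section. Drawing the strata with alpha = 0 is an illustrative choice so that the colored lodes stay visible inside the stratum outlines; the names `ucb` and `p_lode` are hypothetical.

```r
library(ggplot2)
library(ggalluvial)

ucb <- as.data.frame(UCBAdmissions)

p_lode <- ggplot(ucb,
                 aes(y = Freq, axis1 = Gender, axis2 = Dept, axis3 = Admit)) +
  geom_alluvium(aes(fill = Admit)) +
  # lodes: the pieces of each alluvium that sit inside the strata
  geom_lode(aes(fill = Admit)) +
  # transparent strata, so only their borders are drawn over the lodes
  geom_stratum(alpha = 0) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))

p_lode
```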

geom_alluvium vs geom_flow

The graph uses geom_flow, which shows the difference between geom_alluvium and geom_flow.
With geom_flow, all males who applied to department A are drawn together, and likewise for the other departments. This makes the graph much clearer than before because fewer alluvia cross between adjacent axes.
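To compare the two layers side by side, the sketch below builds the same UCBAdmissions plot twice, once with each geom. The names `ucb`, `base_plot`, `p_alluvium` and `p_flow` are illustrative, and fill = Gender is an assumed choice of coloring.

```r
library(ggplot2)
library(ggalluvial)

ucb <- as.data.frame(UCBAdmissions)

base_plot <- ggplot(ucb,
                    aes(y = Freq, axis1 = Gender, axis2 = Dept, axis3 = Admit))

# geom_alluvium(): each cohort is one ribbon spanning all three axes
p_alluvium <- base_plot +
  geom_alluvium(aes(fill = Gender)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))

# geom_flow(): ribbons are drawn separately between each pair of adjacent axes
p_flow <- base_plot +
  geom_flow(aes(fill = Gender)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))

p_alluvium
p_flow
```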

More coding help

# adding the names of each axis
gg + scale_x_discrete(limits = c("Gender", "Dept", "Admit"))
# changing the fill of stratum
gg + scale_fill_brewer(type = "qual", palette = "Set1")

Basic Lodes (Long) Format

# convert data to lodes format
to_lodes_form(as.data.frame(UCBAdmissions), axes = 1:3, id = "Cohort")
# load data
data(majors)
majors$curriculum <- as.factor(majors$curriculum)
# basic ggplot
ggplot(majors,
       aes(x = semester, stratum = curriculum, alluvium = student,
           fill = curriculum, label = curriculum)) +
  # add flow
  geom_flow(stat = "alluvium", lode.guidance = "frontback", color = "darkgray") +
  # add stratum
  geom_stratum() +
  # add title
  ggtitle("student curricula across several semesters")
The long format requires an additional indexing column that links the rows corresponding to a common cohort.
The majors dataset follows the major curricula of 10 students across 8 academic semesters. Missing values indicate undeclared majors.
It is a data frame with 80 rows and 3 variables:
1. student: student identifier
2. semester: character tag for odd-numbered semesters
3. curriculum: declared major program
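The description above can be checked directly against the data shipped with the package. This is only a quick inspection sketch; the names `n_students`, `n_semesters` and `n_undeclared` are illustrative.

```r
library(ggalluvial)

data(majors)

# rows and columns of the data frame
dim(majors)

# number of distinct students and semesters
n_students <- length(unique(majors$student))
n_semesters <- length(unique(majors$semester))

# rows where no major was declared (missing curriculum)
n_undeclared <- sum(is.na(majors$curriculum))

str(majors)
```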

Graph of Lodes Format

This graph clearly shows a set of students’ academic curricula over the course of several semesters.
The lode format gives us the option to aggregate the flows between adjacent axes, which may be appropriate when the transitions between adjacent axes are of primary importance.
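As a sketch of flow aggregation in lodes format, the snippet below converts UCBAdmissions with the same to_lodes_form() call used earlier and plots it with geom_flow(). The column names `x` and `stratum` are the defaults produced by to_lodes_form(); the names `ucb_lodes` and `p_lodes` are illustrative.

```r
library(ggplot2)
library(ggalluvial)

# one row per lode: Freq, Cohort (id), x (axis) and stratum (level)
ucb_lodes <- to_lodes_form(as.data.frame(UCBAdmissions),
                           axes = 1:3, id = "Cohort")

p_lodes <- ggplot(ucb_lodes,
                  aes(x = x, stratum = stratum, alluvium = Cohort, y = Freq)) +
  # flows are aggregated between each pair of adjacent axes
  geom_flow() +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)))

p_lodes
```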

