
Data Visualization in R for GR5293 Cheat Sheet (DRAFT) by

Data visualization grammar in R for GR5293

This is a draft cheat sheet. It is a work in progress and is not finished yet.


Histogram

hist(x, col = "lightblue", xlim = c(a, b), ylim = c(a, b), xlab = "Label for x axis", right = TRUE, main = "Title for the histogram", breaks = seq(m, n, p))
x: the vector to visualize
col=: fill color of the bars
xlim=/ylim=: range of the x/y axis
xlab=/ylab=: label for the x/y axis
right=TRUE/FALSE: TRUE makes the bins right-closed (left-open) intervals; FALSE makes them right-open (left-closed) intervals
main=: title of the histogram
breaks=: the breakpoints between bins (here a sequence from m to n in steps of p)
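
The effect of right= is easiest to see when the data sit exactly on the bin edges. The following is a minimal sketch; the vector x and the edges 0:4 are made-up illustration values, not course data.

```r
# Seven values that fall exactly on the bin edges (illustrative data).
x <- c(1, 2, 2, 3, 3, 3, 4)
edges <- 0:4

# plot = FALSE returns the bin counts without drawing anything.
# Bins are (0,1], (1,2], (2,3], (3,4]; the lowest edge is also included by default.
right_closed <- hist(x, breaks = edges, right = TRUE, plot = FALSE)$counts

# Bins are [0,1), [1,2), [2,3), [3,4]; the highest edge is also included by default.
left_closed <- hist(x, breaks = edges, right = FALSE, plot = FALSE)$counts

right_closed  # 1 2 3 1
left_closed   # 0 1 2 4

# Draw the right-closed version with the styling arguments from the template.
hist(x, breaks = edges, right = TRUE, col = "lightblue",
     xlab = "Value", main = "Right-closed bins")
```

Each value moves one bin to the right when right = FALSE, because a value equal to an edge then belongs to the bin that starts at that edge.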

Single Boxplot(L5)

boxplot(x, horizontal=TRUE, log="x")
x: the vector to visualize
horizontal=TRUE/FALSE: draw the boxplot horizontally (TRUE) or vertically (FALSE)
log=: which axis to draw on a log scale ("x", "y" or "xy")

Multiple Boxplot(L5)

ggplot(dataset, aes(x = , y = )) +
  geom_boxplot() +
  labs() +
  theme(legend.position = "bottom")
dataset: the dataset to visualize
aes(x= ,y=): the grouping variable and the numeric variable to plot
geom_boxplot(): draw one boxplot per group
labs(): set the title, axis and legend labels
theme(legend.position): set the position of the legend
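
As a hedged illustration of the template above, the sketch below fills in the blanks with the mpg dataset that ships with ggplot2. The choice of columns (class, hwy, drv) and the label text are assumptions made for the example.

```r
library(ggplot2)

# One box per vehicle class, split and colored by drive train.
p <- ggplot(mpg, aes(x = class, y = hwy, fill = drv)) +
  geom_boxplot() +
  labs(x = "Vehicle class", y = "Highway mpg", fill = "Drive train") +
  theme(legend.position = "bottom")  # move the legend below the panel

print(p)
```

Note that each + sits at the end of a line. If a line starts with +, R treats the previous line as a complete expression and the layer is not added.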

Violin Plot(L5)

ggplot(dataset, aes(x = , y = )) +
  geom_violin() + coord_flip() + theme()
dataset: the dataset to visualize
aes(x= ,y=): the grouping variable and the numeric variable to plot
geom_violin(): draw the violin plot
coord_flip(): swap the x and y axes
theme(): customize the non-data components of the plot

Ridgeline Plot(L5)

ggplot(dataset, aes(x= ,y= ))+
geom_density_ridges(fill="blue",alpha= ,scale= )
dataset: the dataset to visualize
aes(x= ,y= ): plot by x & y
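geom_density_ridges(): draw the ridgeline plot (from the ggridges package)
fill= : fill color of the ridgelines
alpha= : transparency of the area under each ridgeline (0 = fully transparent, 1 = opaque)
scale= : height of the ridgelines relative to their spacing; values above 1 make them overlap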

Q-Q plot (Quantile-Quantile)(L6)

qqnorm(x)
qqline(x, col = "red")
qqnorm(): produce a normal Q-Q plot of the values in x
qqline(): add a reference line to a theoretical (by default normal) Q-Q plot; the line passes through the first and third quartiles
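
qqline() only adds to an existing plot, so qqnorm() has to be called first. A minimal sketch with simulated data follows; the sample of 200 standard normal draws is an assumption for illustration.

```r
set.seed(1)        # make the simulated sample reproducible
x <- rnorm(200)    # illustrative data; replace with the vector to check

# Draw the normal Q-Q plot and keep the plotted coordinates.
qq <- qqnorm(x, main = "Normal Q-Q plot")

# Add the reference line through the first and third quartiles.
qqline(x, col = "red")
```

Points that stay close to the red line suggest the sample is roughly normal; systematic curvature at the ends points to heavy or light tails.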

Types of data(L8)

Numerical data
Categorical data
~Nominal: no fixed category order
~Ordinal: fixed category order

Tidy Data(L10)

pivot_longer(data, cols = , names_to = , values_to = ): gather the selected columns into two columns; their names go into the column named by names_to (default "name") and their values into the column named by values_to (default "value")
pivot_wider(data, names_from = , values_from = ): spread the values of one column out as new column names (names_from) and fill those columns with the values of another column (values_from)
rownames_to_column(): move the row names of a data frame into a regular column (tibble package)
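
The three functions above can be chained into a round trip from wide to long and back. This is a small sketch; the data frame and the column names (region, quarter, sales) are invented for the example.

```r
library(tidyr)
library(tibble)

# A small wide table whose row names carry information (illustrative data).
wide <- data.frame(q1 = c(10, 20), q2 = c(11, 21),
                   row.names = c("east", "west"))

# Move the row names into a regular column called "region".
wide <- rownames_to_column(wide, var = "region")

# Wide -> long: one row per region and quarter.
long <- pivot_longer(wide, cols = c(q1, q2),
                     names_to = "quarter", values_to = "sales")

# Long -> wide: recover one column per quarter.
back <- pivot_wider(long, names_from = quarter, values_from = sales)

long
back
```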

Parallel Coordinates(L13)

ggparcoord(dataset, columns = ,scale = ,alphaLines= ,splineFactor= ,groupColumn =  )
dataset: the dataset to visualize
columns= : the columns of the data to include in the plot
scale= : method used to scale the variables (default is "std")
alphaLines= : alpha (transparency) of the lines, given as a value or as the name of a column in the data
splineFactor= : logical or numeric value that controls whether spline interpolation is used
groupColumn= : a single variable to group (color) the lines by

Biplot (L14)

pca <- prcomp(dataset)
prcomp(): perform a principal components analysis on the given data matrix
draw_biplot(): perform PCA on a data frame and draw a biplot
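
A minimal sketch of the PCA step follows. It uses the built-in USArrests dataset as a stand-in, and it draws the result with base R's biplot() rather than draw_biplot(), so the styling will differ from the course plots.

```r
# PCA on a built-in numeric dataset (a stand-in for the course data).
# scale. = TRUE standardizes each variable before the analysis.
pca <- prcomp(USArrests, scale. = TRUE)

# Standard deviation and proportion of variance for each component.
summary(pca)

# Base R biplot: observations as labels, variables as arrows.
biplot(pca)
```

Scaling matters here because the variables are measured in different units; without it, the variable with the largest variance would dominate the first component.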

Cleveland dot plot(L15)

ggplot(dataset, aes(x = , y = fct_reorder( , ))) +
  geom_point(color = ) + theme_linedraw()
fct_reorder(): reorder the levels of a factor by sorting along another variable
geom_point(): draw the points (dots)
theme_linedraw(): a theme with only black lines of various widths on a white background
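
A filled-in version of the template, sketched with the built-in mtcars data. The helper data frame car_mpg and the color "blue" are assumptions for the example.

```r
library(ggplot2)
library(forcats)

# mtcars keeps the model names as row names, so copy them into a column.
car_mpg <- data.frame(model = rownames(mtcars), mpg = mtcars$mpg)

# fct_reorder() sorts the models by mpg, so the dots form a rising curve
# instead of appearing in alphabetical order.
p <- ggplot(car_mpg, aes(x = mpg, y = fct_reorder(model, mpg))) +
  geom_point(color = "blue") +
  labs(x = "Miles per gallon", y = NULL) +
  theme_linedraw()

print(p)
```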

Multivariate Data(L15)

Stacked bar chart
ggplot(data, aes(x = , fill = )) + geom_bar() + scale_fill_manual()
~count x and stack the bars, with one color per level of the fill variable
Grouped bar chart
ggplot(data, aes(x = , fill = )) + geom_bar(position = "dodge") + scale_fill_manual()
~count x and place the bars for each level of the fill variable side by side
Mosaic plot (two variables)
mosaic(x~y, direction = c("v","h"), highlighting_fill = )
~direction= sets the split direction (vertical or horizontal) for each variable; highlighting_fill= sets the colors that distinguish the groups
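
The two bar chart variants differ only in the position argument. The sketch below uses the mpg dataset from ggplot2; the palette in fills is an arbitrary choice for illustration. The mosaic plot is left out because it needs an additional package.

```r
library(ggplot2)

# scale_fill_manual() needs one color per level of the fill variable.
fills <- c("4" = "grey70", "f" = "steelblue", "r" = "tomato")

# Stacked: the default position for geom_bar() is "stack".
stacked <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar() +
  scale_fill_manual(values = fills)

# Grouped: position = "dodge" puts the bars side by side.
grouped <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(values = fills)

print(stacked)
print(grouped)
```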

Alluvial diagram(L16)

ggplot(dataset, aes(axis1 = , axis2 = , y = )) +
  geom_alluvium(color = ) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = paste(after_stat(stratum), "\n", after_stat(count)))) +
  scale_x_discrete(limits = )
geom_alluvium(): plot both the lodes themselves, using geom_lode(), and the flows between them, using geom_flow()
geom_stratum(): draw a rectangle of a given width for each stratum
geom_text(): add text labels (here, the name and count of each stratum)
scale_x_discrete(): set the values of the discrete x scale (here, the axis names and their order)


Heatmap

ggplot(dataset, aes(x = , y = )) +
  geom_tile(aes(fill = ), color = ) +
  coord_fixed()
geom_rect(): draw rectangles from the locations of the four corners (xmin, xmax, ymin and ymax)
geom_tile(): draw rectangles from the center of each tile and its size (x, y, width, height)
geom_raster(): a high-performance special case for when all the tiles are the same size
coord_fixed(): a fixed-scale coordinate system that forces a specified ratio between data units on the two axes
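
A minimal heatmap sketch built on geom_tile() follows. The grid of values is generated for the example and is not course data.

```r
library(ggplot2)

# Every combination of x and y, with a made-up value for each cell.
grid <- expand.grid(x = 1:5, y = 1:4)
grid$value <- grid$x * grid$y

p <- ggplot(grid, aes(x = x, y = y)) +
  geom_tile(aes(fill = value), color = "white") +  # white borders between tiles
  coord_fixed()                                    # keep the tiles square

print(p)
```

Since all tiles here have the same size, geom_raster() could be used in place of geom_tile(); it draws faster but does not support the color= border.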

Time series(L20)

ggplot(dataset, aes(x = , y = , color = )) +
  geom_smooth(method = , span = )
ggplot(dataset, aes(x= ,y= ,color= )): plot multiple time series, one color per series
geom_smooth(): add a smoothed line fitted to the data
method= : smoothing method (function) to use
span= : amount of smoothing for the default loess smoother (larger values give smoother lines)
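
The sketch below shows how span changes a loess fit on two simulated series. The data frame series_df, the noise level and the span of 0.3 are all assumptions for illustration.

```r
library(ggplot2)

set.seed(42)
# Two noisy series observed on the same 60 days (simulated data).
series_df <- data.frame(day = rep(1:60, 2),
                        series = rep(c("a", "b"), each = 60))
signal <- ifelse(series_df$series == "a",
                 sin(series_df$day / 10), cos(series_df$day / 10))
series_df$value <- signal + rnorm(120, sd = 0.2)

# One color per series; a small span follows the data closely,
# while a span near 1 gives a much smoother curve.
p <- ggplot(series_df, aes(x = day, y = value, color = series)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "loess", span = 0.3, se = FALSE)

print(p)
```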

Factor in R

fct_recode(): rename the levels of a factor
fct_inorder(): order the levels by their first appearance in the data
fct_relevel(x, "G1", "G2", after = 3): move the levels "G1" and "G2" to just after the third level of factor x
fct_reorder(color, count, .desc=TRUE): reorder the levels of color by count, in decreasing order
fct_infreq(): order the levels by number of observations (most frequent first by default)
fct_rev(): reverse the order of the factor levels
fct_explicit_na(): turn NAs into an explicit factor level
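
The reordering helpers are easiest to compare on one small factor. The factor f below is invented for the example; the comments show the level order each call produces.

```r
library(forcats)

f <- factor(c("b", "a", "c", "b", "b", "c"))

levels(f)                               # "a" "b" "c"  (alphabetical default)
levels(fct_inorder(f))                  # "b" "a" "c"  (first appearance)
levels(fct_infreq(f))                   # "b" "c" "a"  (most frequent first)
levels(fct_rev(f))                      # "c" "b" "a"  (reversed)
levels(fct_recode(f, beta = "b"))       # "a" "beta" "c"  (new name = "old name")
levels(fct_relevel(f, "a", after = 2))  # "b" "c" "a"  (move "a" after position 2)
```

None of these calls change the data values; they only change the level order (or names), which is what ggplot2 uses to order axes and legends.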