Pandas Cheat Sheet (DRAFT)

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Imports

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
plt.ion()  # set interactive mode

Load a CSV file

pract=pd.read_csv('practice-dataframe.csv',index_col=0)
index_col=0: the first column contains the row names
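To see what index_col=0 does without needing a file on disk, here is a minimal sketch that reads a small in-memory CSV. The CSV text and the name csv_text are made up for illustration; only pd.read_csv and index_col come from the cheat sheet.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'practice-dataframe.csv'.
csv_text = """name,age,height,sex
Ann,22,170,female
Bob,19,182,male
Carla,20,165,female
"""

# index_col=0 turns the first column (name) into the row labels.
pract = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(pract.index.tolist())    # the row names
print(pract.columns.tolist())  # the remaining columns
```

Without index_col=0, pandas would number the rows 0, 1, 2 and keep name as an ordinary column.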

Displaying values

Example dataframe:

       age  height  sex
Ann    22   170     female
Bob    19   182     male
Carla  20   165     female

For columns:
df.age
df.height
df.sex

For row names:
df.index

Selecting rows that have a particular value in some column:
df[df.age<22]
df[df.height>170]
df[df.sex=='female']
These commands return every column for the rows that match the condition.
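The filters above work in two steps: the comparison builds a True/False value per row, and indexing with it keeps the True rows. A minimal sketch, using a small dataframe modelled on the example above and built by hand (the names mask, younger and women are only illustrative):

```python
import pandas as pd

# A small dataframe modelled on the example in this section.
df = pd.DataFrame(
    {
        "age": [22, 19, 20],
        "height": [170, 182, 165],
        "sex": ["female", "male", "female"],
    },
    index=["Ann", "Bob", "Carla"],
)

# A comparison on a column yields one True/False value per row.
mask = df.age < 22
print(mask)

# Indexing with the mask keeps only the rows where it is True.
younger = df[mask]
print(younger)

# String values must be quoted in the comparison.
women = df[df.sex == "female"]
print(women.index.tolist())
```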

If we want only the row names rather than all the information:
df[df.age==22].index

If we only want their gender:
df[df.age==22].sex
The second part (.sex) selects the 'sex' column of the filtered dataframe.

Selecting a column and storing it in a variable:
grad_values=alladjs.gradGlob
We select the column (gradGlob) and assign it to a variable (grad_values). Then, we can compute statistics for this column.

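If you don't remember...

Here are the comparison operators:
==  equal to
!=  not equal to
<   less than
>   greater than
<=  less than or equal to
>=  greater than or equal to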

Compute the...

mean                variable.mean()
median              variable.median()
standard deviation  variable.std()
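A short sketch of the three statistics on a hand-made column. The numbers are invented and stand in for a real column such as alladjs.gradGlob. Note that std() in pandas returns the sample standard deviation (it divides by n-1) unless ddof=0 is passed.

```python
import pandas as pd

# Hypothetical values standing in for a real column of the dataframe.
variable = pd.Series([2.0, 4.0, 4.0, 6.0])

print(variable.mean())       # arithmetic mean
print(variable.median())     # middle value
print(variable.std())        # sample standard deviation (ddof=1 by default)
print(variable.std(ddof=0))  # population standard deviation
```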

Visualizing data

If we want to return the first rows of the data:
stud.head()

If we want to visualize all the data, in a single boxplot:
stud.boxplot()
plt.show()
IMPORTANT: close the graph window with your mouse to continue, or use:
plt.close()

If we want one boxplot per group (here, one per teacher):
stud.boxplot(by='teacher')
The same calls work for histograms with the .hist() method.

Drawing a...

Histogram:
grad_small.plot(kind='hist')

Boxplot:
grad_small.plot(kind='box')

Barplot:
grad_small.plot(kind='bar')

Saving a figure:
plt.savefig("small-histogram.pdf")
The .plot() method can draw different kinds of plots; the kind argument selects which one.
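A minimal end-to-end sketch of drawing and saving a histogram. The data are invented, and the Agg backend and the temporary output folder are assumptions made so the example runs without opening a window; in an interactive session you would skip matplotlib.use and call plt.show() instead.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window opens
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical scores standing in for grad_small.
grad_small = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5], name="score")

# kind= picks the plot type; 'hist' draws a histogram.
ax = grad_small.plot(kind="hist", bins=5)
ax.set_xlabel("score")

# Save before closing; the file extension selects the output format.
out_path = os.path.join(tempfile.mkdtemp(), "small-histogram.png")
plt.savefig(out_path)
plt.close()
```

Saving must happen before plt.close(), because closing discards the current figure.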

Selecting a...

Selecting rows based on more than one condition:

sm=small[(small.semantic_class=="qualitative") | (small.semantic_class=="relational")]

Selecting only the columns we need for a boxplot:
pred_and_dertype=all[['predGlob','derType']]

Selecting all the participial adjectives of the database:
part=all[all.derType=='participi']

If you want to check what you obtain:
pred_and_dertype.head()

The symbol | means 'or'; use & for 'and', and wrap each condition in parentheses.
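A sketch of combining conditions, on an invented dataframe that stands in for small. The column n_letters and the names either, both and same are hypothetical. The parentheses matter because | and & bind more tightly than == and >.

```python
import pandas as pd

# Hypothetical data standing in for the 'small' dataframe.
small = pd.DataFrame({
    "adjective": ["red", "wooden", "former", "beautiful"],
    "semantic_class": ["qualitative", "relational", "intensional", "qualitative"],
    "n_letters": [3, 6, 6, 9],
})

# OR: rows whose class is qualitative or relational.
either = small[(small.semantic_class == "qualitative") |
               (small.semantic_class == "relational")]

# AND: both conditions must hold for the row to be kept.
both = small[(small.semantic_class == "qualitative") & (small.n_letters > 3)]

# isin() is a shorter way to write several == tests joined by |.
same = small[small.semantic_class.isin(["qualitative", "relational"])]

print(either.adjective.tolist())
print(both.adjective.tolist())
```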
 

How to return a value distribution

sex_var=pract.sex
sex_var.value_counts()
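A small sketch of value_counts() on invented data standing in for pract.sex. The normalize=True variant is an addition not shown in the cheat sheet; it returns proportions instead of counts.

```python
import pandas as pd

# Hypothetical values standing in for pract.sex.
sex_var = pd.Series(["female", "male", "female", "female"], name="sex")

# Absolute frequencies, most common value first.
counts = sex_var.value_counts()
print(counts)

# Relative frequencies (proportions that sum to 1).
shares = sex_var.value_counts(normalize=True)
print(shares)
```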

Contingency table

If we need to compare two categorical variables:
first.head()
pd.crosstab(first.teacher,first.student_passed)
We are cross-tabulating the teacher with whether the student passed the exam or not.
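A minimal sketch of a contingency table on invented data standing in for first. The teacher labels and the yes/no coding of student_passed are assumptions. The margins=True option is an addition not shown in the cheat sheet; it appends row and column totals.

```python
import pandas as pd

# Hypothetical data standing in for the 'first' dataframe.
first = pd.DataFrame({
    "teacher": ["A", "A", "B", "B", "B"],
    "student_passed": ["yes", "no", "yes", "yes", "no"],
})

# Rows are teachers, columns are outcomes, cells are counts.
table = pd.crosstab(first.teacher, first.student_passed)
print(table)

# margins=True adds totals under the label 'All'.
with_totals = pd.crosstab(first.teacher, first.student_passed, margins=True)
print(with_totals)
```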

How to...

get a random sample (sample), compared with the first rows (head):
first50=adjs.head(50)
random_sample=adjs.sample(50)
Check what each one contains:
first50.head()
random_sample.head()
first50.index
random_sample.index
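A sketch contrasting head() with sample(), on an invented dataframe standing in for adjs. The random_state argument is an addition not used in the cheat sheet; fixing it makes the random draw repeatable.

```python
import pandas as pd

# Hypothetical data standing in for the 'adjs' dataframe.
adjs = pd.DataFrame({"value": range(100)})

# head(50) always returns the first 50 rows, in order.
first50 = adjs.head(50)

# sample(50) draws 50 different rows at random (without replacement).
# random_state fixes the seed so the same rows come back on every run.
random_sample = adjs.sample(50, random_state=0)

print(first50.index.tolist()[:5])
print(random_sample.index.tolist()[:5])
```

Comparing the two index attributes shows the difference: first50 keeps the original row order, while random_sample keeps the original labels of whichever rows were drawn.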