Pandas FAQ Cheat Sheet by CodingJinxx

Frequently Asked Questions for Pandas

List Comprehension

List comprehension offers a shorter syntax when you want to create a new list based on the values of an existing list.

Example:

Based on a list of fruits, you want a new list, containing only the fruits with the letter "a" in the name.

Without list comprehension you will have to write a for statement with a conditional test inside:

fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = []

for x in fruits:
  if "a" in x:
    newlist.append(x)

print(newlist)


With list comprehension you can do all that with only one line of code:

fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

newlist = [x for x in fruits if "a" in x]

print(newlist)

Imputation

In statistics, imputation is the process of replacing missing data with substituted values.

When substituting for a data point, it is known as "unit imputation";
when substituting for a component of a data point, it is known as "item imputation".

Pandas Imputation Article
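
As a minimal sketch of item imputation in pandas, the snippet below fills one missing value with the mean of the observed values. The column names and data are made up for illustration, and mean imputation is only one of several possible strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row is missing its "age" field (item imputation).
df = pd.DataFrame({"name": ["tom", "nick", "juli"],
                   "age": [10.0, np.nan, 14.0]})

# mean() skips NaN by default, so this is the mean of the observed ages.
age_mean = df["age"].mean()

# Replace the missing entry with that mean.
df["age"] = df["age"].fillna(age_mean)

print(df)
```

Median or a constant value can be substituted the same way by passing a different value to fillna().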

Aggregate Functions

sum()                                       Sum of the values
count()                                     Number of non-null values
median()                                    Median of the values
quantile([0.25, 0.75])                      Requested quantiles of the values
min()                                       Lowest value
max()                                       Highest value
mean()                                      Arithmetic mean
var()                                       Variance
std()                                       Standard deviation
df.groupby(by="col")                        Groups rows by the values of the specified column (similar to SQL GROUP BY)
pd.merge(adf, bdf, how='left', on='col')    Merges two DataFrames into one based on a common column

Aggregate functions are a way of summarizing or reshaping data.
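
A short sketch of how these calls might be combined. The frames adf and bdf reuse the names from the merge entry above; their contents and the column names are invented for illustration.

```python
import pandas as pd

# Hypothetical tables sharing the key column "col".
adf = pd.DataFrame({"col": ["a", "a", "b"], "value": [1, 2, 3]})
bdf = pd.DataFrame({"col": ["a", "b"], "label": ["first", "second"]})

# Aggregate a whole column into a single number.
total = adf["value"].sum()

# Aggregate per group: one row per distinct value of "col".
per_group = adf.groupby(by="col")["value"].agg(["sum", "mean", "count"])

# Left merge: keep every row of adf and attach matching columns from bdf.
merged = pd.merge(adf, bdf, how="left", on="col")

print(total)
print(per_group)
print(merged)
```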

Shape of a Dataframe

Return a tuple representing the dimensionality of the DataFrame, as (rows, columns).

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.shape
(2, 2)

Mean

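Return the mean of the values over the requested axis.

DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None)

Median

Return the median of the values over the requested axis: the middle value once the values are sorted, or the average of the two middle values when the count is even.

DataFrame.median(axis=None, skipna=None, level=None, numeric_only=None)

Both signatures are as documented for pandas 1.x; the level parameter was removed in pandas 2.0.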
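
To make the axis argument concrete, here is a small sketch with invented data. It also shows that a single large value pulls the mean up while leaving the median unchanged.

```python
import pandas as pd

# Hypothetical data; the 10 in col1 acts as an outlier.
df = pd.DataFrame({"col1": [1, 2, 10], "col2": [3, 4, 5]})

col_means = df.mean()          # default axis=0: one value per column
row_means = df.mean(axis=1)    # axis=1: one value per row
col_medians = df.median()      # middle value of each column

print(col_means)
print(row_means)
print(col_medians)
```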

Creating a Dataframe from Scratch

# Import pandas library
import pandas as pd
  
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])

"From scratch" means creating the data by hand.

Categorical Variable

A categorical variable holds data that is limited to a fixed set of values (categories).

Categorical variables are well suited to bar plots or balloon plots.

Example Article
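
A brief sketch of how a categorical variable might be represented in pandas, using made-up colour data. The counts per category are the numbers a bar plot would display.

```python
import pandas as pd

# Hypothetical data drawn from a small, fixed set of values.
s = pd.Series(["red", "green", "red", "blue", "red"])

# Convert to the pandas "category" dtype.
colors = s.astype("category")

# The distinct categories, and how often each one occurs.
categories = list(colors.cat.categories)
counts = colors.value_counts()

print(categories)
print(counts)
# counts.plot.bar() would draw the bar plot (requires matplotlib).
```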

Quartiles vs Quantiles

Quartiles split the data at the 25th, 50th and 75th percentiles.

Quantiles, by contrast, can be any custom percentiles.
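
The difference can be illustrated with quantile(), which takes fractions between 0 and 1. The data below is invented, and the results rely on the default linear interpolation.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Quartiles: the 25th, 50th and 75th percentiles.
quartiles = s.quantile([0.25, 0.5, 0.75])

# Custom quantiles: here the 10th and 90th percentiles.
custom = s.quantile([0.1, 0.9])

print(quartiles)
print(custom)
```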

Correlation

Correlation describes how strongly two variables change together.

Example:
If the square footage of an apartment increases, the price of the apartment increases as well.
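
A sketch of the apartment example with invented numbers. corr() returns the Pearson correlation coefficient by default; a value near 1 means price rises as square footage rises.

```python
import pandas as pd

# Hypothetical apartments: larger ones cost more.
df = pd.DataFrame({
    "square_feet": [500, 750, 1000, 1250, 1500],
    "price": [100_000, 140_000, 210_000, 240_000, 310_000],
})

# Correlation between two columns.
r = df["square_feet"].corr(df["price"])

# Pairwise correlation of all numeric columns.
matrix = df.corr()

print(r)
print(matrix)
```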

Scatterplot

A scatterplot plots pairs of values as points on an x-y grid.

Histogram

A histogram groups data into bins along one axis, with the count in each bin represented by the height of its bar.
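
The binning behind a histogram can be sketched without drawing anything. This example uses NumPy rather than a plotting library, and the data and bin choice are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Split the range 1..5 into 4 equal bins and count the values in each.
# The last bin includes its right edge, so 5 is counted with the 4s.
counts, edges = np.histogram(data, bins=4, range=(1, 5))

print(counts)  # bar heights
print(edges)   # bin boundaries
```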
 
