Frequently Asked Questions for Pandas

List Comprehension

List comprehension offers a shorter syntax when you want to create a new list based on the values of an existing list.


Based on a list of fruits, you want a new list, containing only the fruits with the letter "a" in the name.

Without list comprehension you will have to write a for statement with a conditional test inside:

fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = []

# Keep only the fruits whose name contains the letter "a"
for x in fruits:
  if "a" in x:
    newlist.append(x)

print(newlist)



With list comprehension you can do all that with only one line of code:

fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

newlist = [x for x in fruits if "a" in x]
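The general shape is `[expression for item in iterable if condition]`. As a small sketch (the names `upper_loop` and `upper_comp` are illustrative and not part of the original example), the expression can also transform each item while filtering:

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

# Loop version: filter on the letter "a", then transform each match
upper_loop = []
for x in fruits:
    if "a" in x:
        upper_loop.append(x.upper())

# Comprehension version: the same filter and transform in one expression
upper_comp = [x.upper() for x in fruits if "a" in x]

print(upper_comp)  # ['APPLE', 'BANANA', 'MANGO']
```

Both versions build the same list; the comprehension simply folds the loop, the test, and the append into one line.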



Imputation

In statistics, imputation is the process of replacing missing data with substituted values.

When substituting for a whole data point, it is known as "unit imputation"; when substituting for a component of a data point, it is known as "item imputation".
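One simple way to do item imputation in pandas is to fill a missing value with the mean of the observed values in the same column. The sketch below assumes a small hand-made table (the data is invented for illustration) and uses mean imputation only as one possible strategy, not as a recommendation:

```python
import pandas as pd

# Hypothetical data with one missing age (None is stored as NaN)
df = pd.DataFrame({"Name": ["tom", "nick", "juli"], "Age": [10, None, 14]})

# Item imputation: replace the missing Age with the mean of the observed ages
mean_age = df["Age"].mean()              # NaN is skipped by default
df["Age"] = df["Age"].fillna(mean_age)

print(df["Age"].tolist())  # [10.0, 12.0, 14.0]
```

Other fill values (the median, a constant, a value from a neighbouring row) follow the same `fillna` pattern.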



Aggregate Functions

Aggregate functions are a way of summarizing or reshaping data:

sum(): Sums the values of an object
count(): Returns the number of non-missing values
median(): Returns the mathematical median
quantile([0.25, 0.75]): Quantiles of an object
min(): Lowest value in an object
max(): Highest value in an object
mean(): Returns the mathematical mean
var(): Returns the mathematical variance
std(): Returns the standard deviation
groupby(): Groups data by the values of a specified column (similar to SQL GROUP BY)
pd.merge(adf, bdf, how='left', on='col'): Merges two DataFrames into one based on a common column
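A few of the functions above in action. This is a minimal sketch: the names `adf`, `bdf`, and `col` follow the merge call in the list, while the column names `sales` and `region` and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical data, invented for illustration
adf = pd.DataFrame({"col": ["a", "b", "a", "c"], "sales": [10, 20, 30, 40]})
bdf = pd.DataFrame({"col": ["a", "b"], "region": ["north", "south"]})

total = adf["sales"].sum()        # 100
n_values = adf["sales"].count()   # 4 non-missing values
middle = adf["sales"].median()    # 25.0
quartiles = adf["sales"].quantile([0.25, 0.75])

# Sum of sales per distinct value of "col", similar to SQL GROUP BY
per_group = adf.groupby("col")["sales"].sum()

# Left merge keeps every row of adf; rows without a match get NaN in "region"
merged = pd.merge(adf, bdf, how="left", on="col")

print(per_group.to_dict())  # {'a': 40, 'b': 20, 'c': 40}
print(merged.shape)         # (4, 3)
```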

Shape of a DataFrame

Return a tuple representing the dimensionality of the DataFrame.

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

>>> df.shape

(2, 2)


Mean

Return the mean of the values over the requested axis.

DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None)
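As a short illustration of the `axis` argument, the sketch below reuses the small two-column frame from the shape example. It passes only `axis`, so it should not depend on the other parameters in the signature above:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

col_means = df.mean()          # default: one mean per column
row_means = df.mean(axis=1)    # axis=1: one mean per row

print(col_means.tolist())  # [1.5, 3.5]
print(row_means.tolist())  # [2.0, 3.0]
```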


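Median

Return the median of the values over the requested axis: the values are sorted and the middle one is returned (for an even count, the mean of the two middle values).

DataFrame.median(axis=None, skipna=None, level=None, numeric_only=None)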
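A quick sketch of how the median behaves for odd and even counts, and per column on a DataFrame. All values are made up for illustration:

```python
import pandas as pd

odd = pd.Series([5, 1, 3])        # sorted: 1, 3, 5 -> middle value is 3
even = pd.Series([4, 1, 2, 3])    # sorted: 1, 2, 3, 4 -> mean of 2 and 3

print(odd.median())   # 3.0
print(even.median())  # 2.5

# On a DataFrame the median is computed separately for each column
df = pd.DataFrame({"col1": [5, 1, 3], "col2": [10, 30, 20]})
col_medians = df.median()
print(col_medians.tolist())  # [3.0, 20.0]
```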

Creating a DataFrame from Scratch

# Import pandas library
import pandas as pd
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
"From scratch" means creating the data by hand instead of loading it from a file or database.
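The same hand-written data can also be organised by column rather than by row. This is a sketch of an alternative to the list-of-lists approach, using a dictionary that maps each column name to its values:

```python
import pandas as pd

# Same data as the list-of-lists example, keyed by column name
data = {"Name": ["tom", "nick", "juli"], "Age": [10, 15, 14]}
df = pd.DataFrame(data)

print(df.shape)          # (3, 2)
print(list(df.columns))  # ['Name', 'Age']
```

With a dictionary the column names come from the keys, so no separate `columns` argument is needed.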

Categorical Variable

A categorical variable is data that is limited to a fixed set or range of values.

Categorical variables are best visualised using bar plots or balloon plots.
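In pandas, such a column can be stored with the `category` dtype. The sketch below uses made-up shirt sizes and counts how often each category occurs, which gives the numbers a bar plot would draw as bar heights:

```python
import pandas as pd

# Hypothetical shirt sizes; only three distinct values occur
sizes = pd.Series(["S", "M", "L", "M", "S", "M"], dtype="category")

# The fixed set of allowed values
categories = sorted(sizes.cat.categories)
print(categories)  # ['L', 'M', 'S']

# Number of occurrences per category
counts = sizes.value_counts()
print(counts["M"])  # 3
```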



Quartiles vs Quantiles

Quartiles are the 25th, 50th and 75th percentiles of the data.

Quantiles, by contrast, can be any custom percentiles.
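In pandas both are computed with the same `quantile` method; quartiles are just the special case of cut points 0.25, 0.5 and 0.75. A minimal sketch with made-up values:

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Quartiles: the quantiles at 0.25, 0.50 and 0.75
quartiles = values.quantile([0.25, 0.5, 0.75])
print(quartiles.tolist())  # [3.0, 5.0, 7.0]

# Any other cut point is still a quantile, for example the 90th percentile
p90 = values.quantile(0.9)
```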


Correlation

Correlation describes the relationship between variables in the data.

For example, if the square footage of an apartment increases, its price tends to increase as well (a positive correlation).
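The apartment example can be sketched in pandas with `Series.corr`, which returns the Pearson correlation coefficient by default. The figures below are invented, and price is deliberately made to rise in exact proportion to the area:

```python
import pandas as pd

# Hypothetical apartments, invented for illustration
df = pd.DataFrame({
    "sqft": [500, 750, 1000, 1250],
    "price": [100_000, 150_000, 200_000, 250_000],
})

# Pearson correlation coefficient between the two columns
r = df["sqft"].corr(df["price"])
print(round(r, 3))  # 1.0
```

A coefficient near 1 means the two columns rise together, near -1 means one falls as the other rises, and near 0 means no linear relationship.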


Scatterplot

A scatterplot plots pairs of values as points on an x-y grid.


Histogram

A histogram plots data along one axis, grouped into bins, with the count in each bin represented by the height of its bar.
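The numbers behind a histogram can be computed without drawing anything. The sketch below uses NumPy's `histogram` function on made-up ages; the choice of three bins over the range 10 to 40 is an assumption for illustration:

```python
import numpy as np

ages = [10, 15, 14, 22, 25, 31, 38, 39]   # hypothetical values

# Three equal-width bins: [10, 20), [20, 30), [30, 40]
counts, edges = np.histogram(ages, bins=3, range=(10, 40))

print(counts.tolist())  # [3, 2, 3]
print(edges.tolist())   # [10.0, 20.0, 30.0, 40.0]
```

Each entry in `counts` is the height of one bar, and `edges` marks where the bars start and end on the axis.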


