This is a draft cheat sheet and is still a work in progress.
Read and Write
|
pd.read_csv(filepath_or_buffer, sep=',', names=None, index_col=None)
|
df.to_csv(path_or_buf, sep=',', columns=None, header=True, index=True, index_label=None, mode='w')
|
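A minimal round-trip sketch of the two calls above. The column names and the in-memory buffer are made up for illustration; a file path works the same way.

```python
import io
import pandas as pd

# Hypothetical CSV content held in memory instead of a file on disk.
csv_text = "id,name,score\n1,ann,3.5\n2,bob,4.0\n"

# read_csv accepts a path or any file-like buffer; index_col selects the index column.
df = pd.read_csv(io.StringIO(csv_text), sep=",", index_col="id")

# With no path given, to_csv returns the CSV text as a string.
out = df.to_csv(sep=";", columns=["score"], header=True, index=True)
print(out)
```

Passing `index=False` to `to_csv` omits the index column from the output.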
Numerical Features and NaNs
df.sort_values(by=feat, ascending=False)
|
sort the table by the values of a column |
|
remove rows with a NaN value |
|
check if the value of feat is NaN in the table |
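A short sketch of the three operations in this section. It assumes `dropna()` and `isna()` are the intended calls for removing and detecting NaN values; the data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing value in "feat".
df = pd.DataFrame({"feat": [2.0, np.nan, 5.0], "other": ["a", "b", "c"]})

# Sort by a column, largest first; NaN rows are placed last by default.
sorted_df = df.sort_values(by="feat", ascending=False)

# Assumption: dropna() is the "remove rows with a NaN value" call.
clean_df = df.dropna()

# Assumption: isna() is the NaN check; it returns a boolean Series.
nan_mask = df["feat"].isna()
```

`df.dropna(subset=["feat"])` restricts the check to selected columns.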
Categorical Features
pd.get_dummies(df[feature]) |
Transform a categorical feature into dummy variables |
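As an illustration, with a hypothetical `color` feature, the dummy columns can be built and joined back to the original table like this:

```python
import pandas as pd

# Hypothetical data: one categorical feature and one numeric feature.
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1, 2, 3]})

# One indicator column per category, named after the category values.
dummies = pd.get_dummies(df["color"])

# Attach the indicator columns to the original table.
encoded = df.join(dummies)
```

Depending on the pandas version, the indicator columns hold booleans or 0/1 integers.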
Features Visualization
df[feats].plot(kind=['density' | 'bar'], subplots=True, layout=(1, 2), sharex=, figsize=);
|
distribution of numeric features |
|
|
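A possible concrete call, as a sketch only. The `sharex` and `figsize` values are placeholders chosen for illustration, since the entry above leaves them blank, and the `Agg` backend is set only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here for headless runs
import pandas as pd

# Hypothetical numeric features.
df = pd.DataFrame({"age": [22, 35, 58, 41], "fare": [7.3, 71.0, 26.5, 8.1]})
feats = ["age", "fare"]

# One subplot per feature, laid out on a single row of two panels.
axes = df[feats].plot(kind="bar", subplots=True, layout=(1, 2),
                      sharex=False, figsize=(10, 4))
```

`kind="density"` takes the same arguments but also requires SciPy to be installed.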
General Info and Basic Statistics
|
general info |
|
basic statistics on numerical features |
df.describe(include=['object', 'bool'])
|
include non-numerical features |
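A sketch of the three summaries in this section, assuming `df.info()` and `df.describe()` are the calls meant for the first two rows. The data is hypothetical.

```python
import io
import pandas as pd

# Hypothetical table mixing numeric, text, and boolean columns.
df = pd.DataFrame({
    "age": [22, 35, 58],
    "city": ["Oslo", "Rome", "Oslo"],
    "member": [True, False, True],
})

# Assumption: df.info() is the "general info" call. It prints column names,
# dtypes, and non-null counts; buf captures that text instead of printing it.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()

# By default describe() summarizes numeric columns only.
numeric_stats = df.describe()

# With include=..., it reports count, unique, top, and freq for those dtypes.
other_stats = df.describe(include=["object", "bool"])
```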
Apply Functions
df.apply(my_function) # ex: df.apply(lambda x: )
|
apply a function |
df['feat'] = df['feat'].map(d) # or df = df.replace({'feat': d})
|
replace values in a column according to dict d |
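A small sketch of both entries, using hypothetical columns and a hypothetical dict `d`:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"feat": ["yes", "no", "yes"], "x": [1, 2, 3], "y": [10, 20, 30]})

# apply passes each column to the function by default.
col_ranges = df[["x", "y"]].apply(lambda col: col.max() - col.min())

# With axis=1, each row is passed instead.
row_sums = df[["x", "y"]].apply(lambda row: row["x"] + row["y"], axis=1)

# Replace values in one column according to a dict.
d = {"yes": "positive", "no": "negative"}
mapped = df["feat"].map(d)
replaced = df.replace({"feat": d})
```

The two forms differ for values missing from `d`: `map` turns them into NaN, while `replace` leaves them unchanged.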
Group by
df.groupby(['feat'])[columns_to_keep].func()
|
group by a feature |
df.groupby([feat])[columns_to_keep].agg([list_of_functions])
|
group by a feature and apply several functions |
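For illustration, with `mean` standing in for `func` and a hypothetical list of aggregation functions:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "feat": ["a", "b", "a", "b"],
    "x": [1, 2, 3, 4],
    "y": [10.0, 20.0, 30.0, 40.0],
})
columns_to_keep = ["x", "y"]

# One aggregation: a row per group, a column per kept feature.
means = df.groupby(["feat"])[columns_to_keep].mean()

# Several aggregations: the columns become (feature, function) pairs.
summary = df.groupby(["feat"])[columns_to_keep].agg(["mean", "max"])
```

Individual cells of `summary` are addressed with a tuple, for example `summary.loc["b", ("y", "max")]`.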
Cross Tables
pd.crosstab(df['feat1'], df['feat2'], normalize=)
|
contingency table of two features (e.g. a confusion matrix) |
df.pivot_table(values=['features_to_analyze'], index=['grouping_feat'], aggfunc='mean')
|
pivot table |
|
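A sketch of both calls on hypothetical data. `normalize="index"` is one possible value for the argument left blank above; `"columns"` and `"all"` are the other options.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "feat1": ["a", "a", "b", "b"],
    "feat2": ["u", "v", "u", "u"],
    "value": [1.0, 3.0, 5.0, 7.0],
})

# Counts of each (feat1, feat2) combination.
counts = pd.crosstab(df["feat1"], df["feat2"])

# Same table with each row rescaled to sum to 1.
row_shares = pd.crosstab(df["feat1"], df["feat2"], normalize="index")

# Mean of "value" for each group of feat1.
pivot = df.pivot_table(values=["value"], index=["feat1"], aggfunc="mean")
```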