Hotchpotch
pd.cut |
numeric to category |
df.groupby |
index |
df.pivot |
pd.get_dummies |
df.apply over columns (axis=0) or rows (axis=1) |
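A minimal sketch of the calls listed above, run on a small made-up frame. The column names (`patient`, `visit`, `age`, `score`) and the bin edges are assumptions chosen for illustration only.

```python
import pandas as pd

# Made-up data; the column names are illustrative only.
df = pd.DataFrame({
    "patient": ["a", "a", "b", "b"],
    "visit": [1, 2, 1, 2],
    "age": [23, 23, 67, 67],
    "score": [1.0, 3.0, 2.0, 6.0],
})

# pd.cut: bin a numeric column into labelled categories.
# Bins are right-inclusive by default: (0, 40] and (40, 120].
df["age_group"] = pd.cut(df["age"], bins=[0, 40, 120], labels=["young", "old"])

# groupby: mean score per patient.
mean_score = df.groupby("patient")["score"].mean()

# pivot: one row per patient, one column per visit.
wide = df.pivot(index="patient", columns="visit", values="score")

# get_dummies: one-hot encode a categorical column.
dummies = pd.get_dummies(df["age_group"])

# apply: column-wise (axis=0, the default) and row-wise (axis=1).
col_max = df[["age", "score"]].apply(lambda col: col.max())
row_sum = df[["age", "score"]].apply(lambda row: row["age"] + row["score"], axis=1)
```

Note that `pivot` raises an error if an index/column pair occurs more than once; `pivot_table` with an aggregation function is the usual alternative in that case.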
For sklearn
from sklearn.ensemble import RandomForestRegressor
reg = RandomForestRegressor()
X = df_rnd[features].to_numpy()  # as_matrix() was removed in pandas 1.0
y = df_rnd['recovery'].to_numpy()
reg.fit(X, y) |
Deletes
del df['col'] |
df = df.drop('col', axis=1) |
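The two deletion styles behave differently: `del` modifies the frame in place, while `drop` returns a new frame. A short sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# del removes the column in place.
del df["a"]

# drop returns a new frame; df itself is unchanged.
smaller = df.drop("b", axis=1)

# Equivalent keyword form, handy for several columns at once.
also_smaller = df.drop(columns=["b"])
```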
Subsetting
surveys_df[(surveys_df.year >= 1980) & (surveys_df.year <= 1985)] |
surveys_df[pd.isnull(surveys_df).any(axis=1)] |
surveys_df[surveys_df['species_id'].isin([listGoesHere])] |
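The three filters above can be tried on a tiny stand-in for `surveys_df`. The rows and the species codes here are invented for the example and are not taken from any real survey data.

```python
import numpy as np
import pandas as pd

# Invented stand-in data with the same column names as the filters above.
surveys_df = pd.DataFrame({
    "year": [1977, 1980, 1983, 1985, 1990],
    "species_id": ["NL", "DM", "PF", "DM", "NL"],
    "weight": [np.nan, 40.0, 8.0, np.nan, 150.0],
})

# Rows with 1980 <= year <= 1985; each condition needs its own parentheses.
in_range = surveys_df[(surveys_df.year >= 1980) & (surveys_df.year <= 1985)]

# Rows that contain at least one missing value.
has_nan = surveys_df[pd.isnull(surveys_df).any(axis=1)]

# Rows whose species_id is in a given list.
wanted = surveys_df[surveys_df["species_id"].isin(["DM", "PF"])]
```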
|
|
Indexing and Selecting Data
Series and DataFrame |
Python and NumPy indexing operators [] and attribute operator . provide quick and easy access to pandas data structures |
.loc: label-based; also accepts a boolean array |
.iloc: integer-position-based; also accepts a boolean array |
.loc, .iloc, and also [] indexing can accept a callable as indexer |
Series s.loc[indexer] |
DataFrame df.loc[row_indexer,column_indexer] |
X_neg[:, df_rnd.columns.get_loc('medication')] = 0 |
.copy() |
.drop('recovery', axis=1) |
.apply(lambda x: other_defined_fun(*x), axis=1) |
Columns: df.iloc[:,0:2] |
Rows: df.iloc[5:100,:] |
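A small sketch of the selection styles listed above, on a made-up frame with string row labels. The names `df`, `num` and `X` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [10, 20, 30], "y": [1.5, 2.5, 3.5], "z": ["p", "q", "r"]},
    index=["r1", "r2", "r3"],
)

# Same cell, addressed by label and by integer position.
by_label = df.loc["r2", "y"]
by_position = df.iloc[1, 1]

# Boolean array for the rows, list of labels for the columns.
bool_rows = df.loc[df["x"] > 10, ["x", "z"]]

# A callable indexer receives the frame and returns a valid indexer.
callable_rows = df.loc[lambda d: d["x"] > 10]

# Column slice by position.
first_two_cols = df.iloc[:, 0:2]

# get_loc turns a column label into a position, which is useful
# when working on the underlying NumPy array.
num = df[["x", "y"]]
X = num.to_numpy().copy()          # copy so the frame is left untouched
X[:, num.columns.get_loc("y")] = 0
```

The position from `get_loc` has to come from the same column set that produced the array; otherwise the wrong column is overwritten.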
|
|
Index
Index |
The base pandas Index type: an immutable, ordered, sliceable sequence of labels |
MultiIndex |
A multi-level, or hierarchical, index object |
~ Array of tuples, one tuple per row; the tuples are usually, but not necessarily, unique |
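To make the two index types concrete, here is a minimal sketch. The level names `letter` and `number` are made up for the example.

```python
import pandas as pd

# A plain Index can be sliced like a list...
idx = pd.Index(["a", "b", "c"])
sliced = idx[1:]

# ...but it is immutable: item assignment raises TypeError.
try:
    idx[0] = "z"
    mutated = True
except TypeError:
    mutated = False

# A MultiIndex built from tuples, one tuple per row.
mi = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)], names=["letter", "number"]
)
s = pd.Series([10, 20, 30], index=mi)

# Selecting on the first level returns every row under that label.
x_rows = s.loc["x"]

# A full tuple selects a single element.
one = s.loc[("y", 1)]
```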
Slice Notation
lst[i:j] |
i to j-1 |
lst[i:] |
i to end |
lst[:j] |
start to j-1 |
lst[:] |
All |
lst[i:j:step] |
i to j-1 by step |
df[i:j] or df.iloc[i:j,:] |
Rows i to j-1 |
df.iloc[0:3, 1:4] |
iloc[row slicing, col slicing] |
df.iloc[row, col] |
Specific element |
df.loc[lbl1:lbl2,:] |
Slicing by label; unlike positional slices, both lbl1 and lbl2 are included |
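The slice rules above can be checked side by side. This sketch uses a made-up frame indexed by the letters a to e; the point to watch is that positional slices stop before the end, while label slices with `.loc` include it.

```python
import pandas as pd

lst = list(range(10))
a = lst[2:5]       # items 2, 3, 4
b = lst[1:8:3]     # items 1, 4, 7

df = pd.DataFrame({"v": range(5), "w": range(5, 10)}, index=list("abcde"))

# Positional slice: rows 1 and 2 (labels b and c); the end is excluded.
pos = df.iloc[1:3, :]

# Label slice: rows b, c and d; the end label is included.
lab = df.loc["b":"d", :]

# Single element: row position 2, column position 1.
elem = df.iloc[2, 1]
```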
|