
Numpy Cheat Sheet (DRAFT) by

A cheat sheet explaining important concepts in numpy

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Array Slicing

Definition
Array slicing allows you to extract specific parts of an array. It works similarly to list slicing in Python.
Example
arr = np.array([0, 1, 2, 3, 4, 5])
Slicing syntax
arr[start:stop:step]
Basic slicing
slice_1 = arr[1:4]      # [1, 2, 3] 
slice_2 = arr[:3] # [0, 1, 2]
slice_3 = arr[3:] # [3, 4, 5]
Negative indexing
slice_4 = arr[-3:]      # [3, 4, 5] 
slice_5 = arr[:-2]      # [0, 1, 2, 3]
Step slicing
slice_6 = arr[::2]      # [0, 2, 4] 
slice_7 = arr[1::2] # [1, 3, 5]
Reverse array
slice_8 = arr[::-1]     # [5, 4, 3, 2, 1, 0]
Slicing 2D arrays
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
slice_9 = arr_2d[:2, 1:] # [[2, 3], [5, 6]]

Performance Tips and Tricks

Vectorization
Use NumPy's built-in vectorized operations whenever possible. They run in optimized compiled code and are significantly faster than equivalent element-by-element operations written in Python.
Avoiding Loops
Minimize the use of Python loops when working with NumPy arrays. Instead, try to express operations as array operations. Loops in Python can be slow compared to vectorized operations.
Use Broadcasting
Take advantage of NumPy's broadcasting rules to perform operations on arrays of different shapes efficiently. Broadcasting allows NumPy to work with arrays of different sizes without making unnecessary copies of data.
Avoid Copies
Be mindful of unnecessary array copies, especially when working with large datasets. Basic slicing returns views that share memory with the original array, but other operations, such as indexing with integer or boolean arrays, create copies, which can impact performance and memory usage.
Use In-Place Operations
Whenever feasible, use in-place operations (+=, *=, etc.) to modify arrays without creating new ones. This reduces memory overhead and can improve performance.
Memory Layout
Understand how memory layout affects performance, especially for large arrays. NumPy arrays can be stored in different memory orders (C-order vs. Fortran-order). Choosing the appropriate memory layout can sometimes lead to better performance, especially when performing operations along specific axes.
Data Types
Choose appropriate data types for your arrays to minimize memory usage and improve performance. Using smaller data types (e.g., np.float32 instead of np.float64) can reduce memory overhead and may lead to faster computations, especially on platforms with limited memory bandwidth.
NumExpr and Numba
Consider using specialized libraries like NumExpr or Numba for performance-critical sections of your code. These libraries can often provide significant speedups by compiling expressions or functions to native machine code.
Parallelism
NumPy's element-wise operations generally run on a single thread (some linear algebra routines can use a multithreaded BLAS library), but you can leverage multi-threading or multi-processing libraries like concurrent.futures or joblib to parallelize certain operations, especially when working with large datasets or computationally intensive tasks.
Profiling
Use profiling tools like cProfile or specialized profilers such as line_profiler or memory_profiler to identify performance bottlenecks in your code. Optimizing code based on actual profiling results can lead to more significant performance improvements.
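A minimal sketch of several tips above (vectorization, in-place updates, views vs. copies, and data types). The array size and variable names are illustrative assumptions, not part of the original sheet.

```python
import numpy as np

values = np.arange(1_000_000, dtype=np.float64)

# Vectorized: one call into compiled code instead of a Python-level loop.
squared = values * values

# Equivalent Python loop over a small prefix, shown only for comparison.
squared_loop = np.array([v * v for v in values[:5]])

# In-place update: reuses the existing buffer instead of allocating a new array.
buffer_before = values.__array_interface__["data"][0]
values += 1.0
buffer_after = values.__array_interface__["data"][0]

# Basic slices are views (shared memory); .copy() allocates new memory.
view = values[::2]
copy = values[::2].copy()
shares_view = np.shares_memory(values, view)
shares_copy = np.shares_memory(values, copy)

# A smaller dtype halves the memory footprint of the same number of elements.
bytes_f64 = values.nbytes                      # 8 bytes per element
bytes_f32 = values.astype(np.float32).nbytes   # 4 bytes per element
```

Whether float32 is precise enough depends on the computation, so treat the last step as a trade-off rather than a default.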

Array Concatenation and Splitting

Concatenation
array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9]])
concatenated_array = np.concatenate((array1, array2), axis=0)  # join vertically (along rows)
print(concatenated_array)
numpy.concatenate()
Concatenates arrays along a specified axis.
numpy.vstack() and numpy.hstack()
Stack arrays vertically and horizontally, respectively.
numpy.dstack()
Stack arrays depth-wise.
Splitting
split_arrays = np.split(concatenated_array, [2], axis=0)  # split after the second row
print(split_arrays)
numpy.split()
Split an array into multiple sub-arrays along a specified axis.
numpy.hsplit() and numpy.vsplit()
Split arrays horizontally and vertically, respectively.
numpy.dsplit()
Split arrays depth-wise.
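The stacking and splitting helpers listed above can be sketched as follows. The arrays a and b are illustrative; the comments state the resulting shapes.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

v = np.vstack((a, b))   # shape (4, 2): rows of b below rows of a
h = np.hstack((a, b))   # shape (2, 4): columns of b to the right of a
d = np.dstack((a, b))   # shape (2, 2, 2): a and b paired along a third axis

left, right = np.hsplit(h, 2)   # two (2, 2) blocks, equal to a and b
top, bottom = np.vsplit(v, 2)   # two (2, 2) blocks, equal to a and b
```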

Basic Operations

Addition
array1 + array2
Subtraction
array1 - array2
Multiplication
array1 * array2
Division
array1 / array2
Floor Division
array1 // array2
Modulus
array1 % array2
Exponentiation
array1 ** array2
Absolute
np.abs(array)
Negative
-array
Reciprocal
1 / array
Sum
np.sum(array)
Minimum
np.min(array)
Maximum
np.max(array)
Mean
np.mean(array)
Median
np.median(array)
Standard Deviation
np.std(array)
Variance
np.var(array)
Dot Product
np.dot(array1, array2)
Cross Product
np.cross(array1, array2)
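A short worked example of the operations above on two small vectors. The values are chosen only so the results are easy to check by hand.

```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

total = array1 + array2        # [5, 7, 9]
product = array1 * array2      # [4, 10, 18] (element-wise, not a matrix product)
quotient = array2 / array1     # [4.0, 2.5, 2.0] (true division gives floats)
floor_q = array2 // array1     # [4, 2, 2]
remainder = array2 % array1    # [0, 1, 0]
power = array1 ** 2            # [1, 4, 9]

dot = np.dot(array1, array2)       # 1*4 + 2*5 + 3*6 = 32
cross = np.cross(array1, array2)   # [-3, 6, -3]
mean = np.mean(array1)             # 2.0
```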

NaN Handling

Identifying NaNs
Use the np.isnan() function to check for NaN values in an array.
Removing NaNs
Use np.isnan() to create a boolean mask, then use boolean indexing to select non-NaN values, e.g. arr[~np.isnan(arr)].
Replacing NaNs
Use np.nan_to_num() to replace NaNs with a specified value (0.0 by default; pass nan= to choose another).
Interpolating NaNs
NumPy has no dedicated NaN-interpolation function. For 1-D data, np.interp() can fill the NaN positions from the neighbouring valid values.
Ignoring NaNs in Operations
Many NumPy functions have NaN-aware counterparts, like np.nanmean(), np.nanmedian(), np.nansum(), etc., that ignore NaNs in computations.
Handling NaNs in Aggregations
Aggregation functions (np.sum(), np.mean(), etc.) return NaN if any NaNs are present in the input array. Use the NaN-aware counterparts instead or, if the data lives in pandas, the skipna=True parameter of DataFrame methods.
Dealing with NaNs in Linear Algebra
NumPy's linear algebra functions do not skip NaNs. np.linalg.inv() and np.linalg.solve() typically propagate NaNs into the result, while some routines, such as np.linalg.eig(), raise LinAlgError on non-finite input. Remove or replace NaNs before calling them.
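The NaN-handling steps above can be sketched on a small 1-D array. The interpolation step uses np.interp as one possible approach for 1-D data; it is an assumption of this sketch rather than something the sheet prescribes.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

mask = np.isnan(data)                        # True where the value is NaN
cleaned = data[~mask]                        # [1., 3., 5.]
zero_filled = np.nan_to_num(data, nan=0.0)   # [1., 0., 3., 0., 5.]

plain_mean = np.mean(data)     # nan, because NaN propagates
nan_mean = np.nanmean(data)    # 3.0
nan_sum = np.nansum(data)      # 9.0

# Linear interpolation of the gaps from the surrounding valid points.
idx = np.arange(data.size)
interpolated = data.copy()
interpolated[mask] = np.interp(idx[mask], idx[~mask], data[~mask])  # [1., 2., 3., 4., 5.]
```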

Broadcasting

Broadcasting is a powerful feature in NumPy that allows arrays of different shapes to be combined in arithmetic operations.
When operating on arrays of different shapes, NumPy automatically broadcasts the smaller array across the larger array so that they have compatible shapes. Shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1.
This eliminates the need for explicit looping over array elements, making code more concise and efficient.
Broadcasting is particularly useful for performing operations between arrays of different dimensions or sizes without needing to reshape them explicitly.
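As an illustration of the description above, the sketch below combines a (2, 3) matrix with a (3,) row and a (2, 1) column. The names and values are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

plus_row = matrix + row        # row is added to every row of matrix
plus_column = matrix + column  # column is added to every column of matrix
outer_sum = column + row       # (2, 1) + (3,) -> (2, 3)
```

Shapes that cannot be aligned this way, for example (2, 3) with (2,), raise a ValueError instead of being broadcast.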

Mathematical Functions

Definition
NumPy provides a wide range of mathematical functions that operate element-wise on arrays, allowing for efficient computation across large datasets.
Trigonometric Functions
np.sin(), np.cos(), np.tan(), np.arcsin(), np.arccos(), np.arctan()
Hyperbolic Functions
np.sinh(), np.cosh(), np.tanh(), np.arcsinh(), np.arccosh(), np.arctanh()
Exponential and Logarithmic Functions
np.exp(), np.log(), np.log2(), np.log10()
Rounding
np.round(), np.floor(), np.ceil(), np.trunc()
Absolute Value
np.abs()
Factorial and Combinations
NumPy has no np.factorial() or np.comb(). Use math.factorial() and math.comb() for scalars, or scipy.special.factorial() and scipy.special.comb() for arrays.
Gamma and Beta Functions
NumPy has no element-wise gamma or beta function (np.random.gamma() and np.random.beta() draw random samples from those distributions). Use scipy.special.gamma() and scipy.special.beta().
Sum, Mean, Median
np.sum(), np.mean(), np.median()
Standard Deviation, Variance
np.std(), np.var()
Matrix Operations
np.dot(), np.inner(), np.outer(), np.cross()
Eigenvalues and Eigenvectors
np.linalg.eig(), np.linalg.eigh(), np.linalg.eigvals()
Matrix Decompositions
np.linalg.svd(), np.linalg.qr(), np.linalg.cholesky()
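A brief sketch of a few element-wise functions from the list above. Factorials and combinations are not NumPy functions, so this sketch takes them from the standard-library math module; that substitution is a choice of this example.

```python
import math
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])
sines = np.sin(angles)            # approximately [0, 1, 0]

x = np.array([1.0, 10.0, 100.0])
logs = np.log10(x)                # [0., 1., 2.]

y = np.array([-1.7, -0.2, 0.2, 1.7])
floored = np.floor(y)             # [-2., -1., 0., 1.]
ceiled = np.ceil(y)               # [-1., -0., 1., 2.]
truncated = np.trunc(y)           # [-1., -0., 0., 1.]

# Scalar factorial and binomial coefficient from the standard library.
five_factorial = math.factorial(5)    # 120
five_choose_two = math.comb(5, 2)     # 10
```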

Array Creation

numpy.array()
Create an array from a Python list or tuple.
Example
arr = np.array([1, 2, 3])
numpy.zeros()
Create an array filled with zeros.
Example
zeros_arr = np.zeros((3, 3))
numpy.ones()
Create an array filled with ones.
Example
ones_arr = np.ones((2, 2))
numpy.arange()
Create an array with a range of values.
Example
range_arr = np.arange(0, 10, 2)  # array([0, 2, 4, 6, 8])
numpy.linspace()
Create an array with evenly spaced values.
Example
linspace_arr = np.linspace(0, 10, 5)  # array([ 0.,  2.5,  5.,  7.5, 10.])
numpy.eye()
Create an identity matrix.
Example
identity_mat = np.eye(3)
numpy.random.rand()
Create an array with random values from a uniform distribution.
Example
random_arr = np.random.rand(2, 2)
numpy.random.randn()
Create an array with random values from a standard normal distribution.
Example
random_normal_arr = np.random.randn(2, 2)
numpy.random.randint()
Create an array with random integers.
Example
random_int_arr = np.random.randint(0, 10, size=(2, 2))
numpy.full()
Create an array filled with a specified value.
Example
full_arr = np.full((2, 2), 7)
numpy.empty()
Create an uninitialized array (values are not set, might be arbitrary).
Example
empty_arr = np.empty((2, 2))

Linear Algebra

Matrix Multiplication
np.dot() or @ operator for matrix multiplication.
Transpose
np.transpose() or .T attribute for transposing a matrix.
Inverse
np.linalg.inv() for calculating the inverse of a matrix.
Determinant
np.linalg.det() for computing the determinant of a matrix.
Eigenvalues and Eigenvectors
np.linalg.eig() for computing eigenvalues and eigenvectors.
Matrix Decompositions
Functions like np.linalg.qr(), np.linalg.svd(), and np.linalg.cholesky() for various matrix decompositions.
Solving Linear Systems
np.linalg.solve() for solving systems of linear equations.
Vectorization
Leveraging NumPy's broadcasting and array operations for efficient linear algebra computations.
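The functions above can be sketched on a small 2x2 system. The matrix A and vector b are illustrative values chosen so the results can be checked by hand.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

product = A @ A                   # matrix product, same as np.dot(A, A)
transposed = A.T
determinant = np.linalg.det(A)    # 2*3 - 1*1 = 5
inverse = np.linalg.inv(A)
x = np.linalg.solve(A, b)         # solves A @ x = b, giving [0.8, 1.4]

# Columns of `eigenvectors` pair with the entries of `eigenvalues`.
eigenvalues, eigenvectors = np.linalg.eig(A)
```

For solving A @ x = b, np.linalg.solve is generally preferable to multiplying by the explicit inverse.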
 

Statistical Functions

mean
Computes the arithmetic mean along a specified axis.
median
Computes the median along a specified axis.
average
Computes the weighted average along a specified axis.
std
Computes the standard deviation along a specified axis.
var
Computes the variance along a specified axis.
amin
Finds the minimum value along a specified axis.
amax
Finds the maximum value along a specified axis.
argmin
Returns the indices of the minimum value along a specified axis.
argmax
Returns the indices of the maximum value along a specified axis.
percentile
Computes the q-th percentile of the data along a specified axis.
histogram
Computes the histogram of a set of data.
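A small sketch of how the axis argument changes the result of the functions above. The 2x3 array and the weights are illustrative.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

overall_mean = np.mean(data)             # 3.5 (all elements)
column_means = np.mean(data, axis=0)     # [2.5, 3.5, 4.5]
row_means = np.mean(data, axis=1)        # [2., 5.]
row_max = np.amax(data, axis=1)          # [3, 6]
col_argmax = np.argmax(data, axis=0)     # [1, 1, 1]
median = np.median(data)                 # 3.5
weighted = np.average(data[0], weights=[3, 1, 0])   # (1*3 + 2*1) / 4 = 1.25
p50 = np.percentile(data, 50)            # 3.5
counts, edges = np.histogram(data, bins=3)   # counts [2, 2, 2]
```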

Comparison with Python Lists

Performance
NumPy arrays are faster and more memory efficient compared to Python lists, especially for large datasets. This is because NumPy arrays are stored in contiguous blocks of memory and have optimized functions for mathematical operations, whereas Python lists are more flexible but slower due to their dynamic nature.
Vectorized Operations
NumPy allows for vectorized operations, which means you can perform operations on entire arrays without the need for explicit looping. This leads to concise and efficient code compared to using loops with Python lists.
Multidimensional Arrays
NumPy supports multidimensional arrays, whereas Python lists are limited to one-dimensional arrays or nested lists, which can be less intuitive for handling multi-dimensional data.
Broadcasting
NumPy arrays support broadcasting, which enables operations between arrays of different shapes and sizes. In contrast, performing similar operations with Python lists would require explicit looping or list comprehensions.
Type Stability
NumPy arrays have a fixed data type, which leads to better performance and memory efficiency. Python lists can contain elements of different types, leading to potential type conversion overhead.
Rich Set of Functions
NumPy provides a wide range of mathematical and statistical functions optimized for arrays, whereas Python lists require manual implementation or the use of external libraries for similar functionality.
Memory Usage
NumPy arrays typically consume less memory compared to Python lists, especially for large datasets, due to their fixed data type and efficient storage format.
Indexing and Slicing
NumPy arrays offer more powerful and convenient indexing and slicing capabilities compared to Python lists, making it easier to manipulate and access specific elements or subarrays.
Parallel Processing
NumPy operations can leverage parallel processing capabilities of modern CPUs through libraries like Intel MKL or OpenBLAS, resulting in significant performance gains for certain operations compared to Python lists.
Interoperability
NumPy arrays can be easily integrated with other scientific computing libraries in the Python ecosystem, such as SciPy, Pandas, and Matplotlib, allowing seamless data exchange and interoperability.
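A minimal sketch of the behavioural differences described above: the same operators mean different things for lists and arrays. The variable names are illustrative.

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

list_plus = py_list + py_list       # concatenation: [1, 2, 3, 1, 2, 3]
array_plus = np_array + np_array    # element-wise sum: [2, 4, 6]

list_times = py_list * 2            # repetition: [1, 2, 3, 1, 2, 3]
array_times = np_array * 2          # element-wise product: [2, 4, 6]

# A list needs an explicit loop or comprehension for element-wise math.
list_doubled = [x * 2 for x in py_list]

# Lists accept mixed types; an array coerces everything to one dtype.
mixed = np.array([1, 2.5, 3])       # dtype float64
```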

Masked Arrays

Why?
Masked arrays in NumPy allow you to handle missing or invalid data efficiently.
What are Masked Arrays?
Masked arrays are arrays with a companion boolean mask array, where elements that are marked as "masked" are ignored during computations.
Creating Masked Arrays
You can create masked arrays using the numpy.ma.masked_array function, specifying the data array and the mask array.
Masking
Masking is the process of marking certain elements of an array as invalid or missing. You can manually create masks or use functions like numpy.ma.masked_where to create masks based on conditions.
Operations with Masked Arrays
Operations involving masked arrays automatically handle masked values by ignoring them in computations. This allows for easy handling of missing data without explicitly removing or replacing them.
Masked Array Methods
NumPy provides methods for masked arrays to perform various operations like calculating statistics, manipulating data, and more. These methods are similar to regular array methods but handle masked values appropriately.
Applications
Masked arrays are useful in scenarios where datasets contain missing or invalid data points. They are commonly used in scientific computing, data analysis, and handling time series data where missing values are prevalent.
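A short sketch of the masked-array workflow described above. The readings and the -999 "missing" sentinel are hypothetical values invented for the example.

```python
import numpy as np
import numpy.ma as ma

# -999.0 is a hypothetical sentinel marking missing readings.
readings = np.array([12.0, -999.0, 15.0, -999.0, 18.0])

# Explicit mask: True marks an element as masked (ignored).
masked = ma.masked_array(readings, mask=[False, True, False, True, False])

# The same mask built from a condition.
same_mask = ma.masked_where(readings == -999.0, readings)

mean_valid = masked.mean()      # (12 + 15 + 18) / 3 = 15.0
count_valid = masked.count()    # 3 unmasked elements
filled = masked.filled(0.0)     # [12., 0., 15., 0., 18.]
```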

Random Number Generation

np.random.rand
Generates random numbers from a uniform distribution over [0, 1).
np.random.randn
Generates random numbers from a standard normal distribution (mean 0, standard deviation 1).
np.random.randint
Generates random integers from a specified low (inclusive) to high (exclusive) range.
np.random.random_sample or np.random.random
Generates random floats in the half-open interval [0.0, 1.0).
np.random.choice
Generates random samples from a given 1-D array or list.
np.random.shuffle
Shuffles the elements of an array in place.
np.random.permutation
Randomly permutes a sequence or returns a permuted range.
np.random.seed
Sets the random seed to ensure reproducibility of results.
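The functions above can be sketched as follows. The seed value 42 and the sizes are arbitrary choices for illustration. Newer NumPy code often uses np.random.default_rng() instead, but this sketch stays with the functions listed in the sheet.

```python
import numpy as np

np.random.seed(42)                               # fix the seed for repeatable output

uniform = np.random.rand(2, 3)                   # floats in [0, 1)
normal = np.random.randn(1000)                   # standard normal samples
integers = np.random.randint(0, 10, size=5)      # ints in [0, 10)
pick = np.random.choice([10, 20, 30], size=4)    # samples drawn with replacement
perm = np.random.permutation(5)                  # shuffled copy of arange(5)

deck = np.arange(5)
np.random.shuffle(deck)                          # shuffles deck in place, returns None

np.random.seed(42)
repeat = np.random.rand(2, 3)                    # same values as `uniform`
```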

Filtering Arrays

Filtering Arrays
NumPy provides powerful tools for filtering arrays based on certain conditions. Filtering allows you to select elements from an array that meet specific criteria.
Syntax
filtered_array = array[condition]
Example
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
filtered = arr[arr > 2]  # select elements greater than 2
print(filtered)  # Output: [3 4 5]
Combining Conditions
Conditions can be combined using the element-wise operators & (and) and | (or). Wrap each condition in parentheses, because & and | bind more tightly than comparisons.
Example
arr = np.array([1, 2, 3, 4, 5])
filtered = arr[(arr > 2) & (arr < 5)]  # select elements strictly between 2 and 5
print(filtered)  # Output: [3 4]
Using Functions
NumPy also provides functions like np.where() and np.extract() for more complex filtering.
Example
arr = np.array([1, 2, 3, 4, 5])
filtered = np.where(arr % 2 == 0, arr, 0)  # replace odd elements with 0
print(filtered)  # Output: [0 2 0 4 0]

Array Iteration

For Loops
Iterate over arrays using traditional for loops. This is useful for basic iteration but might not be the most efficient method for large arrays.
nditer
The nditer object allows iterating over arrays in a more efficient and flexible way. It provides options to specify the order of iteration, data type casting, and external loop handling.
Flat Iteration
The flat attribute of NumPy arrays returns an iterator that iterates over all elements of the array as if it were a flattened 1D array. This is useful for simple element-wise operations.
Broadcasting
When performing operations between arrays of different shapes, NumPy automatically broadcasts the arrays to compatible shapes. Understanding broadcasting rules can help efficiently iterate over arrays without explicit loops.
Vectorized Operations
Instead of explicit iteration, utilize NumPy's built-in vectorized operations which operate on entire arrays rather than individual elements. This often leads to faster and more concise code.
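A minimal sketch comparing the iteration styles above on a 2x3 array. The int() conversions are there only to make the results plain Python integers.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# A plain for loop walks the first axis, yielding one row at a time.
row_sums = [int(row.sum()) for row in arr]               # [6, 15]

# nditer visits every element; order="F" walks column by column.
c_order = [int(x) for x in np.nditer(arr)]               # [1, 2, 3, 4, 5, 6]
f_order = [int(x) for x in np.nditer(arr, order="F")]    # [1, 4, 2, 5, 3, 6]

# flat iterates as if the array were 1-D.
flat_values = [int(x) for x in arr.flat]                 # [1, 2, 3, 4, 5, 6]

# The vectorized equivalent of a "double every element" loop.
doubled = arr * 2
```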

Array Reshaping

Array Reshaping
Reshaping arrays in NumPy allows you to change the shape or dimensions of an existing array without changing its data. This is useful for tasks like converting a 1D array into a 2D array or vice versa, or for preparing data for certain operations like matrix multiplication.
reshape()
The reshape() function in NumPy allows you to change the shape of an array to a specified shape. The new shape must hold the same number of elements as the original.
Example
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape((2, 3))
Explanation
This will reshape the array arr into a 2x3 matrix.
resize()
Unlike reshape(), resize() can produce a shape with a different number of elements. The function np.resize(arr, shape) returns a new array, while the method arr.resize(shape) changes the array in place.
Example
arr = np.array([[1, 2], [3, 4]])
resized_arr = np.resize(arr, (3, 2))
Explanation
If the new shape requires more elements than the original array has, np.resize() repeats the original data to fill it (the in-place arr.resize() method pads with zeros instead).
flatten()
The flatten() method collapses a multi-dimensional array into a 1D array by iterating over all elements in row-major (C-style) order. It always returns a copy.
Example
arr = np.array([[1, 2], [3, 4]])
flattened_arr = arr.flatten()
Explanation
This will flatten the 2D array into a 1D array.
ravel()
Similar to flatten(), ravel() also flattens multi-dimensional arrays into a 1D array, but it returns a view of the original array whenever possible.
Example
arr = np.array([[1, 2], [3, 4]])
raveled_arr = arr.ravel()
Explanation
This method can be more efficient in terms of memory usage than flatten().
transpose()
The transpose() method rearranges the dimensions of an array. For 2D arrays, it effectively swaps rows and columns.
Example
arr = np.array([[1, 2], [3, 4]])
transposed_arr = arr.transpose()
Explanation
This will transpose the 2x2 matrix, swapping rows and columns.
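A short sketch of two distinctions in this section: copy versus view (flatten() versus ravel()), and the function np.resize versus the in-place method ndarray.resize. The refcheck=False argument is used here only so the in-place resize also works in interactive sessions where extra references to the array may exist.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

flat_copy = arr.flatten()    # always a copy
flat_view = arr.ravel()      # a view here, because arr is contiguous

flat_view[0] = 99            # writes through to arr
flat_copy[1] = -1            # does not affect arr

# np.resize returns a new array and repeats the data to fill the shape.
repeated = np.resize(np.array([1, 2, 3]), (2, 3))    # [[1, 2, 3], [1, 2, 3]]

# The ndarray.resize method changes the array in place and pads with zeros.
padded = np.array([1, 2, 3])
padded.resize((2, 3), refcheck=False)                # [[1, 2, 3], [0, 0, 0]]

# reshape can infer one dimension when given -1.
inferred = np.arange(6).reshape(-1, 2)               # shape (3, 2)
```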

Sorting Arrays

np.sort(arr)
Returns a sorted copy of the array.
arr.sort()
Sorts the array in-place.
np.argsort(arr)
Returns the indices that would sort the array.
np.lexsort()
Performs an indirect sort using a sequence of keys. The last key in the sequence is the primary sort key.
np.sort_complex(arr)
Sorts the array of complex numbers based on the real part first, then the imaginary part.
np.partition(arr, k)
Returns a copy rearranged so that the kth element is in the position it would have in the sorted array, with all smaller elements to its left and all equal or larger elements to its right. The order within each side is not defined.
np.argpartition(arr, k)
Returns the indices that would partition the array.
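The sorting functions above can be sketched on a small array. The name arrays passed to np.lexsort are hypothetical data used to show that the last key is the primary one.

```python
import numpy as np

arr = np.array([30, 10, 50, 20, 40])

sorted_copy = np.sort(arr)     # [10, 20, 30, 40, 50]; arr is unchanged
order = np.argsort(arr)        # [1, 3, 0, 4, 2]
via_order = arr[order]         # same values as sorted_copy

# After partitioning with k=2, position 2 holds its sorted-order value.
part = np.partition(arr, 2)
kth_value = part[2]            # 30

# lexsort: the LAST key (last_names) is the primary sort key.
last_names = np.array(["Smith", "Jones", "Smith"])    # hypothetical data
first_names = np.array(["Zoe", "Amy", "Adam"])
lex_order = np.lexsort((first_names, last_names))     # [1, 2, 0]

in_place = arr.copy()
in_place.sort()                # sorts in_place itself and returns None
```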

Array Indexing

Single Element Access
Use square brackets [] to access individual elements of an array by specifying the indices for each dimension. For example, arr[0, 1] accesses the element at the first row and second column of the array arr.
Negative Indexing
Negative indices can be used to access elements from the end of the array. For instance, arr[-1] accesses the last element of the array arr.
Slice Indexing
NumPy arrays support slicing similar to Python lists. You can use the colon : operator to specify a range of indices. For example, arr[1:3] retrieves the elements at indices 1 and 2 along the first axis (the stop index 3 is excluded).
Integer Array Indexing
You can use arrays of integer indices to extract specific elements from an array. For example, arr[[0, 2, 4]] retrieves elements at indices 0, 2, and 4 along the first axis.
Boolean Array Indexing (Boolean Masking)
You can use boolean arrays to filter elements from an array based on a condition. For example, arr[arr > 0] retrieves all elements of arr that are greater than zero.
Fancy Indexing
Fancy indexing allows you to select multiple elements from an array using arrays of indices or boolean masks. This method can be used to perform advanced selection operations efficiently. Unlike basic slicing, it returns a copy rather than a view.
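The indexing styles above can be sketched on one 3x3 array. The values are illustrative, and the comments give the selected elements.

```python
import numpy as np

arr = np.array([[10, 20, 30],
                [40, 50, 60],
                [70, 80, 90]])

single = arr[0, 1]                   # 20: first row, second column
last_row = arr[-1]                   # [70, 80, 90]
rows_1_to_2 = arr[1:3]               # rows at index 1 and 2
picked_rows = arr[[0, 2]]            # rows at index 0 and 2
multiples_of_20 = arr[arr % 20 == 0] # [20, 40, 60, 80]

# Paired row and column index arrays select individual elements.
corners = arr[[0, 0, 2, 2], [0, 2, 0, 2]]   # [10, 30, 70, 90]
```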