Numpy Cheat Sheet

Array Slicing

Defitinition	Array slicing allows you to extract specific parts of an array. It works similarly to list slicing in Python.
Example	`arr = np.array([0, 1, 2, 3, 4, 5])`
Slicing syntax	arr[start:stop:step]
Basic slicing	`slice_1 = arr[1:4] # [1, 2, 3] slice_2 = arr[:3] # [0, 1, 2] slice_3 = arr[3:] # [3, 4, 5]`
Negative indexing	`slice_4 = arr[-3:] # [3, 4, 5] slice_5 = arr[:-2] # [0, 1, 2]`
Step slicing	`slice_6 = arr[::2] # [0, 2, 4] slice_7 = arr[1::2] # [1, 3, 5]`
Reverse array	`slice_8 = arr[::-1] # [5, 4, 3, 2, 1, 0]`
Slicing 2D arrays	`arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) slice_9 = arr_2d[:2, 1:] # [[2, 3], [5, 6]]`

Performance Tips and Tricks

Vectorization	Utilize NumPy's built-in vectorized operations whenever possible. These operations are optimized and significantly faster than equivalent scalar operations.
Avoiding Loops	Minimize the use of Python loops when working with NumPy arrays. Instead, try to express operations as array operations. Loops in Python can be slow compared to vectorized operations.
Use Broadcasting	Take advantage of NumPy's broadcasting rules to perform operations on arrays of different shapes efficiently. Broadcasting allows NumPy to work with arrays of different sizes without making unnecessary copies of data.
Avoid Copies	Be mindful of unnecessary array copies, especially when working with large datasets. NumPy arrays share memory when possible, but certain operations may create copies, which can impact performance and memory usage.
Use In-Place Operations	Whenever feasible, use in-place operations (+=, *=, etc.) to modify arrays without creating new ones. This reduces memory overhead and can improve performance.
Memory Layout	Understand how memory layout affects performance, especially for large arrays. NumPy arrays can be stored in different memory orders (C-order vs. Fortran-order). Choosing the appropriate memory layout can sometimes lead to better performance, especially when performing operations along specific axes.
Data Types	Choose appropriate data types for your arrays to minimize memory usage and improve performance. Using smaller data types (e.g., np.float32 instead of np.float64) can reduce memory overhead and may lead to faster computations, especially on platforms with limited memory bandwidth.
NumExpr and Numba	Consider using specialized libraries like NumExpr or Numba for performance-critical sections of your code. These libraries can often provide significant speedups by compiling expressions or functions to native machine code.
Parallelism	NumPy itself doesn't provide built-in parallelism, but you can leverage multi-threading or multi-processing libraries like concurrent.futures or joblib to parallelize certain operations, especially when working with large datasets or computationally intensive tasks.
Profiling	Use profiling tools like cProfile or specialized profilers such as line_profiler or memory_profiler to identify performance bottlenecks in your code. Optimizing code based on actual profiling results can lead to more significant performance improvements.

Array Concatenation and Splitting

Concatenation	`array1 = np.array([[1, 2, 3], [4, 5, 6]]) array2 = np.array([[7, 8, 9]]) concatenated_array = np.concatenate((array1, array2), axis=0) # vertically print(concatenated_array)`
numpy.concatenate()	Concatenates arrays along a specified axis.
numpy.vstack() and numpy.hstack()	Stack arrays vertically and horizontally, respectively.
numpy.dstack()	Stack arrays depth-wise.
Splitting	`split_arrays = np.split(concatenated_array, [2], axis=0) # split after the second row print(split_arrays)`
numpy.split()	Split an array into multiple sub-arrays along a specified axis.
numpy.hsplit() and numpy.vsplit()	Split arrays horizontally and vertically, respectively.
numpy.dsplit()	Split arrays depth-wise.

Basic Operations

Addition	array1 + array2
Subtraction	array1 - array2
Multiplication	array1 * array2
Division	array1 / array2
Floor Division	array1 // array2
Modulus	array1 % array2
Exponentiation	array1 ** array2
Absolute	np.abs(array)
Negative	-array
Reciprocal	1 / array
Sum	np.sum(array)
Minimum	np.min(array)
Maximum	np.max(array)
Mean	np.mean(array)
Median	np.median(array)
Standard Deviation	np.std(array)
Variance	np.var(array)
Dot Product	np.dot(array1, array2)
Cross Product	np.cross(array1, array2)

NaN Handling

Identifying NaNs	Use np.isnan() function to check for NaN values in an array.
Removing NaNs	Use np.isnan() to create a boolean mask, then use boolean indexing to select non-NaN values.
Replacing NaNs	Use np.nan_to_num() to replace NaNs with a specified value. Use np.nanmean(), np.nanmedian(), etc., to compute mean, median, etc., ignoring NaNs.
Interpolating NaNs	Sure, here's a short content for "NaN Handling" on your NumPy cheat sheet: NaN Handling: Identifying NaNs: Use np.isnan() function to check for NaN values in an array. Removing NaNs: Use np.isnan() to create a boolean mask, then use boolean indexing to select non-NaN values. Replacing NaNs: Use np.nan_to_num() to replace NaNs with a specified value. Use np.nanmean(), np.nanmedian(), etc., to compute mean, median, etc., ignoring NaNs. Interpolating NaNs
Ignoring NaNs in Operations	Many NumPy functions have NaN-aware counterparts, like np.nanmean(), np.nansum(), etc., that ignore NaNs in computations.
Handling NaNs in Aggregations	Aggregation functions (np.sum(), np.mean(), etc.) typically return NaN if any NaNs are present in the input array. Use skipna=True parameter in pandas DataFrame functions for NaN handling.
Dealing with NaNs in Linear Algebra	NumPy's linear algebra functions (np.linalg.inv(), np.linalg.solve(), etc.) handle NaNs by raising LinAlgError.

Broadcasting

Broadcasting is a powerful feature in NumPy that allows arrays of different shapes to be combined in arithmetic operations.

When operating on arrays of different shapes, NumPy automatically broadcasts the smaller array across the larger array so that they have compatible shapes.

This eliminates the need for explicit looping over array elements, making code more concise and efficient.

Broadcasting is particularly useful for performing operations between arrays of different dimensions or sizes without needing to reshape them explicitly.

Mathematical Functions

Definition	NumPy provides a wide range of mathematical functions that operate element-wise on arrays, allowing for efficient computation across large datasets.
Trigonometric Functions	np.sin(), np.cos(), np.tan(), np.arcsin(), np.arccos(), np.arctan()
Hyperbolic Functions	np.sinh(), np.cosh(), np.tanh(), np.arcsinh(), np.arccosh(), np.arctanh()
Exponential and Logarithmic Functions	np.exp(), np.log(), np.log2(), np.log10()
Rounding	np.round(), np.floor(), np.ceil(), np.trunc()
Absolute Value	np.abs()
Factorial and Combinations	np.factorial(), np.comb()
Gamma and Beta Functions	np.gamma(), np.beta()
Sum, Mean, Median	np.sum(), np.mean(), np.median()
Standard Deviation, Variance	np.std(), np.var()
Matrix Operations	np.dot(), np.inner(), np.outer(), np.cross()
Eigenvalues and Eigenvectors	np.linalg.eig(), np.linalg.eigh(), np.linalg.eigvals()
Matrix Decompositions	np.linalg.svd(), np.linalg.qr(), np.linalg.cholesky()

Array Creation

numpy.array()	Create an array from a Python list or tuple.
Example	`arr = np.array([1, 2, 3])`
numpy.zeros()	Create an array filled with zeros.
Example	`zeros_arr = np.zeros((3, 3))`
numpy.ones()	Create an array filled with ones.
Example	`ones_arr = np.ones((2, 2))`
numpy.arange()	Create an array with a range of values.
Example	`range_arr = np.arange(0, 10, 2) # array([0, 2, 4, 6, 8])`
numpy.linspace()	Create an array with evenly spaced values.
Example	`linspace_arr = np.linspace(0, 10, 5) # array([ 0., 2.5, 5., 7.5, 10.])`
numpy.eye()	Create an identity matrix.
Example	`identity_mat = np.eye(3)`
numpy.random.rand()	Create an array with random values from a uniform distribution.
Example	`random_arr = np.random.rand(2, 2)`
numpy.random.randn()	Create an array with random values from a standard normal distribution.
Example	`random_normal_arr = np.random.randn(2, 2)`
numpy.random.randint()	Create an array with random integers.
Example	`random_int_arr = np.random.randint(0, 10, size=(2, 2))`
numpy.full()	Create an array filled with a specified value.
Example	`full_arr = np.full((2, 2), 7)`
numpy.empty()	Create an uninitialized array (values are not set, might be arbitrary).
Example	`empty_arr = np.empty((2, 2))`

Linear Algebra

Matrix Multiplication	np.dot() or @ operator for matrix multiplication.
Transpose	np.transpose() or .T attribute for transposing a matrix.
Inverse	np.linalg.inv() for calculating the inverse of a matrix.
Determinant	np.linalg.det() for computing the determinant of a matrix.
Eigenvalues and Eigenvectors	np.linalg.eig() for computing eigenvalues and eigenvectors.
Matrix Decompositions	Functions like np.linalg.qr(), np.linalg.svd(), and np.linalg.cholesky() for various matrix decompositions.
Solving Linear Systems	np.linalg.solve() for solving systems of linear equations.
Vectorization	Leveraging NumPy's broadcasting and array operations for efficient linear algebra computations.

Statistical Functions

mean	Computes the arithmetic mean along a specified axis.
median	Computes the median along a specified axis.
average	Computes the weighted average along a specified axis.
std	Computes the standard deviation along a specified axis.
var	Computes the variance along a specified axis.
amin	Finds the minimum value along a specified axis.
amax	Finds the maximum value along a specified axis.
argmin	Returns the indices of the minimum value along a specified axis.
argmax	Returns the indices of the maximum value along a specified axis.
percentile	Computes the q-th percentile of the data along a specified axis.
histogram	Computes the histogram of a set of data.

Comparison with Python Lists

Performance	NumPy arrays are faster and more memory efficient compared to Python lists, especially for large datasets. This is because NumPy arrays are stored in contiguous blocks of memory and have optimized functions for mathematical operations, whereas Python lists are more flexible but slower due to their dynamic nature.
Vectorized Operations	NumPy allows for vectorized operations, which means you can perform operations on entire arrays without the need for explicit looping. This leads to concise and efficient code compared to using loops with Python lists.
Multidimensional Arrays	NumPy supports multidimensional arrays, whereas Python lists are limited to one-dimensional arrays or nested lists, which can be less intuitive for handling multi-dimensional data.
Broadcasting	NumPy arrays support broadcasting, which enables operations between arrays of different shapes and sizes. In contrast, performing similar operations with Python lists would require explicit looping or list comprehensions.
Type Stability	NumPy arrays have a fixed data type, which leads to better performance and memory efficiency. Python lists can contain elements of different types, leading to potential type conversion overhead.
Rich Set of Functions	NumPy provides a wide range of mathematical and statistical functions optimized for arrays, whereas Python lists require manual implementation or the use of external libraries for similar functionality.
Memory Usage	NumPy arrays typically consume less memory compared to Python lists, especially for large datasets, due to their fixed data type and efficient storage format.
Indexing and Slicing	NumPy arrays offer more powerful and convenient indexing and slicing capabilities compared to Python lists, making it easier to manipulate and access specific elements or subarrays.
Parallel Processing	NumPy operations can leverage parallel processing capabilities of modern CPUs through libraries like Intel MKL or OpenBLAS, resulting in significant performance gains for certain operations compared to Python lists.
Interoperability	NumPy arrays can be easily integrated with other scientific computing libraries in Python ecosystem, such as SciPy, Pandas, and Matplotlib, allowing seamless data exchange and interoperability.

Masked Arrays

Why?	Masked arrays in NumPy allow you to handle missing or invalid data efficiently.
What are Masked Arrays?	Masked arrays are arrays with a companion boolean mask array, where elements that are marked as "masked" are ignored during computations.
Creating Masked Arrays	You can create masked arrays using the numpy.ma.masked_array function, specifying the data array and the mask array.
Masking	Masking is the process of marking certain elements of an array as invalid or missing. You can manually create masks or use functions like numpy.ma.masked_where to create masks based on conditions.
Operations with Masked Arrays	Operations involving masked arrays automatically handle masked values by ignoring them in computations. This allows for easy handling of missing data without explicitly removing or replacing them.
Masked Array Methods	NumPy provides methods for masked arrays to perform various operations like calculating statistics, manipulating data, and more. These methods are similar to regular array methods but handle masked values appropriately.
Applications	Masked arrays are useful in scenarios where datasets contain missing or invalid data points. They are commonly used in scientific computing, data analysis, and handling time series data where missing values are prevalent.

Random Number Generation

np.random.rand	Generates random numbers from a uniform distribution over [0, 1).
np.random.randn	Generates random numbers from a standard normal distribution (mean 0, standard deviation 1).
np.random.randint	Generates random integers from a specified low (inclusive) to high (exclusive) range.
np.random.random_sample or np.random.random	Generates random floats in the half-open interval [0.0, 1.0).
np.random.choice	Generates random samples from a given 1-D array or list.
np.random.shuffle	Shuffles the elements of an array in place.
np.random.permutation	Randomly permutes a sequence or returns a permuted range.
np.random.seed	Sets the random seed to ensure reproducibility of results.

Filtering Arrays

Filtering Arrays	NumPy provides powerful tools for filtering arrays based on certain conditions. Filtering allows you to select elements from an array that meet specific criteria.
Syntax	`filtered_array = array[condition]`
Example	`import numpy as np arr = np.array([1, 2, 3, 4, 5]) filtered = arr[arr > 2] # Select elements greater than 2 print(filtered) # Output: [3 4 5]`
Combining Conditions	Conditions can be combined using logical operators like & (and) and \| (or).
Example	`arr = np.array([1, 2, 3, 4, 5]) filtered = arr[(arr > 2) & (arr < 5)] # Select elements between 2 and 5 print(filtered) # Output: [3 4]`
Using Functions	NumPy also provides functions like np.where() and np.extract() for more complex filtering.
Example	`arr = np.array([1, 2, 3, 4, 5]) filtered = np.where(arr % 2 == 0, arr, 0) # Replace odd elements with 0 print(filtered) # Output: [0 2 0 4 0]`

Array Iteration

For Loops	Iterate over arrays using traditional for loops. This is useful for basic iteration but might not be the most efficient method for large arrays.
nditer	The nditer object allows iterating over arrays in a more efficient and flexible way. It provides options to specify the order of iteration, data type casting, and external loop handling.
Flat Iteration	The flat attribute of NumPy arrays returns an iterator that iterates over all elements of the array as if it were a flattened 1D array. This is useful for simple element-wise operations.
Broadcasting	When performing operations between arrays of different shapes, NumPy automatically broadcasts the arrays to compatible shapes. Understanding broadcasting rules can help efficiently iterate over arrays without explicit loops.
Vectorized Operations	Instead of explicit iteration, utilize NumPy's built-in vectorized operations which operate on entire arrays rather than individual elements. This often leads to faster and more concise code.

Array Reshaping

Array Reshaping	Reshaping arrays in NumPy allows you to change the shape or dimensions of an existing array without changing its data. This is useful for tasks like converting a 1D array into a 2D array or vice versa, or for preparing data for certain operations like matrix multiplication.
reshape()	The reshape() function in NumPy allows you to change the shape of an array to a specified shape.
For example:	import numpy as np arr = np.array([1, 2, 3, 4, 5, 6]) reshaped_arr = arr.reshape((2, 3))
Explanation	This will reshape the array arr into a 2x3 matrix.
resize()	Similar to reshape(), resize() changes the shape of an array, but it also modifies the original array if necessary to accommodate the new shape.
Example	`arr = np.array([[1, 2], [3, 4]]) resized_arr = np.resize(arr, (3, 2))`
Explanation	If the new shape requires more elements than the original array has, resize() repeats the original array to fill in the new shape.
flatten()	The flatten() method collapses a multi-dimensional array into a 1D array by iterating over all elements in row-major (C-style) order.
Example	`arr = np.array([[1, 2], [3, 4]]) flattened_arr = arr.flatten()`
Explanation	This will flatten the 2D array into a 1D array.
ravel()	Similar to flatten(), ravel() also flattens multi-dimensional arrays into a 1D array, but it returns a view of the original array whenever possible.
Example	`arr = np.array([[1, 2], [3, 4]]) raveled_arr = arr.ravel()`
Explanation	This method can be more efficient in terms of memory usage than flatten().
transpose()	The transpose() method rearranges the dimensions of an array. For 2D arrays, it effectively swaps rows and columns.
Example	`arr = np.array([[1, 2], [3, 4]]) transposed_arr = arr.transpose()`
Explanation	This will transpose the 2x2 matrix, swapping rows and columns.

Sorting Arrays

np.sort(arr)	Returns a sorted copy of the array.
arr.sort()	Sorts the array in-place.
np.argsort(arr)	Returns the indices that would sort the array.
np.lexsort()	Performs an indirect sort using a sequence of keys.
np.sort_complex(arr)	Sorts the array of complex numbers based on the real part first, then the imaginary part.
np.partition(arr, k)	Rearranges the elements in such a way that the kth element will be in its correct position in the sorted array, with all smaller elements to its left and all larger elements to its right.
np.argpartition(arr, k)	Returns the indices that would partition the array.

Array Indexing

Single Element Access	Use square brackets [] to access individual elements of an array by specifying the indices for each dimension. For example, arr[0, 1] accesses the element at the first row and second column of the array arr.
Negative Indexing	Negative indices can be used to access elements from the end of the array. For instance, arr[-1] accesses the last element of the array arr.
Slice Indexing	NumPy arrays support slicing similar to Python lists. You can use the colon : operator to specify a range of indices. For example, arr[1:3] retrieves elements from index 1 to index 2 (inclusive) along the first axis.
Integer Array Indexing	You can use arrays of integer indices to extract specific elements from an array. For example, arr[[0, 2, 4]] retrieves elements at indices 0, 2, and 4 along the first axis.
Boolean Array Indexing (Boolean Masking)	You can use boolean arrays to filter elements from an array based on a condition. For example, arr[arr > 0] retrieves all elements of arr that are greater than zero.
Fancy Indexing	Fancy indexing allows you to select multiple elements from an array using arrays of indices or boolean masks. This method can be used to perform advanced selection operations efficiently.

Numpy Cheat Sheet (DRAFT) by Arshdeep

Array Slicing

Performance Tips and Tricks

Array Concatenation and Splitting

Basic Operations

NaN Handling

Broadcasting

Mathematical Functions

Array Creation

Linear Algebra

Statistical Functions

Comparison with Python Lists

Masked Arrays

Random Number Generation

Filtering Arrays

Array Iteration

Array Reshaping

Sorting Arrays

Array Indexing

Latest Cheat Sheet

Random Cheat Sheet

About Cheatography

Behind the Scenes

Recent Cheat Sheet Activity

Please Disable Your Ad Blocker

Numpy Cheat Sheet (DRAFT) by Arshdeep

Array Slicing

Perfor­mance Tips and Tricks

Array Concat­enation and Splitting

Basic Operations

NaN Handling

Broadc­asting

Mathem­atical Functions

Array Creation

Linear Algebra

Statis­tical Functions

Comparison with Python Lists

Masked Arrays

Random Number Generation

Filtering Arrays

Array Iteration

Array Reshaping

Sorting Arrays

Array Indexing

Latest Cheat Sheet

Random Cheat Sheet

About Cheatography

Behind the Scenes

Recent Cheat Sheet Activity

Please Disable Your Ad Blocker

Performance Tips and Tricks

Array Concatenation and Splitting

Broadcasting

Mathematical Functions

Statistical Functions