Some Helpful Big Oh Analysis
Expansion 
Summation 
Big Oh 
1+2+3+4+....+ N 
N(N+1)/2 
O(N^2) 
N+N+N+....+N 
N*N 
O(N^2) 
N+N+N+....+N+....+N+....+N 
3N*N 
O(N^2) 
1+2+4+....+ 
2^(N+1)1 
O(2^N) 
2^10 = 1024 ~~ 1000
O notation is an upper bound so N is O(N) but it is also O(N^2)
Order of Growth Classifications
Usually, nested for loops have a big O(N^2) because each of them runs n times. However, sometimes they can run less than n times.
for (int i =0; i<N; i++) > N times
for (int j =1; j<n; j  j*2)
Big O is n* log n times
Common Data Structure Operations
Arrays vs ArrayLists
Arrays 
ArrayLists 
They have a fixed size 
Size can change 
Much faster to add to 
Adding to an arraylist is usually N 

But when you reach the max, the computer doubles the limit every time you hit the limit so it takes O(N) times > This is why it takes longer 
Binary Search Tree
 Each node has a value
 Nodes with the values less than their parent are in the left
Nodes with values greater than their parent are in the right subtree
 If equal, choose a side and stay consistent
 Insert from top of binary search tree and move down
BSTS to Lists
Trees: are nodes with two pointers
Doubly linked lists: also nodes with two pointers (allows for constant time access with one pointing to front and one pointing to back)
Complete Binary Tree
 Every non leaf node has two children
 All the leaves are at the same level
 There are 2^{N} 1 or O(2^{N}) nodes with N levels
 There are 2^{N1} leaves with n levels 
Priority Queues
 Minimum is first out
Poll means remove the minimum each time
List [0] will be smallest
List [1] is smallest of all the ones that remain
While a queue is first in first out, a priority queue is minium out first
Shortest path
Heaps
 Heap is an arraybased implementation of a binary tree used for implementing priority queues and supports: insert, findMin, and deleteMin
Using array minimizes storage (no explicit pointers) , faster too because children are locatd by index/position in array
Deletion: remove root and replace with right most child and then bubble down filling left to right
Properties:
 shape: tree filled at all levels (except perhaps last) and filled in lefttoright (complete binary tree
 value: each node has value smaller than both children
Min Heap:
 Minimal element is at root, index 1
Maximal element has to be a leaf, because can't be greater than child
Complexity of finding maximal elements, half nodes are trees > O(n/2) so O(n)
 Second smallest element must be one level below root 
Using An Array For a Heap
 Store node values in array starting at index 1
 For node with index k:
 left child: index 2*k
right child: index 2*k +1
 parent: index k/2
Adding Values to Heap
 To maintain heap shape, must add new value in lefttoright order of last level
 This could violate heap property
 move value "up" if too small
 Change places with parent if heap property is violated and stop when parent is smaller and stop when root is reached
Pull parent down
Heap Add Implementation
Tries
 Tries support add, contains, delete in O(w) time for words of length w
 Each node in a trie has a subtrie for every valid letter than can follow
 
Priority Queue Implementations
Operations: O(log n)
add  add element to last spot and bubble up
remove/poll  remove root.min and take last element and bubble down
Graphs Vocabulary
 A collection of vertices and edges
 Edge connections two vertices
 Direction can be imported, directed edge, directed graph
 Edge may have associated weight/cost
 A vertex sequence is a path where vk and vk+1 are connected by an edge
 If some vertex is repeated, the path is a cycle
 A graph is connected if there is a path between any pair of vertices
Articulation Point breaks graph in two 
Graphs DFS
Envision each vertex as a room
Go into a room, mark the room, choose an unused door, exit
Don't go into room you've already been in> explore every vertex one time
qu is where we're going, visited is where we've been
Adjacency Lists and Matrix
Adjacency List: V+E spaces
Adjacency Matrix: V*E
Creating Adjacency Matrix
public int howLong(String [] connects, String [] costs){
int [] [] adjMatrix = new int [connects.length][connects.length]'
for (int i =0; i<connects.length; i++){
String [] edges  connects[i[.split(" ");
String [] weights = costs[i].split(" ");
for (int j =0; j<edges.length; j++){
adjMatrix[i][Integer.partseInt(edges[j])) = Integer.parseInt(weights[j]);



Analysis: Empirical vs. Mathematical
Empirical Analysis 
Mathematical Analysis 
Measure running times, plot, and fit curve 
Analyze algorithm to estimate # ops as a function of input size 
Easy to perform experiments 
May require advanced mathematics 
Model useful for predicting, but not for explaining 
Model useful for predicting and explaining 

Mathematical analysis is independent of a particular machine or compiler; applies to machines not yet built. 
Comparators
 Can't always access comparable method (implements .compareTo and uses Collections.sort and Arrays.sort)
Sometimes must implement comparators in which you pass two objects
 Must implement .compare(T a, T b)
 Return <0 when a<b
 Return ==0 when a ==b
Return >0 when a>b 
Linked List vs. ArrayList

Linked List 
ArrayList 
Both 

Separate elements in memory that all have pointers to each other 
A collection of elements in order in memory 
Collection of elements 



Add, remove, for loops, sort themselves, clear 
Remove First Element: 
N Time because you don't have to shift something when it is in the front of the list 
N^2 Time because everything stores sequentially so when you take something out you have to shift everything by 1 
Remove Middle Index 
Has a higher coefficient and thus is slower: To get there takes time but to remove it is instantaneous :O(N) 
Faster: To get to middle element is instantanenous but to remove it you have to shift it: O(N) 

Best for adding/removing front 
Best for adding/removing something from back/middle 
Trees
If you have N nodes, height is asking how many times you can divide by 2 > expressed as the log base 2 of n
Good search tree is height is log n
Bad search tree is n
Balanced if left and right subtrees are height balanced and left and right heights differ by at most 1
Autocomplete
BruteAutocomplete: stores data as a Term array and finds the top k matches by iterating through the array and pushes all terms starting with the prefix into a max priority queue sorted by weight and returns the top k terms from that priority queue
 not compared by weight and o organization
topMatches: O(n+mlogm)
 topMatch: O(n)
Improving BruteAutocomplete:
 had to iterate through every single term in the array because it did not know where the terms starting with the prefix were located aka array was unsorted.
 If we sort the array lexicographically, then all the terms with the same prefix will be adjacent (Sorting takes O(n log n)
 Term: encapsulates a word/term and its corresponding weight
BinarySearchAutocomplete: finds Terms with a given prefix by performing a binary search on a sorted array of Terms
TrieAutocomplete: finds Terms with a given prefix by building a trie to store the Terms. To be efficient should only look at words whos maxsubtree weight is greater than the minimum 
Autocomplete: Term class
 The term class encapsulates a comparable word weight pair
 WeightOrder: sorts in ascending weight order
ReverseWeightOrder: sorts in descending weight order
PrefixOrder: which sorts by the first r characters
If one or both words are shorter than r, we just use normal lexicographical sorting
 compare method must take O(r) 
Autocomplete: Binary Search
 Find all the range of all the terms comparator considers equal to key
 Quickly return the first and last index respectively of an element in the input array which the comparator considers to be equal to key
 We specify the first and last index because there could be multiple Terms in which the comparator consider to be equal to key
 Collections.binary search does not guarantee first index of terms that match key, it gives an index 
Autocomplete: Tries
 To completely eliminate terms which don't start with prefix, store in trie
 Navigate to the node representing the string. The trie rooted at this node only contains nodes starting with this trie
 No matter how many words are in our trie, navigating this node takes the same amount of time 
Autocomplete: BigOh
Class 
TopMatch 
TopMatches 
BruteAutocomplete 
O(n) 
O(mlogm + n) 
BinarySearchAutocomplete 
O(log(n) + m) 
O(log(n) + (m + k)log(k)) 
TrieAutocomplete 
O(w) 
O(w) 
n: number of terms in total
m: number of terms that match the prefix
k: desired number of terms
w: number of letters in the word
Common Recurrences and Their Solutions
Graphs BFS
Visit everything that is one away, then everything that is two away...
Used to find shortest distance
takes a lot of space > B^d
Bacon Number
Good Center  Has the most people closest to them
Chooses the best path, lowest number of edges
Actor Actor Representation//Vertices: actors or actresses//Edges: Two actors are adjacent (joined by a graph edge) if and only if they appear in the same movie
Movie Movie Representation// Vertices: Movies//Edges: Two movies are adjacent if they share a cast member
Actor Movie Representation//Vertices: Actors, actresses, and movies// Edges: An actor is connected to a movie if he or she appeared in that movie 
1. Most vertices: Actor to movie
a. All of the vertices you had in actor to actor and all vertices in movie
to movie
2. Most edges: Actor to Actor


Recursion
Efficient sorting algorithms are usually recursive
Base Case: does not make a recursive call
For Linked Lists: Base case is always empty list or singular node/Recursive calls make progress toward base case (list.next as argument) 
Percolation Overview
 System percolates if top and bottom are connected by open sites
 Given a NxN grid, where each is site is open with probability p*, what is the probability that the system percolates?
 if p>p*, system most likely percolates
 if p< p*, system does not percolate
All simulations, whether using PercolationDFS, PercolationDFSFast, or Percolation UF with any implementation of unionfind will be at least O(N^{2})
 Finding the threshold
 Initialize NxN grid of sites as blocked
 Randomly open sites until system percolates
 Percentage of pen sites gives an approximation of p*
How do you get random cells to open and not open same shell more than once:
 Make points out of the cells
 Shuffle them gets a random ordering of all the point where each one
occurs once time
 Go through and repeatedly open each 
Percolation Solution 1: Depth First Search
 Try searching from all of the open spots on the top row
 Search from all legal adjacent spots you have not visited
 Recurse until you can't search any further or have reached the bottom row
 Try all the legal adjacent spots (what makes it recursive is that we do the
same problem but at a different place)
 Base Cases:
 Out of bounds
 Blocked
 Already full/visited
PercolationDFS sets each grid cell to OPEN and runs a DFS from each open cell in the top (indexzero) row to mark the cells reachable from them as FULL. In the new model PercolationDFSFast, you'll make this implementation more efficient by only checking the cell being opened to see if it results in more FULL cells, rather than checking every cell reachable from the top row.
Percolation DFS and DFSFast run in O(N) because it iterates through only the bottom row to check if it is full 
Percolation DFSFast
 Why is this an Improvement: An improvement because we don't have to search from the top:
 Don't have to start from the top and go down
 For the cells that are adjacent, now search from that spot
 If one of my neighbors is full, I am full
Don't have to redfs things you've already seen
Methodology:
Percolation DFS Fast
1. Create a grid
2. Set them all to blocked
3. Protected void updateOnOpen
4. Clear everything from being full
5. Dfs checks base cases
a. If not in bounds, return
b. If cell is full or not open, return
Otherwise try all neighbors recursively 
Percolation Solution 2: Union Find
 Create an object for each site (each cell) (Vtop as N*N, Vbottom as N^2 +1)
 Percolates if vtop is connected to vbottom
 One call that you have to make > union find
 For every cell, give it an index
 Becomes problematic when n is too long
QuickUWPC:
 Look at ultimate parent making path short to find parent at constant time
 Run time is O(1) because we simply check if vtop and vbottom are in the same set
IUnionFInd.find is called from both connected and union to find sets that p and q belong to 
Percolation Method Score Board
Tree Traversals InOrder
Visit left subtree, process root, visit right subtree
Increasing order
 Follow path and In order is when you do outline and you hit it the second time
Tree Traversals PreOrder
Process root, then visit left subtree, then visit right subtree
Good for reading/writing trees
 When you follow the outline and preorder is just when you hit it for the first time
Tree Traversals: PostOrder
Visit left subtree, right subtree, process root
Good for deleting trees
When you follow the outline and postorder is when you hit it going up
Recursion with ListNodes in return statement
public ListNode<Integer> convertRec (ListNode<String> list):
if (list == null) return null;
return new ListNode<Integer> (list.info.length, convertRev(list.next);

Doubly Linked Lists
List Node first = new ListNode <"cherry", null, null>;
List Node fig = new ListNode <"fig", first, null>;
List Node mango = new ListNode <"mango", fig, null>;
first.right = fig;
fig.right = mango;

Data Compression
Types:
Lossless: Can recover exact data
Lossy: Can recover approximate data
 Use when you don't care, photos can't tell the difference, can compress it
more 
Huffman: Text Compression
In the trie, 0 is left, 1 is right
Make the ones that occur most often the shortest path
Ones that rarely occur can be long
Ones that never occur can be as long as we want
Look at it 8 bits at a time
Building: Combine minimally weighted trees > Greedy
Bad Huffman Tree: when different character occurs once
Good Hufman Tree: One character occurs multiple times 
Alphabet size and run time and compression rate:
 Alphabet size has a big impact on run time because alph size tell syou how big the tree will be
 The number of leaves is equal to the size of your allphabet, so you have 2^k nodes in your tree
 Amount of compression is frequency that it occurs
○ 256 characters that occur the same amount of time is bad compression
○ Huffman takes advantage of the fact that some characters occur more often than others
Huffman: TreeTighten APT
Big Oh for Huffman Encodings
Read in tree data: O(k)
Decode bit string with tree: O(n)
Creating Adjacency Matrix
public int howLong(String [] connects, String [] costs){
int [] [] adjMatrix = new int [connects.length][connects.length]'
for (int i =0; i<connects.length; i++){
String [] edges  connects[i[.split(" ");
String [] weights = costs[i].split(" ");
for (int j =0; j<edges.length; j++){
adjMatrix[i][Integer.partseInt(edges[j])) = Integer.parseInt(weights[j]);


Created By
Metadata
Favourited By
Comments
No comments yet. Add yours below!
Add a Comment