I.1.1 Union Find
Algorithm I Week 1: Dynamic Connectivity and Union Find
Union Find
Algorithm
Initialize
Union
Find
Worst-CaseTime
Quick-Find
n
n
1
Quick-Union
n
1 ~ n
n
Weighted QU
n
lg(n)
lg(n)
QU+PathComp
n
lg(n)
lg(n)
W QU + PathComp
n
Quick Find
Data Structure:
id = [0 1 2 3 .. N-1]
connected:
id[p] == id[q]
Union:
id[id==id[p]] = id[q]
Time Complexity:
Quick Union
Data Structure(Tree):
id = [0 1 2 3 .. N-1]
connected:
root(p) == root(q) := id[id[id..[p]]]
Union:
id[root(p)] = root(q)
Weighted
Compare the size of two unions and concatenate the smaller tree to the larger one.
Path Compression
is not acceptable for modern algorithms! Quadratic problems do not scale with technology.
Meaning: When Computers 10x faster, Memory 10x as much (we can fit larger N!), it takes 10x time to compute the same problem where N*=10!
API for the structure
Union Find is a process of
Use Case / Applications
Percolation (渗透) conductor, waterflow, social network...
Games (Go, Hex)
Dynamic Connectivity
Least common ancestor
Equivalence of finite state automata
Hoshen-Kopelman algorithm in physics
Hinley-Milner polymorphic type inference
Kruskal's minimal spanning tree
Compiling equivalence statements in Fortran
Morphological attribute openings and closings
Matlab's bwlabel() function in image processing
Percolation
For the unsolvable Mathematical Model, use a Computational Model to approximate
Use Cases
Remove highly correlated features from a dataset with correlation matrix. Collect the feature pairs with a threshold of 0.9, and retain the feature with least numbre of missing values. (pandas)
Coding Questions
Interview Questions
1. Social network connectivity. Given a social network containing members and a log file containing timestamps at which times pairs of members formed friendships, design an algorithm to determine the earliest time at which all members are connected (i.e., every member is a friend of a friend of a friend ... of a friend). Assume that the log file is sorted by timestamp and that friendship is an equivalence relation. The running time of your algorithm should be or better and use extra space proportional to .
when a sz[p] = n.
2. Union-find with specific canonical element. Add a method find()find() to the union-find data type so that find(i)
returns the largest element in the connected component containing . The operations, union()
, connected()
, find()
should all take logarithmic time or better.
For example, if one of the connected components is , then the find()
method should return for each of the four elements in the connected components.
3. Successor with delete. Given a set of integers and a sequence of requests of the following form:
Remove from
Find the successor of : the smallest in such that .
design a data type so that all operations (except construction) take logarithmic time or better in the worst case.
?Hint: maintain an extra array to the weighted quick-union data structure that stores for each root ii the large element in the connected component containing ii. use the modification of the union−find data discussed in the previous question.
Last updated