I.1.1 Union Find

Algorithm I Week 1: Dynamic Connectivity and Union Find

Union Find

Algorithm

Initialize

Union

Find

Worst-CaseTime

Quick-Find

$M N$

Quick-Union

1 ~ n

$M N$

Weighted QU

lg(n)

$N + M \lg N$

QU+PathComp

lg(n)

$N + M \lg N$

W QU + PathComp

$N + M \lg^*N$

Quick Find

Data Structure: id = [0 1 2 3 .. N-1]
connected: id[p] == id[q]
Union: id[id==id[p]] = id[q]
Time Complexity: $O(n^2)$

Quick Union

Data Structure(Tree): id = [0 1 2 3 .. N-1]
connected: root(p) == root(q) := id[id[id..[p]]]
Union: id[root(p)] = root(q)

Weighted

Compare the size of two unions and concatenate the smaller tree to the larger one.

Union

if sz[i] < sz[j]:
    id[i] = id[j]
    sz[j] += sz[i]
else:
    ...

Path Compression

Find

def find(i):
    if i!=id[i]:
        id[i] = find(id[i])
    return id[i]

$O(n^2)$ is not acceptable for modern algorithms! Quadratic problems do not scale with technology.

Meaning: When Computers 10x faster, Memory 10x as much (we can fit larger N!), it takes 10x time to compute the same problem where N*=10!

API for the structure

Union Find is a process of

class UF(int N){ // size N 
    // int data[N] = [0, 1, 2, ..., N-1];
    void union(int p, int q); 
    bool connected(int p, int q); 
    int find(int p); 
    int count(); 
}

Use Case / Applications

Percolation (渗透) conductor, waterflow, social network...
Games (Go, Hex)
Dynamic Connectivity
Least common ancestor
Equivalence of finite state automata
Hoshen-Kopelman algorithm in physics
Hinley-Milner polymorphic type inference
Kruskal's minimal spanning tree
Compiling equivalence statements in Fortran
Morphological attribute openings and closings
Matlab's bwlabel() function in image processing

Percolation

For the unsolvable Mathematical Model, use a Computational Model to approximate

Use Cases

Remove highly correlated features from a dataset with correlation matrix. Collect the feature pairs with a threshold of 0.9, and retain the feature with least numbre of missing values. (pandas)

a modified version of union find

correlated_union = {key:key for key in keys}
# find
def root(i):
    if i != correlated_union[i]:
        # path compression
        correlated_union[i] = root(correlated_union[i])
    return correlated_union[i]
    
# union
for x, y in correlated_tuples:
    x_rt, y_rt = root(x), root(y)
    if x_rt != y_rt:
        # set the root to be the one with fewer missing values
        if df_train[x_rt].isnull().sum() < df_train[y_rt].isnull().sum():
            correlated_union[y_rt] = x_rt
        else:
            correlated_union[x_rt] = y_rt

Coding Questions

https://leetcode.com/problemset/algorithms/?topicSlugs=union-find

Interview Questions

1. Social network connectivity. Given a social network containing $n$ members and a log file containing $m$ timestamps at which times pairs of members formed friendships, design an algorithm to determine the earliest time at which all members are connected (i.e., every member is a friend of a friend of a friend ... of a friend). Assume that the log file is sorted by timestamp and that friendship is an equivalence relation. The running time of your algorithm should be $m\log n$ or better and use extra space proportional to $n$ .

when a sz[p] = n.

2. Union-find with specific canonical element. Add a method find()find() to the union-find data type so that find(i) returns the largest element in the connected component containing $i$ . The operations, union(), connected(), find() should all take logarithmic time or better.

For example, if one of the connected components is $\{1,2,6,9\}$ , then the find() method should return $9$ for each of the four elements in the connected components.

if find(p) < find(q):
     id[find(p)] = find(q)
else:
     id[find(q)] = find(p)

3. Successor with delete. Given a set of $n$ integers $S={0,1,...,n−1}$ and a sequence of requests of the following form:

Remove $x$ from $S$
Find the successor of $x$ : the smallest $y$ in $S$ such that $y≥x$ .

design a data type so that all operations (except construction) take logarithmic time or better in the worst case.

?Hint: maintain an extra array to the weighted quick-union data structure that stores for each root ii the large element in the connected component containing ii. use the modification of the union−find data discussed in the previous question.

PreviousPrinceton Algorithms I Notebook NextI.1.2 Analysis of Algorithms

Last updated 6 years ago