I.1.1 Union Find

Algorithm I Week 1: Dynamic Connectivity and Union Find

Union Find

Quick Find

  • Data Structure: id = [0 1 2 3 .. N-1]

  • connected: id[p] == id[q]

  • Union: id[id==id[p]] = id[q]

Quick Union

  • Data Structure(Tree): id = [0 1 2 3 .. N-1]

  • connected: root(p) == root(q) := id[id[id..[p]]]

  • Union: id[root(p)] = root(q)

Weighted

Compare the size of two unions and concatenate the smaller tree to the larger one.

Union
if sz[i] < sz[j]:
    id[i] = id[j]
    sz[j] += sz[i]
else:
    ...

Path Compression

Find
def find(i):
    if i!=id[i]:
        id[i] = find(id[i])
    return id[i]

Meaning: When Computers 10x faster, Memory 10x as much (we can fit larger N!), it takes 10x time to compute the same problem where N*=10!

API for the structure

Union Find is a process of

class UF(int N){ // size N 
    // int data[N] = [0, 1, 2, ..., N-1];
    void union(int p, int q); 
    bool connected(int p, int q); 
    int find(int p); 
    int count(); 
}

Use Case / Applications

  • Percolation (渗透) conductor, waterflow, social network...

  • Games (Go, Hex)

  • Dynamic Connectivity

  • Least common ancestor

  • Equivalence of finite state automata

  • Hoshen-Kopelman algorithm in physics

  • Hinley-Milner polymorphic type inference

  • Kruskal's minimal spanning tree

  • Compiling equivalence statements in Fortran

  • Morphological attribute openings and closings

  • Matlab's bwlabel() function in image processing

Percolation

For the unsolvable Mathematical Model, use a Computational Model to approximate

Use Cases

  1. Remove highly correlated features from a dataset with correlation matrix. Collect the feature pairs with a threshold of 0.9, and retain the feature with least numbre of missing values. (pandas)

a modified version of union find
correlated_union = {key:key for key in keys}
# find
def root(i):
    if i != correlated_union[i]:
        # path compression
        correlated_union[i] = root(correlated_union[i])
    return correlated_union[i]
    
# union
for x, y in correlated_tuples:
    x_rt, y_rt = root(x), root(y)
    if x_rt != y_rt:
        # set the root to be the one with fewer missing values
        if df_train[x_rt].isnull().sum() < df_train[y_rt].isnull().sum():
            correlated_union[y_rt] = x_rt
        else:
            correlated_union[x_rt] = y_rt

Coding Questions

Interview Questions

when a sz[p] = n.

if find(p) < find(q):
     id[find(p)] = find(q)
else:
     id[find(q)] = find(p)

design a data type so that all operations (except construction) take logarithmic time or better in the worst case.

?Hint: maintain an extra array to the weighted quick-union data structure that stores for each root ii the large element in the connected component containing ii. use the modification of the union−find data discussed in the previous question.

Last updated