I.1.1 Union Find

Algorithm I Week 1: Dynamic Connectivity and Union Find

Union Find

Algorithm

Initialize

Union

Find

Worst-CaseTime

Quick-Find

n

n

1

Quick-Union

n

1 ~ n

n

Weighted QU

n

lg(n)

lg(n)

QU+PathComp

n

lg(n)

lg(n)

W QU + PathComp

n

Quick Find

  • Data Structure: id = [0 1 2 3 .. N-1]

  • connected: id[p] == id[q]

  • Union: id[id==id[p]] = id[q]

  • Time Complexity: O(n2)O(n^2)

Quick Union

  • Data Structure(Tree): id = [0 1 2 3 .. N-1]

  • connected: root(p) == root(q) := id[id[id..[p]]]

  • Union: id[root(p)] = root(q)

Weighted

Compare the size of two unions and concatenate the smaller tree to the larger one.

Union
if sz[i] < sz[j]:
    id[i] = id[j]
    sz[j] += sz[i]
else:
    ...

Path Compression

Find
def find(i):
    if i!=id[i]:
        id[i] = find(id[i])
    return id[i]

O(n2)O(n^2) is not acceptable for modern algorithms! Quadratic problems do not scale with technology.

Meaning: When Computers 10x faster, Memory 10x as much (we can fit larger N!), it takes 10x time to compute the same problem where N*=10!

API for the structure

Union Find is a process of

class UF(int N){ // size N 
    // int data[N] = [0, 1, 2, ..., N-1];
    void union(int p, int q); 
    bool connected(int p, int q); 
    int find(int p); 
    int count(); 
}

Use Case / Applications

  • Percolation (渗透) conductor, waterflow, social network...

  • Games (Go, Hex)

  • Dynamic Connectivity

  • Least common ancestor

  • Equivalence of finite state automata

  • Hoshen-Kopelman algorithm in physics

  • Hinley-Milner polymorphic type inference

  • Kruskal's minimal spanning tree

  • Compiling equivalence statements in Fortran

  • Morphological attribute openings and closings

  • Matlab's bwlabel() function in image processing

Percolation

For the unsolvable Mathematical Model, use a Computational Model to approximate

Use Cases

  1. Remove highly correlated features from a dataset with correlation matrix. Collect the feature pairs with a threshold of 0.9, and retain the feature with least numbre of missing values. (pandas)

a modified version of union find
correlated_union = {key:key for key in keys}
# find
def root(i):
    if i != correlated_union[i]:
        # path compression
        correlated_union[i] = root(correlated_union[i])
    return correlated_union[i]
    
# union
for x, y in correlated_tuples:
    x_rt, y_rt = root(x), root(y)
    if x_rt != y_rt:
        # set the root to be the one with fewer missing values
        if df_train[x_rt].isnull().sum() < df_train[y_rt].isnull().sum():
            correlated_union[y_rt] = x_rt
        else:
            correlated_union[x_rt] = y_rt

Coding Questions

Interview Questions

1. Social network connectivity. Given a social network containing nn members and a log file containing mm timestamps at which times pairs of members formed friendships, design an algorithm to determine the earliest time at which all members are connected (i.e., every member is a friend of a friend of a friend ... of a friend). Assume that the log file is sorted by timestamp and that friendship is an equivalence relation. The running time of your algorithm should be mlognm\log n or better and use extra space proportional to nn .

when a sz[p] = n.

2. Union-find with specific canonical element. Add a method find()find() to the union-find data type so that find(i) returns the largest element in the connected component containing ii . The operations, union(), connected(), find() should all take logarithmic time or better.

For example, if one of the connected components is {1,2,6,9}\{1,2,6,9\} , then the find() method should return 99 for each of the four elements in the connected components.

if find(p) < find(q):
     id[find(p)] = find(q)
else:
     id[find(q)] = find(p)

3. Successor with delete. Given a set of nn integers S=0,1,...,n1S={0,1,...,n−1} and a sequence of requests of the following form:

  • Remove xx from SS

  • Find the successor of xx: the smallest yy in SS such that yxy≥x.

design a data type so that all operations (except construction) take logarithmic time or better in the worst case.

?Hint: maintain an extra array to the weighted quick-union data structure that stores for each root ii the large element in the connected component containing ii. use the modification of the union−find data discussed in the previous question.

Last updated