# I.1.1 Union Find

## Union Find&#x20;

| Algorithm       | Initialize | Union  | Find  | Worst-CaseTime    |
| --------------- | ---------- | ------ | ----- | ----------------- |
| Quick-Find      | n          | n      | 1     | $$M N$$           |
| Quick-Union     | n          | 1 \~ n | n     | $$M N$$           |
| Weighted QU     | n          | lg(n)  | lg(n) | $$N + M \lg N$$   |
| QU+PathComp     | n          | lg(n)  | lg(n) | $$N + M \lg N$$   |
| W QU + PathComp | n          |        |       | $$N + M \lg^\*N$$ |

### Quick Find

* Data Structure: `id = [0 1 2 3 .. N-1]`&#x20;
* connected: `id[p] == id[q]`
* Union:  `id[id==id[p]] = id[q]`
* Time Complexity:  $$O(n^2)$$ &#x20;

### Quick Union

* Data Structure(Tree): `id = [0 1 2 3 .. N-1]`&#x20;
* connected: `root(p) == root(q) := id[id[id..[p]]]`
* Union:  `id[root(p)] = root(q)`

### Weighted

Compare the size of two unions and concatenate the smaller tree to the larger one.

{% code title="Union" %}

```python
if sz[i] < sz[j]:
    id[i] = id[j]
    sz[j] += sz[i]
else:
    ...
```

{% endcode %}

### Path Compression

{% code title="Find" %}

```python
def find(i):
    if i!=id[i]:
        id[i] = find(id[i])
    return id[i]
```

{% endcode %}

{% hint style="info" %}
&#x20; $$O(n^2)$$ is not acceptable for modern algorithms! Quadratic problems do not scale with technology.&#x20;

Meaning: When Computers 10x faster, Memory 10x as much (we can fit larger N!), it takes 10x time to compute the same problem where N\*=10! &#x20;
{% endhint %}

### API for the structure

Union Find is a process of&#x20;

```cpp
class UF(int N){ // size N 
    // int data[N] = [0, 1, 2, ..., N-1];
    void union(int p, int q); 
    bool connected(int p, int q); 
    int find(int p); 
    int count(); 
}
```

## Use Case / Applications

* Percolation (渗透) conductor, waterflow, social network...
* Games (Go, Hex)
* **Dynamic Connectivity**
* Least common ancestor
* **Equivalence of finite state automata**
* Hoshen-Kopelman algorithm in physics
* Hinley-Milner polymorphic type inference
* **Kruskal's minimal spanning tree**
* Compiling equivalence statements in Fortran
* Morphological attribute openings and closings
* Matlab's bwlabel() function in image processing

Percolation

For the unsolvable Mathematical Model, use a Computational Model to approximate

### Use Cases

1. Remove highly correlated features from a dataset with correlation matrix. Collect the feature pairs with a threshold of 0.9, and retain the feature with least numbre of missing values. (pandas)

{% code title="a modified version of union find" %}

```python
correlated_union = {key:key for key in keys}
# find
def root(i):
    if i != correlated_union[i]:
        # path compression
        correlated_union[i] = root(correlated_union[i])
    return correlated_union[i]
    
# union
for x, y in correlated_tuples:
    x_rt, y_rt = root(x), root(y)
    if x_rt != y_rt:
        # set the root to be the one with fewer missing values
        if df_train[x_rt].isnull().sum() < df_train[y_rt].isnull().sum():
            correlated_union[y_rt] = x_rt
        else:
            correlated_union[x_rt] = y_rt
```

{% endcode %}

### Coding Questions

1. <https://leetcode.com/problemset/algorithms/?topicSlugs=union-find>

### Interview Questions

**1. Social network connectivity.** Given a social network containing $$n$$ members and a log file containing $$m$$ timestamps at which times pairs of members formed friendships, design an algorithm to determine the earliest time at which all members are connected (i.e., every member is a friend of a friend of a friend ... of a friend). Assume that the log file is sorted by timestamp and that friendship is an equivalence relation. The running time of your algorithm should be $$m\log n$$ or better and use extra space proportional to $$n$$.

> when a sz\[p] = n.

2\. **Union-find with specific canonical element.** Add a method find()find() to the union-find data type so that `find(i)` returns the largest element in the connected component containing $$i$$ . The operations, `union()`, `connected()`, `find()` should all take logarithmic time or better.

For example, if one of the connected components is $${1,2,6,9}$$ , then the `find()` method should return $$9$$ for each of the four elements in the connected components.

```python
if find(p) < find(q):
     id[find(p)] = find(q)
else:
     id[find(q)] = find(p)
```

**3. Successor with delete**. Given a set of $$n$$ integers $$S={0,1,...,n−1}$$ and a sequence of requests of the following form:

* Remove $$x$$ from $$S$$
* Find the *successor* of $$x$$: the smallest $$y$$ in $$S$$ such that $$y≥x$$.

design a data type so that all operations (except construction) take logarithmic time or better in the worst case.

{% hint style="warning" %}
?*Hint:* maintain an extra array to the weighted quick-union data structure that stores for each root ii the large element in the connected component containing ii. use the modification of the union−find data discussed in the previous question.
{% endhint %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://zike.gitbook.io/princeton-algorithms-notebook-python/algorithm-i-week1.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
