30 January 2014

Read about minimum spanning tree algorithms, union-find data structures. (Russell is skipping this in lecture due to lack of time, but he wants you to read about it.) E.g. Kruskall's algorithm. Look at how it's implemented using the union-find data structure.

Approximation algorithms.

How can we analyze greedy algorithms when they're "incorrect", i.e. don't always find optimal solutions? Approximation ratio.

Vertex cover

Instance: Undirected graph. Solution type: \( C \subseteq V \). Constraint: \( \forall \{u, v\} \in E \), we have \( u \in C \) or \( v \in C \). Objective: Minimize \( |C| \).

Independent set

\( I \subseteq V \) where \( u \notin I \) or \( v \notin I \). Maximize \( |I| \).

\( I = V - C \).

Stupid (greedy) algorithm for Vertex Cover

Pick \( e \in E \), where \( e = (u, v) \). Put both \( u \) and \( v \) into \( C \). Delete \( u \) and \( v \) from \( G \). Repeat until there are no edges left.

This algorithm gets with a factor of 2 of the optimum.

Proof (Bound argument): Let \( e_1, e_2, ..., e_k \) be the edges our algorithm picks. Note these edges form a matching in the graph (the edges don't intersect, by construction). Any cover \( C' \) must contain at least one node from every \( e_i \). So \( |C'| \geq k \). Note the size of our cover is \( |C| = 2 k \leq 2 |C'| \). So the size of our cover is at most twice the optimum.

Abstracting the above proof

Consider algorithm A and instances I. A has an approximation ratio \( \rho \) if for every \( I \), \[ Cost(A(I)) \leq \rho Cost(Opt(I)). \]

Define \( LB(I) \).

Lemma: For every \( S \), we have \( Cost(S) \geq LB(I) \).

Lemma: \( Cost(A(I)) \leq \rho LB(I) \).

More complicated: \( LB_1, LB_2, \cdots, LB_k \). \[ Cost(A(I)) \leq \rho_1 LB_1 + \rho_2 LB_2 + \cdots + \rho_k LB_k. \] So \( \rho = \rho_1 + \rho_2 + \cdots + \rho_k \).

Example: Traveling Salesman Problem

The cost of a TSP solution is at least the cost of a minimum spanning tree. \( Cost (our tour) \leq Cost (in order traversal) = 2 MST \leq 2 Cost (TSP) \).

Idea for a better bound: determine when you can take shortcuts. Look at the odd degree nodes in the MST. There are always an even number of odd degree nodes. Can (apparently) form a cycle with the odd degree nodes and use it to replace some backtracking in the MST solution. This is the Christophides solution. We have \( Cost(Christophides) \leq MST + min(weighted matching) \). We have \( Cost(TSP) \geq 2 * min (weighted matching) \).

Another proof for the stupid algorithm (Change method): Let \( C_0 \) be any optimal cover, and let \( e_1 = (u, v) \) be the first edge our algorithm picks. There is a cover \( C_1 \) such that \( (u, v) \in C_1 \) and \( |C_1| \leq |C_0| + 1 \).

The opportunity cost of a decision: Cost(best solution that agrees with the decision) - Cost(best solution). For optimal algorithms, we need to show the opportunity cost is 0. In our case (proving an approximation factor of 2), we need to show the cost of each move is 1.

By strong induction proof, \[ C = (u, v) + OurCover(G - (u, v)) \\ C_1 = (u, v) + SomeCover(G - (u, v)) \]

\[ \begin{aligned} |C| &= 2 + |OurCover(G - (u, v))| \\ &\leq 2 + 2 | C_1 - (u, v) | \\ &= 2 + 2 (|C_1| - 2) \\ &illegible \\ &= 2 |C_0|. \end{aligned} \]

A variation on interval scheduling

Consider the interval scheduling problem, where you are given \( m \) rooms instead of just 1. This method is actually solvable, but we're going to look at a greedy approximation.

One greedy method: Take the events in order of duration, and schedule each in the first available room, if possible. Call this SDF (shortest duration first). Claim: This comes within a constant factor of the optimal.

Let Sched'0 be an optimal schedule. Let Sched'i be a schedule that agrees with SDF on the first i decisions.

Claim: |Sched'i| - |Sched'(i+1)| \leq 1. This is because the next event the schedule adds can conflict with at most 2 events, because it is the shortest remaining event. So |Sched'i| \geq |Sched'0| - i. Let be the number of intervals in SDF. So Sched'k \geq SDF. But no intervals left when SDF terminates, so Sched'k = SDF. So k = |SDF| = |Sched'k| \geq |Sched'0| - k. So 2k = 2 |SDF| \geq |Sched'0|. So |SDF| \geq |Sched'0| / 2. So it is a factor of 2 approximation.