28 January 2014

TODO: This lecture requires substantial work.

Greedy algorithms

Danger!

Proof methods:

  • Greedy stays ahead
  • Exchange method
  • Achieves the bound
  • Unique local minimum

Interval (event) scheduling

Instance: intervals \( (s_i, f_i) \), \( i = 1, \dots, n \). The solution is a subset \( J \subseteq \{ 1, 2, \dots, n \} \) of intervals. Constraint: No two intervals in \( J \) can overlap. Objective: Maximize \( |J| \). For example, this might correspond to scheduling a room in a hotel; you'd want to maximize use of the room, but you can't have multiple groups in the same room.

Some heuristics:

  • Choose the interval that starts earliest.
  • Choose the interval that finishes earliest (this is the greedy algorithm we're considering).
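To see why the choice of heuristic matters, here is a small sketch comparing the two rules above on one instance. The helper name `greedy_by_key` and the sample intervals are made up for illustration, and intervals that only touch at an endpoint are assumed not to overlap.

```python
def greedy_by_key(intervals, key):
    """Consider intervals in the order given by `key`; keep each one
    that does not overlap anything already chosen."""
    chosen = []
    for s, f in sorted(intervals, key=key):
        # (s, f) and (s2, f2) are compatible if one ends before the other starts.
        if all(f <= s2 or f2 <= s for s2, f2 in chosen):
            chosen.append((s, f))
    return chosen


# One long interval that starts first, and three short ones inside it.
intervals = [(0, 10), (1, 2), (3, 4), (5, 6)]

by_start = greedy_by_key(intervals, key=lambda iv: iv[0])   # earliest start
by_finish = greedy_by_key(intervals, key=lambda iv: iv[1])  # earliest finish

print(len(by_start), len(by_finish))
```

On this instance the earliest-start rule commits to the long interval and schedules only one event, while the earliest-finish rule schedules all three short ones.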

Lemma: There is an optimal solution that agrees with the first \( i \) moves of the greedy algorithm.

Lemma': There is an optimal solution that agrees with the first move of the greedy algorithm.

Theorem: The greedy algorithm is optimal. (Prove this by induction on the instance size, using Lemma')

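Proof of Lemma': Let \( I = (s, f) \) be the first interval to finish. We claim there is a maximum size set \( J_1 \) of non-intersecting intervals with \( I \in J_1 \). Let \( J_0 \) be any max size set. Let \( J_1 = ( J_0 - \{ \text{all intervals } I' \text{ where } I \cap I' \neq \emptyset \} ) \cup \{ I \} \). In the above, we're removing from \( J_0 \) all intervals that conflict with \( I \) and then adding \( I \), which makes \( J_1 \) a legal solution. Now we need to prove \( J_0 \) and \( J_1 \) have the same size, which holds if at most one interval of \( J_0 \) was removed.

Assume there were two intervals \( I_1', I_2' \) in \( J_0 \) that intersect \( I \). Note \( s_1', s_2' < f \) because both intersect \( I \), and \( f \leq f_1', f_2' \) because \( I \) is the first interval to finish. Thus \( I_1' \) and \( I_2' \) are both in progress just before time \( f \), so they must intersect and could not both have been in \( J_0 \). Contradiction. Hence at most one interval was removed and one was added, so \( |J_1| \geq |J_0| \) and \( J_1 \) is also a max size set.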

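Theorem proof: By strong induction on \( n \). Let \( I \) be the first interval to finish. Then \( GA(\{I_1, \dots, I_n\}) = \{ I \} \cup GA(\{I_1, \dots, I_n\} - \{\text{conflicts with } I\}) \). By Lemma', there exists \( J \), a max set of non-intersecting intervals with \( I \in J \); write \( J = \{ I \} \cup J' \). No interval of \( J' \) conflicts with \( I \), so by the induction hypothesis \( |J'| \leq |GA(\text{subset})| \). Therefore \( |J| = |J'| + 1 \leq 1 + |GA(\text{subset})| = |GA(\text{all intervals})| \).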
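The recursive description of the greedy algorithm used in the induction can be written down almost literally. The following is only a sketch of that recurrence, not an efficient implementation; the names `conflicts` and `ga` are hypothetical, and intervals are assumed to be `(start, finish)` tuples with `start < finish` that do not conflict when they merely touch.

```python
def conflicts(a, b):
    """True if intervals a and b overlap in more than an endpoint."""
    return a[0] < b[1] and b[0] < a[1]


def ga(intervals):
    """GA(S) = {I} ∪ GA(S − conflicts with I), where I finishes first."""
    if not intervals:
        return []
    first = min(intervals, key=lambda iv: iv[1])
    # Drop I itself and everything that conflicts with it.
    rest = [iv for iv in intervals
            if iv is not first and not conflicts(iv, first)]
    return [first] + ga(rest)


print(ga([(0, 10), (1, 2), (3, 4), (5, 6)]))
```

Each call removes at least the chosen interval, so the recursion runs on a strictly smaller instance, which is exactly what the strong induction needs.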

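Greedy-stays-ahead proof. Have greedy solution \( G_1, G_2, \dots, G_k \). Have optimal solution \( O_1, O_2, \dots, O_{k'} \). (The above are intervals, listed in order of finish time; \( G_i \) finishes at \( f_i \), and \( O_i \) starts at \( s_i' \) and finishes at \( f_i' \).) Want to show \( k \geq k' \).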

Lemma: \( f_i \leq f_i' \) for \( 1 \leq i \leq \min(k, k') \). Base case: \( f_1 \leq f_1' \) because we chose the first interval to finish. Induction step: Assume \( f_i \leq f_i' \). \( f_{i + 1} \) is the earliest finish time among intervals that start at or after \( f_i \). Note \( s_{i + 1}' \geq f_i' \geq f_i \), so \( O_{i + 1} \) is some interval that starts at or after \( f_i \). So \( f_{i + 1}' \geq f_{i + 1} \).

Applying this to the end of the sequence, we have \( f_k' \geq f_k \). If we had an interval \( O_{k + 1} \), then \( s_{k + 1}' \geq f_k' \geq f_k \). But then the GA would not have stopped after \( G_k \), because \( O_{k + 1} \) was still available. So \( k' \leq k \), so the greedy algorithm is optimal.

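For an efficient implementation: Sort the intervals by finish time. Insert \( I_1 \) (the first to finish) into the final schedule. Lazily delete all conflicts. To do this, maintain some number \( F \), which is the finish time of the last interval in the current solution. Scan right through the sorted list, skipping every interval that starts before \( F \); when an interval starts at or after \( F \), add it to the schedule and update \( F \). The sort takes \( O(n \log n) \) time and the scan takes \( O(n) \).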
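A minimal sketch of this sort-and-scan procedure follows. The function name `schedule_max_intervals` is an assumption for illustration, `last_finish` plays the role of \( F \), and an interval that starts exactly when the previous one finishes is assumed not to conflict.

```python
def schedule_max_intervals(intervals):
    """Earliest-finish-first greedy: sort by finish time, then one scan."""
    chosen = []
    last_finish = float("-inf")  # F: finish time of the last chosen interval
    for s, f in sorted(intervals, key=lambda iv: iv[1]):
        if s >= last_finish:     # no conflict with the current solution
            chosen.append((s, f))
            last_finish = f
        # otherwise the interval is "lazily deleted": we just skip it
    return chosen


print(schedule_max_intervals([(0, 3), (2, 5), (4, 7), (1, 8), (6, 9)]))
```

Conflicting intervals are never removed explicitly; comparing each start time against a single number is enough because the intervals arrive in order of finish time.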

Another problem:

Same context of interval scheduling. Schedule all the events, but in as few rooms as possible.

Order the events by start time. Think of the rooms as numbered \( 1 ... k \). The greedy algorithm will schedule each event in the first unoccupied room. Here "first" means numerically smallest.

Note we can determine a lower bound on the number of rooms.

Lemma: If \( k \) events are occurring at some time \( t \), then any schedule must use at least \( k \) rooms. Proof is obvious: those \( k \) events pairwise overlap at \( t \), so no two of them can share a room.
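The lower bound in this lemma can be computed directly with a sweep over start and finish times. This is a sketch under the assumption that an event ending at time \( t \) and an event starting at time \( t \) are not simultaneous; the function name `max_simultaneous` is made up for illustration.

```python
def max_simultaneous(events):
    """Largest number of events occurring at the same time."""
    points = []
    for s, f in events:
        points.append((s, 1))    # an event starts
        points.append((f, -1))   # an event finishes
    # Tuples sort by time, then by delta, so at equal times the -1
    # (finish) is processed before the +1 (start).
    points.sort()
    depth = best = 0
    for _, delta in points:
        depth += delta
        best = max(best, depth)
    return best


print(max_simultaneous([(0, 3), (2, 5), (4, 7), (1, 8), (6, 9)]))
```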

We want to show the greedy algorithm achieves this bound. We'll show that if the greedy solution uses \( k \) rooms, there is a time with \( k \) simultaneous events.

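Let \( t \) be any time when the GA assigns an event to room \( k \) (the last room). At this time, all the smaller rooms were occupied. Because events are processed in order of start time, the events in those rooms started no later than \( t \) and have not yet finished. Therefore, \( k \) events (one in each of the \( k - 1 \) smaller rooms, plus the event being assigned now) are all going on at time \( t \).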

By this achieves-the-bound lemma, the GA uses \( k \) rooms, and any other solution uses at least \( k \) rooms, so the GA is optimal.

To implement efficiently: Sort by start time. Keep track of stop times for the rooms we've used so far. Order occupied rooms by smallest finish time. Order unoccupied rooms by room number. Can use min heaps for both.

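When a new event starts, repeatedly DeleteMin from the occupied heap and move each deleted room to the unoccupied heap, stopping once the top of the occupied heap has a finish time greater than the new event's start time. Then give the event the smallest-numbered room in the unoccupied heap, and insert that room into the occupied heap keyed by the event's finish time.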
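Here is a sketch of the two-heap scheme using Python's `heapq`. The function name `assign_rooms` is hypothetical, and two choices are assumptions rather than part of the notes: rooms are opened on demand with a counter instead of being preloaded into the unoccupied heap, and a room whose event finishes exactly at the new start time counts as free.

```python
import heapq


def assign_rooms(events):
    """Give each event the smallest-numbered free room.
    Returns (list of (event, room), number of rooms used)."""
    occupied = []      # min-heap of (finish_time, room)
    free = []          # min-heap of opened rooms that are currently free
    rooms_opened = 0
    assignment = []
    for s, f in sorted(events):            # process in order of start time
        # Release every room whose event has finished by time s.
        while occupied and occupied[0][0] <= s:
            _, room = heapq.heappop(occupied)
            heapq.heappush(free, room)
        if free:
            room = heapq.heappop(free)     # numerically smallest free room
        else:
            rooms_opened += 1              # all opened rooms busy: open a new one
            room = rooms_opened
        heapq.heappush(occupied, (f, room))
        assignment.append(((s, f), room))
    return assignment, rooms_opened


assignment, rooms = assign_rooms([(0, 3), (2, 5), (4, 7), (1, 8), (6, 9)])
print(rooms, assignment)
```

Every opened room has a smaller number than any room not yet opened, so taking the minimum of the free heap matches "first unoccupied room" in the description of the greedy algorithm. Each event causes a constant number of heap operations, so the total time is \( O(n \log n) \).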

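Can also: When a new event starts, check the min occupied room. If we can schedule this event there (the room's finish time is at most the event's start time), add the event to this room and re-sort the room in the heap under its new finish time. Otherwise take a room from the unoccupied heap.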
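This variant needs only the occupied heap if unused rooms are handed out by a counter, which is an assumption made in the sketch below. The name `assign_rooms_single_heap` is also hypothetical. Note that this version reuses the room that frees up earliest, which is not necessarily the numerically smallest free room, but it opens a new room only when every opened room is busy.

```python
import heapq


def assign_rooms_single_heap(events):
    """Reuse the earliest-finishing room when possible, else open a new one."""
    occupied = []      # min-heap of (finish_time, room)
    rooms_opened = 0
    assignment = []
    for s, f in sorted(events):            # process in order of start time
        if occupied and occupied[0][0] <= s:
            # The min occupied room is free by time s: reuse it and
            # re-key it in the heap with the new finish time.
            _, room = occupied[0]
            heapq.heapreplace(occupied, (f, room))
        else:
            rooms_opened += 1
            room = rooms_opened
            heapq.heappush(occupied, (f, room))
        assignment.append(((s, f), room))
    return assignment, rooms_opened


assignment, rooms = assign_rooms_single_heap(
    [(0, 3), (2, 5), (4, 7), (1, 8), (6, 9)]
)
print(rooms, assignment)
```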