32-1 String matching based on repetition factors

Let yiy^i denote the concatenation of string yy with itself ii times. For example, (ab)3=ababab(\text{ab})^3 = \text{ababab}. We say that a string xΣx \in \Sigma^* has repetition factor rr if x=yrx = y ^ r for some string yΣy \in \Sigma^* and some r>0r > 0. Let ρ(x)\rho(x) denote the largest rr such that xx has repetition factor rr.

a. Give an efficient algorithm that takes as input a pattern P[1m]P[1 \ldots m] and computes the value ρ(Pi)\rho(P_i) for i=1,2,,mi = 1, 2, \ldots, m. What is the running time of your algorithm?

b. For any pattern P[1m]P[1 \ldots m], let ρ(P)\rho^*(P) be defined as max1imρ(Pi)\max_{1 \le i \le m} \rho(P_i). Prove that if the pattern PP is chosen randomly from the set of all binary strings of length mm, then the expected value of ρ(P)\rho^*(P) is O(1)O(1).

c. Argue that the following string-matching algorithm correctly finds all occurrences of pattern PP in a text T[1n]T[1 \ldots n] in time O(ρ(P)n+m)O(\rho^*(P)n + m):

cpp REPETITION_MATCHER(P, T) m = P.length n = T.length k = 1 + ρ*(P) q = 0 s = 0 while s ≤ n - m if T[s + q + 1] == P[q + 1] q = q + 1 if q == m print "Pattern occurs with shift" s if q == m or T[s + q + 1] != P[q + 1] s = s + max(1, ceil(q / k)) q = 0

This algorithm is due to Galil and Seiferas. By extending these ideas greatly, they obtained a linear-time string-matching algorithm that uses only O(1)O(1) storage beyond what is required for PP and TT.

a. Compute π\pi, let l=mπ[m]l = m - \pi[m], if m mod l=0m ~\text{mod}~ l = 0 and for all p=mil>0p = m - i \cdot l > 0, pπ[p]=lp - \pi[p] = l, then ρ(Pi)=m/l\rho(P_i) = m / l, otherwise ρ(Pi)=1\rho(P_i) = 1. The running time is Θ(n)\Theta(n).

b.

P(ρ(P)2)=12+18+132+23P(ρ(P)3)=14+132+1256+27P(ρ(P)4)=18+1128+12048+215P(ρ(P)=1)=13P(ρ(P)=2)=821P(ρ(P)=3)=16105E[ρ(P)]=113+2821+316105+2.21 \begin{aligned} P(\rho^*(P) \ge 2) & = \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots \approx \frac{2}{3} \\ P(\rho^*(P) \ge 3) & = \frac{1}{4} + \frac{1}{32} + \frac{1}{256} + \cdots \approx \frac{2}{7} \\ P(\rho^*(P) \ge 4) & = \frac{1}{8} + \frac{1}{128} + \frac{1}{2048} + \cdots \approx \frac{2}{15} \\ P(\rho^*(P) = 1) & = \frac{1}{3} \\ P(\rho^*(P) = 2) & = \frac{8}{21} \\ P(\rho^*(P) = 3) & = \frac{16}{105} \\ \text E[\rho^*(P)] & = 1 \cdot \frac{1}{3} + 2 \cdot \frac{8}{21} + 3 \cdot \frac{16}{105} + \ldots \approx 2.21 \end{aligned}

c.

(Omit!)