32-1 String matching based on repetition factors
Let yi denote the concatenation of string y with itself i times. For example, (ab)3=ababab. We say that a string x∈Σ∗ has repetition factor r if x=yr for some string y∈Σ∗ and some r>0. Let ρ(x) denote the largest r such that x has repetition factor r.
a. Give an efficient algorithm that takes as input a pattern P[1…m] and computes the value ρ(Pi) for i=1,2,…,m. What is the running time of your algorithm?
b. For any pattern P[1…m], let ρ∗(P) be defined as max1≤i≤mρ(Pi). Prove that if the pattern P is chosen randomly from the set of all binary strings of length m, then the expected value of ρ∗(P) is O(1).
c. Argue that the following string-matching algorithm correctly finds all occurrences of pattern P in a text T[1…n] in time O(ρ∗(P)n+m):
cpp
REPETITION_MATCHER(P, T)
m = P.length
n = T.length
k = 1 + ρ*(P)
q = 0
s = 0
while s ≤ n - m
if T[s + q + 1] == P[q + 1]
q = q + 1
if q == m
print "Pattern occurs with shift" s
if q == m or T[s + q + 1] != P[q + 1]
s = s + max(1, ceil(q / k))
q = 0
This algorithm is due to Galil and Seiferas. By extending these ideas greatly, they obtained a linear-time string-matching algorithm that uses only O(1) storage beyond what is required for P and T.
a. Compute π, let l=m−π[m], if m mod l=0 and for all p=m−i⋅l>0, p−π[p]=l, then ρ(Pi)=m/l, otherwise ρ(Pi)=1. The running time is Θ(n).
b.
P(ρ∗(P)≥2)P(ρ∗(P)≥3)P(ρ∗(P)≥4)P(ρ∗(P)=1)P(ρ∗(P)=2)P(ρ∗(P)=3)E[ρ∗(P)]=21+81+321+⋯≈32=41+321+2561+⋯≈72=81+1281+20481+⋯≈152=31=218=10516=1⋅31+2⋅218+3⋅10516+…≈2.21
c.
(Omit!)