[SOLVED] 代写 algorithm String Algorithms

30 $

File Name: 代写_algorithm_String_Algorithms.zip
File Size: 320.28 KB

SKU: 9351047162 Category: Tags: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,

Or Upload Your Assignment Here:


String Algorithms
Jaehyun Park CS 97SI
Stanford University June 30, 2015

String Matching Problem
Hash Table
Knuth-Morris-Pratt (KMP) Algorithm Suffix Trie
Suffix Array
Outline
String Matching Problem 2

String Matching Problem
◮ Given a text T and a pattern P, find all occurrences of P within T
◮ Notations:
– nandm: lengthsofP andT
– Σ: set of alphabets (of constant size) – Pi: ith letter of P (1-indexed)
– a, b, c: single letters in Σ
– x, y, z: strings
String Matching Problem 3

Example
◮ T = AGCATGCTGCAGTCATGCTTAGGCTA ◮ P = GCT
◮ P appears three times in T
◮ A naive method takes O(mn) time
– Initiate string comparison at every starting point – Each comparison takes O(m) time
◮ We can do much better!
String Matching Problem 4

String Matching Problem
Hash Table
Knuth-Morris-Pratt (KMP) Algorithm Suffix Trie
Suffix Array
Outline
Hash Table 5

Hash Function
◮ A function that takes a string and outputs a number ◮ A good hash function has few collisions
– i.e., If x ̸= y, H(x) ̸= H(y) with high probability
◮ An easy and powerful hash function is a polynomial mod some prime p
– Consider each letter as a number (ASCII value is fine)
– H(x1 …xk) = x1ak−1 +x2ak−2 +···+xk−1a+xk (mod p) – How do we find H(x2 …xk+1) from H(x1 …xk)?
Hash Table 6

Hash Table
◮ Main idea: preprocess T to speedup queries – Hash every substring of length k
– k is a small constant
◮ For each query P , hash the first k letters of P to retrieve all the occurrences of it within T
◮ Don’t forget to check collisions!
Hash Table 7

◮ Pros:
– Easy to implement
Hash Table
– Significant speedup in practice
◮ Cons:
– Doesn’t help the asymptotic efficiency
◮ Can still take Θ(nm) time if hashing is terrible or data is difficult
– A lot of memory consumption
Hash Table 8

Outline
String Matching Problem
Hash Table
Knuth-Morris-Pratt (KMP) Algorithm Suffix Trie
Suffix Array
Knuth-Morris-Pratt (KMP) Algorithm 9

Knuth-Morris-Pratt (KMP) Matcher
◮ A linear time (!) algorithm that solves the string matching problem by preprocessing P in Θ(m) time
– Main idea is to skip some comparisons by using the previous comparison result
◮ Uses an auxiliary array π that is defined as the following:
– π[i] is the largest integer smaller than i such that P1 . . . Pπ[i] is
a suffix of P1 …Pi
◮ … It’s better to see an example than the definition
Knuth-Morris-Pratt (KMP) Algorithm 10

π Table Example (from CLRS)
◮ π[i]isthelargestintegersmallerthanisuchthatP1…Pπ[i] is a suffix of P1 …Pi
– e.g., π[6] = 4 since abab is a suffix of ababab
– e.g., π[9] = 0 since no prefix of length ≤ 8 ends with c
◮ Let’s see why this is useful
Knuth-Morris-Pratt (KMP) Algorithm 11

Using the π Table
◮ T = ABC ABCDAB ABCDABCDABDE
◮ P = ABCDABD
◮ π = (0,0,0,0,1,2,0)
◮ Start matching at the first position of T:
◮ Mismatchatthe4thletterofP!
Knuth-Morris-Pratt (KMP) Algorithm 12

Using the π Table
◮ Wematchedk=3letterssofar,andπ[k]=0
– Thus, there is no point in starting the comparison at T2, T3
(crucial observation)
◮ ShiftP byk−π[k]=3letters
◮ Mismatch at T4 again!
Knuth-Morris-Pratt (KMP) Algorithm 13

◮ Mismatch at T11!
Using the π Table
◮ We matched k = 0 letters so far
◮ ShiftP byk−π[k]=1letter(wedefineπ[0]=−1)
Knuth-Morris-Pratt (KMP) Algorithm 14

Using the π Table
◮ π[6]=2meansP1P2 isasuffixofP1…P6
◮ ShiftP by6−π[6]=4letters
◮ Again, no point in shifting P by 1, 2, or 3 letters Knuth-Morris-Pratt (KMP) Algorithm 15

◮ Mismatch at T11 again!
◮ Currently 2 letters are matched ◮ ShiftP by2−π[2]=2letters
Using the π Table
Knuth-Morris-Pratt (KMP) Algorithm 16

◮ Mismatch at T11 yet again!
◮ Currently no letters are matched ◮ ShiftP by0−π[0]=1letter
Using the π Table
Knuth-Morris-Pratt (KMP) Algorithm 17

◮ Mismatch at T18
◮ Currently 6 letters are matched ◮ ShiftP by6−π[6]=4letters
Using the π Table
Knuth-Morris-Pratt (KMP) Algorithm 18

Using the π Table ◮ Finally, there it is!
◮ Currently all 7 letters are matched
◮ After recording this match (at T16 . . . T22, we shift P again in order to find other matches
– Shiftby7−π[7]=7letters
Knuth-Morris-Pratt (KMP) Algorithm 19

Computing π
◮ Observation 1: if P1 …Pπ[i] is a suffix of P1 …Pi, then P1 …Pπ[i]−1 is a suffix of P1 …Pi−1
– Well, obviously…
◮ Observation 2: all the prefixes of P that are a suffix of P1 . . . Pi can be obtained by recursively applying π to i
– e.g., P1 …Pπ[i], P1 …,Pπ[π[i]], P1 …,Pπ[π[π[i]]] are all suffixes of P1 …Pi
Knuth-Morris-Pratt (KMP) Algorithm 20

Computing π
◮ A non-obvious conclusion:
– First, let’s write π(k)[i] as π[·] applied k times to i
– e.g., π(2)[i] = π[π[i]]
– π[i] is equal to π(k)[i − 1] + 1, where k is the smallest integer
that satisfies Pπ(k)[i−1]+1 = Pi
◮ Ifthereisnosuchk,π[i]=0
◮ Intuition: we look at all the prefixes of P that are suffixes of P1 . . . Pi−1, and find the longest one whose next letter matches Pi
Knuth-Morris-Pratt (KMP) Algorithm 21

Implementation
pi[0] = -1;
int k = -1;
for(int i = 1; i <= m; i++) {while(k >= 0 && P[k+1] != P[i])
k = pi[k];
pi[i] = ++k; }
Knuth-Morris-Pratt (KMP) Algorithm 22

Pattern Matching Implementation
int k = 0;
for(int i = 1; i <= n; i++) {while(k >= 0 && P[k+1] != T[i])
k = pi[k];
k++;
if(k == m) {
// P matches T[i-m+1..i]
k = pi[k]; }
}
Knuth-Morris-Pratt (KMP) Algorithm 23

Outline
String Matching Problem
Hash Table
Knuth-Morris-Pratt (KMP) Algorithm Suffix Trie
Suffix Array
Suffix Trie 24

Suffix Trie
◮ Suffix trie of a string T is a rooted tree that stores all the suffixes (thus all the substrings)
◮ Each node corresponds to some substring of T
◮ Each edge is associated with an alphabet
◮ For each node that corresponds to ax, there is a special pointer called suffix link that leads to the node corresponding to x
◮ Surprisingly easy to implement!
Suffix Trie 25

Suffix Trie
26
Example
(Figure modified from Ukkonen’s original paper)

Incremental Construction
◮ Given the suffix tree for T1 …Tn
– Then we append Tn+1 = a to T , creating necessary nodes
◮ StartatnodeucorrespondingtoT1…Tn – Create an a-transition to a new node v
◮ Take the suffix link at u to go to u′, corresponding to T2 …Tn
– Create an a-transition to a new node v′ – Create a suffix link from v to v′
Suffix Trie 27

Suffix Trie
28
Incremental Construction
◮ Repeat the previous process:
– Take the suffix link at the current node
– Make a new a-transition there
– Create the suffix link from the previous node
◮ Stop if the node already has an a-transition
– Because from this point, all nodes that are reachable via suffix links already have an a-transition

Suffix Trie
29
Construction Example
Given the suffix trie for aba We want to add a new letter c

Suffix Trie
30
Construction Example

Suffix Trie
31
Construction Example

Suffix Trie
32
Construction Example

Suffix Trie
33
Construction Example

Suffix Trie
34
Construction Example

Construction Example
◮ Construction time is linear in the tree size ◮ But the tree size can be quadratic in n
– e.g., T = aa…abb…b
Suffix Trie 35

Construction Example
◮ To find P, start at the root and keep following edges labeled with P1, P2, etc.
◮ Got stuck? Then P doesn’t exist in T
Suffix Trie 36

Outline
String Matching Problem
Hash Table
Knuth-Morris-Pratt (KMP) Algorithm Suffix Trie
Suffix Array
Suffix Array 37

Suffix Array
38
Suffix Array

Suffix Array
39
– –
If you want to see how it works, read the paper on the course website http://cs97si.stanford.edu/suffix-array.pdf
Suffix Array
◮ Memory usage is O(n)
◮ Has the same computational power as suffix trie ◮ Can be constructed in O(n) time (!)
– But it’s hard to implement
◮ ThereisanapproachableO(nlog2n)algorithm

Notes on String Problems
◮ Always be aware of the null-terminators
◮ Simple hash works so well in many problems
◮ If a problem involves rotations of some string, consider concatenating it with itself and see if it helps
◮ Stanford team notebook has implementations of suffix arrays and the KMP matcher
Suffix Array 40

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] 代写 algorithm String Algorithms
30 $