[SOLVED] CS computational biology algorithm Lecture 6:

$25

File Name: CS_computational_biology_algorithm_Lecture_6:.zip
File Size: 423.9 KB

5/5 - (1 vote)

Lecture 6:
Dynamic Programming I (Adv)
The University of Sydney
Page 1
Fast Fourier Transform

Dynamic Programming Summary
1Ddynamicprogramming
Weightedintervalscheduling
SegmentedLeastSquares
Maximum-sumcontiguoussubarray Longestincreasingsubsequence
2Ddynamicprogramming Knapsack
Sequencealignment
Dynamicprogrammingoverintervals
RNASecondaryStructure
Dynamicprogrammingoversubsets TSP
k-path Playlist
The University of Sydney
Page 2

How does Google know what I want?
The University of Sydney Page 3

How does Google know what I want?
The University of Sydney Page 4

Edit distance
How similar are two strings? S = ocurrance
T = occurrence
o
c

u
r
r
a
n
c
e
o
c
c
u
r
r
e
n
c
e
1 mismatch, 1 gap
Modify S to get T by adding c, swapping a-e The University of Sydney
Page 5

Edit distance
How similar are two strings? S = ocurrance
T = occurrence
o
c

u
r
r

a
n
c
e
o
c
c
u
r
r
e

n
c
e
0 mismatches, 3 gaps
Modify S to get T by adding c, e, deleting e The University of Sydney
Page 6

Edit distance
How similar are two strings? S = ocurrance
T = occurrence
o
c
u
r
r
a
n
c
e

o
c
c
u
r
r
e
n
c
e
5 mismatches, 1 gap
Modify S to get T by swapping u-c, r-u, a-r, n-e, c-n, e-c, adding e
The University of Sydney Page 7

Which edit is better?
o
c

u
r
r
a
n
c
e
o
c
c
u
r
r
e
n
c
e
1 mismatch, 1 gap
o
c

u
r
r

a
n
c
e
o
c
c
u
r
r
e

n
c
e
0 mismatches, 3 gaps
o
c
u
r
r
a
n
c
e

o
c
c
u
r
r
e
n
c
e
The University of Sydney
5 mismatches, 1 gap
Page 8

Edit Distance
Applications.
Basis for Unix diff.
Speech recognition.
Computational biology.
Edit distance. [Levenshtein 1966, Needleman-Wunsch 1970] Gap penalty d and mismatch penalty apq.
Cost = sum of gap and mismatch penalties.
C
T
G
A
C
C
T
A
C
C
T

C
T
G
A
C
C
T
A
C
C
T
C
C
T
G
A
C
T
A
C
A
T
C
C
T
G
A
C

T
A
C
A
T
aTC + aGT + aAG+ 2aCA
The University of Sydney Page 9
2d + aCA

Sequence Alignment
Goal: Given two strings X = x1 x2 . . . xm and Y = y1 y2 . . . yn find alignment of minimum cost.
Definition: An alignment M is a set of ordered pairs xi-yj such that each item occurs in at most one pair and no crossings.
Definition: The pair xi-yj and xa-yb cross if i < a, but j > b.
Example: CTACCG vs. TACATG.
x1 x2 x3 x4 x5 x6
C
T
A
C
C

G

T
A
C
A
T
G
The University of Sydney M = x2-y1, x3-y2, x4-y3, x5-y4, x6-y6. Page 10
y1 y2 y3 y4 y5 y6

Sequence Alignment
cost(M) = Saxi,yj + Sd + Sd (xi,yj)IM i: xi unmatched j: yj unmatched
mismatch Example: CTACCG vs. TACATG.
x1 x2 x3 x4 x5 x6
C
T
A
C
C

G

T
A
C
A
T
G
The University of Sydney
Page 11
y1 y2 y3 y4 y5 y6
M = x2-y1, x3-y2, x4-y3, x5-y4, x6-y6.

Key steps: Dynamic programming
1. Define subproblems 2. Find recurrences
3. Solve the base cases
4. Transform recurrence into an efficient algorithm
The University of Sydney Page 12

Sequence Alignment: Problem Structure
Step 1: Define subproblems ???
x1 x2 x3 x4 x5 xi

C
T
A
C
C

G

T
A
C
A
T
G
The University of Sydney
Page 13
y1 y2 y3 y4 y5 yj

Sequence Alignment: Problem Structure Step 1: Define subproblems
OPT(i, j) =
min cost of aligning strings x1 x2 . . . xi and y1 y2 yj.
x1 x2 x3 x4 x5 xi

C
T
A
C
C

G

T
A
C
A
T
G
The University of Sydney
Page 14
y1 y2 y3 y4 y5 yj

Sequence Alignment: Problem Structure
Definition: OPT(i, j) = Step 2: Find recurrences
min cost of aligning strings x1 x2 . . . xi and y1 y2 yj.
x1 x2 x3 x4 x5 xi
C
T
A
C
C
G
T
A
C
A
T
G
The University of Sydney
Page 15
y1 y2 y3 y4 y5 yj

Sequence Alignment: Problem Structure
Definition: OPT(i, j) = min cost of aligning strings x1 x2 . . . xi and y1 y2 yj.
x1 x2 x3 x4 x5 xi
C
T
A
C
C
G
Step 2: Find recurrences
Case 1: OPT matches xi-yj.
pay mismatch for xi-yj + min cost of aligning twostringsx1 x2 xi-1 andy1 y2 yj-1
y1 y2 y3 y4
y5 yj
T
A
C
A
T
G
The University of Sydney
Page 16

Sequence Alignment: Problem Structure
Definition: OPT(i, j) = min cost of aligning strings x1 x2 . . . xi and y1 y2 yj.
x1 x2 x3 x4 x5 xi
Step 2: Find recurrences
Case 1: OPT matches xi-yj. y1 y2 y3 y4 y5 yj pay mismatch for xi-yj + min cost of aligning
twostringsx1 x2 xi-1 andy1 y2 yj-1 Case 2a: OPT leaves xi unmatched.
paygapforxi andmincostofaligningx1 x2 xi-1 andy1 y2 yj
C
T
A
C
C
G
T
A
C
A
T
G

The University of Sydney Page 17

Sequence Alignment: Problem Structure
Definition: OPT(i, j) = min cost of aligning strings x1 x2 . . . xi and y1 y2 yj.
x1 x2 x3 x4 x5 xi
Step 2: Find recurrences
Case 1: OPT matches xi-yj. y1 y2 y3 y4 y5 yj pay mismatch for xi-yj + min cost of aligning
twostringsx1 x2 xi-1 andy1 y2 yj-1 Case 2a: OPT leaves xi unmatched.
paygapforxi andmincostofaligningx1 x2 xi-1 andy1 y2 yj Case 2b: OPT leaves yj unmatched.
paygapforyj andmincostofaligningx1 x2 xi andy1 y2 yj-1 The University of Sydney Page 18
C
T
A
C
C
G

T
A
C
A
T
G

Sequence Alignment: Problem Structure
Definition: OPT(i, j) = min cost of aligning strings x1 x2 . . . xi and y1 y2 yj.
Case 1: OPT matches xi-yj.
pay mismatch for xi-yj + min cost of aligning
twostringsx1 x2 xi-1 andy1 y2 yj-1 Case 2a: OPT leaves xi unmatched.
paygapforxi andmincostofaligningx1 x2 xi-1 andy1 y2 yj Case 2b: OPT leaves yj unmatched.
paygapforyj andmincostofaligningx1 x2 xi andy1 y2 yj-1
OPT(i, j) = min
axiyj + OPT(i-1,j-1) d + OPT(i-1, j)
d + OPT(i, j-1)
The University of Sydney Page 19

Sequence Alignment: Problem Structure Step 3: Solve the base cases
OPT(0,j) = jd and OPT(i,0) = id
OPT(i,j) = jd if i=0 OPI(i, j) = OPT(i,j) = id if j=0
min{axiyj + OPT(i-1, j-1),d + OPT(i-1, j),d + OPT(i, j-1)} otherwise
The University of Sydney
Page 20

Sequence Alignment: Algorithm
Sequence-Alignment(m, n, x1x2xm, y1y2yn, d, a) { for i = 0 to m
M[0, i] = id for j = 0 to n
M[j, 0] = jd
for i = 1 to m
for j = 1 to n
M[i, j] = min{a[xi, yj] + M[i-1, j-1], d + M[i-1, j],
return M[m, n]
}
d + M[i, j-1]}
Analysis. Q(mn) time and space.
English words or sentences: m, n 100.
Computational biology: m = n = 100,000. 10 billions ops OK, but 10GB array?
The University of Sydney
Page 21

Sequence Alignment: Algorithm
Sequence-Alignment(m, n, x1x2xm, y1y2yn, d, a) { for i = 0 to m
M[0, i] = id for j = 0 to n
M[j, 0] = jd
for i = 1 to m
for j = 1 to n
M[i, j] = min{a[xi, yj] + M[i-1, j-1], d + M[i-1, j],
return M[m, n]
}
d + M[i, j-1]}
To get the alignment itself we can trace back through the array M.
The University of Sydney Page 22

Sequence Alignment: Linear Space
Question: Can we avoid using quadratic space?
The University of Sydney Page 23

Sequence Alignment: Linear Space
Question: Can we avoid using quadratic space?
Easy. Optimal value in O(m + n) space and O(mn) time.
Compute OPT(i, j) from OPT(i-1, j), OPT(i-1, j-1) and OPT(i, j-1). BUT! No longer a simple way to recover alignment itself.
The University of Sydney
Page 24

Sequence Alignment: Linear Space
Question: Can we avoid using quadratic space?
Easy. Optimal value in O(m + n) space and O(mn) time.
Compute OPT(i, j) from OPT(i-1, j), OPT(i-1, j-1) and OPT(i, j-1). BUT! No longer a simple way to recover alignment itself.
Theorem: [Hirschberg 1975] Optimal alignment in O(m + n) space and O(mn) time.
Clever combination of divide-and-conquer and dynamic programming. Inspired by idea of Savitch from complexity theory.
The University of Sydney Page 25

Sequence Alignment: Linear Space
Edit distance graph.
m n grid graph GXY (as shown in the figure)
Horizontal/vertical edges have cost d (gap)
Diagonal edges from (i-1, j-1) to (i,j) have cost axiyj. (match)
e y1 y2 y3 y4 y5 y6 0-0
axiyj d
d i-j
e
x1 x2
x3
m-n
The University of Sydney
Page 26

Sequence Alignment: Linear Space
Edit distance graph.
Let f(i, j) be cheapest path from (0,0) to (i, j). Observation: f(i, j) = OPT(i, j).
e y1 y2 y3 y4 y5 y6 0-0
axiyj d
d i-j
e
x1 x2
x3
m-n
The University of Sydney
Page 27

Sequence Alignment: Linear Space
min{axiyj + OPT(i-1, j-1),d + OPT(i-1, j),d + OPT(i, j-1)}
e
x1 x2
x3
e y1 y2 y3 y4 y5 y6 0-0
axiyj d
d i-j
m-n
The University of Sydney
Page 28

Sequence Alignment: Linear Space
Edit distance graph.
Let f(i, j) be cheapest path from (0,0) to (i, j).
Can compute f(m,n) in O(mn) time and O(mn) space.
j
y4
i-j
e y1 y2 y3 0-0
x1 x2
y5 y6
e
x3
m-n
The University of Sydney
Page 29

Sequence Alignment: Linear Space
Edit distance graph.
Let f(i, j) be cheapest path from (0,0) to (i, j).
If only interested in the value of the optimal alignment we do it in O(mn)
time and O(m + n) space.
e y1 y2 y3 0-0
x1 x2
j
y4
i-j
y5 y6
e
x3
m-n
The University of Sydney
Page 30

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] CS computational biology algorithm Lecture 6:
$25