[SOLVED] CS Bioinformatics algorithm Lecture 7: Dynamic Programming II

$25

File Name: CS_Bioinformatics_algorithm_Lecture_7:_Dynamic_Programming_II.zip
File Size: 574.62 KB

5/5 - (1 vote)

Lecture 7: Dynamic Programming II
The University of Sydney Page 1

Changes to the unit
More resources
Recorded tutorials
Online office hours (Tue 4-6, Fri 2-4). Contact me beforehand on Slack. Maybe available at other times as well
Assignments
Scale back in both difficulty and workload
Final Exam
Online, 2 hours long on ProctorU
Some knowledge-based questions, some algorithm design problems, easier than assignments
The University of Sydney
Page 2

How do I get good at problem solving?
Work on lots of problems, especially the tutorial ones
Present your solution during tutorial or on Ed to get
feedback
Feel free to ask about solution to other problems, such as those from Jeff Ericksons textbook
Approaches:
Make simplifying assumptions, solve special
cases
Reuse computation (esp. useful for D&C) Reformulate problem
The University of Sydney
Page 3

General techniques in this course
Greedy algorithms [Lecture 2]
Divide & Conquer algorithms [Lecture 3]
Dynamic programming algorithms [Lectures 4 and today] Network flow algorithms [Lectures 6 and 7]
The University of Sydney Page 4

Algorithmic Paradigms
Greedy. Build up a solution incrementally, myopically optimizing some local criterion.
Divide-and-conquer. Break up a problem into two sub- problems, solve each sub-problem independently, and combine solution to sub-problems to form solution to original problem.
Dynamic programming. Break up a problem into a series of overlapping sub-problems, and build up solutions to larger and larger sub-problems.
The University of Sydney Page 5

Main Idea: Dynamic Programming
Recurrence equation relating optimal solution in terms optimal solutions to smaller subproblems
Given optimal solutions to smaller subproblems, how to construct solution to original problem?
The University of Sydney
Page 6

Key steps: Dynamic programming
1. Define subproblems 2. Find recurrences
3. Solve the base cases
4. Transform recurrence into an efficient algorithm
The University of Sydney Page 7

Recap: Weighted Interval Scheduling
Job j starts at sj, finishes at fj, and has weight vj.
Two jobs compatible if they dont overlap.
Goal: find maximum weight subset of mutually compatible jobs.
b
a
c
d
e
f
g
h
0 1 2 3 4 5 6 7 8 9 10 11 The University of Sydney
Time
Page 8

Recap: Weighted Interval Scheduling
Notation. Label jobs by finishing time: f1 f2 . . . fn .
Def. p(j) = largest index i M[j-1])
print j
Find-Solution(p(j))
else
Find-Solution(j-1)
}
# of recursive calls n O(n). The University of Sydney
picked job j
16
Page 16

Recap: Knapsack Problem
Knapsackproblem.
Givennobjectsandaknapsack.
Item i weighs wi > 0 kilograms and has value vi > 0. KnapsackhascapacityofWkilograms.
Goal: fill knapsack so as to maximize total value.
W = 11
Item
Value
Weight
1
1
1
2
6
2
3
18
5
4
22
6
5
28
7
The University of Sydney
Page 17

Recap: Dynamic Programming Step 1 Step 1: Define subproblems
OPT(i, w) = max profit with subset of items 1, , i with weight limit w.
The University of Sydney
Page 18

Recap: Dynamic Programming Step 2 Step 2: Find recurrences
Case 1: OPT does not select item i.
OPT selects best of { 1, 2, , i-1 } using weight limit w
Case 2: OPT selects item i.
newweightlimit=wwi
OPT selects best of { 1, 2, , i1 } using this new weight limit
OPT(i,w) = max { vi + OPT (i-1,w-wi), OPT(i-1,w) }
The University of Sydney
Page 19

Recap: Dynamic Programming Step 2 Step 2: Find recurrences
Case 1: OPT does not select item i.
OPT selects best of { 1, 2, , i-1 } using weight limit w
Case 2: OPT selects item i.
newweightlimit=wwi
OPT selects best of { 1, 2, , i1 } using this new weight limit
If wi>w
OPT(i,w) = OPT (i-1,w)
otherwise
OPT(i,w) = max { vi + OPT (i-1,w-wi), OPT(i-1,w) }
The University of Sydney
Page 20

Recap: Dynamic Programming Step 3 Step 3: Solve the base cases
OPT(0,w) = 0
i0 if i=0 OPT(i,w)=iiOPT(i-1,w) if i>0andwi >w
iimax{OPT(i-1,w),vi+OPT(i-1,w-wi)} otherwise
The University of Sydney
Page 21
Donemore or less

Knapsack Problem: Bottom-Up
Knapsack. Fill up an (n+1)-by-(W+1) array. Space complexity: O(nW).
Input: n, w1,,wN, v1,,vN for w = 0 to W
M[0, w] = 0
for i = 1 to n
for w = 1 to W
if (wi > w)
M[i, w] = M[i-1, w]
else
M[i, w] = max {M[i-1, w], vi + M[i-1, w-wi ]} return M[n, W]
The University of Sydney
Page 22

n+1
W+1
0 1 2 3 4 5 6 7 8 9 10 11
000000000000 {1} 011111111111 {1,2} 016777777777
{1,2,3} 0 1 6 7 7 18 19 24 25 25 25 25
Knapsack Algorithm
The University of Sydney
Page 23
{1,2,3,4} {1,2,3,4,5}
0 1 6 7 7 18 22 24 28 29 29 40 0 1 6 7 7 18 22 28 29 34 34 40
OPT: {4,3}
value = 22 + 18 = 40
Item Value Weight
111 262 W=11 3 18 5 4 22 6
5 28 7

Longest Common Subsequence
Given two sequences X[1..n] and Y[1..m], find the longest subsequences X of X and Y of Y such that X = Y.
Example 1 X = BANANAS Y = KATANA
LCS(X,Y) = AANA
Example 2 X = DYNAMIC
Y = PROGRAMMING
LCS(X,Y) = AMI The University of Sydney
Page 24

Longest Common Subsequence: Matching Definition
Given two sequences X[1..n] and Y[1..m], find the longest non- crossing matching between elements of X and elements of Y
Example 1 X = BANANAS Y = KATANA
LCS(X,Y) = AANA. Matching: X[2] Y[2], X[4] Y[4], X[5] Y[5], X[6] Y[6].
Example 2 X = DYNAMIC
Y = PROGRAMMING
LCS(X,Y) = AMI. Matching: X[4] Y[6], X[5] Y[7], X[6] Y[8],
Note: if crossings allowed, then can add X[3] Y[10]. The University of Sydney
Page 25

Longest Common Subsequence: Applications
Measures similarity of two strings Bioinformatics
Merging in version control
The University of Sydney Page 26

LCS Dynamic Programming: Step 1 Step 1: Define subproblems
OPT(i,j) = length of LCS(X[1..i], Y[1..j]). Subproblems: finding LCS of prefixes of X and Y
Example
X = BANANAS Y = KATANA
OPT(6, 4) = length of LCS(BANANA, KATA) = 2 The University of Sydney
Page 28

LCS Dynamic Programming: Step 2
Notations:
OPT(i,j) = length of LCS(X[1..i], Y[1..j]). Step 2: Finding recurrences
Case 1: X[i] = Y[j].
Leave X[i] unmatched and take LCS(X[1..i-1], Y[1..j]). OR Leave Y[j] unmatched and take LCS(X[1..i], Y[1..j-1]).
OPT(i,j)= max{OPT(i-1,j), OPT(i,j-1)}
Example
X[1..5] = BANAN Y[1..6] = KATANA
X[5] unmatched: LCS = LCS(X[1..4], Y[1..6]) = ANA
Y[6] unmatched: LCS = LCS(X[1..5], Y[1..5]) = AAN The University of Sydney
Page 29

LCS Dynamic Programming: Step 2
Notations:
OPT(i,j) = length of LCS(X[1..i], Y[1..j]). Step 2: Finding recurrences
Case 1: X[i] = Y[j].
Leave X[i] unmatched and take LCS(X[1..i-1], Y[1..j]). OR Leave Y[j] unmatched and take LCS(X[1..i], Y[1..j-1]).
OPT(i,j)= max{OPT(i-1,j), OPT(i,j-1)}
Example
X[1..5] = BANAN Y[1..6] = KATANA
X[5] unmatched: LCS = LCS(X[1..4], Y[1..6]) = ANA
Y[6] unmatched: LCS = LCS(X[1..5], Y[1..5]) = AAN
The University of Sydney
Page 30

LCS Dynamic Programming: Step 2
Notations:
OPT(i,j) = length of LCS(X[1..i], Y[1..j]). Step 2: Finding recurrences
Case 1: X[i] = Y[j].
Leave X[i] unmatched and take LCS(X[1..i-1], Y[1..j]). OR Leave Y[j] unmatched and take LCS(X[1..i], Y[1..j-1]).
OPT(i,j)= max{OPT(i-1,j), OPT(i,j-1)}
Example
X[1..3] = BAN Y[1..6] = KATANA
X[3] unmatched: LCS(X[1..2], Y[1..6]) = A
Y[6] unmatched: LCS(X[1..3], Y[1..5]) = AN The University of Sydney
Page 31

LCS Dynamic Programming: Step 2
Notations:
OPT(i,j) = length of LCS(X[1..i], Y[1..j]). Step 2: Finding recurrences
Case 1: X[i] = Y[j].
Leave X[i] unmatched and take LCS(X[1..i-1], Y[1..j]). OR Leave Y[j] unmatched and take LCS(X[1..i], Y[1..j-1]).
OPT(i,j)= max{OPT(i-1,j), OPT(i,j-1)}
Example
X[1..3] = BAN Y[1..6] = KATANA
X[3] unmatched: LCS(X[1..2], Y[1..6]) = A
Y[6] unmatched: LCS(X[1..3], Y[1..5]) = AN
The University of Sydney
Page 32

LCS Dynamic Programming: Step 2
Notations:
OPT(i,j) = length of LCS(X[1..i], Y[1..j]). Step 2: Finding recurrences
Case 2: X[i] = Y[j].
Match X[i] to Y[j] and take LCS(X[1..i-1], Y[1..j-1]) + X[i] OR
Leave X[i] unmatched or Y[j] unmatched
OPT(i,j) = max{OPT(i-1, j-1) + 1, OPT(i-1, j), OPT(i, j-1)
Example
X[1..6] = BANANA Y[1..6] = KATANA
Match X[6] Y[6]: LCS = LCS(X[1..5], Y[1..5]) + A = AANA
X[6] unmatched: LCS = LCS(X[1..5], Y[1..6]) = AAN
Y[6] unmatched: LCS = LCS(X[1..6], Y[1..5]) = AAN The University of Sydney
Page 33

LCS Dynamic Programming: Step 3 Step 3: Solving the base cases
OPT(i,0) = 0 for all i, OPT(0,j) = 0 for all j
8>0 if i = 0 or j = 0 <OPT(i,j)= max{OPT(i1,j1)+1,OPT(i,j1),OPT(i1,j)} ifX[i]=Y[j] >
max{OPT(i, j 1), OPT(i 1, j)} if X[i] 6= Y [j]
:
The University of Sydney Page 34

LCS Dynamic Programming: Algorithm
INPUT: n, m, X[1..n], Y[1..m]
for j = 0 to m
M[0,j] = 0
for i = 0 to n
M[i,0] = 0
for i = 1 to n
for j = 1 to m
if (X[i] = Y[j])
M[i,j] = max(M[i-1,j-1] + 1, M[i-1,j], M[i,j-1])
else
M[i,j] = max(M[i-1,j], M[i,j-1])
return M[n,m]
8><>:0 if i = 0 or j = 0 OPT(i,j)= max{OPT(i1,j1)+1,OPT(i,j1),OPT(i1,j)} ifX[i]=Y[j] max{OPT(i, j 1), OPT(i 1, j)} if X[i] 6= Y [j]
The University of Sydney Page 35

LCS Dynamic Programming: Analysis
INPUT: n, m, X[1..n], Y[1..m]
for j = 0 to m
M[0,j] = 0
for i = 0 to n
M[i,0] = 0
for i = 1 to n
for j = 1 to m
O(n + m)
O(nm)
if (X[i] = Y[j])
M[i,j] = max(M[i-1,j-1] + 1, M[i-1,j], M[i,j-1])
else
M[i,j] = max(M[i-1,j], M[i,j-1])
return M[n,m] Running time: O(nm)
Space: O(nm)
See 3927 Lecture 6 for Sequence Alignment
The University of Sydney
Page 36

6.5 RNA Secondary Structure
Dynamic programming over intervals
The University of Sydney Page 37

RNA (Ribonucleic acid) Secondary Structure
RNA. String B = b1b2bn over alphabet { A, C, G, U }.
Secondary structure. RNA is single-stranded so it tends to loop
back and form base pairs with itself. This structure is essential
for understanding behavior of molecule.
CA AA AU
GC CGUAA G
Ex: GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA UGAUUA
G
ACGCU CGCGAGC
G
G AU
The University of Sydney
complementary base pairs: A-U, C-G G
Page 38

RNA Secondary Structure
Secondary structure. A set of pairs S = { (bi, bj) } that satisfy: [Watson-Crick.] S is a matching and each pair in S is a Watson-Crick
complement: A-U, U-A, C-G, or G-C.
[No sharp turns.] The ends of each pair are separated by at least 4
intervening bases. If (bi, bj) I S, then i < j – 4. [Non-crossing.] If (bi, bj) and (bk, bl) are two pairs in S, then we cannothave i < k < j < l. Free energy. Usual hypothesis is that an RNA molecule will form the secondary structure with the optimum total free energy.approximated by number of base pairs Goal. Given an RNA molecule B = b1b2…bn, find a secondary structure S that maximizes the number of base pairs. The University of SydneyPage 3939 RNA Secondary Structure: ExamplesGGGGG CUGGCU CGCGCUAUAUAG UAUAUAbase pairAUGUGGCCAU AUGGGGCAU AGUUGGCCAU 4ok sharp turn crossing The University of SydneyPage 40 RNA Secondary Structure: Subproblems First attempt (Step 1). OPT(j) = maximum number of base pairs in a secondary structure of the substring b1b2…bj.1match bt and bitj Difficulty (in Step 2). Results in two sub-problems. Finding secondary structure in: b1b2…bt-1. Finding secondary structure in: bt+1bt+2…bj-1.OPT(t-1)The University of Sydney need more sub-problems Page 41 Dynamic Programming Over Intervals Step 1 Step 1: Define subproblemsOPT(i, j) = maximum number of base pairs in a secondary structure of the substringbibi+1…bj.The University of SydneyPage 42 Dynamic Programming Over Intervals Step 2 Notation. OPT(i, j) = maximum number of base pairs in asecondary structure of the substring bibi+1…bj. Step 2: Find recurrencesThe University of SydneyPage 43ijCase 1. Base bj is not involved in a pair. OPT(i, j) = OPT(i, j-1) Dynamic Programming Over Intervals Step 2 Notation. OPT(i, j) = maximum number of base pairs in asecondary structure of the substring bibi+1…bj. Step 2: Find recurrences The University of SydneyPage 44imatch bt and bitjCase2. Basebjpairswithbt forsomeit M[w] + cvw) {
M[v] M[w] + cvw
successor[v] w }
} }
If no M[w] value changed in iteration i, stop.
}
}
Analysis: Q(mn) time, Q(n) working space. The University of Sydney Page 67

Shortest Paths: Practical Improvements
Practical improvements
Maintain only one array M[v] = shortest v-t path that we have
found so far.
No need to check edges of the form (v, w) unless M[w] changed in previous iteration.
Theorem: Throughout the algorithm, M[v] is length of some v-t path, and after i rounds of updates, the value M[v] is no larger than the length of shortest v-t path using i edges.
Overall impact
Working space: O(n).
Total space (including input): O(m+n)
Running time: O(mn) worst case, but substantially faster in practice.
The University of Sydney
Page 68
68

6.3 Segmented Least Squares
The University of Sydney Page 69

Segmented Least Squares
Least squares.
Foundational problem in statistic and numerical analysis.
Given n points in the plane: (x1, y1), (x2, y2) , . . . , (xn, yn).
Find a line y = ax + b that minimizes the sum of the squared error:
y
SSE = an (yi -axi -b)2 i =1
x
Solution. Calculus min error is achieved when
The University of Sydney Page 70
a=naixiyi -(aixi)(ai yi), b=ai yi -aaixi
n a x 2 (a x )2 n ii ii
O(n)

Segmented Least Squares
Segmented least squares.
Points lie roughly on a sequence of several line segments.
Givennpointsintheplane(x1,y1),(x2,y2),,(xn,yn)with x1 < x2 < … < xn, find a sequence of lines that minimizes f(x). Question. What’s a reasonable choice for f(x) to balance accuracy and complexity?number of linesy The University of Sydneyx Page 71 Segmented Least Squares Segmented least squares. Points lie roughly on a sequence of several line segments. Givennpointsintheplane(x1,y1),(x2,y2),…,(xn,yn)withx1 < x2 < … < xn, find a sequence of lines that minimizes: the sum of the sums of the squared errors E in each segment the number of lines L Tradeoff function: E + cL, for some constant c > 0.
y
The University of Sydney
x Page 72

Segmented Least Squares
Segmented least squares.
Points lie roughly on a sequence of several line segments.
Givennpointsintheplane(x1,y1),(x2,y2),,(xn,yn)with
x1 < x2 < … < xn, find a partition into segments that minimizes: the sum of the sums of the squared errors E in each segment the number of segments L Tradeoff function: E + cL, for some constant c > 0.
y
The University of Sydney
x Page 73

Dynamic Programming: Multiway Choice Step 1 Step 1: Define subproblems
OPT(j) = minimum cost for points p1, p2 , . . . , pj.
The University of Sydney Page 74

Dynamic Programming: Multiway Choice Step 2
Notations:
OPT(j)=minimumcostforpointsp1,p2 ,,pj.
e(i,j) =minimumsumofsquaresforpointspi,pi+1 ,,pj.
Step 2: Finding recurrences
Lastsegmentusespointspi,pi+1 ,,pj forsomei. Cost = e(i, j) + c + OPT(i-1).
OPT(j) = min { e(i,j) + c + OPT (i-1) } 1ij
The University of Sydney Page 75

Dynamic Programming: Multiway Choice Step 3 Step 3: Solving the base cases
OPT(0) = 0
OPT(j) = (0 if j = 0 min1ije(i,j)+c+OPT(i1) ifj>0
The University of Sydney
Page 76

Segmented Least Squares: Algorithm
O(n2) iterations
O(n) iterations
INPUT: n, (p1,,pn), c
Segmented-Least-Squares() {
M[0] = 0
for j = 1 to n
for i = 1 to j
compute the least square error eij for the segment pi,, pj
O(n)
O(n)
for j = 1 to n
M[j]=min1 i j(eij +c+M[i-1])
return M[n] }
OPT(j) = (0 if j = 0 min1ije(i,j)+c+OPT(i1) ifj>0
The University of Sydney
Page 77

Segmented Least Squares: Algorithm
2 O(n )
iterations
O(n) iterations
for j = 1 to n fori=1toj
compute the least square error eij for the segment pi,, pj
for j = 1 to n
M[j]=min1 i j(eij +c+M[i-1])
O(n)
O(n)
INPUT: n, (p1,,pn), c
Segmented-Least-Squares() {
M[0] = 0
The University of Sydney
Page 78
return M[n] }
Running time: O(n3) Space: O(n2)

Dynamic Programming Summary I
3 steps:
1. Definesubproblems
2. Findrecurrences
3. Solvethebasecases
4. Transformrecurrenceintoanefficientalgorithm [usually bottom-up]
The University of Sydney
Page 79
79

Dynamic Programming Summary II
1D dynamic programming
Weighted interval scheduling
Segmented Least Squares
Maximum-sum contiguous subarray Longest increasing subsequence
2D dynamic programming Knapsack
Shortest path
Longest common subsequence
Dynamic programming over intervals RNA Secondary Structure
The University of Sydney
Page 80

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] CS Bioinformatics algorithm Lecture 7: Dynamic Programming II
$25