The Smith-Waterman algorithm identifies the similar regions in two genome sequences. In this assignment, you will implement an MPI version of the Smith-Waterman algorithm to measure the similarity between two strings.
The pseudocode of the Smith-Waterman algorithm is shown in Algorithm 1. Given string A = a1,a2,,an and string B = b1,b2,,bm, we first construct a scoring matrix H of size (n + 1) (m+1) and set the first row and column to zero. Then, we iteratively compute the scoring matrix using the following dynamic programming formula:
Hi1,j1 + s(ai,bj) Hi1,j wHij = maxHi,j1 w 0 | (1 i n,1 j m) |
where s is the substitution matrix is the match score, v is the
mismatch score, and w is the gap penalty. The value of u, v and w are constant and are give as part of the input to the algorithm. Finally, we return the highest score in the scoring matrix as the similarity score between A and B.
Input and Output
The input file will be in the following format:
- The first line contains two integers a_len and b_len (1 a_len 20000,1 b_len 20000), separated by a space. They represent the length of string a and b, respectively.
- The second line is a string a of length a_len, consisting of four letters A,T,G,C.
- The third line is a string b of length b_len, consisting of four letters A,T,G,C.
Assignment 1 COMP5112, 2020 Spring
Algorithm 1: The Smith-Waterman Algorithm
Input : string A = a1,a2,,an and B = b1,b2,,bm, match score u, mismatch score v, gap penalty w
Output: the similarity score between A and B
- int score[|A| + 1][|B| + 1];
- for i 0 to |A| do
3score[i][0] 0;
- end
- for i 0 to |B| do
6score[0][i] 0;
- end
- int max_score 0;
- for i 1 to |A| do
- for j 1 to |B| do
11score[i][j] max(0,
12score[i 1][j] w,
13score[i][j 1] w,
14score[i 1][j 1] + sub_mat(ai,bj));
15max_score max(max_score,score[i][j]);
- end
- end
The output of the program consists of two lines:
- The similarity score between a and b, i.e., the highest score in the scoring matrix.
- The elapsed time in seconds of the execution.
We assume that the match score is 3, the mismatch score is -3, and the gap penalty is 2.
Here is an example input/output for your reference:
Input:
9 8
GGTTGACTA TGTTACGG
Output:
13
Time: 0.00001 s
Reviews
There are no reviews yet.