The input consists of four items: a string of length p containing every letter of the alphabet exactly once, a (p + 1) x (p + 1) substitution matrix (represented as a list of lists) and the two sequences. For instance, the first input could be ABC, in which case the second input should be a symmetric 4 x 4 matrix, presented as a list of four four-element lists, each of them a row vector. The indices are as follows: 0 for A, 1 for B, 2 for C and 3 for the indel, denoted by _. So if the scoring matrix is

        A   B   C   _
    A   1  -1  -2  -1
    B  -1   2  -4  -1
    C  -2  -4   3  -2
    _  -1  -1  -2   0

the first two inputs should be ABC and [[1,-1,-2,-1],[-1,2,-4,-1],[-2,-4,3,-2],[-1,-1,-2,0]]. You need to provide three functions that implement the following algorithms.
1. Basic dynamic programming that runs in quadratic time and space [50 marks].
2. Dynamic programming that runs in linear space [up to 65 marks for 1 and 2 combined].
3. A heuristic procedure that runs in sub-quadratic time (similar to FASTA and BLAST) [up to 85 marks for 1, 2 and 3 combined].

The functions should be in a single text file, with the names dynprog, dynproglin and heuralign, respectively. A typical signature would be as follows:

    def dynprog(alphabet, scoring_matrix, sequence1, sequence2):

Each function should return a list consisting of the following three items: the score of the best local alignment found by the algorithm, plus two lists of indices, one for each input sequence, that realise the matches/mismatches in the alignment. For instance, if we aligned the sequences ABCACA and BAACB in the following way

    A B C A _ C A
    _ B _ A A C B

the last two outputs should be [1,3,4,5] and [0,1,3,4], i.e. the positions of the four match/mismatch columns B/B, A/A, C/C and A/B in each sequence. This part will be marked automatically, so please make sure that you get both the inputs and the outputs in the right form.

The memory usage of dynproglin will be restricted and measured (in a reasonable way). The first two functions are expected to run perfectly, i.e. to pass all the tests, and will be marked purely on correctness. For the second one, full credit will only be given to solutions that run in quadratic time, while the trivial solution, which runs in cubic time, will count for very little (so a partially correct quadratic-time solution will most likely earn a higher mark than a fully correct but cubic-time one). The third one will be assessed on the trade-off between running time and quality of output. A typical input will come with a planted alignment, which consists of segments of matches of different lengths separated by random segments (so that I know there is an alignment at least as good as the planted one). You can use numpy but no other packages/libraries. Partial marks will, of course, be awarded based on the number of tests passed! I will provide some inputs to test your implementation with.
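As a hedged sketch of parts 1 and 2 (not a reference solution), the code below assumes dynprog is plain Smith-Waterman local alignment with a linear indel score read from the last row/column of the matrix (index p, as in the matrix layout of the first paragraph), and a traceback that records only match/mismatch columns, giving the required [score, indices1, indices2] output. best_local_end is a hypothetical helper that evaluates the same recurrence keeping only two rows, so it finds the best score and the cell where the alignment ends in linear space; a common way to finish dynproglin is to repeat this on the reversed sequences to find the start, then recover the alignment between those points with Hirschberg's divide-and-conquer. That last step is not shown.

```python
def _index_map(alphabet):
    """Map each letter to its row/column; the indel '_' takes index p = len(alphabet)."""
    index = {c: i for i, c in enumerate(alphabet)}
    index["_"] = len(alphabet)
    return index


def dynprog(alphabet, scoring_matrix, sequence1, sequence2):
    """Smith-Waterman local alignment: O(nm) time and O(nm) space."""
    index, gap, s = _index_map(alphabet), len(alphabet), scoring_matrix
    n, m = len(sequence1), len(sequence2)
    # H[i][j] = best score of a local alignment ending at sequence1[i-1], sequence2[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        a = index[sequence1[i - 1]]
        for j in range(1, m + 1):
            b = index[sequence2[j - 1]]
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s[a][b],  # match/mismatch
                          H[i - 1][j] + s[a][gap],    # sequence1 letter vs indel
                          H[i][j - 1] + s[gap][b])    # sequence2 letter vs indel
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell to a zero cell, recording diagonal steps only
    out1, out2 = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        a, b = index[sequence1[i - 1]], index[sequence2[j - 1]]
        if H[i][j] == H[i - 1][j - 1] + s[a][b]:
            out1.append(i - 1)
            out2.append(j - 1)
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + s[a][gap]:
            i -= 1
        else:
            j -= 1
    return [best, out1[::-1], out2[::-1]]


def best_local_end(alphabet, scoring_matrix, sequence1, sequence2):
    """Same recurrence, score only, keeping two rows: O(nm) time, O(m) space."""
    index, gap, s = _index_map(alphabet), len(alphabet), scoring_matrix
    prev = [0] * (len(sequence2) + 1)
    best, bi, bj = 0, 0, 0
    for i, c1 in enumerate(sequence1, start=1):
        a = index[c1]
        cur = [0] * (len(sequence2) + 1)
        for j, c2 in enumerate(sequence2, start=1):
            b = index[c2]
            cur[j] = max(0, prev[j - 1] + s[a][b],
                         prev[j] + s[a][gap], cur[j - 1] + s[gap][b])
            if cur[j] > best:
                best, bi, bj = cur[j], i, j
        prev = cur  # row i-1 is no longer needed
    return best, bi, bj


M = [[1, -1, -2, -1], [-1, 2, -4, -1], [-2, -4, 3, -2], [-1, -1, -2, 0]]
print(dynprog("ABC", M, "ABCACA", "BAACB"))         # [4, [3, 4], [2, 3]]
print(best_local_end("ABC", M, "ABCACA", "BAACB"))  # (4, 5, 4)
```

Ties in the traceback are broken in favour of the diagonal, so a different but equally scoring alignment may be reported by another implementation; here both the two-column alignment AC/AC and BCAC/BAAC score 4.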
1. Design your own substitution-cost function that operates on pairs of sequences of letters instead of on pairs of letters. Describe it clearly in at most one page [15 marks]. For instance, such a function might
a. assign a cost to the multi-letter substitution of ABC by CBB that differs from the simple sum of the single-letter costs (the mismatches AC and CB and the match BB);
b. have a cost for inserting/deleting a run of Cs that is bounded irrespective of the run's length, e.g. the first C incurs a cost of 1/2, the second a cost of 1/4, the third a cost of 1/8 and so on (so that the total is always less than 1).
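A minimal sketch of the two example ingredients a and b, with hypothetical names (letter_cost, BLOCK_COSTS, block_substitution_cost, c_run_indel_cost) and arbitrary illustrative values; unlike the scores in the substitution matrix, these are costs to be minimised. It only shows how each piece could be evaluated, using exact fractions so that the geometric series 1/2 + 1/4 + ... + 1/2^n = 1 - 2^-n stays exact; how such block costs would enter an alignment recurrence is left open.

```python
from fractions import Fraction


def letter_cost(x, y):
    # Hypothetical letter-level cost: 0 for a match, 1 for a mismatch (illustrative only)
    return 0 if x == y else 1


# Hypothetical table of block substitutions whose cost is NOT the sum of letter costs;
# the value 1 for ABC -> CBB is an arbitrary illustration
BLOCK_COSTS = {("ABC", "CBB"): Fraction(1)}


def block_substitution_cost(u, v):
    """Cost of substituting block u by block v of the same length."""
    if len(u) != len(v):
        raise ValueError("blocks must have equal length")
    if (u, v) in BLOCK_COSTS:
        return BLOCK_COSTS[(u, v)]
    # Fall back to the ordinary letter-by-letter sum
    return Fraction(sum(letter_cost(x, y) for x, y in zip(u, v)))


def c_run_indel_cost(length):
    """Insert/delete a run of `length` Cs: 1/2 + 1/4 + ... + 1/2**length = 1 - 2**-length."""
    return 1 - Fraction(1, 2 ** length)


print(block_substitution_cost("ABC", "CBB"))  # 1 (the letter-by-letter sum would be 2)
print(block_substitution_cost("ABA", "ABC"))  # 1
print(c_run_indel_cost(3))                    # 7/8
```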