[SOLVED] Bioinformatics c++ matlab scheme algorithm HW4

$25

File Name: Bioinformatics_c++_matlab_scheme_algorithm_HW4.zip
File Size: 433.32 KB

5/5 - (1 vote)

HW4

Homework#4forBioimagingandBioinformatics
BME2210Spring2017
Bioinformaticsportion

Due(as5files4.cppfilesand1.txtfiletotheICONdropbox)by11amonMonday,February
20

ImplementNeedleman-Wunschalgorithm

ImplementtheglobalalignmentNeedleman-Wunschalgorithmforproteinsequencesinan
iterativefashion(i.e.,donotuserecursion,whichsimplifiestheimplementationforthosethat
donothaveexperiencewithrecursion.)Foraniterativesolution,wewilluse3matrices:

TheAmatrixwillstorethefinalalignmentscores.
TheBmatrixissimplyaninitializedAmatrixwithBlosum62scoresforeachpairof

residues.
TheDmatrixwillstoreourdirections.

Ifyouwish,youcanlookatthesamplecode(includedbelow),whichprovidesyouwith
functionsto:

initializeAand/orBmatrices
printmatrices,and
usetheBLOSUM62substitutionmatrixforscoring

However,notethatthesamplecodeiswritteninMatlab,tohelpyouunderstandtheoverall
algorithm,butyouneedtoturninyourcodeforparts1-4inC++(.cpp)files.Thus,whilethe
Matlabcodelimitsthenumberofaminoacidsto25,yourcodeshouldusestringsinorderto
handleanysizeofinput;andwhiletheMatlabcodemerelyinitializesthematrixtobefilled
withzeros(insteadofundefinedvalues),youactuallyneedtofillthematrixwithrealvalues.Its
mainuseistoshowyouhowamatrixcanbefairlyeasilyprinted.

Hint1:C++caneasilyhandlepaddingastringwithspacesusingthesetwandsetfillcommands,
forwhichyouwouldneedtoincludeiomanipaswellasiostreami.e.,asdescribedat
http://www.cplusplus.com/reference/iomanip/setw/.(actuallyyouwontevenneedsetfill(),
sincespacesarethedefaultpaddingmaterialIjustmentionthatforfuturereference)Ifyou
trytousetheolderCstyleofusingastatementsuchasprintf(%3s
,x);,forwhichyou
wouldneedtoincludestdio.h,youwillhaveamuchhardertimeprintingoffthematricesinthis
assignment,sinceyouwillhavetofigureoutawaytoconvertanobjectofstringclass(and
beingofanyarbitrarily-givensize)totheolderC-stylecharacterarraythatprintfdemands.
Thus,IhighlyrecommendtheC++style,asshowninthecodebelowwhereIprintanarbitrary
string(here,avalueof-2)withanarbitraryamountofspace-padding(here,3spaces,although
notethatthemaximumwidthofanyvalueintheBLOSUM62matrixisonly2characterspaces,
sinceitsmaximumvalueis11anditsminimumvalueis-4):

#include
#include
#include
usingnamespacestd;

intmain()
{
stringx=-2;
cout<<setw(3)<<x<<endl; //somewhatequivalentto:printf(“%3s
“,x); return0;}Hint2:theNeedleman-WunschalgorithmisdescribedintheReadingMaterial(especiallyL8-PairwiseAlignment1-Fall2015.pdf),andinevenmorehelpfuldetailinthebook(chapter3,especiallypages76-80,anditmaybehelpfultokeepgoingandreadportionsoftherestofthechapter,likeSmith-Watermanon82,heuristicsonpage84,dotplotson85,etc.).Payattentiontothedetailse.g.,eachmatrixisofdimensionality(n+1)*(m+1),wherethelengthoftheinputsequence1isnandthelengthoftheinputsequence2ism.Inotherwords,thesematricesareonedimensionlonger(inboth#ofrowsandcolumns)thanthelengthsofthetwosequences.(doyouseewhythisisso?the1addedtothenistheleftcolumn,andthe1addedtothemistheupperrow,showninFigure3.21(a)intheonlinebook;notethatthesearefilledinfirstbeforeanythingelseisaddedtothematrix,asshowninparts(b)-(g)ofthatfigure).Hint3:YoualreadyknowthatC++arrayindicesstartwith0,sothatthefirstpositionofastringisstoredinarrayindex0,thesecondinarrayindex1,etc.However,Needleman-Wunschintroducesatwist,inthatb/coftheadditionaldimensionaddedtoboththerowandthecolumn(addedatthe0thposition),thefirstpositionofthesequence1correspondstotheSECONDcolumnofthematrix,whichendsupmakingitsarrayindexequalto1afterall.Thesameappliestosequence2andtherowsofthematrix.Ratherthanattempttodescribethisfurtherin1,000+words,hereisapictureinstead:0 1 2 3 4 5Sequence Indices A R N C D ESequence Characters0 1 2 3 4 5 6Matrix Indices 0 0-2-4-6-8 -10 -12 0 A 1-2 4-1-2 0-2-11 R 2-4-1 5 0-3-2 02 N 3-6-2 0 6-3 1 03 D 4-8-2-2 1-3 6 24 C 5 -10 0-3-3 9-3-45 Q 6 -12-1 1 0-3 0 26 E 7 -14-1 0 0-4 2 5Hint4:Toforestallquestionsfrompeoplewhoarenotcarefulreaders,Iwillpointoutthatwhilesomepartsoftheonlinebooktalksaboutusinganidentitymatrixtoscorethealignment,otherpartstalkaboutusingaBLOSUM62oneinstead.e.g.,Figure3.20demonstratesanexamplewithBLOSUM62.Thelessonhereis:donotsimplyreadasentenceoutofcontext!Yes,youshouldactuallyreadtheWHOLEthing,ifyouwanttoknowhowitworks.Part1:Createafilehw4.1.cpp.Inthispart,simplyinitializetheBmatrixandprintitout.Scoringscheme: Gappenaltyof-2 Match/mismatchscoresfromBLOSUM62Youcangetthelatterfrom:ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62.Notethatyouarenotrequiredtowritecodetoreadthismatrixfromafileyoumayhard-codeitintoyourprogramasamatrixifyouprefer.Exampleoutput:Sequence 1: ARNDCQE Sequence 2: ARNCDE B matrix A R N C D E 0-2-4-6-8 -10 -12A-2 4-1-2 0-2-1R-4-1 5 0-3-2 0N-6-2 0 6-3 1 0D-8-2-2 1-3 6 2C -10 0-3-3 9-3-4Q -12-1 1 0-3 0 2E -14-1 0 0-4 2 5Part2:Createafilehw4.2.cpp.Inthispart,nowcreatetheAmatrixandprintitout.(RememberthatBEFOREyoudothecalculationstoobtainthefinalvaluesthatthismatrixwillcontain,youwillneedtoinitializethismatrixthesamewaythatyouinitializedtheBmatrix.e.g.,youcouldsimplycopytheBmatrixintoA,ifthatiseasierforyou.AlthoughIfounditeasiertojustputtheinitializationcodeintoafunction,andthenIcalledittwice,firstwithBandthenwithA.)Hint:exceptforthefirstcolumnandfirstrow,eachmatrixscoreisthemaximumof3possibilities:analignmentoftworesidues,agapinsequence1,oragapinsequence2.Continuingtheexampleoutputfromabove:A matrix A R N C D E 0-2-4-6-8 -10 -12A-2 4 2 0-2-4-6R-4 2 9 7 5 3 1N-6 0 7151311 9D-8-2 513121917C -10-4 311222018Q -12-6 1 9202222E -14-8-1 7182227score = 27(doyouseewherescore=comesfrom?hint:bottom-rightcornerofthematrix) Part3:Createafilehw4.3.cpp.Inthispart,nowcreatetheDmatrixandprintoutthecontents.AsyouwerecalculatingtheAmatrixinpart2above,youhadtoselectadirectionfromwhichtocalculatethenewsub-totals(Vertical(V),Horizontal(H),orDiagonal(D)thesearethearrowsshowninthelectureslides/readingmaterial/onlinebookfigures).KeeptrackofthesedirectionsinthisDmatrix.PrintoutthisDmatrixaspartofyouranswer.IfmorethanonedirectionispossiblethatincludesDiagonal(D),thensimplyselectD.IfmorethanonedirectionispossiblebutDisnotincluded(i.e.,HandVonly),thenarbitrarilyselectV.Notethatyoucaninitializethefirstrow/columntoallH/V.[Optional]:Insteadofpickingarbitrarily,youcouldencodethecombinationsofdifferentdirectionswithadditionalcharacters.Idsuggestthefollowingscheme,forcompatibilitywithotherpeoplescode:forV&H=>+,forD&V=>y,forD&H=>2,andforV,D&H=>*.Note
thatonereasontodothisisthatifyoudonotdealwithsuchscenarios,thenthecodebecomes
non-deterministic(meaningthatifIreversetheorderofthetwoinputsequencesi.e.,swap
sequence#1and#2Icangetadifferentanswer,mainlyb/cofarbitrarilypickingV,and
thereforeswappingH&Vwillaffecttheoutcome).

Continuingthesampleoutputfromabove:

D matrix
A R N C D E
0 H H H H H H
A V D H H H H H

R V V D H H H H
N V V V D H H H
D V V V V D D H
C V V V V D H H
Q V V V V V D D
E V V V V V D D

Part4:Createafilehw4.4.cpp.Inthispart,itisnowstraight-forwardtousetheDmatrixto
generatethealignment.Dothat,andthenprintoutthatalignment.

Continuingthesampleoutputfromabove,thealignmentwouldbe:

ARNDCQE
||| | |
ARN-CDE

Notehowtheverticalbarisonlyshownforidenticalletters,butnotforgapsormismatches.

[OptionalChallenge].Purelyforyourowninterest,youcouldimplementthisNeedleman-
Wunschalgorithmusingrecursion.Thereareanumberofreasonsthatyoumightwanttodo
thisbutifyouendupdoingthat,whichdoyoufindthatyouprefer?

Part5:Createaplain-textfilehw4.5.txt.Inthispart,writeafewsentencestobrieflydiscuss
(i.e.,justafewsentenceseach)thefollowingissues:

a) Whatwerethemajorobstacles/hurdlesthatyouhadtoovercomeduringthis
assignment?

a. Personally,wasityourmasteryofthetoolthatyouused(C++),orbasicbiology
(knowingwhatanaminoacidis),orbioinformatics(knowingwhatan
alignmentis),orjustfindingtimetodothehomework?Thoseareallhonest
andvalidresponsestothisquestion.

b. WhatIamparticularlyinterestedin,though,is:whatwerethemajorengineering
challengesthathadtobeovercometosolveproblemsofthistype?

b) NowthatyouhavecodethatimplementstheNeedleman-Wunschalgorithm,what
wouldbethenextstep(s)forsomeonetotake?(dontworry,itdoesnthavetobeyou,

atleastnotforthisclass:-)

a. Ifyouweregoingtomakeimprovementstoyourcode,whataresomeideasthat
youcouldstartoffwith,andwhymightyouwanttodothoseimprovements?

(Igaveyousomeideasabove,likeaddingsymbolsthatgobeyondjustH/V/D,
whichwouldallowmoreoptionstobeexploredinsteadofbeingcompletely
arbitrary;andanotherideawouldbetoreadintheBLOSUM62matrixfromafile
insteadofhard-codingitin,b/cthatwouldallowyoutouseother,alternative
matricesinthefuture,suchasBLOSUM45,orPAM250,etc.ButsinceIgaveyou
theseideas,pleasepickotheronesnotethatrecursionisfinetoinclude,as
longasyoustatewhyitwouldbeuseful,butinthatcasepleasealsoincludeat
leastonemoreideaaswell)

b. Goingbeyondjustthiscode,whataresomemoregeneralimprovementsthat
youcouldmake,andwhywouldyouwanttodothem?

Hint:themainanswerthatIamlookingforhereisthenameofanalgorithmthat
isbetterthanNeedleman-Wunschcanyouthinkofany?

c) PretendthatIamaworld-classexpertatthesubjectofbioinformatics(e.g.,Iknowall
aboutalignments,andalgorithms,andscoring,etc.),butyetsomehowIknewnothing
abouttheNeedleman-Wunschalgorithm(asifthatwerepossible!MaybeImfroman
alternateuniverseorsomething:-).DescribeformewhatNeedleman-Wunschisin
general,whatitdoesinparticular,andwhatitcandoforme.Like,whenshouldIuseit?
WhenwouldIbebetteroffnottouseit,andshouldusesomethingelseinstead?
Notably,whatarethekeyconceptsthatdistinguishitfromallotherbioinformatics
algorithms?

GRADINGRubric(100pointstotal):

15pointsforworkingcode,part1.
15pointsforworkingcode,part2.
30pointsforworkingcode,part3.
40pointsforworkingcode,part4.
5pointsforanswers,part5.

NOTE:Ifyourprogramdoesnotcompileyouwillreceiveazeroonthatpartofyourhomework.
Inaddition,latehomeworkwillnotbeaccepted.DONOTWORKTOGETHER!Studentscaught
workingtogetheronthisoranyassignmentwilldropawholelettergradeforthecourse!

% Code to start part HW4 from.
function HW4_0

% A trial sequence with all amino acids:
% ARNDCQEGHILKMFPSTWYV

% The trial sequences in the HW description:
% ARNDCQE
% ARNCDE

% Ask the user for protein sequence 1.
prompt = Enter the first amino acid sequence of length 1 to 25: ;
seq1 = input(prompt, s);
len1 = length(seq1);
fprintf(1, Seq 1 %s has %d residues
, seq1, len1);

% Ask the user for protein sequence 2.
prompt = Enter the second amino acid sequence of length 1 to 25: ;
seq2 = input(prompt, s);
len2 = length(seq2);
fprintf(1, Seq 2 %s has %d residues
, seq2, len2);

% Initialize a 2525 matrix of zeros.
A = zeros(25,25);

% The supplied initMatrix must be modified!
A = initMatrix(A, seq1, seq2);

% Print the matrix A.
printMatrix(A, seq1, seq2);

end

% Print out a matrix
function mat = initMatrix(mat, seq1, seq2)
numRows = length(seq1) + 1;
numCols = length(seq2) + 1;

mat(1,1) = 0; % Init the value at (1,1) to 0.

% Initialize the first column.
for i=2:numRows
% Do something here!
end

% Initialize the first row.
for c=2:numCols
% Do something here!
end

% Init the rest of the matrix.
for r=2:numRows
for c=2:numCols
% Do something here!
end
end

end

% Print out a matrix
function printMatrix(mat, seq1, seq2)
len1 = length(seq1);
len2 = length(seq2);

% Print the first row of sequence letters
fprintf(1,tt);
for r=1:len2
fprintf(1,%2ct, seq2(r));
end
fprintf(1,
);

% Print the first row of scores.
fprintf(1,t);
for c=1:len2+1
fprintf(%2dt, mat(1,c));
end
fprintf(1,
);

% Print the rest of the scores.
for r=2:len1+1
% Print the first letter of vertical sequence
fprintf(1, %ct, seq1(r-1));
for c=1:len2+1
fprintf(1,%2dt,mat(r,c));
end
fprintf(1,
);
end
end

function value = scorePair(char1, char2)
% Order: ARNDCQEGHILKMFPSTWYV
i1 = aaToInt(char1);
i2 = aaToInt(char2);
A = [
4 -1 -2 -20 -1 -10 -2 -1 -1 -1 -1 -2 -110 -3 -20;
-150 -2 -310 -20 -3 -22 -1 -3 -2 -1 -1 -3 -2 -3;
-2061 -30001 -3 -30 -2 -3 -210 -4 -2 -3;
-2 -216 -302 -1 -1 -3 -4 -1 -3 -3 -10 -1 -4 -3 -3;
0 -3 -3 -39 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1;
-1100 -352 -20 -3 -210 -3 -10 -1 -2 -1 -2;
-1002 -425 -20 -3 -31 -2 -3 -10 -1 -3 -2 -2;
0 -20 -1 -3 -2 -26 -2 -4 -4 -2 -3 -3 -20 -2 -2 -3 -3;
-201 -1 -300 -28 -3 -3 -1 -2 -1 -2 -1 -2 -22 -3;
-1 -3 -3 -3 -1 -3 -3 -4 -342 -310 -3 -2 -1 -3 -13;
-1 -2 -3 -4 -1 -2 -3 -4 -324 -220 -3 -2 -1 -2 -11;
-120 -1 -311 -2 -1 -3 -25 -1 -3 -10 -1 -3 -2 -2;
-1 -1 -2 -3 -10 -2 -3 -212 -150 -2 -1 -1 -1 -11;
-2 -3 -3 -3 -2 -3 -3 -3 -100 -306 -4 -2 -213 -1;
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -47 -1 -1 -4 -3 -2;
1 -110 -1000 -1 -2 -20 -1 -2 -141 -3 -2 -2;
0 -10 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -115 -2 -20;
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -11 -4 -3 -2 112 -3;
-2 -2 -2 -3 -2 -1 -2 -32 -1 -1 -2 -13 -3 -2 -227 -1;

0 -3 -3 -3 -1 -2 -2 -3 -331 -21 -1 -2 -20 -3 -14];
value = A(i1, i2);
end

function value = aaToInt(aa)
switch aa
case A,
value = 1;
case R,
value = 2;
case N,
value = 3;
case D,
value = 4;
case C,
value = 5;
case Q,
value = 6;
case E,
value = 7;
case G,
value = 8;
case H,
value = 9;
case I,
value = 10;
case L,
value = 11;
case K,
value = 12;
case M,
value = 13;
case F,
value = 14;
case P,
value = 15;
case S,
value = 16;
case T,
value = 17;
case W,
value = 18;
case Y,
value = 19;
case V,
value = 20;
otherwise,
value = 1;
end
end

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] Bioinformatics c++ matlab scheme algorithm HW4
$25