[SOLVED] DNA matlab Bioinformatics algorithm HW3


Homework #3 for Bioimaging and Bioinformatics
BME 2210 Spring 2017
Bioinformatics portion

Due (as 3 .cpp files to the ICON dropbox) by 11 am on Wednesday, February 8

Sequence Alignment Basics

Part 1: Create a file hw3.1.cpp. Write a program that prompts the user for two DNA sequence
strings and scores an alignment.

Each A or T match contributes a score of +2 (so A matching A, T matching T, A matching
T, and so on)
Each C or G match contributes a score of +3
Each non-match, including mismatches and gaps, contributes a score of -2

You may assume that the input is valid (only DNA bases), although you should allow the user to
input the sequence in either upper- or lower-case letters. For example, two sequences and
their corresponding score could be:

Sequence 1: ATGCTGACTGCA
Sequence 2: CTTGAGACG

A/T score = 3 * 2 = 6
C/G score = 3 * 3 = 9
Non-match = 6 * -2 = -12
Total score = 3
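
The scoring rules for part 1 can be sketched in C++ roughly as follows. This is only one possible reading of the rules, not the instructor's solution: the names AlignmentScore and scoreAlignment are hypothetical, and the sketch assumes the two sequences are compared position by position from the left, with the unpaired tail of the longer sequence counted as gaps (which is what the example totals above imply).

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper type (not from the handout): tallies for one alignment.
struct AlignmentScore {
    int atMatches = 0;   // both bases are A or T
    int cgMatches = 0;   // both bases are C or G
    int nonMatches = 0;  // mismatches plus gap positions
    int total() const { return 2 * atMatches + 3 * cgMatches - 2 * nonMatches; }
};

bool isAT(char c) { return c == 'A' || c == 'T'; }
bool isCG(char c) { return c == 'C' || c == 'G'; }

// Scores two DNA strings position by position from the left.
// Assumption: the unpaired tail of the longer string counts as gaps.
AlignmentScore scoreAlignment(std::string s1, std::string s2) {
    // Accept upper- or lower-case input by normalizing to upper case.
    auto toUpper = [](std::string& s) {
        for (char& c : s) {
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
    };
    toUpper(s1);
    toUpper(s2);

    AlignmentScore result;
    const std::size_t longest = std::max(s1.size(), s2.size());
    for (std::size_t i = 0; i < longest; ++i) {
        if (i >= s1.size() || i >= s2.size()) {
            ++result.nonMatches;  // one sequence has ended: gap
        } else if (isAT(s1[i]) && isAT(s2[i])) {
            ++result.atMatches;
        } else if (isCG(s1[i]) && isCG(s2[i])) {
            ++result.cgMatches;
        } else {
            ++result.nonMatches;  // mismatch
        }
    }
    return result;
}
```

Calling scoreAlignment("ATGCTGACTGCA", "CTTGAGACG") under these assumptions gives 3 A/T matches, 3 C/G matches and 6 non-matches, for a total of 3, in agreement with the example.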

Part 2: Create a file hw3.2.cpp. You can start with the program you wrote in part 1 and modify
it if you like. In this one, you are to assess whether a given RNA sequence could represent a
coding strand (CDS) for a protein (assume no introns), i.e., if it can represent a valid
messenger RNA (mRNA) with a start codon and in-frame stop codon. To simplify things, input
only a single sequence instead of 2 sequences. Scan that sequence in all 6 reading frames (that
is, in both the forward and reverse directions, and in all 3 reading frames for each of those
directions) to determine if there is at least one start codon (AUG) with an in-frame stop codon
(UAA, UAG or UGA). The output for the given sequence should be either No, if the sequence
does not appear to be a valid mRNA, or Yes: N, if the sequence appears to be a valid mRNA,
where N is the number of amino acids in the resulting translated protein.

You can assume for this assignment that the given sequence encodes at most one valid CDS
region, and you can also go ahead and stop at the first stop codon that you encounter (although
in a REAL bioinformatics program, you would have to find and account for/deal with all such
possibilities).

For example, the sequence GGGAUGAAAUAACCC would result in an output of Yes: 2.
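
A minimal C++ sketch of the six-frame scan might look like the following. The function name countAminoAcids and the -1 return convention are hypothetical. The sketch also assumes that "reverse direction" means the plain reversed string; if the reverse complement is intended instead, the reversed copy would need its bases complemented as well.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns the number of amino acids between the first start codon (AUG,
// counted) and its first in-frame stop codon (not counted), or -1 if no
// reading frame contains such a pair.
// Assumption: the "reverse" strand is the plain reversed string.
int countAminoAcids(std::string rna) {
    for (char& c : rna) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    const std::string reversed(rna.rbegin(), rna.rend());
    const std::string strands[2] = {rna, reversed};

    for (const std::string& strand : strands) {
        for (std::size_t frame = 0; frame < 3; ++frame) {
            bool inOrf = false;   // true once a start codon has been seen
            int aminoAcids = 0;
            for (std::size_t i = frame; i + 3 <= strand.size(); i += 3) {
                const std::string codon = strand.substr(i, 3);
                if (!inOrf) {
                    if (codon == "AUG") {
                        inOrf = true;
                        aminoAcids = 1;  // AUG itself codes for Met
                    }
                } else if (codon == "UAA" || codon == "UAG" || codon == "UGA") {
                    return aminoAcids;   // stop at the first in-frame stop codon
                } else {
                    ++aminoAcids;
                }
            }
        }
    }
    return -1;  // no start codon with an in-frame stop codon
}
```

A caller would print No when the result is -1 and Yes: followed by the count otherwise. For GGGAUGAAAUAACCC the forward frame starting at the first base reads GGG AUG AAA UAA CCC, so the count is 2.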

Part 3: Create a file hw3.3.cpp. Copy your answer to part 1 above (not 2) and modify
accordingly. This time you may again assume that the input is valid and do not need to verify as
you did in part 2.

This script will try all overlapping combinations to find the maximum alignment score. You can
add gaps on the front/back of the sequences to facilitate the analysis. You do not need to insert
internal gaps. Use the same scoring function that you did in part 1 above (i.e., A or T matches =
+2, C or G = +3, and mismatch or gap = -2). To simplify the assignment, any character paired
with a dash (-) should be called a gap. However, a gap character paired with another gap
character (at the sequence ends) is considered to not be properly part of the alignment, i.e.,
either skip those scenarios entirely in your scoring process, or else give them a score of 0 (which
has the same effect, and might be easier to implement, depending on how you go about it).

For example:

Enter the first DNA sequence:
ATGCTGACTGCA
Enter the second DNA sequence:
CTTGAGACG

seq1 ---------ATGCTGACTGCA---------
seq2 CTTGAGACG---------------------
0 A/T matches
0 C/G matches
21 mismatches + gaps
score: -42

seq1 ---------ATGCTGACTGCA---------
seq2 -CTTGAGACG--------------------
0 A/T matches
0 C/G matches
20 mismatches + gaps
score: -40

seq1 ---------ATGCTGACTGCA---------
seq2 --CTTGAGACG-------------------
0 A/T matches
0 C/G matches
19 mismatches + gaps
score: -38

seq1 ---------ATGCTGACTGCA---------
seq2 ---CTTGAGACG------------------
1 A/T matches
1 C/G matches
16 mismatches + gaps
score: -27

(others not shown)

seq1 ---------ATGCTGACTGCA---------
seq2 ---------------------CTTGAGACG
0 A/T matches
0 C/G matches
21 mismatches + gaps
score: -42

Maximum score = ???
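
One way to express the scan in C++ without building padded strings is to slide the second sequence across the first by a signed offset and score each placement. The sketch below takes that approach; it is an alternative to the dash-buffer method in the starter code, not the instructor's solution. The names baseScore, scoreAtOffset and maxAlignmentScore are hypothetical, and the input is assumed to be upper-case already.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Score for two real bases (no gaps), per the part 1 rules.
int baseScore(char a, char b) {
    const bool aAT = (a == 'A' || a == 'T');
    const bool bAT = (b == 'A' || b == 'T');
    const bool aCG = (a == 'C' || a == 'G');
    const bool bCG = (b == 'C' || b == 'G');
    if (aAT && bAT) return 2;
    if (aCG && bCG) return 3;
    return -2;
}

// Scores s2 placed so that s2[0] lines up with position `offset` of s1.
// A negative offset means s2 starts to the left of s1.
int scoreAtOffset(const std::string& s1, const std::string& s2, long offset) {
    const long len1 = static_cast<long>(s1.size());
    const long len2 = static_cast<long>(s2.size());
    const long first = std::min(0L, offset);
    const long last = std::max(len1, offset + len2);
    int score = 0;
    for (long p = first; p < last; ++p) {
        const bool in1 = (p >= 0 && p < len1);
        const bool in2 = (p >= offset && p < offset + len2);
        if (in1 && in2) {
            score += baseScore(s1[static_cast<std::size_t>(p)],
                               s2[static_cast<std::size_t>(p - offset)]);
        } else if (in1 || in2) {
            score += -2;  // a base paired with a gap
        }
        // Positions covered by neither sequence (gap vs gap) score 0.
    }
    return score;
}

// Tries every placement from "s2 entirely left of s1" to "entirely right".
int maxAlignmentScore(const std::string& s1, const std::string& s2) {
    const long len1 = static_cast<long>(s1.size());
    const long len2 = static_cast<long>(s2.size());
    int best = scoreAtOffset(s1, s2, -len2);
    for (long offset = -len2 + 1; offset <= len1; ++offset) {
        best = std::max(best, scoreAtOffset(s1, s2, offset));
    }
    return best;
}
```

With the example sequences, offsets -9, -8, -7 and -6 reproduce the first four scores shown above (-42, -40, -38, -27), and offset 12 reproduces the final -42.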

Hints: Although there are many ways to do part 3, it is surprisingly difficult. You have to be
mindful of the end of the sequences (i.e., do not try to access a character position that does not
exist), and careful with loop indices. For my solution, I made sure I always knew which string
was the longest. I also made a function called rotateSeqRight that simply took a sequence as
input, and rotated all the characters right by one position (although if you use character arrays
instead of the string class, which I do not recommend b/c it would be more difficult, then make
sure to exempt the null terminating character from this procedure). If you can't get your
program to do all the comparisons, at least try to get the strings set up with dashes and do the
first comparison and scoring for some partial credit on part 3. Also, in case it helps, the
following pages have some code in Matlab to help get you started on the algorithm (although
remember to actually do the assignment in C++).
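
As an illustration of the rotateSeqRight idea with the string class, a short C++ sketch could use std::rotate on reverse iterators, which moves the last character to the front and shifts everything else right by one. Only the function name comes from the hint; the body is an assumption about how such a helper might be written.

```cpp
#include <algorithm>
#include <string>

// Rotates every character one position to the right; the last character
// wraps around to the front. An empty string is returned unchanged.
std::string rotateSeqRight(std::string seq) {
    if (!seq.empty()) {
        std::rotate(seq.rbegin(), seq.rbegin() + 1, seq.rend());
    }
    return seq;
}
```

Because the padded scan string ends in dashes, each call moves a trailing dash to the front, so the short sequence appears to slide one position to the right.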

GRADING Rubric (100 points total):
30 points for working code, part 1.
30 points for working code, part 2.
40 points for working code, part 3.

NOTE: If your program does not compile you will receive a zero on that part of your homework.
In addition, late homework will not be accepted. DO NOT WORK TOGETHER! Students caught
working together on this or any assignment will drop a whole letter grade for the course!

% Code to start Part 3 from.
function HW2_3
% Ask the user for DNA sequence 1.
prompt = 'Enter the first DNA sequence of length 1 to 99: ';
seq1 = input(prompt, 's');
len1 = length(seq1);
fprintf('Seq 1 %s has %d bases\n', seq1, len1);
% Ask the user for DNA sequence 2.
prompt = 'Enter the second DNA sequence of length 1 to 99: ';
seq2 = input(prompt, 's');
len2 = length(seq2);
fprintf('Seq 2 %s has %d bases\n', seq2, len2);
% Guess sequence 1 is longer than sequence 2.
longSeq = seq1;
longLen = len1;
shortSeq = seq2;
shortLen = len2;
% Correct an incorrect guess.
if (len2 > len1)
    longSeq = seq2;
    longLen = len2;
    shortSeq = seq1;
    shortLen = len1;
end
% Create buffered strings to facilitate the scan.
bufLength = longLen + 2 * shortLen;
scan1 = blanks(bufLength);
scan2 = blanks(bufLength);
% Fill the scan strings with dashes
for i = 1:bufLength
    scan1(i) = '-';
    scan2(i) = '-';
end
% Initialize the first scan string with the long sequence in the middle.
for i = shortLen + 1:shortLen + longLen
    scan1(i) = longSeq(i-shortLen);
end
% Initialize the second scan string with the short sequence at the
% beginning.
for i = 1:shortLen
    scan2(i) = shortSeq(i);
end
% Score & print the initial alignment.
% Score the padded scan strings, not the raw inputs, so the score follows the shift.
score = scoreSequences(scan1, scan2);
printSeq(scan1, scan2, score);
% Rotate the short sequence to the right, score & print.
scan2 = rotateSeqRight(scan2);
score = scoreSequences(scan1, scan2);
printSeq(scan1, scan2, score);
% Repeat in a loop!!
end

% Use your scoring logic from part 1.
function score = scoreSequences(seq1, seq2)
% To Do!
score = 0;
end

% Print out the sequences and score.
function printSeq(seq1, seq2, score)
fprintf(1, '%s\n', seq1);
fprintf(1, '%s\n', seq2);
fprintf(1, 'Score: %d\n', score);
end

% Rotate the sequence to the right.
function seq = rotateSeqRight(seq)
seqLen = length(seq);
last = seq(seqLen);
for i = seqLen:-1:2
    seq(i) = seq(i-1);
end
seq(1) = last;
end
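
Since the assignment itself must be written in C++, the buffer setup from the Matlab starter code could be translated along these lines. This is a hedged sketch: ScanBuffers and makeScanBuffers are hypothetical names, and only the buffer layout (long sequence centered in scan1, short sequence at the start of scan2, dashes everywhere else) is taken from the starter code.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical container for the two padded strings used in the scan.
struct ScanBuffers {
    std::string scan1;  // long sequence, padded with dashes on both sides
    std::string scan2;  // short sequence at the front, dashes after it
};

ScanBuffers makeScanBuffers(std::string seq1, std::string seq2) {
    // Make sure seq1 holds the longer sequence, as the starter code does.
    if (seq2.size() > seq1.size()) {
        std::swap(seq1, seq2);
    }
    const std::size_t longLen = seq1.size();
    const std::size_t shortLen = seq2.size();
    const std::size_t bufLength = longLen + 2 * shortLen;

    // Start with all dashes, then overwrite the relevant positions.
    ScanBuffers buffers{std::string(bufLength, '-'), std::string(bufLength, '-')};
    buffers.scan1.replace(shortLen, longLen, seq1);  // long sequence in the middle
    buffers.scan2.replace(0, shortLen, seq2);        // short sequence at the start
    return buffers;
}
```

Note that C++ strings are indexed from 0, whereas the Matlab loops above run from 1, so the long sequence starts at index shortLen rather than shortLen + 1.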
