Assignment title: Information
Problem 1 (10 pts)
Would the solution to Problem 1.17 in Borodovsky and Ekisheva change if you had the following
additional information? There are twice as many boxes with 6-base cutters as with 4-base
cutters, arranged in no particular order in the absent-minded researcher's lab. If the answer is
yes, what are the posterior probabilities?
Problem 2 (10 pts)
Align the two sequences shown below using the Smith-Waterman algorithm.
Use a match score of 4, a mismatch score of -4, and a gap penalty of 2.
Show:
a) dynamic programming matrix with scores (as shown in Figure 2.6 in Durbin et al.)
b) trace back pointers
c) total score
sequences:
sequence 1:
AGAGCTCACAA
sequence 2:
AGTAGCTTCCAAA
Note that you applied the Needleman-Wunsch algorithm to these sequences in homework 1.
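
A hand-filled matrix can be checked against a short script. The following is a minimal sketch only, assuming that the gap penalty of 2 is linear (2 is subtracted for every gap position) and that ties are broken in the order stop, diagonal, up, left. The function name smith_waterman and the pointer labels 'D', 'U', 'L' are illustrative choices, not part of the assignment.

```python
def smith_waterman(a, b, match=4, mismatch=-4, gap=2):
    """Local alignment of a (rows) and b (columns) with a linear gap penalty.

    Assumption: `gap` is subtracted once per gap position.
    Returns (best score, aligned a, aligned b, score matrix, pointer matrix).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]          # scores
    ptr = [[None] * cols for _ in range(rows)]     # 'D', 'U', 'L', or None (stop)
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (0, None),                     # start a new local alignment
                (H[i - 1][j - 1] + s, 'D'),    # match or mismatch
                (H[i - 1][j] - gap, 'U'),      # gap in b
                (H[i][j - 1] - gap, 'L'),      # gap in a
            ]
            # max() keeps the first maximal candidate, so a score of 0 always stops.
            H[i][j], ptr[i][j] = max(candidates, key=lambda c: c[0])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a stop pointer is reached.
    i, j = best_pos
    top, bottom = [], []
    while ptr[i][j] is not None:
        if ptr[i][j] == 'D':
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif ptr[i][j] == 'U':
            top.append(a[i - 1]); bottom.append('-'); i -= 1
        else:
            top.append('-'); bottom.append(b[j - 1]); j -= 1
    return best, ''.join(reversed(top)), ''.join(reversed(bottom)), H, ptr


score, aln1, aln2, H, ptr = smith_waterman("AGAGCTCACAA", "AGTAGCTTCCAAA")
print(score)
print(aln1)
print(aln2)
```

If several cells share the maximum score, or a cell has more than one optimal predecessor, this sketch reports only one of the equally good local alignments; a hand solution may legitimately show a different one with the same total score.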
Problem 3 (40 pts)
For the alignment shown below, derive the log-odds score for column 1 only by
a) computing the observed probability values
b) computing the expected probability values
TAGCTT
AAGCTC
T-GGTT
TGGCAT
TACCTT
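
The arithmetic for a single column can be sketched as follows. This is an illustration under stated assumptions, not the required method: the problem does not say which background model defines the expected probabilities, so the sketch offers two options (a uniform 0.25 per nucleotide, or the nucleotide composition of the whole alignment). It also assumes base-2 logarithms and that gap characters are ignored. The helper names column_log_odds and alignment_background are hypothetical.

```python
import math
from collections import Counter

alignment = ["TAGCTT", "AAGCTC", "T-GGTT", "TGGCAT", "TACCTT"]


def alignment_background(alignment):
    """Expected probabilities taken from the composition of the whole alignment
    (one possible assumption); gap characters are ignored."""
    counts = Counter(ch for seq in alignment for ch in seq if ch != '-')
    total = sum(counts.values())
    return {nt: counts[nt] / total for nt in "ACGT"}


def column_log_odds(alignment, col, background=None, base=2):
    """Log-odds log(observed / expected) for each nucleotide seen in one column.

    `col` is 0-based, so column 1 of the problem is col=0.
    If no background is given, a uniform 0.25 per nucleotide is assumed.
    Nucleotides that do not occur in the column are omitted, because the
    logarithm of an observed probability of zero is undefined.
    """
    column = [seq[col] for seq in alignment if seq[col] != '-']
    counts = Counter(column)
    n = len(column)
    if background is None:
        background = {nt: 0.25 for nt in "ACGT"}
    return {nt: math.log((c / n) / background[nt], base) for nt, c in counts.items()}


print(column_log_odds(alignment, 0))                                   # uniform background
print(column_log_odds(alignment, 0, alignment_background(alignment)))  # alignment composition
```

The two calls generally give different scores for the same column, which is why the choice of expected probabilities should be stated explicitly in the written answer.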
Problem 4 (30 pts)
Using the first-order models shown in the homework 3 solution, determine the probability of
a coding region in frame two for the DNA fragment AGTAGCTTCCAG. Use only the parameters
provided in the homework solution (posted in course content). Show all of your work.
Problem 5 (60 pts)
For the zero-order hidden Markov model defined in homework 3, determine the probability of
the coding state at the last nucleotide of the sequence AGTAG. Use the parameters provided in
the homework solution (posted in course content). Show all of your work.
Problem 6 (10 pts)
The p-value and E-value are used to assess the significance of an alignment. Can you think of
additional ways of evaluating the strength of an alignment, other than the bit score?
Problem 7 (30 pts)
Provide detailed reasoning for the following statement: for comparison/alignment of
closely related species, BLOSUM80 is a better choice than BLOSUM62.
Problem 8 (10 pts)
In HW4 we were given an aligned block of sequences to derive model parameters for a profile
HMM. We used observed counts to accomplish this task. What is (are) the possible
limitation(s) of this approach? What would you suggest as solution(s)?