Assignment title: Information


Problem 1 (10 pts) Would the solution to Problem 1.17 in Borodovsky and Ekisheva change if you had the following additional information? There are twice as many boxes with 6-base cutters as with 4-base cutters, arranged in no particular order, in the absent-minded researcher's lab. If the answer is yes, what are the posterior probabilities?

Problem 2 (10 pts) Align the two sequences shown below using the Smith-Waterman algorithm. Use a match score of 4, a mismatch score of -4, and a gap penalty of 2. Show:
a) the dynamic programming matrix with scores (as shown in Figure 2.6 in Durbin et al.)
b) the trace-back pointers
c) the total score

sequence 1: AGAGCTCACAA
sequence 2: AGTAGCTTCCAAA

Note that you applied the Needleman-Wunsch algorithm to these sequences in homework 1.

Problem 3 (40 pts) For the alignment shown below, derive the log-odds score for column 1 only by
a) computing the observed probability values
b) computing the expected probability values

TAGCTT
AAGCTC
T-GGTT
TGGCAT
TACCTT

Problem 4 (30 pts) Using the first-order models shown in the homework 3 solution, determine the probability of a coding region in frame two for the DNA fragment AGTAGCTTCCAG. Use only the parameters provided in the homework solution (posted in course content). Show all of your work.

Problem 5 (60 pts) For the zero-order hidden Markov model defined in homework 3, determine the probability of the coding state at the last nucleotide of the sequence AGTAG. Use the parameters provided in the homework solution (posted in course content). Show all of your work.

Problem 6 (10 pts) The p-value and E-value are used to assess the significance of an alignment. Can you think of additional ways of evaluating the strength of an alignment other than the bit score?

Problem 7 (30 pts) Provide detailed reasoning for the following statement: for comparing and aligning sequences from closely related species, BLOSUM80 is a better choice than BLOSUM62.

Problem 8 (10 pts) In HW4 we were given an aligned block of sequences from which to derive the model parameters for a profile HMM. We used observed counts to accomplish this task. What is (are) the possible limitation(s) of this approach? What would you suggest as solution(s)?
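For Problem 1, the extra information is a prior: with twice as many 6-base-cutter boxes, P(6-cutter) = 2/3 and P(4-cutter) = 1/3. The sketch below only shows how such a prior enters Bayes' rule. The likelihood values are hypothetical placeholders, not the ones from Problem 1.17, and the labels "6-cutter" and "4-cutter" are names chosen for illustration.

```python
from fractions import Fraction


def posteriors(priors, likelihoods):
    """Bayes' rule: P(H_i | D) = P(D | H_i) P(H_i) / sum_j P(D | H_j) P(H_j)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())  # P(D), the normalizing constant
    return {h: p / total for h, p in joint.items()}


# Prior stated in Problem 1: twice as many 6-base-cutter boxes as 4-base-cutter boxes.
priors = {"6-cutter": Fraction(2, 3), "4-cutter": Fraction(1, 3)}

# HYPOTHETICAL likelihoods P(data | box type); substitute the values derived in Problem 1.17.
likelihoods = {"6-cutter": Fraction(1, 4), "4-cutter": Fraction(3, 4)}

print(posteriors(priors, likelihoods))
```

Exact fractions keep each hand-calculation step checkable; with these placeholder likelihoods the posteriors come out to 2/5 and 3/5.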
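For Problem 2, here is a minimal Smith-Waterman sketch. It assumes "gap penalty of 2" means a linear penalty that subtracts 2 for each gap position, which matches the linear-gap setting of Durbin et al. Figure 2.6. It fills the score matrix, records one trace-back pointer per cell (ties prefer stopping at 0, then diagonal, up, left), and traces back from the highest-scoring cell. It is meant for checking hand work, not as the required answer format.

```python
def smith_waterman(s1, s2, match=4, mismatch=-4, gap=2):
    """Local alignment with a linear gap penalty (each gap position costs `gap`)."""
    n, m = len(s1), len(s2)
    # F[i][j]: best score of a local alignment ending at s1[i-1] and s2[j-1]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if s1[i - 1] == s2[j - 1] else mismatch
            options = [
                (0, None),                      # start a new local alignment
                (F[i - 1][j - 1] + s, "diag"),  # align s1[i-1] with s2[j-1]
                (F[i - 1][j] - gap, "up"),      # s1[i-1] against a gap
                (F[i][j - 1] - gap, "left"),    # s2[j-1] against a gap
            ]
            # max() keeps the first option among ties, so a 0 score stops the trace-back
            F[i][j], ptr[i][j] = max(options, key=lambda t: t[0])
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)
    # Trace back from the highest-scoring cell until a cell with no pointer.
    a1, a2 = [], []
    i, j = best_pos
    while ptr[i][j] is not None:
        p = ptr[i][j]
        if p == "diag":
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i, j = i - 1, j - 1
        elif p == "up":
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2)), F, ptr


score, top, bottom, F, ptr = smith_waterman("AGAGCTCACAA", "AGTAGCTTCCAAA")
print("total score:", score)
print(top)
print(bottom)
for row in F:
    print(" ".join(f"{v:3d}" for v in row))
```

When several cells share the maximum score, or pointers tie, other equally optimal local alignments exist. The sketch reports only one of them.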
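For Problem 3, the sketch below computes observed probabilities for column 1 and expected (background) probabilities. Two choices here are assumptions, not given in the problem. First, the expected probabilities are taken as nucleotide frequencies over all non-gap positions of the alignment; a uniform 1/4 is another common choice. Second, the column score is the sum over sequences of log2(p(x) / q(x)).

```python
import math
from collections import Counter
from fractions import Fraction

alignment = ["TAGCTT", "AAGCTC", "T-GGTT", "TGGCAT", "TACCTT"]


def observed_probs(column):
    """Relative frequency of each residue in one column (gaps ignored)."""
    counts = Counter(c for c in column if c != "-")
    total = sum(counts.values())
    return {b: Fraction(k, total) for b, k in counts.items()}


def background_probs(seqs):
    """ASSUMPTION: expected probabilities = residue frequencies over the whole alignment."""
    counts = Counter(c for s in seqs for c in s if c != "-")
    total = sum(counts.values())
    return {b: Fraction(k, total) for b, k in counts.items()}


col1 = [s[0] for s in alignment]
p = observed_probs(col1)
q = background_probs(alignment)
# Column log-odds score: sum over the sequences of log2(observed / expected).
score = sum(math.log2(float(p[x] / q[x])) for x in col1 if x != "-")

print("observed:", p)
print("expected:", q)
print("log-odds score (bits):", round(score, 4))
```

Under these assumptions, column 1 (T, A, T, T, T) gives p(T) = 4/5 and p(A) = 1/5. The 29 non-gap residues give q(T) = 12/29 and q(A) = 5/29.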
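For Problem 4, the homework 3 parameters are not reproduced here. The sketch below therefore shows only the bookkeeping for a three-periodic (inhomogeneous) first-order Markov chain, using uniform placeholder parameters that are hypothetical. It also assumes that "frame two" means the first nucleotide sits at codon position 2 (index 1, 0-based); check this against the course's frame convention. It returns a log probability so that longer fragments do not underflow. If the problem asks for a posterior over frames and non-coding, combine these likelihoods with priors via Bayes' rule.

```python
import math

BASES = "ACGT"


def coding_log_prob(seq, init, trans, start_phase):
    """log P(seq) under a 3-periodic first-order Markov chain.

    init[phase][x]       : P(first nucleotide = x) when it sits at codon position `phase`
    trans[phase][(a, b)] : P(b | previous a) when b sits at codon position `phase`
    start_phase          : codon position (0, 1, 2) of seq[0]
    """
    phase = start_phase
    logp = math.log(init[phase][seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        phase = (phase + 1) % 3  # codon position advances with every nucleotide
        logp += math.log(trans[phase][(prev, cur)])
    return logp


# HYPOTHETICAL uniform parameters; replace with the homework 3 solution tables.
uniform_init = [{b: 0.25 for b in BASES} for _ in range(3)]
uniform_trans = [{(a, b): 0.25 for a in BASES for b in BASES} for _ in range(3)]

logp = coding_log_prob("AGTAGCTTCCAG", uniform_init, uniform_trans, start_phase=1)
print(math.exp(logp))  # with uniform placeholders this equals 0.25 ** 12 up to rounding
```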
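For Problem 5, the probability of the coding state at the last nucleotide can be read from the forward algorithm. At the final position the backward variable equals 1, so P(state_L = k | x) = f_k(L) / P(x). The sketch assumes a two-state model with no explicit end state. The state names and every numeric parameter below are hypothetical placeholders for the homework 3 values.

```python
def last_state_posterior(seq, states, start, trans, emit):
    """Forward algorithm; returns P(state at the last position = k | seq) for each k.

    Zero-order states: each state emits nucleotides independently via emit[k][x].
    """
    f = {k: start[k] * emit[k][seq[0]] for k in states}
    for x in seq[1:]:
        f = {l: emit[l][x] * sum(f[k] * trans[k][l] for k in states) for l in states}
    total = sum(f.values())  # P(seq), assuming no explicit end state
    return {k: f[k] / total for k in states}


# HYPOTHETICAL parameters; replace with the homework 3 solution values.
states = ("coding", "noncoding")
start = {"coding": 0.5, "noncoding": 0.5}
trans = {"coding": {"coding": 0.9, "noncoding": 0.1},
         "noncoding": {"coding": 0.1, "noncoding": 0.9}}
emit = {"coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
        "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

print(last_state_posterior("AGTAG", states, start, trans, emit)["coding"])
```

Writing out each forward column f_k(i) by hand, as in this loop, is the "show all of your work" part.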
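For Problem 6, it can help to recall how the three quantities it names are linked in Karlin-Altschul statistics. The bit score is S' = (λS - ln K) / ln 2. The E-value is E = m n 2^(-S'), where m and n are the effective query and database lengths. The p-value is P = 1 - e^(-E). The sketch applies these relations. The lengths in the usage lines are hypothetical.

```python
import math


def evalue_from_bits(bit_score, m, n):
    """Expected number of chance hits with bit score >= bit_score: E = m * n * 2**(-S')."""
    return m * n * 2.0 ** (-bit_score)


def pvalue_from_evalue(e):
    """Chance hits are ~Poisson(E), so P(at least one) = 1 - exp(-E); expm1 keeps small E precise."""
    return -math.expm1(-e)


m, n = 250, 50_000_000  # HYPOTHETICAL effective query and database lengths
for bits in (30.0, 40.0, 50.0):
    e = evalue_from_bits(bits, m, n)
    print(f"bits={bits:.0f}  E={e:.3g}  P={pvalue_from_evalue(e):.3g}")
```

For small E the p-value and E-value are nearly equal. They diverge only when E approaches 1 or more.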