Assignment title: Management

Homework 2

1. Define zero- and first-order Markov models for the sequence (sequence1_A2) provided in the course content. Sequence1_A2 is the Mycobacterium tuberculosis gene mtb48. (15 pts)

Hints:

- A zero-order Markov model is defined by P(i), where i ∈ {A, T, G, C}.

- A first-order Markov model is defined by P(i|j), where i, j ∈ {A, T, G, C}. For example, P(A|T) is the probability of observing A after T in the DNA sequence.

- For this and higher-order Markov models, read Section 3.2.1 of Borodovsky and Ekisheva.

- The easiest way to implement this is to write a small script in R using the alphabetFrequency function of the Biostrings package, which you have already installed, or in Perl or any other language of your choice. Otherwise, if you have exhausted all other options, see no other way, and are hopelessly behind schedule, you can use Excel's SUBSTITUTE function or Microsoft Word's find/replace.
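As an illustration of the hints above, here is a minimal Python sketch (rather than R) that estimates both models from relative frequencies and then multiplies table entries to score a fragment. The function names and the toy sequence are hypothetical and are not part of the course material. The first-order score takes the probability of the first nucleotide from the zero-order model, which is one common convention and an assumption here.

```python
from collections import Counter

ALPHABET = "ATGC"

def zero_order_model(seq):
    """Estimate P(i) as the relative frequency of each nucleotide i."""
    counts = Counter(seq)
    total = sum(counts[b] for b in ALPHABET)
    return {b: counts[b] / total for b in ALPHABET}

def first_order_model(seq):
    """Estimate P(i|j): probability of observing i right after j.

    The result is indexed as model[j][i], i.e. model[previous][next].
    """
    pair_counts = Counter(zip(seq, seq[1:]))  # keys are (previous, next)
    model = {}
    for prev in ALPHABET:
        row_total = sum(pair_counts[(prev, nxt)] for nxt in ALPHABET)
        model[prev] = {
            nxt: (pair_counts[(prev, nxt)] / row_total if row_total else 0.0)
            for nxt in ALPHABET
        }
    return model

def fragment_prob_zero(fragment, p0):
    """Probability of a fragment when every position is independent."""
    prob = 1.0
    for base in fragment:
        prob *= p0[base]
    return prob

def fragment_prob_first(fragment, p0, p1):
    """Probability of a fragment under the first-order model.

    Assumption: the first nucleotide is scored with the zero-order P(i).
    """
    prob = p0[fragment[0]]
    for prev, nxt in zip(fragment, fragment[1:]):
        prob *= p1[prev][nxt]
    return prob

# Toy sequence for illustration only; it is NOT sequence1_A2.
toy = "ATGCGATTAGCCGATAGCTTGCA"
p0 = zero_order_model(toy)
p1 = first_order_model(toy)
print("P(i):", p0)
print("P(A|T):", p1["T"]["A"])
print("zero-order P(ATG):", fragment_prob_zero("ATG", p0))
print("first-order P(ATG):", fragment_prob_first("ATG", p0, p1))
```

For long fragments the product of many small probabilities can underflow, so summing logarithms instead of multiplying is a reasonable alternative.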

2. Using the models you derived in (1), determine the probability of the DNA fragment AGTAGCTTCCAG (this fragment was also used in A1). (25 pts)

3. Given the hidden Markov model framework: (10 pts)

a. What is hidden?

b. What is emitted?

Feel free to use examples
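One possible example is a small generative sketch: it walks through a chain of states and, at each step, draws one nucleotide from a table that depends on the current state, returning the two strings separately. It is only an illustration. The transition values are borrowed from item 4, while the emission tables and the function name are hypothetical placeholders.

```python
import random

STATES = ("c", "n")  # c: coding, n: non-coding
START = {"c": 0.5, "n": 0.5}
# TRANSITIONS[i][j] is the probability of moving from state i to state j.
TRANSITIONS = {"c": {"c": 0.55, "n": 0.45}, "n": {"c": 0.5, "n": 0.5}}
# Placeholder per-state nucleotide probabilities (not estimated from real data).
EMISSIONS = {
    "c": {"A": 0.2, "T": 0.2, "G": 0.3, "C": 0.3},
    "n": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
}

def sample_hmm(length, seed=0):
    """Generate a state path and the nucleotide sequence produced along it."""
    rng = random.Random(seed)
    path, seq = [], []
    state = rng.choices(STATES, weights=[START[s] for s in STATES])[0]
    for _ in range(length):
        path.append(state)
        bases = list(EMISSIONS[state])
        weights = [EMISSIONS[state][b] for b in bases]
        seq.append(rng.choices(bases, weights=weights)[0])
        # Choose the state for the next position.
        state = rng.choices(STATES, weights=[TRANSITIONS[state][s] for s in STATES])[0]
    return "".join(path), "".join(seq)

path, seq = sample_hmm(15, seed=1)
print("state path:", path)
print("nucleotides:", seq)
```

Comparing the two printed strings with what a sequencing experiment actually delivers is a useful starting point for parts a and b.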

4. a) Define a zero-order Markov model for sequence2_A2, which represents a portion of the non-coding sequence of Mycobacterium tuberculosis (refer to the course content). (5 pts)

b) Use the zero-order Markov models defined for sequence1_A2 and sequence2_A2 and apply the Viterbi algorithm to find the most likely path for the sequence CGCGTTCATTCAATG in frame 1 only. (45 pts)

Assume the following initial and transition probabilities:

a0c = a0n = 0.5

ann = anc = 0.5

acc = 0.55, acn = 0.45

where aij is the transition probability from state i to state j (c: coding, n: non-coding, 0: start).

USE THE COMPLEMENTARY EXCEL FILE TO FILL IN YOUR VITERBI RECURSION. Check out the comments in cells D2, D6, F2, and F6.

Note that this problem is for exercise purposes. As a result, for this short sequence you may observe even shorter coding/non-coding regions.
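To cross-check a filled-in spreadsheet, the recursion can also be sketched in Python. This is a minimal log-space Viterbi under stated assumptions: the initial and transition probabilities are the ones given above, the emission tables are hypothetical placeholders that should be replaced by the zero-order models of sequence1_A2 (coding) and sequence2_A2 (non-coding), and each nucleotide is treated as one emission. Whether the spreadsheet groups positions differently for "frame 1" is not stated here, so treat that as an assumption too.

```python
from math import log

STATES = ("c", "n")  # c: coding, n: non-coding
START = {"c": 0.5, "n": 0.5}  # a0c, a0n
# TRANS[i][j] = a_ij, taken from the assignment text.
TRANS = {"c": {"c": 0.55, "n": 0.45}, "n": {"c": 0.5, "n": 0.5}}
# PLACEHOLDER emissions: substitute the zero-order models you derived.
EMIT = {
    "c": {"A": 0.2, "T": 0.2, "G": 0.3, "C": 0.3},
    "n": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
}

def viterbi(seq, start=START, trans=TRANS, emit=EMIT):
    """Return (most likely state path, its log-probability) for seq."""
    # v[s]: log-probability of the best path that ends in state s
    # at the current position.
    v = {s: log(start[s]) + log(emit[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_v, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position.
            best_prev = max(STATES, key=lambda r: v[r] + log(trans[r][s]))
            ptr[s] = best_prev
            new_v[s] = v[best_prev] + log(trans[best_prev][s]) + log(emit[s][base])
        backpointers.append(ptr)
        v = new_v
    # Trace back from the best final state.
    last = max(STATES, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return "".join(path), v[last]

path, log_prob = viterbi("CGCGTTCATTCAATG")
print("path:    ", path)
print("log prob:", log_prob)
```

Working in logarithms turns the products of the recursion into sums, which avoids underflow; exponentiating a column of values recovers the probabilities a spreadsheet would hold if it multiplies directly.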