Assignment title: Information


CSE5BIO Bioinformatics Technologies
Assignment Two
(Presentations will be in Week 11, 17th May, and Week 12, 24th May, in Lecture and Practical time)
Assignment due: Thursday 9:30am, 26th May 2016
20% of Total Subject Grade

Plagiarism Note: As part of this evaluation, student assignments may be submitted to turnitin for checking. This software checks not only for copying from another student but also for copying from web sites. Any plagiarism detected by the turnitin software will be pursued according to the Faculty's regulations for plagiarism.

Part I

1) Complete Biological Central Dogma (5 Marks)

Zika sequence containing a gene:

   1 ctgttgctgc ttcagactgc gacagttcga gtttgaagcg aaagctagca acagtatcaa
  61 caggttttat ttggatttgg aaacgagagt ttctggtcat gaaaaaccca aaaaagaaat
 121 ccggaggatt ccggattgtc aatatgctaa aacgcggagt agcccgtgtg agcccctttg
 181 ggggcttgaa gaggctgcca gccggacttc tgctgggtca tgggcccatc aggatggtct
 241 tggcgattct agcctttttg agattcacgg caatcaagcc atcactgggt ctcatcaata
 301 gatggggttc agtggggaaa aaagaggcta tggaaataat aaagaagttc aagaaagatc
 361 tggctgccat gctgagaata atcaatgcta ggaaggagaa gaagagacga ggcgcagaaa
 421 ctagtgtcgg aattgttggc ctcctgctga ccacagctat ggcagcggag gtcactagac
 481 gtgggagtgc atactatatg tacttggaca gaaacgatgc tggggaggcc atatcttttc
 541 caaccacatt ggggatgaat aagtgttata tacagatcat ggatcttgga cacatgtgtg
 601 atgccaccat gagctatgaa tgccctatgc tggatgaggg ggtggaacca gatgacgtcg
 661 attgttggtg caacacgacg tcaacttggg ttgtgtacgg aacctgccat cacaaaaaag
 721 gtgaagcacg gagatctaga agagccgtga cgctcccctc ccattccact aggaagctgc
 781 aaacgcggtc gcaaacctgg ttggaatcaa gagaatacac aaagcacttg attagagtcg
 841 aaaattggat attcaggaac cctggtttcg ctttagcagc agctgccatc gcgtggcttt
 901 tgggaagctc aacgagccaa aaagtcatat acttggtcat gatactgctg attgccccgg
 961 catacagcat caggtgcata ggagtcagca atagggactt tgtggaaggt atgtcaggtg
1021 ggacttgggt tgatgttgtc ttggaacatg gaggttgtgt caccgtaatg gcacaggaca
1081 aaccgactgt cgacatagag ctggttacaa caacagtcag caacatggcg gaggtaagat
1141 cctactgcta tgaggcatca atatcagaca tggcttcgcc cagccgctgc ccaacacaag
1201 ccgctgccta ccttgacaag caatcagaca ctcaatatgt ctgcaaaaga acgttagtgg
1261 acagcgactg gggtt

a) A gene exists in this sequence. Use the translate tool at http://au.expasy.org/tools/dna.html to find the most likely protein product.
   i. Paste the protein sequence into your assignment and indicate which frame you used.
   ii. Why are there 6 possible frames?
   Hint: the longest sequence Met -> Stop will be the correct outcome.
b) Using the translator, you have found the start and end of the gene. Trim the nucleotide sequence to show just the coding sequence.
   i. How many nucleotides were removed from the front or end of the above sequence?
c) Using the nucleotide sequence, transcribe the first 21 nucleotides into RNA, and then translate that into amino acids.
d) Paste the nucleotide sequence of the whole gene into NCBI BLAST and perform a BLASTX search against the SwissProt database.
   i. Which organism did the sequence most closely match?
   ii. Give a brief summary of the function of the closely matching protein.

2) Biological Sequence Alignment (5 marks)

a) Calculate the alignment score for different alignments and indicate the best possible alignment for the following sequences:
   ACCTAGCTAGCCGAT
   ACCCCTAGGCGAAA
   Use Match: +1, Mismatch: -1 and Indel: -2
   i. Show all possible alignments and their scores.
   ii. Which is the best possible alignment? State the reason.
b) Carry out a Multiple Sequence Alignment (MSA) for the protein Ras 1. Use the UniProt (http://www.uniprot.org/) database for retrieving the protein sequence of keratin type2. The protein sequences from the following organisms must be included in your MSA.
   1) Homo sapiens (Human)
   2) Drosophila melanogaster (Fruit Fly)
   3) Candida albicans (Yeast)
   4) Mus musculus (Mouse)
   5) Gallus gallus (Chicken)
   Notes: Use only the reviewed sequences from the database.
   This can be done by selecting 'Reviewed' under the heading 'Filter by' at the left-hand corner of the results page in UniProt. Use Clustal Omega for the MSA (http://www.ebi.ac.uk/Tools/msa/clustalo/).
   i. Provide the FASTA sequences retrieved in your report (5 sequences retrieved from UniProt).
   ii. Show the coloured alignment obtained from ClustalO in your report.
   iii. What do you think about the alignment? Give a brief discussion of the alignment results.
   iv. Display the phylogenetic tree and discuss the relationship.

3) Hidden Markov Model (5 marks)

Consider the following HMM, which models an intronic sequence with GC-rich regions. The model consists of 9 states. The states and transition probabilities are indicated in the following diagram:

Fig. 1. The states and transition probabilities.

The emission probabilities for the non-silent states are as follows:

Base  S1  S2  S3    S4    S5    S6  S7
A     0   0   0.35  0.50  0.50  0   0
C     0   0   0.25  0.18  0.44  1   1
G     1   0   0.30  0.19  0.33  0   0
T     0   1   0.10  0.13  0.03  0   0

Please answer the following questions:
a) What is the probability of observing GTGGTA along the state path p = START-S1-S2-S4-S5--S7-END?
b) What is the probability of seeing GTCGC, given the current HMM? (Show the steps of your calculation.)
c) What is the probability of seeing GCTCGT, given the current HMM? State your observation.

4) Microarray (2 marks)

a) Explain why some of the dots were red, some yellow, and some green in Fig. 2.
b) Find gene expression using the MagicTool. You can either download microarray data in the form of TIFF files from any website or use the data provided. An example website for downloading a TIFF dataset: http://www.bio.davidson.edu/projects/magic/magic.html

5) Next Generation Sequencing (3 marks)

Using Galaxy, find the highest number of SNPs in chromosome 9.
a) Display the results acquired after sorting, showing the SNP count, and state the highest number of SNPs per exon.
b) Select the top 5 exons with the highest number of SNPs and display the result.
c) Also provide screenshots of the whole process, as shown in the tutorial.

Part II: Presentation of a given research topic (6 marks for slide content + 4 marks for innovative solution + 5 marks for presentation + 5 marks for group work)

Groups of three; each presentation is 8 minutes long and will be given in the Week 11 and 12 Lecture and Practical times, 17th and 24th May 2016. Students must attend the whole presentation sessions.

• Choose ONE of the 8 research topics given below.
• Each topic is allocated to ONLY ONE group on a first-come, first-served basis.
• If your topic has already been chosen by another group, you will be asked to choose another topic.
• You MUST email your group members and topic by 3rd May 2016 to [email protected]

LIST OF 8 TOPICS
1. Next generation sequencing data analysis: current development.
2. Personalised medicine and Bioinformatics.
3. Role of bioinformatics in Drug Design, Discovery and Development.
4. RNA secondary structure prediction using Bioinformatics tools.
5. Translational bioinformatics.
6. Select one of the recent hot topics in proteomics.
7. Current challenges in Bioinformatics.
8. Insights into disease using Bioinformatics.

Submission requirements:
An electronic submission of your answers (*.doc or *.docx).
An electronic copy of your presentation (*.PPT).
Oral Presentation in weeks 11 and 12 of semester.
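
Note on the scoring scheme in Part I, Question 2(a): the Match +1, Mismatch -1, Indel -2 rule is applied column by column to an alignment that has already been written out with gap characters. The following is a minimal sketch of that bookkeeping only, not a prescribed method. The function name `alignment_score`, the use of `-` as the gap symbol, and the toy sequences are assumptions made for illustration and are not the assignment sequences.

```python
def alignment_score(top, bottom, match=1, mismatch=-1, indel=-2):
    """Score two already-aligned sequences of equal length.

    '-' marks a gap (an assumption of this sketch). Each column adds
    `match`, `mismatch`, or `indel` to the running total.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            score += indel      # gap against a base
        elif a == b:
            score += match      # identical bases
        else:
            score += mismatch   # different bases
    return score


# Toy sequences (hypothetical, not taken from the assignment).
print(alignment_score("ACGTA", "ACTTA"))    # 4 matches, 1 mismatch -> 3
print(alignment_score("ACGT-A", "AC-TTA"))  # 4 matches, 2 indels   -> 0
```

Comparing the two printed scores shows why introducing gaps is only worthwhile when it recovers enough matches to offset the indel penalty.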
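
Note on Part I, Question 1(c): transcribing a coding-strand DNA fragment into RNA and translating it codon by codon can be sketched as below. This is an illustrative sketch that assumes the standard genetic code and reading frame 1; the helper names `transcribe` and `translate` and the toy sequence are hypothetical and are not taken from the assignment sequence.

```python
from itertools import product

# Standard genetic code, listed with the bases in T, C, A, G order
# (first base varies slowest, third base fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build an RNA codon table (U instead of T) from the two strings above.
CODON_TABLE = {
    "".join(codon).replace("T", "U"): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Convert a coding-strand DNA string into mRNA (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate mRNA in frame 1; trailing bases that do not fill a codon are ignored."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


# Toy 12-nucleotide fragment (hypothetical).
toy_dna = "atggcctgttaa"
toy_rna = transcribe(toy_dna)
print(toy_rna)             # AUGGCCUGUUAA
print(translate(toy_rna))  # MAC*
```

The same two steps apply to any fragment whose length is a multiple of three, such as a 21-nucleotide stretch, which yields seven codons.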