ITECH 2201 Cloud Computing
School of Science, Information Technology & Engineering
Workbook for Week 6 (Big Data)

Please note: Every effort was made to ensure that the given web links are accessible. However, if any are broken, please use an appropriate video or article and reference it in your answer.

Part A (4 Marks)

Exercise 1: Data Science (1 mark)

Read the article at http://datascience.berkeley.edu/about/what-is-data-science/ and answer the following:

What is Data Science?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

According to IBM's estimate, what percentage of the data in the world today was created in the past two years?
________________________________________________________________________

What is the value of petabyte storage?
________________________________________________________________________

For each course, both foundation and advanced, listed at http://datascience.berkeley.edu/academics/curriculum/, briefly state (in 2 to 3 lines) what it offers. Base your answer on the given course description as well as on the video. The purpose of this question is to understand the different streams available in Data Science.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Exercise 2: Characteristics of Big Data (2 marks)

Read the following research paper from the IEEE Xplore Digital Library:

Ali-ud-din Khan, M.; Uddin, M.F.; Gupta, N., "Seven V's of Big Data understanding Big Data to extract value," 2014 Zone 1 Conference of the American Society for Engineering Education (ASEE Zone 1), pp. 1-5, 3-5 April 2014

and answer the following questions:

Summarise the motivation of the authors (in one paragraph).
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

What are the seven V's mentioned in the paper? Briefly describe each V in one paragraph.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Explore the authors' future work by using reference [4] in the research paper. Summarise, in 300 words, your understanding of how Big Data can improve the healthcare sector.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Exercise 3: Big Data Platform (1 mark)

To build a big data platform, one has to acquire, organize and analyse the big data. Go through the following links and answer the questions that follow them:

Check the videos and change the wordings
− http://www.infochimps.com/infochimps-cloud/how-it-works/
− http://www.youtube.com/watch?v=TfuhuA_uaho
− http://www.youtube.com/watch?v=IC6jVRO2Hq4
− http://www.youtube.com/watch?v=2yf_jrBhz5w

Please note: You are encouraged to watch all the videos in the series from Oracle.

How can enterprises acquire big data, and how can it be used?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

How can big data be organized and handled?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

What analyses can be performed using big data?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Part B (4 Marks)

Part B answers should be based on well-cited articles or videos; name the references used in your answer. For more information, read the guidelines given in Assignment 1.

Exercise 4: Big Data Products (1 mark)

Google is a master at creating data products. Below are a few examples from Google. Describe each product and explain how large-scale data is used effectively in it.

a. Google’s PageRank
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

b. Google’s Spell Checker
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

c. Google’s Flu Trends
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

d. Google’s Trends
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Like Google, Facebook and LinkedIn also use large-scale data effectively. How?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Exercise 5: Big Data Tools (2 marks)

Briefly explain why a traditional relational database (RDBMS) cannot be used effectively to store big data.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

What is a NoSQL database?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Name and briefly describe at least 5 NoSQL databases.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

What is MapReduce and how does it work?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Briefly describe some notable MapReduce products (at least 5).
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Amazon’s S3 service lets users store large chunks of data online. List 5 features of Amazon’s S3 service.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Getting concise, valuable information from a sea of data can be challenging. We need statistical analysis tools to deal with Big Data. Name and describe some (at least 3) statistical analysis tools.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Exercise 6: Big Data Application (1 mark)

Name 3 industries that should use Big Data. Justify your claim in 250 words for each industry, using proper references.
________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________