Assignment title: Information


Programming for Data Analytics
Practical Test 1, Sem. 2, 2015-2016

For this Practical Test you are required to process a set of data using the Virtual Machine Hadoop MapReduce environments that you have established on your laptops. You must complete each of the following tasks:

1) Generate 10,000 lines of random data using the gendata.sh bash script. This script is available for download on Moodle. The script takes two parameters: the first parameter is the number of lines of random data to generate, the second parameter is the name of the file to which the data is written. Each row of output contains the following types of data:

   Field     Type        Description
   Field 1   Integer     Row counter
   Field 2   Date-Time   Timestamp of when the row of data was created
   Field 3   Character   Single uppercase character [A-Z] representing the record category
   Field 4   String      A user name string. This field can contain one of three messages.
   Field 5   String      A group name string. This field can contain one of two messages.
   Field 6   Integer     A random integer [0-255]
   Field 7   String      A string of randomly generated alphanumeric characters

2) MapReduce Java Programming Task 1: Process the data you have generated from step 1) to find the average of Field 6 values grouped by Field 3 values.

3) MapReduce Java Programming Task 2: Process the data you have generated from step 1) to find all distinct combinations of Field 2, Field 4, and Field 5 values.

NOTE:
i. Create a separate Eclipse project for each of the programming tasks.
ii. For each of the programming tasks you should create appropriate classes and methods. You must at a minimum include the following methods: a main driver method, a map method, a reduce method.
iii. For your submission you are required to submit a) the source code for each project; b) the result of each analysis; c) the sample data that you processed.
iv. You should zip up all the required elements into a single file for submission via the Moodle submission link.
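
The map and reduce logic behind tasks 2) and 3) can be sketched in plain Java before it is moved into Hadoop classes. The sketch below is illustrative only and is not the required solution: it uses no Hadoop API, the class and method names (PracticalTestSketch, mapCategoryValue, reduceAverage, mapCombinationKey) are hypothetical, and it assumes the fields are comma-separated, which should be checked against the actual output of gendata.sh. The sample rows in main are made up for demonstration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java sketch of the map/reduce logic for both tasks (no Hadoop dependency).
final class PracticalTestSketch {

    // ASSUMPTION: fields are comma-separated; verify against real gendata.sh output.
    static final String DELIMITER = ",";

    // Task 1 "map" step: emit (Field 3, Field 6) for one input line.
    static Map.Entry<String, Integer> mapCategoryValue(String line) {
        String[] fields = line.split(DELIMITER);
        return new SimpleEntry<>(fields[2].trim(), Integer.parseInt(fields[5].trim()));
    }

    // Task 1 "reduce" step: average all Field 6 values collected for one category.
    static double reduceAverage(List<Integer> values) {
        long sum = 0;
        for (int value : values) {
            sum += value;
        }
        return (double) sum / values.size();
    }

    // Task 2 "map" step: build a composite key from Fields 2, 4 and 5.
    static String mapCombinationKey(String line) {
        String[] fields = line.split(DELIMITER);
        return fields[1].trim() + DELIMITER + fields[3].trim() + DELIMITER + fields[4].trim();
    }

    // Simulates map, shuffle (group by key) and reduce for Task 1.
    static Map<String, Double> averageByCategory(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = mapCategoryValue(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            averages.put(entry.getKey(), reduceAverage(entry.getValue()));
        }
        return averages;
    }

    // Simulates Task 2: grouping by the composite key leaves one entry per distinct combination.
    static Set<String> distinctCombinations(List<String> lines) {
        Set<String> keys = new TreeSet<>();
        for (String line : lines) {
            keys.add(mapCombinationKey(line));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Made-up sample rows in the assumed comma-separated layout.
        List<String> lines = Arrays.asList(
            "1,2016-02-01 10:00:00,A,alice,staff,10,x1y2",
            "2,2016-02-01 10:00:00,A,alice,staff,20,z3w4",
            "3,2016-02-01 10:00:01,B,bob,students,7,q5r6");
        System.out.println(averageByCategory(lines));
        System.out.println(distinctCombinations(lines));
    }
}
```

In an actual Hadoop job, the per-line logic would typically sit in a Mapper's map method and the per-key logic in a Reducer's reduce method, with the grouping done by the framework's shuffle phase. One design point worth noting for Task 1: an average cannot be used directly as a combiner, because an average of partial averages is generally wrong unless sums and counts are carried separately.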