Assignment title: Information


Lab: HBase & Hadoop

1) Create a table called randomRows in the Test database in MySQL using the following command:
   CREATE TABLE randomRows (Fld1 INTEGER PRIMARY KEY, Fld2 CHAR(1), Fld3 VARCHAR(50), Fld4 INTEGER, Fld5 VARCHAR(255), Fld6 VARCHAR(100));
2) Use the gendataCA.sh shell script to generate 10000 lines of sample data in a file called randomRows.csv.
3) Use the mysqlimport command to load the sample data into the randomRows table. mysqlimport strips the extension from the file name and uses the rest as the table name, so the file must be called randomRows.csv:
   mysqlimport --fields-terminated-by=, --columns='Fld1,Fld2,Fld3,Fld4,Fld5,Fld6' --local -u root -p Test /path/to/csvfile/randomRows.csv
4) Launch an HBase shell and create a table called randomRows with a single column family called cf1.
5) Use Sqoop to load the data in the MySQL randomRows table into the HBase randomRows table.
6) Create a new Java project in Eclipse called HBaseSummarisation.
   a) Reference the HBase jars in /usr/local/hbase/lib.
   b) Create the relevant Driver, Mapper, and Reducer classes (see Moodle) for two programs that count the frequency of each Fld2 character in the HBase randomRows table. The first program should write its results to HDFS. The second program should write its results to another HBase table called randomRowsCharCount, which has been set up with a single column family called cf1.
7) Run both programs and cross-reference the two result sets. Then check them against the corresponding result set in MySQL.
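It can help to see the counting logic in isolation before wiring it into Hadoop. The following is a minimal sketch, not the Moodle classes: plain Java with no Hadoop or HBase dependencies that mimics what the Mapper does (emit the Fld2 character with a count of 1 for each row) and what the Reducer does (sum the counts for each character). It assumes each line of randomRows.csv holds the six comma-separated fields Fld1 to Fld6 with no embedded commas. The class name Fld2CountSketch and the sample rows are hypothetical. Run over randomRows.csv, it gives an independent count that can be compared against the job output in step 7.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical local sketch of the Fld2 frequency count.
 * Mimics map -> shuffle -> reduce in memory; no Hadoop/HBase involved.
 * Assumes comma-separated lines with no embedded commas in any field.
 */
public class Fld2CountSketch {

    // "Map" step: emit (Fld2, 1) for one CSV line, or null if Fld2 is missing.
    static Map.Entry<String, Integer> map(String csvLine) {
        String[] fields = csvLine.split(",", -1);
        if (fields.length < 2 || fields[1].isEmpty()) {
            return null; // skip malformed lines or rows without a Fld2 value
        }
        return Map.entry(fields[1], 1);
    }

    // "Shuffle + reduce" step: group emitted pairs by key and sum the values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by character
        for (Map.Entry<String, Integer> kv : emitted) {
            counts.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return counts;
    }

    // Run the map step over every line, then reduce the emitted pairs.
    static Map<String, Integer> countFld2(List<String> csvLines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : csvLines) {
            Map.Entry<String, Integer> kv = map(line);
            if (kv != null) {
                emitted.add(kv);
            }
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        // Hypothetical rows in the Fld1..Fld6 layout; not real gendataCA.sh output.
        List<String> rows = List.of(
            "1,A,alpha,10,some text,x",
            "2,B,beta,20,more text,y",
            "3,A,gamma,30,other text,z");
        System.out.println(countFld2(rows)); // prints {A=2, B=1}
    }
}
```

In the real job the same split happens across processes: the Mapper receives one HBase row at a time and the framework performs the grouping before the Reducer sums.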
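Step 7 asks for the result sets to be cross-referenced. A minimal sketch of that comparison follows, under these assumptions: the first program used Hadoop's default TextOutputFormat, which writes one "key<TAB>value" line per character (typically in a file such as part-r-00000), and the other result set has already been collected into a map, for example from a scan of randomRowsCharCount or from the MySQL query SELECT Fld2, COUNT(*) FROM randomRows GROUP BY Fld2. The class ResultCrossCheck and its methods are hypothetical helpers, not part of any Hadoop or HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper for comparing two character-count result sets. */
public class ResultCrossCheck {

    // Parse "key<TAB>count" lines (default TextOutputFormat layout); blank lines are ignored.
    static Map<String, Long> parseTextOutput(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            int tab = line.indexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("No tab in line: " + line);
            }
            counts.put(line.substring(0, tab), Long.parseLong(line.substring(tab + 1).trim()));
        }
        return counts;
    }

    // List every key whose count differs or that appears in only one result set.
    static List<String> differences(Map<String, Long> left, Map<String, Long> right) {
        Set<String> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        List<String> diffs = new ArrayList<>();
        for (String k : keys) {
            Long a = left.get(k);
            Long b = right.get(k);
            if (a == null || !a.equals(b)) {
                diffs.add(k + ": " + a + " vs " + b);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Hypothetical HDFS output lines and hypothetical counts from another source.
        List<String> hdfsLines = List.of("A\t2", "B\t1");
        Map<String, Long> other = Map.of("A", 2L, "B", 1L);
        System.out.println(differences(parseTextOutput(hdfsLines), other)); // prints []
    }
}
```

An empty list means the two result sets agree; any entry shows which character disagrees and the count from each side.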