Assignment title: Information
Lab: HBase & Hadoop
1) Create a table called randomRows in the Test database in MySQL using the following command:
CREATE TABLE randomRows (Fld1 INTEGER PRIMARY KEY, Fld2 CHAR(1), Fld3 VARCHAR(50),
Fld4 INTEGER, Fld5 VARCHAR(255), Fld6 VARCHAR(100));
2) Use the gendataCA.sh shell script to generate 10000 lines of sample data in a file called
randomRows.csv
3) Use the mysqlimport command to load the sample data into the randomRows table (mysqlimport
takes the table name from the file name, so the file must be called randomRows.csv):
mysqlimport --fields-terminated-by=, --columns='Fld1,Fld2,Fld3,Fld4,Fld5,Fld6' --local \
    -u root -p Test /path/to/csvfile/randomRows.csv
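Before importing, it can help to confirm that the generated file matches the table definition from step 1. The following is a minimal sketch, not part of the lab materials: the class name RandomRowsCsvCheck is hypothetical, and it assumes that gendataCA.sh writes one unquoted, comma-separated record per line with no header row.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: checks randomRows.csv lines against the step 1 schema.
class RandomRowsCsvCheck {

    /** Returns true if the line has six comma-separated fields that fit the randomRows columns. */
    static boolean isValidRow(String line) {
        // A limit of -1 keeps trailing empty fields so the field count is exact.
        String[] f = line.split(",", -1);
        if (f.length != 6) {
            return false;
        }
        return isInteger(f[0])          // Fld1 INTEGER PRIMARY KEY
            && f[1].length() == 1       // Fld2 CHAR(1)
            && f[2].length() <= 50      // Fld3 VARCHAR(50)
            && isInteger(f[3])          // Fld4 INTEGER
            && f[4].length() <= 255     // Fld5 VARCHAR(255)
            && f[5].length() <= 100;    // Fld6 VARCHAR(100)
    }

    /** Returns true if the text parses as a 32-bit signed integer. */
    static boolean isInteger(String s) {
        try {
            Integer.parseInt(s.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RandomRowsCsvCheck <csv-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        long bad = lines.stream().filter(l -> !isValidRow(l)).count();
        System.out.println(lines.size() + " lines read, " + bad + " invalid");
    }
}
```

If the script quotes its fields or emits a header line, the check would need adjusting, and so would the mysqlimport options.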
4) Launch an HBase shell and create a table called randomRows with a single column family called
cf1.
5) Use Sqoop to load the MySQL randomRows table data into the HBase randomRows table.
6) Create a new Java Project in Eclipse called HBaseSummarisation.
a) Add the HBase jars in /usr/local/hbase/lib to the project's build path.
b) Create the relevant Driver, Mapper, and Reducer classes (see Moodle) for two programs that
each count the frequency of the Fld2 characters in the HBase randomRows table. The first
program should write its results to HDFS. The second program should write its results to
another HBase table called randomRowsCharCount, which has been set up with a single column
family called cf1.
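The counting logic behind both programs can be sketched in plain Java, without any Hadoop or HBase dependencies. This is only an illustration of the map and reduce phases, not the Moodle template: the class name Fld2CountSketch and the in-memory sample are hypothetical. In the real job the mapper would typically extend TableMapper and read Fld2 from column family cf1, and the second program's reducer would typically extend TableReducer (both in org.apache.hadoop.hbase.mapreduce).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, dependency-free illustration of the Fld2 frequency count.
class Fld2CountSketch {

    /** Map phase: emit one (Fld2 value, 1) pair per input row. */
    static List<Map.Entry<String, Integer>> map(List<String> fld2Values) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String value : fld2Values) {
            pairs.add(new AbstractMap.SimpleEntry<>(value, 1));
        }
        return pairs;
    }

    /** Reduce phase: sum the emitted 1s for each distinct Fld2 value. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the Fld2 values that would be scanned from the HBase table.
        List<String> sample = Arrays.asList("A", "B", "A", "C", "A", "B");
        System.out.println(reduce(map(sample))); // prints {A=3, B=2, C=1}
    }
}
```

The two programs share this logic and differ only in where the reducer writes each (character, count) pair: a file in HDFS for the first, a row in randomRowsCharCount for the second.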
7) Run both programs and cross-reference the two result sets. Then check them against the
corresponding result set in MySQL.
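One way to do the cross-check is to save each result set as lines of the form key<TAB>count and compare them programmatically. The sketch below assumes that format, which is the default for text output from a MapReduce job. The class name ResultCrossCheck and the sample lines are hypothetical. On the MySQL side, a query such as SELECT Fld2, COUNT(*) FROM randomRows GROUP BY Fld2; should give the comparable figures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for comparing two sets of Fld2 counts.
class ResultCrossCheck {

    /** Parses "key<TAB>count" lines into a map sorted by key. */
    static Map<String, Long> parseCounts(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split("\t");
            counts.put(parts[0], Long.parseLong(parts[1].trim()));
        }
        return counts;
    }

    /** Returns the keys whose counts differ, or that appear in only one result set. */
    static List<String> differingKeys(Map<String, Long> a, Map<String, Long> b) {
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        List<String> differing = new ArrayList<>();
        for (String key : keys) {
            if (!Objects.equals(a.get(key), b.get(key))) {
                differing.add(key);
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        // Made-up sample lines; real input would come from the job output and MySQL.
        Map<String, Long> fromHdfs = parseCounts(Arrays.asList("A\t3", "B\t2", "C\t1"));
        Map<String, Long> fromMysql = parseCounts(Arrays.asList("A\t3", "B\t2", "C\t2"));
        System.out.println("Differing keys: " + differingKeys(fromHdfs, fromMysql)); // prints Differing keys: [C]
    }
}
```

An empty list means the two result sets agree on every Fld2 value.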