Topics in Big Data Analytics
Hadoop Mahout-KMeans clustering
The objective of this assignment is to gain experience applying Hadoop MapReduce, through the Mahout tool, to cluster 10 news groups into 5 clusters.
Your task is to:
1. Create 10 news groups covering 5 different categories of news, such as Political News, Business News, Sports News, Fashion News, and Science & Technology News, using articles from any news source such as Arab News.
2. Preprocess the files using Mahout preprocessing techniques.
3. Apply the KMeans clustering technique to group the 10 news groups into 5 clusters.
4. Dump the clusters to an output file and write the results of the analysis to a txt file.
5. Submit a report describing the entire clustering process, together with screenshots of the generated clusters.
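
As a rough orientation for tasks 2 to 4, the sketch below assembles one possible sequence of Mahout command lines in Python. It is an illustration under stated assumptions, not a prescribed solution. All directory and file names (news-raw, news-seq, news-vectors, news-seeds, news-kmeans, clusters.txt) are hypothetical placeholders. The options follow the conventions of Mahout's command-line drivers (seqdirectory, seq2sparse, kmeans, clusterdump) as used in its bundled clustering examples, and the cosine distance measure is just one reasonable choice for text. Option names can differ between Mahout versions, so check each one with "mahout <command> --help" before running anything.

```python
import shlex

# Hypothetical placeholder paths; adjust them to your own HDFS or local layout.
RAW_DIR = "news-raw"        # input directory holding the 10 news group text files
SEQ_DIR = "news-seq"        # SequenceFiles produced from the raw text
VEC_DIR = "news-vectors"    # sparse TF-IDF vectors plus the term dictionary
SEED_DIR = "news-seeds"     # initial centroids (sampled at random when -k is given)
KMEANS_DIR = "news-kmeans"  # KMeans output (clusters-*-final, clusteredPoints)
DUMP_FILE = "clusters.txt"  # human-readable cluster dump


def build_pipeline(final_clusters_dir, k=5, max_iter=10):
    """Return the four Mahout command lines as argument lists.

    final_clusters_dir is the clusters-<n>-final directory written by the
    kmeans step; the iteration number <n> is only known after that step runs.
    """
    return [
        # Task 2a: convert the directory of text files into SequenceFiles.
        ["mahout", "seqdirectory", "-i", RAW_DIR, "-o", SEQ_DIR, "-c", "UTF-8"],
        # Task 2b: tokenize and build named sparse TF-IDF vectors.
        ["mahout", "seq2sparse", "-i", SEQ_DIR, "-o", VEC_DIR, "--namedVector"],
        # Task 3: KMeans with k clusters, assigning each document to a cluster.
        ["mahout", "kmeans",
         "-i", f"{VEC_DIR}/tfidf-vectors", "-c", SEED_DIR, "-o", KMEANS_DIR,
         "-dm", "org.apache.mahout.common.distance.CosineDistanceMeasure",
         "-x", str(max_iter), "-k", str(k), "-ow", "--clustering"],
        # Task 4: dump the final clusters and their top terms to a text file.
        ["mahout", "clusterdump",
         "-i", final_clusters_dir, "-o", DUMP_FILE,
         "-d", f"{VEC_DIR}/dictionary.file-0", "-dt", "sequencefile",
         "-n", "20", "--pointsDir", f"{KMEANS_DIR}/clusteredPoints"],
    ]


if __name__ == "__main__":
    # "clusters-1-final" is an assumed name; look up the real one after kmeans.
    for cmd in build_pipeline(f"{KMEANS_DIR}/clusters-1-final"):
        print(shlex.join(cmd))
```

The script only prints the commands so that each one can be reviewed and run by hand; the printed lines, together with the resulting clusters.txt, can also be included in the report for task 5.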