Assignment title: Information
Advanced Analytics
Clustering
Submit a report containing your answers to the questions below.
We have 'dental records' on 66 animals. For each animal the following 8 attributes have been recorded:
1. Number of top incisors
2. Number of bottom incisors
3. Number of top canines
4. Number of bottom canines
5. Number of top pre-molars
6. Number of bottom pre-molars
7. Number of top molars
8. Number of bottom molars
Create an ARFF file for the dental data provided in Appendix A.
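As a rough illustration of what such a file could look like, the Python sketch below builds ARFF text for the eight attributes. The relation name `dental`, the hyphenated attribute names (borrowed from the axis labels used in the visualization task), and the choice of the `numeric` type are assumptions; the single all-zero row is a placeholder, not data from Appendix A.

```python
# Attribute names follow the order of the eight recorded attributes.
# The exact spellings and the relation name are assumptions.
ATTRIBUTES = [
    "top-incisors", "bottom-incisors",
    "top-canines", "bottom-canines",
    "top-premolars", "bottom-premolars",
    "top-molars", "bottom-molars",
]


def to_arff(rows, relation="dental"):
    """Return ARFF text for rows of eight numeric tooth counts."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in ATTRIBUTES]
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(ATTRIBUTES):
            raise ValueError(f"expected {len(ATTRIBUTES)} values, got {len(row)}")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Placeholder row for illustration only; replace it with the 66 records
# from Appendix A before loading the file into Weka.
example_rows = [(0, 0, 0, 0, 0, 0, 0, 0)]
print(to_arff(example_rows))
```

Writing the returned text to a file with the `.arff` extension should give something Weka can open, provided the real records are substituted for the placeholder row.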
The objective is to cluster the animals into groups with similar dental records. Perform the following tasks:
1) Use the k-means clustering algorithm for this purpose.
a) Complete the following table by filling in the within-cluster sum of squares (also called the sum of squared errors, or SSE for short) for five different random starts, rounded to one decimal place. Select the cluster mode "Use training set" in Weka. For each value of k, report the best solution found in the final column.
2) Right-click the best result for k = 3, and select Visualize cluster assignments.
Slide Jitter to its maximum value and enlarge the window to get a good visualization. Select the following X and Y attributes:
1. X = top-premolars, Y = bottom-premolars
2. X = top-molars, Y = bottom-molars
3. X = bottom-incisors, Y = bottom-canines
4. X = top-incisors, Y = top-canines
Which pair of attributes seems to give the best separation of the clusters? Explain.
3) You can also use Manhattan distance rather than Euclidean distance to find the clusters. Find the best clusters for k = 3, using the same random seeds as in task 1 a). Do you get the same clustering?
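To make the multi-start procedure in task 1 and the distance comparison in task 3 concrete, here is a minimal Python sketch, not a reproduction of Weka's implementation. It assumes the cluster center is the per-attribute mean under Euclidean distance and the per-attribute median under Manhattan distance, starts from k randomly sampled points, and uses a small made-up `toy` data set rather than the records in Appendix A. Weka may preprocess attributes and choose starting points differently, so its numbers need not match.

```python
import random
from statistics import mean, median


def distance(a, b, metric):
    """Squared Euclidean distance (so the summed error is the SSE) or Manhattan."""
    if metric == "manhattan":
        return sum(abs(x - y) for x, y in zip(a, b))
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, metric="euclidean", max_iter=100):
    """One run from a random start; returns (labels, within-cluster error)."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    center_of = mean if metric == "euclidean" else median
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: distance(p, centers[c], metric))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(center_of(col) for col in zip(*members))
    error = sum(distance(p, centers[lab], metric) for p, lab in zip(points, labels))
    return labels, error


def best_of(points, k, seeds, metric="euclidean"):
    """Run k-means once per seed and keep the run with the lowest error."""
    runs = [kmeans(points, k, seed, metric) for seed in seeds]
    return min(runs, key=lambda run: run[1])


def same_partition(labels_a, labels_b):
    """True if both labelings group the points identically, ignoring label names."""
    pairs = set(zip(labels_a, labels_b))
    return len(pairs) == len(set(labels_a)) == len(set(labels_b))


# Made-up toy points, for illustration only (not the dental data).
toy = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
seeds = [1, 2, 3, 4, 5]
labels_e, sse = best_of(toy, 2, seeds, "euclidean")
labels_m, sad = best_of(toy, 2, seeds, "manhattan")
print(round(sse, 1), sad, same_partition(labels_e, labels_m))
```

Comparing partitions with `same_partition` rather than comparing label lists directly matters because cluster numbers are arbitrary: two runs can find the same groups yet number them differently.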
Appendix A