Assignment title: Information
Consider the data set below, which represents the assessment results of approximately 50 students in a subject consisting of three assignments (Assignment 1, Assignment 2 and Assignment 3) and a Final Exam. Each row lists one student's marks for Assignment 1, Assignment 2, Assignment 3 and the Final Exam, in that order; ? marks a missing value.

?,94,34,42
35,94,85,45
31,46,22,48
46,90,60,50
52,94,49,50
58,94,30,51
47,90,?,52
37,94,25,52
35,94,45,54
57,94,100,54
51,94,5,54
45,94,33,55
44,0,35,55
52,95,56,56
35,94,?,57
57,97,57,57
45,90,71,57
39,94,54,57
31,94,63,57
45,94,?,59
35,90,84,59
37,90,40,61
83,97,26,61
68,97,55,62
50,95,56,62
77,93,?,63
84,48,18,63
45,90,21,63
62,95,38,63
38,94,40,64
50,90,?,64
32,90,38,64
44,90,43,65
57,94,52,68
50,94,39,70
55,90,62,71
43,94,54,72
50,90,30,74
54,90,82,77
64,95,5,78
85,95,?,79
63,90,62,82
75,90,35,83
85,97,39,84
77,95,79,84
79,94,35,86
86,98,57,87
71,90,9,89
45,94,72,90
90,94,68,92
89,94,53,93
90,98,79,98
57,92,40,?
36,94,54,22

a) (10 marks) Use a text editor to create an ARFF file for this data set and open the ARFF file in WEKA.

b) (5 marks) Observe the summary data for the data set and the histograms for all attributes on the Preprocess tab. Use the Visualize tab to view the scatter plots between the variables of the data set. Include a screenshot of the tab in your assignment and comment on the data.

c) (5 marks) Apply the unsupervised Discretize filter to the exam marks. Include a screenshot of the filter output in your assignment and comment on the data.

d) (5 marks) Practice filling in the missing values in WEKA, both manually in the Viewer window and by using filters. Include a screenshot of the filter output in your assignment. What values does WEKA suggest for the missing values?

Task 2 (20 marks)
In WEKA, load the data set from soybean.arff. Perform classifications using the following methods:
OneR (10 marks)
NaiveBayesSimple (10 marks)
For each method, give a summary of the rules produced and comment on the accuracy of the rules.
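For Task 2, it may help to recall how OneR arrives at its rule: for each attribute it lets every attribute value predict its most frequent class, counts the resulting errors on the training data, and keeps the attribute with the fewest errors. The Python sketch below illustrates that idea for nominal attributes only. The function name `one_r` and the small weather-style data set are hypothetical illustrations, not WEKA's source code or the soybean data, and WEKA's extra handling (bucketing numeric attributes with a minimum bucket size, missing values) is deliberately omitted.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Minimal OneR sketch for nominal attributes.

    rows: list of tuples of attribute values; labels: matching class values.
    Returns (best_attribute_index, rule, training_errors), where rule maps
    each value of the chosen attribute to the class it predicts.
    """
    best = None
    for a in range(len(rows[0])):
        # Class frequencies for every value of attribute a.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[a]][label] += 1
        # Each value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors = instances whose class differs from the predicted one.
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        # Keep the attribute whose rule makes the fewest training errors.
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best


# Hypothetical toy data: (outlook, temperature) -> play
rows = [
    ("sunny", "hot"), ("sunny", "mild"), ("overcast", "hot"),
    ("overcast", "cool"), ("rainy", "mild"), ("rainy", "cool"), ("rainy", "hot"),
]
labels = ["no", "no", "yes", "yes", "yes", "yes", "no"]

attr, rule, errors = one_r(rows, labels)
print(attr, rule, errors)  # → 0 {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'} 1
```

The error count in this sketch is measured on the training data. WEKA's Explorer instead reports accuracy under the selected test option (10-fold cross-validation by default), which is usually the more meaningful figure when commenting on the accuracy of the rules.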
Task 3 (30 marks)
Perform decision tree induction in WEKA on the glass.arff data set using:
J48 (10 marks)
NBTree (10 marks)
REPTree (10 marks)
For each method, give a summary of the tree and rules produced and comment on the accuracy of the rules.
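J48 is WEKA's implementation of the C4.5 decision tree learner, which chooses splits with entropy-based criteria (information gain and gain ratio). Because the predictor attributes in glass.arff are numeric, each split in the tree is a threshold test such as "attribute <= t". The Python sketch below shows that criterion for a single numeric attribute. The function names and the four-point data set are hypothetical, and taking midpoints between sorted values is a common simplification rather than J48's exact threshold procedure. Pruning, missing values and C4.5's corrections for numeric splits are also left out.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_gain(values, labels, threshold):
    """Information gain and gain ratio of the binary split value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    parts = [p for p in (left, right) if p]
    # Weighted entropy remaining after the split.
    remainder = sum(len(p) / n * entropy(p) for p in parts)
    gain = entropy(labels) - remainder
    # Split information penalises splits that scatter instances thinly.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; keep the highest gain."""
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        gain, ratio = split_gain(values, labels, t)
        if best is None or gain > best[1]:
            best = (t, gain, ratio)
    return best  # (threshold, gain, gain_ratio), or None if only one value


# Hypothetical toy data: one numeric attribute and a two-class label.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
print(best_threshold(values, labels))  # → (2.5, 1.0, 1.0)
```

The other two learners in this task use different approaches: NBTree places naive Bayes classifiers at the leaves of the tree, and REPTree builds a tree using information gain (or variance, for numeric classes) and prunes it with reduced-error pruning. Their trees and accuracies can therefore be expected to differ from J48's.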