Assignment title: Information


Data Mining

Task 1: Choose one of the following three tasks.

1. Mining association rules over distributed databases
   Review the popular association-mining algorithms and propose a new algorithm to mine association rules over distributed databases.

2. Mining classification over large databases
   Review the popular classification-mining algorithms and propose a new algorithm for classification mining over large databases that improves efficiency and scalability.

3. Mining clusters over large databases
   Review the popular cluster-mining algorithms and propose a new algorithm to mine clusters over large databases that improves performance (e.g. efficiency, scalability, ability to deal with noise and outliers).

(5 written reports; each report should be 1 page)

Review [8 marks]
Proposal [5 marks]

Task 2: A database in .ARFF format has been provided for you on Studynet. Analyse this database using the WEKA toolkit and the tools introduced within this module. Produce a report explaining which tools you used and why, what results you obtained, and what these tell you about the data.

Marks will be awarded for: variety of tools used, quality of analysis, and interpretation of the results. An extensive report is not required (at most 4000 words), nor is a detailed explanation of the techniques employed, but any graphs or tables produced should be described and analysed in the text. A reasonable report could be achieved by doing a thorough analysis using three techniques. An excellent report would use at least four tools to analyse the dataset and provide detailed comparisons between the results.

You should perform the following steps:

1. Analyse the attributes in the data, and consider their relative importance with respect to the target class. [6 marks]

2. Construct graphs of classification performance against training set size for a range of classifiers taken from those considered in the module. You may need to experiment with different training sets, depending on what you have discovered about the data in step (1). [10 marks]

3. Analyse the data structure/representation generated by at least three classifiers when trained on the complete dataset. What does your analysis tell you about the dataset? [7 marks]

4. Combine the results from the previous three steps and all your classifiers to develop a model of why instances fall into particular classes. (Your answer to this question should be understandable by someone who is not a specialist in data mining.) [4 marks]

[Total 40 marks]
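
Step 2 of Task 2 asks for graphs of classification performance against training set size. The sketch below illustrates what such a curve measures: a classifier is trained on progressively larger subsets of a training pool and scored each time on the same held-out test set. It is only an illustration under stated assumptions. It does not use WEKA, the data are synthetic rather than the ARFF file from Studynet, the classifier is a simple 1-nearest-neighbour stand-in, and all function names (make_toy_data, predict_1nn, accuracy, learning_curve) are hypothetical.

```python
import random


def make_toy_data(n, seed=0):
    """Synthetic two-class, two-attribute data (assumption: NOT the assignment's ARFF dataset)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 3.0
        features = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((features, label))
    rng.shuffle(data)
    return data


def predict_1nn(train, x):
    """Return the label of the training instance closest to x (squared Euclidean distance)."""
    nearest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]


def accuracy(train, test):
    """Fraction of test instances whose predicted label matches the true label."""
    correct = sum(1 for x, y in test if predict_1nn(train, x) == y)
    return correct / len(test)


def learning_curve(data, sizes, test_fraction=0.3):
    """Return (training_set_size, accuracy) pairs, scored on one fixed held-out test set."""
    n_test = int(len(data) * test_fraction)
    test, pool = data[:n_test], data[n_test:]
    # Sizes larger than the available training pool are skipped.
    return [(n, accuracy(pool[:n], test)) for n in sizes if 0 < n <= len(pool)]


if __name__ == "__main__":
    curve = learning_curve(make_toy_data(400), sizes=[5, 10, 20, 50, 100, 200, 280])
    for n, acc in curve:
        print(f"{n:4d} training instances -> accuracy {acc:.3f}")
```

Plotting the resulting pairs, one line per classifier, gives the kind of graph the step describes. Keeping the test set fixed across all training sizes makes the points on a curve comparable with one another.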