Assignment title: Information
5-minute Presentation on an IT Data Mining Challenge

1. Learning objectives

This exercise will enable you to:
- understand how to apply algorithms, resources and techniques for implementing and evaluating data mining in a practical exercise using the WEKA toolkit;
- understand the theory and terminology used in WEKA classifiers;
- summarise and present your understanding to a peer audience;
- reflect on what you have learnt.

2. The task for the IT Data Mining Challenge Workshop

A document classifier is used to detect texts about a "subject of interest": for example, news stories on a specific topic such as terrorism. Chapter 17 of the Data Mining textbook (Witten et al. 2011) presents a number of tutorial exercises for the Weka Explorer. Section 17.5 contains exercises in document classification, including use of the StringToWordVector filter and experiments with datasets such as:
- ReutersCorn-train.arff, a training set of Reuters news stories, annotated to show which stories are about corn;
- ReutersCorn-test.arff, a corresponding test file that you may use if you want to.

Your challenge is to find the best classifier for detecting news stories on a "subject of interest", using the ReutersCorn dataset as a case study.

In your presentation, you must outline:
- Methods: what you did to find the best classifier for the dataset;
- Results: which classifier is best for the dataset, and what evidence you have for this;
- Reflection: what did you learn from this exercise?

You MUST keep to time: five minutes maximum per pair of students.

NOTE: there is no straightforward answer!

There will be a prize for the best presentation: the IT Data Mining Prize.

3. Marking scheme

At the IT Data Mining Challenge Workshop, as each pair presents, I will assess:
- Methods: what you did to find the best classifier for this dataset (0-6 marks);
- Results: evidence showing which classifier is best for this dataset (0-6 marks);
- Reflection: what did you learn from this exercise? (0-6 marks).

In addition:
- Presentation: overall interest and appeal of the presentation (0-2 marks).

TOTAL grade: up to 20 marks.

4. References

Witten, I. H. et al. 2011. Data Mining (3rd edition). Elsevier. Chapter 17: Tutorial Exercises for the Weka Explorer.
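The StringToWordVector filter mentioned in Section 2 turns each raw news story into word attributes that a classifier can learn from. As a rough illustration only, the Python sketch below builds a vocabulary from a set of documents and maps each document to a vector of word-presence flags (or, optionally, word counts). It is not Weka's implementation: the function names `tokenize`, `build_vocabulary` and `to_word_vector` and the three example sentences are hypothetical, and the tokenisation rule (lower-casing and splitting on non-letters) is an assumption that need not match the filter's own settings.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lower-case the text and split on any run of non-letters.
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def build_vocabulary(docs):
    # One attribute per distinct word seen in the (training) documents.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def to_word_vector(doc, vocab, counts=False):
    # Counter returns 0 for vocabulary words that do not occur in this document.
    freq = Counter(tokenize(doc))
    if counts:
        return [freq[word] for word in vocab]
    return [1 if freq[word] else 0 for word in vocab]


if __name__ == "__main__":
    # Hypothetical toy "news stories", not taken from ReutersCorn.
    docs = ["Corn prices rose.", "Wheat and corn exports fell", "Oil prices fell"]
    vocab = build_vocabulary(docs)
    print(vocab)
    for doc in docs:
        print(to_word_vector(doc, vocab), doc)
```

Because the vocabulary is built from the training documents only, any word that appears only in a later test story is simply ignored by `to_word_vector`.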