Assignment title: Management


MSC-IT 521 Homework Assignment 1
(Assigned: Sept 18, 2008. Due: Oct 9, 2008, 3pm, BY EMAIL TO [email protected]) (Please do not send it to the instructor's or the TA's email accounts.)

1. Cross Validation (25%)
A dataset of 1000 cases was used to predict a binary class attribute. Suppose that the dataset was partitioned into a training set of 600 cases and a test set of 400 cases. A k-Nearest Neighbors model with k=1 had a misclassification error rate of 8% on the test set. It was later found that the partitioning had been done incorrectly: 100 cases from the training set had been accidentally duplicated and had overwritten 100 cases in the test set. What is the misclassification error rate on the 300 cases that were truly part of the test data? Explain.

2. Data Transformation Part I (25%)
Min-Max normalization is used to transform values in the range [5, 10] to the range [m, 5]. After transformation, the value 6 becomes 4. What is m?

3. Data Transformation Part II (25%)
Suppose the temperature values and class values for 7 days of the weather data are as follows:

Temperature (Celsius)   Class
25                      N
27                      P
23                      N
24                      N
35                      P
32                      P
33                      N

Use entropy-based discretization to partition this data into two intervals, Low and High. What is the appropriate threshold for the partition?

4. Probability and Information Theory (25%)
Consider again the weather dataset with 14 records on page 2 of the KNN.ppt lecture notes.
a) What is the conditional probability Pr(Outlook=Sunny | Class=P)?
b) What is the joint probability Pr(Humidity=Normal, Class=N)?
c) What is the entropy of the Windy attribute?
d) What is the mutual information of the Windy and Class attributes?
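For Question 1, the key observation is that a 1-NN model classifies an exact copy of a training case by finding that copy at distance 0 and returning its label. The sketch below rests on two assumptions: no two training cases share identical features with different classes, so the 100 leaked copies are all classified correctly, and therefore all 32 test errors (8% of 400) fall among the 300 genuine test cases. The helper names `one_nn_predict` and `true_test_error_rate` are hypothetical.

```python
def one_nn_predict(train, x):
    """Return the label of the training case closest to x (squared Euclidean distance)."""
    _, nearest_label = min(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )
    return nearest_label


def true_test_error_rate(test_size=400, reported_error=0.08, leaked=100):
    """Error rate on the genuine test cases, assuming every leaked copy is classified correctly."""
    errors = round(reported_error * test_size)  # 8% of 400 = 32 misclassified cases
    genuine = test_size - leaked                # 400 - 100 = 300 genuine test cases
    return errors / genuine                     # all 32 errors fall among those 300


# Toy check: each training case, fed back in, retrieves its own label at distance 0.
train = [((0.0, 0.0), "N"), ((1.0, 1.0), "P"), ((0.4, 0.6), "N")]
print(all(one_nn_predict(train, x) == y for x, y in train))  # True
print(f"{true_test_error_rate():.2%}")                        # 10.67%
```

The same reasoning explains why the reported 8% was optimistic: a quarter of the "test" cases were guaranteed hits for a 1-NN model.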
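Question 2 can be checked with the standard min-max formula v' = (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min, rearranged to solve for the unknown new minimum m. The sketch below uses `fractions.Fraction` so the answer is exact rather than subject to floating-point rounding. The function names `min_max` and `solve_new_min` are illustrative, not from the course material.

```python
from fractions import Fraction


def min_max(v, old_min, old_max, new_min, new_max):
    """Standard min-max normalization of v from [old_min, old_max] to [new_min, new_max]."""
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


def solve_new_min(v, v_new, old_min, old_max, new_max):
    """Solve v_new = f * (new_max - m) + m for m, where f = (v - old_min) / (old_max - old_min)."""
    f = Fraction(v - old_min, old_max - old_min)  # (6 - 5) / (10 - 5) = 1/5
    # Rearranged: v_new = f * new_max + (1 - f) * m  =>  m = (v_new - f * new_max) / (1 - f)
    return (v_new - f * new_max) / (1 - f)


m = solve_new_min(v=6, v_new=4, old_min=5, old_max=10, new_max=5)
print(m)                                   # 15/4, i.e. m = 3.75
print(min_max(Fraction(6), 5, 10, m, 5))   # 4: plugging m back in maps 6 to 4
```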
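For Question 3, entropy-based discretization with a single split sorts the values, tries each candidate cut point, and keeps the one that minimizes the size-weighted class entropy of the two resulting intervals (equivalently, maximizes information gain). The sketch below assumes candidate cut points are midpoints between adjacent distinct values; some texts use the boundary value itself instead. `best_split` is a hypothetical helper name.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_entropy) of the binary split with minimum weighted entropy."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint (assumption)
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


temps = [25, 27, 23, 24, 35, 32, 33]
classes = ["N", "P", "N", "N", "P", "P", "N"]
threshold, score = best_split(temps, classes)
print(threshold, round(score, 4))  # 26.0 0.4636
```

Under this convention the cut falls between 25 and 27: the Low interval {23, 24, 25} is pure N, and the remaining weighted entropy comes only from the High interval (3 P, 1 N).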
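The quantities in Question 4 can be computed directly from the records by counting. The slide itself is not reproduced here, so the sketch below ASSUMES page 2 of KNN.ppt shows Quinlan's standard 14-record play-tennis weather table with classes P and N; if the slide differs, replace `RECORDS`. Only the columns the question uses are kept, and mutual information is computed via the identity I(X;Y) = H(X) + H(Y) - H(X,Y).

```python
from collections import Counter
from fractions import Fraction
from math import log2

# ASSUMPTION: Quinlan's standard weather table; columns are (outlook, humidity, windy, class).
RECORDS = [
    ("sunny", "high", False, "N"), ("sunny", "high", True, "N"),
    ("overcast", "high", False, "P"), ("rain", "high", False, "P"),
    ("rain", "normal", False, "P"), ("rain", "normal", True, "N"),
    ("overcast", "normal", True, "P"), ("sunny", "high", False, "N"),
    ("sunny", "normal", False, "P"), ("rain", "normal", False, "P"),
    ("sunny", "normal", True, "P"), ("overcast", "high", True, "P"),
    ("overcast", "normal", False, "P"), ("rain", "high", True, "N"),
]
OUTLOOK, HUMIDITY, WINDY, CLASS = range(4)


def column(i):
    return [r[i] for r in RECORDS]


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


p_records = [r for r in RECORDS if r[CLASS] == "P"]
cond = Fraction(sum(r[OUTLOOK] == "sunny" for r in p_records), len(p_records))
joint = Fraction(sum(r[HUMIDITY] == "normal" and r[CLASS] == "N" for r in RECORDS), len(RECORDS))
h_windy = entropy(column(WINDY))
mi = mutual_information(column(WINDY), column(CLASS))

print(cond)              # 2/9
print(joint)             # 1/14
print(f"{h_windy:.4f}")  # 0.9852
print(f"{mi:.4f}")       # 0.0481
```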