Assignment title: Information


SUBMIT two files:
1. A one-page (single or double spaced, does not matter) Word or PDF file that has the recommendations and a brief outline of the approach you took for a) and b) below.
2. A ZIP file of your RAPIDMINER repository that has your data and process.

Whiskey Analytics

In Chapter 6 of the book "Data Science for Business" by Provost and Fawcett, there is a reference (page 144) to NYU colleague Foster Provost's desire to find whiskeys that are similar to Bunnahabhain (he really likes this drink!!). We will use a data science approach to help Professor Provost's friend Professor Johnson. The relevant data and data dictionary are posted below and were originally curated by François-Joseph Lapointe and Pierre Legendre (1994) of the University of Montréal. You will of course use machine learning to address the issues at hand:

a) Clustering (10 points) - Your goal is to suggest a few interesting whiskies to Professor Johnson, whose favorite is the Dalwhinnie. Try both hierarchical and k-means clustering, and then choose one of the two methods to find some meaningful clusters of whiskeys that can help business decision makers gain insights from the Whiskey dataset. Based on the cluster that Professor Johnson's favorite whiskey falls in, suggest 4-5 other whiskies to him.

b) Association rules (10 points) - Professor Johnson and Professor Provost were overheard having a heated argument about whiskey makers' preferences and understanding of the market. Provost claimed that there is a higher than random chance that drinkers who like a dry palate and a dry finish also like a whiskey that is dry on the nose, "and that's why any distiller worth his name in salt would make em' that way." Provost claimed that his Scottish grandmother told him so. You have been hired by Bapna as a well-trained data scientist to verify this claim from actual compositions of whiskey (hint: this time using association rules mining).
Please also suggest a few interesting patterns of association that you can discern from the data with respect to the traits/characteristics of Scotch whiskies.

c) BONUS (extra credit, up to five points) - See if you can replicate the table below from the book. You only have to worry about the Distance column, not the labels that go with it (see page 146 of the attached book pages).
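
As a reading aid for b): "higher than random chance" is usually read as a lift greater than 1 for the rule {dry palate, dry finish} -> {dry nose}. The Python sketch below is only an illustration of how support, confidence, and lift are computed. It is not a substitute for the RapidMiner process you must submit. The trait names (palate_dry, finish_dry, nose_dry) and the eight rows are made up for the example and are assumptions, not values from the posted data or its data dictionary.

```python
def rule_metrics(rows, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    rows: list of dicts mapping a trait name to 0 or 1.
    antecedent, consequent: lists of trait names.
    """
    n = len(rows)
    has_a = [all(r[t] for t in antecedent) for r in rows]
    has_c = [all(r[t] for t in consequent) for r in rows]
    n_a = sum(has_a)
    n_c = sum(has_c)
    n_ac = sum(a and c for a, c in zip(has_a, has_c))

    support = n_ac / n                              # P(A and C)
    confidence = n_ac / n_a if n_a else 0.0         # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0   # P(C | A) / P(C)
    return support, confidence, lift


# Made-up rows (hypothetical trait names, NOT the real whiskey data).
toy_rows = [
    {"palate_dry": 1, "finish_dry": 1, "nose_dry": 1},
    {"palate_dry": 1, "finish_dry": 1, "nose_dry": 1},
    {"palate_dry": 1, "finish_dry": 1, "nose_dry": 0},
    {"palate_dry": 1, "finish_dry": 0, "nose_dry": 0},
    {"palate_dry": 0, "finish_dry": 1, "nose_dry": 0},
    {"palate_dry": 0, "finish_dry": 0, "nose_dry": 1},
    {"palate_dry": 0, "finish_dry": 0, "nose_dry": 0},
    {"palate_dry": 0, "finish_dry": 0, "nose_dry": 0},
]

support, confidence, lift = rule_metrics(
    toy_rows, ["palate_dry", "finish_dry"], ["nose_dry"]
)
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 on the real data would be consistent with Provost's claim, a lift near 1 would suggest the traits are roughly independent, and a lift below 1 would contradict it.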
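
As a reading aid for a) and c): both clustering and the Distance column rest on a distance between whiskies represented as trait vectors. This handout does not say which distance function the book's table uses, so the Python sketch below shows two common candidates for 0/1 vectors, Euclidean and Jaccard, purely as an assumption. The whiskies "A" to "D" and their trait values are made up for the example, and the helper names are hypothetical.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def jaccard_distance(u, v):
    """1 - |intersection| / |union| over the traits that are present (value 1)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 1 - both / either if either else 0.0


def nearest(target, vectors, metric, k=3):
    """Return the k (distance, name) pairs closest to `target`, excluding itself."""
    dists = [
        (metric(vectors[target], vec), name)
        for name, vec in vectors.items()
        if name != target
    ]
    return sorted(dists)[:k]


# Made-up binary trait vectors for hypothetical whiskies (NOT the real data).
toy = {
    "A": [1, 1, 0, 0, 1],
    "B": [1, 1, 0, 1, 1],
    "C": [0, 0, 1, 1, 0],
    "D": [1, 0, 0, 0, 1],
}

ranked = nearest("A", toy, jaccard_distance, k=2)
for dist, name in ranked:
    print(f"{name}: {dist:.3f}")
```

Swapping `jaccard_distance` for `euclidean` in the call to `nearest` lets you compare how the ranking changes with the metric, which is worth checking when your distances do not match the table.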