Assignment title: Information


The key frameworks and concepts covered in modules 1–5 are particularly relevant for this assignment. Assignment 2 relates to course learning objectives 1, 2 and 4 and the associated MBA program learning goals and skills (Global Content, Problem solving, Critical thinking, and Written Communication) at level 3:

1. demonstrate applied knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehouse design, data mining process, data visualisation and performance management) and resulting organisational change, and how these apply to implementation of business intelligence in organisation systems and business processes
2. identify and solve complex organisational problems creatively and practically through the use of business intelligence, and critically reflect on how evidence-based decision making and sustainable business performance management can effectively address real-world problems
4. demonstrate the ability to communicate effectively in a clear and concise manner in written report style for senior management, with correct and appropriate acknowledgment of main ideas presented and discussed.

Assignment 2 consists of three main tasks and a number of sub tasks.

Task 1 (worth 40 marks) consists of the following sub tasks.

The sinking of the Titanic is a famous event. You may find it useful to research the facts surrounding the sinking to inform your understanding of the problem and your interpretation of the data analysis of the factors that determined the survival of passengers on the Titanic. Use the data mining tool RapidMiner to conduct an exploratory analysis of the titanic_train.csv data set, which is provided via the Assignment 2 folder link on the course study desk, and then build a simple predictive model of Survival on the Titanic using a Decision Tree.
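The assignment itself is to be completed in RapidMiner. Purely as an illustration of the kind of exploratory check Task 1 involves (recoding the target as a nominal Yes/No variable, cross-tabulating a predictor against the target, and counting missing values), the following is a minimal Python sketch. The rows are invented for illustration and are not taken from titanic_train.csv; every column name other than survived is an assumption, as are the helper function names.

```python
import csv
import io
from collections import Counter

# Toy rows invented for illustration only; NOT taken from titanic_train.csv.
# Column names other than `survived` are assumptions.
SAMPLE_CSV = """survived,sex,pclass,age
1,female,1,29
0,male,3,
1,female,2,40
0,male,3,22
1,male,1,
0,female,3,18
"""


def load_rows(text):
    """Parse CSV text and recode the 1/0 target as the nominal values Yes/No."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["survived"] = "Yes" if row["survived"] == "1" else "No"
    return rows


def crosstab(rows, variable, target="survived"):
    """Count rows for each (variable value, target value) pair."""
    return Counter((row[variable], row[target]) for row in rows)


def survival_rate(rows, variable):
    """Share of 'Yes' outcomes within each value of the given variable."""
    table = crosstab(rows, variable)
    totals = Counter(row[variable] for row in rows)
    return {value: table[(value, "Yes")] / total for value, total in totals.items()}


def count_missing(rows, variable):
    """Number of rows whose value for the variable is empty."""
    return sum(1 for row in rows if row[variable] == "")


rows = load_rows(SAMPLE_CSV)
print(crosstab(rows, "sex"))        # counts per (sex, survived) pair
print(survival_rate(rows, "sex"))   # proportion surviving within each sex
print(count_missing(rows, "age"))   # rows with no recorded age
```

The same three checks correspond to steps you would carry out with RapidMiner's own tools; the sketch is only meant to make the logic of a crosstab and a missing-value count concrete.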
a) You need to identify the five key variables that contribute most to determining the survival rate of passengers on the ill-fated Titanic on its maiden voyage. Note that you should also refer to the data dictionary provided with the titanic3_train.csv file, which describes each of the variables and their range of values. (Hint: an exploratory analysis should be based on summary statistics, histograms, crosstab tables and scatterplots of individual variables and of the relationship between individual variables and the target variable survived. Which variables are correlated with the target variable survived and with other variables?)

You might also need to consider reformatting some of the variables to facilitate the next stage of analysis of the titanic3_train.csv and titanic3_score.csv data sets using a Decision Tree. (Hint: you will need to convert the survival variable to a nominal variable with the values Yes = 1, No = 0 in titanic_train.csv.) See Data Mining for the Masses, Chapters 3 and 4, for guidance on Exploratory Data Analysis using RapidMiner.

Discuss each of your top five predictor variables and the general results of your exploratory data analysis using the RapidMiner data mining tool, as well as how you dealt with missing data and unusual data, informed by relevant supporting literature on the survival rate of passengers on the Titanic. Your discussion should also include appropriate statistical analysis results, such as graphs and results tables from the exploratory data analysis conducted in RapidMiner, with some supporting references on predictive model building and interpretation using Decision Trees in data mining (about 600 words).

The following table lists the data dictionary for the data set titanic_train.csv. (Note: titanic_score.csv is the same as titanic_train.csv but does not contain any values for the target variable survived, which is referred to as a label variable in RapidMiner.)