Assignment title: Information
MIS772 Predictive Analytics Assignment A2
Assignment A2: SAS Enterprise Miner
Objectives
After this workshop, consisting of sessions in modules M2 and M3, students will understand
how to use SAS Enterprise Miner (SAS EM) to explore data, gain insights into the problem
domain and make predictions based on such insights. The workshop will rely on students'
knowledge of methods and techniques introduced in a series of seminars.
Methods
During the workshop (on-campus and on-cloud) students will work in teams of up to 3
members. They will be given some tasks and, in groups, will use SAS EM to achieve them.
Students arriving late will work on their own!
Prerequisites
Before attending this tutorial, students are required to be familiar with:
Kattamuri S. Sarma (2013): Predictive Modeling with SAS Enterprise Miner:
Practical Solutions for Business Applications, Second Edition. SAS Institute.
Workshop Schedule
Activities – No late arrivals for the on-campus sessions!
1. Learn how to use Deakin AppsOnDemand and SAS Enterprise Miner, and create project and
   library folders on your home drive.
   Topic: Before Workshop
2. The workshop facilitator will explain the case in the focus of this assignment. Work in
   groups of up to 3.
3. Learn SAS EM and the role of nodes to read and manipulate data from libraries, clean and
   transform this data, and produce statistics and charts. Learn to create decision trees,
   regression and neural network models. Gain hands-on experience in model validation and
   comparison of models' performance.
   Topic (activities 2 and 3): M2T1, M2T2: SAS EM Decision Trees & Model Comparison
4. Explore SAS EM facilities for data exploration and dimensionality reduction with data
   clustering. Use Ward's method of hierarchical cluster analysis to determine the number
   of clusters for k-means clustering. Learn how to profile and validate data clusters
   using the CCC statistic.
   Topic: M2T3: Clustering
5. Learn SAS EM methods of text analytics. Explore aspects of text parsing, the use of stop
   rules, and different ways of representing and analysing text. Filter text to reduce its
   complexity. Create text clusters and topics. Visualise results. Conduct customer
   propensity analysis. Note that topic and rule formation in text analytics follows the
   principles of association rule creation.
   Topic: M3T1-M3T2: Text Analytics, Association & Sequence
6. Learn how to evaluate and compare individual predictive models. Deploy large models into
   production. Integrate several predictive models into ensembles. Conduct validation and
   testing of ensemble models. Visualise and interpret the results.
   Topic: M3T3: Model Comparison & Ensembles
7. As a team, prepare a report of your findings using the provided template. The executive
   summary should offer interpretation and justification of results. Your forms should
   include screenshots of SAS EM analytic processes, tables and charts produced.
   Topic: Report and Executive Summary
8. Each team must make a single submission of its work via the CloudDeakin dropbox
   (possibly in multiple versions submitted weekly or daily). Submissions must include
   team members' names, student numbers and the group ID.
   Topic: Submission
Assignment Case Study
The following mini case study will be used in assignment A2. The workshop material for
topics M2T1-M3T3 is presented in a separate handout. All amendments, extensions and
assumptions should be recorded in the final submission.
Business Scenario
A marketing company has been commissioned by a number of popular airlines to
understand customer satisfaction and feedback. The data set, provided as airlines.sas7bdat
and airlines.csv (see CloudDeakin), contains responses from a survey evaluating customer
satisfaction with their airline travels. The data set contains 1,474 observations and 11
variables. The metadata is given in the table below.
Based on the customer survey and feedback, the airlines would like to know whether
customers would recommend their airline and what their perception of value for money is.
In particular, they are interested in incorporating the unstructured variable Review into
any predictive modelling, as they believe it contains a lot of meaningful information.
Assessment Objective
As a data scientist for the marketing company, your role is to determine the propensity for
a particular customer to recommend the airline they travelled with. The airlines would
also like a list of their customers with a probability score that they will recommend their
airline. They are also interested in improving the quality of their services and the likely
impact of issues that may develop in the company logistics and during the flight.
Questions
Q1. Describe the business problem and the potential value of the predictive model to the
client. Propose an analytic solution to the problem and support your
recommendation with references to the conducted data and text analytics.
Q2. Explore the sample data using descriptive statistics, frequency plots and cluster
analysis. Specifically identify any missing, anomalous or inconsistent data
characteristics, explaining the potential impact.
Q3. Describe any treatments or transformations undertaken to resolve missing,
anomalous or inconsistent data characteristics.
Q4. Perform text analytics on the "Review" data item, generating at least 5 topic
clusters. Provide a description for each of the clusters generated.
Q5. Develop at least three analytic models to predict whether or not the customers are
likely to recommend the airline services and their perception of value for money,
for each of the following combinations of input characteristics:
a. Using only the structured data (using appropriate columns)
b. Using only the text data (using only the generated text topics and clusters)
c. Using both structured and text data
Q6. For all models provide a summary of the model assessment statistics over the
training and validation data sets.
Q7. Select the best predictive model, possibly an ensemble model, and provide a
summary of the model and its performance.
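The logic behind Q6 and Q7 (assess each model, then pick the best one, possibly an ensemble) can be sketched outside SAS EM. The Python below is only an illustration and not part of the required SAS EM workflow: the labels, scores and model names are made-up toy values, and averaging predicted probabilities is just one simple way to form an ensemble. The same function would be applied to the training partition to report both sets of statistics.

```python
from statistics import mean

def misclassification_rate(y_true, p_hat, cutoff=0.5):
    """Share of cases whose predicted class (p_hat >= cutoff) differs from y_true."""
    wrong = sum((p >= cutoff) != bool(y) for y, p in zip(y_true, p_hat))
    return wrong / len(y_true)

def average_ensemble(*score_lists):
    """Average the predicted probabilities of several models, case by case."""
    return [mean(case_scores) for case_scores in zip(*score_lists)]

# Toy validation labels (1 = would recommend) and hypothetical model scores.
y_valid = [1, 0, 1, 1, 0, 0]
scores = {
    "structured_only": [0.9, 0.6, 0.45, 0.8, 0.2, 0.3],
    "text_only":       [0.7, 0.2, 0.65, 0.4, 0.6, 0.1],
}
scores["ensemble"] = average_ensemble(scores["structured_only"], scores["text_only"])

# Assess every model on the validation data and keep the one with the lowest error.
rates = {name: misclassification_rate(y_valid, p) for name, p in scores.items()}
best = min(rates, key=rates.get)
```

Selecting on validation rather than training error is the design choice that matters here: a model that only looks good on the data it was fitted to is likely overfitted.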
On-Campus/Off-Campus Submission
Both on-campus and off-campus students will work in teams created for the duration of
assignment A2. Workshops will support the assignment work. Use the forms provided as a
template. Deviation from the format is acceptable; however, the page limit and
readability of each section must be preserved. Teams must submit the assignments via the
CloudDeakin dropbox at the end of trimester, by the indicated deadline. You will be
assessed as a team, with marks shared equally. Ensure that your team's work is unique. As
this is a team effort, no extensions will be possible.
Weekly contribution of all team members is necessary and must be documented. All
teams, whether on-campus or off-campus, must lodge weekly minutes of meetings to
CloudDeakin's discussion area with a prominent title, for example "Minutes of Meeting
1 May 2016". The post should include the following information:
- Date and time of the meeting
- Location, either virtual or face-to-face
- Attending team members and apologies from others
- Review of tasks allocated to individual team members
- Issues discussed and actions taken, especially issues to do with the lack of
  progress, non-completion of tasks and rescheduling of tasks, with screenshots of
  diagrams, charts, tables and results discussed
- Allocation of new tasks and rescheduled tasks, and team members' responsibilities
  and due dates for their completion
- Date and time of the next meeting
Your lecturer will acknowledge the team's weekly reports, note the lack of progress and
respond to the reported issues. It is important that all team members keep in touch and
actively communicate with their teams and complete the assigned tasks on time.
Failure to contribute to the team effort or failure to lodge weekly reports may result in
expulsion from the team or in the team being split, which can only be initiated by the
lecturer. Non-contributing team members will be allocated to a team of one.
Note that there is no relief or dispensation for any team of fewer than 3 members; the
deliverables will always be the same.
Assessment
The assessment of the submitted assignment work will use the following rubric.
Note that the solution may fit in one or more EM models; submit an XML file for each.
Machine Learning in SAS Enterprise Miner
Typical distribution of marks: 5% of students, 20%, 30%, 20%, 20%, 3%, 2%
Performance levels: Exceptional, Meets Expectations, Acceptable, Needs Improvement, Unacceptable
Exec Summary: Q1
Exceptional (10 marks): The executive summary, its insights and arguments all fit on one
page. Exceptional quality presentation of the entire report and included XML files.
Meets Expectations (8 marks): The summary is clear and convincing. Aimed at the management
reader. All decisions and recommendations identifiable. All reported aspects can be
justified by tracing them back to data and text analytics.
Acceptable (5 marks): Summary of findings and recommendations based on those findings
provided. All aspects cross-referenced with tables and charts.
Needs Improvement (2 marks): Few aspects identified and briefly described. Some errors and
omissions.
Unacceptable (0 marks): Aspects not identified or incomprehensible.
Data Prep: Q2 & Q3
Exceptional (20 marks): Exceptional quality presentation on one page. Crisp, short and to
the point. Showing expert knowledge of data, tools and methods.
Meets Expectations (16 marks): All data exploration was supported with charts and tables,
identifying problems in data, which were then eliminated. XML files supplied to reproduce
the results.
Acceptable (10 marks): The supplied data set was explored in preparation for analysis in EM.
Needs Improvement (4 marks): Few aspects identified and briefly described. Some errors and
omissions.
Unacceptable (0 marks): Aspects not identified or incomprehensible. Modelling did not rely
on SAS Enterprise Miner.
Text Analytics: Q4
Exceptional (30 marks): Exceptional quality presentation on three pages. Crisp, short and
to the point. Showing expert knowledge of data, tools and methods. The model was optimised
to give the best results.
Meets Expectations (24 marks): Cluster analysis was conducted over the provided text. The
model was evaluated, results presented, explained and justified. XML files supplied to
reproduce the results.
Acceptable (15 marks): Partial solution submitted on time. The required text analytics
methods were applied to the provided data in EM.
Needs Improvement (6 marks): Few aspects identified and briefly described. Some errors and
omissions.
Unacceptable (0 marks): Aspects not identified or incomprehensible. Modelling did not rely
on SAS Enterprise Miner.
Predictive Models: Q5 & Q6
Exceptional (20 marks): Exceptional quality presentation on three pages. Crisp, short and
to the point. Showing expert knowledge of data, tools and methods. The models were
cross-validated and optimised to give the best results.
Meets Expectations (16 marks): At least three predictive models were developed based on
text only, structured data only and mixed data. Each model was evaluated, results
presented, explained and justified. XML files supplied to reproduce the results.
Acceptable (10 marks): At least two predictive models were developed based on text only,
and structured data only. Both models were evaluated, results presented.
Needs Improvement (4 marks): Partial models based on structured data not submitted on
time. Few aspects identified and briefly described. Some errors and omissions.
Unacceptable (0 marks): Aspects not identified or incomprehensible. Modelling did not rely
on SAS Enterprise Miner.
Model Comparison: Q7
Exceptional (20 marks): Exceptional quality on two pages. Models were fully integrated to
provide a highly cohesive analytic report. Ensemble models used as appropriate.
Meets Expectations (16 marks): All developed models were compared for their performance,
best models selected and their results generated to solve a business problem. XML files
supplied to reproduce the results.
Acceptable (10 marks): All developed models were compared for their performance and
results reported.
Needs Improvement (4 marks): Few aspects identified and briefly described. Some errors and
omissions.
Unacceptable (0 marks): Aspects not identified or incomprehensible. Modelling did not rely
on SAS Enterprise Miner.