MIS772 Predictive Analytics Assignment A2
Assignment A2: SAS Enterprise Miner
Objectives
After this workshop, consisting of sessions in modules M2 and M3, students will understand
how to use SAS Enterprise Miner (SAS EM) to explore data, gain insights into the problem
domain and make predictions based on such insights. The workshop will rely on students’
knowledge of methods and techniques introduced in a series of classes. Note that a partial
assignment solution needs to be formally submitted by its own deadline.
Methods
In the assignment (as well as in on-campus labs) students will work in teams of up to 3
members. Teams will be given tasks to complete using SAS EM.
Demonstrations and lab exercises will assist skill development.
Prerequisites
Before attending SAS EM workshops, students need to be familiar with class readings and:
Kattamuri S. Sarma (2013): Predictive Modeling with SAS Enterprise Miner:
Practical Solutions for Business Applications, Second Edition. SAS Institute.
Workshop Schedule
Activities (no late arrivals for the on-campus sessions!), each followed by its topic:
1. Learn how to use Deakin AppsOnDemand and SAS Enterprise Miner; create project and
   library folders on your home drive.
   Topic: Before Workshop
2. The workshop facilitator will explain the case at the focus of this assignment. Work
   in groups of up to 3 (1-2-3 but not 4).
3. Learn SAS EM and the role of nodes used to read and manipulate data from CSV files and
   libraries, clean and transform this data, and produce statistics and charts. Learn to
   create decision trees, regression and neural network models. Gain hands-on experience
   in model validation and comparison of models’ performance.
   Topic (activities 2 and 3): M2T1, M2T2: SAS EM Regression, Neural Nets, Decision Trees &
   Model Comparison
4. Explore SAS EM facilities for data exploration and dimensionality reduction with data
   clustering. Use Ward’s hierarchical cluster analysis to determine the number of clusters
   for k-means clustering. Learn how to profile and validate data clusters using the CCC
   statistic.
   Topic: M2T3: Clustering
5. Evaluate the models using bagging, boosting and cross-validation. Explore gradient
   boosting, random forests and other “high performance” data models (HPDM).
   Topic: M2T4: Cross-Validation, HP Models
6. Learn how to evaluate and compare individual predictive models. Integrate several
   predictive models into ensembles. Conduct validation and testing of ensemble models.
   Visualise and interpret the results.
   Topic: M3T1: Model Comparison & Ensembles
7. As a team, prepare a report of your findings using the provided template. The executive
   summary should offer interpretation and justification of results. Your report should
   include screenshots of SAS EM analytic processes, and the tables and charts produced.
   Topic: Report and Executive Summary
8. Each team must make a single submission of its work via the CloudDeakin dropbox
   (possibly in multiple versions submitted weekly or daily). Submissions must include
   team members’ names, student numbers and the group ID.
   Topic: Submission
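Activity 4 uses Ward’s hierarchical cluster analysis to choose the number of clusters for k-means. The idea can be illustrated outside SAS EM with a minimal Python sketch. It is not how the SAS EM nodes are configured, and the points, the function names `ward_merge_costs` and `suggest_k`, and the "largest jump" rule are assumptions made for illustration only. The sketch uses the standard Ward merge cost, n_a·n_b/(n_a+n_b) times the squared distance between the two centroids, and does not compute the CCC statistic.

```python
from itertools import combinations


def ward_merge_costs(points):
    """Naive Ward agglomeration; returns the cost of each merge, in merge order."""
    # Each cluster is stored as (size, centroid); start with one cluster per point.
    clusters = [(1, tuple(p)) for p in points]
    costs = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            (na, ca), (nb, cb) = clusters[i], clusters[j]
            dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
            # Increase in within-cluster sum of squares if these two clusters merge.
            cost = na * nb / (na + nb) * dist2
            if best is None or cost < best[0]:
                best = (cost, i, j)
        cost, i, j = best
        (na, ca), (nb, cb) = clusters[i], clusters[j]
        centroid = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((na + nb, centroid))
        costs.append(cost)
    return costs


def suggest_k(costs):
    """Suggest the cluster count that exists just before the largest cost jump."""
    if len(costs) < 2:
        return 1
    n = len(costs) + 1  # number of original points
    jumps = [costs[m] - costs[m - 1] for m in range(1, len(costs))]
    # Merge number m (0-based) is the first "expensive" merge; n - m clusters precede it.
    m = max(range(len(jumps)), key=jumps.__getitem__) + 1
    return n - m


# Hypothetical 2-D points forming three well-separated groups (not the Ames data).
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
costs = ward_merge_costs(points)
print("suggested number of clusters:", suggest_k(costs))
```

The suggested count would then be passed as the number of clusters to a k-means run. Validating the resulting clusters is a separate step.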
Note: Demos of workshop activities are given in class and are video recorded.
Workshop activities support all assignment deliverables.
Assignment Case Study
The following mini case study will be used in assignment A2. The workshop
materials for topics M2T1 to M3T1 are presented in separate handouts. All amendments,
extensions and assumptions should be recorded in the final submission.
Business Scenario
An independent online business, Best Iowa Buys, has set up a members-only service to
predict the likely value of auctioned real estate. The business has collected sample data
about property sales in Ames, Iowa (USA) and asked you to develop an analytics solution
that it could use to estimate the price of any property. It is also interested in
classifying each property in terms of its affordability within its category or group, and
of course its value for money for the potential buyer.
You were given some data in CSV format. The data consists of over 2,930 records of
properties sold between 2006 and 2010 in Ames, described with 79 variables.
Each record describes a property in terms of the property type (house,
unit, apartment or townhouse), the number of storeys, its zoning, lot area and shape,
utilities, its condition, location (suburb), and of course the price.
Note that the case has been adapted from a past Kaggle.com competition.
Assessment Objective
You have been hired as a data analyst subcontracting for Best Iowa Buys. Your role is
to develop, evaluate and test a predictive model in SAS Enterprise Miner. The company
would also like you to produce a list of the Iowa properties currently advertised for
auction, each with its estimated price, its affordability and its value for money.
Questions
Q1. Describe the business problem and the potential value of the predictive model to the
client. Present an analytic solution to the problem and support your
recommendation with references to the conducted data analytics.
Q2. Explore the sample data using descriptive statistics, frequency plots and cluster
analysis. Specifically, identify any missing, anomalous or inconsistent data
characteristics and explain their potential impact. Perform any treatment or
transformation of the data needed to rectify data quality issues.
Q3. Perform cluster analysis and segmentation of your data to identify any natural
categories or groups of the Ames properties that could be potentially used to guide
the customer buying choices.
Q4. Develop analytic models to estimate the property price (at least two models), assess
its affordability (at least two models) and value for money (at least two models).
Ensure that you incorporate all these model types: a) Regression; b) Decision trees; and
c) Neural networks. Consider using HPDM models.
Q5. For all models, provide a summary of the model assessment statistics over the
training, validation and test data sets. Consider using cross-validation.
Q6. Compare the performance of alternative / competing models. Select the best modelling
options and combine your models to provide the predictive solution to the problem.
Consider using ensembles.
Q7. Research any additional factors that may have influenced the Ames property prices
during the period of data collection. Suggest and implement a strategy to include
those factors in your predictive model. Evaluate your
strategy. Alternatively, incorporate into your modelling some novel (not previously
taught in MIS772) elements of SAS Enterprise Miner that could significantly
improve the model’s predictive performance.
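Q5 and Q6 ask for assessment statistics on held-out data and a comparison of competing models, with cross-validation as an option. The following minimal Python sketch illustrates the idea outside SAS EM. It is not part of the assignment workflow. The data are made up (not the Ames data), the two "models" are deliberately simple stand-ins, and the names `fit_line`, `fit_mean` and `k_fold_ase` are hypothetical. Average squared error is used here as one possible assessment statistic.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training target."""
    m = mean(ys)
    return lambda x: m


def k_fold_ase(xs, ys, fit, k=5):
    """Average squared error over k held-out folds (every k-th row forms a fold)."""
    n = len(xs)
    errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit(train_x, train_y)
        # Score only rows the model did not see during fitting.
        errors += [(ys[i] - model(xs[i])) ** 2 for i in sorted(held_out)]
    return mean(errors)


# Hypothetical data: one predictor and a roughly linear target with small noise.
xs = list(range(1, 21))
ys = [50 + 3 * x + (1 if x % 2 else -1) for x in xs]

ase_line = k_fold_ase(xs, ys, fit_line)
ase_mean = k_fold_ase(xs, ys, fit_mean)
print(f"line ASE: {ase_line:.2f}  mean-baseline ASE: {ase_mean:.2f}")
```

Both candidates are scored on exactly the same folds, so the comparison is like for like. The same principle applies when comparing regression, decision tree and neural network models on shared validation and test partitions.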
On-Campus / Off-Campus
Both on-campus and off-campus students will work in teams created for the duration of
assignment A2. Module M2 and M3 workshops will support the assignment work.
Submission
Use the provided template to structure your final report. Some deviation from
the format is acceptable; however, the page limit and readability of each section must be
preserved. All SAS Enterprise Miner models in XML format and any data used in this
project must be included in your submission. Teams must submit their assignments via the
CloudDeakin assignment box by the indicated deadline. You will be assessed as a team,
with equal share. Ensure that your team’s work is unique. No extensions will be possible.
Weekly Meetings
Weekly contribution of all team members is necessary and must be documented. All
teams, on-campus, off-campus, even those of a single member, must lodge weekly
minutes of meetings in CloudDeakin’s file locker area with the file name “Minutes of
Meeting yy-mm-dd.pdf” (where yy-mm-dd is the date). The minutes should be in this format:
Date and Time: when the meeting took place
Location: where the meeting took place (either virtual or face-to-face)
Attendance: names of all present team members and any apologies from others
Work Review: Status of all tasks allocated to all individual team members
Issues: Discussion of problems and suggested plans of actions, especially those to
do with the lack of progress, incompletion of tasks and rescheduling of tasks,
screen shots of diagrams, charts, tables and results discussed
Task Allocation: a list of new and rescheduled tasks against the names of
responsible team members, with due dates clearly identified
Next Meeting: Date and time of the next meeting
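The required file name pattern can be generated rather than typed by hand. This is a minimal Python sketch under the assumption that yy-mm-dd means two-digit year, month and day of the meeting; the helper name `minutes_filename` and the example date are hypothetical.

```python
from datetime import date


def minutes_filename(meeting_date: date) -> str:
    """Build the minutes file name in the yy-mm-dd form quoted in the brief."""
    return f"Minutes of Meeting {meeting_date:%y-%m-%d}.pdf"


# Example with an arbitrary, made-up meeting date.
example_name = minutes_filename(date(2024, 3, 5))
print(example_name)  # Minutes of Meeting 24-03-05.pdf
```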
Failure to lodge meaningful minutes of meetings in two consecutive weeks will result in
an automatic deduction of 20% of the assignment marks. Failure to contribute to the team
effort, as evident from weekly reports, will result in expulsion from the team.
Competency Assessment Criteria
Very important: read this very carefully!
The work will be assessed against the competency criteria explained in lecture 1. Note
the column called “Unacceptable”, which indicates that the assignment should follow
the analytic process and its competency levels. Do not commence the more
advanced levels without first meeting the lower levels’ expectations (otherwise no points
will be given).
All SAS EM models in XML format and all data files must be supplied so that the reported
results can be (easily) reproduced.
Competency levels (columns of the assessment table):
  Exceptional: Extensions and Research
  Meets Expectations (focus on these): Based on Unit Teaching
  Unacceptable

Prepare Exec Summary (marks: 10; limit: half a page; unacceptable: 0)
  Exceptional: Clearly identify what kind of decisions are to be supported by the
    analytic solution and what types of actions can be recommended by the system.
  Meets Expectations: Succinctly state a business problem (or question) and specify
    requirements for its solution in terms of insights to be generated. Succinctly
    describe the results (answer or solution) and justify. Provide references to the
    supporting evidence, e.g. charts and plots.
  Unacceptable: Not provided or incomprehensible. Solution not justified. No references
    to the rest of the report (numbered figures).

Prepare Data (marks: 30; limit: one page; unacceptable: 0)
  Exceptional: Clean up, transform and filter your data as needed. Ensure that your
    predictors include both numerical (interval) and categorical (nominal) variables.
  Meets Expectations: Define all (3) targets and select predictors. Justify your
    definition and the methods of creating / adopting the nominal vars. Explore and
    understand selected variables using a variety of charts. Report the important
    insights. XML and data files attached.
  Unacceptable: Not meeting expectations. Above steps unacceptable. Missing or messy
    SAS models. Over the page limit.

Discover Relationships (marks: 10; limit: one page; unacceptable: 0)
  Exceptional: Include performance analysis of your clustering activities.
  Meets Expectations: Perform cluster analysis and segmentation of your data to identify
    data categories. Explore, visualise and understand vars relationships. Justify the
    selection of your predictors to build each of the models and an analytic solution.
    XML and data files attached.
  Unacceptable: Not meeting expectations. Above steps unacceptable. Missing or messy
    SAS models. Over the page limit.

Create Models (marks: 10; limit: one page; unacceptable: 0)
  Exceptional: Include the results of your cluster analysis in the predictive models.
    Use the selected HPDM models.
  Meets Expectations: Develop a number of analytic models to predict the property price
    (at least 2 models), its affordability (at least 2 models) and value for money (at
    least 2 models). Use all these model types: a) Regression; b) Decision trees;
    c) Neural networks. XML and data files attached.
  Unacceptable: Not meeting expectations. Above steps unacceptable. Missing or messy
    SAS models. Over the page limit.

Evaluate & Improve (marks: 15; limit: one page; unacceptable: 0)
  Exceptional: Apply the most suitable cross-validation methods to your models.
  Meets Expectations: Validate and test the models for their ability to predict target
    values; evaluate the models’ performance. Visualise, interpret and report the results
    of your performance testing. XML and data files attached.
  Unacceptable: Not meeting expectations. Above steps unacceptable. Missing or messy
    models. Page limit.

Provide Solution (marks: 10; limit: one page; unacceptable: 0)
  Exceptional: Create ensemble models where appropriate. Evaluate the final model using
    cross-validation, bagging or boosting, plot and interpret the overall model
    performance.
  Meets Expectations: Integrate all analytic elements into a process that could be used
    by the client to solve the WB problem, i.e. to read and transform data, create and
    validate the model, produce visualisations, tables and reports. Write the final
    report. XML and data files attached.
  Unacceptable: Not meeting expectations. Above steps unacceptable. Missing or messy
    SAS models. Over the page limit.

Research & Extend (marks: 15; limit: one page; unacceptable: 0)
  Exceptional: Wow factor. Report new and surprising insights. Deliver professional
    quality. Conduct independent research to verify your results.
  Meets Expectations: Extend your work with features well beyond what was covered in
    class, to improve the model and to present its results in the best way. XML and data
    files attached.
  Unacceptable: Not meeting expectations. Above steps unacceptable. Missing or messy
    models. Page limit.