Assignment title: Information
32130 Assignment 1 1
32130 Fundamentals of Data Analytics
Assignment 1: the data analytics consultant
Due date: Thursday 13 April 2017, 11:59 PM
Marks: Out of 100, weighted to 35% of your final mark.
Submission format: Adobe PDF (preferable) or MS Word Doc.
Filename: ida_a1_xxxxxxxx.pdf or ida_a1_xxxxxxxx.doc, where
xxxxxxxx is your student ID.
Report format: Ten pages maximum with the information and sections
described below. Use 11 or 12 point Times or Arial fonts.
Submit to: UTS Online assignment submission button.
Please make sure to name the file as described
above and to put your name and student
ID in the report.
In this assignment you need to develop a project proposal that would use data
analytics methods to address a problem for a company. Choose one of the areas
from the table below. Formulate a specific business problem in that area and
make up the specific details of the company yourself.
You will also give a 3-minute pitch for your project proposal. Please submit
this as a link to a YouTube video or similar.
This assignment is individual work.
The project proposal is addressed to a client that provides funding to support
such projects in science, business and technology. The funding is issued on a
competitive basis, so the aim of your proposal is to convince the client to fund
your project. A good proposal communicates the importance of the problem,
makes a strong case that the proposer (i.e. you) knows how to go about solving
the problem and leaves the impression that you would be successful if you were
given the money.
The project proposal is limited to 10 pages and should include the following:
• Project title
Give a title that describes what the project seeks to do.
• Your name and student ID
Remember to provide these so that I know who to give the marks to!
• Section 1: Aims, objectives and possible outcomes.
Provide a clear statement of the aims and objectives of the data analytics
study and the possible outcomes in terms of discovered knowledge and its
potential application towards solution of the problem. In this section you
need to discuss the business problem.
• Section 2: Background.
In this section you should include the background information to the
problem, including the approaches that have been used so far by other
researchers. You will need to do some research into how other people have
tried to solve the problem. This section should demonstrate to the client that
you have a clear picture of what is happening in the field and how similar
problems have been approached so far. It is even better if you can point out
deficiencies in how others have tried to solve the problem and link that to
your proposal. Do not forget to refer to the sources of the information that
you have used in your References section.
• Section 3: Data analytics scenario and methodology.
This section should take into account the CRISP-DM methodology. Here you
discuss the data analytics problem you have formulated from the business
problem. In this section you should:
o formulate the problem as a data mining problem and identify the data
analytics tasks;
o formulate the data collection and organisation strategy (what kind of
data, how to record it, format(s) in which it is preserved, integration
issues, and, if applicable, changes in current data collection and
organisation strategies) relevant to the objectives and the possible
outcomes of the project;
o briefly discuss some of the data mining method(s) that might be used;
o briefly consider how the results will be evaluated with respect to the
project objectives;
o briefly consider how to deploy the results into the business.
Your proposal will benefit if you include examples of data similar to the data
that you plan to collect. Also include examples of the results that the data
mining methods produce from these data, illustrating their applicability to
the problem. You may illustrate your proposal with examples of what you can
get out of the tools for the type of data that you address - if that is done
correctly then it will definitely convince the client that you know what you
are talking about.
• Section 4: Plan and timetable.
In this section you should provide details of the plan to run the project,
including a timeline and a budget that take the project through to completion
(if necessary, consider possible contingencies).
• References
List the references for information used in the background section in Harvard
format.
Problem Areas
Problem Area Description
Biomedical and DNA data analysis
Client: Cancer Research Centre, which produces
microarray DNA data and SNP data, and also holds
clinical records of the patients from whom the DNA
samples were taken.
Current data source: Microarray data, patients' clinical
data, patients' demographic database.
Problem: Identify particular gene sequence patterns
that play a key role in cancer diseases.
Brief Background: An important focus in medical
research, particularly for cancer, is the study of DNA
sequences since such sequences form the foundation of
the genetic codes of all living organisms. A gene
comprises hundreds of individual nucleotides
arranged in a particular order. There is an almost
unlimited number of ways that nucleotides can be
ordered and sequenced to form distinct genes. Since
many interesting sequential pattern analysis and
similarity search techniques have been developed in
data mining, the biologists and medical researchers in
the Cancer Research Centre expect data analytics to be
able to contribute to the identification of co-occurring
gene sequences and to link genes to different stages of
disease development.
Detecting financial fraud
Client: A large Australian bank, which collects relatively
complete, reliable and high quality data.
Current data source: Distributed databases, which have
data about business and individual customer
transactions (including ATM transactions), credit (such
as business, mortgage, and car loans) and investment
services (such as mutual funds, stock investment).
Problem: Detect fraudulent activities.
Brief Background: One of the steps in detecting money
laundering and other financial crimes is to integrate
information from different databases (like bank
transaction databases, federal or state police databases,
even criminal library databases). Data analytics and data
mining can identify important relations and patterns of
activities and help financial investigators to focus on
suspicious cases for further detailed examination. Most
likely the project will require the use of a broad range of
data mining tools that will operate over different data.
Developing financial products
Client: National financial institution, which has collected
data over the past three decades.
Current data source: Historical data about loans,
customers (including income levels, education level,
residence region, credit history, etc.), loan packages and
their performance.
Problem: Develop novel financial products that will be
attractive to a broad range of customers.
Brief Background: Customer profiles, including
customer credit analysis and loan payment prediction,
are critical to the business of a financial institution. On
the other hand, in a competitive market, loan packages
have to offer markedly different features, perhaps
targeting specific customer segments. Data mining
methods may help to identify potential customer
segments, important factors that may influence the
selection of a loan package, and eliminate irrelevant
factors. Based on the results, the financial institution
may then decide to adjust packages or change its loan-granting
policy so as to grant loans to those whose
application was previously denied for a particular
package, but whose profile, derived from the patterns
discovered in the data, shows relatively low risk under
specific conditions.
Sales and marketing
Client: Large retail company, which collects huge
amounts of data on sales and customer shopping history
(through a loyalty card scheme).
Current data source: Data about transactions,
demographic information for customers with a loyalty
card.
Problem: Identify and support loyal customers.
Brief Background: Customer loyalty and purchase
trends can be analysed in a systematic way. Goods
purchased at different periods by the same customers
can be grouped into sequences. Methods of sequential
pattern mining can then be used to investigate changes
in customer consumption or loyalty, and suggest
adjustments to the pricing and variety of goods in order
to help retain customers and attract new ones.
Detecting telecommunications fraud
Client: Large telecom company
Current data source: Multidimensional data
(dimensions, such as calling time, duration, location of
caller, location of callee, type of call, etc.).
Problem: Identify typical patterns of fraudulent activity
and identify other unusual behaviour patterns.
Brief Background: Fraudulent activity costs the
telecommunication industry millions of dollars a year. It
is important to identify potentially fraudulent users and
their usage patterns; to detect attempts to gain fraudulent
entry to customer accounts; and to discover unusual
patterns that may need special attention, such as
busy-hour frustrated call attempts, switch and route
congestion patterns, and periodic calls from automatic
dial-up equipment, like computer logins and logouts,
that differ from typical patterns of such calls. Cluster
analysis and outlier analysis are expected to be
suitable methods for this problem.
Sales and marketing in telecommunications industry
Client: Large telecom company
Current data source: Multidimensional data
(dimensions, such as calling time, duration, location of
caller, location of callee, type of call, etc.).
Problem: Identify successful aggregations (package
deals) of telecommunication services.
Brief Background: The telecommunication industry has
quickly evolved from offering local and long-distance
telephone services to providing many other
comprehensive communication services including voice,
fax, pager, mobile phone, images, e-mail, computer and
Web data transmission and other data traffic. With the
deregulation of the telecommunication industry and the
development of new computer and communication
technologies, the telecommunication market is rapidly
expanding and highly competitive. Identifying unique
and competitive packages of services is one way to
survive in such a market. For example, suppose that you
have discovered that "If a customer in NSW works in a
city different from the residential one (e.g. works in
Sydney and lives in Wollongong), s/he is likely to first
use the long-distance service between the two cities
around 5:30 pm and then to use a mobile phone for at
least 30 minutes in the subsequent hour every weekday".
Further analysis may determine whether this holds for
particular groups of persons (e.g. age group or
profession group) and particular pairs of cities. Then this
can help promote the sales of specific long-distance and
cellular phone combinations (package deals) and
improve the availability of particular services in the
region.
Stock exchange
Client: Major stock exchange market
Current data source: Record of all transactions at the
stock exchange, database of financial news, collection of
transcripts of discussions of stocks that are traded at
this stock exchange.
Problem: Identify patterns of insider trading and the
influence of different events on the price of particular
shares.
Brief Background: The development of software
systems that collect data directly from the stock market
led to the collection of enormous amounts of historical
data about the behaviour of different players at the stock
exchange. Moreover, a significant amount of text data, in
the form of news and transcripts from chat rooms, is
available as complementary data. The question is whether
data mining methods can be used to utilise these data
sets, discover unusual sequential patterns of behaviour
and connect them with the price variation of particular
stocks. Going further, can data analytics methods for
unstructured data be used to discover the influence of
specific events (e.g. a visit of the new Pope, a new
member on the Board of Company Directors, a change of
a CEO, etc.) on the price of particular stocks?
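As an illustration of the kind of example result a proposal could include, the sequential pattern idea described for the retail loyalty-card area above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a prescribed method: the purchase histories, item names, function names and the `min_support` threshold are all invented for illustration, and a real project would more likely use an established algorithm such as GSP or PrefixSpan.

```python
from collections import Counter
from itertools import combinations

# Hypothetical loyalty-card histories (invented data): for each customer,
# a time-ordered list of visits, each visit being the set of items bought.
histories = {
    "c1": [{"nappies"}, {"baby food"}, {"toys"}],
    "c2": [{"nappies", "milk"}, {"baby food"}],
    "c3": [{"milk"}, {"toys"}],
    "c4": [{"nappies"}, {"milk"}, {"baby food", "toys"}],
}

def ordered_pair_support(histories):
    """Count customers who bought item a in one visit and item b in a later visit."""
    support = Counter()
    for visits in histories.values():
        seen = set()
        # combinations() yields index pairs with earlier < later
        for earlier, later in combinations(range(len(visits)), 2):
            for a in visits[earlier]:
                for b in visits[later]:
                    if a != b:
                        seen.add((a, b))
        support.update(seen)  # each customer counts at most once per pattern
    return support

def frequent_patterns(histories, min_support):
    """Keep only the ordered pairs supported by at least min_support customers."""
    support = ordered_pair_support(histories)
    return {pair: n for pair, n in support.items() if n >= min_support}

patterns = frequent_patterns(histories, min_support=3)
print(patterns)
```

On this toy data the only pattern shared by at least three customers is buying nappies and, at a later visit, baby food. A result of this shape could then be linked back to the business objective, for example retention offers timed between the two purchases.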
The Pitch
The aim of your 3-minute pitch is to sell the idea to an investor, i.e. the
coordinator and the rest of the class. In 3 minutes you will not be able to give
more than an overview of the most important aspects of the project with the aim
of exciting the investors. Make your pitch as a YouTube video or similar and
submit the link to the video as part of your assignment. The best pitches will be
shown in class.
Assessment
This assignment is assessed as individual work. The assessment criteria are:
• Formulation of the business problem in terms of the specific aims, objectives
and potential project outcomes (section 1) -- 20%;
• The background to the data analytics project in terms of comprehensiveness
and understanding (section 2) -- 20%;
• Formulation of the data analytics problem and methodology and how well
they connect to the aims, objectives and possible outcomes of the project
(section 3) -- 20%;
• The feasibility of the planned data analytics solution and how well it ensures
that the goals will be achieved (section 4) -- 20%;
• Quality of the 3-minute pitch: Was it within time? Does it inspire investment?
Did we understand what you were proposing to do?
Relationship to Objectives
This assignment addresses subject objectives 1 and 5.
Return of Assignments
We plan to return marked assignments within 3 weeks of submission. Emails
will be sent when marking is complete.
Academic Standards
All text in your assignment should be paraphrased into your own words and
referenced using the Harvard referencing style. Please refer to the Subject
Outline for details about penalties for Academic Misconduct.
Late Penalties
A late penalty of up to 50% may be applied to submitted work unless prior
arrangements have been made with the subject coordinator. Unless an extension
has been approved, assignments submitted late will incur a penalty of 10% per
calendar day or part thereof, up to 5 days, after which the assignment will not be
accepted.
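The per-day rule above can be worked through as a short calculation. The sketch below is illustrative only: the function name is hypothetical, lateness is expressed in hours, and each calendar day is approximated as a 24-hour period counted from the deadline. It returns only the penalty percentage, because the brief does not state whether that percentage is taken from the marks available or from the mark awarded.

```python
import math

PENALTY_PER_DAY = 10  # percent per calendar day or part thereof (from the brief)
MAX_LATE_DAYS = 5     # beyond this the assignment is not accepted (from the brief)

def late_penalty_percent(hours_late):
    """Return the late penalty in percent, or None if the work is no longer accepted.

    Assumption: a 'calendar day' is a 24-hour period counted from the deadline.
    """
    if hours_late <= 0:
        return 0
    days = math.ceil(hours_late / 24)  # any part of a day counts as a full day
    if days > MAX_LATE_DAYS:
        return None
    return days * PENALTY_PER_DAY

for hours in (0, 1, 25, 120, 121):
    print(hours, late_penalty_percent(hours))
```

Under these assumptions the penalty reaches 50% on the fifth day, which matches the stated maximum.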
Special Consideration
You may apply for special consideration (SC) due to unforeseen circumstances as
described in the subject outline. You must provide documentary evidence to
support your claim, such as a doctor's certificate, a statutory declaration, or a
letter from your employer.
Note
The assignments will be checked through the Turnitin® Plagiarism Prevention
system to identify unoriginal material copied (without reference to the
source) from electronic sources on the Internet, electronic libraries, or other
assignments.