Regulations governing assessment offences including Plagiarism and Collusion are available from:
http://sitem.herts.ac.uk/secreg/upr/pdf/AS14-Apx3-Assessment%20Offences-v06.0.pdf
School of Computer Science
ASSIGNMENT BRIEFING SHEET (2016/17 Academic Year) –
ANONYMOUS MARKING
Assignment Title: Coursework
Submission Date:
Module Title: Data Mining
Module Code: 7COM1018
Tutor: Dr. Na Helian, Dr. Peter Lane
GROUP or INDIVIDUAL Assignment: Individual
FOR INDIVIDUAL ASSIGNMENTS – STUDENT TO COMPLETE
By completing BOX A below, I certify that the submitted work is entirely mine and that any material derived or quoted from the
published or unpublished work of other persons has been duly acknowledged [ref. UPR AS12, section 7 and UPR AS14
(Appendix III)]. I also certify that any work with human participants has been carried out under an approved ethics
protocol in accordance with UPR RE01.
Please ONLY provide your ID (SRN) number, as this assignment will be anonymously marked.
BOX A
Student ID Number (SRN)
This sheet must be submitted with the assignment, and BOX A filled in.
LATE SUBMISSION WILL ATTRACT A STANDARD LATENESS PENALTY.
1. For undergraduate modules, a score of 40% or above represents a pass mark.
2. For postgraduate modules, a score of 50% or above represents a pass mark.
3. For work submitted up to 5 working days late, the mark is capped at a bare pass (40% for undergraduate and
50% for postgraduate).
4. For work submitted more than 5 working days late, a mark of zero will be awarded for the assignment.
THE ASSIGNMENT TASK:
See attached.
MODULE LEARNING OUTCOMES ASSESSED BY THIS ASSIGNMENT:
Knowledge and understanding
Successful students will typically
- be able to appreciate the strengths and limitations of various data mining models,
- be able to critically evaluate, articulate and utilise a range of techniques for designing data mining systems.
Skills and attributes
- be able to critically evaluate different algorithms and models of data mining.
SUBMISSION REQUIREMENTS:
This assignment is to be submitted and marked anonymously. Students should ONLY use their
student ID number to identify themselves on their work. Work submitted via StudyNet for
anonymous marking will automatically have an anonymity number allocated to it.
StudyNet: YOU SHOULD SUBMIT A SINGLE MSWORD DOCUMENT FILE THROUGH STUDYNET.
Missing items that need to be supplied later will cause the complete work to incur a lateness penalty.
FEEDBACK FROM THIS ASSIGNMENT
Written feedback will be provided via StudyNet.
MARKS AWARDED FOR:
Marks will be awarded as shown in the Assessment Tasks.
Late submission is not permitted for referred coursework.
DEADLINES AND ASSIGNMENT WEIGHTINGS
1 This assignment is worth 40% of the overall assessment for this module.
2 You are expected to spend about 20 hours completing this assignment to a satisfactory standard.
3 Date assignment set: 9 June 2017
  Date completed assignment to be handed in: 22 June 2017
4 Target date for return of marked assignment: 20 July 2017
INTERNAL MODERATION
This assignment has been internally moderated.
I confirm:
- That the assignment set meets the requirements of the module and that the brief provides appropriate content for students to successfully complete the assignment.
- That the assessment is at an appropriate level, matches QAA level descriptors and is an appropriate form of assessment within the total range of assessments for this module.
- That the marking scheme is attached and that students can determine how marks are allocated.
- That this assessment can be completed and marked within University timeframes, and provides detailed feedback (more than just a grade) that supports learning.
Moderator name, signature and date
Data Mining
A dataset in .ARFF format has been provided for you on StudyNet. Analyse this dataset using the WEKA
toolkit and tools introduced within this module. Produce a report explaining which tools you used and why,
what results you obtained, and what this tells you about the data. Marks will be awarded for: variety of
tools used, quality of analysis, and interpretation of the results. An extensive report is not required (at
most 4000 words), nor is a detailed explanation of the techniques employed, but any graphs or tables
produced should be described and analysed in the text. A reasonable report could be achieved by doing
a thorough analysis using three techniques. An excellent report would use at least four tools to analyse
the dataset, and provide detailed comparisons between the results.
You should perform the following steps:
1. Analyse the attributes in the data, and consider their relative importance with respect to the target
class. You should explain what kind of classifier you believe might be most suitable for this task, given
the information about the attributes alone. [20 marks]
2. Describe in brief the operation of the classification algorithms you intend to use – these algorithms
should be taken from those described in the module. Explain their main characteristics and parameters.
Additionally explain any other algorithms you intend to use (such as to modify the original dataset).
[25 marks]
3. Describe briefly (not with screenshots) the steps you will use in Weka to prepare the data (if
necessary) and run your selected classification algorithms. Construct a table and graph of classification
performance against training set size for the classifiers. What can you conclude from your results?
[25 marks]
4. Analyse the data structure/representation generated by at least three classifiers when trained on the
complete dataset. What does your analysis tell you about the data set? [20 marks]
5. Combine the results from the previous steps and all your classifiers to develop a model of why
instances fall into particular classes. (Your answer to this question should be understandable by
someone who is not a specialist in data mining; imagine you are making a strategic recommendation to
the manager of a company.) [10 marks]
[Total 100 marks]
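
Step 3 asks for classification performance against training set size. The following is only a minimal sketch of that idea in Python, not the Weka procedure expected in the report. It assumes a simple 1-nearest-neighbour classifier and synthetic two-class data; the function names (make_toy_data, learning_curve) and all values are hypothetical and are not taken from the assignment dataset.

```python
import random


def nearest_neighbour_predict(train, point):
    """Return the label of the training row closest to `point` (1-NN)."""
    closest = min(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )
    return closest[1]


def accuracy(train, test):
    """Fraction of test rows whose label the 1-NN classifier predicts correctly."""
    correct = sum(
        1 for features, label in test
        if nearest_neighbour_predict(train, features) == label
    )
    return correct / len(test)


def make_toy_data(n, seed=0):
    """Synthetic two-class data for illustration only (NOT the assignment dataset)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 * label  # the two classes are centred at 0.0 and 2.0
        rows.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return rows


def learning_curve(data, fractions, test_size):
    """Hold out the first `test_size` rows, then train on growing prefixes of the rest."""
    test, pool = data[:test_size], data[test_size:]
    curve = []
    for fraction in fractions:
        n_train = max(1, int(len(pool) * fraction))
        curve.append((n_train, accuracy(pool[:n_train], test)))
    return curve


if __name__ == "__main__":
    data = make_toy_data(300)
    print("train size  accuracy")
    for n_train, acc in learning_curve(data, [0.1, 0.25, 0.5, 0.75, 1.0], test_size=100):
        print(f"{n_train:10d}  {acc:.3f}")
```

The same test set is reused for every training size so that the rows of the resulting table are comparable; the report's table and graph would be built the same way from the accuracies Weka reports for each classifier.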
Description of dataset:
The list below describes the numeric attributes. All instances are for women aged at least 21.
Values of 0 in fields like blood pressure represent missing values.
The output class indicates if the woman had diabetes (1) or not (0).
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)
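
Because zeros stand for missing values in some fields, averaging them in would distort any per-class comparison of an attribute. The sketch below shows one way this could be handled before analysis, assuming rows are tuples in the attribute order listed above. Which columns use 0 as "missing" is an assumption here (only blood pressure is named in the description), and the four sample rows are invented for illustration, not taken from the dataset.

```python
# 0-based column indices following the attribute list above.
# ASSUMPTION: glucose, blood pressure, skin fold, insulin and BMI use 0 for
# "missing"; only blood pressure is stated explicitly in the description.
ZERO_MEANS_MISSING = (1, 2, 3, 4, 5)
CLASS_COLUMN = 8


def mark_missing(row):
    """Replace 0 with None in the columns where 0 is taken to mean 'missing'."""
    return tuple(
        None if index in ZERO_MEANS_MISSING and value == 0 else value
        for index, value in enumerate(row)
    )


def class_means(rows, column):
    """Mean of `column` per class label, skipping missing (None) values."""
    sums, counts = {}, {}
    for row in rows:
        value = row[column]
        if value is None:
            continue
        label = row[CLASS_COLUMN]
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


if __name__ == "__main__":
    # Invented rows for illustration only.
    toy_rows = [
        (1, 90, 70, 20, 80, 24.0, 0.2, 25, 0),
        (0, 100, 0, 0, 0, 26.0, 0.3, 30, 0),
        (3, 150, 80, 30, 0, 34.0, 0.6, 45, 1),
        (5, 170, 90, 0, 200, 36.0, 0.8, 50, 1),
    ]
    cleaned = [mark_missing(row) for row in toy_rows]
    print(class_means(cleaned, 2))  # blood pressure mean per class
```

A count of 0 pregnancies and a class label of 0 are genuine values, so those columns are left untouched. In an ARFF file, a missing value is written as `?`.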