Assignment title: Information
SIT 112 | Data Science Concepts
Lecturer: Dr Duc Thanh Nguyen
[email protected]
Data Science Project
Due: Friday 5pm, 7th October 2016
Note: This project contributes 25% to your final SIT112 mark. It must be completed
individually and submitted before the due date: 5pm, 07/10/2016.
Continuing from Assignment 1, this Data Science Project explores data related to
Australia. In particular, we will use data provided by the Government at http://data.gov.au. Our
data strategy and task specifications for this assignment focus on the analysis of WiFi usage
data in the Geelong region. While Assignment 1 was designed to work on a small subset of the data
so that you could develop introductory skills and knowledge of the data science process, this Project
requires you to work on the full dataset.
1. Data and Resources
In the Data Science Project folder, you will find the following files:
Filename                               Description
Project_instructions.pdf               This file contains the instructions for completing your project.
geelongwifistats.csv                   The full Geelong WiFi Usage dataset provided by data.gov.au.
geelongwifimetadata.txt                This file contains descriptions of the attributes in the data file.
project_datadictionary_template.xlsx   The template for the data dictionary file in Excel.
project_notebook.ipynb                 The IPython notebook that has been prepared and pre-filled for you to complete the programming task.
wifi_visualisation.PNG                 A screenshot illustrating the visualization on Google Maps.
2. Task Description
This assignment has two main tasks, which together constitute a total of 100 marks:
Construction of the data dictionary (15 marks) and
Programming tasks to perform analytics and visualization tasks (85 marks).
2.1 Construction of the Data Dictionary (15 marks)
As we have learned from the lectures and practical sessions, a systematic first step in a data
science process is to construct a data dictionary for the dataset.
As in Assignment 1, your task is to construct a data dictionary using the provided data dictionary
template, except that you are now working with the full dataset.
You are required to prepare two sheets in your data dictionary Excel file:
Dataset description [5 marks]
Attribute description [10 marks]
Name your solution as [YourID]_project_datadictionary.xls and submit this file.
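As a starting point for the attribute description sheet, the following minimal sketch summarises each column of a CSV file (coarse type, number of missing values, number of distinct values, one example value). It is only an illustration: the helper names infer_type and summarise_attributes are hypothetical and are not part of the provided notebook, and the column names in the sample are made up. The real attribute names are described in geelongwifimetadata.txt.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse type ("integer", "float", "text") for non-empty strings."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "empty"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "text"


def summarise_attributes(csv_file):
    """Return one summary dict per column of an open CSV file."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    summary = []
    for name in reader.fieldnames or []:
        # Treat empty strings and absent cells as missing values.
        values = [r[name] for r in rows if r[name] not in (None, "")]
        summary.append({
            "attribute": name,
            "type": infer_type(values),
            "missing": len(rows) - len(values),
            "distinct": len(set(values)),
            "example": values[0] if values else "",
        })
    return summary


# Hypothetical sample data; these column names are assumptions for illustration.
sample = io.StringIO(
    "site,date,users,avg_minutes\n"
    "A,2016-01-01,12,3.5\n"
    "B,2016-01-01,,4.0\n"
    "A,2016-01-02,7,2\n"
)
for row in summarise_attributes(sample):
    print(row)
```

To run the same summary on the project data, open the file with open("geelongwifistats.csv", newline="") and pass the file object to summarise_attributes. The inferred types are only a first guess and should be checked against the metadata file before you enter them in the data dictionary.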
2.2 Programming task (85 marks)
A Python notebook file, project_notebook.ipynb, has been prepared for you to complete this
task. Download this notebook, open it and follow the instructions inside the notebook to
complete the task.
The total mark for this task is 85 marks. You are required to submit your solution in IPython
Notebook format as well as its exported version in HTML.
3. Summary for submission
This project is to be completed individually and submitted to CloudDeakin. By the due date, you
are required to submit the following files to the corresponding Assignment (Dropbox) in
CloudDeakin:
1. [YourID]_datadictionary.xls: your solution for the data dictionary for the Geelong WiFi
Usage dataset.
2. [YourID]_project_solution.ipynb: your IPython notebook solution source file.
3. [YourID]_project_output.html: the output of your IPython notebook solution in HTML.
END OF PROJECT DESCRIPTION