Assignment title: Information


SIT 112 | Data Science Concepts
Lecturer: Dr Duc Thanh Nguyen [email protected]

Data Science Project
Due: Friday 5pm, 7th October 2016

Note: This project contributes 25% to your final SIT112 mark. It must be completed individually and submitted before the due date: 5pm, 07/10/2016.

Continuing from Assignment 1, this Data Science Project explores data related to Australia, using data provided by the Government at http://data.gov.au. The data strategy and task specifications for this assignment focus on the analysis of WiFi usage data in the Geelong region. While Assignment 1 used a small subset of the data so that you could develop introductory skills and knowledge of the data science process, this Project requires you to work on the full dataset.

1. Data and Resources

In the Data Science Project folder, you will find the following files:

Filename                                 Description
Project_instructions.pdf                 The file containing the instructions to complete your project.
geelongwifistats.csv                     The full Geelong WiFi Usage dataset provided by data.gov.au.
geelongwifimetadata.txt                  Descriptions of the attributes in the data file.
project_datadictionary_template.xlsx     The template for the data dictionary file in Excel.
project_notebook.ipynb                   The IPython notebook, prepared and pre-filled for you to complete the programming task.
wifi_visualisation.PNG                   A screenshot illustrating the visualization on Google Maps.

2. Task Description

There are two main tasks for this assignment, which together constitute a total of 100 marks:
• Construction of the data dictionary (15 marks), and
• Programming tasks to perform analytics and visualization (85 marks).

2.1 Construction of the Data Dictionary (15 marks)

As we have learned from the lectures and practical sessions, a systematic first step in a data science process is to construct a data dictionary for the dataset.
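
As an illustration of what a first pass at the attribute description might look like, the sketch below lists each column of a CSV file with a guessed type, a count of missing values and one example value. It uses only the Python standard library. The column names and rows in SAMPLE are hypothetical placeholders, not taken from the real dataset, and the helper names infer_type and draft_attribute_rows are made up for this example. The type guess is deliberately coarse and should be checked against geelongwifimetadata.txt.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse type for a column from its non-empty string values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    # Try the strictest numeric type first, then fall back.
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "text"


def draft_attribute_rows(csv_file):
    """Return one dict per column: name, guessed type, missing count, example."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    summary = []
    for name in reader.fieldnames:
        # Short rows yield None for absent fields; treat those as empty.
        values = [(row[name] or "") for row in rows]
        non_empty = [v for v in values if v.strip()]
        summary.append({
            "attribute": name,
            "type": infer_type(values),
            "missing": len(values) - len(non_empty),
            "example": non_empty[0] if non_empty else "",
        })
    return summary


# Placeholder rows with hypothetical column names; NOT the real dataset.
SAMPLE = io.StringIO(
    "site_name,latitude,sessions\n"
    "Site A,-38.15,12\n"
    "Site B,,7\n"
)

for entry in draft_attribute_rows(SAMPLE):
    print(entry)
```

To try the same idea on the project data, the in-memory sample could be replaced with a file handle such as open("geelongwifistats.csv", newline=""), assuming the file is in the working directory and uses a default-compatible encoding. The output is only a draft: the descriptions, units and value ranges still have to be written by hand in the Excel template.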
Similar to Assignment 1, except that you are now working with the full dataset, your task is to construct a data dictionary for the dataset using the provided data dictionary template. You are required to prepare two sheets in your data dictionary Excel file:
• Dataset description [5 marks]
• Attribute description [10 marks]
Name your solution [YourID]_project_datadictionary.xls and submit this file.

2.2 Programming task (85 marks)

A Python notebook file, project_notebook.ipynb, has been prepared for you to complete this task. Download this notebook, load it and follow the instructions inside it to complete the task. This task is worth a total of 85 marks. You are required to submit your solution in IPython Notebook format as well as its exported version in HTML.

3. Summary for submission

This project is to be completed individually and submitted to CloudDeakin. By the due date, you are required to submit the following files to the corresponding Assignment (Dropbox) in CloudDeakin:
1. [YourID]_project_datadictionary.xls: your data dictionary for the Geelong WiFi Usage dataset.
2. [YourID]_project_solution.ipynb: your IPython notebook solution source file.
3. [YourID]_project_output.html: the output of your IPython notebook solution in HTML.

END OF PROJECT DESCRIPTION