RMIT University
School of Science [Computer Science and Information Technology]
COSC2407 – Database Systems
Assignment #1: MongoDB and Apache Derby
Due: 11.59pm on Wednesday 12 April 2017
Marks: This individual assignment is worth 10% of your overall mark
1 Introduction
In this assignment, using your AWS linux instance you will build and query an instance of a
MongoDB database and an Apache Derby relational database using data concerning pedestrian
traffic in Melbourne from: https://data.melbourne.vic.gov.au
Please read ALL the following requirements carefully before you start.
2 Data
The data that you are going to use in this assignment is open data from the City of Melbourne
about pedestrian traffic in the Melbourne CBD. The data is described at the following website:
https://data.melbourne.vic.gov.au/Transport-Movement/
Pedestrian-volume-updated-monthly-/b2ak-trbp
and the data can be explored (and downloaded) in various formats from that website. You can
download a compressed copy of the data from:
http://goanna.cs.rmit.edu.au/~jat/Pedestrian_volume__updated_monthly_.csv.zip
which can be accessed using the wget command from your AWS linux instance. Or you can
download the file on another machine and use scp to copy to your AWS linux instance or use a
tool (such as DBeaver or Robomongo to connect to your database and upload the data directly
into your database). When you download the file you will need to unzip the file. This file is
also available on blackboard.
The data is from https://data.melbourne.vic.gov.au and the first line contains the
field names that should be used to inform how you create and name any entities in your
databases.
3 Academic Integrity
This is individual assignment, which means what you submit MUST be your own original
work.
So make sure you reference any sources you use (including all web resources) as all assignments will be checked with plagiarism-detection software.
Any student found to have plagiarised will be subject to disciplinary action in accordance
with RMIT policy and procedures. Plagiarism includes submitting code that is not your own or
submitting text that is not your own. Submitting a comment from someone else in your code or a
sentence from someone else’s report is plagiarism, and plagiarism includes submitting work
from previous years. Allowing others to copy your work is also plagiarism. All plagiarism
will be penalised; there are no exceptions and no excuses. For further information, please see:
http://www1.rmit.edu.au/students/academic-integrity.
14 General Requirements and Getting Help
This section contains information about the general requirements that your assignment must
meet and how to get help.
1. Your database must be set up on your AWS linux instance (as set up following the instructions in the initial practical classes in the laboratories). As some tasks require timing you
should use the same AWS linux instance and versions of MongoDB and Apache Derby
for all tasks.
2. Parts of this assignment will ask you to analyse your results, and to write your conclusions in a report. The report MUST be a PDF file. Submissions that do not meet this
requirement will NOT be marked.
3. Your report must be well-written. Poorly written or hard to read reports will receive
substantially lower marks. Your report should be appropriate to submit in a professional
environment (such as including in a portfolio of your work for a prospective employer).
The RMIT Study & Learning Centre employs advisors to help you improve your writing.
For details, see http://www.rmit.edu.au/studyandlearningcentre.
4. All sections of this assignment are expected to show that you have thought about the
problem. The most basic structuring of data and analysis will get the most basic mark.
5. The COSC2407 Database Systems course blackboard contains a discussion board for this
assignment allowing a forum for students to ask questions (see below) and contribute
to discussion about aspects of the assignment. If there are announcements about the
assignment (including if there are any revisions to the assignment specification) these
will also be made via announcements on the course blackboard. You are expected to
check these on a daily basis. Login through https://my.rmit.edu.au.
6. If you have any questions about the assignment (for example to clarify requirements):
(a) Please first check this assignment specification, as well the announcements and the
discussion board to see if it has already been answered.
(b) If it has NOT already been answered and does NOT include your own code (including database queries), please post your question on the discussion board.
(c) Otherwise, if your question involves your own code (or is about your personal situation) then discuss it in your practical class with the lab instructor or contact the
lecturer (or your tutor) via email.
5 Tasks and Analysis
In this section, you will be asked to carry out several tasks using your AWS linux instance to
create and query a database (using both MongoDB and Derby), with the data in the provided
file and to analyse your results.
Create a file called report.pdf (various software including word processors can export
as PDF). Use this file to report on the following tasks. Each task should be reported under
a separate heading with the task name and description, for example for the first task use the
heading: Task 1: Load the data into MongoDB.
2Task 1: Load the data into MongoDB
You are required to load the data into MongoDB. In your report:
• explain how have you chosen to structure the data inserted in MongoDB
• provide details of the time taken to load the data (The mongoimport is one utility will
provide such information). Please note that a naive import into a flat structure in Mongodb
will not accrue you a great mark. You need to analyse the data and consider appropriate
ways to structure the data and then using any scripting, programming or other tools to
format the data accordingly.
• What alternative way or ways could you have organised the data when storing in MongoDB, and what advantages or disadvantages would these alternative designs have?
Task 2: Load the data into Derby
You are required to load the data into Derby. In your report:
• explain how have you chosen to structure the Derby relational database and give reasons.
• provide details of the time to load the data into Derby. The same criterion to think about
how you structure your data applies and that this structure should reflect your understanding of database system theory applies.
• What alternative way or ways could you have organised the data when storing in Derby,
and what advantages or disadvantages would these alternative designs have?
Task 3: Queries and timings
Using both MongoDb and Derby, run the queries below 20 times: ten times after a fresh reboot
and ten times after the server has been running for a while.
Undertake the following queries (including timings when you run the queries, for comparison):
• Where is the sensor with ID number 3 located?
• For how many days in 2017 has the City of Melbourne published the sensor data of
pedestrian counts?
• In 2015, what location had the most pedestrians and was the total number of pedestrians
at that location across the whole year)?
• What was the busiest day in the city in 2016 (give the date)?
• What day of the week (i.e. Monday to Sunday) is the busiest?
In your report provide the queries (code), details of the answers returned by the database
system and query timings. The best approach would be to detail the timings in a table along
with the mean and the standard deviations of these timings.
In your report provide the queries, details of the answers returned by both database systems
and query timings.
3Task 4: Evaluate and critically analyse results
With reference to your experimental results above:
• Are the timing results what you would have expected to see based on your theoretical
understanding of these database systems? Explain why or why not?
• How would you expect these results to have changed if you had used the alternative
designs (considered in Tasks 1 and 2) for MongoDB and for Derby? Justify for answers.
• Discuss the advantages and disadvantages of using MongoDB and Derby for this dataset.
6 Submission
Before you submit anything, read through the assignment specifications again carefully, especially Section 4. Check that you have followed ALL instructions. Also check that you have
attempted all parts of all tasks in Section 5.
When
The assignment is due at 11.59pm on Wednesday 12 April 2017.
(This is later than the original due date in the course guide because of delayed release of the
assignment while some students’ problems accessing AWS were resolved by RMIT ITS).
What
You MUST submit your report as one PDF file.
How
You need to submit your report in one PDF file using the link under “Assesments” on the course
blackboard through myRMIT by 11.59pm on Wednesday 12 April 2017.
Late submissions should be submitted using the same Blackboard procedure, but will be
penalised by 10% of total possible marks per day for assignments that are late 1 to 5 days late.
For assignments that are more than 5 days late, a penalty of 100% will apply.
You should ensure that your score from the turnitin similarity checker is in the green range.
Any greater similarity ( yellow , orange or red ) will be flagged for closer inspection.
47 Marking Criteria and Weighting
Marking criteria will include: (i) appropriate design of databases, (ii) correctness of scripts,
queries, and explanations (iii) completeness of results, (iv) clarity and quality of justifications
and explanations (v) depth of critical analysis.
Task 1: Load data into Mongodb 2 marks
• scripts for shaping data
• justification for scripts and explanation of chosen design
• explanation and analysis of alternative designs
Task 2: Load the data into Derby 2 marks
• scripts for shaping data
• justification for scripts and explanation of chosen design
• explanation and analysis of alternative designs
Task 3: Queries and timings 3 marks
• queries for MongoDB
• timings for MongoDB queries
• queries for Derby
• timings for Derby queries
Task 4: Evaluate and critically analyse results 3 marks
• discussion of timing results
• consideration of impact alternative designs
• discussion of advantages and disadvantages of MongDB and Derby for this dataset
5