RMIT University
School of Science [Computer Science and Information Technology]
COSC2407 – Database Systems
Assignment #2: Design and implement a variation on the index
component of Derby
Due: 11.59pm on Thursday 25 May 2017
Marks: This individual assignment is worth 30% of your overall mark
Introduction
This assignment builds on assignment 1 using the same open data from the City of Melbourne
about pedestrian traffic in the Melbourne CBD as described at the following website:
https://data.melbourne.vic.gov.au/Transport-Movement/Pedestrian-volume-updated-monthly-/b2ak-trbp.
Experiment 1: With and without using a secondary index
In this part of the assignment, using the Derby database you built as part of assignment 1 (or a
variation of it), add a secondary index on one (or more) fields in the database. Design ten new
queries (different from the queries used in assignment 1): five that use the secondary index and
five that do not. Run the queries multiple times (as in assignment 1) on two versions of the
database (before and after adding the secondary index) and compare the performance of the
database.
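As a rough illustration of the before-and-after comparison described above, the following Java sketch times one query, creates a secondary index, and times the same query again. The table, column, and index names (PEDESTRIAN, SENSOR_ID, IDX_SENSOR) are hypothetical placeholders, not names taken from the assignment data. The JDBC URL and the query are read from the command line so that no path is hard-coded, and the sketch assumes the Derby JDBC driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class IndexExperiment {
    // Builds a CREATE INDEX statement from caller-supplied names.
    static String createIndexSql(String indexName, String table, String column) {
        return "CREATE INDEX " + indexName + " ON " + table + " (" + column + ")";
    }

    // Runs one query, reads every row, and returns the elapsed time in nanoseconds.
    static long timeQuery(Connection conn, String sql) throws SQLException {
        long start = System.nanoTime();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                // Consume each row so the whole result set is actually fetched.
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.err.println("usage: IndexExperiment <jdbc-url> <query>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0])) {
            long before = timeQuery(conn, args[1]);
            try (Statement st = conn.createStatement()) {
                // Hypothetical index, table, and column names.
                st.executeUpdate(createIndexSql("IDX_SENSOR", "PEDESTRIAN", "SENSOR_ID"));
            }
            long after = timeQuery(conn, args[1]);
            System.out.printf("before=%d ns, after=%d ns%n", before, after);
        }
    }
}
```

The sketch takes a single timing on each side for brevity. The second run may also benefit from data cached by the first, so a fair comparison would repeat each measurement several times under the same conditions.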
Experiment 2: B+-tree versus B-Tree
A B+-tree is a variant of the original B-tree, and in this part of the assignment you will explore
the relative performance of the B+-tree (as implemented in Derby) with the original B-tree.
As explained in Wikipedia “A B+-tree can be viewed as a B-tree in which each node contains
only keys (not key-value pairs), and to which an additional level is added at the bottom with
linked leaves.” To compare with the original B-tree you will need to make a variant of Derby
that changes the btree implementation from a B+-tree to a B-tree by including key-value pairs
(instead of just keys) in every node, not just leaf nodes. This will require you to rebuild your
database (or at least the indexes), so you may want to complete experiment 1 before modifying Derby. It
is recommended that you make as few changes as possible to the code to implement this change.
Repeat the comparison from experiment 1 on your new version of Derby that implements a
B-tree instead of a B+-tree, and compare the difference in the results.
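The structural difference quoted above can be sketched with a toy in-memory example. This is only an illustration under simplifying assumptions: it is not Derby's implementation, the class and method names are invented for this sketch, the trees are hand-built with two levels, and the leaf links of the B+-tree are omitted. It shows that a B-tree lookup can stop at an internal node that stores the value, whereas a B+-tree lookup always descends to a leaf.

```java
import java.util.Arrays;

final class TreeNodeSketch {
    static final class Node {
        final int[] keys;       // sorted keys stored in this node
        final String[] values;  // values[i] belongs to keys[i]; null if the node holds keys only
        final Node[] children;  // null for a leaf, otherwise keys.length + 1 children

        Node(int[] keys, String[] values, Node[] children) {
            this.keys = keys;
            this.values = values;
            this.children = children;
        }

        boolean isLeaf() { return children == null; }
    }

    // Number of nodes touched by the most recent lookup (kept static for brevity).
    static int visited;

    // B-tree lookup: every node stores key-value pairs, so a match anywhere ends the search.
    static String btreeGet(Node root, int key) {
        visited = 0;
        Node n = root;
        while (true) {
            visited++;
            int i = Arrays.binarySearch(n.keys, key);
            if (i >= 0) return n.values[i];
            if (n.isLeaf()) return null;
            n = n.children[-(i + 1)]; // insertion point selects the child to descend into
        }
    }

    // B+-tree lookup: internal keys are separators only, so the search always reaches a leaf.
    static String bplusGet(Node root, int key) {
        visited = 1;
        Node n = root;
        while (!n.isLeaf()) {
            int i = Arrays.binarySearch(n.keys, key);
            // Keys equal to a separator live in the child to its right.
            n = n.children[i >= 0 ? i + 1 : -(i + 1)];
            visited++;
        }
        int i = Arrays.binarySearch(n.keys, key);
        return i >= 0 ? n.values[i] : null;
    }

    static Node leaf(int... keys) {
        String[] vals = new String[keys.length];
        for (int i = 0; i < keys.length; i++) vals[i] = "v" + keys[i];
        return new Node(keys, vals, null);
    }

    // B-tree: the root stores the pair for key 20 itself.
    static Node sampleBTree() {
        return new Node(new int[] {20}, new String[] {"v20"},
                new Node[] {leaf(10, 15), leaf(30, 40)});
    }

    // B+-tree: the root stores only the separator 20; the pair for 20 sits in a leaf.
    static Node sampleBPlusTree() {
        return new Node(new int[] {20}, null,
                new Node[] {leaf(10, 15), leaf(20, 30, 40)});
    }

    public static void main(String[] args) {
        System.out.println(btreeGet(sampleBTree(), 20) + " after " + visited + " node(s)");
        System.out.println(bplusGet(sampleBPlusTree(), 20) + " after " + visited + " node(s)");
    }
}
```

With this sample data, looking up key 20 touches one node in the B-tree variant and two in the B+-tree variant. Whether that difference matters in a real system depends on node size, fan-out, and caching, which is what the experiment is meant to explore.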
General Requirements
This section contains information about the general requirements that your assignment must
meet. Please read all requirements carefully before you start.
• The “Database Systems” Blackboard contains further announcements and a list of frequently asked questions. You are expected to check the discussion board on a daily basis.
Login through http://my.rmit.edu.au.
• Your database and Java programs must be set up and run on your AWS linux machine
using the same data as in assignment 1.
• As some tasks require timing you should use the same AWS linux machine for all tasks.
• You must implement your program changes in Java. Your program changes must be well
written, using good coding style and including appropriate use of comments (that clearly
identify the changes you are making to the code). Your markers will look at your source
code. Coding style will form part of the assessment of this assignment.
• Any code you submit must be able to be built using the command ant on an AWS linux
instance. If your marker cannot compile your programs, you risk receiving zero marks for
the coding component of your assignment.
• Your program may be developed on any machine, but must compile and run on your AWS
linux instance.
• You must use git as you develop your code (wherever you do the development). As you
work on the assignment you should commit your changes to git regularly (for example,
hourly or each time you rebuild) as the log may be used as evidence of your progress.
• Paths must not be hard-coded.
• Diagnostic messages must be output to stderr.
• Parts of this assignment will ask you to analyse your results, and to write about your
conclusions in a report. Your report must be a PDF file, called REPORTyyyyyyy.pdf
where yyyyyyy is your student number. Files that do not meet this requirement may not
be marked.
• Your report must be well-written. Poorly written or hard to read reports will receive
substantially lower marks. Your report should be appropriate to submit in a professional
environment (such as including in a portfolio of your work for a prospective employer).
The RMIT Study & Learning Centre employs advisors to help you improve your writing.
For details, see http://www.rmit.edu.au/studyandlearningcentre.
• All sections of this assignment are expected to show that you have thought about the problem. The most basic structuring of data and analysis will get the most basic mark.
• Take care to repeat timings in a consistent way, so that you can make fair comparisons.
• Depending on your implementation, you may wish to provide additional information about
your code (for example, how it is to be compiled and run). If so, put this information into
a plain text file called readme.txt.
• Important: You must run all your experiments on your AWS linux instance.
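One possible way to repeat timings consistently, as the requirements above ask, is sketched below in Java. It is a minimal example under stated assumptions: the class name, the warm-up count, the run count, and the placeholder workload are all illustrative choices rather than part of the assignment. It runs a task a few times untimed, then records several timed runs, writes the raw samples to stderr as diagnostics, and reports the median.

```java
import java.util.Arrays;

final class TimingHarness {
    // Returns the median of the samples; the input array is copied, not modified.
    static long median(long[] samples) {
        long[] s = samples.clone();
        Arrays.sort(s);
        int mid = s.length / 2;
        return (s.length % 2 == 1) ? s[mid] : (s[mid - 1] + s[mid]) / 2;
    }

    // Runs the task 'warmups' times untimed, then 'runs' times timed.
    // Returns the elapsed time of each timed run in nanoseconds.
    static long[] time(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Placeholder workload; in the experiments this would execute one database query.
        Runnable task = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.err.println("unexpected overflow");
            }
        };
        long[] samples = time(task, 3, 10);
        System.err.println("samples(ns)=" + Arrays.toString(samples)); // diagnostics go to stderr
        System.out.println("median(ns)=" + median(samples));
    }
}
```

The median is used here because it is less sensitive than the mean to an occasional slow run. Reporting the minimum or the full distribution would be equally reasonable choices, provided the same method is applied to every configuration being compared.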
Questions about assignment
If you have any questions about the assignment (for example to clarify requirements):
1. Please first check this assignment specification, as well as the announcements and the discussion board, to see if it has already been answered.
2. If it has NOT already been answered and does NOT include your own code (including
database queries), please post your question on the discussion board.
3. Otherwise, if your question involves your own code (or is about your personal situation)
then discuss it in your practical class with the lab instructor or contact the lecturer (or your
tutor) via email.
Academic Integrity
This is an individual assignment, which means what you submit MUST be your own original work.
So make sure you reference any sources you use (including all web resources) as all assignments will be checked with plagiarism-detection software.
Any student found to have plagiarised will be subject to disciplinary action in accordance
with RMIT policy and procedures. Plagiarism includes submitting code that is not your own or
submitting text that is not your own. Submitting a comment from someone else in your code or
a sentence from someone else’s report is plagiarism, and plagiarism includes submitting work
from previous years. Allowing others to copy your work is also plagiarism. All plagiarism
will be penalised; there are no exceptions and no excuses. For further information, please see:
http://www1.rmit.edu.au/students/academic-integrity.
Assessment tasks, weightings and marking criteria
Experiment 1: With and without using a secondary index
Report on your Experiment 1 (10 marks)
You are required to write a report on the experiments undertaken using your new queries and
discuss the output and timings of queries using Derby with and without using a secondary index.
Experiment 2: B+-tree versus B-Tree
Code Walkthrough in class (4 marks)
You must undertake a code walk-through during a scheduled lab class in week 12 explaining
your code and answering questions about it.
Submission of code (6 marks)
You must submit all files that you have modified, including your git log. In your report (in no
more than one or two pages) you should explain how you modified Derby to implement the B-tree. In particular, for each file modified explain the changes made, and make sure you explain
any choices you made in your implementation. Also identify any known limitations of your
implementation.
Results of experiments (10 marks)
Undertake experiments using your program and report on the output and timings. In no more
than one or two pages, discuss your results and critically analyse the effectiveness of using a
B+-tree versus using a B-tree for indexing in Derby. Are the results as you expected?
Important: Your report will be marked on the quality of your written explanations and
analysis, and not on the length of the report (the page limits are meant as guidelines only). After
writing your report you should carefully revise it checking for clarity of expression and quality
of writing.
What to Submit, When, and How
What
You need to submit the source code of any files you have modified, including your git log, and a report. Before
you submit anything, read through the assignment specifications again carefully. Check that you
have followed all instructions in the general requirements. Also check that you have attempted
all parts of all questions. In particular you must submit:
1. a zip file of your code (all files that you have modified and including your git log), and
2. your report (a single PDF file) that explains the queries used, how your code implements a
B-tree, output of the queries, and a discussion of your results in the two experiments.
When
The assignment is due at 11.59pm on Thursday 25 May 2017.
Late submissions should be submitted using the same procedure. If you are unable to submit by
the due date you must have an extension approved (follow the process at
http://www1.rmit.edu.au/students/assessment/extension); otherwise you will be penalised by 10% of
total possible marks per day for assignments that are 1 to 5 days late. For assignments that are
more than 5 days late, a penalty of 100% will apply. See the course guide for further information.
The onus is on you to check that your submission has been received.
How
You need to separately submit two files under assessment tasks on Blackboard via MyRMIT:
1. ONE zip file that contains the Java source files you have modified and your git log. This
should be submitted using the link to Assignment 2 Code Submission.
2. ONE PDF file containing your report. This should be submitted using the link to Assignment 2 Report Submission (it is a Turnitin submission).
You will also need to arrange a time during your laboratory class in week 12 to do your code
walkthrough.