Assignment title: Information
CMPT 454 Assignment 2 – Indices
This assignment is worth 6% of your final grade.
Part 1
A primary B+ tree file where the file's records are the leaf nodes
of the tree has 6 levels (including both the root and the
leaves). 5 main memory frames have been allocated for the use
of the B+ tree, all of which are initially empty. Interior nodes
contain more than 12 search key values and leaf nodes contain
more than 12 records.
The B+ tree is used twelve times in sequence to retrieve twelve
different single records (i.e. selections on equality, with each
search returning exactly one record), with each query being
made after the previous one has been evaluated. [1 mark each]
a. If the least recently used replacement policy (LRU) is used
to replace buffer frames, how many disk reads are
required if all twelve records reside on the same block? –
72 disk reads
b. If LRU is used, how many disk reads are required if all
twelve records reside on different blocks, and if the search
keys for all twelve records are on different interior nodes
of each level of the tree (except the root)? – 72 disk reads
c. If the most recently used replacement policy (MRU) is
used to replace buffer frames, how many disk reads are
required if all twelve records reside on the same block? –
17 disk reads
d. If MRU is used, how many disk reads are required if all
twelve records reside on different blocks, and if the search
keys for all twelve records are on different interior nodes
of each level of the tree (except the root)? – 71 disk reads
(72 is acceptable)
https://www.coursehero.com/file/10763127/CMP-454-Assignment2solution1/
This study resource was
shared via CourseHero.com
Part 2
a. Write the search algorithm in pseudocode for a B+ tree
that does not allow duplicates. – review and judge for
accuracy [good = 3, 2 = adequate, 1 = seriously
incomplete]
b. Write the insertion algorithm in pseudocode for a B+ tree
that does allow duplicates. The algorithm should follow
the process for allowing duplicates in the tree as described
in class (the method that may require null entries for
search key values). – review and judge for accuracy [good
= 3, 2 = adequate, 1 = seriously incomplete]
c. The B+ tree insertion process may result in parent nodes
having values inserted in them or in being split. In the
absence of parent pointers in nodes briefly describe how
this might be implemented. – there could be multiple
different answers, one is to keep track of parent nodes in
a data structure, but the most obvious explanation is to
use recursion [2 marks for an adequate explanation, 1
mark for an explanation that is incomplete]
Part 3
a. Sketch the extensible hash index, and its directory, that results
from the following insertions, where two (2) data entries can
fit in a disk page. Assume that initially the directory has four
entries and uses the last two bits of the hash value. The initial
state of the index, and the global and local depths, are shown
below, where the values shown are the hash values (not the
search key values). You should show the index and its directory
just before each insertion that causes the directory to double
and after all insertions.
global depth = 2, all buckets have a local depth of 2
00-----> 0010 0000, 1101 0100
01-----> 1111 0101, 1001 0001
10-----> 0000 0110
11-----> 0110 1111
Search keys with the following five hash values are to be
inserted:
https://www.coursehero.com/file/10763127/CMP-454-Assignment2solution1/
This study resource was
shared via CourseHero.com
1010 1011
0110 0000
1101 0111
1110 1101
0101 1111
directory bucket ld
00 0010 0000, 1101 0100 2
01 1111 0101, 1001 0001 2
10 0000 0110 2
11 0110 1111, 1010 1011 2
[1 mark]
Insertion of 0110 0000 causes a bucket to overflow and the directory to
split
directory bucket ld
000 0010 0000, 0110 0000 3
001 1001 0001 3
010 0000 0110 2
011 1010 1011 3
100 1101 0100 3
101 1111 0101, 1110 1101 3
110(010)
111 0110 1111, 1101 0111 3
[1 mark]
Insertion of 0101 1111 causes a bucket to overflow and the directory to
split (again)
directory bucket ld
0000 0010 0000, 0110 0000 3
0001 1001 0001 3
0010 0000 0110 2
0011 1010 1011 3
0100 1101 0100 3
0101 1111 0101, 1110 1101 3
0110(0010)
0111 1101 0111 4
1000(0000)
1001(0001)
1010(0010)
1011(0011)
1100(0100)
1101(0101)
1110(0010)
1111 0110 1111, 0101 1111 4
https://www.coursehero.com/file/10763127/CMP-454-Assignment2solution1/
This study resource was
shared via CourseHero.com
[3 marks, if the directory pointers are not shown somehow -1 – in my
version I`ve noted which buckets the unused pointers point to]
b. Sketch the linear hash index that results from the initial state
and insertions given in (a) above, assuming that index pages
cannot be greater than 80% full. You should show the index
and its directory just before each insertion that adds a bucket
to the index and after all insertions.
Search keys with the following five hash values are to be
inserted:
1010 1011
0110 0000
1101 0111
1110 1101
0101 1111
index bucket occupancy = 6/8 = 0.75
00 0010 0000, 1101
0100
01 1111 0101, 1001
0001
10 0000 0110
11 0110 1111
[1 mark]
Insertion of 1010 1011 results in 87.5% occupancy so we need another
bucket
index bucket occupancy = 8/10 = 0.8
000 0010 0000, 0110 0000
001 1111 0101, 1001 0001
010 0000 0110
011 0110 1111, 1010 1011
100 1101 0100
[1 mark]
Insertion of 1101 0111 results in 90% occupancy so we need another
bucket
index bucket occupancy = 9/12 = 0.75
000 0010 0000, 0110 0000
001 1001 0001
010 0000 0110
https://www.coursehero.com/file/10763127/CMP-454-Assignment2solution1/
This study resource was
shared via CourseHero.com
011 0110 1111, 1010 1011 1101 0111
100 1101 0100
101 1111 0101
[1 mark]
Insertion of 1110 1101 results in 83.3% occupancy so we need another
bucket
index bucket occupancy = 11/14 = 0.786
000 0010 0000, 0110
0000
001 1001 0001
010
011 0110 1111, 1010
1011
1101 0111, 0101
1111
100 1101 0100
101 1111 0101, 1110
1101
110 0000 0110
[2 marks]
Part 4
A DB table has the schema: Customer =
{id, fname, lname, age, income, city, street, number}
The fields have the following domains:
age, number - 4 byte integers
id - 8 byte integer
income - 8 byte real
fname, lname, city, street - 20 byte character strings
The table contains 100,000 records, and (for the sake of
simplicity) no additional space is required to represent record
headers. The following file organizations are to be considered
for this file:
heap file - unsorted, pages have full occupancy
sorted file - sorted on {lname, fname} pages are in sequence
and have full occupancy
primary B+ tree file – sorted on {city, street, number}, note
that the leaves contain records (so only require pointers to
adjacent leaf nodes), pages (leaf and interior) have 2/3
occupancy
https://www.coursehero.com/file/10763127/CMP-454-Assignment2solution1/
This study resource was
shared via CourseHero.com
The following additional information is relevant
Size of a record ID, or pointer to a block - 8 bytes
The file is to be stored on a single disk, with ten 2,048 byte
blocks per track
No single record is stored on more than one block
The average daily use of the file is expected to be:
Search with equality selection on {lname, fname}, note that
over 99% of the time both attributes are given, as
customers tend to know their own names - 10,000
Search with equality selection on {id} - 1,000
Search with equality selection on {city, street, number} -
1,000
Search with equality selection on {city, street} - 5,000
Search with equality selection on {city}, returning, on
average, 0.5% of the records - 1,000
Range selections on {income}, returning, on average, 10% of
the records - 20
Range selections on {age}, returning, on average, 1% of the
records - 500
Insertions - 1,200
Deletions, given all data except id - 500
Deletions, given only id - 300
Answer the following questions about the file:
a. How many records fit on a single disk block? – record size
= 8 + 20 + 20 + 4 + 8 + 20 + 20 + 4 = 104, 2,048 / 104 =
19.6923…, so 19 per page [1 mark]
b. How large in blocks is the file, assuming that it is stored as
the heap file? – 100,000 / 19 = 5,264 [1 mark]
c. How large in blocks is the file, assuming that it is stored as
the sorted file? – 100,000 / 19 = 5,264 [1 mark]
d. How large in blocks is the file, assuming that it is stored as
the B+ tree file (do not count the interior nodes or root of
the tree)? – 19 * 2/3 = 13 per page, 100,000 / 13 = 7,693 –
or 12 per page, 100,000 / 12 = 8,334 for the leaf
nodes (values close to this are acceptable) [1 mark]
https://www.coursehero.com/file/10763127/CMP-454-Assignment2solution1/
This study resource was
shared via CourseHero.com
e. Assume that a secondary B+ index on id is to be created,
how many leaf pages does the index have? Assume that
leaf pages have 2/3 occupancy. – a data entry is 8 + 8 = 16
bytes, 2,048 / 16 = 128, but we need two pointers for the
pointer chains, so 127 * 2/3 = 84 per page, and 100,000 /
84 = 1,190 (values close to this are acceptable) [1 mark]
f. For the index described in (e), how many data entries can
be stored in a non-leaf node? – a non-leaf page has an
extra pointer, so 127 data entries [1 mark]
g. For the index described in (e), what is the height of the
tree, assuming that non-leaf nodes also have 2/3
occupancy? State the height of the tree including both
root and leaf levels. – fan-out is 84, root fans out to 84
second level nodes, each of which can fan-out to 84 third
level nodes, 842 = 7,056 (!), so there 3 levels (root, nonleaf, and leaf) [1 mark]
h. Assume that a secondary hash index on id is to be created,
with hash buckets having 80% occupancy on
average. How many buckets does the index contain? –
again, a data entry requires 16 bytes, 128 * .8 = 102,
100,000 / 102 =981 (values close to this are acceptable) [1
mark]
i. For the index described in (h) above, if the index is an
extensible hash index, how large (in blocks) is the index? –
the index is an array of 8 byte pointers, it must be a power
of 2 so assume that it has 1,024 entries (assuming minimal
skew), so requires 4 blocks [1 mark]
j. Which of the three file organizations provides the best
support for the daily use of the file? Justify your answer. –
an unfortunately worded question, the best answer is the
B+ tree file, since it supports equality selection, on any
prefix of {city, street and number}, but only if it is
supported by an index on {lname, fname}, so I would also
accept the file sorted on {lname, fname} as an answer [2
marks – one for either organization, and 1 for a
reasonable justification]
k. Describe which secondary indices you would create for
the file where the file is organized as noted in your answer
to (j). Your goal is to make as few disk read and write
https://www.coursehero.com/file/10763127/CMP-454-Assignment2solution1/
This study resource was
shared via CourseHero.com
operations as possible. Your answer should include the
search key and type (B+ tree or hash) of each index
structure that you would create. – either (1) sorted file on
{lname, fname}, with a B+ tree index on
{city, street, number}, a hash index on id and a B+ tree
index on age or (2) the B+ tree file with its primary index
on {city, street, number}, a B+ tree index on
{lname, fname}, a hash index on id and a B+ tree index
on age [3 marks, 1 mark for each index -1 for indexes on
income]
l. For your chosen file and index organization, calculate the
cost, in disk reads and writes, of each of the operations
noted above (for one operation only, you don't have to
multiply by the number of operations per day) – see the
table below, note that only the column showing the cost
of a single operation for the chosen organization is
required [0.5 mark each, accept values close to those
shown below]
operation n (1) (1) total (1)
metho
d
(2) (2) total
search –
fname, lname
10,00
0
13 130,000 binary
search
8
80,000
search – id 1,000 3 3,000 index 3 3,000
search –
city, street, numb
er
1,000 4 4,000 index 3
3,000
search –
city, street
5,000 40 200,000 index,
10
record
s
4
20,000
search – city 1,000 2,00
0
2,000,00
0
index,
500
record
s
44
44,000
search – income 20 5,26
4
105,280 8,33
4 166,680
search – age 500 2,00
0
1,000,00
0
index,
1,000
record
2,00
0 1,000,00
0
https://www.coursehero.com/file/10763127/CMP-454-Assignment2solution1/
This study resource was
shared via CourseHero.com
s
insertion 1,200 2,65
3
3,183,60
0
search
, move
up
3
3,600
deletions (no id) 500 21 10,500 search 3 1,500
deletions (id) 300 8 2,400 search
, index
3
900
6,638,78
0
1,322,68
0
Assessment
The assignment is out of 419. Marks are assigned as follows:
Question 1 – 4
Question 2 – 8
Question 3 – 10
Question 4 – 19
Submission
You should submit your assignment online to
the CoursSys submission server. You must submit a single .zip (or
other archive) file that contains your solution, please read the
documentation on site for further information. The assignment
is due by 11:59pm on Monday the 17th of February.
CMPT 454 Home
John Edgar ([email protected])
https://www.coursehero.com/file/10763127/CMP-454-Assignment2solution1/
This study resource was
shared via CourseHero.com
Powered by TCPDF (www.tcpdf.org)