Assignment title: Information
Part 2
a. Sketch the extensible hash index, and its directory, that results
from the following insertions, where four data entries can fit in a
disk page. Initially the directory has two entries and therefore
uses just the last bit of the hash value. The initial state of the
index, and its global and local depths are shown below. The index
entries show the last eight bits of the hash values (not the search
key values). Sketch the index and its directory as it appears after
the insertion of each search key marked with [sketch]. For full
marks you must show the global depth of the directory, the local
depth of each bucket and which bucket is referred to by each
directory entry.
global depth = 1, both buckets have a local depth of 1
00-----> 0011 1110, 1101 0110, 0001 1010
01-----> 1111 1101, 1011 0011, 0011 1001, 1110 1101
Search keys with the following ten hash values are to be
inserted:
0011 0011 [sketch]
0101 0111
1011 1100
1001 0101
0110 0001 [sketch]
1111 1001
0000 1111
1001 0110 [sketch]
0110 0000
1000 0110 [sketch]
b. Sketch the linear hash index that results from the seven initial
data entries and subsequent ten insertions given in (a) above,
where the maximum average occupancy for index buckets is
80%. Sketch the index as it appears after the insertion of the
search keys marked with [sketch]. Hint: think carefully about the
state of the index at the start, do not assume it looks identical to
part (a).
Part 3
A DB table has the schema: Customer = {id, fname, lname,
email, birthdate, income, city, street, number}
The fields have the following domains:
§ number,id – 4 byte integer
§ income – 8 byte real
§ birthdate – 8 byte datetime
§ fname, lname, city, email– 20 byte character strings
§ street – 30 byte character strings
The table contains 210,000 records, and (for the sake of
simplicity) no additional space is required to
represent record headers. The following file
organizations are to be considered for this file:
1. heap file – unsorted, pages have full occupancy
2. sorted file – sorted on {lname, fname} pages are in
sequence and have 80% occupancy
3. sorted file – sorted on {city, street, number} pages are
in sequence and have 80% occupancy
The following additional information is relevant
§ Size of a record ID – 8 byte
§ Size of a pointer to a block - 8 bytes
§ The file is to be stored on a single disk, with twenty 4,096
byte blocks per track
§ No single record is to be stored on more than one block
§ A note on average occupancy – if a file (or page) has a
given average occupancy it means just that.
Therefore the average occupancy can be a value
with a fraction. It is possible for a block to hold 10.234
records on average, even though blocks only contain
whole records.
The average weekly use of the file is expected shown below.
The number given at the end of each query
description is the number of times that query will be
run in a week. For example there are expected to be
3,000 equality selections on id a week, and 10,000
insertions a week.
§ Equality selections on {lname, fname}, there are very few
{lname, fname} duplicate values (where customers
have the same first and last names); for sake of
simplicity assume that selections always return a
single record – 12,000
§ Equality selection on {fname}, returning, on average, 0.03%
of the records – 10
§ Equality selections on {id} – 3,000
§ Equality selections on {city, street, number}, there are very
few {city, street, number} duplicate values (where
customers live at the same address) ; for sake of
simplicity assume that selections always return a
single record – 1,200
§ Equality selections on {city, street}, returning, on average,
0.01% of the records – 7,000
§ Equality selections on {city}, returning, on average, 0.5%
of the records – 1,500
§ Equality selections on {email}, email is a candidate key –
60
§ Range selections on {birthdate}, returning, on average,
0.6% of the records – 3
§ Range selections on {income}, returning, on average, 10%
of the records – 50
§ Insertions – 10,000
§ Deletions, given at least {lname, fname} and sufficient
data to determine customer – 7,000
For parts m, n and o make the following assumptions:
§ The root node and directory of the indexes are retained in main
memory so do not have to be read from disk
§ If a selection returns multiple records they are located in the
greatest possible number of blocks; for example if a B+ tree index
is used in a search that returns five records assume that they are
stored on two leaf nodes
§ Insertions and deletions do not result in changes to the structure
of indexes or files – that is records can always be inserted in the
appropriate leaf nodes, buckets or file blocks without affecting
other blocks of the index or file
Answer the following questions about the file:
a. How many records fit on a single disk block? [1 mark]
b. How large in blocks is the file, assuming that it is stored as the
heap file? [1 mark]
c. How large in blocks is the file, assuming that it is stored as
either of the sorted files? [1 mark]
d. If a B+ tree index on {lname, fname} is created on any of the
file organizations how many data entries can be stored in a node
(Ieaf or interior)? [1 mark]
e. If a B+ tree index on {lname, fname} is created on file
organizations 1 or 3 how many leaf pages would the index
have? Assume that leaf pages have 2/3 occupancy on average.
[1 mark]
f. For the index described in (e), what is the height of the tree,
assuming that interior nodes also have 2/3 occupancy on
average? State the height of the tree including both root and leaf
levels. [1 mark]
g. If a sparse B+ index on {lname, fname} is created on file
organization 2 how many leaf pages would the index
have? Assume that leaf pages have 2/3 occupancy. [1 mark]
h. For the index described in (g), what is the height of the tree,
assuming that interior nodes also have 2/3 occupancy? State the
height of the tree including both root and leaf levels. [1 mark]
i. If a hash index on id is created (on any of the file
organizations), how many buckets does the index contain?
Assume that buckets have 80% occupancy on average [1 mark]
j. For the index described in (i), if the index is an extensible
hash index, how large (in blocks) is the directory? [1 mark]
k. If a B+ index on {city, street, number} is created on file
organizations 1 or 2, what is the height of the tree, assuming that
both interior and leaf nodes have 2/3 occupancy? State the
height of the tree including both root and leaf levels. [1 mark]
l. If a sparse B+ index on {city, street, number} is created on file
organization 3, what is the height of the tree, assuming that both
interior and leaf nodes have 2/3 occupancy? State the height of
the tree including both root and leaf levels. [1 mark]
m. Compute the total average weekly cost in numbers of disk
reads and writes if the table is stored as a heap file (organization
1) with B+ tree indexes on {lname, fname} and {city, street,
number} and a hash index on id. For full marks you must show
the cost, in disk reads and writes, for each of the operations noted
above. [11 marks]
n. Compute the total average weekly cost in numbers of disk
reads and writes if the table is stored as a file sorted on {lname,
fname} (organization 2) with B+ tree indexes on {lname, fname}
and {city, street, number} and a hash index on id. For full marks
you must show the cost, in disk reads and writes, for each of the
operations noted above. [11 marks]
o. Compute the total average weekly cost in numbers of disk
reads and writes if the table is stored as a file sorted on {city,
street, number} (organization 3) with B+ tree indexes on {lname,
fname} and {city, street, number} and a hash index on id. For full
marks you must show the cost, in disk reads and writes, for each
of the operations noted above. [11 marks]
p. Which of the three file organizations described in (m), (n) and
(o) provides the best support for the weekly use of the file? [1
mark]
q. For your chosen organization is it beneficial to add a hash
index on email, explain why or why not? [1 mark]
r. For your chosen organization is it beneficial to add a B+ tree
index on birthdate, explain why or why not? [1 mark]
s. For your chosen organization is it beneficial to add a B+ tree
index on income, explain why or why not? [1 mark]