Assignment title: Information
Closure Algorithms for Functional dependencies
Instructions
The objective of the assignment is to implement two closure algorithms for functional
dependencies given in pseudo-code and to experimentally compare their performances. The
programming language may be Java or Python.
Please send a zip file including all your source files, the readme.txt file, your experimental
figures in a tsv or csv file, the image depicting the later as well as relevant supplementary
material
The readme.txt report, in English, must contain your answers to the open questions (justification
for the data structures you selected, a discussion on the experimental results etc.).
Presentation of functional dependencies used in this course can be found here :
https://drive.google.com/file/d/0B6s0m-_LoN0MZGJBMFl4NDYzaEE/view?usp=sharing
Coding standards : Good(Post-graduation course)
Deadline : Monday 5rd April 2017 Midday, (GMT+2)
1.2 Main Program
A set of sample files is provided in the examples folder. Each file contains a set of Functional
Dependencies (FDs). Your main program closure has three options, where input is either a file
containing FDs. or - (the hyphen character) if the FDs are to be read on the standard input (cin):
● closure -naive input atts: with Σ the set of FDs read on input and X the set of attributes in
atts, computes Closure(Σ, X) with the naive algorithm, see Algorithm 1, and prints it on
the standard output (cout);
● closure -improved input atts: with Σ the set of FDs read on input and X the set of
attributes in atts, computes Closure0 (Σ, X) with the improved algorithm, see Algorithm
2, and prints it on the standard output (cout);
● closure -generate n: with n an integer, generates a particular set of FDs of the form {n −
1 →n, n − 2 → n − 1, . . . , 0 → 1}, shuffles it and prints it on the standard output (cout);
● closure -normalize input: with Σ the set of FDs read on input, computes Reduce(M
inimize(Σ), see Algorithms 3 and 4, and prints it on the standard output (cout);
● closure -decompose input: with Σ the set of FDs read on input, supposed to be obtained
after normalization, decomposes the (implicit) relation into a BCNF schema, and prints it
on the standard output (cout).2 Closure algorithms for functional dependencies
2.1 Basic Data Structure
Exercise 1 input and output
Your program needs to read a set of FDs from a file and to write sets of FDs and sets of
attributes on the standard output (cout).
1. Define data structures for attributes, sets of attributes, sets of sets of attributes, FDs and
sets of FDs.
2. Define input/read and output/write for the data structures of the previous question. If
needed by your favorite set library, define a lexical (total) order on FDs.
3. Define a main program with the requested command line options.
2.2 Closure algorithms
The closure X+ of a set of attributes by a set of FDs Σ is defined as X+ = {A | F ` X → A}.
Algorithm 1 gives a procedure to compute X+. As discussed during the lecture1 Algorithm 1 can
be improved to Algorithm 2 that does exactly the same job.
Exercise 2 naive closure
1. Implement Algorithm 1.
2. Provide unit tests for your implementation.
Exercise 3 improved algorithm
1. Define data structures for the count and list indices of Algorithm 2.
2. Implement the construction of indices of Algorithm 2, Lines 2 to 4 as a separate
function.
3. Implement Algorithm 2.
4. Provide unit tests for your implementation.
Exercise 4 open questions
1. Justify your implementation choices for the data structures, in the appropriate section of
the readme.txt file.
2. Discuss your strategy that implements the Choose A instruction of Algorithm 2, Line 7.
3. There is a small corner case in Algorithm 2 with non-standard DFs. Find it.Experimental Comparison
At this point you have two algorithms solving the same problem. The goal is to experimentally
compare the practical performance gain.
Exercise 5 performance comparison
1. Implement the -generate n command-line option that produces the set of DFs {n−1 → n,
n−2 →n − 1, . . . , 0 → 1}. Shuffle the DFs of this set before it is output.
2. Inspire yourselves from the ./perf.sh script to obtain raw execution times to be stored in a
Tab/Comma Separated Value (tsv/csv) file.
3. Use your favorite spreadsheet, plot or statistics software to graphically depict your
results. Figure 1 is provided for inspiration.Exercise 6 open questions
1. Explain why the set {n − 1 → n, n − 2 → n − 1, . . . , 0 → 1} is “interesting” when
evaluating the closure algorithms.
2. Detail your experimental setup and methodology.
3. Analyze your experimental results.Exercise 7 decomposition
The decomposition pipeline is given by Algorithm 5. It uses two subroutines Algorithm 3 and
Algorithm 4. The function Reduce Minimize is said to normalize a set of FDs.
1. Implement a tool function that checks if Σ |= X → Y by testing if Y ⊆ Closure(Σ, X) or Y
⊆ Closure0 (Σ, X).
2. Implement the -normalize command-line option that runs Algorithms 3 and 4. Mind that
the set of FDs is modified along the process, so be careful when using Algorithm 2.
3. Implement a tool Schema(Σ) function that gather all attributes that appears in a set of
FDs, for instance Schema({AB → C, D → C}) = ABCD.
4. Implements a tool function that given a set of FDs Σ and a set of attributes X checks if X
is a key of Schema(Σ). Use this function to implement the BCNF test that appears at
Line 3 of Algorithm 5.
5. Implement the -decompose command-line option that computes Decompose(Σ,
Schema(Σ))
.