Assignment title: Information


CSE2/CSE5ALG– Algorithms and Data Structures – 2016 Assignment – Part 2 Assessment: This part 2 of the assignment is worth 20 % of the final mark for this subject. Due Date: Monday 23 May 2016, at 10.00 am Delays caused by computer downtime cannot be accepted as a valid reason for a late submission without penalty. Students must plan their work to allow for both scheduled and unscheduled downtime. Penalties are applied to late assignments (accepted up to 5 days after the due date only). See the departmental Student Handbook for details. Copying, Plagiarism: Plagiarism is the submission of somebody else's work in a manner that gives the impression that the work is your own. The Department of Computer Science and Computer Engineering treats academic misconduct seriously. When it is detected, penalties are strictly imposed. Refer to the subject learning guide for further information and strategies you can use to avoid a charge of academic misconduct. Submission Details: Submit all the Java files that are required for the tasks described in this handout. The code has to run under Unix on the latcs8 machine. You submit your files from your latcs8 account. Make sure you are in the same directory as the files you are submitting. Submit each file separately using the submit command. For example, for the file called (say) Concordance.java, use command: submit ALG Concordance.java After submitting the files, you can run the following command that lists the files submitted from your account: verify You can submit the same filename as many times as you like before the assignment deadline; the previously submitted copy will be replaced by the latest one. Testing Platform: While you are free to develop the code for this assignment on any operating system, your solution must run on the latcs8 system. We should be able to compile your classes with the simple command javac *.java, and execute your programs with a simple command, e.g. java ConcordanceMenu. Return of Assignment: Assignments will be returned within three weeks of the due date. Students will be notified by email and via the CSE2/CSE5ALG website when marking sheets are available for collection. Assignment Objectives • To understand various data structures and searching and sorting techniques; • To analyse these techniques in relation to a particular problem; • To implement these techniques within a Java program. 1 As described in the handout for Part 1, the overall aim of the assignment is to develop a program to build a concordance and to perform various operations in relation to the concordance. Whereas for Part 1 we are only concerned with the correctness of the program, for Part 2 we are concerned with the efficiency. Besides the information given in the tasks below, please refer to Part 1 of the Assignment for any other information you need. Task 1 (80 marks) Develop a class called Concordance, which provides the same methods, with the same functionality, as the SimpleConcordance class that you developed for Part 1: • public void buildConcordance(String fileName) • public void matchAndDisplay(String start) • public void searchAndDisplay(String word) • public void searchAndSave(String word) The difference, of course, is that the Concordance class must perform these operation efficiently. Your Concordance class will be tested by programs that assume the method signatures are as specified above. You may, if you wish, have some methods throwing exceptions. That would be acceptable: the test programs will allow for that possibility. Thus, to reiterate, apart from the matter of exception throwing, the method signatures must conform to the given specifications. In addition, you are required to write a Java class called ConcordanceMenu, which provides a menu with the following options: R) Read in a book and build the concordance M) List the words that start with a given string S) Search for a word and the sentences in which it appears W) Write to a file the sentences in which a word appears Q) Quit The menu must accept the letters 'R', 'S', 'L', 'W' or 'Q' and be case insensitive. Option 'R' When the user selects option 'R' from the menu, the program should prompt the user for the name of a text file to open. The name entered by the user must include the file extension. If there is any error in reading the file, the program must warn the user and return to the menu. Otherwise, the program reads the text file and constructs all the required data structures about the words and sentences in the input text file. All the words are to be stored in upper case. The sentences are stored in uppercase and lower case as they appear in the text. 1 1 Note that these requirements are implicit in the handout for Part 1. 2 Option 'M' When the user selects option 'M', the program will ask for a non-empty string to be entered from the key board, and display on the screen all the words in the concordance that begin with that string. The format of the output must follow the requirements given in the handout for Part 1 for method matchAndDisplay. Option 'S' When the user selects option 'S', the program will ask for a word to be entered from the key board, and display on the screen the word, the number of sentences it appears in, and the list of those sentences. The format of the output must follow the requirements given in the handout for Part 1 for method searchAndDisplay. Option 'W' When the user selects option 'W', the program will ask for the word to search for, and display the search result on the screen as described for option 'S'. It will then write the search result to a text file. The output on the screen must follow the requirements given for option 'S', and the output filename and its content must follows the requirements given in the Part 1 handout for method searchAndSave. Efficiency Issue The efficiency with which the program performs various operations is a major concern. The files to be read in can be quite long. You will need to carefully consider which algorithms and data structures you use. You can use any text file for input to this program. A good source of long text files is at the Gutenberg project (www.gutenberg.com) which is a project aimed to put into electronic form older literary works that are in the public domain.2 You should choose files of lengths suitable for providing good information about the efficiency of your program. The program will be marked on correctness, style and efficiency. Style will be worth 10 marks. This will include commenting, layout, choice of identifier names, choice of methods and/or classes, etc. Correctness will be allocated 25 marks. 45 marks will be allocated for efficiency in the program. Programs that are highly inefficient may receive no marks here. Restriction on the use of Java Collections Framework With the exceptions of ArrayList and LinkedList, you are not permitted to use any of the other classes in the Java Collections Framework, e.g. TreeMap, HashMap, Collections, ListIterator. 2 You may need to convert the text files from this web site to the format that does not have the new line characters in the middle of the sentences. For this purpose, you may find the utility program provided in LMS useful. 3 Task 2 (20 marks) For this task, you are required to submit electronically a written report. This report is concerned with the optimized version only. It should have three main sections: • Description of the Design and Implementation. You should describe how the program works and which data structures and algorithms you use. You should focus on the data structures and algorithms used and your reasons for choosing them. • Correctness Testing. You should describe the tests carried out on the program as evidence of the correctness of your program. What input files did you use? What test case did you include? The test cases should be reported in table form with columns such as test case number, matching pattern, reason and expected result, and actual outcome. • Efficiency Testing. You should describe the strategies you use to "measure" your program's efficiency with respect to each tasks it has to perform. What data sets (i.e. text files) did you use? Reasons for using them? Matching patterns and reasons for their inclusion? What techniques dis you use to measure time efficiency? What were the results? And, perhaps, you should comment on the results obtained. The report should not be longer than 6 pages. Point form and tables are acceptable where appropriate. It should be submitted electronically as a file with the name "report". Acceptable formats are: PDF or Word. Make sure your name and student number are on every page of your report. (Putting it into a header or footer would be a good idea.) Task 3 (For CSE5ALG Students only) (a) Consider a B-trees of order M = 5. Calculate the number of elements in the barest trees of height 1, 2, 3 an 4. You are required to directly examine the trees and calculate the numbers of elements they contain. In other words, you must not use the result which states that the barest B-tree of height H contains N = 2K H − 1 elements, where K = ⌈ M 2 ⌉. (b) Use the the results in (a) to determine the least upper bound for a B-tree of order M = 5 which has 100 elements. Note: The total mark for Part 2 will be 100 for CSE2ALG students and 110 (100 + 10 for Task 3) for CSE5ALG students. The percentage of contribution to the final will be the same.  4