Assignment title: Information
CSE2/CSE5ALG– Algorithms and Data Structures – 2016
Assignment – Part 1
Assessment: This part 1 of the assignment is worth 10 % of the final mark for this subject.
Due Date: Friday 29th April 2016, 10 AM
Delays caused by computer downtime cannot be accepted as a valid reason for a late submission
without penalty. Students must plan their work to allow for both scheduled and unscheduled
downtime. Penalties are applied to late assignments (accepted up to 5 days after the due date
only). See the departmental Student Handbook for details.
Copying, Plagiarism: Plagiarism is the submission of somebody else's work in a manner that
gives the impression that the work is your own. The Department of Computer Science and Computer Engineering treats academic misconduct seriously. When it is detected, penalties are strictly
imposed. Refer to the subject learning guide for further information and strategies you can use to
avoid a charge of academic misconduct.
Submission Details: Submit all the Java files that are required for the tasks described in this
handout. The code has to run under Unix on the latcs8 machine. You submit your files from
your latcs8 account. Make sure you are in the same directory as the files you are submitting.
Submit each file separately using the submit command. For example, for the file called (say)
SimpleConcordance.java, use command:
submit ALG SimpleConcordance.java
After submitting the files, you can run the following command that lists the files submitted from
your account:
verify
You can submit the same filename as many times as you like before the assignment deadline; the
previously submitted copy will be replaced by the latest one.
Testing Platform: While you are free to develop the code for this assignment on any operating
system, your solution must run on the latcs8 system. We should be able to compile your classes
with the simple command javac *.java, and execute your programs with a simple command,
e.g. java SimpleConcordanceTester.
Return of Assignment: Assignments will be returned within three weeks of the due date.
Students will be notified by email and via the CSE2/CSE5ALG website when marking sheets are
available for collection.
Assignment Objectives
• To understand various data structures and searching and sorting techniques;
• To analyze these techniques in relation to a particular problem;
• To implement these techniques within a Java program.
1
Problem Description
A concordance is an alphabetical list of the principal words used in a book or body of work, with
their immediate contexts. Because of the time and difficulty and expense involved in creating a
concordance in the pre-computer era, only works of special importance, such as the Vedas, Bible,
Qur'an or the works of Shakespeare, had concordances prepared for them. (From Wikipedia) In
this assignment, the program you will design and implement an application that performs the basic
tasks of a concordance.
More specifically, the application will allow us to read in a book which is stored in a text file and
then to search for a word and the sentences in which the word appears, if any.
Operational Definition of 'Sentence'
To define a sentence, we need to define a paragraph. A paragraph is a sequence of characters from
the text that
• is terminated by a newline character (except possibly the last paragraph), and
• contains at least a letter
The newline character will not be considered to be part of the paragraph.
A sentence will be defined as a sequence of characters from a paragraph (in the sense defined
above) that
• terminates with the period character, the question mark, or the exclamation mark ( '.', '?',
'!'), except possibly the last sentence of the paragraph
• contains at least one letter
• has all the preceding and trailing whitespace characters removed
In addition, the period character, the question mark, and the exclamation mark will not be considered to be part of the sentence.
Note that this is an operational definition. It approximates, but is not completely in agreement with,
the usual concept of a sentence. It even has 'strange' consequences. For example, the sentence
(from Pride and Prejudice)
Mr. Bennet replied that he had not.
will be extracted as two sentences according to our definition.
You must use this definition for the sake of definiteness and consistency. That is, everyone will
extract the same list of sentences from a given text file.
Operational Definition of 'Word'
A word is defined to be a sequence of letters that appears in a sentence (in the sense defined above),
separated by non-letter characters.
2
Assignment Requirements
The assignment is divided into two parts. For part 1, you are required to write a simple version,
where execution efficiency is not taken into account. For part 2, you are required to write an
"optimized" version where efficiency, in addition to correctness, is the major concern.
With the exceptions of ArrayList and LinkedList, you are not permitted to use any of
the classes in the Java Collections Framework, e.g. TreeMap, HashMap.
Task 1
For this task, you are to design and implement the SimpleConcordance class. This class builds
a concordance and provides methods to support the usage of the concordance. What we are concerned for Part 1 is the correctness of the class – not its efficiency (yet).
More specifically, the SimpleConcordance class must provide the following methods:
• public void build(String fileName)
This method takes a the file name of the book, e.g. PridePrejudiceExtract.txt, and
builds the concordance for the book. The concordance must be able to support the methods
described below.
If the file does not exist (or is a directory), the method display the message:
The input file does not exist!
• public void match(String startOfWord)
This method takes a string representing the starting letters of a word, finds all the word in
the concordance starting with the specified string, and display them all on the screen.
For example, if the text file is the one listed in the Appendix, and the string entered by the
user is ''Th'', the following output will be displayed
Looking for words starting with: Th
Number of words matched: 4
THAT, THE, THERE, THIS
The matching is not case-sensitive (that is, we get the same output when we enter ''th'',
or ''TH'', or ''Th'', etc.). The words are listed in alphabetical order and separated by
commas.
If there is no match, the display format is the same and the second line will be
Number of words matched: 0
3
• public void search(String word)
This method takes a string representing a word, finds and displays on the screen all the
sentences in which the word appears.
For example, if the text file is the one listed in the Appendix, and the search word is
'truth', the following output will be displayed
Search word: truth
Number of sentences: 2
(1)[4]: It is a truth universally acknowledged that a single man in
possession of a good fortune must be in want of a wife
(2)[5]: However little known the feelings or views of such a man may be
on his first entering a neighbourhood, this truth is so well fixed in the
minds of the surrounding families, that he is considered the rightful
property of some one or other of their daughters
Note carefully the following requirements:
1. The sentences must appear in the order they appear in the text.
2. Each sentence is preceded by two numbers: the first signifies the order it appears in
the current list, the second the order it appears in the text
3. The sentences are separated by blank lines
4. If a word appears in the same sentence more than once, the sentence is listed only once
5. The matching of the search word with the words in a sentence is not case-sensitiv.
(That is, we would have the same output when the user enters "TRUTH" or "tRutH",
etc.)
6. If the word is not in the text, an appropriate message should be displayed
As for the previous method, if there is no match, the display format is the same and the
second line of the output will be: Number of sentences: 0
• public void searchAndSave(String word)
This method takes a string representing a word, finds all the sentences in which the word
appears, and save the result (as described for the previous method) to a text file. If, for
example, the search word is 'truth', the text file must have the name
ContextOf-TRUTH.txt
Note that the search word appears in uppercase in the filename.
4
Task 2
For this task, you are to write the SimpleConcordanceTester class to test all the methods described in Task 1 for the SimpleConcordance class.
Note the following:
• All the tests must be hardcoded.
• You must choose for test cases carefully, and you must include comments in your program
to explain the reasons for the tests.
• It is recommended that you use the text files that we provided to do your testing. In this
case, do not change the file names. If you choose to use your own text files for testing, you
need to submit them.
Note: Even though we describe the two tasks separately, they should be developed in parallel.
Marking Scheme Overview
• The program (classes) for Task 1 and Task 2 will be marked on correctness, and coding style
and comments.
• In addition, for Task 2, the program will also be marked on the choices of tests tou include
and the explanations given for these tests.
• Note: It is critically important that your program meet the the specification. In particular,
the class names and all the method signatures and return types must be exactly as specified.
Programs that fail to meet the specification will be heavily penalized, or even not marked.
If you need to make any assumptions, then state them clearly in a Word document named
"Assumptions.doc" and submit it along with your classes, etc.
• 10% of the total mark of part 1 of the assignment will be allocated to coding style and comments: Do the programs follow good design and programming practices? Do the indentation
and code layout follow a good, consistent standard? Are the identifiers meaningful? Are
comments being used effectively?
(See the Appendix on the next page.)
5
Appendix
An extract from Pride and Prejudice
Pride and Prejudice
by Jane Austen
Chapter 1
It is a truth universally acknowledged that a single man in possession
of a good fortune must be in want of a wife.
However little known the feelings or views of such a man may be on
his first entering a neighbourhood, this truth is so well fixed in the
minds of the surrounding families, that he is considered the rightful
property of some one or other of their daughters.
"My dear Mr. Bennet," said his lady to him one day, "have you heard
that Netherfield Park is let at last?"
Mr. Bennet replied that he had not.
"But it is," returned she; "for Mrs. Long has just been here, and she
told me all about it."
Mr. Bennet made no answer.
"Do you not want to know who has taken it?" cried his wife
impatiently.
"YOU want to tell me, and I have no objection to hearing it."
This was invitation enough.
6