https://www.futurelearn.com/courses/sit772-fl5/1/steps/199452
https://www.futurelearn.com/courses/sit772-fl5/1/tutor-marked-assignments/178835

Instructions

Read these instructions.
Answer as many questions as possible.
Place your name, ID and answers in your document.
Please submit your Word file with your answers and graphs (embedded where appropriate) as a SINGLE document in the Submission Portal. Do not submit PDF files.

Question 1

Suppose you have joined a search engine development team to design a search algorithm based on both the Vector model and the Boolean model. You have collected the following (unstructured) documents and plan to apply an indexing technique to convert them into an inverted index.

Doc 1: Google is the most widely used Web search engine in the World. It claims to be the World’s most comprehensive search engine, indexing over 2.4 billion Web pages.
Doc 2: Glimpse is an indexing and query system that allows for search through a file system or document collection quickly. Glimpse is the default search engine in Harvest.
Doc 3: Dogpile is a metasearch engine that searches four search engines at a time and lists the results from each engine on each page.

In the process of creating the inverted index, please complete the following steps:

a) Remove all stopwords and punctuation, and then apply Porter’s stemming algorithm to the documents. Plenty of online stemming applications are available, and you may use them for this question. The list of stopwords for this task is as follows: Is, The, Most, In, It, Of, Or, At, To, Be, Over, And, That, Where, Who, Whose, Which, Through, A, Each
b) Create a merged inverted list that includes the within-document frequency of each term.
c) Use the index created in part (b) to create a dictionary and the related posting file. You may like to test the inverted index using the following keywords: web, search, query, engine, index, document.
d) Design three Boolean queries (for example, web AND search) and list the relevant documents for each query.
e) Use the Vector model to query the inverted index, and compare the results with those of the Boolean model. (Hint: you can use cosine similarity and set a similarity threshold.)

Question 2 (IR Evaluation)

In this question, you are required to evaluate the performance of different search engines.

First, choose two search engines you are familiar with, such as Google, Bing or Yahoo!.
Second, choose a target from the list below and design two queries to run on both search engines. The target is determined by the last digit of your student ID. For example, if your student ID ends in 1, choose Target 1; if it ends in 0, choose Target 10.

Target 1: obtain the unit guide of SIT771.
Target 2: obtain the unit guide of SIT772.
Target 3: obtain the unit guide of SIT773.
Target 4: obtain the unit guide of SIT774.
Target 5: obtain the price of the new MacBook.
Target 6: obtain the price of the new iPhone.
Target 7: obtain the price of a Lenovo laptop.
Target 8: obtain the install document of MongoDB.
Target 9: obtain the manual of MongoDB.
Target 10: obtain the operation guide of MongoDB.

Third, take the first 20 results from each search engine. Mark a result as relevant if it returns the target; otherwise, mark it as irrelevant.

The following questions are based on your search results.

a) List your target and the search queries you designed (you can use any keywords you think are related to the target). For Search Engine 1, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels. Also plot the average precision versus recall curve for Search Engine 1 (all three curves should be on a single chart).
b) For Search Engine 2, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels.
Also plot the average precision versus recall curve for Search Engine 2 (all three curves should be on a single chart, but on a separate chart from the one used in part (a)).
c) Plot the averages for Search Engine 1 and Search Engine 2 on a separate chart, and compare the two search engines in terms of precision and recall. Which search engine do you think is superior, and why?
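
The interpolation to the 11 standard recall levels asked for in Question 2 can be sketched in Python as shown below. This is only an illustrative sketch, not a prescribed solution. The function names (interpolated_precision, average_curves) and the sample relevance lists are hypothetical. It also assumes that, because the true number of relevant pages on the Web is unknown, the relevant results found in the inspected ranking are treated as the full relevant set unless total_relevant is given explicitly. The interpolated precision at recall level r is taken as the highest precision observed at any recall greater than or equal to r.

```python
from typing import List, Optional, Sequence

# The 11 standard recall levels: 0.0, 0.1, ..., 1.0
RECALL_LEVELS = [i / 10 for i in range(11)]


def interpolated_precision(relevance: Sequence[bool],
                           total_relevant: Optional[int] = None) -> List[float]:
    """Return interpolated precision at the 11 standard recall levels.

    relevance[i] is True if the result at rank i + 1 was judged relevant.
    Assumption: if total_relevant is not given, the relevant results in
    this ranking are treated as the complete relevant set.
    """
    if total_relevant is None:
        total_relevant = sum(relevance)
    if total_relevant == 0:
        return [0.0] * len(RECALL_LEVELS)

    # (recall, precision) measured at each rank that holds a relevant result
    points = []
    hits = 0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))

    # Interpolated precision at level r: max precision at any recall >= r
    curve = []
    for level in RECALL_LEVELS:
        candidates = [p for r, p in points if r >= level - 1e-9]
        curve.append(max(candidates, default=0.0))
    return curve


def average_curves(curves: Sequence[Sequence[float]]) -> List[float]:
    """Average several 11-point curves level by level."""
    return [sum(values) / len(values) for values in zip(*curves)]


if __name__ == "__main__":
    # Hypothetical relevance judgements for two short rankings
    query1 = [True, False, True, False]
    query2 = [False, True, False, False]
    curve1 = interpolated_precision(query1)
    curve2 = interpolated_precision(query2)
    for level, p1, p2, avg in zip(RECALL_LEVELS, curve1, curve2,
                                  average_curves([curve1, curve2])):
        print(f"recall {level:.1f}: Q1 {p1:.3f}  Q2 {p2:.3f}  avg {avg:.3f}")
```

Each returned list has one precision value per recall level, so the two query curves and their average can be plotted against RECALL_LEVELS on a single chart with any charting tool. For a real submission, the relevance lists would hold the 20 judgements recorded for each query.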