Assignment title: Information


INM305: Information Retrieval

The deadline for this assignment is Sunday 30th April 2017 @ 5.00pm

This assignment is a combination of written and practical work, allowing you to be assessed on various aspects of the INM305 module. The assignment is worth 100 marks - 100% of the total marks for this module. The educational purpose of this part of the assignment is to establish more firmly some of the more practical aspects of search and evaluation in information retrieval, by undertaking a wide ranging search on different types of systems.

Evaluation of online and web search

Lecture 5 (Web evaluation case study) presents a real world evaluation which Andrew MacFarlane conducted in industry in order to establish the usefulness of the Web Search engine run by the company he worked for. The assignment requires you to use the same techniques as this study, albeit on a much smaller scale - we evaluated 50 queries, you will only do one. You will also extend your evaluation to online services (e.g. ProQuest Dialog), to image and video search services, and to social search and private engines.

This part requires you to complete a number of search and evaluation tasks:

1. Pick a topic and create a TREC style topic description for it. You are free to choose any topic, e.g. one related to your work or personal interests. The topic needs to be current so that the social search systems can be used to find information. You are required to do a facet analysis on your topic. You will need to create an evaluation policy for your topic (the narrative field should help you with this).

2. Using that facet analysis, create appropriate search strategies to build 'bag of words' queries and 'Boolean' queries with which to search the following search services [query types are listed with each search engine]:
• Google: http://www.google.co.uk - the most used Web Search engine. [Boolean and ‘Bag of words’ queries].
  Advanced Search URL: https://www.google.co.uk/advanced_search
• Bing: http://www.bing.com/?cc=gb - the second most used Web Search engine. [Boolean and ‘Bag of words’ queries].
• Google Images: http://www.google.co.uk/imghp?hl=en&tab=wi - Google’s image search engine. [‘Bag of words’ query only].
• Bing Images: http://www.bing.com/?scope=images&nr=1&FORM=NOFORM - Bing’s image search engine. [‘Bag of words’ query only].
• YouTube: http://www.youtube.com/?gl=GB&hl=en-GB - Google’s Video Search. [‘Bag of words’ query only].
• Bing Video: http://www.bing.com/videos/browse - Bing’s Video Search. [‘Bag of words’ query only].
• ProQuest Dialog: http://search.proquest.com/professional/?accountid=143640 - an online service with a command line interface. [Boolean query only].
• DuckDuckGo: https://duckduckgo.com/ - a Meta Search engine. [‘Bag of words’ and Boolean query].
• Social Searcher: https://www.social-searcher.com/ - a search engine for social media search and analysis of user generated content. [‘Bag of words’ query].
• Startpage: https://startpage.com/ - a private search engine with Advanced Search. [Boolean and ‘Bag of words’ query].
  Advanced search URL: https://startpage.com/uk/advanced-search.html?hmb=1
• One other Online system of your choice, e.g. Trip Database or Factiva. You may use any of the systems listed on the Library A-Z link: http://libguides.city.ac.uk/az.php

3. Do evaluations on those searches using those search services. Remember to use the evaluation policy you've defined. Only look at the top 10 ranked documents (usually just one screen's worth). You must use the following evaluation methods (lecture 5):
• Precision at 5 documents retrieved (P @ 5): this figure should be in the range 0 to 1 for each search.
• Precision at 10 documents retrieved (P @ 10): this figure should be in the range 0 to 1 for each search.
• Estimated Average Precision (EAP) for the top 10 documents: assume that for all queries there are at least 10 relevant documents. This figure should be in the range 0 to 1 for each search.
• Rate of Repeated documents (RT): record the number of duplicates per search. This figure should be in the range 0 to 10.
• Link Broken (LB): record the number of broken links per search. This figure should be in the range 0 to 10.
• Not retrieved (NT): record the total number of documents not retrieved by that search. This figure should be in the range 0 to 10.
• Spam: record the number of Spam documents per search. This figure should be in the range 0 to 10.

You must use the following table to report your results. Please do not deviate from this format.

Search Service      Query Type     P@5   P@10   EAP   RTDup   LB   NT   SPAM
Google Web Search   Boolean
                    Bag of Words
Bing Web Search     Boolean
                    Bag of Words
Google Images       Bag of Words
Bing Images         Bag of Words
YouTube             Bag of Words
Bing Video          Bag of Words
ProQuest Dialog     Boolean
DuckDuckGo          Boolean
                    Bag of Words
Social Searcher     Bag of Words
Startpage           Boolean
                    Bag of Words
Other

You will need to make a number of assumptions, particularly with the diagnostic measures. For example, how would you define a Spam document? What is a repeated document? Some sites have mirrors across the world which may be retrieved more than once, etc. The evaluation metrics described above are precision based: we are looking for anything that might affect this precision (hence measures such as LB, NT & Spam). You will use the diagnostic measures to examine deficiencies found in the precision scores.

Students should ensure that they tackle the following learning outcomes in their report:
• Use a range of information retrieval systems and services to resolve information needs.
• Evaluate information retrieval systems and services, by using appropriate methodologies.
• Evaluate new developments in information retrieval research, understanding the problems which new ideas in IR are attempting to address.

Coursework Deliverable

After completion you must produce a document that contains a report on the Search and Evaluation conducted. This should include:

1. Specification of the evaluation policy you have derived for your topic (your TREC topic plus the assumptions you made on diagnostic measures).

2. Your facet analysis of this information need, together with a discussion of how you developed the facets. This should be a reflection on the process you used to produce the final facet set.

3. A reflection on the process you went through in order to generate the final queries, showing what you learned while undertaking searching. Declare both the query you used for search and the strategy you used in order to derive this query. A suitable method is to pick one of the search services (say Google) and do some initial searches to find a number of good terms - only submit your queries and do the evaluation when you are happy with the query terms. You will need to do this for your 'bag of words' queries and your 'Boolean' queries. Describe the tactics you used in the framework of your strategy, e.g. use of particular operators (Boolean, proximity, truncation) and choice of particular terms in the query.

4. Using the evaluation methods, compare and contrast the retrieval effectiveness of the search services. Here are some of the questions you will need to answer. Which is the best search service for your topic? How well does 'bag of words' search compare with 'Boolean' search? How does online search compare with web search? How does web search compare with Meta search? Do the images and videos you retrieved help to fulfil your information need? Does Social Search provide useful information over and above that of traditional search services? How do private engines compare with the main search engines?
Provide a reflection on the impact of the user interfaces on your results (in terms of the operational aspect of the evaluation). Consider factors such as memory, beliefs, emotion, social factors and domain knowledge and reflect on them. You should also examine the quality of the material you retrieved to provide a more in-depth evaluation of your results – this includes not just the documents or objects you retrieved, but the sources of information as well. Please provide a reflection on your relevance assessment in the light of the use of information from documents and the impact this has had on the final result in terms of satisfying your given information needs – e.g. what process did you use, and how do you think you could improve your work in this area?

You must record the figure for each of the evaluation methods, for each of the search services and type of search specified above. Declare the assumptions you have made for the diagnostic evaluation measures (the precision measures are given and no assumptions need be made). You only need to declare the final figures for your evaluations - if you want to include your detailed calculations then please put them in an Appendix.

You must split your sections into:
1. Introduction: Topic and Evaluation Policy
2. Facet Analysis
3. Search Strategy
4. Evaluation
5. Summary
Please do not use any other structure for your submission.

The module leader has made every endeavour to make this assignment as clear as possible. If any aspect of this specification is still unclear, please feel free to state any further assumptions that you feel you have to make in order to complete the coursework. No marks will be deducted in this instance.
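
As an illustration of the arithmetic behind the precision figures (P@5, P@10 and EAP), the following Python sketch may be helpful when preparing an Appendix. It is not part of the module materials. The function names and the example judgments are hypothetical. Two assumptions are made: ranks where no document was retrieved count as non-relevant, and EAP is taken to be the sum of the precision values at each rank where a relevant document appears in the top 10, divided by the assumed 10 relevant documents. The definition given in lecture 5 takes precedence if it differs.

```python
from typing import Sequence


def precision_at_k(judgments: Sequence[bool], k: int) -> float:
    """Fraction of the top k ranks judged relevant.

    Assumption: ranks with no retrieved document count as non-relevant,
    so the divisor is always k, even if fewer than k judgments are given.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for relevant in judgments[:k] if relevant) / k


def estimated_average_precision(
    judgments: Sequence[bool],
    cutoff: int = 10,
    assumed_relevant: int = 10,
) -> float:
    """Average precision over the top `cutoff` ranks.

    Assumption: the total number of relevant documents is taken to be
    `assumed_relevant` (10 in the brief), so the sum of precision values
    at relevant ranks is divided by that number.
    """
    if assumed_relevant <= 0:
        raise ValueError("assumed_relevant must be positive")
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(judgments[:cutoff], start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / assumed_relevant


# Hypothetical relevance judgments for one search, in rank order 1 to 10.
example = [True, True, False, True, False, False, True, False, False, False]

print("P@5 :", precision_at_k(example, 5))
print("P@10:", precision_at_k(example, 10))
print("EAP :", round(estimated_average_precision(example), 4))
```

For the hypothetical judgments above, relevant documents appear at ranks 1, 2, 4 and 7, so EAP is (1/1 + 2/2 + 3/4 + 4/7) / 10. All three figures fall in the range 0 to 1, as the brief requires.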
Assessment and Marking scheme

The final marks for the assessment are allocated using the following criteria:
• Details of your facet analysis for your topic (15 marks)
• Reflection on your search strategy and how the final query was generated (25 marks)
• Evaluation of the required search engines and reflection on process and result of relevance assessment (50 marks)
• Presentation, writing and organisation (10 marks)

Your submission should be no more than 3500 words for the main body of the text. These limits will be strictly enforced. Please note that you will not be assessed on how well your searches do in the evaluation - your information needs will have varying levels of difficulty and therefore the retrieval effectiveness will vary between each of your searches. The assessment will be used to measure your understanding of search and evaluation methods in information retrieval.

Please refer any queries about the coursework to Andrew MacFarlane (email: [email protected]). The deadline for this assignment is Sunday 30th April 2017 @ 5.00pm, through the relevant Moodle submission. Please submit an MS Word file only (do not submit a PDF file).

General guidelines for what is expected in your assessed work are as follows (Class, %, Description):

Distinction (70+): Critical ability and analysis of information needs is strongly evident. Reflection of search strategies demonstrates a strong or comprehensive knowledge of information retrieval. Evaluation of results demonstrates strong or comprehensive knowledge of given evaluation methodologies. Reflection of relevance assessment in evaluation demonstrates strong or comprehensive knowledge of the difficulties in satisfying information needs (resolving ASKs). The learning outcomes will be realised in full. Presentation is of a professional standard.

Merit (60-69%): Some critical ability and analysis of information needs is evident. Reflection of search strategies demonstrates a sound knowledge of information retrieval.
Evaluation of results demonstrates sound knowledge of given evaluation methodologies. Reflection of relevance assessment in evaluation demonstrates sound knowledge of the difficulties in satisfying information needs (resolving ASKs). Understanding of several of the learning outcomes (if not necessarily all) is of a high standard. Presentation is logical, well-structured and demonstrates good academic practice.

Pass (50-59%): An attempt is made at critical analysis of information needs, but the development of the ideas is limited. Reflection of search strategies demonstrates an adequate knowledge of information retrieval. Evaluation of results demonstrates an adequate knowledge of given evaluation methodologies. Reflection of relevance assessment in evaluation demonstrates an adequate knowledge of the difficulties in satisfying information needs (resolving ASKs). Understanding of several of the learning outcomes is demonstrated, but does not provide a full answer (some important material is missing) and/or provides some information that is incorrect/inaccurate. Presentation may lack clarity and evidence of academic practice will be limited.

Fail (49% or lower): Limited attempt is made at critical analysis of information needs, and no development of the ideas is demonstrated. Reflection of search strategies demonstrates an inadequate knowledge of information retrieval. Evaluation of results demonstrates an inadequate knowledge of given evaluation methodologies. Reflection of relevance assessment in evaluation demonstrates an inadequate knowledge of the difficulties in satisfying information needs (resolving ASKs). Learning outcomes will not be realised. Presentation may lack clarity and evidence of academic practice will be limited.

Submission Process

The following information on coursework submission re-emphasises the information in your programme handbook.
• All submissions are by Moodle. No other form of submission will be accepted.
• Please note that you are not required to submit a coversheet when submitting by Moodle. Clicking the Submit button on the Assignment Submission screen indicates that you have read and agreed to the declaration on the left of the submission screen. This takes the place of the coversheet previously used for paper-based submissions.
• Once the deadline has passed coursework cannot be changed, nor can additional materials be submitted.
• Text beyond any specified word limit will not be marked.
• Plagiarism will not be tolerated under any circumstances and where found will lead to a formal investigation of your work and reference to the Academic Misconduct Panel. This might result in penalties ranging from mark deduction to withdrawal from the University. See your programme handbook for details on the nature of plagiarism and the department's policy.
• IT IS ENTIRELY YOUR RESPONSIBILITY TO ENSURE THAT YOUR WORK IS SUBMITTED FULLY, CORRECTLY AND ON TIME.
• It is therefore strongly recommended that you set yourself a 'hard' personal deadline for submission well in advance of the Moodle closing date.

© City University 21st February 2017