INM305: Information Retrieval
The deadline for this assignment is Sunday 30th April 2017 @ 5.00pm
This assignment is a combination of written and practical work, allowing you to
be assessed on various aspects of the INM305 module. The assignment is worth
100 marks, i.e. 100% of the total marks for this module. The educational purpose of
this part of the assignment is to establish more firmly some of the practical
aspects of search and evaluation in information retrieval, by undertaking a
wide-ranging search on different types of systems.
Evaluation of online and web search
In Lecture 5 (Web evaluation case study), a real-world evaluation is presented
which Andrew MacFarlane conducted in industry in order to establish the
usefulness of the Web search engine run by the company he worked for.
The assignment requires you to use the same techniques as this study, albeit
on a much smaller scale: we evaluated 50 queries, whereas you will evaluate only one.
You will also extend your evaluation to online services (e.g. ProQuest Dialog),
image and video search services, social search and private search engines.
This part requires you to complete a number of search and evaluation tasks.
These are as follows:
1. You will pick a topic and create a TREC-style topic description for it. You
are free to choose any topic, e.g. one related to your work or your
personal interests. This topic needs to be current so that the social
search systems can be used to find information. You are required to do a
facet analysis on your topic. You will need to create an evaluation policy
for your topic (the narrative field should help you with this).
2. Using that facet analysis, create appropriate search strategies to build 'bag of
words' queries and 'Boolean' queries with which to search the following
search services [query types are listed with each search engine]:
• Google: http://www.google.co.uk - the most used Web Search engine.
[Boolean and ‘Bag of words’ queries]. Advanced Search URL:
https://www.google.co.uk/advanced_search
• Bing: http://www.bing.com/?cc=gb - the second most used Web Search
engine. [Boolean and ‘Bag of words’ queries].
• Google Images: http://www.google.co.uk/imghp?hl=en&tab=wi -
Google’s image search engine. [‘Bag of words’ query only].
• Bing Images:
http://www.bing.com/?scope=images&nr=1&FORM=NOFORM - Bing’s
image search engine. [‘Bag of words’ query only].
• YouTube: http://www.youtube.com/?gl=GB&hl=en-GB - Google’s Video
Search. [‘Bag of words’ query only].
• Bing Video: http://www.bing.com/videos/browse - Bing’s Video Search.
[‘Bag of words’ query only].
• ProQuest Dialog:
http://search.proquest.com/professional/?accountid=143640 - an
online service with a command line interface. [Boolean query only].
• DuckDuckGo: https://duckduckgo.com/ - a Meta Search engine. [‘Bag
of words’ and Boolean query].
• Social Searcher: https://www.social-searcher.com/ - a search engine
for social media search and analysis of user-generated content. [‘Bag of
words’ query].
• Startpage: https://startpage.com/ - a private search engine with
Advanced Search. [Boolean and ‘Bag of words’ query]. Advanced search
URL: https://startpage.com/uk/advanced-search.html?hmb=1
• One other online system of your choice, e.g. Trip Database or Factiva. You
may use any of the systems listed on the Library A-Z link:
http://libguides.city.ac.uk/az.php
3. Evaluate the searches you carried out on each of these search services. Remember to
use the evaluation policy you have defined. Only look at the top 10 ranked
documents (usually just one screen's worth). You must use the following
evaluation methods (lecture 5):
• Precision at 5 documents retrieved (P @ 5): this figure should be in the
range 0 to 1 for each search.
• Precision at 10 documents retrieved (P @ 10): this figure should be in the
range 0 to 1 for each search.
• Estimated Average Precision (EAP) for the top 10 documents: assume
that for all queries there are at least 10 relevant documents. This figure
should be in the range 0 to 1 for each search.
• Rate of Repeated documents (RT): Record the number of duplicates per
search. This figure should be in the range 0 to 10.
• Link Broken (LB): Record the number of broken links per search. This
figure should be in the range 0 to 10.
• Not retrieved (NT): Record the total number of documents not retrieved by
that search. This figure should be in the range 0 to 10.
• Spam: Record the number of Spam documents per search. This figure
should be in the range 0 to 10.
You must use the following table to report your results. Please do not deviate
from this format.
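Search Service    | Query Type   | P@5 | P@10 | EAP | RT (Dup) | LB | NT | SPAM
------------------+--------------+-----+------+-----+----------+----+----+-----
Google Web Search | Boolean      |     |      |     |          |    |    |
                  | Bag of Words |     |      |     |          |    |    |
Bing Web Search   | Boolean      |     |      |     |          |    |    |
                  | Bag of Words |     |      |     |          |    |    |
Google Images     | Bag of Words |     |      |     |          |    |    |
Bing Images       | Bag of Words |     |      |     |          |    |    |
YouTube           | Bag of Words |     |      |     |          |    |    |
Bing Video        | Bag of Words |     |      |     |          |    |    |
ProQuest Dialog   | Boolean      |     |      |     |          |    |    |
DuckDuckGo        | Boolean      |     |      |     |          |    |    |
                  | Bag of Words |     |      |     |          |    |    |
Social Searcher   | Bag of Words |     |      |     |          |    |    |
Startpage         | Boolean      |     |      |     |          |    |    |
                  | Bag of Words |     |      |     |          |    |    |
Other             |              |     |      |     |          |    |    |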
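The three precision figures for each row of the table can be worked out directly from your relevance judgements of the top 10 results. The Python below is a minimal sketch rather than a prescribed tool: the function names are hypothetical, and it assumes that EAP is the sum of the precision values at each rank holding a relevant document, divided by 10 (which follows from the assumption that every query has at least 10 relevant documents). Please check this against the definition given in lecture 5.

```python
# Relevance judgements for the top 10 results of one search, in rank order:
# 1 = relevant under your evaluation policy, 0 = not relevant.
# Assumption: if a search returned fewer than 10 results, the missing ranks
# are padded with 0 and so count as non-relevant.


def precision_at(judgements: list[int], k: int) -> float:
    """P@k: fraction of the top k results judged relevant."""
    top_k = (judgements + [0] * k)[:k]
    return sum(top_k) / k


def estimated_average_precision(judgements: list[int], cutoff: int = 10) -> float:
    """EAP over the top `cutoff` results.

    Adds up P@r at every rank r that holds a relevant document, then divides
    by `cutoff`, i.e. it assumes at least `cutoff` relevant documents exist.
    """
    top = (judgements + [0] * cutoff)[:cutoff]
    total = 0.0
    hits = 0
    for rank, rel in enumerate(top, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / cutoff


if __name__ == "__main__":
    # Hypothetical judgements for a single search (not real results).
    judged = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
    print(f"P@5  = {precision_at(judged, 5):.2f}")   # P@5  = 0.60
    print(f"P@10 = {precision_at(judged, 10):.2f}")  # P@10 = 0.40
    print(f"EAP  = {estimated_average_precision(judged):.3f}")  # EAP  = 0.332
```

In the example, relevant documents sit at ranks 1, 2, 4 and 7, so EAP = (1/1 + 2/2 + 3/4 + 4/7) / 10 ≈ 0.332. Ranking relevant documents higher raises EAP even when P@10 stays the same.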
You will need to make a number of assumptions, particularly with the diagnostic
measures. For example, how would you define a spam document? What counts as a
repeated document? Some sites have mirrors across the world which may be
retrieved more than once, etc. The evaluation metrics described above are
precision-based: we are looking for anything that might affect this precision
(hence measures such as LB, NT and Spam). You will use the diagnostic
measures to examine deficiencies found in the precision scores.
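As one illustration of such an assumption, the sketch below counts repeated documents (RT) by normalising result URLs before comparing them. The normalisation rules, the optional mirror table and the choice to count only the second and later occurrences of a document are all assumptions you would have to declare in your report; they are not rules from this specification.

```python
from typing import Optional
from urllib.parse import urlsplit


def normalise_url(url: str, mirrors: Optional[dict[str, str]] = None) -> str:
    """Reduce a result URL to a key so that near-identical links compare equal.

    Assumed rules: ignore the scheme, the letter case of the host, a leading
    'www.', a trailing slash and any '#fragment'. The query string is kept,
    because on sites such as YouTube it identifies the item. `mirrors`
    optionally maps a mirror host to its canonical host.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if mirrors:
        host = mirrors.get(host, host)
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return host + path + query


def count_repeats(urls: list[str], mirrors: Optional[dict[str, str]] = None) -> int:
    """RT: results in the top 10 whose key already appeared at a higher rank."""
    seen: set[str] = set()
    repeats = 0
    for url in urls[:10]:
        key = normalise_url(url, mirrors)
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
    return repeats


if __name__ == "__main__":
    # Hypothetical result list (example.org is a reserved placeholder domain).
    top10 = [
        "https://www.example.org/report/",
        "http://example.org/report",
        "https://uk.example.org/report",
        "https://example.org/other",
    ]
    print(count_repeats(top10))  # 1
    print(count_repeats(top10, {"uk.example.org": "example.org"}))  # 2
```

Declaring the mirror table explicitly makes it clear which duplicates were detected automatically and which relied on your own judgement.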
Students should ensure that they tackle the following learning outcomes in their
report:
• Use a range of information retrieval systems and services to resolve
information needs.
• Evaluate information retrieval systems and services, by using appropriate
methodologies.
• Evaluate new developments in information retrieval research, understanding
the problems which new ideas in IR are attempting to address.
Coursework Deliverable
After completing the tasks, you must produce a report on the search and
evaluation you conducted. This should include:
1. Specification of the evaluation policy you have derived for your topic (Your
TREC topic plus the assumptions you made on diagnostic measures).
2. Your facet analysis of this information need, together with a discussion on how
you developed the facets. This should be a reflection on the process you used to
produce the final facet set.
3. Reflect on the process you went through in order to generate the final queries,
showing what you learned while searching. Declare both the query
you used for search and the strategy you used in order to derive this query. A
suitable method is to pick one of the search services (say Google) and do some
initial searches to find a number of good terms - only submit your queries and do
the evaluation when you are happy with the query terms. You will need to do this
for your 'bag of words' queries and your 'Boolean' queries. Describe the tactics
you used within the framework of your strategy, e.g. use of particular operators
(Boolean, proximity, truncation) and choice of particular terms in the query.
4. Using the evaluation methods compare and contrast the retrieval effectiveness
of the search services. Here are some of the questions you will need to answer.
Which is the best search service for your topic? How well does 'bag of words' search compare with 'Boolean' search? How does online search compare
with web search? How does web search compare with Meta search? Do the
images and videos you retrieved help to fulfil your information need? Does Social
Search provide useful information over and above that of traditional search
services? How do private engines compare with the main search engines?
Provide a reflection on the impact of the user interfaces on your results (in terms
of the operational aspect of the evaluation). Consider factors such as memory,
beliefs, emotion, social factors and domain knowledge and reflect on them. You
should also examine the quality of the material you retrieved to provide a more
in-depth evaluation of your results. This includes not just the documents or
objects you retrieved, but the sources of information as well. Please provide a
reflection on your relevance assessment in the light of the use of information
from documents and the impact this has had on the final result in terms of
satisfying your given information needs, e.g. what process did you use, and
how do you think you could improve your work in this area? You must record
the figure for each of the evaluation methods, for each of the search services and
type of search specified above. Declare the assumptions you have made for the
diagnostic evaluation measures (the precision measures are given, and no
assumptions need be made). You only need declare the final figures for your
evaluations - if you want to include your detailed calculations then please put
them in an Appendix.
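The step from facet analysis to the two query types (task 2 and item 3 above) can be sketched in Python as follows. The topic, its facets, the `Topic` class and the rule that the 'bag of words' query takes the preferred (first) term of each facet are hypothetical illustrations, not requirements of this assignment. The Boolean form follows the usual building-block approach: OR the variants within a facet, then AND the facets together. Operator syntax (quoting, upper-case AND/OR, truncation symbols, proximity operators) differs between the services listed above, so check each service's own help pages before submitting a query.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A hypothetical TREC-style topic together with its facet analysis."""
    number: int
    title: str
    description: str
    narrative: str
    # Each facet is a list of synonyms/variants; the first entry is treated
    # as the preferred term (an assumed convention for this sketch).
    facets: list[list[str]] = field(default_factory=list)

    def to_trec(self) -> str:
        """Render the topic in the classic TREC ad hoc topic layout."""
        return (
            "<top>\n"
            f"<num> Number: {self.number}\n"
            f"<title> {self.title}\n"
            f"<desc> Description:\n{self.description}\n"
            f"<narr> Narrative:\n{self.narrative}\n"
            "</top>"
        )

    def boolean_query(self) -> str:
        """OR the variants inside each facet, then AND the facets together."""
        blocks = []
        for facet in self.facets:
            if not facet:
                continue  # skip empty facets rather than emit "()"
            # Quote multi-word terms so they are searched as phrases.
            terms = [f'"{t}"' if " " in t else t for t in facet]
            blocks.append("(" + " OR ".join(terms) + ")")
        return " AND ".join(blocks)

    def bag_of_words(self) -> str:
        """Preferred term of each facet, with no operators or quotes."""
        return " ".join(facet[0] for facet in self.facets if facet)


if __name__ == "__main__":
    # Hypothetical example topic; replace with your own.
    topic = Topic(
        number=1,
        title="Public EV charging in the UK",
        description="Find recent reports on public charging points for "
                    "electric vehicles in the UK.",
        narrative="Relevant documents discuss the number, location or funding "
                  "of public chargers; documents only about home charging "
                  "are not relevant.",
        facets=[
            ["electric vehicle", "EV", "electric car"],
            ["charging point", "charging station"],
            ["UK", "United Kingdom"],
        ],
    )
    print(topic.to_trec())
    print(topic.boolean_query())
    # ("electric vehicle" OR EV OR "electric car") AND ("charging point" OR "charging station") AND (UK OR "United Kingdom")
    print(topic.bag_of_words())
    # electric vehicle charging point UK
```

Keeping the facets as data makes it easy to record how each query was derived, which is what the reflection on search strategy asks you to declare.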
You must split your sections into: 1. Introduction: Topic and Evaluation Policy, 2.
Facet Analysis, 3. Search Strategy, 4. Evaluation, 5. Summary. Please do not
use any other structure for your submission.
The module leader has made every endeavour to make this assignment as clear
as possible. If any aspect of this specification is still unclear, please feel free to
state any further assumptions that you feel you have to make in order to
complete the coursework. No marks will be deducted in this instance.
Assessment and Marking scheme
The final marks for the assessment are allocated using the following criteria:
• Details of your facet analysis for your topic (15 marks)
• Reflection on your search strategy and how the final query was generated (25
marks)
• Evaluation of the required search engines and reflection on process and
result of relevance assessment (50 marks)
• Presentation, writing and organisation (10 marks)
Your submission should be no more than 3500 words for the main body of the
text. This limit will be strictly enforced.
Please note that you will not be assessed on how well your searches do in the
evaluation - your information needs will have varying levels of difficulty and
therefore the retrieval effectiveness will vary between each of your searches. The
assessment will be used to measure your understanding of search and
evaluation methods in information retrieval.
Please refer any queries about the coursework to Andrew MacFarlane
(email:[email protected]).
The deadline for this assignment is Sunday 30th April 2017 @ 5.00pm, through
the relevant Moodle submission. Please submit an MS Word file only (do not
submit a PDF file).
General guidelines for what is expected in your assessed work are as
follows:
Distinction (70% and above): Critical ability and analysis of information
needs is strongly evident. Reflection on search strategies demonstrates a
strong or comprehensive knowledge of information retrieval. Evaluation of
results demonstrates strong or comprehensive knowledge of the given
evaluation methodologies. Reflection on relevance assessment in evaluation
demonstrates strong or comprehensive knowledge of the difficulties in
satisfying information needs (resolving ASKs). The learning outcomes will be
realised in full. Presentation is of a professional standard.

Merit (60-69%): Some critical ability and analysis of information needs is
evident. Reflection on search strategies demonstrates a sound knowledge of
information retrieval. Evaluation of results demonstrates sound knowledge of
the given evaluation methodologies. Reflection on relevance assessment in
evaluation demonstrates sound knowledge of the difficulties in satisfying
information needs (resolving ASKs). Understanding of several of the learning
outcomes (if not necessarily all) is of a high standard. Presentation is
logical, well-structured and demonstrates good academic practice.

Pass (50-59%): An attempt is made at critical analysis of information needs,
but the development of the ideas is limited. Reflection on search strategies
demonstrates an adequate knowledge of information retrieval. Evaluation of
results demonstrates an adequate knowledge of the given evaluation
methodologies. Reflection on relevance assessment in evaluation demonstrates
an adequate knowledge of the difficulties in satisfying information needs
(resolving ASKs). Understanding of several of the learning outcomes is
demonstrated, but the work does not provide a full answer (some important
material is missing) and/or provides some information that is incorrect or
inaccurate. Presentation may lack clarity and evidence of academic practice
will be limited.

Fail (49% or lower): A limited attempt is made at critical analysis of
information needs, and no development of the ideas is demonstrated.
Reflection on search strategies demonstrates an inadequate knowledge of
information retrieval. Evaluation of results demonstrates an inadequate
knowledge of the given evaluation methodologies. Reflection on relevance
assessment in evaluation demonstrates an inadequate knowledge of the
difficulties in satisfying information needs (resolving ASKs). Learning
outcomes will not be realised. Presentation may lack clarity and evidence of
academic practice will be limited.
Submission Process:
The following information on coursework submission re-emphasises the
information in your programme handbook.
• All submissions are by Moodle. No other form of submission will be accepted.
• Please note that you are not required to submit a coversheet when submitting
by Moodle. Clicking the Submit button on the Assignment Submission screen
indicates that you have read and agreed to the declaration on the left of the
submission screen. This takes the place of the coversheet previously used for
paper-based submissions.
• Once the deadline has passed coursework cannot be changed, nor can
additional materials be submitted.
• Text beyond any specified word limit will not be marked.
• Plagiarism will not be tolerated under any circumstances and where found will
lead to a formal investigation of your work and reference to the Academic
Misconduct Panel. This might result in penalties ranging from mark deduction
to withdrawal from the University. See your programme handbook for details
on the nature of plagiarism and the department's policy.
• IT IS ENTIRELY YOUR RESPONSIBILITY TO ENSURE THAT YOUR
WORK IS SUBMITTED FULLY, CORRECTLY AND ON TIME.
• It is therefore strongly recommended that you set yourself a 'hard' personal
deadline for submission well in advance of the Moodle closing date.
©City University
21st February 2017