Assignment title: Information
IFB104 Building IT Systems
Semester 1, 2016
Assignment 2, Part A: The Top Ten of Everything
(15%, due 11:59pm Thursday, May 26th)
Overview
This is the first part of a two-part assignment. This part is worth 15% of your final grade for
IFB104. Part B will be worth a further 10%. Part B is intended as a last-minute extension to
the assignment, thereby testing the maintainability of your solution to Part A and your ability
to work under time pressure. The instructions for completing Part B will not be released until
Week 12. Whether or not you complete Part B you will submit only one file, and receive only
one mark, for the whole 25% assignment.
Motivation
People love creating rankings, comparative lists and league tables and numerous examples
can be found online. Here you will use your new-found skills in developing Graphical User
Interfaces, accessing Web documents and searching for patterns in text to create an
interactive tool that allows its user to view the latest "top ten" lists in a variety of categories.
Most importantly, your application will get all of its data and images directly from online
sources.
To do this you will need to develop (a) "back end" functions which fetch Web documents and
extract relevant text and images from them, and (b) a "front end" Graphical User Interface
that makes it easy for the user to view such data.
To complete this part of the assignment you will need to use Python, the Tkinter module,
regular expressions and (probably) the Python Imaging Library. We have not used the last of
these in IFB104 before, so you may need to install the necessary software first (see below).
Illustrative Example
Taking our cue from the popular book series The Top Ten of Everything, our goal is to create
a program which displays the current top ten lists in (at least) three different categories. The
user can select a category to display, and the program must respond by downloading and
displaying the current top ten items in this category from the Internet. To make the program
visually interesting it must also display images related to the chosen categories.
For the purposes of this assignment you have a free choice of what kind of "top ten" lists to
display; however, they should all be lists that are updated online on a regular basis. Possible
categories of top-ten list that you may consider could include:
• Sports rankings
• Online gaming leader boards
• TV and movie ratings
• Financial data such as Stock Market prices
• Popularity contests
• … or any other ordered/sorted list of items.
Whichever categories you choose, they must be lists which are updated on a regular basis
and for which information can be found readily on the Internet. Your task is to extract this
data from online sources and display it in an easily-interpreted format.
Your program must have the following features.
• When first started it must display an introductory "splash" page and provide an
interactive mechanism via which the user can select which top-ten categories to view.
Selection could be done using push buttons, menus, or any other suitable widget. At
least three clearly distinct categories must be made available.
• When the user selects a category your program must fetch the latest "top ten" items
relating to it from online and display them in an easy-to-read format. In addition,
your program must display an image relating to this category and the online address
(URL) from which the data was obtained.
• The downloaded top-ten data must come from (at least) three distinct Web sites. It is
not acceptable to download three lists from the same site. The Web sites chosen must
be ones that update their lists of data on a regular basis.
• The various images displayed can be extracted from any Web documents you like.
They do not have to come from the same sites as the top-ten lists themselves.
To illustrate the idea, consider the following "splash" page, waiting for the user to select a
category.
Our "splash" page consists of a fixed image with three interactive buttons at the bottom. The
instruction "Choose from the Top Ten …" is a static label. The image is the cover for the
2015 edition of the book The Top Ten of Everything. Importantly, this image is not contained
in a local file. It is downloaded each time the program starts via the following URL:
http://www.boorooandtiggertoo.com/wp-content/uploads/2014/07/Top-10-of-Everything-2015.jpg
All images and data displayed by your program must be freshly downloaded from online
sources. You can submit only one Python file, with no other files, including image files.
When one of the buttons is pushed the corresponding top-ten list, an associated image, and
the URL from which the top-ten data was obtained must be displayed. For instance, pushing
the "AFL teams" button in our example causes the following new window to pop up.
In this case the "AFL Ladder" image at the top was downloaded from
www.krockfootball.com.au, and the list of football teams was extracted from the "Real
Footy Ladder" page published by The Age newspaper, at the address shown at the bottom of
the "view" above. (The original Age document accessed is shown in the appendix below.)
Importantly, of course, the Australian Rules football ladder changes at least once a week
during the winter months. The top-ten list of teams shown above was current when we ran
our demonstration program on Sunday, May 1st. Your program must always download the
latest data from the Web, and must be resilient to changes in the Web documents it relies on.
Pushing the "conspiracies" button on our splash page (again on May 1st) then produced the
following new window.
In this case we have used a list of conspiracy theories sourced from the Ranker site, which
has many lists of rankings voted on by the public. The original page from which we
extracted our top ten list is shown in the appendix below. Again, this page is updated
regularly. The X-Files inspired poster image on the left was downloaded from a different
Web site.
Also notice that the layout and colours used in this second top-ten list differ markedly from
the first. Each of your top-ten lists must have a distinct visual appearance.
Finally, pushing the button for the Top Ten Australian Google searches caused the window
shown overleaf to pop up when we tested our program on May 1st. Again we have given the
GUI a distinct appearance for this top-ten list, based around the standard "Google colours".
At the end of this process we have four distinct "views" of our GUI on show, as four different
Tkinter windows. (We can close the individual top-ten views and re-open them again by
pushing the corresponding button on the splash page.)
In each of these "views" we downloaded the image using a fixed URL address, having
previously used a standard Web browser to find the online images we wanted.
For the top-ten lists we first downloaded the appropriate Web document and then used a
regular expression to extract the parts of the text we needed, based on our prior analysis of
the Web document's HTML source code structure. For instance, in simple cases you may
find that the items you want to extract appear between … tags in the
HTML source, so you can easily extract them using a corresponding regular expression and
Python's findall function. Generally, however, the patterns to search for are not quite this
simple. (None of the ones for our examples above were as straightforward as this, although
none of them were very challenging either.)
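As a minimal sketch of the simple case just described, suppose each item you want sits between a pair of list-item tags. The tag name, the class attribute and the sample text below are all invented for illustration; the Web documents you choose will almost certainly use different markup.

```python
from re import findall

# Hypothetical fragment of a downloaded page's HTML source (invented for illustration)
page_source = '''
<h1>Current rankings</h1>
<ol>
<li class="entry">Alpha</li>
<li class="entry">Bravo</li>
<li class="entry">Charlie</li>
</ol>
'''

# The parenthesised group captures only the text between the tags.
# The non-greedy .*? stops at the first closing tag it meets.
entry_pattern = '<li class="entry">(.*?)</li>'

entries = findall(entry_pattern, page_source)
print(entries)
```

Because findall returns only the captured group, the surrounding tags never reach your GUI. The pattern keeps working if the page adds or removes other elements around the list, provided the markup of the items themselves stays the same.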
A particularly good source of easily-searched Web documents is Rich Site Summary (a.k.a.
Really Simple Syndication) Web feeds. RSS documents are written in XML, a markup language
similar in style to HTML, and are used for publishing information that is updated frequently, in a
form that allows them to be used as the source for other Web documents. They usually have
a simple standardised format, so we can rely on them always formatting their contents in the
same way, making it relatively easy to extract specific elements from the document's source
code via pattern matching.
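To give a feel for how regular an RSS document is, here is a small sketch that pulls the title out of each item in a feed. The feed text is a made-up example that follows the usual RSS 2.0 layout (a channel containing item elements, each with its own title); the feed name, story names and links are placeholders, not a real feed.

```python
from re import findall, DOTALL

# Made-up RSS 2.0 fragment standing in for a downloaded feed
rss_source = '''<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>Example Rankings Feed</title>
<item>
<title>First story</title>
<link>http://example.com/1</link>
</item>
<item>
<title>Second story</title>
<link>http://example.com/2</link>
</item>
</channel>
</rss>'''

# Only match a title that follows an <item> tag, so that the channel's
# own title is skipped. DOTALL lets .*? cross the line breaks.
item_titles = findall('<item>.*?<title>(.*?)</title>', rss_source, DOTALL)
print(item_titles)
```

Anchoring the pattern on the item tag is a design choice: a bare title pattern would also pick up the name of the feed itself, which is not one of the ranked entries.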
Also note that in some cases the text you extract may contain HTML markups for special
fonts or characters, such as "&mdash;" for a dash and "&rsquo;" for a single quote. If so
you will need to use Python's string functions to delete or replace these mark-ups before
displaying the text in your GUI. No HTML mark-ups should be displayed to the user.
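One possible way to tidy up extracted text is sketched below. The function name clean_text and the particular set of entities handled are our own choices for illustration; you should extend the list to cover whatever mark-ups actually occur in your documents.

```python
from re import sub

def clean_text(raw_text):
    # Replace some common character entities with plain equivalents.
    # '&amp;' is handled last so that its result is not decoded twice.
    replacements = [('&mdash;', '-'), ('&rsquo;', "'"),
                    ('&quot;', '"'), ('&amp;', '&')]
    cleaned = raw_text
    for entity, plain in replacements:
        cleaned = cleaned.replace(entity, plain)
    # Delete any remaining tags such as <b> or </i>
    cleaned = sub('<[^>]+>', '', cleaned)
    # Remove stray white space at either end
    return cleaned.strip()

print(clean_text('<b>Area 51</b> &mdash; what&rsquo;s inside?'))
```

Applying a function like this to every extracted item, just before it is displayed, keeps the clean-up in one place rather than scattered through the GUI code.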
Requirements and marking guide
To complete this task you are required to extend the provided top_ten.py template file
with your version of a program possessing equivalent capabilities to those shown above.
Your submitted solution will consist of a single Python file, and must satisfy the following
criteria. Marks available are as shown.
• Creating four distinct interfaces (4%). Your program must be able to create (at
least) four distinct visual interfaces, the opening "splash" view and the three top-ten
lists. An appropriate interactive mechanism must be provided to allow the user to
choose top-ten lists to view (buttons, menus, etc). Each of the four "views" must have
a distinct appearance in terms of layout, text fonts, colours, etc. Each view must have
a title describing its purpose. NB: In our demonstration program above we created a
separate pop-up window for each "view", but you do not need to follow this example.
All four "views" can be displayed in the same Tkinter window if you prefer.
• Displaying four distinct online images (4%). Your program must be able to
download and display (at least) four distinct online images, one in each of the
"views". Each of the images must come from a different Web site. The images for
the three top-ten lists must be clearly related to the list's subject matter. The "splash"
image is a free choice; it could relate to the generic theme of the assignment, as in our
example above, or could relate to some consistent theme linking your choice of top-ten lists. The images must be of an appropriate size for the GUI and must not be
distorted either horizontally or vertically.
• Displaying three distinct, current top-ten lists (5%). Most importantly, your
program must be able to download and display (at least) three distinct top-ten lists,
one in each of the corresponding "views". Each of the lists must come from a
different Web site and must be "live" data, downloaded at the time the user selected
which list to view. The list items must be clearly numbered and must not contain any
HTML mark-ups or other extraneous characters when displayed. The address of the
Web document from which the list was obtained must be displayed. Your solution
must continue to work even when the Web documents you access have been updated.
For this reason it is unacceptable to "hardwire" your solution to the particular text and
images appearing on the Web site on a particular day. Instead you will need to use
regular expressions or some equivalent pattern matching technique to actively search
for the relevant text in the document, regardless of any updates that may have
occurred since you wrote your program.
• Code quality and presentation (2%). Your program code must be presented in a
professional manner. See the coding guidelines in the IFB104 Code Presentation
Guide (on Blackboard under Assessment) for suggestions on how to achieve this. In
particular, each significant code segment must be clearly commented to say what it
does, e.g., "Create the splash image", "Extract the top ten items from the document's
source code", etc.
• Extra feature (10%). Part B of this assignment will require you to make a 'last-minute extension' to your solution. The instructions for Part B will not be released
until just before the final deadline for Assignment 2.
You must complete the task using only basic Python features and the modules already
imported into the provided template. You may not import any additional modules or files
into your program. In particular, you may not import any local image files. All displayed
images and "top ten" text must be downloaded from online sources each time the program is
run.
Most importantly, you are not required to copy the precise GUI layouts shown in the example
above. Instead you are strongly encouraged to be creative and to choose your own top-ten
lists and design your own graphical interfaces.
Support tools
To get started on this task you need to download various Web documents of your choice and
work out how to extract two things:
• The images you want to display for your opening "splash" page and to illustrate each
of your top-ten lists.
• The textual top-ten lists.
Note that each top-ten list and its associated image do not have to come from the same Web
document.
You also need to allow for the fact that the contents of the Web documents from which you
get your top-ten data will be updated occasionally, so you cannot hardwire the locations of
the document elements in your solution. The answer to this problem, of course, is to use
Python's regular expression function findall to extract the necessary elements, no matter
where they appear in the HTML source code.
To help you develop your regular expressions, we have included two small Python programs
with these instructions.
1. downloader.py is a small script that downloads and displays the source code of a
Web document. Use it to see a copy of your chosen Web document in exactly the
form that it will be received by your Python program. (This is helpful because the
version of a Web document opened by a Python program may not be the same as one
opened by a particular Web browser! This can occur because some Web servers
produce different HTML/XML source code for different browser clients. Some Web
servers will also block access to Web pages by Python programs in the belief that they
are malware!)
2. regex_tester.py is an interactive program introduced in the Week 9 lecture and
workshops which makes it easy to experiment with different regular expressions on
small text segments. You can use this together with the downloaded text from the
Web to help you perfect your regular expressions. (The advantage of using this tool,
rather than some of the online tools available, is that it is guaranteed to follow
Python's regular expression syntax because it is written in Python itself.)
Image formats and the Python Imaging Library
As part of this assignment you need to download and display some image files, so we have
included two utility functions, gif_to_PhotoImage and image_to_PhotoImage, in
the Python template accompanying these instructions. Image files come in a wide variety of
formats, including GIF, JPG, PNG, BMP, and so on. Python's Tkinter module can display
'photo images' but these must be in GIF format encoded as base-64 character strings, so these
functions will help you convert the images into the format needed by Tkinter.
If you have identified a GIF image from a particular URL of the form 'http://….gif' it
can be made ready for display in a Tkinter widget as follows. Greek letters have been used
for the Python variables below; obviously you should replace these with meaningful variable
names. Let α be a string containing the URL.
# Read the GIF image as a raw byte stream
β = urlopen(α).read()
# Create the Tkinter version of the image
γ = gif_to_PhotoImage(β)
The resulting variable γ can then be used as the image attribute in a Tkinter Label widget
or any other such widget that can display images.
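You do not need to know how gif_to_PhotoImage works in order to use it, and we do not reproduce the template's code here. Purely to illustrate what "GIF format encoded as base-64 character strings" means, the sketch below encodes a byte string with Python's standard base64 module. The function name is hypothetical, and the sample bytes are only the six-byte signature that begins a GIF file, not a complete image.

```python
from base64 import b64encode

def gif_bytes_to_base64(raw_bytes):
    # Tkinter's PhotoImage accepts GIF data supplied as base-64 text
    # (via its 'data' option), so the raw bytes are encoded first
    return b64encode(raw_bytes)

# GIF files begin with the signature 'GIF87a' or 'GIF89a'
sample_bytes = b'GIF89a'
encoded = gif_bytes_to_base64(sample_bytes)
print(encoded)
```

A practical consequence is that you can check whether a downloaded byte stream really is a GIF by looking at its first six bytes before attempting to display it.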
You will find, however, that most images on the web are in JPEG, PNG or some other format
and therefore cannot be used in Tkinter without conversion. If you are unable to find suitable
GIF images you will need to use an additional software package called the Python Imaging
Library to allow conversion of the image.
The Python Imaging Library is a well-established (stable since 2009) toolkit for manipulating
image data in Python programs. It contains a large number of functions for converting
images, resizing them, rotating them, changing colours, etc.
• To use it, Microsoft Windows users should download and install PIL version 1.1.7 for
Python 2.7. Download the library from http://www.pythonware.com/products/pil/ and
select the installation for Python 2.7.
• Alternatively, Mac OS X users should install disk image PIL-1.1.7-py2.7-python.org-macosx10.6.dmg. You can find a copy at
http://www.astro.washington.edu/users/rowen/python/.
We have not used the PIL library previously in this unit. PIL documentation can be found at
http://effbot.org/imagingbook/ but you should not need to consult this unless you intend to do
complex image processing yourself.
To convert a non-GIF image into Tkinter's 'photo image' format the process is similar to the
one above. For instance, if you have the URL of an image in JPEG format,
'http://….jpg', then you can convert it to a format ready for use in a Tkinter widget as
follows. Greek letters have been used for the Python variables below; obviously you should
replace these with meaningful variable names. Let α be a string containing the URL.
# Read the JPEG or PNG image as a raw byte stream
β = urlopen(α).read()
# Create the Tkinter version of the image
γ = image_to_PhotoImage(β)
In addition, the image_to_PhotoImage function allows the image to be resized. To do
so, simply supply two additional arguments specifying the desired width and height in pixels.
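Remember that a resized image must not be distorted, so the width and height you ask for need to keep the original proportions. The helper below is a sketch of one way to calculate such a pair; the name fit_within is our own invention, and it assumes you already know the original image's dimensions (for example, from inspecting the image in a Web browser).

```python
def fit_within(width, height, max_width, max_height):
    # Use the smaller of the two scale factors so that both
    # dimensions fit inside the available space
    scale = min(float(max_width) / width, float(max_height) / height)
    # Apply the same factor to both sides to preserve the proportions
    return int(round(width * scale)), int(round(height * scale))

# An 800 x 600 image that must fit into a 400 x 400 area
new_width, new_height = fit_within(800, 600, 400, 400)
print(new_width)
print(new_height)
```

The resulting width and height could then be supplied as the two additional arguments described above.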
Updating images in widgets
If, as part of your solution, you decide to replace the image in a Label widget you may run
across a problem where, after updating the Label's 'image' attribute, the label comes out
blank. This occurs if you create the new image within a function because the image is
automatically "garbage collected" when the function terminates, i.e., when we leave the
function's scope. In other words, an image "local" to a function will no longer exist after the
function ends. A simple solution is to make the variable holding the image a global variable,
so that it will exist after the function terminates.
Another solution is to entirely "destroy" the existing widget and replace it with a new one
containing the updated image. To do this you can use widget.destroy() to remove the old
widget entirely. Alternatively, if you used the 'grid' geometry manager to place your widgets
in the window you can also use widget.grid_forget() to remove the widget from the
grid but without destroying the widget itself.
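The first solution above relies on the fact that an object survives only while something still refers to it. The sketch below demonstrates this effect without needing a GUI: FakeImage is a hypothetical stand-in for a Tkinter photo image, and Python's standard weakref module is used only to observe whether the object still exists after the function returns. The behaviour shown is that of the standard CPython interpreter.

```python
import weakref

class FakeImage(object):
    """Stand-in for a Tkinter photo image (hypothetical, for illustration)."""
    pass

# Global list that continues to exist after any function call ends
kept_images = []

def make_local_image():
    image = FakeImage()           # referenced only by a local variable
    return weakref.ref(image)     # a weak reference does not keep it alive

def make_kept_image():
    image = FakeImage()
    kept_images.append(image)     # the global list keeps the object alive
    return weakref.ref(image)

lost = make_local_image()
kept = make_kept_image()
print(lost() is None)   # has the local-only image disappeared?
print(kept() is None)   # has the globally-held image disappeared?
```

In a real program the same idea applies to the image you pass to a Label: store it in a global variable, or in some other object that outlives the function, so that it is still there when Tkinter draws the widget.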
Extra features
Beyond the basic requirements above you are encouraged to be creative! You can add extra
features to display more information about the "top ten" items in a category and you can
display more than three categories of top-ten list (provided they all come from different Web
sites). Sadly, we cannot award extra marks for such enterprise, but we will create a "Hall of
Fame" in Blackboard to display some of the best solutions!
Development hints
This is a substantial task, so you should not attempt to do it all at once. In particular, you
should work on the "back end" parts first, before attempting the Graphical User Interface
"front end". If you are unable to complete the whole task, just submit those stages you can
get working. You will receive partial marks for incomplete solutions.
It is strongly suggested that you use the following development process:
1. Decide what kind of lists you want to display and search the Web for appropriate
HTML or XML documents that contain the necessary text and images. For the top-ten lists choose only documents that are updated regularly.
2. Using the provided downloader.py application, download each document so that
you can examine its structure. You will want to study the HTML or XML source
code of the document to determine how the parts you want to extract are marked up.
For the images you simply need to find the URL address of the image. However, for
the top-ten text you will want to identify the markup tags, and perhaps other
unchanging parts of the document, that uniquely identify the beginning and end of the
text you want to extract.
3. Using the provided regex_tester.py application, devise regular expressions
which extract just the necessary elements from the relevant parts of the Web
document. Using these regular expressions and Python's urllib module and
findall function you can now develop a simple prototype of your "back end"
solution that just extracts and prints the required data, i.e., the top-ten text and the
addresses of the images, from the Web documents in IDLE's shell window. Doing
this will give you confidence that you are heading in the right direction, and is a
useful outcome in its own right.
4. Once you have got the data extraction functions working you can add a Graphical
User Interface "front end" to your program. Decide whether you want to use push
buttons, radio buttons, menus, lists or some other mechanism for choosing which top-ten categories to display, and extend your back-end code accordingly. Developing the
GUI is the "messiest" step, and is best left to last. Trying to develop the front-end and
back-end code simultaneously is very difficult. Focus on one thing at a time.
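The back-end prototype described in step 3 might begin along the following lines. This is only a sketch: the function names, the markup in the pattern and the sample text are all invented, and the sample text is built inside the program so that the example runs without an Internet connection. In your own prototype the page source would instead come from reading the Web document with urlopen.

```python
from re import findall

def extract_top_ten(page_source, item_pattern):
    # Find every match, wherever it occurs in the source,
    # and keep only the first ten
    return findall(item_pattern, page_source)[:10]

def format_numbered(items):
    # Number the items from 1 so that the ranking is explicit
    return ['%d. %s' % (rank, item) for rank, item in enumerate(items, 1)]

# Hypothetical page source containing twelve entries
sample_source = ''.join('<td class="name">Entry %d</td>\n' % number
                        for number in range(1, 13))

top_ten = format_numbered(
    extract_top_ten(sample_source, '<td class="name">(.*?)</td>'))

# Print the results in the shell window to check the extraction
for line in top_ten:
    print(line)
```

Keeping the extraction and the formatting in separate functions means the same code can later be called from your GUI, with each numbered line displayed in a widget rather than printed.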
Deliverable
You should develop your solution by completing and submitting the provided Python
template file top_ten.py as follows.
1. Complete the "statement" at the beginning of the Python file to confirm that this is
your own individual work by inserting your name and student number in the places
indicated. Submissions without a completed statement will not be marked.
2. Complete your solution by developing Python code at the place indicated. You must
complete your solution using only the modules imported by the provided template.
You may not use or import any other modules or files to complete this program.
3. Submit only a single, self-contained Python file. Do not submit multiple files. Do
not submit an archive containing multiple files. This means that all images and
text you want to display must be downloaded from online Web documents, not from
local files.
Apart from working correctly your program code must be well-presented and easy to
understand, thanks to (sparse) commenting that explains the purpose of significant code
segments and helpful choices of variable and function names. Professional presentation of
your code will be taken into account when marking this assignment.
If you are unable to solve the whole problem, submit whatever parts you can get working.
You will receive partial marks for incomplete solutions.
How to submit your solution
A link will be created on Blackboard under Assessment for uploading your solution file
before the deadline (11:59pm on Thursday, May 26th). Note that you will be able to submit
as many drafts of your solution as you like. You are strongly encouraged to submit draft
solutions before the deadline.
Appendix: Original Web documents
The original Web documents from which we extracted our top-ten lists in the examples
shown above were as follows at the time our demonstration program was executed (Sunday,
May 1st). Notice that each of them contains many more elements than we needed, so we
used regular expressions to extract just the required text from the HTML source code.