Assignment title: Information


signment Specification ­ Hard and Easy TSP: Problem Solving & Software Development Assignment Specification ­ Hard and Easy TSP Due Date: COB Monday Week 13. Handin details to be announced ­ but assignment will be assessed manually. Background This assignment requires you to create different types of solutions to an inverse problem. An inverse problem is one where you try to find a model that matches data or fits some constraint or requirement. Most of the interesting design and search problems in the world are inverse problems. Examples of inverse problems include: finding a diagnosis of illness that best fits a set of symptoms; tracing the origin of an intrusion to a computer network; finding the source of infection in an outbreak; finding a geological map that best fits seismic data; finding a design for aircraft surfaces that fit speed and efficiency requirements; and finding effective layouts for wind and wave farms. A lot of problems can be framed as search problems ­ you take some initial guesses of the solution ­ then you find a way to measure how good the solutions are ­ then you find the ones that are best and push search in the direction of those guesses. With good guesses, and careful framing of the search problem computers are extremely effective in helping us to solve inverse problems. The inverse problem in this assignment is very welldefined but still challenging enough to be interesting. We describe the problem next. Easy and Hard Travelling Salesperson Problem Instances The Travelling Salesperson Problem (TSP) is a classic problem in computing. The task in TSP is to find the minimum length Hamiltonian path/tour through a graph. More concretely, the problem can be framed as follows. A salesperson has a set of N cities they need to visit. They can visit the cities in any order and take any of the available roads between the cities. The aim is to visit the cities in an order that minimises the distance travelled. The TSP is a famous NP­Complete problem which means, that to the best of of our current knowledge, finding the optimal tour for N cities takes exponential compute time. However, a lot of very dedicated people have worked on TSP over several decades. As a result, we have a number of algorithms that produce good solutions in polynomial time. In fact, for TSP in two dimensions we have approximation algorithms that come quite close to optimal performance on most input data. One interesting remaining area of research is identifying strengths and weaknesses in TSP approximation algorithms. In particular we want to know what sorts of inputs are easy for each approximation algorithm and what sorts of inputs trip them up. Can we find really easy and really hard inputs (in the literature you will sometimes see inputs referred to as "instances") to our TSP approximation algorithms and what features do these inputs have? The ultimate aim is to be able to look at a handful of features of a TSP input and quickly decide which algorithms with thrive on the data and which to avoid. In this assignment you have to find hard and easy instances of TSP for an approximation algorithm and identify some useful features of these instances. You will be free to choose the TSP approximation that is of interest to you. You will be expected to make use of findings and algorithms that are freely available online. Your contribution is in terms of the how you combine these ingredients to run experiments and produce results. Because you will be using other people's work in your assignment it is vital that you properly reference your work. This means you should unambiguously say where you used someone else's work and what that work is. References can be in the form of a citation of an article or a reference to a URL if it is an online resource (include the date of access in the reference). The assignment is in three parts. For each part you have to include your code and a short report of your results. Read all three parts carefully before starting ­ later parts may rely on data you can collect while running the experiments in the first part. In this part of the assignment you have to write code to generate easy and hard Euclidian TSP instances. Euclidian TSP is a simplified version of TSP where every city occupies a single (x,y) position on a 2­D surface and every city is assumed to be directly connected to every other city. This means that the salesperson doesn't have to worry if a road exists between any two cities ­ they can travel anywhere directly! We can simplify the problem even further by assuming that every city sits in a square bounded by the points (0.0,0.0) on the bottom­left and (1.0,1.0) on the top right. Thus the problems look like a set of points on a 1x1 square where we can define an arbitrary path between the cities as our chosen tour. Six example problems and tours are shown below (taken from Mersmann 2012[1]). Part 1 ­ Finding Easy and Hard Euclidian TSP Instancesg p y g p The input for this problem is very simple. It is a file consisting of a city on each line. Each city consists of two numbers between 0 and 1 representing, respectively, the x and y coordinate of the city. The file ends in a blank line. The distances between the cities are simply the Euclidian distances between the points representing the cities and this can be calculated as the algorithm runs (distances do not need to be included in the input file). In all of the benchmarks in this assignment we have only 25 cities. This is to allow for a relatively fast solution. What you have to do You have to: Run experiments to produce hard and easy instances of TSP for a given algorithm. Your hard instances should produce performance measurably worse than optimal on your chosen approximation algorithm, your easy instances should cause your chosen approximation algorithm to produce results close to optimal. Produce a short report, called part1.pdf, (max 3 pages including graphs) on outcomes of your experiments ­ you will be expected to produce graphs to demonstrate your outcomes and describe the setup of your experiments. We will assign your marks based on your report. We will check your code as necessary. Handin the software you used to run your experiments (with source with possible) and adequate instructions for the marker to run your code. You must include a simple Makefile to allow the marker to build your code. For your report. You will be awarded marks based on what you have achieved. How clearly and carefully your arguments have been made (are your assertions, explained, or backed up with data or a citation?). You should run experiments multiple times to ensure that your results are not a fluke. If you can run enough experiments then you can apply stats (think about the stats test t­tests only make sense if the distributions are normal). Make sure you include enough information for the reader to be able replicate your experiments. Your introduction can be short ­ basically a precise statement of what you are trying to do. Most of the report should be methodology (probably up to a page) and results.. a lot of which will be graphs and tables with explanations. Resources TSP Approximation algorithms. You can access these from a variety of sources. You are allowed to use published code (no need to write your own versions of these). Common algorithms are: 2­Opt (https://en.wikipedia.org/wiki/2­opt (https://en.wikipedia.org/wiki/2­opt) ), Christofides (https://en.wikipedia.org/wiki/Christofides_algorithm (https://en.wikipedia.org/wiki/Christofides_algorithm) ), and LinKernighan (https://en.wikipedia.org/wiki/Lin–Kernighan_heuristic (https://en.wikipedia.org/wiki/Lin%E2%80%93Kernighan_heuristic) ). You will choose one of these and try to find hard and easy TSP input (instances) for these algorithms. An exact TSP algorithm. This algorithm will find the absolute optimal solution for your TSP instance. Such algorithms are only practical on small data but in this case we are only looking at 25 cities. Having an exact solution means we can compare how good the solution produced by our chosen approximation algorithm is compared to the best solution. A common exact TSP algorithm is the Concorde algorithm from Waterloo (http://www.math.uwaterloo.ca/tsp/concorde.html (http://www.math.uwaterloo.ca/tsp/concorde.html) ). It's written in C. References. Mersmann 2012[1] (see reference list below) describes an algorithm for finding a hard or easy instance of TSP for the 2­Opt approximation algorithm. Nallaperuma 2012[2] generated both hard and easy instances for both 2­Opt and Christofides algorithm. Note that you can choose your own approach if you want but these are starting points. How to measure performance In theory, we could measure how hard an instance is by simply timing how long it takes to run but this discounts the quality of the solution produced. A more reliable measure is the length of the tour produced by the approximation algorithm compared to the optimally short tour. This can be done using the following steps: 1. Generating a TSP input (locations of 25 cities in the square. 2. Running the Concorde solver on TSP input from step 1 to get an optimum tour length of l 3. Running your chosen approximation algorithm on the same input and getting an approximate tour length l Hard input (instances) will give a larger value for l compared to l . Note that there other ways to measure performance (see Jiang 2014[3] for some notes on this) but the measure above is what you will use here. opt approx approx optsignment Specification ­ Hard and Easy TSP: Problem Solving & Software Development some notes on this) but the measure above is what you will use here. As mentioned before what we would really like is to be able to look at the features of a TSP input and quickly tell if it is easy or hard for an algorithm. Why? ­ because this saves us from having to actually solve the TSP to find out if it is hard. In this part you have to pick a set of three features and graph how these features vary with the hardness of TSP inputs. There are many features of input that you might use. One feature is the standard deviation of distances between cities. For a list of other potential features you can look at Jiang 2014[3], Mersmann 2012 [1], Gao 2016 (page 5)[4]. For this section you will need to write a very short report called part2.pdf where you include graphs that correlate each of your three features with problem difficulty. Your report should be short ­ max 2 pages and should be clearly written and referenced. Make sure your graphs show not just the results for hard and easy inputs but a variety of inputs in between. This means saving some of your intermediate results during your search process. As with part 1 you must also include your code and instructions on how to use this code. This section is the challenge section. In this section you have to find hard and easy instances ­ like you did in part 1 but ­ instead of comparing your algorithm performance on Concorde and an approximation solver ­ you simply judge the difficulty of an instance based on one or more features. This means that your evaluation of each candidate solution can be very fast. At the end you should compare the results of the search with the results you get from part 1. If possible, you should run multiple experiments so you can get statistical validity. Note that you will have to chose a feature or feature that correlates strongly with the hardness or easiness of the input data ­ otherwise your search will not be very informed. If you didn't have strong correlations in part 2 above, borrow a feature with strong correlations from the literature. As with the previous parts you should write a very short report called part3.pdf (2 pages max) that compares the performance of your algorithm against your results in part 1 in terms of performance and how long it takes the search to run (it should be a lot faster!). Again you will need to include your code with brief instructions on how to run it. References [1] Mersmann, O., Bischl, B., Bossek, J., Trautmann, H., Wagner, M. and Neumann, F., 2012. Local search and the traveling salesman problem: A feature­based characterization of problem hardness. In Learning and Intelligent Optimization(pp. 115­129). Springer Berlin Heidelberg. [2] Nallaperuma, S., Wagner, M., Neumann, F., Bischl, B., Mersmann, O. and Trautmann, H., Features of Easy and Hard Instances for Approximation Algorithms and the Traveling Salesperson Problem. [3] Jiang, H., Sun, W., Ren, Z., Lai, X. and Piao, Y., 2014, December. Evolving hard and easy traveling salesman problem instances: a multi­objective approach. In Asia­Pacific Conference on Simulated Evolution and Learning (pp. 216­227). Springer International Publishing. [4] Gao, W., Nallaperuma, S. and Neumann, F., 2015. Feature­Based Diversity Optimization for Problem Instance Classification. arXiv preprint arXiv:1510.08568. End of Specification. Part 2 ­ Features of Hard and Easy Instances Part 3 ­ Using Features to search for Hard and Easy instances.