Assignment title: Management


Midterm CS536, Machine Learning, Spring 2017 March 4, 2017 This test is a take-home test. You have until 11:59pm on March 10 to answer the questions and submit your test. The test should be submitted electronically through Sakai. All work should be strictly individual. Be very clear and precise in your answers. This does not mean being verbose, use the language of mathematics! For questions that ask for evaluation on a specific problem include the code (MATLAB or Python) with clear instructions on how to run the code and reproduce your results. Name: ID: Problem Score Max. score 1 80 2 100 1 30 2 20 Max. Total 230 (230 perfect grade) 1Problem 1 In many machine learning problems (e.g., Bayesian learning for linear regression) one seeks to estimate the evidence of a model family Pr(Djθ) = f(θ) for some set of model family (hyper)parameters1. For instance, in the case of the linear regression and Bayesian learning those were the parameters α and β. We then seek to maximize the evidence by varying those parameters and find the model that has good generalization performance. However, it is sometimes the case that the evidence function f we seek to optimize cannot be computed analytically. More critically, it is often challenging to optimize this function. Instead, we approach the problem by taking samples fθkgK k=1 of the parameters of this function and evaluating its value on those samples ff(θk)gK k=1, selecting the sample that results in the highest probability of evidence θ∗ = argmaxk f(θk). 1. Suppose you approach the above parameter optimization problem using a regular grid of points GR = fθkgK k=1, where θ 2 <. Assume that you can evaluate the function f(θ) for any argument θ in some fixed time τ, which could be large. What are the potential issues in the strategy outlined in the paragraph above under these settings? 2. Now consider a different evaluation strategy. Instead of first constructing a regular grid of points GR you will be given two initial values of θ, call them G2 = fθkg2 k=1. Your goal is to select the next "best" point θK+1, where this function is to be evaluated. The "tool" you have at your disposal is the Bayesian linear regression with radial basis functions placed at the points in G2. In other words, you can model f from the samples G2 and the corresponding F2 = ff(θk)g. Propose a strategy to select the next best point θ3 where to evaluate f. Justify clearly the mathematical model underpinning this strategy and the set of steps (i.e., algorithm) that you will use to select that next point. Be precise. Generalize this to selecting θK+1 given previous evaluations GK FK. 3. Demonstrate this strategy on the problem of finding the maximum of the function f(x) = e−x2 − e−(x−5)2 on the interval I = [−3;8]. Choose two starting points, x1 = −0:8 and x2 = 5:7. Show the sequence of ten subsequent choices of xk and in each iteration plot the estimated mean value of your function f and the uncertainty, over the given evaluation interval I, and the location of the next chosen evaluation point xk+1. Problem 2 In class we discussed several approaches to modeling data using Generalized Linear Models (GLMs). In the regression case, the setting was that one seeks to find the LR model f(x), defined by basis functions φ, from the dataset D = f(x; t)gN i=1, where t are the noisy samples of that unknown function t = f(x). 1. Suppose that your LR model is y(xjw) = wT φ for x 2