Midterm
CS536, Machine Learning, Spring 2017
March 4, 2017
This is a take-home test. You have until 11:59pm on March 10 to answer the questions and submit your test.
The test should be submitted electronically through Sakai. All work must be strictly individual. Be very clear and
precise in your answers. This does not mean being verbose; use the language of mathematics! For questions that ask
for an evaluation on a specific problem, include your code (MATLAB or Python) with clear instructions on how to run
it and reproduce your results.
Name:
ID:
Problem        Score        Max. score
1                           80
2                           100
1                           30
2                           20
Max. total                  230
(A perfect grade is 230 points.)
Problem 1
In many machine learning problems (e.g., Bayesian learning for linear regression) one seeks to estimate the evidence
of a model family, Pr(D | θ) = f(θ), for some set of model family (hyper)parameters θ. For instance, in the case of
linear regression and Bayesian learning these were the parameters α and β. We then seek to maximize the evidence
by varying those parameters and thus find the model with good generalization performance.
However, it is sometimes the case that the evidence function f we seek to optimize cannot be computed analytically. More critically, it is often challenging to optimize this function. Instead, we approach the problem by drawing
samples {θ_k}_{k=1}^K of the parameters, evaluating the function on those samples to obtain {f(θ_k)}_{k=1}^K, and
selecting the sample that yields the highest evidence, θ* = argmax_k f(θ_k).
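A minimal Python sketch of this sample-and-evaluate strategy is given below; the placeholder evidence function and the candidate grid are illustrative assumptions, not part of the problem statement.

import numpy as np

def sample_and_maximize(f, candidates):
    """Evaluate the evidence f(theta) at every candidate and keep the best one."""
    values = np.array([f(theta) for theta in candidates])  # each evaluation costs time tau
    k_best = int(np.argmax(values))                        # theta* = argmax_k f(theta_k)
    return candidates[k_best], values[k_best]

# Illustrative run on a placeholder evidence function and a regular grid of K = 50 points.
evidence = lambda theta: np.exp(-(theta - 1.0) ** 2)
grid = np.linspace(-5.0, 5.0, 50)
theta_star, best_value = sample_and_maximize(evidence, grid)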
1. Suppose you approach the above parameter optimization problem using a regular grid of points G_R = {θ_k}_{k=1}^K,
where θ ∈ ℝ. Assume that you can evaluate the function f(θ) for any argument θ in some fixed time τ, which
could be large. What are the potential issues with the strategy outlined in the paragraph above under these settings?
2. Now consider a different evaluation strategy. Instead of first constructing a regular grid of points G_R, you will
be given two initial values of θ, call them G_2 = {θ_k}_{k=1}^2. Your goal is to select the next "best" point
at which this function should be evaluated. The "tool" you have at your disposal is Bayesian linear regression
with radial basis functions placed at the points in G_2. In other words, you can model f from the samples G_2
and the corresponding values F_2 = {f(θ_k)}_{k=1}^2. Propose a strategy for selecting the next best point θ_3 at which
to evaluate f. Justify clearly the mathematical model underpinning this strategy and the set of steps (i.e., the
algorithm) that you will use to select that next point. Be precise. Generalize this to selecting θ_{K+1} given the
previous evaluations G_K and F_K.
3. Demonstrate this strategy on the problem of finding the maximum of the function f(x) = e^{−x²} − e^{−(x−5)²}
on the interval I = [−3, 8]. Choose two starting points, x_1 = −0.8 and x_2 = 5.7. Show the sequence of
ten subsequent choices of x_k; at each iteration, plot the estimated mean of your model of f and its
uncertainty over the evaluation interval I, together with the location of the next chosen evaluation point x_{k+1}.
(A minimal sketch of one possible such procedure is given after this list.)
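Below is a minimal Python sketch of one possible such procedure, assuming a standard Bayesian linear regression (prior precision α, noise precision β) with radial basis functions centered at the evaluated points, and an optimistic (upper-confidence-bound-style) choice of the next point. The values of α, β, the RBF width, and the acquisition rule are illustrative assumptions, not quantities prescribed by the problem.

import numpy as np

def rbf_features(x, centers, width=1.0):
    """Radial basis functions phi_j(x) = exp(-(x - c_j)^2 / (2 w^2)), one per evaluated point."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    centers = np.asarray(centers, dtype=float)
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

def posterior(X, t, alpha=1e-2, beta=25.0, width=1.0):
    """Posterior mean and covariance of the weights in Bayesian linear regression."""
    Phi = rbf_features(X, X, width)
    S_inv = alpha * np.eye(len(X)) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ t
    return m, S

def predict(x_grid, m, S, centers, beta=25.0, width=1.0):
    """Predictive mean and standard deviation of the modeled f over the evaluation interval."""
    Phi = rbf_features(x_grid, centers, width)
    mean = Phi @ m
    var = 1.0 / beta + np.sum((Phi @ S) * Phi, axis=1)
    return mean, np.sqrt(var)

# Target function, interval, and starting points from the problem statement.
f = lambda x: np.exp(-x ** 2) - np.exp(-(x - 5.0) ** 2)
X = np.array([-0.8, 5.7])
t = f(X)
grid = np.linspace(-3.0, 8.0, 400)

for k in range(10):
    m, S = posterior(X, t)
    mean, std = predict(grid, m, S, centers=X)
    x_next = grid[np.argmax(mean + 2.0 * std)]   # optimistic, UCB-style acquisition
    X = np.append(X, x_next)
    t = np.append(t, f(x_next))
    # here one could plot `mean`, the band `mean ± std`, and `x_next` over `grid`

Plotting of the predictive mean, its uncertainty band, and the chosen x_{k+1} at each iteration can be added inside the loop.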
Problem 2
In class we discussed several approaches to modeling data using Generalized Linear Models (GLMs). In the regression
case, the setting was that one seeks to find the LR model f(x), defined by basis functions φ, from the dataset D =
{(x_i, t_i)}_{i=1}^N, where the t_i are noisy observations of the unknown function f at the inputs x_i.
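As a minimal illustration of this setting, the sketch below builds a design matrix from a (hypothetical) polynomial basis and fits w by least squares on synthetic noisy samples; the basis, the data, and the noise level are assumptions made only for illustration.

import numpy as np

def design_matrix(x, M=4):
    """Design matrix with polynomial basis functions phi_j(x) = x^j, j = 0..M-1."""
    return np.vander(x, M, increasing=True)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 25)
t = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)  # noisy samples of an unknown f

Phi = design_matrix(x)
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # maximum-likelihood fit of y(x|w) = w^T phi(x)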
1. Suppose that your LR model is y(x | w) = w^T φ(x) for x ∈