Assignment title: Management
Homework Assignment # 1
Due: Wednesday, February 1, 2017, 11:59 p.m.
Total marks: 100
Question 1. [10 marks]
Let $X$ be a random variable with outcome space $\Omega = \{a, b, c\}$ and $p(a) = 0.1$, $p(b) = 0.2$, and
$p(c) = 0.7$. Let
$$f(x) = \begin{cases} 10 & \text{if } x = a \\ 5 & \text{if } x = b \\ 10/7 & \text{if } x = c \end{cases}$$
(a) [3 marks] What is $E[f(X)]$?
(b) [3 marks] What is $E[1/p(X)]$?
(c) [4 marks] For an arbitrary pmf $p$, what is $E[1/p(X)]$?
Question 2. [15 marks]
Let $X_1, \ldots, X_m$ be independent multivariate Gaussian random variables, with $X_i \sim \mathcal{N}(\mu_i, \Sigma_i)$,
with $\mu_i \in \mathbb{R}^d$ and $\Sigma_i \in \mathbb{R}^{d \times d}$ for dimension $d \in \mathbb{N}$. Define
$X = a_1 X_1 + a_2 X_2 + \ldots + a_m X_m$ as a convex combination, $a_i \ge 0$ and $\sum_{i=1}^{m} a_i = 1$.
(a) [5 marks] Write the expected value $E[X]$ in terms of the givens $a_i, \mu_i, \Sigma_i$. Show all your steps.
What is the dimension of $E[X]$?
(b) [10 marks] Write the covariance $\mathrm{Cov}[X]$ in terms of the givens $a_i, \mu_i, \Sigma_i$. Show all your steps.
What is the dimension of $\mathrm{Cov}[X]$? Briefly explain how the result for $\mathrm{Cov}[X]$ would be different if
the variables $X_1$ and $X_2$ are not independent and have covariance $\mathrm{Cov}[X_1, X_2] = \Lambda$ for $\Lambda \in \mathbb{R}^{d \times d}$.
Question 3. [15 marks]
This question involves some simple simulations, to better visualize random variables and get
some intuition for sampling, which is a central theme in machine learning. Use the attached code
called simulate.py. This code is a simple script for sampling and plotting with python; play with
some of the parameters to see what it is doing. Calling simulate.py runs with default parameters;
simulate.py 1 100 simulates 100 samples from a 1d Gaussian.
(a) [5 marks] Run the code for 10, 100 and 1000 samples with dim=1 and $\sigma = 1.0$. Next run
the code for 10, 100 and 1000 samples with dim=1 and $\sigma = 10.0$. What do you notice about the
sample mean?
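The experiment in (a) can also be tried outside the attached script. The following is only a minimal sketch and is not the contents of simulate.py: the helper name sample_mean_1d, the zero true mean, and the fixed seed are all assumptions made for illustration. It uses the Python standard library so it runs without extra packages.

```python
import random


def sample_mean_1d(num_samples, sigma, mu=0.0, seed=0):
    """Draw num_samples values from N(mu, sigma^2) and return their average.

    mu=0.0 and seed=0 are assumptions for this sketch; simulate.py may
    use different defaults.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    rng = random.Random(seed)  # fixed seed so repeated runs are reproducible
    total = sum(rng.gauss(mu, sigma) for _ in range(num_samples))
    return total / num_samples


if __name__ == "__main__":
    # Same grid of settings as in part (a).
    for sigma in (1.0, 10.0):
        for n in (10, 100, 1000):
            mean = sample_mean_1d(n, sigma)
            print(f"sigma={sigma:5.1f}  n={n:5d}  sample mean={mean:+.4f}")
```

Changing the seed argument gives a different draw each time, which is useful for seeing how much the sample mean moves between runs.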
(b) [5 marks] The current covariance for dim=3 is
$$\Sigma = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
What does that mean about the multivariate Gaussian (i.e., about X, Y and Z)?
Spring 2017 CSCI-B455: Machine Learning
(c) [5 marks] Change the covariance to
$$\Sigma = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.$$
What happens?
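For experimenting with a covariance matrix independently of simulate.py, one possible approach is sketched below. It is an illustration under stated assumptions, not the method used by the attached script: the helper names psd_cholesky, sample_gaussian and empirical_cov are hypothetical, the mean is assumed to be zero, and a hand-written Cholesky-style factorization is used so that only the standard library is needed.

```python
import math
import random


def psd_cholesky(cov, tol=1e-12):
    """Return lower-triangular L with L L^T = cov for a symmetric PSD matrix.

    Zero pivots are tolerated, so singular covariance matrices also work.
    """
    d = len(cov)
    L = [[0.0] * d for _ in range(d)]
    for j in range(d):
        pivot = cov[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot < -tol:
            raise ValueError("matrix is not positive semi-definite")
        L[j][j] = math.sqrt(max(pivot, 0.0))
        for i in range(j + 1, d):
            rest = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            # With a zero pivot the whole column is zero for a PSD matrix.
            L[i][j] = rest / L[j][j] if L[j][j] > tol else 0.0
    return L


def sample_gaussian(cov, num_samples, seed=0):
    """Draw zero-mean Gaussian samples as x = L z with z ~ N(0, I)."""
    rng = random.Random(seed)
    L = psd_cholesky(cov)
    d = len(cov)
    samples = []
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        samples.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return samples


def empirical_cov(samples):
    """Unbiased sample covariance matrix of a list of equal-length vectors."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[i] for s in samples) / n for i in range(d)]
    return [
        [sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


if __name__ == "__main__":
    sigma = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]  # matrix from part (c)
    for row in empirical_cov(sample_gaussian(sigma, 1000)):
        print("  ".join(f"{v:+.3f}" for v in row))
```

Printing a few of the raw samples alongside the empirical covariance is a quick way to compare the two covariance settings.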
Question 4. [30 marks]
Suppose that the number of accidents occurring daily in a certain plant has a Poisson distribution
with an unknown mean $\lambda$. Based on previous experience in similar industrial plants, suppose
that our initial feelings about the possible value of $\lambda$ can be expressed by an exponential distribution
with parameter $\theta = 1/2$. That is, the prior density is
$$f(\lambda) = \theta e^{-\theta \lambda}$$
where $\lambda \in (0, \infty)$.
(a) [5 marks] Before observing any data (any reported accidents), what is the most likely value
for λ?
(b) [5 marks] Now imagine there are 79 accidents over 9 days. Determine the maximum likelihood
estimate of λ.
(c) [5 marks] Again imagine there are 79 accidents over 9 days. Determine the maximum a
posteriori (MAP) estimate of λ.
(d) [5 marks] Imagine you now want to predict the number of accidents for tomorrow. How
can you use the maximum likelihood estimate computed above? What about the MAP estimate?
What would they predict?
(e) [5 marks] For the MAP estimate, what is the purpose of the prior once we observe this data?
(f) [5 marks] Look at the plots of some exponential distributions to better understand the prior
chosen on λ. Imagine that now new safety measures have been put in place and you believe that
the number of accidents per day should sharply decrease. How might you change θ to better reflect
this new belief about the number of accidents?
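As a starting point for looking at exponential distributions in (f), the density from the prior above can be tabulated for a few parameter values. This is a small sketch only: the function name exponential_density is hypothetical, and apart from 0.5 (the value given in this question) the values of θ below are arbitrary choices for illustration.

```python
import math


def exponential_density(lam, theta):
    """Exponential density theta * exp(-theta * lam), zero for negative lam."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    if lam < 0:
        return 0.0
    return theta * math.exp(-theta * lam)


if __name__ == "__main__":
    # 0.5 is the prior parameter from the question; 0.1 and 2.0 are arbitrary.
    thetas = (0.1, 0.5, 2.0)
    print("lambda  " + "  ".join(f"theta={t:<4}" for t in thetas))
    for lam in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
        row = "  ".join(f"{exponential_density(lam, t):10.4f}" for t in thetas)
        print(f"{lam:6.1f}  {row}")
```

The same function can be passed to any plotting tool to draw the curves instead of printing a table.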
Question 5. [30 marks]
Imagine that you would like to predict if your favorite table will be free at your favorite restaurant.
The only additional piece of information you can collect, however, is if it is sunny or not
sunny. You collect paired samples from each visit of the form (is sunny, is table free), where it is either
sunny (1) or not sunny (0) and the table is either free (1) or not free (0).
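To make the data format concrete, the paired samples can be stored as a list of 0/1 tuples and summarized by counting each combination. The sketch below is illustrative only: the observations in visits are invented to show the format and are not real data, and the names visits and tally are hypothetical.

```python
from collections import Counter

# Hypothetical observations, invented purely to show the data format.
# Each pair is (is_sunny, is_table_free) with 1 = yes and 0 = no.
visits = [(1, 1), (1, 0), (0, 0), (1, 1), (0, 1), (0, 0)]


def tally(pairs):
    """Count how often each (is_sunny, is_table_free) combination occurs."""
    for sunny, free in pairs:
        if sunny not in (0, 1) or free not in (0, 1):
            raise ValueError("each entry must be a pair of 0/1 values")
    return Counter(pairs)


counts = tally(visits)
for sunny in (0, 1):
    for free in (0, 1):
        print(f"is_sunny={sunny}  is_table_free={free}  count={counts[(sunny, free)]}")
```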
(a) [10 marks] How can this be formulated as a maximum likelihood problem?
(b) [10 marks] Assume you have collected data for the last 10 days and computed the maximum
likelihood solution to the problem formulated in (a). If it is sunny today, how would you predict if
your table will be free?
(c) [10 marks] Imagine now that you could further gather information about whether it is morning,
afternoon, or evening. How does this change the maximum likelihood problem?
Homework policies:
Your assignment will be submitted as a single PDF document and a zip file with code, on
Canvas. The questions must be typed; for example, in LaTeX, Microsoft Word, LyX, etc. or must
be written legibly and scanned. Images may be scanned and inserted into the document if it is too
complicated to draw them properly. All code (if applicable) should be turned in when you submit
your assignment. Use Matlab, Python, R, Java or C.
Policy for late submission assignments: Unless there are legitimate circumstances, late assignments
will be accepted up to 5 days after the due date and graded using the following rule:
on time: your score × 1
1 day late: your score × 0.9
2 days late: your score × 0.7
3 days late: your score × 0.5
4 days late: your score × 0.3
5 days late: your score × 0.1
For example, this means that if you submit 3 days late and get 80 points for your answers, your
total number of points will be 80 × 0.5 = 40 points.
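The rule above can be written as a short lookup, shown here as an informal sketch rather than an official grading tool. The names LATE_MULTIPLIERS and late_adjusted_score are hypothetical, and raising an error for submissions more than 5 days late is an assumption based on the statement that late work is accepted only up to 5 days.

```python
# Multipliers copied from the late submission rule above.
LATE_MULTIPLIERS = {0: 1.0, 1: 0.9, 2: 0.7, 3: 0.5, 4: 0.3, 5: 0.1}


def late_adjusted_score(raw_score, days_late):
    """Return raw_score scaled by the multiplier for the given days late."""
    if days_late not in LATE_MULTIPLIERS:
        # Assumption: anything outside 0..5 whole days is not accepted.
        raise ValueError("days_late must be a whole number from 0 to 5")
    return raw_score * LATE_MULTIPLIERS[days_late]


# The worked example from the policy: 3 days late with 80 points.
print(late_adjusted_score(80, 3))
```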
All assignments are individual, except when collaboration is explicitly allowed. All the sources
used for problem solution must be acknowledged, e.g. web sites, books, research papers, personal
communication with people, etc. Academic honesty is taken seriously; for detailed information see
Indiana University Code of Student Rights, Responsibilities, and Conduct.
Good luck!