Problem Set 1
CS 4301
Due: 9/16/2020 by 11:59pm
Note: all answers should be accompanied by explanations and code for full credit. Late homeworks
will not be accepted.
Problem 1: Gradients & More (10 pts)
- For each of the following functions, prove that it is convex, compute its gradient/subgradient, and write the update for (sub)gradient descent with a fixed step size. (A generic fixed-step update is sketched after the list.)
(a) $f(w) = \sum_{m=1}^{M} \max(0, 1 - y_m w^T x^{(m)})^2$ for $w \in \mathbb{R}^n$ and fixed $y_1, \ldots, y_M \in \mathbb{R}$ and $x^{(1)}, \ldots, x^{(M)} \in \mathbb{R}^n$.
(b) $f(x) = \log\left[\sum_{i=1}^{n} \exp(a_i x_i)\right]$ for $x \in \mathbb{R}^n$ and constants $a \in \mathbb{R}^n$.
(c) $f(x) = \max_{m \in \{1, \ldots, M\}} \left[ a^{(m)T} x + b_m \right]$ for $x \in \mathbb{R}^n$ and constants $a^{(1)}, \ldots, a^{(M)} \in \mathbb{R}^n$ and $b_1, \ldots, b_M \in \mathbb{R}$.
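For reference, here is a minimal sketch of a generic (sub)gradient descent loop with a fixed step size. The (sub)gradient oracle subgrad, the initial point w0, and the step size gamma are placeholders you would supply for each function above; this is not a solution to any particular part.

import numpy as np

def fixed_step_descent(subgrad, w0, gamma, num_iters=100):
    """Generic (sub)gradient descent with a fixed step size gamma.

    subgrad: callable returning a (sub)gradient of f at the current iterate
    w0: initial iterate (array-like)
    """
    w = np.array(w0, dtype=float)
    for _ in range(num_iters):
        w = w - gamma * subgrad(w)  # w_{t+1} = w_t - gamma * g_t, with g_t a (sub)gradient at w_t
    return w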
Problem 2: Line Search vs. Fixed Step Size (60 pts)
- Consider the least squares optimization problem discussed in class. That is, given a collection of data observations $(x^{(1)}, y^{(1)}), \ldots, (x^{(M)}, y^{(M)}) \in \mathbb{R}^2$, consider finding the line $f(x) = ax + b$ that minimizes the squared error.
$$\min_{a,b} \; \frac{1}{M} \sum_{m=1}^{M} \left( y^{(m)} - a x^{(m)} - b \right)^2$$
(a) In Python, implement gradient descent to find the optimal a and b in the following ways:
• Standard gradient descent with a fixed step size, γ.
• Standard gradient descent with a diminishing step size, $\frac{p}{q+t}$, for a given $p$ and $q$.
• Gradient descent with exact line search.
• Gradient descent with backtracking line search for a given α and β.
Each different approach should be written as a single Python function that takes as input the $x$'s, the $y$'s, an initial guess for $a$, an initial guess for $b$, and any parameters that describe the method, e.g., γ for the fixed step size method.
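As a sketch of the expected interface, only the fixed-step variant is shown below; the argument names and the returned per-iteration history are illustrative choices, not requirements.

import numpy as np

def grad_descent_fixed(xs, ys, a0, b0, gamma, num_iters=100):
    """Gradient descent with a fixed step size for the least squares objective.

    Returns the final (a, b) and the objective value at each iteration.
    """
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    a, b = a0, b0
    M = len(xs)
    history = []
    for _ in range(num_iters):
        resid = ys - a * xs - b
        history.append(np.mean(resid ** 2))        # current objective value
        grad_a = -2.0 / M * np.sum(resid * xs)     # partial derivative w.r.t. a
        grad_b = -2.0 / M * np.sum(resid)          # partial derivative w.r.t. b
        a, b = a - gamma * grad_a, b - gamma * grad_b
    return a, b, history

The other three methods would share the same (xs, ys, a0, b0) inputs and add their own parameters (p and q, or α and β).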
(b) Plot the value of the objective function versus iteration starting at a = b = 0 for each
of the four different methods (use γ = .5, p = q = 1, α = .3, β = .1 ) for 100 iterations.
You should use the data observations provided in the file linreg.data.
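One way to produce the requested plot, reusing the grad_descent_fixed sketch above; the assumption that linreg.data contains one whitespace-separated (x, y) pair per line is illustrative and should be adjusted to the actual file format.

import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("linreg.data")      # assumed format: one "x y" pair per line
xs, ys = data[:, 0], data[:, 1]

_, _, hist = grad_descent_fixed(xs, ys, a0=0.0, b0=0.0, gamma=0.5, num_iters=100)
plt.plot(hist, label="fixed step, gamma = 0.5")
# ...repeat for the diminishing-step, exact line search, and backtracking variants...
plt.xlabel("iteration")
plt.ylabel("objective value")
plt.legend()
plt.show()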
- Consider the same function fitting problem as above but with absolute error.
$$\min_{a,b} \; \frac{1}{M} \sum_{m=1}^{M} \left| y^{(m)} - a x^{(m)} - b \right|$$
(a) In Python, implement subgradient descent to find the optimal a and b in the following ways (the two step-size schedules are sketched after the list):
• Subgradient descent with a diminishing step size, $\frac{p}{q+t}$, for a given $p > 0$ and $q$.
• Subgradient descent with a diminishing step size, $\frac{p}{\sqrt{t}}$, for a given $p > 0$.
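The two schedules differ only in how the step size decays with the iteration counter; a small sketch, assuming iterations are counted from t = 1:

def step_size_pqt(t, p, q):
    """Diminishing step size p / (q + t)."""
    return p / (q + t)

def step_size_psqrt(t, p):
    """Diminishing step size p / sqrt(t)."""
    return p / t ** 0.5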
(b) Plot the value of the objective function versus iteration starting at a = b = 0 for each
of the different methods (use p = q = 1) for 100 iterations. Again, you should use the
data observations provided in the file linreg.data.
Problem 3: Numerical Gradients (30 pts)
Sometimes a function may be differentiable, but its gradients may be difficult to compute or the
gradients are unavailable. In such cases, numerical methods can be used to approximate the
derivative. One such strategy is to use the so-called symmetric difference quotient,
$$\frac{f(x + \epsilon v) - f(x - \epsilon v)}{2\epsilon},$$
with a small $\epsilon > 0$ to approximate the directional derivative in the direction $v$.
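A minimal sketch of this approximation; the default tolerance eps = 1e-6 is an arbitrary illustrative choice, not part of the problem statement.

import numpy as np

def directional_derivative(f, x, v, eps=1e-6):
    """Approximate the directional derivative of f at x in the direction v
    using the symmetric difference quotient (f(x + eps*v) - f(x - eps*v)) / (2*eps)."""
    x, v = np.asarray(x, dtype=float), np.asarray(v, dtype=float)
    return (f(x + eps * v) - f(x - eps * v)) / (2 * eps)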
- Consider the following univariate optimization problem.
$$\inf_{a} \;\; \frac{\exp(a)}{1 + \exp(a)} \ln\left(\frac{\exp(a)}{1 + \exp(a)}\right) + \frac{1}{1 + \exp(a)} \ln\left(\frac{1}{1 + \exp(a)}\right)$$
(a) Is this a convex optimization problem? Prove your answer.
(b) Use gradient descent with a fixed step size to find the global minimum.
(c) Use gradient descent with a fixed step size and numerical approximations of the derivatives to find the global minimum.
- Consider the convex function in cvxnum.py. Apply numerical gradient descent to minimize this function.
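A sketch of numerical gradient descent built from the symmetric difference quotient above; the placeholder g stands in for whatever cvxnum.py actually defines (its name and signature are not specified here), and gamma, eps, and the iteration count are illustrative choices.

import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of f at x coordinate-wise with symmetric differences."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        v = np.zeros_like(x)
        v[i] = 1.0
        grad[i] = (f(x + eps * v) - f(x - eps * v)) / (2 * eps)
    return grad

def numerical_gradient_descent(f, x0, gamma=0.1, num_iters=100, eps=1e-6):
    """Gradient descent with a fixed step size, using numerical gradients in place of exact ones."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - gamma * numerical_gradient(f, x, eps)
    return x

# Hypothetical usage; replace g with the function actually provided in cvxnum.py.
g = lambda x: float(np.sum(x ** 2))   # placeholder convex function, not the assigned one
print(numerical_gradient_descent(g, x0=[1.0, -2.0]))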