
Biologically Inspired Methods

Nature-Inspired Learning Algorithms (7CCSMBIM)
Tutorial 2: Solutions

Q1. What are the advantages and disadvantages of the gradient descent method?

Advantages:
Simple to implement.
Robust, with quick convergence in practice.
Tractable: each iteration needs only one gradient evaluation.

Disadvantages:
Requires the gradient, so the cost function must be differentiable.
Does not guarantee the global minimum; it can become trapped in a local minimum.
Does not work well with discrete variables.
Sensitive to the initial guess.

(An illustrated example of gradient descent applied to linear regression: https://www.cs.toronto.edu/~frossard/post/linear_regression/)

Q2. Show how the gradient descent method works using pseudo code.
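A minimal MATLAB sketch of the loop (the function handles f and gradf, the fixed step size h, the tolerance tol and the iteration cap kmax are illustrative choices, not prescribed by the tutorial):

function [z, fval] = gradient_descent(f, gradf, z0, h, tol, kmax)
    % Minimise f starting from z0 using fixed-step gradient descent.
    z = z0;
    for k = 1:kmax
        g = gradf(z);          % gradient at the current point
        if norm(g) < tol       % stop once the gradient (almost) vanishes
            break;
        end
        z = z - h * g;         % step against the gradient
    end
    fval = f(z);
end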

Q3. For the least-squares problem $\min_{\mathbf{x}} f(\mathbf{x}) = \|\mathbf{A}\mathbf{x} - \mathbf{B}\|_2^2$ with $\mathbf{A} = \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]$ and $\mathbf{B} = \left[ \begin{array}{c} 5 \\ 6 \end{array} \right]$:

\noindent $\mathbf{x}^* = \left( \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \right)^{-1} \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} -4 \\ 4.5 \end{array} \right]$

\textbf{Verification:} $\mathbf{A}\mathbf{x}^{*} - \mathbf{B} = \mathbf{0}$?\\
$\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \mathbf{x}^* - \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$
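A quick numerical check in MATLAB (a sketch; the backslash operator solves the normal equations):

A = [1 2; 3 4];
B = [5; 6];
x = (A' * A) \ (A' * B)   % normal-equations solution: [-4; 4.5]
residual = A * x - B      % [0; 0], since A is square and invertible here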

Q4. Minimise f(x, y) = (x - 1)x + (y + 1)y using the Nelder-Mead downhill simplex method, starting from the vertices (0, 0), (1, 2), (3, 4).

MATLAB code used to plot the simplex vertices B, G, W, the midpoint M and the reflection R:

x = [0, 1, 3]; y = [0, 2, 4];
plot(x, y, 'b', 'MarkerSize', 12, 'linewidth', 1);
hold on;
plot(x, y, 'mo', 'MarkerSize', 12, 'linewidth', 3);    % vertices B, G, W
plot(0.5, 1, 'rx', 'MarkerSize', 12, 'linewidth', 3);  % midpoint M
x = [0, 3]; y = [0, 4];
plot(x, y, 'b-', 'linewidth', 1);
plot(-2, -2, 'gs', 'MarkerSize', 12, 'linewidth', 3);  % reflection R
xlabel('\itx');
ylabel('\ity');

Function values at the points of interest:
\begin{align*}
&\mathbf{B}: f(0, 0) = 0\\
&\mathbf{G}: f(1, 2) = 6\\
&\mathbf{W}: f(3, 4) = 26~(\text{before replacement})\\
&\mathbf{R}: f(-2, -2) = 8\\
&\mathbf{C}: f(-0.75, -0.5) = 1.0625
\end{align*}
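The same numbers can be reproduced directly; a sketch of this single Nelder-Mead step in MATLAB:

f = @(p) (p(1) - 1)*p(1) + (p(2) + 1)*p(2);
B = [0 0]; G = [1 2]; W = [3 4];   % ordered so f(B) <= f(G) <= f(W)
M = (B + G) / 2;                   % midpoint (0.5, 1), f(M) = 1.75
R = 2*M - W;                       % reflection (-2, -2), f(R) = 8
if f(R) < f(W), W = R; end         % f(R) = 8 < 26, so W is replaced with R
C = (W + M) / 2;                   % contraction (-0.75, -0.5), f(C) = 1.0625
if f(C) < f(W), W = C; end         % f(C) < 8, so W is replaced with C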

Q5. Minimise f(x, y) = (x - 1)x + (y + 1)y by gradient descent with step size h_k = 0.1 and initial guess (7, 8); find x, y and f(x, y) for the first 3 iterations.

The gradient is $\triangledown f(\mathbf{z}) = \left[ \begin{array}{c} \frac{\partial f(x,y)}{\partial x} \\ \frac{\partial f(x,y)}{\partial y} \end{array} \right] = \left[ \begin{array}{c} 2x - 1 \\ 2y + 1 \end{array} \right]$.

\noindent \textbf{Update rule:} $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$, i.e. $\left[ \begin{array}{c} x_{k+1} \\ y_{k+1} \end{array} \right] = \left[ \begin{array}{c} x_k \\ y_k \end{array} \right] - h_k \left[ \begin{array}{c} 2 x_k - 1 \\ 2 y_k + 1 \end{array} \right]$

\noindent 1$^{st}$ iteration: $\mathbf{z}_{1} = \left[ \begin{array}{c} 7 \\ 8 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 7 - 1 \\ 2 \times 8 + 1 \end{array} \right] = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right]$; $f(x, y) = 72.7800$.

\noindent 2$^{nd}$ iteration: $\mathbf{z}_{2} = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 5.7 - 1 \\ 2 \times 6.3 + 1 \end{array} \right] = \left[ \begin{array}{c} 4.66 \\ 4.94 \end{array} \right]$; $f(x, y) = 46.3992$.

\noindent 3$^{rd}$ iteration: $\mathbf{z}_{3} = \left[ \begin{array}{c} 4.66 \\ 4.94 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 4.66 - 1 \\ 2 \times 4.94 + 1 \end{array} \right] = \left[ \begin{array}{c} 3.828 \\ 3.852 \end{array} \right]$; $f(x, y) = 29.5155$.
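The three iterations in MATLAB (a sketch; the anonymous functions encode f and its gradient from above):

f     = @(z) (z(1) - 1)*z(1) + (z(2) + 1)*z(2);
gradf = @(z) [2*z(1) - 1; 2*z(2) + 1];
z = [7; 8]; h = 0.1;
for k = 1:3
    z = z - h * gradf(z);   % one gradient descent step
    fprintf('k = %d: z = (%.3f, %.3f), f = %.4f\n', k, z(1), z(2), f(z));
end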

The iterations can be visualised on the cost surface (initial guess, then k = 1, 2, 3):

[X, Y] = meshgrid(1:0.5:10, 1:0.5:10);
Z = (X - 1).*X + (Y + 1).*Y;
mesh(X, Y, Z);
alpha(0.5);
hold on;
plot3(7, 8, 114, 'ro', 'MarkerSize', 12, 'linewidth', 3);            % initial guess
plot3(5.7, 6.3, 72.78, 'rx', 'MarkerSize', 12, 'linewidth', 3);      % k = 1
plot3(4.66, 4.94, 46.3992, 'rx', 'MarkerSize', 12, 'linewidth', 3);  % k = 2
plot3(3.828, 3.852, 29.5155, 'rx', 'MarkerSize', 12, 'linewidth', 3);% k = 3
xlabel('\itx');
ylabel('\ity');
zlabel('{\itf}(\itx,\ity)');
Q6a. Determine the optimal step size h_k.

Write f in quadratic form: $f(x, y) = \frac{1}{2} \begin{bmatrix} x \\ y \end{bmatrix}^T \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$

\textbf{Update rule:} $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$, with the exact line-search (optimal) step size

\fbox{$h_k = \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)}$}

For this problem the step size evaluates to $h_k = \frac{1}{2}$; the derivation is given in the suggested solutions below.

Q6b. Substituting the optimal step size into the update rule gives $\mathbf{z}_{k+1} = \mathbf{z}_k - \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)} \triangledown f(\mathbf{z}_k) = \mathbf{z}_k - \frac{1}{2} \triangledown f(\mathbf{z}_k)$.

Randomly pick an initial condition, say $\mathbf{z}_k = \left[ \begin{array}{c} 1 \\ -2 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+1} &= \mathbf{z}_k - \frac{1}{2} \triangledown f(\mathbf{z}_k) \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2x_k - 1 \\ 2y_k + 1 \end{array} \right] \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2 \times 1 - 1 \\ 2 \times (-2) + 1 \end{array} \right] \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 1 \\ -3 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}

Run another iteration (i.e., $k+1 \rightarrow k+2$) from $\mathbf{z}_{k+1} = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+2} &= \mathbf{z}_{k+1} - \frac{1}{2} \triangledown f(\mathbf{z}_{k+1}) \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2 \times 0.5 - 1 \\ 2 \times (-0.5) + 1 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}
The gradient vanishes at (0.5, -0.5), so the method converges in a single iteration, as required.
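A two-line MATLAB check of the one-iteration convergence (sketch):

gradf = @(z) [2*z(1) - 1; 2*z(2) + 1];
z = [1; -2];
z = z - 0.5 * gradf(z)   % lands on [0.5; -0.5]
norm(gradf(z))           % 0: the gradient vanishes, so z no longer moves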

Q7. Minimise f(x, y) = (x - 1)x + (y + 1)y by random walk optimisation, with Threshold = 0.75, the repeating random sequence {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6}, all diagonal entries of $\mathbf{D}_k$ equal to 1, and all entries of $\mathbf{h}_k$ equal to 0.5.

Candidates are generated as $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{D}_k \mathbf{h}_k$; for the first two steps:

$\mathbf{x}_{1} = \left[ \begin{array}{c} -1 \\ -2 \end{array} \right] + \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} 0.5 \\ 0.5 \end{array} \right] = \left[ \begin{array}{c} -0.5 \\ -1.5 \end{array} \right]$

$\mathbf{x}_{2} = \left[ \begin{array}{c} -0.5 \\ -1.5 \end{array} \right] + \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} 0.5 \\ 0.5 \end{array} \right] = \left[ \begin{array}{c} 0 \\ -1 \end{array} \right]$

The completed table is given in the suggested solutions below.
Q8. Approximate a system $y = g(x_1, x_2)$ by the linear regressor $\hat{y} = w_1 x_1 + w_2 x_2 + b$, where the decision variables are $w_1$, $w_2$ and $b$.

Ideal case: $y = \hat{y}$, i.e., $y - \hat{y} = 0$ for every data point.


Q8a. Minimise the MSE over $w_1$, $w_2$ and $b$.

Simplified notation: write $y_i$ for $y(i)$ and $\hat{y}_i$ for $\hat{y}(i)$. The mean squared error is

$$\text{MSE} = \frac{(\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 + \cdots + (\hat{y}_M - y_M)^2}{M}$$

$$= \frac{\big( (w_1 x_1(1) + w_2 x_2(1) + b) - y_1 \big)^2 + \big( (w_1 x_1(2) + w_2 x_2(2) + b) - y_2 \big)^2 + \cdots + \big( (w_1 x_1(M) + w_2 x_2(M) + b) - y_M \big)^2}{M}$$

so the data fitting problem is the minimisation

$$\min_{w_1, w_2, b} \frac{1}{M} \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big)^2$$

Remark: An ideal set of $w_1$, $w_2$ and $b$ would give the cost $f(w_1, w_2, b) = 0$.

Q8b. Write $f(w_1, w_2, b) = f(\mathbf{z}) = \frac{1}{M} \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big)^2$.

\textbf{Update rule:} $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$ where $\mathbf{z}_k = \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right]$ and
\begin{align*}
\triangledown f(\mathbf{z}_k) = \left[ \begin{array}{c} \frac{\partial f(\mathbf{z}_k)}{\partial w_1} \\ \frac{\partial f(\mathbf{z}_k)}{\partial w_2} \\ \frac{\partial f(\mathbf{z}_k)}{\partial b} \end{array} \right] = \left[ \begin{array}{c} \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_1(i) \\ \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_2(i) \\ \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) \end{array} \right]
\end{align*}

Substituting the gradient into the update rule:
\begin{align*}
\left[ \begin{array}{c} w_{1_{k+1}} \\ w_{2_{k+1}} \\ b_{k+1} \end{array} \right] &= \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right] - h_k \left[ \begin{array}{c} \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_1(i) \\ \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_2(i) \\ \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) \end{array} \right] \\
&= \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right] - \frac{2 h_k}{M} \left[ \begin{array}{c} \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_1(i) \\ \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_2(i) \\ \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) \end{array} \right]
\end{align*}
Q8c. Dataset: $(1, -2, 3)$, $(2, 4, -1)$, $(3, 0, 5)$ $\Rightarrow M = 3$.

Initial guess $\mathbf{z}_0 = \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right] \Rightarrow w_1 = w_2 = b = 1$:
\begin{align*}
\hat{y}_1 &= w_1 x_1(1) + w_2 x_2(1) + b = (1 \times 1) + (1 \times -2) + 1 = 0\\
\hat{y}_2 &= w_1 x_1(2) + w_2 x_2(2) + b = (1 \times 2) + (1 \times 4) + 1 = 7\\
\hat{y}_3 &= w_1 x_1(3) + w_2 x_2(3) + b = (1 \times 3) + (1 \times 0) + 1 = 4
\end{align*}

\begin{align*}
MSE &= \frac{1}{M} \big( (\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 + (\hat{y}_3 - y_3)^2 \big)\\
&= \frac{1}{3} \big( (0 - 3)^2 + (7 - (-1))^2 + (4 - 5)^2 \big)\\
&= 24.6667
\end{align*}
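A sketch of the full Q8c/d experiment in MATLAB (the 200-iteration run and the MSE plot follow the question; the variable names are illustrative):

x1 = [1 2 3]; x2 = [-2 4 0]; y = [3 -1 5]; M = 3;
w1 = 1; w2 = 1; b = 1; h = 0.1;
mse = zeros(1, 200);
for k = 1:200
    e = (w1*x1 + w2*x2 + b) - y;                    % prediction errors
    g = (2/M) * [sum(e.*x1); sum(e.*x2); sum(e)];   % gradient of the MSE
    w1 = w1 - h*g(1); w2 = w2 - h*g(2); b = b - h*g(3);
    mse(k) = mean(((w1*x1 + w2*x2 + b) - y).^2);
end
plot(1:200, mse); xlabel('iteration k'); ylabel('MSE');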
[Figure: a food product formed from Ingredient 1, Ingredient 2 and other ingredients; slide illustration, no further content recoverable.]
Q16. Explain how "recursive least-squares" works.

A least-squares problem is described as $\min_{\mathbf{x}} f(\mathbf{x}) = \|\mathbf{A}\mathbf{x} - \mathbf{B}\|_2^2$. Its solution is $\mathbf{x} = \big( \mathbf{A}^T \mathbf{A} \big)^{-1} \mathbf{A}^T \mathbf{B}$.

Denote the $i$-th row of $\mathbf{A}$ as $\mathbf{a}_i^T$ and the $i$-th entry of $\mathbf{B}$ as $b_i$. The solution $\mathbf{x}$ can then be represented in row form: $$\mathbf{x} = \Big( \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T \Big)^{-1} \sum_{i=1}^m b_i \mathbf{a}_i,$$ where $$\mathbf{A} = \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{array} \right] = \left[ \begin{array}{c} \mathbf{a}_1^T \\ \mathbf{a}_2^T \\ \vdots \\ \mathbf{a}_m^T \end{array} \right], \qquad \mathbf{B} = \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \end{array} \right].$$

The identity $\mathbf{A}^T \mathbf{A} = \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T$ follows by writing the product out:

$$\mathbf{A}^T \mathbf{A} = \left[ \begin{array}{cccc} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{array} \right] \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{array} \right] = \mathbf{a}_1 \mathbf{a}_1^T + \mathbf{a}_2 \mathbf{a}_2^T + \cdots + \mathbf{a}_m \mathbf{a}_m^T,$$

and likewise $\mathbf{A}^T \mathbf{B} = \sum_{i=1}^m b_i \mathbf{a}_i$ with $\mathbf{B} = \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \end{array} \right]$.
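A quick numerical sanity check of the row-form identity (a sketch with an arbitrary 4-by-2 A):

A = [1 2; 3 4; 5 6; 7 8];
S = zeros(2, 2);
for i = 1:size(A, 1)
    S = S + A(i, :)' * A(i, :);   % outer product a_i * a_i^T
end
norm(S - A' * A)                  % 0: the two forms agree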

What happens when a new sample comes in? The data matrices

$$\mathbf{A} = \left[ \begin{array}{c} \mathbf{a}_1^T \\ \mathbf{a}_2^T \\ \vdots \\ \mathbf{a}_m^T \end{array} \right], \qquad \mathbf{B} = \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \end{array} \right]$$

gain a new row $\left[ \begin{array}{cccc} a_{m+1,1} & a_{m+1,2} & \cdots & a_{m+1,n} \end{array} \right] = \mathbf{a}_{m+1}^T$ and a new output $b_{m+1}$.

The new solution can be written as $$\mathbf{x}_{new} = \Big( \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T \Big)^{-1} \Big( \sum_{i=1}^m b_i \mathbf{a}_i + b_{m+1} \mathbf{a}_{m+1} \Big).$$

At the time that you have $m$ samples, recall that the solution is $$\mathbf{x} = \mathbf{P}(m)^{-1} \mathbf{q}(m)$$ where $$\mathbf{P}(m) = \mathbf{A}^T \mathbf{A} = \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T \qquad \text{and} \qquad \mathbf{q}(m) = \mathbf{A}^T \mathbf{B} = \sum_{i=1}^m b_i \mathbf{a}_i.$$

With the new sample $\mathbf{a}_{m+1}^T$ and $b_{m+1}$, the solution is $$\mathbf{x}_{new} = \mathbf{P}(m+1)^{-1} \mathbf{q}(m+1)$$ where $$\mathbf{P}(m+1) = \mathbf{P}(m) + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T \qquad \text{and} \qquad \mathbf{q}(m+1) = \mathbf{q}(m) + b_{m+1} \mathbf{a}_{m+1}.$$

When the new sample comes in, do we need to recompute the inverse of $\mathbf{P}(m+1)$ from scratch, or can we use the inverse of $\mathbf{P}(m)$ to obtain it? The mathematical trick is the rank-one update formula:

$$\big( \mathbf{P} + \mathbf{a}\mathbf{a}^T \big)^{-1} = \mathbf{P}^{-1} - \frac{1}{1 + \mathbf{a}^T \mathbf{P}^{-1} \mathbf{a}} \big( \mathbf{P}^{-1} \mathbf{a} \big) \big( \mathbf{P}^{-1} \mathbf{a} \big)^T$$

\noindent \textit{Remark:} The rank-one update formula is valid when $\mathbf{P} = \mathbf{P}^T$, and $\mathbf{P}$ and $\mathbf{P} + \mathbf{a}\mathbf{a}^T$ are both invertible.

Applying the formula with $\mathbf{P} = \mathbf{P}(m)$ and $\mathbf{a} = \mathbf{a}_{m+1}$, and recalling $\mathbf{P}(m+1) = \mathbf{P}(m) + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T$:

$$\mathbf{P}(m+1)^{-1} = \mathbf{P}(m)^{-1} - \frac{1}{1 + \mathbf{a}_{m+1}^T \mathbf{P}(m)^{-1} \mathbf{a}_{m+1}} \big( \mathbf{P}(m)^{-1} \mathbf{a}_{m+1} \big) \big( \mathbf{P}(m)^{-1} \mathbf{a}_{m+1} \big)^T,$$

so each new sample updates the stored inverse directly, without any matrix inversion.
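A minimal MATLAB sketch of one recursive least-squares step (assumes the inverse Pinv = P(m)^{-1} and q = q(m) are already stored; the names are illustrative):

% New sample: column vector a (n-by-1) and scalar output b.
v    = Pinv * a;
Pinv = Pinv - (v * v') / (1 + a' * v);   % rank-one update of the inverse
q    = q + b * a;
x    = Pinv * q;                         % updated least-squares solution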

Department of Informatics, King's College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)

Tutorial 2

Q1. What are the advantages and disadvantages of the gradient descent method?

Q2. Show how the gradient descent method works using pseudo code.

Q3. Consider a least-squares problem, $\min_{\mathbf{x}} f(\mathbf{x}) = \|\mathbf{A}\mathbf{x} - \mathbf{B}\|_2^2$, where $\mathbf{A} = \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]$ and $\mathbf{B} = \left[ \begin{array}{c} 5 \\ 6 \end{array} \right]$. Find the optimal solution $\mathbf{x}^*$.

Q4. Consider the minimisation problem of the function $f(x, y) = (x - 1)x + (y + 1)y$ using the Nelder-Mead downhill simplex method.

a. Starting with the 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and W.
b. Find the midpoint M and f(M).
c. Find the reflection R and f(R).
d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii) is performed in the pseudo code of the Nelder-Mead downhill simplex method, $\mathbf{C} = \frac{\mathbf{W} + \mathbf{M}}{2}$ is used.

Q5. Considering the minimisation problem of the function $f(x, y) = (x - 1)x + (y + 1)y$ using the gradient descent method, find x and y, and f(x, y), for the first 3 iterations with step size $h_k = 0.1$ and initial guess (7, 8).

Q6. Consider the minimisation problem of the function $f(x, y) = (x - 1)x + (y + 1)y$ using the gradient descent method.

a. Determine the optimal step size $h_k$.
b. Show that it requires one iteration for the gradient descent method to converge with the optimal step size $h_k$ obtained in Q6a.

Q7. Consider the minimisation problem of the function $f(x, y) = (x - 1)x + (y + 1)y$ using the random walk optimisation algorithm. The Threshold (for accepting the worse solution) is 0.75 and the random number generator will generate a repeating sequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (the left-most number is the first number generated). The diagonal entries of $\mathbf{D}_k$ are all 1 and all the entries of $\mathbf{h}_k$ are 0.5. Fill in the content of the following table.

k | x_k^T   | f(x_k) | x_{k+1}^T | f(x_{k+1}) | rand() | x_best | f(x_best)
0 | [-1 -2] |        |           |            |        |        |
1 |         |        |           |            |        |        |
2 |         |        |           |            |        |        |
3 |         |        |           |            |        |        |
4 |         |        |           |            |        |        |
5 |         |        |           |            |        |        |

[Figure: an x-y plot accompanying the sheet, x from 0 to 3, y from 0 to 4.]
Department of Informatics, King's College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)

Tutorial 2 (Suggested Solutions)

Q1. Answer can be found in lecture notes.

Q2. Answer can be found in lecture notes.

Q3. $\mathbf{x}^* = \left( \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \right)^{-1} \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} -4 \\ 4.5 \end{array} \right]$

Verification: $\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \mathbf{x}^* - \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$

Q4. a. $f(0, 0) = (0 - 1) \times 0 + (0 + 1) \times 0 = 0$;
$f(1, 2) = (1 - 1) \times 1 + (2 + 1) \times 2 = 6$;
$f(3, 4) = (3 - 1) \times 3 + (4 + 1) \times 4 = 26$.
B: (0, 0); G: (1, 2); W: (3, 4).

b. $\mathbf{M} = \frac{\mathbf{B} + \mathbf{G}}{2} = \frac{(0,0) + (1,2)}{2} = (0.5, 1)$; $f(0.5, 1) = 1.75$.

c. $\mathbf{R} = 2\mathbf{M} - \mathbf{W} = 2(0.5, 1) - (3, 4) = (-2, -2)$; $f(-2, -2) = 8$.

d. As $f(R) > f(G)$, Case (ii) is performed. As $f(R) < f(W)$, W is replaced with R. $\mathbf{C} = \frac{\mathbf{W} + \mathbf{M}}{2} = \frac{(-2,-2) + (0.5,1)}{2} = (-0.75, -0.5)$ and $f(C) = 1.0625$. As $f(C) < f(W)$, W is replaced with C. The new 3 vertices are B: (0, 0), G: (-0.75, -0.5) and W: (1, 2). (See Figure 6, the flowchart of the Nelder-Mead downhill simplex method, in the lecture slides by Dr H.K. Lam, KCL.)

Q5. Let $\mathbf{z} = \left[ \begin{array}{c} x \\ y \end{array} \right]$, so $\triangledown f(\mathbf{z}) = \left[ \begin{array}{c} 2x - 1 \\ 2y + 1 \end{array} \right]$. According to the update rule $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$, we have

1st iteration: $\mathbf{z}_1 = \left[ \begin{array}{c} 7 \\ 8 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 7 - 1 \\ 2 \times 8 + 1 \end{array} \right] = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right]$; $f(x, y) = 72.7800$.

2nd iteration: $\mathbf{z}_2 = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 5.7 - 1 \\ 2 \times 6.3 + 1 \end{array} \right] = \left[ \begin{array}{c} 4.66 \\ 4.94 \end{array} \right]$; $f(x, y) = 46.3992$.

3rd iteration: $\mathbf{z}_3 = \left[ \begin{array}{c} 4.66 \\ 4.94 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 4.66 - 1 \\ 2 \times 4.94 + 1 \end{array} \right] = \left[ \begin{array}{c} 3.828 \\ 3.852 \end{array} \right]$; $f(x, y) = 29.5155$.

Q6. a. $f(\mathbf{z}) = \frac{1}{2} \mathbf{z}^T \mathbf{Q} \mathbf{z} - \mathbf{b}^T \mathbf{z}$ where $\mathbf{Q} = \left[ \begin{array}{cc} 2 & 0 \\ 0 & 2 \end{array} \right]$ and $\mathbf{b}^T = \left[ \begin{array}{cc} 1 & -1 \end{array} \right]$, with $\triangledown f(\mathbf{z}) = \left[ \begin{array}{c} 2x - 1 \\ 2y + 1 \end{array} \right]$ and $\mathbf{z}_k = \left[ \begin{array}{c} x_k \\ y_k \end{array} \right]$. The optimal step size is

$$h_k = \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)} = \frac{\left[ \begin{array}{cc} 2x_k - 1 & 2y_k + 1 \end{array} \right] \left[ \begin{array}{c} 2x_k - 1 \\ 2y_k + 1 \end{array} \right]}{\left[ \begin{array}{cc} 2x_k - 1 & 2y_k + 1 \end{array} \right] \left[ \begin{array}{cc} 2 & 0 \\ 0 & 2 \end{array} \right] \left[ \begin{array}{c} 2x_k - 1 \\ 2y_k + 1 \end{array} \right]} = \frac{4x_k^2 - 4x_k + 2 + 4y_k^2 + 4y_k}{8x_k^2 - 8x_k + 4 + 8y_k^2 + 8y_k} = \frac{1}{2}.$$

b. With the optimal step size $h_k = \frac{1}{2}$, the update rule is $\mathbf{z}_{k+1} = \mathbf{z}_k - \frac{1}{2} \triangledown f(\mathbf{z}_k)$. Randomly pick an initial condition as $\mathbf{z}_k = \left[ \begin{array}{c} 1 \\ -2 \end{array} \right]$:

$$\mathbf{z}_{k+1} = \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2 \times 1 - 1 \\ 2 \times (-2) + 1 \end{array} \right] = \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 1 \\ -3 \end{array} \right] = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right].$$

Running another iteration from $\left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]$ gives

$$\mathbf{z}_{k+2} = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2 \times 0.5 - 1 \\ 2 \times (-0.5) + 1 \end{array} \right] = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right],$$

so the method converges after a single iteration.


Q7. Recall the pseudo code of random walk optimisation (Table 3 of the lecture slides, Dr H.K. Lam, KCL):

Algorithm: Random Walk Optimisation
input: f(x): R^n -> R; x_0: an initial solution
output: x*, a local minimum of the cost function f(x)
k <- 0; x_best <- x_0; set Threshold;
while not STOP-CRIT and k < k_max do
    Generate D_k and h_k;
    x_{k+1} <- x_k + D_k h_k;
    If f(x_{k+1}) > f(x_k)
        If rand < Threshold Then x_{k+1} <- x_k;
    ElseIf f(x_best) > f(x_{k+1}) Then x_best <- x_{k+1};
    End
    k <- k + 1;
end
return x_k and x_best;

Update rule: $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{D}_k \mathbf{h}_k$ where $\mathbf{x}_k = \left[ \begin{array}{c} x_k \\ y_k \end{array} \right]$, $\mathbf{D}_k = \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right]$ and $\mathbf{h}_k = \left[ \begin{array}{c} 0.5 \\ 0.5 \end{array} \right]$; for example, the first candidate is $(-0.5, -1.5)$ with $f(-0.5, -1.5) = 1.5$.


The completed table:

k | x_k^T       | f(x_k) | x_{k+1}^T   | f(x_{k+1}) | rand() | x_best      | f(x_best)
0 | [-1 -2]     | 4      | [-0.5 -1.5] | 1.5        | -      | [-0.5 -1.5] | 1.5
1 | [-0.5 -1.5] | 1.5    | [0 -1]      | 0          | -      | [0 -1]      | 0
2 | [0 -1]      | 0      | [0.5 -0.5]  | -0.5       | -      | [0.5 -0.5]  | -0.5
3 | [0.5 -0.5]  | -0.5   | [1 0]       | 0          | 0.8    | [0.5 -0.5]  | -0.5
4 | [1 0]       | 0      | [1.5 0.5]   | 1.5        | 0.7    | [0.5 -0.5]  | -0.5
5 | [1 0]       | 0      | [1.5 0.5]   | 1.5        | 0.2    | [0.5 -0.5]  | -0.5

(Rows 0-2: the candidate improves the cost, so it is accepted and rand() is not consulted. Row 3: the candidate is worse, but rand() = 0.8 >= 0.75, so the worse move is accepted. Rows 4-5: rand() < 0.75, so the worse candidate is rejected and x_k stays at [1 0]; x_best remains [0.5 -0.5].)
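The same run in MATLAB (a sketch; the fixed step [0.5; 0.5] and the hard-coded "random" sequence mirror the question, whereas a real random walk would draw both at random):

f = @(p) (p(1) - 1)*p(1) + (p(2) + 1)*p(2);
x = [-1; -2]; xbest = x; Threshold = 0.75;
randseq = [0.8 0.7 0.2 0.6 0.9 0.7 0.5 0.6]; r = 0;
for k = 0:5
    xnew = x + [0.5; 0.5];                 % D_k * h_k with D_k = I
    if f(xnew) > f(x)                      % candidate is worse
        r = mod(r, numel(randseq)) + 1;    % next number from the sequence
        if randseq(r) < Threshold
            xnew = x;                      % reject the worse move
        end
    elseif f(xbest) > f(xnew)
        xbest = xnew;                      % record the best-so-far
    end
    x = xnew;
end
x, xbest                                   % x = [1; 0], xbest = [0.5; -0.5]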

Q8. Consider a system taking two input variables $x_1$ and $x_2$ and generating an output $y$. A set of $M$ input-output data is collected in an experiment, i.e., the dataset is $(x_1(1), x_2(1), y_1), (x_1(2), x_2(2), y_2), \cdots, (x_1(M), x_2(M), y_M)$. Design a linear regressor in the form of $\hat{y} = w_1 x_1 + w_2 x_2 + b$ to best fit the data in terms of mean squared error using the gradient descent method, where $w_1$, $w_2$ and $b$ are the parameters to be determined.

a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as $h_k$. Derive the update rule for each parameter.
c. Use $h_k = 0.1$ and an initial guess of 1 for all variables. Considering the dataset $(1, -2, 3)$, $(2, 4, -1)$, $(3, 0, 5)$, obtain the best set of parameters for the linear regressor.
d. Plot "iteration k" against "MSE" for 200 iterations. Is the choice of $h_k = 0.1$ right or wrong?

Q8. a. Mean squared error: $\frac{1}{M} \sum_{i=1}^M (\hat{y}_i - y_i)^2$. Minimisation problem: $\min_{w_1, w_2, b} \frac{1}{M} \sum_{i=1}^M (\hat{y}_i - y_i)^2$.

b. $$f(w_1, w_2, b) = \frac{1}{M} \sum_{i=1}^M (\hat{y}_i - y_i)^2 = \frac{1}{M} \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big)^2$$

Update rule: $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$ where $\mathbf{z}_k = \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right]$ and

$$\triangledown f(\mathbf{z}_k) = \left[ \begin{array}{c} \frac{\partial f(\mathbf{z}_k)}{\partial w_1} \\ \frac{\partial f(\mathbf{z}_k)}{\partial w_2} \\ \frac{\partial f(\mathbf{z}_k)}{\partial b} \end{array} \right] = \left[ \begin{array}{c} \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_1(i) \\ \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_2(i) \\ \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) \end{array} \right]$$

[Figure: 3D scatter plot of the Q8 dataset, x_1 from 1 to 3, x_2 from -2 to 3, y from -1 to 5.]


$$\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \nabla f(\mathbf{z}_k)
\;\Rightarrow\;
\begin{bmatrix} w_{1,k+1} \\ w_{2,k+1} \\ b_{k+1} \end{bmatrix}
= \begin{bmatrix} w_{1k} \\ w_{2k} \\ b_k \end{bmatrix}
- h_k \begin{bmatrix} \frac{1}{M}\sum_{i=1}^{M} 2\big((w_1 x_1(i) + w_2 x_2(i) + b) - y_i\big)\,x_1(i) \\[4pt] \frac{1}{M}\sum_{i=1}^{M} 2\big((w_1 x_1(i) + w_2 x_2(i) + b) - y_i\big)\,x_2(i) \\[4pt] \frac{1}{M}\sum_{i=1}^{M} 2\big((w_1 x_1(i) + w_2 x_2(i) + b) - y_i\big) \end{bmatrix}
= \begin{bmatrix} w_{1k} \\ w_{2k} \\ b_k \end{bmatrix}
- \frac{2h_k}{M} \begin{bmatrix} \sum_{i=1}^{M}\big((w_1 x_1(i) + w_2 x_2(i) + b) - y_i\big)\,x_1(i) \\[4pt] \sum_{i=1}^{M}\big((w_1 x_1(i) + w_2 x_2(i) + b) - y_i\big)\,x_2(i) \\[4pt] \sum_{i=1}^{M}\big((w_1 x_1(i) + w_2 x_2(i) + b) - y_i\big) \end{bmatrix}$$

c. Step size: $h_k = 0.1$.

Dataset: (1, −2, 3), (2, 4, −1), (3, 0, 5), so M = 3.

Initial guess:

$$\mathbf{z}_0 = \begin{bmatrix} w_{10} \\ w_{20} \\ b_0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

First iteration: $k = 0$ with $\mathbf{z}_0 = [1\;\,1\;\,1]^T$:

$$\begin{bmatrix} w_{11} \\ w_{21} \\ b_1 \end{bmatrix}
= \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
- \frac{2\times 0.1}{3}
\begin{bmatrix}
\big((w_1 x_1(1)+w_2 x_2(1)+b)-y_1\big)x_1(1) + \big((w_1 x_1(2)+w_2 x_2(2)+b)-y_2\big)x_1(2) + \big((w_1 x_1(3)+w_2 x_2(3)+b)-y_3\big)x_1(3) \\[4pt]
\big((w_1 x_1(1)+w_2 x_2(1)+b)-y_1\big)x_2(1) + \big((w_1 x_1(2)+w_2 x_2(2)+b)-y_2\big)x_2(2) + \big((w_1 x_1(3)+w_2 x_2(3)+b)-y_3\big)x_2(3) \\[4pt]
\big((w_1 x_1(1)+w_2 x_2(1)+b)-y_1\big) + \big((w_1 x_1(2)+w_2 x_2(2)+b)-y_2\big) + \big((w_1 x_1(3)+w_2 x_2(3)+b)-y_3\big)
\end{bmatrix}$$

$$= \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
- \frac{2\times 0.1}{3}
\begin{bmatrix}
\big((1 w_1 - 2 w_2 + b) - 3\big)\times 1 + \big((2 w_1 + 4 w_2 + b) + 1\big)\times 2 + \big((3 w_1 + 0 w_2 + b) - 5\big)\times 3 \\[4pt]
\big((1 w_1 - 2 w_2 + b) - 3\big)\times (-2) + \big((2 w_1 + 4 w_2 + b) + 1\big)\times 4 + \big((3 w_1 + 0 w_2 + b) - 5\big)\times 0 \\[4pt]
\big((1 w_1 - 2 w_2 + b) - 3\big) + \big((2 w_1 + 4 w_2 + b) + 1\big) + \big((3 w_1 + 0 w_2 + b) - 5\big)
\end{bmatrix}
= \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
- \frac{2}{3}\times 0.1 \times \begin{bmatrix} 10 \\ 38 \\ 4 \end{bmatrix}
= \begin{bmatrix} 0.3333 \\ -1.5333 \\ 0.7333 \end{bmatrix}$$
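For clarity (a short check of ours, left implicit in the original layout), the three residuals at $\mathbf{z}_0 = [1\;\,1\;\,1]^T$ are

$$e_1 = (1 - 2 + 1) - 3 = -3, \quad e_2 = (2 + 4 + 1) + 1 = 8, \quad e_3 = (3 + 0 + 1) - 5 = -1,$$

so $\sum_i e_i x_1(i) = -3 + 16 - 3 = 10$, $\sum_i e_i x_2(i) = 6 + 32 + 0 = 38$ and $\sum_i e_i = 4$, which is exactly the vector $[10\;\,38\;\,4]^T$ above.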

Second iteration: $k = 1$ with $\mathbf{z}_1 = [0.3333\;\, -1.5333\;\, 0.7333]^T$:

$$\mathbf{z}_2 = \begin{bmatrix} 1.4089 \\ -0.3867 \\ 1.1244 \end{bmatrix}$$

Third iteration: $k = 2$ with $\mathbf{z}_2 = [1.4089\;\, -0.3867\;\, 1.1244]^T$:

$$\mathbf{z}_3 = \begin{bmatrix} 0.8655 \\ -1.2513 \\ 0.8542 \end{bmatrix}$$

Fourth iteration: $k = 3$ with $\mathbf{z}_3 = [0.8655\;\, -1.2513\;\, 0.8542]^T$:


$$\mathbf{z}_4 = \begin{bmatrix} 1.2832 \\ -0.7097 \\ 0.9707 \end{bmatrix}$$

Fifth iteration: $k = 4$ with $\mathbf{z}_4 = [1.2832\;\, -0.7097\;\, 0.9707]^T$:

$$\mathbf{z}_5 = \begin{bmatrix} 1.0478 \\ -1.0728 \\ 0.8246 \end{bmatrix}$$

As $k$ increases, $\mathbf{z}$ converges to $[2.0000\;\, -1.0000\;\, -1.0000]^T$.

d. It can be seen from the figure that the MSE is monotonically decreasing and converges to almost zero, indicating that the choice of $h_k = 0.1$ is appropriate.
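For completeness, a minimal MATLAB sketch of parts c and d (the variable names are ours, not from the tutorial):

% Gradient descent for the linear regressor (Q8 c and d)
X = [1 -2; 2 4; 3 0];             % rows are (x1(i), x2(i))
y = [3; -1; 5];                   % targets y(i)
M = size(X, 1);                   % M = 3
h = 0.1;                          % step size h_k
z = [1; 1; 1];                    % initial guess [w1; w2; b]
mse = zeros(200, 1);
for k = 1:200
    e = X*z(1:2) + z(3) - y;      % residuals (w1*x1(i) + w2*x2(i) + b) - y(i)
    mse(k) = mean(e.^2);
    grad = (2/M)*[X'*e; sum(e)];  % gradient of the MSE w.r.t. [w1; w2; b]
    z = z - h*grad;               % update rule z_{k+1} = z_k - h_k * grad
end
plot(1:200, mse); xlabel('iteration k'); ylabel('MSE');
disp(z')                          % approaches [2 -1 -1]

With $h_k = 0.1$ the printed parameters approach $[2\;\, -1\;\, -1]^T$ and the MSE curve decays monotonically towards zero, which is what part d asks you to observe.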


Q9. Find the least-squares solution of $A\mathbf{x} = B$ where:

$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ -1 & 0 & 1 \\ 0 & -1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 \\ 1 \\ 3 \\ 4 \end{bmatrix}.$$

Q10. Consider the three data points (x, y) given as follows:

(0, 6), (1, 0), (2, 0),

which are the data obtained from a single line but have errors due to measurement.
Find a straight line that best fits these points in the sense of least-squares, and
explain the quantity that is being minimized.

[Figure: the three data points (0,6), (1,0) and (2,0) plotted on the x–y plane.]


Q9. Represent $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ as the columns of $A$. It is found that they form an orthogonal set, i.e., $\mathbf{a}_i \cdot \mathbf{a}_j = 0$ for all $i \neq j$.

The solution can thus be calculated as

$$\mathbf{x} = \begin{bmatrix} \dfrac{B\cdot\mathbf{a}_1}{\mathbf{a}_1\cdot\mathbf{a}_1} \\[6pt] \dfrac{B\cdot\mathbf{a}_2}{\mathbf{a}_2\cdot\mathbf{a}_2} \\[6pt] \dfrac{B\cdot\mathbf{a}_3}{\mathbf{a}_3\cdot\mathbf{a}_3} \end{bmatrix} = \begin{bmatrix} -\dfrac{3}{2} \\[6pt] -\dfrac{3}{2} \\[6pt] 2 \end{bmatrix}.$$

Remark: The solution is the same as $\mathbf{x} = (A^TA)^{-1}A^TB$.
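As a quick numerical check (a sketch of ours, not part of the original solution), both formulas can be evaluated in MATLAB:

% Q9: least-squares solution two ways
A = [1 0 1; 0 1 1; -1 0 1; 0 -1 1];
B = [0; 1; 3; 4];
x_proj = (A'*B)./sum(A.^2, 1)';  % B.a_i / (a_i.a_i) per column, valid because the columns are orthogonal
x_ls   = (A'*A)\(A'*B);          % normal equations; both give [-1.5; -1.5; 2]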

Q10. As given in the question, the three data points, in the form (x, y), come from a single line. However, due to measurement errors, the measured points (0, 6), (1, 0) and (2, 0) need not lie on one line; indeed, when you plot the data, you can see that they do not lie on a straight line.

The general form of a straight line is

y = mx + c,

where m and c are constants to be determined.

We are going to approximate the three data points using a straight line, so this straight line is expected to produce the following values of y:

6 = m · 0 + c  (from the data point (0, 6))
0 = m · 1 + c  (from the data point (1, 0))
0 = m · 2 + c  (from the data point (2, 0))


In the equations given above, m and c are the unknowns. Putting these equations into matrix form, we have

$$A\mathbf{x} = B$$

where

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} m \\ c \end{bmatrix}, \quad B = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix}.$$

The above is formulated as a least-squares problem, i.e.,

$$\min_{\mathbf{x}} \|A\mathbf{x} - B\|_2^2.$$

The least-squares solution is found as

$$\mathbf{x} = \begin{bmatrix} m \\ c \end{bmatrix} = (A^TA)^{-1}A^TB = \begin{bmatrix} -3 \\ 5 \end{bmatrix}.$$

Therefore the best-fit line is

$$y = mx + c = -3x + 5.$$

[Figure: the data points (0,6), (1,0), (2,0) together with the best-fit line y = −3x + 5.]

What exactly is the line y = f(x) = −3x + 5 minimizing?

The least-squares solution $\mathbf{x}$ minimizes $\|A\mathbf{x} - B\|_2^2$, which is the sum of the squares of the entries of the vector $A\mathbf{x} - B$. More specifically,

$$A\mathbf{x} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 0\times(-3)+5 \\ 1\times(-3)+5 \\ 2\times(-3)+5 \end{bmatrix} = \begin{bmatrix} f(0) \\ f(1) \\ f(2) \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \\ -1 \end{bmatrix}.$$


The entries of $A\mathbf{x}$ are the y-coordinates of the line y = −3x + 5 evaluated at the x-values of our data points, while B is the vector whose entries are the y-coordinates of those data points. $A\hat{\mathbf{x}} - B$ collects the vertical distances of the line from the data points. y = f(x) = −3x + 5 is the line that minimizes the sum of the squares of these vertical distances, i.e., it minimises the sum of the squared differences between the approximate y value given by the straight line and the desired value of y given in B.

[Figure: the line y = −3x + 5 with the vertical distances −1, 2, −1 to the data points (0,6), (1,0), (2,0) marked.]
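The whole Q10 computation fits in a few MATLAB lines (a sketch of ours):

% Q10: best-fit line through (0,6), (1,0), (2,0)
A = [0 1; 1 1; 2 1];
B = [6; 0; 0];
x = (A'*A)\(A'*B);   % returns [m; c] = [-3; 5]
r = A*x - B;         % vertical distances [-1; 2; -1]
sum(r.^2)            % the minimised quantity, equal to 6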


Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of least-squares.

(−1, 1/2), (1, −1), (2, −1/2), (3, 2).

What quantity is being minimized?

[Figure: the four data points plotted on the x–y plane.]

Q12. According to the following data, find the best approximated linear function f(x, y) in the sense of least-squares and identify the quantity being minimized.

 x    y    f(x, y)
 1    0    0
 0    1    1
-1    0    3
 0   -1    4

Q13. Here are several data points $\begin{bmatrix} x \\ y \end{bmatrix}$ given as follows:

$$\begin{bmatrix} -4 \\ -1 \end{bmatrix},
\begin{bmatrix} -3 \\ 0 \end{bmatrix},
\begin{bmatrix} -2 \\ -1.5 \end{bmatrix},
\begin{bmatrix} -1 \\ 0.5 \end{bmatrix},
\begin{bmatrix} 0 \\ 1 \end{bmatrix},
\begin{bmatrix} 1 \\ -1 \end{bmatrix},
\begin{bmatrix} 2 \\ -0.5 \end{bmatrix},
\begin{bmatrix} 3 \\ 2 \end{bmatrix},
\begin{bmatrix} 4 \\ -1 \end{bmatrix}.$$

Find a function in the form

y = B + C cos(x) + D sin(x) + E cos(2x) + F sin(2x) + G cos(3x) + H sin(3x)

that best approximates the data points in the sense of least-squares.
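No worked solution for this question is reproduced here, but the set-up follows the same recipe as the other least-squares questions: stack one row per data point and solve the normal equations. A MATLAB sketch of ours:

% Q13: trigonometric least-squares fit (set-up only)
x = (-4:4)';                                   % the nine x-values
y = [-1; 0; -1.5; 0.5; 1; -1; -0.5; 2; -1];    % the nine y-values
A = [ones(size(x)) cos(x) sin(x) cos(2*x) sin(2*x) cos(3*x) sin(3*x)];
coef = (A'*A)\(A'*y);                          % [B; C; D; E; F; G; H]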

Q14. There are three categories of machines I, II and III installed in a factory. Machines have different requirements on operating time. Machines I and II can be operated for at most 12 hours whereas machine III needs to be operated for at least 5 hours per day. This factory produces two items M and N. Both require the use of all three machines. The operating times for producing 1 unit of each item on the three machines are presented in Table 1 as follows:

Items   Machine I   Machine II   Machine III
M       1           2            1
N       2           1            1.25

Table 1: Operating time of producing one unit of each item on each machine.

The profits of items M and N are £600 and £400 per unit, respectively. What would be the most reasonable strategy to produce the items M and N so as to maximise the profit of the factory? What would be the maximum profit in this case?


Q11. The general equation for a parabola is

$$y = Bx^2 + Cx + D$$

where B, C and D are constants to be determined and the data point is in the form of (x, y).

Since the four data points, (−1, 1/2), (1, −1), (2, −1/2), (3, 2), are approximated by this parabola equation, it is expected that the value of y from the parabola equation should produce:

Data point (−1, 1/2):   1/2 = B × (−1)² + C × (−1) + D
Data point (1, −1):     −1 = B × (1)² + C × (1) + D
Data point (2, −1/2):   −1/2 = B × (2)² + C × (2) + D
Data point (3, 2):      2 = B × (3)² + C × (3) + D.

Rewriting the above equations in matrix form, we have

$$A\mathbf{x} = B$$

where

$$A = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & 1 \\ 4 & 2 & 1 \\ 9 & 3 & 1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} B \\ C \\ D \end{bmatrix}, \quad B = \begin{bmatrix} 1/2 \\ -1 \\ -1/2 \\ 2 \end{bmatrix}.$$

The above is formulated as a least-squares problem, i.e.,

$$\min_{\mathbf{x}} \|A\mathbf{x} - B\|_2^2.$$

The least-squares solution is found as

$$\mathbf{x} = \begin{bmatrix} B \\ C \\ D \end{bmatrix} = (A^TA)^{-1}A^TB = \begin{bmatrix} 53/88 \\ -379/440 \\ -41/44 \end{bmatrix}.$$

The best-fit parabola is

$$y = Bx^2 + Cx + D = \frac{53}{88}x^2 - \frac{379}{440}x - \frac{41}{44},$$

which is equivalent to

$$88y = 53x^2 - \frac{379}{5}x - 82.$$

[Figure: the four data points and the best-fit parabola 88y = 53x² − (379/5)x − 82.]

What exactly is the parabola y = f(x) minimizing?


The least-squares solution $\mathbf{x}$ minimizes $\|A\mathbf{x} - B\|_2^2$. In this case, $A\mathbf{x}$ is

$$A\mathbf{x} = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & 1 \\ 4 & 2 & 1 \\ 9 & 3 & 1 \end{bmatrix}\begin{bmatrix} 53/88 \\ -379/440 \\ -41/44 \end{bmatrix}
= \begin{bmatrix} \frac{53}{88}(-1)^2 - \frac{379}{440}(-1) - \frac{41}{44} \\[4pt] \frac{53}{88}(1)^2 - \frac{379}{440}(1) - \frac{41}{44} \\[4pt] \frac{53}{88}(2)^2 - \frac{379}{440}(2) - \frac{41}{44} \\[4pt] \frac{53}{88}(3)^2 - \frac{379}{440}(3) - \frac{41}{44} \end{bmatrix}
= \begin{bmatrix} f(-1) \\ f(1) \\ f(2) \\ f(3) \end{bmatrix}
= \begin{bmatrix} 117/220 \\ -131/110 \\ -27/110 \\ 419/220 \end{bmatrix}.$$

The entries of $A\mathbf{x}$ are the y-coordinates of the parabola $88y = 53x^2 - \frac{379}{5}x - 82$ evaluated at the x-values of our data points, while B is the vector whose entries are the y-coordinates of those data points. $A\mathbf{x} - B$ collects the vertical distances of the graph from the data points. The parabola $88y = 53x^2 - \frac{379}{5}x - 82$ minimizes the sum of the squares of these vertical distances, i.e., it minimises the sum of the squared differences between the approximate y value and the desired value of y given in B.

[Figure: the parabola and the four data points, with the vertical distances between them marked.]
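Again the coefficients are easy to confirm numerically (a sketch of ours):

% Q11: best-fit parabola
A = [1 -1 1; 1 1 1; 4 2 1; 9 3 1];
B = [1/2; -1; -1/2; 2];
x = (A'*A)\(A'*B)   % approx [0.6023; -0.8614; -0.9318] = [53/88; -379/440; -41/44]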

Q12. The general equation for a linear function with two variables is

$$f(x, y) = Bx + Cy + D,$$

where B, C and D are constants to be determined.

If we feed the provided data into the general equation given above:

Data point (1, 0):    B × 1 + C × 0 + D = 0
Data point (0, 1):    B × 0 + C × 1 + D = 1
Data point (−1, 0):   B × (−1) + C × 0 + D = 3
Data point (0, −1):   B × 0 + C × (−1) + D = 4.

Rewriting the above equations in matrix form, we have

$$A\mathbf{x} = B$$

where

$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ -1 & 0 & 1 \\ 0 & -1 & 1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} B \\ C \\ D \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \\ 3 \\ 4 \end{bmatrix}.$$


The above is formulated as a least-squares problem, i.e.,

$$\min_{\mathbf{x}} \|A\mathbf{x} - B\|_2^2.$$

It is noticed that the columns $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ of matrix A are orthogonal. Therefore, the solution can be calculated as

$$\mathbf{x} = \begin{bmatrix} \dfrac{B\cdot\mathbf{a}_1}{\mathbf{a}_1\cdot\mathbf{a}_1} \\[6pt] \dfrac{B\cdot\mathbf{a}_2}{\mathbf{a}_2\cdot\mathbf{a}_2} \\[6pt] \dfrac{B\cdot\mathbf{a}_3}{\mathbf{a}_3\cdot\mathbf{a}_3} \end{bmatrix} = \begin{bmatrix} -\dfrac{3}{2} \\[6pt] -\dfrac{3}{2} \\[6pt] 2 \end{bmatrix}.$$

Remark: The solution is the same as $\mathbf{x} = (A^TA)^{-1}A^TB$.

So the linear equation that approximates the data in the sense of least-squares is

$$f(x, y) = -\frac{3}{2}x - \frac{3}{2}y + 2.$$

Let's move one step further to investigate the quantity that is being minimized. Function f(x, y) forms the graph of a surface as shown below. The vector $A\mathbf{x}$ is:

$$A\mathbf{x} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ -1 & 0 & 1 \\ 0 & -1 & 1 \end{bmatrix}\begin{bmatrix} -3/2 \\ -3/2 \\ 2 \end{bmatrix}
= \begin{bmatrix} -\frac{3}{2}\times 1 - \frac{3}{2}\times 0 + 2 \\[4pt] -\frac{3}{2}\times 0 - \frac{3}{2}\times 1 + 2 \\[4pt] -\frac{3}{2}\times(-1) - \frac{3}{2}\times 0 + 2 \\[4pt] -\frac{3}{2}\times 0 - \frac{3}{2}\times(-1) + 2 \end{bmatrix}
= \begin{bmatrix} f(1, 0) \\ f(0, 1) \\ f(-1, 0) \\ f(0, -1) \end{bmatrix}
= \begin{bmatrix} 1/2 \\ 1/2 \\ 7/2 \\ 7/2 \end{bmatrix}.$$

The entries of this vector are the values of f(x, y) at the data points given in the table. The vector B is the vector whose entries are the desired values of f(x, y) at those points. The difference $A\mathbf{x} - B$ gives the vertical distances between the graph of the surface and the data points. The linear function obtained via the least-squares method minimizes the sum of the squares of these vertical distances, i.e., it minimises the sum of the squared differences between the predicted f(x, y) values and the desired values of f(x, y) given in B.
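Concretely (a quick check of ours), the minimised quantity for this fit is

$$\|A\mathbf{x} - B\|_2^2 = \left(\tfrac{1}{2} - 0\right)^2 + \left(\tfrac{1}{2} - 1\right)^2 + \left(\tfrac{7}{2} - 3\right)^2 + \left(\tfrac{7}{2} - 4\right)^2 = 4 \times \tfrac{1}{4} = 1.$$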

Q14. Denote the decision variables x and y as the numbers of items M and N, respectively, and let Z represent the total profit on the production.

The profit is defined as Z = 600x + 400y and the maximisation of the profit is defined as the following constrained maximisation problem:

$$\max_{x, y} Z = 600x + 400y$$

subject to

x + 2y ≤ 12       (constraint on Machine I)
2x + y ≤ 12       (constraint on Machine II)
x + (5/4)y ≥ 5    (constraint on Machine III)
x ≥ 0             (quantity of item M is non-negative)
y ≥ 0             (quantity of item N is non-negative)

As the cost and all constraints are linear, this is a linear programming problem which can be solved using the linear programming technique. Draw the graph of the constraints as follows:


x – axis
-4 -3 -2 -1 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4

y – axis

1 4

1 3

1 2

1 1

1 0

9

8

7

6

5

4

3

2

1

-1

-2

-3

-4

www.softschools.com

x – axis
-4-3-2-11234567891011121314
y – axis
14
13
12
11
10
9
8
7
6
5
4
3
2
1
-1
-2
-3
-4
www.softschools.com

The feasible region is the shaded area and A, B, C, D, E are corner-point feasible solutions (CPF solutions). Evaluate Z = 600x + 400y at these corner points.

Corner point (x, y)   Z = 600x + 400y
A: (5, 0)             3000
B: (6, 0)             3600
C: (4, 4)             4000 (maximum)
D: (0, 6)             2400
E: (0, 4)             1600

Therefore, producing 4 units of each item (x = 4 and y = 4) gives the maximum profit of £4000.
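The same answer can be obtained with MATLAB's linprog (Optimization Toolbox); a sketch of ours, where the ≥ constraint on Machine III is rewritten as ≤ by negating both sides:

% Q14: maximise 600x + 400y, i.e. minimise -600x - 400y
f = [-600; -400];
Aineq = [1 2; 2 1; -1 -1.25];    % Machine I, Machine II, and -(x + 1.25y) <= -5
bineq = [12; 12; -5];
lb = [0; 0];
xy = linprog(f, Aineq, bineq, [], [], lb)   % returns [4; 4]; profit 600*4 + 400*4 = 4000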


Q15. There are two factories located at places P and Q. The commodity produced by these two factories will be delivered to three depots situated at A, B and C. Each week, the depots at A, B and C require 5, 5 and 4 units of the commodity, while the production capacities of the factories at P and Q are 8 and 6 units, respectively. The commodities produced at both factories will all be transported. The cost of transportation varies from factory to depot; the costs of transportation per unit commodity are given in Table 2.

From/To   Cost of A   Cost of B   Cost of C
P         160         100         150
Q         100         120         100

Table 2: Cost of transportation per unit commodity.

Find the units of commodity from factories to depots so that the cost of transportation is minimum. What will be the minimum cost of transportation?

Q16. A factory wishes to produce a new type of food which has two ingredients, denoted as Ingredient I and Ingredient II. This new type of food needs to contain vitamin A and vitamin C with minimum values 8 and 10, respectively. Ingredient I contains 2 units/kg of vitamin A and 1 unit/kg of vitamin C. Ingredient II contains 1 unit/kg of vitamin A and 2 units/kg of vitamin C. The prices of Ingredient I and Ingredient II are £50/kg and £70/kg, respectively. Decide the formula (the weight of each ingredient) of this new type of food so that the cost is at a minimum.

Q17. Explain how "Recursive least-squares" works.


Q15. Denote x and y as the units of commodity transported from the factory at P to the depots at A and B, respectively.

As all produced commodities from both factories will be transported and the factory at P can produce 8 units, (8 − x − y) units will be transported from P to C.

As the number of units to be transported must be non-negative, we have the following constraints: x ≥ 0, y ≥ 0 and 8 − x − y ≥ 0.

Depot A requires 5 units of commodity weekly and x units have been provided by factory P. Thus, the remaining (5 − x) units need to be transported to depot A from the factory at Q.


Depot B requires 5 units of commodity weekly and y units have been provided by factory P. Thus, the remaining (5 − y) units need to be transported to depot B from the factory at Q.

(5 − x) units and (5 − y) units are transported from the factory at Q to depots A and B, respectively. Depot C requires 4 units of commodity weekly and the factory at Q can produce 6 units. Thus, the remaining 6 − (5 − x) − (5 − y) = x + y − 4 units need to be transported to depot C from the factory at Q.

As the number of units to be transported must be non-negative, we have another three constraints: 5 − x ≥ 0, 5 − y ≥ 0 and x + y − 4 ≥ 0.

The total transportation cost Z in terms of x and y is defined as

Z = 160x + 100y + 100(5 − x) + 120(5 − y) + 150(8 − x − y) + 100(x + y − 4)
  = 10(x − 7y + 190).
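Expanding the brackets makes the simplification explicit (an intermediate step the solution skips):

$$Z = (160 - 100 - 150 + 100)x + (100 - 120 - 150 + 100)y + (500 + 600 + 1200 - 400) = 10x - 70y + 1900 = 10(x - 7y + 190).$$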

The problem of minimising the cost of transport subject to constraints can be described as follows:

$$\min_{x, y} Z = 10(x - 7y + 190)$$

subject to

x ≥ 0        (units to be transported to depot A from factory P are non-negative)
y ≥ 0        (units to be transported to depot B from factory P are non-negative)
x + y ≤ 8    ((8 − x − y) ≥ 0: units to be transported to depot C from factory P)
x ≤ 5        ((5 − x) ≥ 0: units to be transported to depot A from factory Q)
y ≤ 5        ((5 − y) ≥ 0: units to be transported to depot B from factory Q)
x + y ≥ 4    (x + y − 4 ≥ 0: units to be transported to depot C from factory Q)

As the cost and all constraints are linear, this is a linear programming problem which can be solved using the linear programming technique. Draw the graph of the constraints as follows:


x – axis
-4 -3 -2 -1 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4

y – axis

1 4

1 3

1 2

1 1

1 0

9

8

7

6

5

4

3

2

1

-1

-2

-3

-4

www.softschools.com

x – axis
-4-3-2-11234567891011121314
y – axis
14
13
12
11
10
9
8
7
6
5
4
3
2
1
-1
-2
-3
-4
www.softschools.com


The shaded area represents the feasible region and A(0, 4), B(0, 5), C(3, 5), D(5, 3), E(5, 0) and F(4, 0) are corner-point feasible solutions (CPF solutions). The evaluation of Z at the corner points is shown below. It can be noticed that the point (0, 5) provides the minimum value 1550 of Z.

Corner point (x, y)   Z = 10(x − 7y + 190)
A: (0, 4)             1620
B: (0, 5)             1550 (minimum)
C: (3, 5)             1580
D: (5, 3)             1740
E: (5, 0)             1950
F: (4, 0)             1940

Therefore, delivering 0, 5 and 3 units from the factory at P and 5, 0 and 1 units from the factory at Q to the depots at A, B and C, respectively, minimizes the cost of transportation. The minimum cost of transportation is 1550.

Q16. Denote the decision variables x and y as the weights of Ingredient I and Ingredient II, respectively, in the new type of food.

As the weights are non-negative, we have the constraints x ≥ 0 and y ≥ 0.
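The solution is cut off at this point; for completeness, a sketch of the full linear programme read off from the question data (our own completion, not from the original):

$$\min_{x, y} Z = 50x + 70y \quad\text{subject to}\quad 2x + y \ge 8 \;(\text{vitamin A}), \quad x + 2y \ge 10 \;(\text{vitamin C}), \quad x \ge 0, \; y \ge 0.$$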





Considering the requirements for the vitamins contained in the new type of food, we have:

2x + y ≥ 8     (the total amount of vitamin A from both ingredients is 8 units or above)
x + 2y ≥ 10    (the total amount of vitamin C from both ingredients is 10 units or above)

The cost of the new type of food is the total cost of the two ingredients, where the price is £50/kg for Ingredient I and £70/kg for Ingredient II. The cost is thus defined as Z = 50x + 70y.

min_{x,y}  Z = 50x + 70y

subject to

2x + y ≥ 8     (the total amount of vitamin A from both ingredients is 8 units or above)
x + 2y ≥ 10    (the total amount of vitamin C from both ingredients is 10 units or above)
x ≥ 0          (the weight of Ingredient I is non-negative)
y ≥ 0          (the weight of Ingredient II is non-negative)

As the cost and all constraints are linear, this is again a linear programming problem which can be solved using a linear programming technique. Draw the graph of the constraints as follows:

[Figure: the constraint lines 2x + y = 8 and x + 2y = 10 plotted in the first quadrant; the feasible region lying above both lines is unbounded, with corner points A(0, 8), B(2, 4) and C(10, 0).]


As shown in the figure, the feasible region (the shaded area in deep colour) is unbounded. The corner-point feasible (CPF) solutions are A(0, 8), B(2, 4) and C(10, 0), and the evaluation of Z at each is shown in the table below.

Corner point (x, y)    Z = 50x + 70y
A: (0, 8)              560
B: (2, 4)              380 (minimum)
C: (10, 0)             500

The minimum value 380 is attained at B(2, 4). However, since the feasible region is unbounded, a further step is needed to make sure that no other point of the feasible region gives a value of Z smaller than 380. Hence, we draw the region satisfying the inequality

50x + 70y < 380,

which is equivalent to

5x + 7y < 38.

The region satisfying both this inequality and the constraints x ≥ 0 and y ≥ 0 is marked in light colour. We can observe that it does not overlap the feasible region, so no feasible point achieves a cost below 380. Therefore the point (2, 4) gives the minimum value of Z, 380: the best formula for the new type of food is 2 kg of Ingredient I and 4 kg of Ingredient II, and the minimum cost is £380.
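The same kind of numerical check as in Q14 applies here; a minimal MATLAB sketch follows (the linprog call again assumes the Optimization Toolbox, and handles the unbounded feasible region directly).

% Evaluate Z at the CPF solutions of Q15.
corners = [0 8; 2 4; 10 0];                 % A, B, C
Z = 50*corners(:,1) + 70*corners(:,2);
[Zmin, idx] = min(Z);
fprintf('min over corners: Z = %d at (%d, %d)\n', Zmin, corners(idx,1), corners(idx,2));

% Cross-check with linprog, writing the >= constraints in <= form.
f  = [50; 70];
A  = [-2 -1; -1 -2];                        % 2x+y>=8, x+2y>=10
b  = [-8; -10];
lb = [0; 0];
[xy, fval] = linprog(f, A, b, [], [], lb, []);
fprintf('linprog: (x, y) = (%.0f, %.0f), Z = %.0f\n', xy(1), xy(2), fval);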

Q16. A least-squares problem is described as

$\min_{\mathbf{x}} f(\mathbf{x}) = \| \mathbf{A}\mathbf{x} - \mathbf{B} \|_2^2.$

Its solution is given as

$\mathbf{x} = \left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{B}.$

Denote the $i$-th row of $\mathbf{A}$ as $\mathbf{a}_i^T$ and the $i$-th entry of $\mathbf{B}$ as $b_i$. The solution $\mathbf{x}$ can be represented in row form as

$\mathbf{x} = \Big(\sum_{i=1}^{m} \mathbf{a}_i\mathbf{a}_i^T\Big)^{-1} \sum_{i=1}^{m} b_i\mathbf{a}_i.$

For example,

$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1^T \\ \mathbf{a}_2^T \\ \vdots \\ \mathbf{a}_m^T \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix},$

and therefore

$\mathbf{A}^T\mathbf{A} = \sum_{i=1}^{m} \mathbf{a}_i\mathbf{a}_i^T = \mathbf{a}_1\mathbf{a}_1^T + \mathbf{a}_2\mathbf{a}_2^T + \cdots + \mathbf{a}_m\mathbf{a}_m^T.$

Suppose a new sample $\mathbf{a}_{m+1}^T = [\, a_{m+1,1} \;\; a_{m+1,2} \;\; \cdots \;\; a_{m+1,n} \,]$ with output $b_{m+1}$ arrives. The new solution can be written as

$\mathbf{x}_{new} = \Big(\sum_{i=1}^{m} \mathbf{a}_i\mathbf{a}_i^T + \mathbf{a}_{m+1}\mathbf{a}_{m+1}^T\Big)^{-1} \Big(\sum_{i=1}^{m} b_i\mathbf{a}_i + b_{m+1}\mathbf{a}_{m+1}\Big).$

At the time that you have $m$ samples, recall that the solution is

$\mathbf{x} = \mathbf{P}(m)^{-1}\mathbf{q}(m), \quad \text{where} \quad \mathbf{P}(m) = \mathbf{A}^T\mathbf{A} = \sum_{i=1}^{m} \mathbf{a}_i\mathbf{a}_i^T \quad \text{and} \quad \mathbf{q}(m) = \mathbf{A}^T\mathbf{B} = \sum_{i=1}^{m} b_i\mathbf{a}_i.$

With the new sample $\mathbf{a}_{m+1}^T$ and $b_{m+1}$, the solution becomes

$\mathbf{x}_{new} = \mathbf{P}(m+1)^{-1}\mathbf{q}(m+1),$

where

$\mathbf{P}(m+1) = \sum_{i=1}^{m} \mathbf{a}_i\mathbf{a}_i^T + \mathbf{a}_{m+1}\mathbf{a}_{m+1}^T = \mathbf{P}(m) + \mathbf{a}_{m+1}\mathbf{a}_{m+1}^T$

and

$\mathbf{q}(m+1) = \sum_{i=1}^{m} b_i\mathbf{a}_i + b_{m+1}\mathbf{a}_{m+1} = \mathbf{q}(m) + b_{m+1}\mathbf{a}_{m+1}.$

Rank-one update formula:

$\left(\mathbf{P} + \mathbf{a}\mathbf{a}^T\right)^{-1} = \mathbf{P}^{-1} - \frac{1}{1 + \mathbf{a}^T\mathbf{P}^{-1}\mathbf{a}} \left(\mathbf{P}^{-1}\mathbf{a}\right)\left(\mathbf{P}^{-1}\mathbf{a}\right)^T$

Remark: the rank-one update formula is valid when $\mathbf{P} = \mathbf{P}^T$, and $\mathbf{P}$ and $\mathbf{P} + \mathbf{a}\mathbf{a}^T$ are both invertible.

Applying it with $\mathbf{P} = \mathbf{P}(m)$ and $\mathbf{a} = \mathbf{a}_{m+1}$ gives

$\mathbf{P}(m+1)^{-1} = \left(\mathbf{P}(m) + \mathbf{a}_{m+1}\mathbf{a}_{m+1}^T\right)^{-1}$

directly from $\mathbf{P}(m)^{-1}$, so each new sample can be absorbed without recomputing $(\mathbf{A}^T\mathbf{A})^{-1}$ from scratch.
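To make the update concrete, here is a minimal MATLAB sketch of one recursive least-squares step based on the rank-one update formula above; rls_update is an illustrative helper name, not a built-in function.

function [x, Pinv, q] = rls_update(Pinv, q, a, b)
% One recursive least-squares step.
% Pinv = P(m)^{-1}, q = q(m); a (column vector) and b are the new sample a_{m+1}, b_{m+1}.
Pa   = Pinv * a;                          % P^{-1} a
Pinv = Pinv - (Pa * Pa') / (1 + a' * Pa); % rank-one update of the inverse
q    = q + b * a;                         % q(m+1) = q(m) + b_{m+1} a_{m+1}
x    = Pinv * q;                          % updated least-squares solution
end

Each call costs O(n^2) operations, instead of the O(n^3) required to recompute the full inverse for every new sample.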

Q14. There are two factories, located at P and Q. The commodity produced by these two factories is delivered to three depots situated at A, B and C. Each week, the depots at A, B and C require 5, 5 and 4 units of the commodity, while the production capacities of the factories at P and Q are 8 and 6 units, respectively. All commodities produced at the two factories are transported. The cost of transportation varies from factory to depot; the cost per unit of commodity is given in Table 2.

From/To   Cost of A   Cost of B   Cost of C
P         160         100         150
Q         100         120         100

Table 2: Cost of transportation per unit commodity.

Find the number of units of commodity to send from each factory to each depot so that the cost of transportation is minimised. What is the minimum cost of transportation?

Q15. A factory wishes to produce a new type of food from two ingredients, denoted Ingredient I and Ingredient II. The new food needs to contain at least 8 units of vitamin A and at least 10 units of vitamin C. Ingredient I contains 2 units/kg of vitamin A and 1 unit/kg of vitamin C. Ingredient II contains 1 unit/kg of vitamin A and 2 units/kg of vitamin C. The prices of Ingredient I and Ingredient II are £50/kg and £70/kg, respectively. Decide the formula (the weight of each ingredient) of the new type of food so that its cost is minimised.

Q16. Explain how “Recursive least-squares” works.

Q17. Consider a least-squares problem $\mathbf{A}\mathbf{x} = \mathbf{B}$, where

$\mathbf{A} = \begin{bmatrix} 5 & 6 \\ 5 & 8 \\ 4 & 5 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 5 \\ 2 \\ 1 \end{bmatrix}.$

Given that five new samples arrive in the following order,

$\mathbf{a}_1^T = [3 \;\; 4]$, $b_1 = 7$;
$\mathbf{a}_2^T = [10 \;\; 10]$, $b_2 = 5$;
$\mathbf{a}_3^T = [3 \;\; 8]$, $b_3 = 8$;
$\mathbf{a}_4^T = [8 \;\; 8]$, $b_4 = 2$;
$\mathbf{a}_5^T = [7 \;\; 5]$, $b_5 = 3$;

calculate the initial solution, denoted $\mathbf{x}_0$, for $\mathbf{A}\mathbf{x} = \mathbf{B}$, and update the solution using the recursive least-squares technique with the given new samples. Show the working and the solution after adding the samples one by one in the order shown above.
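The working Q17 asks for can be generated with the rls_update helper sketched under Q16; the script below is an illustrative sketch, not part of the official solutions.

% Initial batch solution x0 from the original three samples.
A = [5 6; 5 8; 4 5];
B = [5; 2; 1];
Pinv = inv(A' * A);          % P(3)^{-1}
q    = A' * B;               % q(3)
x0   = Pinv * q;
fprintf('x0 = [%.4f; %.4f]\n', x0(1), x0(2));

% Absorb the five new samples one by one via the rank-one update.
newA = [3 4; 10 10; 3 8; 8 8; 7 5];
newb = [7; 5; 8; 2; 3];
for i = 1:5
    [x, Pinv, q] = rls_update(Pinv, q, newA(i,:)', newb(i));
    fprintf('after sample %d: x = [%.4f; %.4f]\n', i, x(1), x(2));
end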
