
Biologically Inspired Methods

Nature-Inspired Learning Algorithms (7CCSMBIM)
Tutorial 2: Solutions

Q1. What are the advantages and disadvantages of the gradient descent method?

Advantages:
Simple to implement.
Robust, with quick convergence in practice.
Tractable: each iteration needs only one gradient evaluation.

Disadvantages:
Requires the gradient, so the cost function must be differentiable.
Does not guarantee the global minimum; it can become trapped in a local minimum.
Does not work well with discrete variables.
Sensitive to the initial guess.

(An illustrated example of gradient descent applied to linear regression: https://www.cs.toronto.edu/~frossard/post/linear_regression/)

Q2. Show how the gradient descent method works using pseudo code.
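A minimal MATLAB sketch of the loop (the function handles f and gradf, the fixed step size h, the tolerance tol and the iteration cap kmax are illustrative choices, not prescribed by the tutorial):

function [z, fval] = gradient_descent(f, gradf, z0, h, tol, kmax)
    % Minimise f starting from z0 using fixed-step gradient descent.
    z = z0;
    for k = 1:kmax
        g = gradf(z);          % gradient at the current point
        if norm(g) < tol       % stop once the gradient (almost) vanishes
            break;
        end
        z = z - h * g;         % step against the gradient
    end
    fval = f(z);
end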

Q3. For the least-squares problem $\min_{\mathbf{x}} f(\mathbf{x}) = \|\mathbf{A}\mathbf{x} - \mathbf{B}\|_2^2$ with $\mathbf{A} = \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]$ and $\mathbf{B} = \left[ \begin{array}{c} 5 \\ 6 \end{array} \right]$:

\noindent $\mathbf{x}^* = \left( \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \right)^{-1} \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} -4 \\ 4.5 \end{array} \right]$

\textbf{Verification:} $\mathbf{A}\mathbf{x}^{*} - \mathbf{B} = \mathbf{0}$?\\
$\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \mathbf{x}^* - \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$
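A quick numerical check in MATLAB (a sketch; the backslash operator solves the normal equations):

A = [1 2; 3 4];
B = [5; 6];
x = (A' * A) \ (A' * B)   % normal-equations solution: [-4; 4.5]
residual = A * x - B      % [0; 0], since A is square and invertible here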

Q4. Minimise f(x, y) = (x - 1)x + (y + 1)y using the Nelder-Mead downhill simplex method, starting from the vertices (0, 0), (1, 2), (3, 4).

MATLAB code used to plot the simplex vertices B, G, W, the midpoint M and the reflection R:

x = [0, 1, 3]; y = [0, 2, 4];
plot(x, y, 'b', 'MarkerSize', 12, 'linewidth', 1);
hold on;
plot(x, y, 'mo', 'MarkerSize', 12, 'linewidth', 3);    % vertices B, G, W
plot(0.5, 1, 'rx', 'MarkerSize', 12, 'linewidth', 3);  % midpoint M
x = [0, 3]; y = [0, 4];
plot(x, y, 'b-', 'linewidth', 1);
plot(-2, -2, 'gs', 'MarkerSize', 12, 'linewidth', 3);  % reflection R
xlabel('\itx');
ylabel('\ity');

Function values at the points of interest:
\begin{align*}
&\mathbf{B}: f(0, 0) = 0\\
&\mathbf{G}: f(1, 2) = 6\\
&\mathbf{W}: f(3, 4) = 26~(\text{before replacement})\\
&\mathbf{R}: f(-2, -2) = 8\\
&\mathbf{C}: f(-0.75, -0.5) = 1.0625
\end{align*}
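The same numbers can be reproduced directly; a sketch of this single Nelder-Mead step in MATLAB:

f = @(p) (p(1) - 1)*p(1) + (p(2) + 1)*p(2);
B = [0 0]; G = [1 2]; W = [3 4];   % ordered so f(B) <= f(G) <= f(W)
M = (B + G) / 2;                   % midpoint (0.5, 1), f(M) = 1.75
R = 2*M - W;                       % reflection (-2, -2), f(R) = 8
if f(R) < f(W), W = R; end         % f(R) = 8 < 26, so W is replaced with R
C = (W + M) / 2;                   % contraction (-0.75, -0.5), f(C) = 1.0625
if f(C) < f(W), W = C; end         % f(C) < 8, so W is replaced with C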

Q5. Minimise f(x, y) = (x - 1)x + (y + 1)y by gradient descent with step size h_k = 0.1 and initial guess (7, 8); find x, y and f(x, y) for the first 3 iterations.

The gradient is $\triangledown f(\mathbf{z}) = \left[ \begin{array}{c} \frac{\partial f(x,y)}{\partial x} \\ \frac{\partial f(x,y)}{\partial y} \end{array} \right] = \left[ \begin{array}{c} 2x - 1 \\ 2y + 1 \end{array} \right]$.

\noindent \textbf{Update rule:} $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$, i.e. $\left[ \begin{array}{c} x_{k+1} \\ y_{k+1} \end{array} \right] = \left[ \begin{array}{c} x_k \\ y_k \end{array} \right] - h_k \left[ \begin{array}{c} 2 x_k - 1 \\ 2 y_k + 1 \end{array} \right]$

\noindent 1$^{st}$ iteration: $\mathbf{z}_{1} = \left[ \begin{array}{c} 7 \\ 8 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 7 - 1 \\ 2 \times 8 + 1 \end{array} \right] = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right]$; $f(x, y) = 72.7800$.

\noindent 2$^{nd}$ iteration: $\mathbf{z}_{2} = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 5.7 - 1 \\ 2 \times 6.3 + 1 \end{array} \right] = \left[ \begin{array}{c} 4.66 \\ 4.94 \end{array} \right]$; $f(x, y) = 46.3992$.

\noindent 3$^{rd}$ iteration: $\mathbf{z}_{3} = \left[ \begin{array}{c} 4.66 \\ 4.94 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 4.66 - 1 \\ 2 \times 4.94 + 1 \end{array} \right] = \left[ \begin{array}{c} 3.828 \\ 3.852 \end{array} \right]$; $f(x, y) = 29.5155$.
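The three iterations in MATLAB (a sketch; the anonymous functions encode f and its gradient from above):

f     = @(z) (z(1) - 1)*z(1) + (z(2) + 1)*z(2);
gradf = @(z) [2*z(1) - 1; 2*z(2) + 1];
z = [7; 8]; h = 0.1;
for k = 1:3
    z = z - h * gradf(z);   % one gradient descent step
    fprintf('k = %d: z = (%.3f, %.3f), f = %.4f\n', k, z(1), z(2), f(z));
end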

The iterations can be visualised on the cost surface (initial guess, then k = 1, 2, 3):

[X, Y] = meshgrid(1:0.5:10, 1:0.5:10);
Z = (X - 1).*X + (Y + 1).*Y;
mesh(X, Y, Z);
alpha(0.5);
hold on;
plot3(7, 8, 114, 'ro', 'MarkerSize', 12, 'linewidth', 3);            % initial guess
plot3(5.7, 6.3, 72.78, 'rx', 'MarkerSize', 12, 'linewidth', 3);      % k = 1
plot3(4.66, 4.94, 46.3992, 'rx', 'MarkerSize', 12, 'linewidth', 3);  % k = 2
plot3(3.828, 3.852, 29.5155, 'rx', 'MarkerSize', 12, 'linewidth', 3);% k = 3
xlabel('\itx');
ylabel('\ity');
zlabel('{\itf}(\itx,\ity)');
Q6a. Determine the optimal step size h_k.

Write f in quadratic form: $f(x, y) = \frac{1}{2} \begin{bmatrix} x \\ y \end{bmatrix}^T \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$

\textbf{Update rule:} $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$, with the exact line-search (optimal) step size

\fbox{$h_k = \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)}$}

For this problem the step size evaluates to $h_k = \frac{1}{2}$; the derivation is given in the suggested solutions below.

Q6b. Substituting the optimal step size into the update rule gives $\mathbf{z}_{k+1} = \mathbf{z}_k - \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)} \triangledown f(\mathbf{z}_k) = \mathbf{z}_k - \frac{1}{2} \triangledown f(\mathbf{z}_k)$.

Randomly pick an initial condition, say $\mathbf{z}_k = \left[ \begin{array}{c} 1 \\ -2 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+1} &= \mathbf{z}_k - \frac{1}{2} \triangledown f(\mathbf{z}_k) \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2x_k - 1 \\ 2y_k + 1 \end{array} \right] \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2 \times 1 - 1 \\ 2 \times (-2) + 1 \end{array} \right] \\
&= \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 1 \\ -3 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}

Run another iteration (i.e., $k+1 \rightarrow k+2$) from $\mathbf{z}_{k+1} = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]$.
\begin{align*}
\mathbf{z}_{k+2} &= \mathbf{z}_{k+1} - \frac{1}{2} \triangledown f(\mathbf{z}_{k+1}) \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2 \times 0.5 - 1 \\ 2 \times (-0.5) + 1 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \\
&= \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]
\end{align*}
The gradient vanishes at (0.5, -0.5), so the method converges in a single iteration, as required.
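A two-line MATLAB check of the one-iteration convergence (sketch):

gradf = @(z) [2*z(1) - 1; 2*z(2) + 1];
z = [1; -2];
z = z - 0.5 * gradf(z)   % lands on [0.5; -0.5]
norm(gradf(z))           % 0: the gradient vanishes, so z no longer moves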

Q7. Minimise f(x, y) = (x - 1)x + (y + 1)y by random walk optimisation, with Threshold = 0.75, the repeating random sequence {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6}, all diagonal entries of $\mathbf{D}_k$ equal to 1, and all entries of $\mathbf{h}_k$ equal to 0.5.

Candidates are generated as $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{D}_k \mathbf{h}_k$; for the first two steps:

$\mathbf{x}_{1} = \left[ \begin{array}{c} -1 \\ -2 \end{array} \right] + \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} 0.5 \\ 0.5 \end{array} \right] = \left[ \begin{array}{c} -0.5 \\ -1.5 \end{array} \right]$

$\mathbf{x}_{2} = \left[ \begin{array}{c} -0.5 \\ -1.5 \end{array} \right] + \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} 0.5 \\ 0.5 \end{array} \right] = \left[ \begin{array}{c} 0 \\ -1 \end{array} \right]$

The completed table is given in the suggested solutions below.
Q8. Approximate a system $y = g(x_1, x_2)$ by the linear regressor $\hat{y} = w_1 x_1 + w_2 x_2 + b$, where the decision variables are $w_1$, $w_2$ and $b$.

Ideal case: $y = \hat{y}$, i.e., $y - \hat{y} = 0$ for every data point.


Q8a. Minimise the MSE over $w_1$, $w_2$ and $b$.

Simplified notation: write $y_i$ for $y(i)$ and $\hat{y}_i$ for $\hat{y}(i)$. The mean squared error is

$$\text{MSE} = \frac{(\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 + \cdots + (\hat{y}_M - y_M)^2}{M}$$

$$= \frac{\big( (w_1 x_1(1) + w_2 x_2(1) + b) - y_1 \big)^2 + \big( (w_1 x_1(2) + w_2 x_2(2) + b) - y_2 \big)^2 + \cdots + \big( (w_1 x_1(M) + w_2 x_2(M) + b) - y_M \big)^2}{M}$$

so the data fitting problem is the minimisation

$$\min_{w_1, w_2, b} \frac{1}{M} \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big)^2$$

Remark: An ideal set of $w_1$, $w_2$ and $b$ would give the cost $f(w_1, w_2, b) = 0$.

Q8b. Write $f(w_1, w_2, b) = f(\mathbf{z}) = \frac{1}{M} \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big)^2$.

\textbf{Update rule:} $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$ where $\mathbf{z}_k = \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right]$ and
\begin{align*}
\triangledown f(\mathbf{z}_k) = \left[ \begin{array}{c} \frac{\partial f(\mathbf{z}_k)}{\partial w_1} \\ \frac{\partial f(\mathbf{z}_k)}{\partial w_2} \\ \frac{\partial f(\mathbf{z}_k)}{\partial b} \end{array} \right] = \left[ \begin{array}{c} \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_1(i) \\ \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_2(i) \\ \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) \end{array} \right]
\end{align*}

Substituting the gradient into the update rule:
\begin{align*}
\left[ \begin{array}{c} w_{1_{k+1}} \\ w_{2_{k+1}} \\ b_{k+1} \end{array} \right] &= \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right] - h_k \left[ \begin{array}{c} \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_1(i) \\ \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_2(i) \\ \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) \end{array} \right] \\
&= \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right] - \frac{2 h_k}{M} \left[ \begin{array}{c} \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_1(i) \\ \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_2(i) \\ \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) \end{array} \right]
\end{align*}
Q8c. Dataset: $(1, -2, 3)$, $(2, 4, -1)$, $(3, 0, 5)$ $\Rightarrow M = 3$.

Initial guess $\mathbf{z}_0 = \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right] \Rightarrow w_1 = w_2 = b = 1$:
\begin{align*}
\hat{y}_1 &= w_1 x_1(1) + w_2 x_2(1) + b = (1 \times 1) + (1 \times -2) + 1 = 0\\
\hat{y}_2 &= w_1 x_1(2) + w_2 x_2(2) + b = (1 \times 2) + (1 \times 4) + 1 = 7\\
\hat{y}_3 &= w_1 x_1(3) + w_2 x_2(3) + b = (1 \times 3) + (1 \times 0) + 1 = 4
\end{align*}

\begin{align*}
MSE &= \frac{1}{M} \big( (\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 + (\hat{y}_3 - y_3)^2 \big)\\
&= \frac{1}{3} \big( (0 - 3)^2 + (7 - (-1))^2 + (4 - 5)^2 \big)\\
&= 24.6667
\end{align*}
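A sketch of the full Q8c/d experiment in MATLAB (the 200-iteration run and the MSE plot follow the question; the variable names are illustrative):

x1 = [1 2 3]; x2 = [-2 4 0]; y = [3 -1 5]; M = 3;
w1 = 1; w2 = 1; b = 1; h = 0.1;
mse = zeros(1, 200);
for k = 1:200
    e = (w1*x1 + w2*x2 + b) - y;                    % prediction errors
    g = (2/M) * [sum(e.*x1); sum(e.*x2); sum(e)];   % gradient of the MSE
    w1 = w1 - h*g(1); w2 = w2 - h*g(2); b = b - h*g(3);
    mse(k) = mean(((w1*x1 + w2*x2 + b) - y).^2);
end
plot(1:200, mse); xlabel('iteration k'); ylabel('MSE');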
[Figure: a food product formed from Ingredient 1, Ingredient 2 and other ingredients; slide illustration, no further content recoverable.]
Q16. Explain how "recursive least-squares" works.

A least-squares problem is described as $\min_{\mathbf{x}} f(\mathbf{x}) = \|\mathbf{A}\mathbf{x} - \mathbf{B}\|_2^2$. Its solution is $\mathbf{x} = \big( \mathbf{A}^T \mathbf{A} \big)^{-1} \mathbf{A}^T \mathbf{B}$.

Denote the $i$-th row of $\mathbf{A}$ as $\mathbf{a}_i^T$ and the $i$-th entry of $\mathbf{B}$ as $b_i$. The solution $\mathbf{x}$ can then be represented in row form: $$\mathbf{x} = \Big( \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T \Big)^{-1} \sum_{i=1}^m b_i \mathbf{a}_i,$$ where $$\mathbf{A} = \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{array} \right] = \left[ \begin{array}{c} \mathbf{a}_1^T \\ \mathbf{a}_2^T \\ \vdots \\ \mathbf{a}_m^T \end{array} \right], \qquad \mathbf{B} = \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \end{array} \right].$$

The identity $\mathbf{A}^T \mathbf{A} = \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T$ follows by writing the product out:

$$\mathbf{A}^T \mathbf{A} = \left[ \begin{array}{cccc} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{array} \right] \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{array} \right] = \mathbf{a}_1 \mathbf{a}_1^T + \mathbf{a}_2 \mathbf{a}_2^T + \cdots + \mathbf{a}_m \mathbf{a}_m^T,$$

and likewise $\mathbf{A}^T \mathbf{B} = \sum_{i=1}^m b_i \mathbf{a}_i$ with $\mathbf{B} = \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \end{array} \right]$.
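A quick numerical sanity check of the row-form identity (a sketch with an arbitrary 4-by-2 A):

A = [1 2; 3 4; 5 6; 7 8];
S = zeros(2, 2);
for i = 1:size(A, 1)
    S = S + A(i, :)' * A(i, :);   % outer product a_i * a_i^T
end
norm(S - A' * A)                  % 0: the two forms agree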

What happens when a new sample comes in? The data matrices

$$\mathbf{A} = \left[ \begin{array}{c} \mathbf{a}_1^T \\ \mathbf{a}_2^T \\ \vdots \\ \mathbf{a}_m^T \end{array} \right], \qquad \mathbf{B} = \left[ \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \end{array} \right]$$

gain a new row $\left[ \begin{array}{cccc} a_{m+1,1} & a_{m+1,2} & \cdots & a_{m+1,n} \end{array} \right] = \mathbf{a}_{m+1}^T$ and a new output $b_{m+1}$.

The new solution can be written as $$\mathbf{x}_{new} = \Big( \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T \Big)^{-1} \Big( \sum_{i=1}^m b_i \mathbf{a}_i + b_{m+1} \mathbf{a}_{m+1} \Big).$$

At the time that you have $m$ samples, recall that the solution is $$\mathbf{x} = \mathbf{P}(m)^{-1} \mathbf{q}(m)$$ where $$\mathbf{P}(m) = \mathbf{A}^T \mathbf{A} = \sum_{i=1}^m \mathbf{a}_i \mathbf{a}_i^T \qquad \text{and} \qquad \mathbf{q}(m) = \mathbf{A}^T \mathbf{B} = \sum_{i=1}^m b_i \mathbf{a}_i.$$

With the new sample $\mathbf{a}_{m+1}^T$ and $b_{m+1}$, the solution is $$\mathbf{x}_{new} = \mathbf{P}(m+1)^{-1} \mathbf{q}(m+1)$$ where $$\mathbf{P}(m+1) = \mathbf{P}(m) + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T \qquad \text{and} \qquad \mathbf{q}(m+1) = \mathbf{q}(m) + b_{m+1} \mathbf{a}_{m+1}.$$

When the new sample comes in, do we need to recompute the inverse of $\mathbf{P}(m+1)$ from scratch, or can we use the inverse of $\mathbf{P}(m)$ to obtain it? The mathematical trick is the rank-one update formula:

$$\big( \mathbf{P} + \mathbf{a}\mathbf{a}^T \big)^{-1} = \mathbf{P}^{-1} - \frac{1}{1 + \mathbf{a}^T \mathbf{P}^{-1} \mathbf{a}} \big( \mathbf{P}^{-1} \mathbf{a} \big) \big( \mathbf{P}^{-1} \mathbf{a} \big)^T$$

\noindent \textit{Remark:} The rank-one update formula is valid when $\mathbf{P} = \mathbf{P}^T$, and $\mathbf{P}$ and $\mathbf{P} + \mathbf{a}\mathbf{a}^T$ are both invertible.

Applying the formula with $\mathbf{P} = \mathbf{P}(m)$ and $\mathbf{a} = \mathbf{a}_{m+1}$, and recalling $\mathbf{P}(m+1) = \mathbf{P}(m) + \mathbf{a}_{m+1} \mathbf{a}_{m+1}^T$:

$$\mathbf{P}(m+1)^{-1} = \mathbf{P}(m)^{-1} - \frac{1}{1 + \mathbf{a}_{m+1}^T \mathbf{P}(m)^{-1} \mathbf{a}_{m+1}} \big( \mathbf{P}(m)^{-1} \mathbf{a}_{m+1} \big) \big( \mathbf{P}(m)^{-1} \mathbf{a}_{m+1} \big)^T,$$

so each new sample updates the stored inverse directly, without any matrix inversion.
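A minimal MATLAB sketch of one recursive least-squares step (assumes the inverse Pinv = P(m)^{-1} and q = q(m) are already stored; the names are illustrative):

% New sample: column vector a (n-by-1) and scalar output b.
v    = Pinv * a;
Pinv = Pinv - (v * v') / (1 + a' * v);   % rank-one update of the inverse
q    = q + b * a;
x    = Pinv * q;                         % updated least-squares solution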

Department of Informatics, King's College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)

Tutorial 2

Q1. What are the advantages and disadvantages of the gradient descent method?

Q2. Show how the gradient descent method works using pseudo code.

Q3. Consider a least-squares problem, $\min_{\mathbf{x}} f(\mathbf{x}) = \|\mathbf{A}\mathbf{x} - \mathbf{B}\|_2^2$, where $\mathbf{A} = \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]$ and $\mathbf{B} = \left[ \begin{array}{c} 5 \\ 6 \end{array} \right]$. Find the optimal solution $\mathbf{x}^*$.

Q4. Consider the minimisation problem of the function $f(x, y) = (x - 1)x + (y + 1)y$ using the Nelder-Mead downhill simplex method.

a. Starting with the 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and W.
b. Find the midpoint M and f(M).
c. Find the reflection R and f(R).
d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii) is performed in the pseudo code of the Nelder-Mead downhill simplex method, $\mathbf{C} = \frac{\mathbf{W} + \mathbf{M}}{2}$ is used.

Q5. Considering the minimisation problem of the function $f(x, y) = (x - 1)x + (y + 1)y$ using the gradient descent method, find x and y, and f(x, y), for the first 3 iterations with step size $h_k = 0.1$ and initial guess (7, 8).

Q6. Consider the minimisation problem of the function $f(x, y) = (x - 1)x + (y + 1)y$ using the gradient descent method.

a. Determine the optimal step size $h_k$.
b. Show that it requires one iteration for the gradient descent method to converge with the optimal step size $h_k$ obtained in Q6a.

Q7. Consider the minimisation problem of the function $f(x, y) = (x - 1)x + (y + 1)y$ using the random walk optimisation algorithm. The Threshold (for accepting the worse solution) is 0.75 and the random number generator will generate a repeating sequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (the left-most number is the first number generated). The diagonal entries of $\mathbf{D}_k$ are all 1 and all the entries of $\mathbf{h}_k$ are 0.5. Fill in the content of the following table.

k | x_k^T   | f(x_k) | x_{k+1}^T | f(x_{k+1}) | rand() | x_best | f(x_best)
0 | [-1 -2] |        |           |            |        |        |
1 |         |        |           |            |        |        |
2 |         |        |           |            |        |        |
3 |         |        |           |            |        |        |
4 |         |        |           |            |        |        |
5 |         |        |           |            |        |        |

[Figure: an x-y plot accompanying the sheet, x from 0 to 3, y from 0 to 4.]
Department of Informatics, King's College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)

Tutorial 2 (Suggested Solutions)

Q1. Answer can be found in lecture notes.

Q2. Answer can be found in lecture notes.

Q3. $\mathbf{x}^* = \left( \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \right)^{-1} \left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right]^T \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} -4 \\ 4.5 \end{array} \right]$

Verification: $\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \end{array} \right] \mathbf{x}^* - \left[ \begin{array}{c} 5 \\ 6 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$

Q4. a. $f(0, 0) = (0 - 1) \times 0 + (0 + 1) \times 0 = 0$;
$f(1, 2) = (1 - 1) \times 1 + (2 + 1) \times 2 = 6$;
$f(3, 4) = (3 - 1) \times 3 + (4 + 1) \times 4 = 26$.
B: (0, 0); G: (1, 2); W: (3, 4).

b. $\mathbf{M} = \frac{\mathbf{B} + \mathbf{G}}{2} = \frac{(0,0) + (1,2)}{2} = (0.5, 1)$; $f(0.5, 1) = 1.75$.

c. $\mathbf{R} = 2\mathbf{M} - \mathbf{W} = 2(0.5, 1) - (3, 4) = (-2, -2)$; $f(-2, -2) = 8$.

d. As $f(R) > f(G)$, Case (ii) is performed. As $f(R) < f(W)$, W is replaced with R. $\mathbf{C} = \frac{\mathbf{W} + \mathbf{M}}{2} = \frac{(-2,-2) + (0.5,1)}{2} = (-0.75, -0.5)$ and $f(C) = 1.0625$. As $f(C) < f(W)$, W is replaced with C. The new 3 vertices are B: (0, 0), G: (-0.75, -0.5) and W: (1, 2). (See Figure 6, the flowchart of the Nelder-Mead downhill simplex method, in the lecture slides by Dr H.K. Lam, KCL.)

Q5. Let $\mathbf{z} = \left[ \begin{array}{c} x \\ y \end{array} \right]$, so $\triangledown f(\mathbf{z}) = \left[ \begin{array}{c} 2x - 1 \\ 2y + 1 \end{array} \right]$. According to the update rule $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$, we have

1st iteration: $\mathbf{z}_1 = \left[ \begin{array}{c} 7 \\ 8 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 7 - 1 \\ 2 \times 8 + 1 \end{array} \right] = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right]$; $f(x, y) = 72.7800$.

2nd iteration: $\mathbf{z}_2 = \left[ \begin{array}{c} 5.7 \\ 6.3 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 5.7 - 1 \\ 2 \times 6.3 + 1 \end{array} \right] = \left[ \begin{array}{c} 4.66 \\ 4.94 \end{array} \right]$; $f(x, y) = 46.3992$.

3rd iteration: $\mathbf{z}_3 = \left[ \begin{array}{c} 4.66 \\ 4.94 \end{array} \right] - 0.1 \left[ \begin{array}{c} 2 \times 4.66 - 1 \\ 2 \times 4.94 + 1 \end{array} \right] = \left[ \begin{array}{c} 3.828 \\ 3.852 \end{array} \right]$; $f(x, y) = 29.5155$.

Q6. a. $f(\mathbf{z}) = \frac{1}{2} \mathbf{z}^T \mathbf{Q} \mathbf{z} - \mathbf{b}^T \mathbf{z}$ where $\mathbf{Q} = \left[ \begin{array}{cc} 2 & 0 \\ 0 & 2 \end{array} \right]$ and $\mathbf{b}^T = \left[ \begin{array}{cc} 1 & -1 \end{array} \right]$, with $\triangledown f(\mathbf{z}) = \left[ \begin{array}{c} 2x - 1 \\ 2y + 1 \end{array} \right]$ and $\mathbf{z}_k = \left[ \begin{array}{c} x_k \\ y_k \end{array} \right]$. The optimal step size is

$$h_k = \frac{\triangledown f(\mathbf{z}_k)^T \triangledown f(\mathbf{z}_k)}{\triangledown f(\mathbf{z}_k)^T \mathbf{Q} \triangledown f(\mathbf{z}_k)} = \frac{\left[ \begin{array}{cc} 2x_k - 1 & 2y_k + 1 \end{array} \right] \left[ \begin{array}{c} 2x_k - 1 \\ 2y_k + 1 \end{array} \right]}{\left[ \begin{array}{cc} 2x_k - 1 & 2y_k + 1 \end{array} \right] \left[ \begin{array}{cc} 2 & 0 \\ 0 & 2 \end{array} \right] \left[ \begin{array}{c} 2x_k - 1 \\ 2y_k + 1 \end{array} \right]} = \frac{4x_k^2 - 4x_k + 2 + 4y_k^2 + 4y_k}{8x_k^2 - 8x_k + 4 + 8y_k^2 + 8y_k} = \frac{1}{2}.$$

b. With the optimal step size $h_k = \frac{1}{2}$, the update rule is $\mathbf{z}_{k+1} = \mathbf{z}_k - \frac{1}{2} \triangledown f(\mathbf{z}_k)$. Randomly pick an initial condition as $\mathbf{z}_k = \left[ \begin{array}{c} 1 \\ -2 \end{array} \right]$:

$$\mathbf{z}_{k+1} = \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2 \times 1 - 1 \\ 2 \times (-2) + 1 \end{array} \right] = \left[ \begin{array}{c} 1 \\ -2 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 1 \\ -3 \end{array} \right] = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right].$$

Running another iteration from $\left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right]$ gives

$$\mathbf{z}_{k+2} = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right] - \frac{1}{2} \left[ \begin{array}{c} 2 \times 0.5 - 1 \\ 2 \times (-0.5) + 1 \end{array} \right] = \left[ \begin{array}{c} 0.5 \\ -0.5 \end{array} \right],$$

so the method converges after a single iteration.


Q7. Recall the pseudo code of random walk optimisation (Table 3 of the lecture slides, Dr H.K. Lam, KCL):

Algorithm: Random Walk Optimisation
input: f(x): R^n -> R; x_0: an initial solution
output: x*, a local minimum of the cost function f(x)
k <- 0; x_best <- x_0; set Threshold;
while not STOP-CRIT and k < k_max do
    Generate D_k and h_k;
    x_{k+1} <- x_k + D_k h_k;
    If f(x_{k+1}) > f(x_k)
        If rand < Threshold Then x_{k+1} <- x_k;
    ElseIf f(x_best) > f(x_{k+1}) Then x_best <- x_{k+1};
    End
    k <- k + 1;
end
return x_k and x_best;

Update rule: $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{D}_k \mathbf{h}_k$ where $\mathbf{x}_k = \left[ \begin{array}{c} x_k \\ y_k \end{array} \right]$, $\mathbf{D}_k = \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right]$ and $\mathbf{h}_k = \left[ \begin{array}{c} 0.5 \\ 0.5 \end{array} \right]$; for example, the first candidate is $(-0.5, -1.5)$ with $f(-0.5, -1.5) = 1.5$.


The completed table:

k | x_k^T       | f(x_k) | x_{k+1}^T   | f(x_{k+1}) | rand() | x_best      | f(x_best)
0 | [-1 -2]     | 4      | [-0.5 -1.5] | 1.5        | -      | [-0.5 -1.5] | 1.5
1 | [-0.5 -1.5] | 1.5    | [0 -1]      | 0          | -      | [0 -1]      | 0
2 | [0 -1]      | 0      | [0.5 -0.5]  | -0.5       | -      | [0.5 -0.5]  | -0.5
3 | [0.5 -0.5]  | -0.5   | [1 0]       | 0          | 0.8    | [0.5 -0.5]  | -0.5
4 | [1 0]       | 0      | [1.5 0.5]   | 1.5        | 0.7    | [0.5 -0.5]  | -0.5
5 | [1 0]       | 0      | [1.5 0.5]   | 1.5        | 0.2    | [0.5 -0.5]  | -0.5

(Rows 0-2: the candidate improves the cost, so it is accepted and rand() is not consulted. Row 3: the candidate is worse, but rand() = 0.8 >= 0.75, so the worse move is accepted. Rows 4-5: rand() < 0.75, so the worse candidate is rejected and x_k stays at [1 0]; x_best remains [0.5 -0.5].)
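The same run in MATLAB (a sketch; the fixed step [0.5; 0.5] and the hard-coded "random" sequence mirror the question, whereas a real random walk would draw both at random):

f = @(p) (p(1) - 1)*p(1) + (p(2) + 1)*p(2);
x = [-1; -2]; xbest = x; Threshold = 0.75;
randseq = [0.8 0.7 0.2 0.6 0.9 0.7 0.5 0.6]; r = 0;
for k = 0:5
    xnew = x + [0.5; 0.5];                 % D_k * h_k with D_k = I
    if f(xnew) > f(x)                      % candidate is worse
        r = mod(r, numel(randseq)) + 1;    % next number from the sequence
        if randseq(r) < Threshold
            xnew = x;                      % reject the worse move
        end
    elseif f(xbest) > f(xnew)
        xbest = xnew;                      % record the best-so-far
    end
    x = xnew;
end
x, xbest                                   % x = [1; 0], xbest = [0.5; -0.5]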

Q8. Consider a system taking two input variables $x_1$ and $x_2$ and generating an output $y$. A set of $M$ input-output data is collected in an experiment, i.e., the dataset is $(x_1(1), x_2(1), y_1), (x_1(2), x_2(2), y_2), \cdots, (x_1(M), x_2(M), y_M)$. Design a linear regressor in the form of $\hat{y} = w_1 x_1 + w_2 x_2 + b$ to best fit the data in terms of mean squared error using the gradient descent method, where $w_1$, $w_2$ and $b$ are the parameters to be determined.

a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as $h_k$. Derive the update rule for each parameter.
c. Use $h_k = 0.1$ and an initial guess of 1 for all variables. Considering the dataset $(1, -2, 3)$, $(2, 4, -1)$, $(3, 0, 5)$, obtain the best set of parameters for the linear regressor.
d. Plot "iteration k" against "MSE" for 200 iterations. Is the choice of $h_k = 0.1$ right or wrong?

Q8. a. Mean squared error: $\frac{1}{M} \sum_{i=1}^M (\hat{y}_i - y_i)^2$. Minimisation problem: $\min_{w_1, w_2, b} \frac{1}{M} \sum_{i=1}^M (\hat{y}_i - y_i)^2$.

b. $$f(w_1, w_2, b) = \frac{1}{M} \sum_{i=1}^M (\hat{y}_i - y_i)^2 = \frac{1}{M} \sum_{i=1}^M \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big)^2$$

Update rule: $\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \triangledown f(\mathbf{z}_k)$ where $\mathbf{z}_k = \left[ \begin{array}{c} w_{1_k} \\ w_{2_k} \\ b_k \end{array} \right]$ and

$$\triangledown f(\mathbf{z}_k) = \left[ \begin{array}{c} \frac{\partial f(\mathbf{z}_k)}{\partial w_1} \\ \frac{\partial f(\mathbf{z}_k)}{\partial w_2} \\ \frac{\partial f(\mathbf{z}_k)}{\partial b} \end{array} \right] = \left[ \begin{array}{c} \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_1(i) \\ \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) x_2(i) \\ \frac{1}{M} \sum_{i=1}^M 2 \big( (w_1 x_1(i) + w_2 x_2(i) + b) - y_i \big) \end{array} \right]$$

[Figure: 3D scatter plot of the Q8 dataset, x_1 from 1 to 3, x_2 from -2 to 3, y from -1 to 5.]


$$\mathbf{z}_{k+1} = \mathbf{z}_k - h_k \nabla f(\mathbf{z}_k)
\;\Rightarrow\;
\begin{bmatrix} w_{1,k+1} \\ w_{2,k+1} \\ b_{k+1} \end{bmatrix}
= \begin{bmatrix} w_{1k} \\ w_{2k} \\ b_k \end{bmatrix}
- h_k \begin{bmatrix} \frac{1}{M}\sum_{i=1}^{M} 2\big((w_1 x_1(i) + w_2 x_2(i) + b) - y_i\big)\,x_1(i) \\[4pt] \frac{1}{M}\sum_{i=1}^{M} 2\big((w_1 x_1(i) + w_2 x_2(i) + b) - y_i\big)\,x_2(i) \\[4pt] \frac{1}{M}\sum_{i=1}^{M} 2\big((w_1 x_1(i) + w_2 x_2(i) + b) - y_i\big) \end{bmatrix}
= \begin{bmatrix} w_{1k} \\ w_{2k} \\ b_k \end{bmatrix}
- \frac{2h_k}{M} \begin{bmatrix} \sum_{i=1}^{M}\big((w_1 x_1(i) + w_2 x_2(i) + b) - y_i\big)\,x_1(i) \\[4pt] \sum_{i=1}^{M}\big((w_1 x_1(i) + w_2 x_2(i) + b) - y_i\big)\,x_2(i) \\[4pt] \sum_{i=1}^{M}\big((w_1 x_1(i) + w_2 x_2(i) + b) - y_i\big) \end{bmatrix}$$

c. Step size: $h_k = 0.1$.

Dataset: (1, −2, 3), (2, 4, −1), (3, 0, 5), so M = 3.

Initial guess:

$$\mathbf{z}_0 = \begin{bmatrix} w_{10} \\ w_{20} \\ b_0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

First iteration: $k = 0$ with $\mathbf{z}_0 = [1\;\,1\;\,1]^T$:

$$\begin{bmatrix} w_{11} \\ w_{21} \\ b_1 \end{bmatrix}
= \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
- \frac{2\times 0.1}{3}
\begin{bmatrix}
\big((w_1 x_1(1)+w_2 x_2(1)+b)-y_1\big)x_1(1) + \big((w_1 x_1(2)+w_2 x_2(2)+b)-y_2\big)x_1(2) + \big((w_1 x_1(3)+w_2 x_2(3)+b)-y_3\big)x_1(3) \\[4pt]
\big((w_1 x_1(1)+w_2 x_2(1)+b)-y_1\big)x_2(1) + \big((w_1 x_1(2)+w_2 x_2(2)+b)-y_2\big)x_2(2) + \big((w_1 x_1(3)+w_2 x_2(3)+b)-y_3\big)x_2(3) \\[4pt]
\big((w_1 x_1(1)+w_2 x_2(1)+b)-y_1\big) + \big((w_1 x_1(2)+w_2 x_2(2)+b)-y_2\big) + \big((w_1 x_1(3)+w_2 x_2(3)+b)-y_3\big)
\end{bmatrix}$$

$$= \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
- \frac{2\times 0.1}{3}
\begin{bmatrix}
\big((1 w_1 - 2 w_2 + b) - 3\big)\times 1 + \big((2 w_1 + 4 w_2 + b) + 1\big)\times 2 + \big((3 w_1 + 0 w_2 + b) - 5\big)\times 3 \\[4pt]
\big((1 w_1 - 2 w_2 + b) - 3\big)\times (-2) + \big((2 w_1 + 4 w_2 + b) + 1\big)\times 4 + \big((3 w_1 + 0 w_2 + b) - 5\big)\times 0 \\[4pt]
\big((1 w_1 - 2 w_2 + b) - 3\big) + \big((2 w_1 + 4 w_2 + b) + 1\big) + \big((3 w_1 + 0 w_2 + b) - 5\big)
\end{bmatrix}
= \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
- \frac{2}{3}\times 0.1 \times \begin{bmatrix} 10 \\ 38 \\ 4 \end{bmatrix}
= \begin{bmatrix} 0.3333 \\ -1.5333 \\ 0.7333 \end{bmatrix}$$
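For clarity (a short check of ours, left implicit in the original layout), the three residuals at $\mathbf{z}_0 = [1\;\,1\;\,1]^T$ are

$$e_1 = (1 - 2 + 1) - 3 = -3, \quad e_2 = (2 + 4 + 1) + 1 = 8, \quad e_3 = (3 + 0 + 1) - 5 = -1,$$

so $\sum_i e_i x_1(i) = -3 + 16 - 3 = 10$, $\sum_i e_i x_2(i) = 6 + 32 + 0 = 38$ and $\sum_i e_i = 4$, which is exactly the vector $[10\;\,38\;\,4]^T$ above.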

Second iteration: $k = 1$ with $\mathbf{z}_1 = [0.3333\;\, -1.5333\;\, 0.7333]^T$:

$$\mathbf{z}_2 = \begin{bmatrix} 1.4089 \\ -0.3867 \\ 1.1244 \end{bmatrix}$$

Third iteration: $k = 2$ with $\mathbf{z}_2 = [1.4089\;\, -0.3867\;\, 1.1244]^T$:

$$\mathbf{z}_3 = \begin{bmatrix} 0.8655 \\ -1.2513 \\ 0.8542 \end{bmatrix}$$

Fourth iteration: $k = 3$ with $\mathbf{z}_3 = [0.8655\;\, -1.2513\;\, 0.8542]^T$:


$$\mathbf{z}_4 = \begin{bmatrix} 1.2832 \\ -0.7097 \\ 0.9707 \end{bmatrix}$$

Fifth iteration: $k = 4$ with $\mathbf{z}_4 = [1.2832\;\, -0.7097\;\, 0.9707]^T$:

$$\mathbf{z}_5 = \begin{bmatrix} 1.0478 \\ -1.0728 \\ 0.8246 \end{bmatrix}$$

As $k$ increases, $\mathbf{z}$ converges to $[2.0000\;\, -1.0000\;\, -1.0000]^T$.

d. It can be seen from the figure that the MSE is monotonically decreasing and converges to almost zero, indicating that the choice of $h_k = 0.1$ is appropriate.
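For completeness, a minimal MATLAB sketch of parts c and d (the variable names are ours, not from the tutorial):

% Gradient descent for the linear regressor (Q8 c and d)
X = [1 -2; 2 4; 3 0];             % rows are (x1(i), x2(i))
y = [3; -1; 5];                   % targets y(i)
M = size(X, 1);                   % M = 3
h = 0.1;                          % step size h_k
z = [1; 1; 1];                    % initial guess [w1; w2; b]
mse = zeros(200, 1);
for k = 1:200
    e = X*z(1:2) + z(3) - y;      % residuals (w1*x1(i) + w2*x2(i) + b) - y(i)
    mse(k) = mean(e.^2);
    grad = (2/M)*[X'*e; sum(e)];  % gradient of the MSE w.r.t. [w1; w2; b]
    z = z - h*grad;               % update rule z_{k+1} = z_k - h_k * grad
end
plot(1:200, mse); xlabel('iteration k'); ylabel('MSE');
disp(z')                          % approaches [2 -1 -1]

With $h_k = 0.1$ the printed parameters approach $[2\;\, -1\;\, -1]^T$ and the MSE curve decays monotonically towards zero, which is what part d asks you to observe.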


Q9. Find the least-squares solution of $A\mathbf{x} = B$ where:

$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ -1 & 0 & 1 \\ 0 & -1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 \\ 1 \\ 3 \\ 4 \end{bmatrix}.$$

Q10. Consider the three data points (x, y) given as follows:

(0, 6), (1, 0), (2, 0),

which are the data obtained from a single line but have errors due to measurement.
Find a straight line that best fits these points in the sense of least-squares, and
explain the quantity that is being minimized.

[Figure: the three data points (0,6), (1,0) and (2,0) plotted on the x–y plane.]


Q9. Represent $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ as the columns of $A$. It is found that they form an orthogonal set, i.e., $\mathbf{a}_i \cdot \mathbf{a}_j = 0$ for all $i \neq j$.

The solution can thus be calculated as

$$\mathbf{x} = \begin{bmatrix} \dfrac{B\cdot\mathbf{a}_1}{\mathbf{a}_1\cdot\mathbf{a}_1} \\[6pt] \dfrac{B\cdot\mathbf{a}_2}{\mathbf{a}_2\cdot\mathbf{a}_2} \\[6pt] \dfrac{B\cdot\mathbf{a}_3}{\mathbf{a}_3\cdot\mathbf{a}_3} \end{bmatrix} = \begin{bmatrix} -\dfrac{3}{2} \\[6pt] -\dfrac{3}{2} \\[6pt] 2 \end{bmatrix}.$$

Remark: The solution is the same as $\mathbf{x} = (A^TA)^{-1}A^TB$.
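As a quick numerical check (a sketch of ours, not part of the original solution), both formulas can be evaluated in MATLAB:

% Q9: least-squares solution two ways
A = [1 0 1; 0 1 1; -1 0 1; 0 -1 1];
B = [0; 1; 3; 4];
x_proj = (A'*B)./sum(A.^2, 1)';  % B.a_i / (a_i.a_i) per column, valid because the columns are orthogonal
x_ls   = (A'*A)\(A'*B);          % normal equations; both give [-1.5; -1.5; 2]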

Q10. As given in the question, the three data points, in the form (x, y), come from a single line. However, due to measurement errors, the measured points (0, 6), (1, 0) and (2, 0) need not lie on one line; indeed, when you plot the data, you can see that they do not lie on a straight line.

The general form of a straight line is

y = mx + c,

where m and c are constants to be determined.

We are going to approximate the three data points using a straight line, so this straight line is expected to produce the following values of y:

6 = m · 0 + c  (from the data point (0, 6))
0 = m · 1 + c  (from the data point (1, 0))
0 = m · 2 + c  (from the data point (2, 0))


In the equations given above, m and c are the unknowns. Putting these equations into matrix form, we have

$$A\mathbf{x} = B$$

where

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} m \\ c \end{bmatrix}, \quad B = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix}.$$

The above is formulated as a least-squares problem, i.e.,

$$\min_{\mathbf{x}} \|A\mathbf{x} - B\|_2^2.$$

The least-squares solution is found as

$$\mathbf{x} = \begin{bmatrix} m \\ c \end{bmatrix} = (A^TA)^{-1}A^TB = \begin{bmatrix} -3 \\ 5 \end{bmatrix}.$$

Therefore the best-fit line is

$$y = mx + c = -3x + 5.$$

[Figure: the data points (0,6), (1,0), (2,0) together with the best-fit line y = −3x + 5.]

What exactly is the line y = f(x) = −3x + 5 minimizing?

The least-squares solution $\mathbf{x}$ minimizes $\|A\mathbf{x} - B\|_2^2$, which is the sum of the squares of the entries of the vector $A\mathbf{x} - B$. More specifically,

$$A\mathbf{x} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 0\times(-3)+5 \\ 1\times(-3)+5 \\ 2\times(-3)+5 \end{bmatrix} = \begin{bmatrix} f(0) \\ f(1) \\ f(2) \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \\ -1 \end{bmatrix}.$$


The entries of $A\mathbf{x}$ are the y-coordinates of the line y = −3x + 5 evaluated at the x-values of our data points, while B is the vector whose entries are the y-coordinates of those data points. $A\hat{\mathbf{x}} - B$ collects the vertical distances of the line from the data points. y = f(x) = −3x + 5 is the line that minimizes the sum of the squares of these vertical distances, i.e., it minimises the sum of the squared differences between the approximate y value given by the straight line and the desired value of y given in B.

[Figure: the line y = −3x + 5 with the vertical distances −1, 2, −1 to the data points (0,6), (1,0), (2,0) marked.]
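The whole Q10 computation fits in a few MATLAB lines (a sketch of ours):

% Q10: best-fit line through (0,6), (1,0), (2,0)
A = [0 1; 1 1; 2 1];
B = [6; 0; 0];
x = (A'*A)\(A'*B);   % returns [m; c] = [-3; 5]
r = A*x - B;         % vertical distances [-1; 2; -1]
sum(r.^2)            % the minimised quantity, equal to 6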


Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of least-squares.

(−1, 1/2), (1, −1), (2, −1/2), (3, 2).

What quantity is being minimized?

[Figure: the four data points plotted on the x–y plane.]

Q12. According to the following data, find the best approximated linear function f(x, y) in the sense of least-squares and identify the quantity being minimized.

 x    y    f(x, y)
 1    0    0
 0    1    1
-1    0    3
 0   -1    4

Q13. Here are several data points $\begin{bmatrix} x \\ y \end{bmatrix}$ given as follows:

$$\begin{bmatrix} -4 \\ -1 \end{bmatrix},
\begin{bmatrix} -3 \\ 0 \end{bmatrix},
\begin{bmatrix} -2 \\ -1.5 \end{bmatrix},
\begin{bmatrix} -1 \\ 0.5 \end{bmatrix},
\begin{bmatrix} 0 \\ 1 \end{bmatrix},
\begin{bmatrix} 1 \\ -1 \end{bmatrix},
\begin{bmatrix} 2 \\ -0.5 \end{bmatrix},
\begin{bmatrix} 3 \\ 2 \end{bmatrix},
\begin{bmatrix} 4 \\ -1 \end{bmatrix}.$$

Find a function in the form

y = B + C cos(x) + D sin(x) + E cos(2x) + F sin(2x) + G cos(3x) + H sin(3x)

that best approximates the data points in the sense of least-squares.
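No worked solution for this question is reproduced here, but the set-up follows the same recipe as the other least-squares questions: stack one row per data point and solve the normal equations. A MATLAB sketch of ours:

% Q13: trigonometric least-squares fit (set-up only)
x = (-4:4)';                                   % the nine x-values
y = [-1; 0; -1.5; 0.5; 1; -1; -0.5; 2; -1];    % the nine y-values
A = [ones(size(x)) cos(x) sin(x) cos(2*x) sin(2*x) cos(3*x) sin(3*x)];
coef = (A'*A)\(A'*y);                          % [B; C; D; E; F; G; H]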

Q14. There are three categories of machines I, II and III installed in a factory. Machines have different requirements on operating time. Machines I and II can be operated for at most 12 hours whereas machine III needs to be operated for at least 5 hours per day. This factory produces two items M and N. Both require the use of all three machines. The operating times for producing 1 unit of each item on the three machines are presented in Table 1 as follows:

Items   Machine I   Machine II   Machine III
M       1           2            1
N       2           1            1.25

Table 1: Operating time of producing one unit of each item on each machine.

The profits of items M and N are £600 and £400 per unit, respectively. What would be the most reasonable strategy to produce the items M and N so as to maximise the profit of the factory? What would be the maximum profit in this case?


Q11. The general equation for a parabola is

$$y = Bx^2 + Cx + D$$

where B, C and D are constants to be determined and the data point is in the form of (x, y).

Since the four data points, (−1, 1/2), (1, −1), (2, −1/2), (3, 2), are approximated by this parabola equation, it is expected that the value of y from the parabola equation should produce:

Data point (−1, 1/2):   1/2 = B × (−1)² + C × (−1) + D
Data point (1, −1):     −1 = B × (1)² + C × (1) + D
Data point (2, −1/2):   −1/2 = B × (2)² + C × (2) + D
Data point (3, 2):      2 = B × (3)² + C × (3) + D.

Rewriting the above equations in matrix form, we have

$$A\mathbf{x} = B$$

where

$$A = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & 1 \\ 4 & 2 & 1 \\ 9 & 3 & 1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} B \\ C \\ D \end{bmatrix}, \quad B = \begin{bmatrix} 1/2 \\ -1 \\ -1/2 \\ 2 \end{bmatrix}.$$

The above is formulated as a least-squares problem, i.e.,

$$\min_{\mathbf{x}} \|A\mathbf{x} - B\|_2^2.$$

The least-squares solution is found as

$$\mathbf{x} = \begin{bmatrix} B \\ C \\ D \end{bmatrix} = (A^TA)^{-1}A^TB = \begin{bmatrix} 53/88 \\ -379/440 \\ -41/44 \end{bmatrix}.$$

The best-fit parabola is

$$y = Bx^2 + Cx + D = \frac{53}{88}x^2 - \frac{379}{440}x - \frac{41}{44},$$

which is equivalent to

$$88y = 53x^2 - \frac{379}{5}x - 82.$$

[Figure: the four data points and the best-fit parabola 88y = 53x² − (379/5)x − 82.]

What exactly is the parabola y = f(x) minimizing?


The least-squares solution $\mathbf{x}$ minimizes $\|A\mathbf{x} - B\|_2^2$. In this case, $A\mathbf{x}$ is

$$A\mathbf{x} = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & 1 \\ 4 & 2 & 1 \\ 9 & 3 & 1 \end{bmatrix}\begin{bmatrix} 53/88 \\ -379/440 \\ -41/44 \end{bmatrix}
= \begin{bmatrix} \frac{53}{88}(-1)^2 - \frac{379}{440}(-1) - \frac{41}{44} \\[4pt] \frac{53}{88}(1)^2 - \frac{379}{440}(1) - \frac{41}{44} \\[4pt] \frac{53}{88}(2)^2 - \frac{379}{440}(2) - \frac{41}{44} \\[4pt] \frac{53}{88}(3)^2 - \frac{379}{440}(3) - \frac{41}{44} \end{bmatrix}
= \begin{bmatrix} f(-1) \\ f(1) \\ f(2) \\ f(3) \end{bmatrix}
= \begin{bmatrix} 117/220 \\ -131/110 \\ -27/110 \\ 419/220 \end{bmatrix}.$$

The entries of $A\mathbf{x}$ are the y-coordinates of the parabola $88y = 53x^2 - \frac{379}{5}x - 82$ evaluated at the x-values of our data points, while B is the vector whose entries are the y-coordinates of those data points. $A\mathbf{x} - B$ collects the vertical distances of the graph from the data points. The parabola $88y = 53x^2 - \frac{379}{5}x - 82$ minimizes the sum of the squares of these vertical distances, i.e., it minimises the sum of the squared differences between the approximate y value and the desired value of y given in B.

[Figure: the parabola and the four data points, with the vertical distances between them marked.]
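Again the coefficients are easy to confirm numerically (a sketch of ours):

% Q11: best-fit parabola
A = [1 -1 1; 1 1 1; 4 2 1; 9 3 1];
B = [1/2; -1; -1/2; 2];
x = (A'*A)\(A'*B)   % approx [0.6023; -0.8614; -0.9318] = [53/88; -379/440; -41/44]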

Q12. The general equation for a linear function with two variables is

$$f(x, y) = Bx + Cy + D,$$

where B, C and D are constants to be determined.

If we feed the provided data into the general equation given above:

Data point (1, 0):    B × 1 + C × 0 + D = 0
Data point (0, 1):    B × 0 + C × 1 + D = 1
Data point (−1, 0):   B × (−1) + C × 0 + D = 3
Data point (0, −1):   B × 0 + C × (−1) + D = 4.

Rewriting the above equations in matrix form, we have

$$A\mathbf{x} = B$$

where

$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ -1 & 0 & 1 \\ 0 & -1 & 1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} B \\ C \\ D \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \\ 3 \\ 4 \end{bmatrix}.$$


The above is formulated as a least-squares problem, i.e.,

$$\min_{\mathbf{x}} \|A\mathbf{x} - B\|_2^2.$$

It is noticed that the columns $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ of matrix A are orthogonal. Therefore, the solution can be calculated as

$$\mathbf{x} = \begin{bmatrix} \dfrac{B\cdot\mathbf{a}_1}{\mathbf{a}_1\cdot\mathbf{a}_1} \\[6pt] \dfrac{B\cdot\mathbf{a}_2}{\mathbf{a}_2\cdot\mathbf{a}_2} \\[6pt] \dfrac{B\cdot\mathbf{a}_3}{\mathbf{a}_3\cdot\mathbf{a}_3} \end{bmatrix} = \begin{bmatrix} -\dfrac{3}{2} \\[6pt] -\dfrac{3}{2} \\[6pt] 2 \end{bmatrix}.$$

Remark: The solution is the same as $\mathbf{x} = (A^TA)^{-1}A^TB$.

So the linear equation that approximates the data in the sense of least-squares is

$$f(x, y) = -\frac{3}{2}x - \frac{3}{2}y + 2.$$

Let's move one step further to investigate the quantity that is being minimized. Function f(x, y) forms the graph of a surface as shown below. The vector $A\mathbf{x}$ is:

$$A\mathbf{x} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ -1 & 0 & 1 \\ 0 & -1 & 1 \end{bmatrix}\begin{bmatrix} -3/2 \\ -3/2 \\ 2 \end{bmatrix}
= \begin{bmatrix} -\frac{3}{2}\times 1 - \frac{3}{2}\times 0 + 2 \\[4pt] -\frac{3}{2}\times 0 - \frac{3}{2}\times 1 + 2 \\[4pt] -\frac{3}{2}\times(-1) - \frac{3}{2}\times 0 + 2 \\[4pt] -\frac{3}{2}\times 0 - \frac{3}{2}\times(-1) + 2 \end{bmatrix}
= \begin{bmatrix} f(1, 0) \\ f(0, 1) \\ f(-1, 0) \\ f(0, -1) \end{bmatrix}
= \begin{bmatrix} 1/2 \\ 1/2 \\ 7/2 \\ 7/2 \end{bmatrix}.$$

The entries of this vector are the values of f(x, y) at the data points given in the table. The vector B is the vector whose entries are the desired values of f(x, y) at those points. The difference $A\mathbf{x} - B$ gives the vertical distances between the graph of the surface and the data points. The linear function obtained via the least-squares method minimizes the sum of the squares of these vertical distances, i.e., it minimises the sum of the squared differences between the predicted f(x, y) values and the desired values of f(x, y) given in B.
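Concretely (a quick check of ours), the minimised quantity for this fit is

$$\|A\mathbf{x} - B\|_2^2 = \left(\tfrac{1}{2} - 0\right)^2 + \left(\tfrac{1}{2} - 1\right)^2 + \left(\tfrac{7}{2} - 3\right)^2 + \left(\tfrac{7}{2} - 4\right)^2 = 4 \times \tfrac{1}{4} = 1.$$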

Q14. Denote the decision variables x and y as the numbers of items M and N, respectively, and let Z represent the total profit on the production.

The profit is defined as Z = 600x + 400y and the maximisation of the profit is defined as the following constrained maximisation problem:

$$\max_{x, y} Z = 600x + 400y$$

subject to

x + 2y ≤ 12       (constraint on Machine I)
2x + y ≤ 12       (constraint on Machine II)
x + (5/4)y ≥ 5    (constraint on Machine III)
x ≥ 0             (quantity of item M is non-negative)
y ≥ 0             (quantity of item N is non-negative)

As the cost and all constraints are linear, this is a linear programming problem which can be solved using the linear programming technique. Draw the graph of the constraints as follows:


x – axis
-4 -3 -2 -1 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4

y – axis

1 4

1 3

1 2

1 1

1 0

9

8

7

6

5

4

3

2

1

-1

-2

-3

-4

www.softschools.com

x – axis
-4-3-2-11234567891011121314
y – axis
14
13
12
11
10
9
8
7
6
5
4
3
2
1
-1
-2
-3
-4
www.softschools.com

The feasible region is the shaded area and A, B, C, D, E are corner-point feasible solutions (CPF solutions). Evaluate Z = 600x + 400y at these corner points.

Corner point (x, y)   Z = 600x + 400y
A: (5, 0)             3000
B: (6, 0)             3600
C: (4, 4)             4000 (maximum)
D: (0, 6)             2400
E: (0, 4)             1600

Therefore, producing 4 units of each item (x = 4 and y = 4) gives the maximum profit of £4000.
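The same answer can be obtained with MATLAB's linprog (Optimization Toolbox); a sketch of ours, where the ≥ constraint on Machine III is rewritten as ≤ by negating both sides:

% Q14: maximise 600x + 400y, i.e. minimise -600x - 400y
f = [-600; -400];
Aineq = [1 2; 2 1; -1 -1.25];    % Machine I, Machine II, and -(x + 1.25y) <= -5
bineq = [12; 12; -5];
lb = [0; 0];
xy = linprog(f, Aineq, bineq, [], [], lb)   % returns [4; 4]; profit 600*4 + 400*4 = 4000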


Q15. There are two factories located at places P and Q. The commodity produced by these two factories will be delivered to three depots situated at A, B and C. Each week, the depots at A, B and C require 5, 5 and 4 units of the commodity, while the production capacities of the factories at P and Q are 8 and 6 units, respectively. The commodities produced at both factories will all be transported. The cost of transportation varies from factory to depot; the costs of transportation per unit commodity are given in Table 2.

From/To   Cost of A   Cost of B   Cost of C
P         160         100         150
Q         100         120         100

Table 2: Cost of transportation per unit commodity.

Find the units of commodity from factories to depots so that the cost of transportation is minimum. What will be the minimum cost of transportation?

Q16. A factory wishes to produce a new type of food which has two ingredients, denoted as Ingredient I and Ingredient II. This new type of food needs to contain vitamin A and vitamin C with minimum values 8 and 10, respectively. Ingredient I contains 2 units/kg of vitamin A and 1 unit/kg of vitamin C. Ingredient II contains 1 unit/kg of vitamin A and 2 units/kg of vitamin C. The prices of Ingredient I and Ingredient II are £50/kg and £70/kg, respectively. Decide the formula (the weight of each ingredient) of this new type of food so that the cost is at a minimum.

Q17. Explain how "Recursive least-squares" works.


Q15. Denote x and y as the units of commodity transported from the factory at P to the depots at A and B, respectively.

As all produced commodities from both factories will be transported and the factory at P can produce 8 units, (8 − x − y) units will be transported from P to C.

As the number of units to be transported must be non-negative, we have the following constraints: x ≥ 0, y ≥ 0 and 8 − x − y ≥ 0.

Depot A requires 5 units of commodity weekly and x units have been provided by factory P. Thus, the remaining (5 − x) units need to be transported to depot A from the factory at Q.


Depot B requires 5 units of commodity weekly and y units have been provided by factory P. Thus, the remaining (5 − y) units need to be transported to depot B from the factory at Q.

(5 − x) units and (5 − y) units are transported from the factory at Q to depots A and B, respectively. Depot C requires 4 units of commodity weekly and the factory at Q can produce 6 units. Thus, the remaining 6 − (5 − x) − (5 − y) = x + y − 4 units need to be transported to depot C from the factory at Q.

As the number of units to be transported must be non-negative, we have another three constraints: 5 − x ≥ 0, 5 − y ≥ 0 and x + y − 4 ≥ 0.

The total transportation cost Z in terms of x and y is defined as

Z = 160x + 100y + 100(5 − x) + 120(5 − y) + 150(8 − x − y) + 100(x + y − 4)
  = 10(x − 7y + 190).
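Expanding the brackets makes the simplification explicit (an intermediate step the solution skips):

$$Z = (160 - 100 - 150 + 100)x + (100 - 120 - 150 + 100)y + (500 + 600 + 1200 - 400) = 10x - 70y + 1900 = 10(x - 7y + 190).$$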

The problem of minimising the cost of transport subject to constraints can be described as follows:

$$\min_{x, y} Z = 10(x - 7y + 190)$$

subject to

x ≥ 0        (units to be transported to depot A from factory P are non-negative)
y ≥ 0        (units to be transported to depot B from factory P are non-negative)
x + y ≤ 8    ((8 − x − y) ≥ 0: units to be transported to depot C from factory P)
x ≤ 5        ((5 − x) ≥ 0: units to be transported to depot A from factory Q)
y ≤ 5        ((5 − y) ≥ 0: units to be transported to depot B from factory Q)
x + y ≥ 4    (x + y − 4 ≥ 0: units to be transported to depot C from factory Q)

As the cost and all constraints are linear, this is a linear programming problem which can be solved using the linear programming technique. Draw the graph of the constraints as follows:


x – axis
-4 -3 -2 -1 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4

y – axis

1 4

1 3

1 2

1 1

1 0

9

8

7

6

5

4

3

2

1

-1

-2

-3

-4

www.softschools.com

x – axis
-4-3-2-11234567891011121314
y – axis
14
13
12
11
10
9
8
7
6
5
4
3
2
1
-1
-2
-3
-4
www.softschools.com


The shaded area represents the feasible region and A(0, 4), B(0, 5), C(3, 5), D(5, 3), E(5, 0) and F(4, 0) are corner-point feasible solutions (CPF solutions). The evaluation of Z at the corner points is shown below. It can be noticed that the point (0, 5) provides the minimum value 1550 of Z.

Corner point (x, y)   Z = 10(x − 7y + 190)
A: (0, 4)             1620
B: (0, 5)             1550 (minimum)
C: (3, 5)             1580
D: (5, 3)             1740
E: (5, 0)             1950
F: (4, 0)             1940

Therefore, delivering 0, 5 and 3 units from the factory at P and 5, 0 and 1 units from the factory at Q to the depots at A, B and C, respectively, minimizes the cost of transportation. The minimum cost of transportation is 1550.

Q16. Denote the decision variables x and y as the weights of Ingredient I and Ingredient II, respectively, in the new type of food.

As the weights are non-negative, we have the constraints x ≥ 0 and y ≥ 0.
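The solution is cut off at this point; for completeness, a sketch of the full linear programme read off from the question data (our own completion, not from the original):

$$\min_{x, y} Z = 50x + 70y \quad\text{subject to}\quad 2x + y \ge 8 \;(\text{vitamin A}), \quad x + 2y \ge 10 \;(\text{vitamin C}), \quad x \ge 0, \; y \ge 0.$$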





Considering the requirements for the vitamins contained in the new type of food, we have:

2x + y ≥ 8     (the total amount of vitamin A from both ingredients is 8 units or above)
x + 2y ≥ 10    (the total amount of vitamin C from both ingredients is 10 units or above)

The cost of the new type of food is the total cost of the two ingredients, where the price is £50/kg for Ingredient I and £70/kg for Ingredient II. The cost is thus defined as Z = 50x + 70y.

min_{x,y}  Z = 50x + 70y

subject to

2x + y ≥ 8     (the total amount of vitamin A from both ingredients is 8 units or above)
x + 2y ≥ 10    (the total amount of vitamin C from both ingredients is 10 units or above)
x ≥ 0          (the weight of Ingredient I is non-negative)
y ≥ 0          (the weight of Ingredient II is non-negative)

As the cost and all constraints are linear, this is again a linear programming problem which can be solved using a linear programming technique. Draw the graph of the constraints as follows:

[Figure: the constraint lines 2x + y = 8 and x + 2y = 10 plotted in the first quadrant; the feasible region lying above both lines is unbounded, with corner points A(0, 8), B(2, 4) and C(10, 0).]


As shown in the figure, the feasible region (the shaded area in deep colour) is unbounded. The corner-point feasible (CPF) solutions are A(0, 8), B(2, 4) and C(10, 0), and the evaluation of Z at each is shown in the table below.

Corner point (x, y)    Z = 50x + 70y
A: (0, 8)              560
B: (2, 4)              380 (minimum)
C: (10, 0)             500

The minimum value 380 is attained at B(2, 4). However, since the feasible region is unbounded, a further step is needed to make sure that no other point of the feasible region gives a value of Z smaller than 380. Hence, we draw the region satisfying the inequality

50x + 70y < 380,

which is equivalent to

5x + 7y < 38.

The region satisfying both this inequality and the constraints x ≥ 0 and y ≥ 0 is marked in light colour. We can observe that it does not overlap the feasible region, so no feasible point achieves a cost below 380. Therefore the point (2, 4) gives the minimum value of Z, 380: the best formula for the new type of food is 2 kg of Ingredient I and 4 kg of Ingredient II, and the minimum cost is £380.
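The same kind of numerical check as in Q14 applies here; a minimal MATLAB sketch follows (the linprog call again assumes the Optimization Toolbox, and handles the unbounded feasible region directly).

% Evaluate Z at the CPF solutions of Q15.
corners = [0 8; 2 4; 10 0];                 % A, B, C
Z = 50*corners(:,1) + 70*corners(:,2);
[Zmin, idx] = min(Z);
fprintf('min over corners: Z = %d at (%d, %d)\n', Zmin, corners(idx,1), corners(idx,2));

% Cross-check with linprog, writing the >= constraints in <= form.
f  = [50; 70];
A  = [-2 -1; -1 -2];                        % 2x+y>=8, x+2y>=10
b  = [-8; -10];
lb = [0; 0];
[xy, fval] = linprog(f, A, b, [], [], lb, []);
fprintf('linprog: (x, y) = (%.0f, %.0f), Z = %.0f\n', xy(1), xy(2), fval);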

Q16. A least-squares problem is described as

$\min_{\mathbf{x}} f(\mathbf{x}) = \| \mathbf{A}\mathbf{x} - \mathbf{B} \|_2^2.$

Its solution is given as

$\mathbf{x} = \left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{B}.$

Denote the $i$-th row of $\mathbf{A}$ as $\mathbf{a}_i^T$ and the $i$-th entry of $\mathbf{B}$ as $b_i$. The solution $\mathbf{x}$ can be represented in row form as

$\mathbf{x} = \Big(\sum_{i=1}^{m} \mathbf{a}_i\mathbf{a}_i^T\Big)^{-1} \sum_{i=1}^{m} b_i\mathbf{a}_i.$

For example,

$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1^T \\ \mathbf{a}_2^T \\ \vdots \\ \mathbf{a}_m^T \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix},$

and therefore

$\mathbf{A}^T\mathbf{A} = \sum_{i=1}^{m} \mathbf{a}_i\mathbf{a}_i^T = \mathbf{a}_1\mathbf{a}_1^T + \mathbf{a}_2\mathbf{a}_2^T + \cdots + \mathbf{a}_m\mathbf{a}_m^T.$

Suppose a new sample $\mathbf{a}_{m+1}^T = [\, a_{m+1,1} \;\; a_{m+1,2} \;\; \cdots \;\; a_{m+1,n} \,]$ with output $b_{m+1}$ arrives. The new solution can be written as

$\mathbf{x}_{new} = \Big(\sum_{i=1}^{m} \mathbf{a}_i\mathbf{a}_i^T + \mathbf{a}_{m+1}\mathbf{a}_{m+1}^T\Big)^{-1} \Big(\sum_{i=1}^{m} b_i\mathbf{a}_i + b_{m+1}\mathbf{a}_{m+1}\Big).$

At the time that you have $m$ samples, recall that the solution is

$\mathbf{x} = \mathbf{P}(m)^{-1}\mathbf{q}(m), \quad \text{where} \quad \mathbf{P}(m) = \mathbf{A}^T\mathbf{A} = \sum_{i=1}^{m} \mathbf{a}_i\mathbf{a}_i^T \quad \text{and} \quad \mathbf{q}(m) = \mathbf{A}^T\mathbf{B} = \sum_{i=1}^{m} b_i\mathbf{a}_i.$

With the new sample $\mathbf{a}_{m+1}^T$ and $b_{m+1}$, the solution becomes

$\mathbf{x}_{new} = \mathbf{P}(m+1)^{-1}\mathbf{q}(m+1),$

where

$\mathbf{P}(m+1) = \sum_{i=1}^{m} \mathbf{a}_i\mathbf{a}_i^T + \mathbf{a}_{m+1}\mathbf{a}_{m+1}^T = \mathbf{P}(m) + \mathbf{a}_{m+1}\mathbf{a}_{m+1}^T$

and

$\mathbf{q}(m+1) = \sum_{i=1}^{m} b_i\mathbf{a}_i + b_{m+1}\mathbf{a}_{m+1} = \mathbf{q}(m) + b_{m+1}\mathbf{a}_{m+1}.$

Rank-one update formula:

$\left(\mathbf{P} + \mathbf{a}\mathbf{a}^T\right)^{-1} = \mathbf{P}^{-1} - \frac{1}{1 + \mathbf{a}^T\mathbf{P}^{-1}\mathbf{a}} \left(\mathbf{P}^{-1}\mathbf{a}\right)\left(\mathbf{P}^{-1}\mathbf{a}\right)^T$

Remark: the rank-one update formula is valid when $\mathbf{P} = \mathbf{P}^T$, and $\mathbf{P}$ and $\mathbf{P} + \mathbf{a}\mathbf{a}^T$ are both invertible.

Applying it with $\mathbf{P} = \mathbf{P}(m)$ and $\mathbf{a} = \mathbf{a}_{m+1}$ gives

$\mathbf{P}(m+1)^{-1} = \left(\mathbf{P}(m) + \mathbf{a}_{m+1}\mathbf{a}_{m+1}^T\right)^{-1}$

directly from $\mathbf{P}(m)^{-1}$, so each new sample can be absorbed without recomputing $(\mathbf{A}^T\mathbf{A})^{-1}$ from scratch.
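To make the update concrete, here is a minimal MATLAB sketch of one recursive least-squares step based on the rank-one update formula above; rls_update is an illustrative helper name, not a built-in function.

function [x, Pinv, q] = rls_update(Pinv, q, a, b)
% One recursive least-squares step.
% Pinv = P(m)^{-1}, q = q(m); a (column vector) and b are the new sample a_{m+1}, b_{m+1}.
Pa   = Pinv * a;                          % P^{-1} a
Pinv = Pinv - (Pa * Pa') / (1 + a' * Pa); % rank-one update of the inverse
q    = q + b * a;                         % q(m+1) = q(m) + b_{m+1} a_{m+1}
x    = Pinv * q;                          % updated least-squares solution
end

Each call costs O(n^2) operations, instead of the O(n^3) required to recompute the full inverse for every new sample.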

Q14. There are two factories, located at P and Q. The commodity produced by these two factories is delivered to three depots situated at A, B and C. Each week, the depots at A, B and C require 5, 5 and 4 units of the commodity, while the production capacities of the factories at P and Q are 8 and 6 units, respectively. All commodities produced at the two factories are transported. The cost of transportation varies from factory to depot; the cost per unit of commodity is given in Table 2.

From/To   Cost of A   Cost of B   Cost of C
P         160         100         150
Q         100         120         100

Table 2: Cost of transportation per unit commodity.

Find the number of units of commodity to send from each factory to each depot so that the cost of transportation is minimised. What is the minimum cost of transportation?

Q15. A factory wishes to produce a new type of food from two ingredients, denoted Ingredient I and Ingredient II. The new food needs to contain at least 8 units of vitamin A and at least 10 units of vitamin C. Ingredient I contains 2 units/kg of vitamin A and 1 unit/kg of vitamin C. Ingredient II contains 1 unit/kg of vitamin A and 2 units/kg of vitamin C. The prices of Ingredient I and Ingredient II are £50/kg and £70/kg, respectively. Decide the formula (the weight of each ingredient) of the new type of food so that its cost is minimised.

Q16. Explain how “Recursive least-squares” works.

Q17. Consider a least-squares problem $\mathbf{A}\mathbf{x} = \mathbf{B}$, where

$\mathbf{A} = \begin{bmatrix} 5 & 6 \\ 5 & 8 \\ 4 & 5 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 5 \\ 2 \\ 1 \end{bmatrix}.$

Given that five new samples arrive in the following order,

$\mathbf{a}_1^T = [3 \;\; 4]$, $b_1 = 7$;
$\mathbf{a}_2^T = [10 \;\; 10]$, $b_2 = 5$;
$\mathbf{a}_3^T = [3 \;\; 8]$, $b_3 = 8$;
$\mathbf{a}_4^T = [8 \;\; 8]$, $b_4 = 2$;
$\mathbf{a}_5^T = [7 \;\; 5]$, $b_5 = 3$;

calculate the initial solution, denoted $\mathbf{x}_0$, for $\mathbf{A}\mathbf{x} = \mathbf{B}$, and update the solution using the recursive least-squares technique with the given new samples. Show the working and the solution after adding the samples one by one in the order shown above.
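The working Q17 asks for can be generated with the rls_update helper sketched under Q16; the script below is an illustrative sketch, not part of the official solutions.

% Initial batch solution x0 from the original three samples.
A = [5 6; 5 8; 4 5];
B = [5; 2; 1];
Pinv = inv(A' * A);          % P(3)^{-1}
q    = A' * B;               % q(3)
x0   = Pinv * q;
fprintf('x0 = [%.4f; %.4f]\n', x0(1), x0(2));

% Absorb the five new samples one by one via the rank-one update.
newA = [3 4; 10 10; 3 8; 8 8; 7 5];
newb = [7; 5; 8; 2; 3];
for i = 1:5
    [x, Pinv, q] = rls_update(Pinv, q, newA(i,:)', newb(i));
    fprintf('after sample %d: x = [%.4f; %.4f]\n', i, x(1), x(2));
end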
