Biologically Inspired Methods
Nature-Inspired Learning Algorithms (7CCSMBIM)
Tutorial 2: Solutions
1 
Q1. What are the advantages and disadvantages of gradient descent method?
2 
Q1. What are the advantages and disadvantages of gradient descent method?
3 
Q1. What are the advantages and disadvantages of gradient descent method?
4
https://www.cs.toronto.edu/~frossard/post/linear_regression/
4 
Q1. What are the advantages and disadvantages of gradient descent method?
5
Q1. What are the advantages and disadvantages of gradient descent method?
Advantages:
Simple
Robust and quick convergence
Tractable solution 
Disadvantages:
Works with gradient
Does not guarantee global minimum
Does not work well with discrete variables
Sensitive to initial guess
Trapped in local minimum
6 
Q2. Show how gradient descent method works using pseudo code
7 
8
oindent $mathbf{x}^* = left( left[ begin{array}{cc} 1 & 2 \ 3 & 4 end{array} right]^T left[ begin{array}{cc} 1 & 2 \ 3 & 4 end{array} right] right)^{-1} left[ begin{array}{cc} 1 & 2 \ 3 & 4 end{array} right]^T left[ begin{array}{c} 5 \ 6 end{array} right] = left[ begin{array}{c} -4 \ 4.5 end{array} right]$ \~\~
ewline
%
textbf{Verification:} $mathbf{A}mathbf{x}^{*}-mathbf{B} = mathbf{0}$?\
$left[ begin{array}{cc} 1 & 2 \ 3 & 4 end{array} right] mathbf{x}^*  left[ begin{array}{c} 5 \ 6 end{array} right] = left[ begin{array}{c} 0 \ 0 end{array} right]$ 
8
9
B
G
W
M 
x = [0, 1, 3], y = [0, 2, 4];
plot(x,y,b,MarkerSize, 12,linewidth, 1);
holdon;
plot(x,y,mo,MarkerSize, 12,linewidth, 3);
plot(0.5, 1,rx,MarkerSize, 12,linewidth, 3);
x = [0, 3], y = [0, 4];
plot(x,y,b-,linewidth, 1);
xlabel(itx);
ylabel(ity);
plot(-2, -2,gs,MarkerSize, 12,linewidth, 3);
9 
10
B
G
W
M 
x = [0, 1, 3], y = [0, 2, 4];
plot(x,y,b,MarkerSize, 12,linewidth, 1);
holdon;
plot(x,y,mo,MarkerSize, 12,linewidth, 3);
plot(0.5, 1,rx,MarkerSize, 12,linewidth, 3);
x = [0, 3], y = [0, 4];
plot(x,y,b-,linewidth, 1);
xlabel(itx);
ylabel(ity);
plot(-2, -2,gs,MarkerSize, 12,linewidth, 3);
10 
11
x = [0, 1, 3], y = [0, 2, 4];
plot(x,y,b,MarkerSize, 12,linewidth, 1);
holdon;
plot(x,y,mo,MarkerSize, 12,linewidth, 3);
plot(0.5, 1,rx,MarkerSize, 12,linewidth, 3);
x = [0, 3], y = [0, 4];
plot(x,y,b-,linewidth, 1);
xlabel(itx);
ylabel(ity);
plot(-2, -2,gs,MarkerSize, 12,linewidth, 3);
11 
12
B
G
W
M
R 
x = [0, 1, 3], y = [0, 2, 4];
plot(x,y,b,MarkerSize, 12,linewidth, 1);
holdon;
plot(x,y,mo,MarkerSize, 12,linewidth, 3);
plot(0.5, 1,rx,MarkerSize, 12,linewidth, 3);
x = [0, 3], y = [0, 4];
plot(x,y,b-,linewidth, 1);
xlabel(itx);
ylabel(ity);
plot(-2, -2,gs,MarkerSize, 12,linewidth, 3);
begin{align*}
&mathbf{B}: f(0, 0) = 0\
&mathbf{G}: f(1, 2) = 6\
&mathbf{W}: f(3, 4) = 26~(text{before replacement})\
&mathbf{R}: f(-2, -2) = 8\
&mathbf{C}: f(-0.75, -0.5) = 1.0625
end{align*}
12 
13
k = 0:
k = 1:
k = 2: 
a
b
c
d
e
f 
oindent textbf{Update rule: } $mathbf{z}_{k+1} = mathbf{z}_k  h_k triangledown f(mathbf{z}_k)$\~\
oindent 1$^{st}$ iteration: $mathbf{z}_{1} = left[ begin{array}{c} 7 \ 8 end{array} right]  0.1 left[ begin{array}{c} 2 times 7  1 \ 2 times 8 + 1 end{array} right] = left[ begin{array}{c} 5.7 \ 6.3 end{array} right]$; 
$left[ begin{array}{c} x_{k+1} \ y_{k+1} end{array} right] = left[ begin{array}{c} x_k \ y_k end{array} right] h_k left[ begin{array}{c} 2 x_k 1 \ 2 y_k + 1 end{array} right]$
$left[ begin{array}{c} x_{k+1} \ y_{k+1} end{array} right] = left[ begin{array}{c} x_k \ y_k end{array} right] h_k left[ begin{array}{c} 2 x_k 1 \ 2 y_2 + 1 end{array} right]$
oindent $f(x, y)$ = 72.7800.
\~\ 
oindent 2$^{nd}$ iteration: $mathbf{z}_{2} = left[ begin{array}{c} 5.7 \ 6.3 end{array} right] 0.1 left[ begin{array}{c} 2 times 5.7 1 \ 2 times 6.3 + 1 end{array} right] = left[ begin{array}{c} 4.66 \ 4.94 end{array} right]$;
oindent $f(x, y)$ = 46.3992. \~\
oindent 3$^{rd}$ iteration: $mathbf{z}_{3} = left[ begin{array}{c} 4.66\ 4.94 end{array} right]  0.1 left[ begin{array}{c} 2 times 4.66  1 \ 2 times 4.94 + 1 end{array} right] = left[ begin{array}{c} 3.828\ 3.852 end{array} right]$; 
oindent $f(x, y)$ = 29.5155.
frac{df(x,y)}{dx}\
frac{df(x,y)}{dy} 
13
14
k = 0:
k = 1:
k = 2: 
oindent textbf{Update rule: } $mathbf{z}_{k+1} = mathbf{z}_k  h_k triangledown f(mathbf{z}_k)$\~\
oindent 1$^{st}$ iteration: $mathbf{z}_{1} = left[ begin{array}{c} 7 \ 8 end{array} right]  0.1 left[ begin{array}{c} 2 times 7  1 \ 2 times 8 + 1 end{array} right] = left[ begin{array}{c} 5.7 \ 6.3 end{array} right]$; 
$left[ begin{array}{c} x_{k+1} \ y_{k+1} end{array} right] = left[ begin{array}{c} x_k \ y_k end{array} right] h_k left[ begin{array}{c} 2 x_k 1 \ 2 y_k + 1 end{array} right]$
$left[ begin{array}{c} x_{k+1} \ y_{k+1} end{array} right] = left[ begin{array}{c} x_k \ y_k end{array} right] h_k left[ begin{array}{c} 2 x_k 1 \ 2 y_2 + 1 end{array} right]$
oindent $f(x, y)$ = 72.7800.
\~\ 
oindent 2$^{nd}$ iteration: $mathbf{z}_{2} = left[ begin{array}{c} 5.7 \ 6.3 end{array} right] 0.1 left[ begin{array}{c} 2 times 5.7 1 \ 2 times 6.3 + 1 end{array} right] = left[ begin{array}{c} 4.66 \ 4.94 end{array} right]$;
oindent $f(x, y)$ = 46.3992. \~\
oindent 3$^{rd}$ iteration: $mathbf{z}_{3} = left[ begin{array}{c} 4.66\ 4.94 end{array} right]  0.1 left[ begin{array}{c} 2 times 4.66  1 \ 2 times 4.94 + 1 end{array} right] = left[ begin{array}{c} 3.828\ 3.852 end{array} right]$; 
oindent $f(x, y)$ = 29.5155.
frac{df(x,y)}{dx}\
frac{df(x,y)}{dy} 
14
15
Initial guess
k = 0
k = 1
k = 2 
[X,Y] = meshgrid(1:0.5:10,1:0.5:10);
Z = (X-1).*X + (Y+1).*Y;
mesh(X,Y,Z);
alpha0.5;
holdon;
plot3(7, 8, 114,ro,MarkerSize, 12,linewidth, 3);
plot3(5.7, 6.3, 72.78,rx,MarkerSize, 12,linewidth, 3);
plot3(4.66, 4.94, 46.3992,rx,MarkerSize, 12,linewidth, 3);
plot3(3.828, 3.852, 29.5155,rx,MarkerSize, 12,linewidth, 3);
xlabel(itx);
ylabel(ity);
zlabel({itf}(itx,ity)); 
15
16
oindent $mathbf{x}^* = left( left[ begin{array}{cc} 1 & 2 \ 3 & 4 end{array} right]^T left[ begin{array}{cc} 1 & 2 \ 3 & 4 end{array} right] right)^{-1} left[ begin{array}{cc} 1 & 2 \ 3 & 4 end{array} right]^T left[ begin{array}{c} 5 \ 6 end{array} right] = left[ begin{array}{c} -4 \ 4.5 end{array} right]$ \~\~
ewline
%
textbf{Verification:} $mathbf{A}mathbf{x}^{*}-mathbf{B} = mathbf{0}$?\
$left[ begin{array}{cc} 1 & 2 \ 3 & 4 end{array} right] mathbf{x}^*  left[ begin{array}{c} 5 \ 6 end{array} right] = left[ begin{array}{c} 0 \ 0 end{array} right]$ 
16
17
fbox{$h_k = frac{triangledown f(mathbf{z}_k)^T triangledown f(mathbf{z}_k)}{triangledown f(mathbf{z}_k)^T mathbf{Q} triangledown f(mathbf{z}_k)}$}
$f(x, y) = frac{1}{2} begin{bmatrix} x \ y end{bmatrix}^T begin{bmatrix} 2&0 \ 0&2 end{bmatrix} begin{bmatrix} x \ y end{bmatrix} begin{bmatrix} 1&-1 end{bmatrix} begin{bmatrix} x \ y end{bmatrix}$
textbf{Update rule: } $mathbf{z}_{k+1} = mathbf{z}_k h_k triangledown f(mathbf{z}_k)$\~\
17
18
fbox{$h_k = frac{triangledown f(mathbf{z}_k)^T triangledown f(mathbf{z}_k)}{triangledown f(mathbf{z}_k)^T mathbf{Q} triangledown f(mathbf{z}_k)}$}
18 
19
textbf{Update rule:} $mathbf{z}_{k+1} = mathbf{z}_k frac{triangledown f(mathbf{z}_k)^T triangledown f(mathbf{z}_k)}{triangledown f(mathbf{z}_k)^T mathbf{Q} triangledown f(mathbf{z}_k)} triangledown f(mathbf{z}_k)$
textbf{Update rule: } $mathbf{z}_{k+1} = mathbf{z}_k h_k triangledown f(mathbf{z}_k)$\~\
Randomly pick an initial condition as $mathbf{z}_k = left[ begin{array}{c} 1 \ -2 end{array} right]$.
begin{align*}
mathbf{z}_{k+1} &= mathbf{z}_k + frac{1}{2} triangledown f(mathbf{z}_k) \
&= left[ begin{array}{c} 1 \ -2 end{array} right]  frac{1}{2} left[ begin{array}{c} 2x_k-1 \ 2y_k+1 end{array} right] \
&= left[ begin{array}{c} 1 \ -2 end{array} right]  frac{1}{2} left[ begin{array}{c} 2times 1-1 \ 2times-2+1 end{array} right] \
&= left[ begin{array}{c} 1 \ -2 end{array} right]  frac{1}{2} left[ begin{array}{c} 1 \ -3 end{array} right] \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]
end{align*} 
Run another iteration (e.g., $k+1 rightarrow k+2$) by taking $mathbf{z}_{k+1} = left[ begin{array}{c} 0.5 \ -0.5 end{array} right]$.
begin{align*}
mathbf{z}_{k+1} &= mathbf{z}_{k+1} + frac{1}{2} triangledown f(mathbf{z}_{k+1}) \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]  frac{1}{2} left[ begin{array}{c} 2x_{k+1}-1 \ 2y_{k+1}+1 end{array} right] \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]  frac{1}{2} left[ begin{array}{c} 2times 0.5-1 \ 2times-0.5+1 end{array} right] \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]  frac{1}{2} left[ begin{array}{c} 0 \ 0 end{array} right] \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]
end{align*} 
19
20
textbf{Update rule:} $mathbf{z}_{k+1} = mathbf{z}_k frac{triangledown f(mathbf{z}_k)^T triangledown f(mathbf{z}_k)}{triangledown f(mathbf{z}_k)^T mathbf{Q} triangledown f(mathbf{z}_k)} triangledown f(mathbf{z}_k)$
Randomly pick an initial condition as $mathbf{z}_k = left[ begin{array}{c} 1 \ -2 end{array} right]$.
begin{align*}
mathbf{z}_{k+1} &= mathbf{z}_k + frac{1}{2} triangledown f(mathbf{z}_k) \
&= left[ begin{array}{c} 1 \ -2 end{array} right]  frac{1}{2} left[ begin{array}{c} 2x_k-1 \ 2y_k+1 end{array} right] \
&= left[ begin{array}{c} 1 \ -2 end{array} right]  frac{1}{2} left[ begin{array}{c} 2times 1-1 \ 2times-2+1 end{array} right] \
&= left[ begin{array}{c} 1 \ -2 end{array} right]  frac{1}{2} left[ begin{array}{c} 1 \ -3 end{array} right] \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]
end{align*} 
Run another iteration (e.g., $k+1 rightarrow k+2$) by taking $mathbf{z}_{k+1} = left[ begin{array}{c} 0.5 \ -0.5 end{array} right]$.
begin{align*}
mathbf{z}_{k+1} &= mathbf{z}_{k+1} + frac{1}{2} triangledown f(mathbf{z}_{k+1}) \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]  frac{1}{2} left[ begin{array}{c} 2x_{k+1}-1 \ 2y_{k+1}+1 end{array} right] \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]  frac{1}{2} left[ begin{array}{c} 2times 0.5-1 \ 2times-0.5+1 end{array} right] \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]  frac{1}{2} left[ begin{array}{c} 0 \ 0 end{array} right] \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]
end{align*} 
20
21
[a, b]
[c, d]
e
[f, g]
h
[i, j] 
Randomly pick an initial condition as $mathbf{z}_k = left[ begin{array}{c} 1 \ -2 end{array} right]$.
begin{align*}
mathbf{z}_{k+1} &= mathbf{z}_k + frac{1}{2} triangledown f(mathbf{z}_k) \
&= left[ begin{array}{c} 1 \ -2 end{array} right]  frac{1}{2} left[ begin{array}{c} 2x_k-1 \ 2y_k+1 end{array} right] \
&= left[ begin{array}{c} 1 \ -2 end{array} right]  frac{1}{2} left[ begin{array}{c} 2times 1-1 \ 2times-2+1 end{array} right] \
&= left[ begin{array}{c} 1 \ -2 end{array} right]  frac{1}{2} left[ begin{array}{c} 1 \ -3 end{array} right] \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]
end{align*} 
Run another iteration (e.g., $k+1 rightarrow k+2$) by taking $mathbf{z}_{k+1} = left[ begin{array}{c} 0.5 \ -0.5 end{array} right]$.
begin{align*}
mathbf{z}_{k+1} &= mathbf{z}_{k+1} + frac{1}{2} triangledown f(mathbf{z}_{k+1}) \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]  frac{1}{2} left[ begin{array}{c} 2x_{k+1}-1 \ 2y_{k+1}+1 end{array} right] \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]  frac{1}{2} left[ begin{array}{c} 2times 0.5-1 \ 2times-0.5+1 end{array} right] \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]  frac{1}{2} left[ begin{array}{c} 0 \ 0 end{array} right] \
&= left[ begin{array}{c} 0.5 \ -0.5 end{array} right]
end{align*} 
$mathbf{x}_{k+1} = left[ begin{array}{c} -1 \ -2 end{array} right] + left[ begin{array}{cc} 1 & 0 \ 0 & 1 end{array} right]left[ begin{array}{c} 0.5 \ 0.5 end{array} right]$
$mathbf{x}_{k+1} = left[ begin{array}{c} -0.5 \ -1.5 end{array} right] + left[ begin{array}{cc} 1 & 0 \ 0 & 1 end{array} right]left[ begin{array}{c} 0.5 \ 0.5 end{array} right]
21
22
Approximate
Decision variables
y = g(x_1, x_2)
hat{y} = w_1 x_1 + w_2 x_2 + b
Ideal case: $y = hat{y}$, i.e., $y hat{y} = 0$
22
23
oindent $mathbf{x}^* = left( left[ begin{array}{cc} 1 & 2 \ 3 & 4 end{array} right]^T left[ begin{array}{cc} 1 & 2 \ 3 & 4 end{array} right] right)^{-1} left[ begin{array}{cc} 1 & 2 \ 3 & 4 end{array} right]^T left[ begin{array}{c} 5 \ 6 end{array} right] = left[ begin{array}{c} -4 \ 4.5 end{array} right]$ \~\~
ewline
%
textbf{Verification:} $mathbf{A}mathbf{x}^{*}-mathbf{B} = mathbf{0}$?\
$left[ begin{array}{cc} 1 & 2 \ 3 & 4 end{array} right] mathbf{x}^*  left[ begin{array}{c} 5 \ 6 end{array} right] = left[ begin{array}{c} 0 \ 0 end{array} right]$ 
23
24
Minimise MSE subject to w1, w2 and b
Simplified notations:
frac{(hat{y}_1 y_1)^2 + (hat{y}_2 y_2)^2 + cdots + (hat{y}_M y_M)^2 }{M}
y(M) leftarrow y_m, hat{y}(M) leftarrow hat{y}_m
24
25
= frac{(hat{y}_1 y_1)^2 + (hat{y}_2 y_2)^2 + cdots + (hat{y}_M y_M)^2 }{M}
= frac{big( (w_1x_1(1) +w_2x_2(1) + b) y_1 big)^2 + big((w_1x_1(2) +w_2x_2(2) + b) y_2 big)^2 + cdots + big((w_1x_1(M) +w_2x_2(M) + b) y_M big)^2 }{M}
min_{w_1, w_2, b} frac{1}{M} displaystyle sum_{i=1}^M big( (w_1 x_1(i) + w_2 x_2(i) + b) y_i big)^2
y(M) leftarrow y_m, hat{y}(M) leftarrow hat{y}_m
Remark: The ideal set of $w_1$, $w_2$ and $b$ will lead to the cost $f(w_1, w_2, b) = 0$.
25
26
textbf{Update rule:} $mathbf{z}_{k+1} = mathbf{z}_k  h_k triangledown f(mathbf{z}_k)$ where $mathbf{z}_k = left[ begin{array}{c} w_{1_k} \ w_{2_k} \ b_k end{array} right]$
begin{align*}
triangledown f(mathbf{z}_k) &= left[ begin{array}{c} frac{partial f(mathbf{z}_k)}{partial w_1 } \ \ frac{partial f(mathbf{z}_k)}{partial w_2} \ \ frac{partial f(mathbf{z}_k)}{partial b} end{array} right] = left[ begin{array}{c} frac{1}{M} displaystyle sum_{i=1}^M 2big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_1(i) \\ frac{1}{M} displaystyle sum_{i=1}^M 2big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_2(i) \\ frac{1}{M} displaystyle sum_{i=1}^M 2big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big) end{array} right]
end{align*} 
f(w_1, w_2, b) = f(mathbf{z}) = frac{1}{M} displaystyle sum_{i=1}^M big( (w_1 x_1(i) + w_2 x_2(i) + b) y_i big)^2
26
27
textbf{Update rule:}\ begin{align*} mathbf{z}_{k+1} &= mathbf{z}_k  h_k triangledown f(mathbf{z}_k)\ Rightarrow left[ begin{array}{c} w_{1_{k+1}} \ w_{2_{k+1}} \ b_{k+1} end{array} right] &= left[ begin{array}{c} w_{1_k} \ w_{2_k} \ b_k end{array} right]  h_k left[ begin{array}{c} frac{1}{M} displaystyle sum_{i=1}^M 2big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_1(i) \\ frac{1}{M} displaystyle sum_{i=1}^M 2big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_2(i) \\ frac{1}{M} displaystyle sum_{i=1}^M 2big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big) end{array} right] \ &= left[ begin{array}{c} w_{1_k} \ w_{2_k} \ b_k end{array} right]  frac{2h_k}{M} left[ begin{array}{c} displaystyle sum_{i=1}^M big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_1(i) \\ displaystyle sum_{i=1}^M big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_2(i) \\ displaystyle sum_{i=1}^M big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big) end{array} right] end{align*}
27 
28
textbf{Update rule:}\ begin{align*} mathbf{z}_{k+1} &= mathbf{z}_k  h_k triangledown f(mathbf{z}_k)\ Rightarrow left[ begin{array}{c} w_{1_{k+1}} \ w_{2_{k+1}} \ b_{k+1} end{array} right] &= left[ begin{array}{c} w_{1_k} \ w_{2_k} \ b_k end{array} right]  h_k left[ begin{array}{c} frac{1}{M} displaystyle sum_{i=1}^M 2big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_1(i) \\ frac{1}{M} displaystyle sum_{i=1}^M 2big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_2(i) \\ frac{1}{M} displaystyle sum_{i=1}^M 2big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big) end{array} right] \ &= left[ begin{array}{c} w_{1_k} \ w_{2_k} \ b_k end{array} right]  frac{2h_k}{M} left[ begin{array}{c} displaystyle sum_{i=1}^M big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_1(i) \\ displaystyle sum_{i=1}^M big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_2(i) \\ displaystyle sum_{i=1}^M big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big) end{array} right] end{align*}
28 
29
textbf{Update rule:}\ begin{align*} mathbf{z}_{k+1} &= mathbf{z}_k  h_k triangledown f(mathbf{z}_k)\ Rightarrow left[ begin{array}{c} w_{1_{k+1}} \ w_{2_{k+1}} \ b_{k+1} end{array} right] &= left[ begin{array}{c} w_{1_k} \ w_{2_k} \ b_k end{array} right]  h_k left[ begin{array}{c} frac{1}{M} displaystyle sum_{i=1}^M 2big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_1(i) \\ frac{1}{M} displaystyle sum_{i=1}^M 2big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_2(i) \\ frac{1}{M} displaystyle sum_{i=1}^M 2big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big) end{array} right] \ &= left[ begin{array}{c} w_{1_k} \ w_{2_k} \ b_k end{array} right]  frac{2h_k}{M} left[ begin{array}{c} displaystyle sum_{i=1}^M big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_1(i) \\ displaystyle sum_{i=1}^M big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big)x_2(i) \\ displaystyle sum_{i=1}^M big( (w_1 x_1(i) + w_2 x_2(i) + b)  y_i big) end{array} right] end{align*}
29 
30
Dateset: $(1, -2, 3)$, $(2, 4, -1)$, $(3, 0, 5)$ $Rightarrow M = 3$.
$mathbf{z}_0 = mathbf{z}_0 = left[ begin{array}{c} 1 \ 1 \ 1 end{array} right] Rightarrow
w_1 = w_2 = b = 1$
begin{align*}
hat{y}_1 &= w_1 x_1(1) + w_2 x_2(1) + b = (1 times 1) + (1 times -2) + 1 = 0\
hat{y}_2 &= w_1 x_1(2) + w_2 x_2(2) + b = (1 times 2) + (1 times 4) + 1 = 7\
hat{y}_3 &= w_1 x_1(3) + w_2 x_2(3) + b = (1 times 3) + (1 times 0) + 1 = 4
end{align*} 
begin{align*}
MSE &= frac{1}{M} big( (hat{y}_1  y_1)^2 + (hat{y}_2  y_2)^2 + (hat{y}_3  y_3)^2 ) big)\
&= frac{1}{3} big( (0  3)^2 + (7  (-1))^2 + (4  5)^2 ) big)\
&= 24.6667
end{align*} 
30
31
Dateset: $(1, -2, 3)$, $(2, 4, -1)$, $(3, 0, 5)$ $Rightarrow M = 3$.
$mathbf{z}_0 = mathbf{z}_0 = left[ begin{array}{c} 1 \ 1 \ 1 end{array} right] Rightarrow
w_1 = w_2 = b = 1$
begin{align*}
hat{y}_1 &= w_1 x_1(1) + w_2 x_2(1) + b = (1 times 1) + (1 times -2) + 1 = 0\
hat{y}_2 &= w_1 x_1(2) + w_2 x_2(2) + b = (1 times 2) + (1 times 4) + 1 = 7\
hat{y}_3 &= w_1 x_1(3) + w_2 x_2(3) + b = (1 times 3) + (1 times 0) + 1 = 4
end{align*} 
begin{align*}
MSE &= frac{1}{M} big( (hat{y}_1  y_1)^2 + (hat{y}_2  y_2)^2 + (hat{y}_3  y_3)^2 ) big)\
&= frac{1}{3} big( (0  3)^2 + (7  (-1))^2 + (4  5)^2 ) big)\
&= 24.6667
end{align*} 
31
32
33
34
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
54
55
Ingredient 1
Ingredient 2
Others
Food Product
56
57
58
Q16. Explain how Recursive least-squares works. 
A least-squares problem is described as $displaystyle min_{mathbf{x}} f(mathbf{x}) = midmid mathbf{A}mathbf{x} mathbf{B} midmid_2^2$. Its solution is given as $mathbf{x} = big( mathbf{A}^Tmathbf{A} big)^{-1} mathbf{A}^T mathbf{B}$.\
Denote the $i$-th row of $mathbf{A}$ as $mathbf{a}_i^T$ and the $i$-th row of $mathbf{B}_i$ as $b_i$.\ The solution $mathbf{x}$ can be represented in row form as follows: $$mathbf{x} = Big( displaystyle sum_{i=1}^m mathbf{a}_i mathbf{a}_i^T Big)^{-1} sum_{i=1}^m b_i mathbf{a}_i.$$
For example, $$mathbf{A} = left[ begin{array}{cccc} a_{11} & a_{12} & cdots & a_{1n} \ a_{21} & a_{22} & cdots & a_{2n} \ vdots & vdots & ddots & vdots \ a_{m1} & a_{m2} & cdots & a_{mn} end{array} right] = left[ begin{array}{c} mathbf{a}_1^T \ mathbf{a}_2^T \ vdots \ mathbf{a}_m^T \ end{array} right],$$ $$mathbf{B} = left[ begin{array}{c} b_1 \ b_2 \ vdots \ b_m \ end{array} right].$$
58 
59
mathbf{A}^Tmathbf{A}
displaystyle sum_{i=1}^m mathbf{a}_i mathbf{a}_i^T
$left[ begin{array}{cccc} a_{11} & a_{12} & cdots & a_{1n} \ a_{21} & a_{22} & cdots & a_{2n} \ vdots & vdots & ddots & vdots \ a_{m1} & a_{m2} & cdots & a_{mn} end{array} right]$
$left[ begin{array}{cccc} a_{11} & a_{21} & cdots & a_{m1} \ a_{12} & a_{22} & cdots & a_{m2} \ vdots & vdots & ddots & vdots \ a_{1n} & a_{2n} & cdots & a_{mn} end{array} right]$
left[ begin{array}{c} b_1 \ b_2 \ vdots \ b_m \ end{array} right]
$mathbf{a}_1mathbf{a}_1^T + mathbf{a}_2mathbf{a}_2^T + cdots + mathbf{a}_mmathbf{a}_m^T$
59
60
What happen if a new sample is coming in? 
$$mathbf{A} = left[ begin{array}{cccc} a_{11} & a_{12} & cdots & a_{1n} \ a_{21} & a_{22} & cdots & a_{2n} \ vdots & vdots & ddots & vdots \ a_{m1} & a_{m2} & cdots & a_{mn} end{array} right] = left[ begin{array}{c} mathbf{a}_1^T \ mathbf{a}_2^T \ vdots \ mathbf{a}_m^T \ end{array} right], qquad mathbf{B} = left[ begin{array}{c} b_1 \ b_2 \ vdots \ b_m \ end{array} right].$$
$left[ begin{array}{cccc} a_{m+1,1} & a_{m+1,2} & vdots & a_{m+1,n} end{array} right] quad = quad mathbf{a}_{m+1}^T$
$b_{m+1}$
The new solution can be written as $$mathbf{x}_{new} = Big( displaystyle sum_{i=1}^m mathbf{a}_i mathbf{a}_i^T + mathbf{a}_{m+1} mathbf{a}_{m+1}^T Big)^{-1} big( sum_{i=1}^m b_i mathbf{a}_i + b_{m+1} mathbf{a}_{m+1} big).$$
60
61
At the time that you have $m$ samples, recall that the solution is: $$mathbf{x} = mathbf{P}(m)^{-1} mathbf{q}(m)$$
where $$mathbf{P}(m) = mathbf{A}^Tmathbf{A} = displaystyle sum_{i=1}^m mathbf{a}_i mathbf{a}_i^T $$ and $$mathbf{q}(m) = mathbf{A}^T mathbf{B} = sum_{i=1}^m b_i mathbf{a}_i.$$ 
With the new sample $mathbf{a}_{m+1}^T$ and and $b_{m+1}$, the solution is: $$mathbf{x}_{new} = mathbf{P}(m+1)^{-1} mathbf{q}(m+1)$$ where $$mathbf{P}(m+1) = sum_{i=1}^m mathbf{a}_i mathbf{a}_i^T + mathbf{a}_{m+1} mathbf{a}_{m+1}^T = mathbf{P}(m) + mathbf{a}_{m+1} mathbf{a}_{m+1}^T $$ and $$mathbf{q}(m+1) = sum_{i=1}^m b_i mathbf{a}_i + b_{m+1} mathbf{a}_{m+1}.$$
61 
62
When new sample comes in, do we need to compute the inverse of P(m+1)? 
Can we use the inverse of P(m) to obtain P(m+1)?
Any mathematical trick?
oindent textit{Rank one update formula:} $$big( mathbf{P} + mathbf{a}mathbf{a}^T big)^{-1} = mathbf{P}^{-1} frac{1}{1+mathbf{a}^T mathbf{P}^{-1} mathbf{a}} big( mathbf{P^{-1} mathbf{a}} big) big( mathbf{P^{-1} mathbf{a}} big)^T$$
oindenttextit{Remark:} Rank one update formula is valid when $mathbf{P} = mathbf{P}^T$, and $mathbf{P}$ and $mathbf{P} + mathbf{a}mathbf{a}^T$ are both invertible.
62 
63
oindent textit{Rank one update formula:} $$big( mathbf{P} + mathbf{a}mathbf{a}^T big)^{-1} = mathbf{P}^{-1} frac{1}{1+mathbf{a}^T mathbf{P}^{-1} mathbf{a}} big( mathbf{P}^{-1} mathbf{a} big) big( mathbf{P}^{-1} mathbf{a} big)^T$$
We have:
$$mathbf{P}(m+1) = sum_{i=1}^m mathbf{a}_i mathbf{a}_i^T + mathbf{a}_{m+1} mathbf{a}_{m+1}^T = mathbf{P}(m) + mathbf{a}_{m+1} mathbf{a}_{m+1}^T$$ 
$mathbf{P}(m+1)^{-1} equiv Big( mathbf{P}(m) + mathbf{a}_{m+1} mathbf{a}_{m+1}^T Big)^{-1}$.
63
64
65
66
67
68
69
70
Department of Informatics, Kings College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)
Tutorial 2
Q1. What are the advantages and disadvantages of gradient descent method?
Q2. Show how gradient descent method works using pseudo code.
Q3. Consider a least-squares problem, min
x
f(x) =|| AxB ||22 where A =
1 2
3 4
and
B =
5
6
. Find the optimal solution x.
Q4. Consider the minimisation problem of the function: f(x, y) = (x  1)x + (y + 1)y
using Nelder-Mead Downhill Simplex Method. 
a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and
W.
b. Find the midpoint M and f(M).
c. Find the reflection R and f(R).
d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii)
is performed in the pseudo code of Nelder-Mead downhill simplex method,
C =
W+M
2 
is used.
Q5. Considering the minimisation problem of the function: f(x, y) = (x1)x+(y+1)y
using the gradient descent method, find x and y, and f(x, y) for the first 3 iterations 
with step size hk = 0.1 and initial guess (7,8).
Q6. Consider the minimisation problem of the function: f(x, y) = (x  1)x + (y + 1)y
using the gradient descent method. 
a. Determine the optimal step size hk.
b. Show that it requires one iteration for the gradient descent method to converge
with the optimal step size hk obtained in Q6a.
Q7. Consider the minimisation problem of the function: f(x, y) = (x  1)x + (y + 1)y
using the random walk optimisation algorithm. The Threshold (for accepting the 
worse solution) is 0.75 and the random number generator will generate a repeating
sequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the first
number generated). The diagonal entries of Dk are all 1 and all the entries of hk 
are 0.5. Fill in the content of the following table.
k x
T
k f(xk) x 
T
k+1 f(xk+1) rand() xbest f(xbest) 
0 [1  2]
1 
2
3
4
5
1
0 0.5 1 1.5 2 2.5 3
x
0
0.5
1
1.5
2
2.5
3
3.5
4
y
00.5 11.5 22.5 3
x
0
0.5
1
1.5
2
2.5
3
3.5
4
y 
Department of Informatics, Kings College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)
Tutorial 2 (Suggested Solutions)
Q1. Answer can be found in lecture notes.
Q2. Answer can be found in lecture notes.
Q3. x
=
1 2
3 4
T 
1 2 
3 4
!1 
1 2 
3 4
T 
5 
6
=
4
4.5 
Verification:
1 2
3 4
x
5
6
=
0
0
Q4. a. f(0, 0) = (0 1)0 + (0 + 1)0 = 0;
f(1, 2) = (1 1)1 + (2 + 1)2 = 6;
f(3, 4) = (3 1)3 + (4 + 1)4 = 26.
B: (0, 0); G: (1, 2); W: (3, 4) 
b. M =
B+G
2 
=
(0,0)+(1,2)
2 
= (0.5, 1); f(0.5, 1) = 1.75.
c. R = 2MW = 2(0.5, 1) (3, 4) = (2,2); f(2,2) = 8.
d. As f(R) > f(G), Case (ii) is performed. 
As f(R) < f(W), W is replaced with R.C =W+M2=(2,2)+(0.5,1)2= (0.75,0.5) and f(C) = 1.0625.As f(C) < f(W), W is replaced with C.The new 3 vertices are B: (0, 0), G: (0.75,0.5) and W: (1, 2)Q5. Let z =xy. Of(z) =2x 12y + 1. According to the update rule, zk+1 =zk hkOf(zk), we have1stiteration: z1 =78 0.12 7 12 8 + 1=5.76.3; f(x, y) = 72.7800.2nditeration: z2 =5.76.3 0.12 5.7 12 6.3 + 1=4.664.94; f(x, y) = 46.3992.3rditeration: z3 =4.664.94 0.12 4.66 12 4.94 + 1=3.8283.852; f(x, y) = 29.5155.Q6. a. Update rule: Of(z) =2x 12y + 1where zk =xkyk.f(z) =12zTQz bTz where Q =2 00 2and bT=1 1.1Department of Informatics, Kings College LondonBiologically Inspired Methods (6CCS3BIM/7CCSMBIM)Tutorial 2 (Suggested Solutions)Q1. Answer can be found in lecture notes.Q2. Answer can be found in lecture notes.Q3. x= 1 23 4T 1 23 4!1 1 23 4T 56=44.5Verification:1 23 4x 56=00Q4. a. f(0, 0) = (0 1)0 + (0 + 1)0 = 0;f(1, 2) = (1 1)1 + (2 + 1)2 = 6;f(3, 4) = (3 1)3 + (4 + 1)4 = 26.B: (0, 0); G: (1, 2); W: (3, 4)b. M =B+G2=(0,0)+(1,2)2= (0.5, 1); f(0.5, 1) = 1.75.c. R = 2MW = 2(0.5, 1) (3, 4) = (2,2); f(2,2) = 8.d. As f(R) > f(G), Case (ii) is performed.
As f(R) < f(W), W is replaced with R.C =W+M2=(2,2)+(0.5,1)2= (0.75,0.5) and f(C) = 1.0625.As f(C) < f(W), W is replaced with C.The new 3 vertices are B: (0, 0), G: (0.75,0.5) and W: (1, 2)Q5. Let z =xy. Of(z) =2x 12y + 1. According to the update rule, zk+1 =zk  hkOf(zk), we have1stiteration: z1 =78 0.12 7 12 8 + 1=5.76.3; f(x, y) = 72.7800.2nditeration: z2 =5.76.3 0.12 5.7 12 6.3 + 1=4.664.94; f(x, y) = 46.3992.3rditeration: z3 =4.664.94 0.12 4.66 12 4.94 + 1=3.8283.852; f(x, y) = 29.5155.Q6. a. Update rule: Of(z) =2x 12y + 1where zk =xkyk.f(z) =12zTQz bTz where Q =2 00 2and bT=1 1.1Traditional Numerical Methods: Nelder-Mead DownhillSimplex MethodInitialise start-ing pointsEvaluation ofstarting pointsDetermineB, G, and Wf (R) < f (G)?f (B) < f (R)?Case (i): Reflextion/ExpansionReplaceW with RReflectionCompute Ewith f (E)f (E) < f (B)?ReplaceW with EExpansionReplaceW with RReflectionf (R) < f (W)?Case (ii): Contraction/ShrinkingReplaceW with RCompute Cwith f (C)f (C) < f (W)?ReplaceW with CComputeS and f (S)ReplaceW with SReplaceG with MNoYesYesNo f (B)  f (R)YesNo f (E)  f (B)Note: The new W is used from this point onYesNo f (R)  f (W)YesContractionNo f (C)  f (W)ShrinkingFigure 6: Flowchart of Nelder-Mead Downhill Simplex Method.Dr H.K. Lam (KCL) Optimisation Biologically Inspired Methods 2017-18 24 / 68TraditionalNumericalMethods:Nelder-MeadDownhillSimplexMethodInitialisestart-ingpointsEvaluationofstartingpointsDetermineB,G,andWf(R)
As f(R) < f(W), W is replaced with R.C =W+M2=(2,2)+(0.5,1)2= (0.75,0.5) and f(C) = 1.0625.As f(C) < f(W), W is replaced with C.The new 3 vertices are B: (0, 0), G: (0.75,0.5) and W: (1, 2)Q5. Let z =xy. Of(z) =2x 12y + 1. According to the update rule, zk+1 =zk hkOf(zk), we have1stiteration: z1 =78 0.12 7 12 8 + 1=5.76.3; f(x, y) = 72.7800.2nditeration: z2 =5.76.3 0.12 5.7 12 6.3 + 1=4.664.94; f(x, y) = 46.3992.3rditeration: z3 =4.664.94 0.12 4.66 12 4.94 + 1=3.8283.852; f(x, y) = 29.5155.Q6. a. Update rule: Of(z) =2x 12y + 1where zk =xkyk.f(z) =12zTQz bTz where Q =2 00 2and bT=1 1.1-2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3x-2-101234y-2-1.5-1-0.500.511.522.53x-2-101234yDepartment of Informatics, Kings College LondonBiologically Inspired Methods (6CCS3BIM/7CCSMBIM)Tutorial 2Q1. What are the advantages and disadvantages of gradient descent method?Q2. Show how gradient descent method works using pseudo code.Q3. Consider a least-squares problem, minxf(x) =|| AxB ||22 where A =1 23 4andB =56. Find the optimal solution x.Q4. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing Nelder-Mead Downhill Simplex Method.a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G andW.b. Find the midpoint M and f(M).c. Find the reflection R and f(R).d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii)is performed in the pseudo code of Nelder-Mead downhill simplex method,C =W+M2is used.Q5. Considering the minimisation problem of the function: f(x, y) = (x1)x+(y+1)yusing the gradient descent method, find x and y, and f(x, y) for the first 3 iterationswith step size hk = 0.1 and initial guess (7,8).Q6. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing the gradient descent method.a. Determine the optimal step size hk.b. Show that it requires one iteration for the gradient descent method to convergewith the optimal step size hk obtained in Q6a.Q7. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing the random walk optimisation algorithm. The Threshold (for accepting theworse solution) is 0.75 and the random number generator will generate a repeatingsequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the firstnumber generated). The diagonal entries of Dk are all 1 and all the entries of hkare 0.5. Fill in the content of the following table.k xTk f(xk) xTk+1 f(xk+1) rand() xbest f(xbest)0 [1 2]123451Department of Informatics, Kings College LondonBiologically Inspired Methods (6CCS3BIM/7CCSMBIM)Tutorial 2 (Suggested Solutions)Q1. Answer can be found in lecture notes.Q2. Answer can be found in lecture notes.Q3. x= 1 23 4T 1 23 4!1 1 23 4T 56=44.5Verification:1 23 4x 56=00Q4. a. f(0, 0) = (0 1)0 + (0 + 1)0 = 0;f(1, 2) = (1 1)1 + (2 + 1)2 = 6;f(3, 4) = (3 1)3 + (4 + 1)4 = 26.B: (0, 0); G: (1, 2); W: (3, 4)b. M =B+G2=(0,0)+(1,2)2= (0.5, 1); f(0.5, 1) = 1.75.c. R = 2MW = 2(0.5, 1) (3, 4) = (2,2); f(2,2) = 8.d. As f(R) > f(G), Case (ii) is performed.
As f(R) < f(W), W is replaced with R.C =W+M2=(2,2)+(0.5,1)2= (0.75,0.5) and f(C) = 1.0625.As f(C) < f(W), W is replaced with C.The new 3 vertices are B: (0, 0), G: (0.75,0.5) and W: (1, 2)Q5. Let z =xy. Of(z) =2x 12y + 1. According to the update rule, zk+1 =zk hkOf(zk), we have1stiteration: z1 =78 0.12 7 12 8 + 1=5.76.3; f(x, y) = 72.7800.2nditeration: z2 =5.76.3 0.12 5.7 12 6.3 + 1=4.664.94; f(x, y) = 46.3992.3rditeration: z3 =4.664.94 0.12 4.66 12 4.94 + 1=3.8283.852; f(x, y) = 29.5155.Q6. a. Update rule: Of(z) =2x 12y + 1where zk =xkyk.f(z) =12zTQz bTz where Q =2 00 2and bT=1 1.1df(x, y)dxdf(x, y)dy010508 10100f(x,y)6 8150y6x42004220 001050810100f(x,y)68150y6x420042200Department of Informatics, Kings College LondonBiologically Inspired Methods (6CCS3BIM/7CCSMBIM)Tutorial 2Q1. What are the advantages and disadvantages of gradient descent method?Q2. Show how gradient descent method works using pseudo code.Q3. Consider a least-squares problem, minxf(x) =|| AxB ||22 where A =1 23 4andB =56. Find the optimal solution x.Q4. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing Nelder-Mead Downhill Simplex Method.a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G andW.b. Find the midpoint M and f(M).c. Find the reflection R and f(R).d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii)is performed in the pseudo code of Nelder-Mead downhill simplex method,C =W+M2is used.Q5. Considering the minimisation problem of the function: f(x, y) = (x1)x+(y+1)yusing the gradient descent method, find x and y, and f(x, y) for the first 3 iterationswith step size hk = 0.1 and initial guess (7,8).Q6. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing the gradient descent method.a. Determine the optimal step size hk.b. Show that it requires one iteration for the gradient descent method to convergewith the optimal step size hk obtained in Q6a.Q7. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing the random walk optimisation algorithm. The Threshold (for accepting theworse solution) is 0.75 and the random number generator will generate a repeatingsequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the firstnumber generated). The diagonal entries of Dk are all 1 and all the entries of hkare 0.5. Fill in the content of the following table.k xTk f(xk) xTk+1 f(xk+1) rand() xbest f(xbest)0 [1 2]123451Department of Informatics, Kings College LondonBiologically Inspired Methods (6CCS3BIM/7CCSMBIM)Tutorial 2Q1. What are the advantages and disadvantages of gradient descent method?Q2. Show how gradient descent method works using pseudo code.Q3. Consider a least-squares problem, minxf(x) =|| AxB ||22 where A =1 23 4andB =56. Find the optimal solution x.Q4. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing Nelder-Mead Downhill Simplex Method.a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G andW.b. Find the midpoint M and f(M).c. Find the reflection R and f(R).d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii)is performed in the pseudo code of Nelder-Mead downhill simplex method,C =W+M2is used.Q5. Considering the minimisation problem of the function: f(x, y) = (x1)x+(y+1)yusing the gradient descent method, find x and y, and f(x, y) for the first 3 iterationswith step size hk = 0.1 and initial guess (7,8).Q6. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing the gradient descent method.a. Determine the optimal step size hk.b. Show that it requires one iteration for the gradient descent method to convergewith the optimal step size hk obtained in Q6a.Q7. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing the random walk optimisation algorithm. The Threshold (for accepting theworse solution) is 0.75 and the random number generator will generate a repeatingsequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the firstnumber generated). The diagonal entries of Dk are all 1 and all the entries of hkare 0.5. Fill in the content of the following table.k xTk f(xk) xTk+1 f(xk+1) rand() xbest f(xbest)0 [1 2]123451Department of Informatics, Kings College LondonBiologically Inspired Methods (6CCS3BIM/7CCSMBIM)Tutorial 2Q1. What are the advantages and disadvantages of gradient descent method?Q2. Show how gradient descent method works using pseudo code.Q3. Consider a least-squares problem, minxf(x) =|| AxB ||22 where A =1 23 4andB =56. Find the optimal solution x.Q4. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing Nelder-Mead Downhill Simplex Method.a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G andW.b. Find the midpoint M and f(M).c. Find the reflection R and f(R).d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii)is performed in the pseudo code of Nelder-Mead downhill simplex method,C =W+M2is used.Q5. Considering the minimisation problem of the function: f(x, y) = (x1)x+(y+1)yusing the gradient descent method, find x and y, and f(x, y) for the first 3 iterationswith step size hk = 0.1 and initial guess (7,8).Q6. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing the gradient descent method.a. Determine the optimal step size hk.b. Show that it requires one iteration for the gradient descent method to convergewith the optimal step size hk obtained in Q6a.Q7. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing the random walk optimisation algorithm. The Threshold (for accepting theworse solution) is 0.75 and the random number generator will generate a repeatingsequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the firstnumber generated). The diagonal entries of Dk are all 1 and all the entries of hkare 0.5. Fill in the content of the following table.k xTk f(xk) xTk+1 f(xk+1) rand() xbest f(xbest)0 [1 2]123451Department of Informatics, Kings College LondonBiologically Inspired Methods (6CCS3BIM/7CCSMBIM)Tutorial 2 (Suggested Solutions)Q1. Answer can be found in lecture notes.Q2. Answer can be found in lecture notes.Q3. x= 1 23 4T 1 23 4!1 1 23 4T 56=44.5Verification:1 23 4x 56=00Q4. a. f(0, 0) = (0 1)0 + (0 + 1)0 = 0;f(1, 2) = (1 1)1 + (2 + 1)2 = 6;f(3, 4) = (3 1)3 + (4 + 1)4 = 26.B: (0, 0); G: (1, 2); W: (3, 4)b. M =B+G2=(0,0)+(1,2)2= (0.5, 1); f(0.5, 1) = 1.75.c. R = 2MW = 2(0.5, 1) (3, 4) = (2,2); f(2,2) = 8.d. As f(R) > f(G), Case (ii) is performed.
As f(R) < f(W), W is replaced with R.C =W+M2=(2,2)+(0.5,1)2= (0.75,0.5) and f(C) = 1.0625.As f(C) < f(W), W is replaced with C.The new 3 vertices are B: (0, 0), G: (0.75,0.5) and W: (1, 2)Q5. Let z =xy. Of(z) =2x 12y + 1. According to the update rule, zk+1 =zk hkOf(zk), we have1stiteration: z1 =78 0.12 7 12 8 + 1=5.76.3; f(x, y) = 72.7800.2nditeration: z2 =5.76.3 0.12 5.7 12 6.3 + 1=4.664.94; f(x, y) = 46.3992.3rditeration: z3 =4.664.94 0.12 4.66 12 4.94 + 1=3.8283.852; f(x, y) = 29.5155.Q6. Update rule: Of(z) =2x 12y + 1where zk =xkyk.f(z) =12zTQz bTz where Q =2 00 2and bT=1 1.1f(x, y) = 12xyT 2 00 2 xy1 1 xy<latexit sha1_base64=”OCzHItBfbx0kYQOereEJCFZ7Mfw=”>
A
A 
A
C 
w
H 
i
c 
n
V 
F
d 
S
+ 
N
A 
F
J 
1
E 
X
b 
W
6 
a
9 
V
H 
X
y 
5
b 
t
7 
i
w 
l
i 
Q
I 
+
i 
K
I 
v
v 
i
o 
Y
F 
V
o 
u
m 
U
y 
v
W 
k
H 
J
5 
M
4 
M
5 
H
G 
k
D 
/
p 
i
+ 
y
/ 
2
U 
k
t 
s
t 
p
9 
E
A 
8
M 
H
M 
4
9 
9
2 
P
u 
j
T 
L
B 
t
f 
G
8 
P
4 
6
7 
s
L 
j
0 
Z
X 
l
l 
t
b 
G
2 
/
v 
X
b 
R
n 
N
z 
6
1 
q
n 
u
W 
L
Y 
Z
a 
l
I 
1
W 
1
E 
N
Q 
o
u 
s
W 
u
4 
E
X 
i
b 
K
a 
R
J 
J
P 
A
m 
u
j 
u
r 
4
z 
c
P 
q
D 
R
P 
5
Z 
U
p 
M
u 
w
n 
d
C 
R
5 
z
B 
k
1 
V
h 
o
0 
n
3 
f
j 
v
c 
k
v 
K
H 
7
C 
M
Y 
S
x 
o
q 
z
0 
q
z 
K
o 
I
I 
x
w 
x
G 
U
Z 
J
d 
Q
o 
P
q 
l
g 
A
m 
E
I 
B
Y 
Q
o 
h
6 
/
i 
7
6 
s
5 
W
9 
D
2 
a
q 
P
X 
D
t 
5
a 
P
1 
Q
P 
9
u 
d
s 
f
n 
v
f 
/
0 
S
l 
3
c 
a
g 
2
f 
I
6 
3
h 
Q
w 
T
/ 
w
Z 
a
Z 
E
Z 
L
g 
b
N 
p
3 
C
Y 
s
j 
x
B 
a
Z 
i
g 
W
v 
e
O 
M
t 
M
v 
q
T 
K
c 
C
a 
w
a 
Y
a 
4
x 
o
+ 
y
O 
j
r 
B
n 
q
a 
Q
J 
6
n 
4
5 
3
X 
8
F 
P
6 
w
y 
h
D 
h
V 
9
k 
k
D 
U
/ 
X
f 
j
J 
I
m 
W
h 
d
J 
Z
J 
1
2 
w
L 
F
+ 
H
6 
v
F 
/
8 
V
6 
u
Y 
m
P 
+
i 
W
X 
W
W 
5
Q 
s
p 
d
G 
c
S 
7
A 
p
F 
A
f 
E
4 
Z
c 
I
T 
O
i 
s
I 
Q
y 
x
e 
2
s 
w
M 
b
U 
X
t 
H
Y 
k
z 
f
q 
J
f 
j
v 
v
z 
x
P 
r
o 
O
O 
f
9 
A
5 
u
A 
x
a 
J
6 
e
z 
d
a 
y
Q 
H
f 
K
d 
7
B 
G
f 
H
J 
I
T 
c
k 
4
u 
S
J 
c
w 
5
9 
h
h 
j
n 
A
S 
9
9 
Q
d 
u
6 
l
7 
/
2 
J
1 
n
V 
n
O 
N
n 
k
D 
9
/ 
E
v 
j
G 
T
Z 
f
w 
=
= 
</latexit>
Department of Informatics, Kings College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM) 
Tutorial 2 (Suggested Solutions)
Q1. Answer can be found in lecture notes.
Q2. Answer can be found in lecture notes.
Q3. x =
1 2
3 4 
T 
1 2
3 4 
!1 
1 2
3 4 
T 
5
6 
=
4
4.5 
Verification:
1 2
3 4 
x
5
6 
=
0
0 
Q4. a. f(0, 0) = (0 1)0 + (0 + 1)0 = 0;
f(1, 2) = (1 1)1 + (2 + 1)2 = 6;
f(3, 4) = (3 1)3 + (4 + 1)4 = 26.
B: (0, 0); G: (1, 2); W: (3, 4) 
b. M = B+G
2 
= (0,0)+(1,2)
2 
= (0.5, 1); f(0.5, 1) = 1.75.
c. R = 2MW = 2(0.5, 1) (3, 4) = (2,2); f(2,2) = 8.
d. As f(R) > f(G), Case (ii) is performed. 
As f(R) < f(W), W is replaced with R.C = W+M2= (2,2)+(0.5,1)2= (0.75,0.5) and f(C) = 1.0625.As f(C) < f(W), W is replaced with C.The new 3 vertices are B: (0, 0), G: (0.75,0.5) and W: (1, 2)Q5. Let z =xy. Of(z) =2x 12y + 1. According to the update rule, zk+1 =zk hkOf(zk), we have1st iteration: z1 =78 0.12 7 12 8 + 1=5.76.3; f(x, y) = 72.7800.2nd iteration: z2 =5.76.3 0.12 5.7 12 6.3 + 1=4.664.94; f(x, y) = 46.3992.3rd iteration: z3 =4.664.94 0.12 4.66 12 4.94 + 1=3.8283.852; f(x, y) = 29.5155.Q6. a. Update rule: Of(z) =2x 12y + 1where zk =xkyk.f(z) = 12zTQz bTz where Q =2 00 2and bT =1 1.1hk =Of(zk)TOf(zk)Of(zk)TQOf(zk)where Of(zk) =2xk 12yk + 1.hk =Of(zk)TOf(zk)Of(zk)TQOf(zk)=2xk 1 2yk + 1 2xk 12yk + 12xk 1 2yk + 1 2 00 2 2xk 12yk + 1=4x2k 4xk + 2 + 4y2k + 4yk8x2k 8xk + 4 + 8y2k + 8yk=12.Q7. Update rule: xk+1 = xk + Dkhk where xk =xkyk, Dk =1 00 1and hk =0.50.5.k xTk f(xk) xTk+1 f(xk+1) rand() xbest f(xbest)0 [1 2] 4 [0.5 1.5] 1.5 [0.5 1.5] 1.51 [0.5 1.5] 1.5 [0 1] 0 [0 1] 02 [0 1] 0 [0.5 0.5] 0.5 [0.5 0.5] 0.53 [0.5 0.5] 0.5 [1 0] 0 0.8 [0.5 0.5] 0.54 [1 0] 0 [1.5 0.5] 1.5 0.7 [0.5 0.5] 0.55 [1 0] 0 [1.5 0.5] 1.5 0.2 [0.5 0.5] 0.52Department of Informatics, Kings College LondonBiologically Inspired Methods (6CCS3BIM/7CCSMBIM)Tutorial 2Q1. What are the advantages and disadvantages of gradient descent method?Q2. Show how gradient descent method works using pseudo code.Q3. Consider a least-squares problem, minxf(x) =|| AxB ||22 where A =1 23 4andB =56. Find the optimal solution x.Q4. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing Nelder-Mead Downhill Simplex Method.a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G andW.b. Find the midpoint M and f(M).c. Find the reflection R and f(R).d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii)is performed in the pseudo code of Nelder-Mead downhill simplex method,C =W+M2is used.Q5. Considering the minimisation problem of the function: f(x, y) = (x1)x+(y+1)yusing the gradient descent method, find x and y, and f(x, y) for the first 3 iterationswith step size hk = 0.1 and initial guess (7,8).Q6. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing the gradient descent method.a. Determine the optimal step size hk.b. Show that it requires one iteration for the gradient descent method to convergewith the optimal step size hk obtained in Q6a.Q7. Consider the minimisation problem of the function: f(x, y) = (x 1)x + (y + 1)yusing the random walk optimisation algorithm. The Threshold (for accepting theworse solution) is 0.75 and the random number generator will generate a repeatingsequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the firstnumber generated). The diagonal entries of Dk are all 1 and all the entries of hkare 0.5. Fill in the content of the following table.k xTk f(xk) xTk+1 f(xk+1) rand() xbest f(xbest)0 [1 2]123451Traditional Numerical Methods: Random WalkAlgorithm: Random Walk Optimisationinput: f (x) : n!; x0: an initial solutionoutput: x, a local minimum of the cost function f (x)k 0; xbest = x0; Threshold ;while STOP-CRIT and k < kmax doGenerate Dk and hk;xk+1 xk +Dkhk;If f (xk+1) > f (xk)
If rand < Threshold Then xk+1 xk;ElseIf f (xbest) > f (xk+1) Then xbest xk+1;
End 
k k+1;
end 
return xk and xbest;
Table 3: Pseudo Code of Random Walk Optimisation.
Dr H.K. Lam (KCL) Optimisation Biologically Inspired Methods 2018-19 46 / 68
TraditionalNumericalMethods:RandomWalkAlgorithm:RandomWalkOptimisation
input:f(x):
n
!;x
0
:aninitialsolution
output:x
,alocalminimumofthecostfunctionf(x)
k 0;x
best
=x
0
;Threshold;
whileSTOP-CRITandk
 k 
 ) 
 Ifrand 
k+1
)Thenx
best
x
k+1
;
End
k k+1;
end
returnx
k
andx
best
;
Table3:PseudoCodeofRandomWalkOptimisation.
DrH.K.Lam(KCL) Optimisation BiologicallyInspiredMethods2018-1946/68 
Department of Informatics, Kings College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)
Tutorial 2
Q1. What are the advantages and disadvantages of gradient descent method?
Q2. Show how gradient descent method works using pseudo code.
Q3. Consider a least-squares problem, min
x
f(x) =|| AxB ||22 where A =
1 2
3 4
and
B =
5
6
. Find the optimal solution x.
Q4. Consider the minimisation problem of the function: f(x, y) = (x  1)x + (y + 1)y
using Nelder-Mead Downhill Simplex Method. 
a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and
W.
b. Find the midpoint M and f(M).
c. Find the reflection R and f(R).
d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii)
is performed in the pseudo code of Nelder-Mead downhill simplex method,
C =
W+M
2 
is used.
Q5. Considering the minimisation problem of the function: f(x, y) = (x1)x+(y+1)y
using the gradient descent method, find x and y, and f(x, y) for the first 3 iterations 
with step size hk = 0.1 and initial guess (7,8).
Q6. Consider the minimisation problem of the function: f(x, y) = (x  1)x + (y + 1)y
using the gradient descent method. 
a. Determine the optimal step size hk.
b. Show that it requires one iteration for the gradient descent method to converge
with the optimal step size hk obtained in Q6a.
Q7. Consider the minimisation problem of the function: f(x, y) = (x  1)x + (y + 1)y
using the random walk optimisation algorithm. The Threshold (for accepting the 
worse solution) is 0.75 and the random number generator will generate a repeating
sequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the first
number generated). The diagonal entries of Dk are all 1 and all the entries of hk 
are 0.5. Fill in the content of the following table.
k x
T
k f(xk) x 
T
k+1 f(xk+1) rand() xbest f(xbest) 
0 [1  2]
1 
2
3
4
5
1
hk =
Of(zk)TOf(zk)
Of(zk)TQOf(zk) 
where Of(zk) =
2xk  1
2yk + 1 
.
hk =
Of(zk)TOf(zk)
Of(zk)TQOf(zk) 
=
2xk 1 2yk + 1
2xk  1
2yk + 1 
2xk 1 2yk + 1
2 0
0 2
2xk  1
2yk + 1 
=
4x
2
k  4xk + 2 + 4y 
2
k + 4yk 
8x
2
k  8xk + 4 + 8y 
2
k + 8yk 
=
1
2
.
b. Update rule: zk+1 = zk 
Of(zk)TOf(zk)
Of(zk)TQOf(zk) 
Of(zk)
Randomly pick an initial condition as zk =
1
2
.
zk+1 = zk +
1 
2
Of(zk)
=
1
2
1
2
2xk  1
2yk + 1 
=
1
2
1
2
2 1 1
22 + 1 
=
1
2
1
2
1
3
=
0.5
0.5
Run another iteration (e.g., k + 1 ! k + 2) by taking zk+1 =
0.5
0.5
.
zk+1 = zk+1 +
1 
2
Of(zk+1)
=
0.5
0.5
1
2
2xk+1  1
2yk+1 + 1 
=
0.5
0.5
1
2
2 0.5 1
20.5 + 1 
=
0.5
0.5
1
2
0
0
=
0.5
0.5
Q7. Update rule: xk+1 = xk + Dkhk where xk =
xk
yk
, Dk =
1 0
0 1
and hk =
0.5
0.5
.
2
hk =
Of(zk)TOf(zk)
Of(zk)TQOf(zk) 
where Of(zk) =
2xk  1
2yk + 1 
.
hk =
Of(zk)TOf(zk)
Of(zk)TQOf(zk) 
=
2xk 1 2yk + 1
2xk  1
2yk + 1 
2xk 1 2yk + 1
2 0
0 2
2xk  1
2yk + 1 
=
4x
2
k  4xk + 2 + 4y 
2
k + 4yk 
8x
2
k  8xk + 4 + 8y 
2
k + 8yk 
=
1
2
.
b. Update rule: zk+1 = zk 
Of(zk)TOf(zk)
Of(zk)TQOf(zk) 
Of(zk)
Randomly pick an initial condition as zk =
1
2
.
zk+1 = zk +
1 
2
Of(zk)
=
1
2
1
2
2xk  1
2yk + 1 
=
1
2
1
2
2 1 1
22 + 1 
=
1
2
1
2
1
3
=
0.5
0.5
Run another iteration (e.g., k + 1 ! k + 2) by taking zk+1 =
0.5
0.5
.
zk+1 = zk+1 +
1 
2
Of(zk+1)
=
0.5
0.5
1
2
2xk+1  1
2yk+1 + 1 
=
0.5
0.5
1
2
2 0.5 1
20.5 + 1 
=
0.5
0.5
1
2
0
0
=
0.5
0.5
Q7. Update rule: xk+1 = xk + Dkhk where xk =
xk
yk
, Dk =
1 0
0 1
and hk =
0.5
0.5
.
2
f(0.5,1.5) = 1.5
<latexit sha1_base64=”5BnV9jHZb8MQGIhUrqY2qCsOWU8=”>
A
A 
A
B 
/
H 
i
c 
b
V 
D
L 
S
g 
M
x 
F
M 
3
U 
V
x 
1
f 
o
1 
2
6 
C
R 
a
l 
g
h 
1
m 
p
M 
V
u 
h
K 
I
b 
l
x 
X
s 
A
9 
q
h 
Z
N 
J
M 
G
5 
p
5 
k
G 
S
E 
Y
a 
i
/ 
4
C 
e
4 
c
a 
G
I 
W
z 
/
A 
T
3 
A
n 
+
B 
v
u 
T
R 
8
L 
b
T 
1
w 
u
Y 
d
z 
7
i 
U
3 
x
4 
0
Y 
F
d 
K
y 
P
r 
X
M 
0
v 
L
K 
6
l 
p
2 
X
d 
/
Y 
3
N 
r
e 
M
X 
b
3 
G
i 
K
M 
O
S 
Z
1 
H
L 
K
Q 
t
1 
w
k 
C
K 
M
B 
q
U 
s
q 
G
W 
l
F 
n
C 
D
f 
Z
a 
T
p 
D
i 
/
H 
f
v 
O
W 
c
E 
H
D 
4
E 
Y
m 
E
X 
F
8 
1
A 
+
o 
R
z 
G
S 
S
u 
o
a 
O
a 
9
Q 
t
M 
z
y 
C
S 
z
a 
Z
v 
k
Y 
n
k 
P
V 
u
k 
b
e 
M
q 
0
J 
4
C 
K
x 
Z
y 
R
f 
P
f 
o
q 
f
L 
/
f 
d
2 
p
d 
4
6 
P
T 
C
3 
H
s 
k
0 
B
i 
h
o 
R
o 
V
y 
L
p 
p
I 
h
L 
i
h 
k
Z 
6
Z 
1
Y 
k
A 
j
h 
I
e 
q
T 
t
q 
I
B 
8
o 
l
w 
0
s 
n
t 
I
3 
i
o 
l
B 
7
0 
Q
q 
4
q 
k
H 
C
i 
/
t 
5
I 
k
S 
9
E 
4
r 
t
q 
0
k 
d
y 
I
O 
a
9 
s
f 
i
f 
1
4 
6
l 
V
3 
F
S 
G
k 
S
x 
J
A 
G
e 
P
u 
T
F 
D
M 
o
Q 
j
o 
O
A 
P
c 
o
J 
l
i 
x
R 
B
G 
F
O 
1
a 
0
Q 
D
x 
B
H 
W
K 
q
4 
d
F 
2
F 
Y
M 
9
/ 
e
Z 
E
0 
T
k 
2
7 
Z
J 
a
u 
7
X 
z
1 
A
k 
y
R 
B
f 
v
g 
A
B 
S
A 
D
c 
5
A 
F
V 
y
B 
G
q 
g
D 
D
B 
L
w 
A
J 
7
A 
s
3 
a
n 
P
W 
o
v 
2
u 
t
0 
N
K 
P
N 
d
n 
L
g 
D
7 
S
3 
H
2 
A
0 
l
O 
8
= 
</latexit>
Department of Informatics, Kings College London
Biologically Inspired Methods (6CCS3BIM/7CCSMBIM)
Tutorial 2
Q1. What are the advantages and disadvantages of gradient descent method?
Q2. Show how gradient descent method works using pseudo code.
Q3. Consider a least-squares problem, min
x
f(x) =|| AxB ||22 where A =
1 2
3 4
and
B =
5
6
. Find the optimal solution x.
Q4. Consider the minimisation problem of the function: f(x, y) = (x  1)x + (y + 1)y
using Nelder-Mead Downhill Simplex Method. 
a. Starting with 3 vertices (0, 0), (1, 2), (3, 4), determine the points B, G and
W.
b. Find the midpoint M and f(M).
c. Find the reflection R and f(R).
d. Determine the 3 vertices (B, G and W) in the next iteration. If Case (ii)
is performed in the pseudo code of Nelder-Mead downhill simplex method,
C =
W+M
2 
is used.
Q5. Considering the minimisation problem of the function: f(x, y) = (x1)x+(y+1)y
using the gradient descent method, find x and y, and f(x, y) for the first 3 iterations 
with step size hk = 0.1 and initial guess (7,8).
Q6. Consider the minimisation problem of the function: f(x, y) = (x  1)x + (y + 1)y
using the gradient descent method. 
a. Determine the optimal step size hk.
b. Show that it requires one iteration for the gradient descent method to converge
with the optimal step size hk obtained in Q6a.
Q7. Consider the minimisation problem of the function: f(x, y) = (x  1)x + (y + 1)y
using the random walk optimisation algorithm. The Threshold (for accepting the 
worse solution) is 0.75 and the random number generator will generate a repeating
sequence of {0.8, 0.7, 0.2, 0.6, 0.9, 0.7, 0.5, 0.6} (The left-most number is the first
number generated). The diagonal entries of Dk are all 1 and all the entries of hk 
are 0.5. Fill in the content of the following table.
k x
T
k f(xk) x 
T
k+1 f(xk+1) rand() xbest f(xbest) 
0 [1  2]
1 
2
3
4
5
1
k x
T
k f(xk) x 
T
k+1 f(xk+1) rand() xbest f(xbest) 
0 [1  2] 4 [0.5  1.5] 1.5  [0.5  1.5] 1.5
1 [0.5  1.5] 1.5 [0  1] 0  [0  1] 0
2 [0  1] 0 [0.5  0.5] 0.5  [0.5  0.5] 0.5
3 [0.5  0.5] 0.5 [1 0] 0 0.8 [0.5  0.5] 0.5
4 [1 0] 0 [1.5 0.5] 1.5 0.7 [0.5  0.5] 0.5
5 [1 0] 0 [1.5 0.5] 1.5 0.2 [0.5  0.5] 0.5 
Q8. a. Mean squared error:
1
M 
MX
i=1
(yi yi)2
Minimisation problem: min
w1,w2,b
1
M
MX
i=1
(yi yi)2
b.
f(w1, w2, b) =
1 
M
MX
i=1
(yi yi)2
=
1
M
MX
i=1
(w1x1(i) + w2x2(i) + b) yi
2
Update rule: zk+1 = zk hkOf(zk) where zk =
2
4
w1k 
w2k
bk
3
5
Of(zk) =
2
66666
4 
@f(zk)
@w1 
@f(zk)
@w2 
@f(zk)
@b 
3
77777
5 
=
2
6666666666666
4 
1
M 
MX
i=1
2
(w1x1(i) + w2x2(i) + b) yi
x1(i)
1
M 
MX
i=1
2
(w1x1(i) + w2x2(i) + b) yi
x2(i)
1
M 
MX
i=1
2
(w1x1(i) + w2x2(i) + b) yi
3
7777777777777
5 
Update rule:
3
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset is
(x1(1), x2(1), y(1)), (x1(2), x2(2), y(2)),    , (x1(M), x2(M), y(M)). Design a linear
regressor in the form of y = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined. 
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,2, 3), (2, 4,1), (3, 0, 5), obtain the best set of parameters for the linear
regressor. 
d. Plot iteration k against MSE for 200 iterations. Is the choice of hk = 0.1
right or wrong? 
2
3
2.5
2
x1
1.5
1-2
-1
0
x2
1
2
3
5
4
3
2
1
0
-1
4 
y
3
2.5
2
x
1
1.5
1
-2
-1
0
x
2
1
2
3
5
4
3
2
1
0
-1
4
y 
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset
is (x1(1), x2(1), y1), (x1(2), x2(2), y2),    , (x1(M), x2(M), yM). Design a linear re-
gressor in the form of y = w1x1 + w2x2 + b to best fit the data in terms of Mean 
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined.
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,2, 3), (2, 4,1), (3, 0, 5), obtain the best set of parameters for the linear
regressor. 
2
k x
T
k f(xk) x 
T
k+1 f(xk+1) rand() xbest f(xbest) 
0 [1  2] 4 [0.5  1.5] 1.5  [0.5  1.5] 1.5
1 [0.5  1.5] 1.5 [0  1] 0  [0  1] 0
2 [0  1] 0 [0.5  0.5] 0.5  [0.5  0.5] 0.5
3 [0.5  0.5] 0.5 [1 0] 0 0.8 [0.5  0.5] 0.5
4 [1 0] 0 [1.5 0.5] 1.5 0.7 [0.5  0.5] 0.5
5 [1 0] 0 [1.5 0.5] 1.5 0.2 [0.5  0.5] 0.5 
Q8. a. Mean squared error:
1
M 
MX
i=1
(yi yi)2
Minimisation problem: min
w1,w2,b
1
M
MX
i=1
(yi yi)2
b.
f(w1, w2, b) =
1 
M
MX
i=1
(yi yi)2
=
1
M
MX
i=1
(w1x1(i) + w2x2(i) + b) yi
2
Update rule: zk+1 = zk hkOf(zk) where zk =
2
4
w1k 
w2k
bk
3
5
Of(xk) =
2
66666
4 
@f(xk)
@w1 
@f(xk)
@w2 
@f(xk)
@b 
3
77777
5 
=
2
6666666666666
4 
1
M 
MX
i=1
2
(w1x1(i) + w2x2(i) + b) yi
x1(i)
1
M 
MX
i=1
2
(w1x1(i) + w2x2(i) + b) yi
x2(i)
1
M 
MX
i=1
2
(w1x1(i) + w2x2(i) + b) yi
3
7777777777777
5 
Update rule:
3
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset is
(x1(1), x2(1), y(1)), (x1(2), x2(2), y(2)),    , (x1(M), x2(M), y(M)). Design a linear
regressor in the form of y = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined. 
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,2, 3), (2, 4,1), (3, 0, 5), obtain the best set of parameters for the linear
regressor. 
d. Plot iteration k against MSE for 200 iterations. Is the choice of hk = 0.1
right or wrong? 
2
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset
is (x1(1), x2(1), y1), (x1(2), x2(2), y2),    , (x1(M), x2(M), yM). Design a linear re-
gressor in the form of y = w1x1 + w2x2 + b to best fit the data in terms of Mean 
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined.
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,2, 3), (2, 4,1), (3, 0, 5), obtain the best set of parameters for the linear
regressor. 
2
k x
T
k f(xk) x 
T
k+1 f(xk+1) rand() xbest f(xbest) 
0 [1  2] 4 [0.5  1.5] 1.5  [0.5  1.5] 1.5
1 [0.5  1.5] 1.5 [0  1] 0  [0  1] 0
2 [0  1] 0 [0.5  0.5] 0.5  [0.5  0.5] 0.5
3 [0.5  0.5] 0.5 [1 0] 0 0.8 [0.5  0.5] 0.5
4 [1 0] 0 [1.5 0.5] 1.5 0.7 [0.5  0.5] 0.5
5 [1 0] 0 [1.5 0.5] 1.5 0.2 [0.5  0.5] 0.5 
Q8. a. Mean squared error:
1
M 
MX
i=1
(yi yi)2
Minimisation problem: min
w1,w2,b
1
M
MX
i=1
(yi yi)2
b.
f(w1, w2, b) =
1 
M
MX
i=1
(yi yi)2
=
1
M
MX
i=1
(w1x1(i) + w2x2(i) + b) yi
2
Update rule: zk+1 = zk hkOf(zk) where zk =
2
4
w1k 
w2k
bk
3
5
Of(xk) =
2
66666
4 
@f(xk)
@w1 
@f(xk)
@w2 
@f(xk)
@b 
3
77777
5 
=
2
6666666666666
4 
1
M 
MX
i=1
2
(w1x1(i) + w2x2(i) + b) yi
x1(i)
1
M 
MX
i=1
2
(w1x1(i) + w2x2(i) + b) yi
x2(i)
1
M 
MX
i=1
2
(w1x1(i) + w2x2(i) + b) yi
3
7777777777777
5 
Update rule:
3
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset
is (x1(1), x2(1), y1), (x1(2), x2(2), y2),    , (x1(M), x2(M), yM). Design a linear re-
gressor in the form of y = w1x1 + w2x2 + b to best fit the data in terms of Mean 
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined.
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,2, 3), (2, 4,1), (3, 0, 5), obtain the best set of parameters for the linear
regressor. 
2
zk+1 = zk hkOf(zk)
)
2
4
w1k+1 
w2k+1
bk+1
3
5
= 
2
4
w1k 
w2k
bk
3
5 hk
2
6666666666666
4 
1
M 
MX
i=1
2
(w1x1(i) + w2x2(i) + b) yi
x1(i)
1
M 
MX
i=1
2
(w1x1(i) + w2x2(i) + b) yi
x2(i)
1
M 
MX
i=1
2
(w1x1(i) + w2x2(i) + b) yi
3
7777777777777
5 
=
2
4
w1k 
w2k
bk
3
5
2hk 
M
2
6666666666666
4 
MX
i=1
(w1x1(i) + w2x2(i) + b) yi
x1(i)
MX
i=1
(w1x1(i) + w2x2(i) + b) yi
x2(i)
MX
i=1
(w1x1(i) + w2x2(i) + b) yi
3
7777777777777
5 
c. Step size: hk = 0.1
Dateset: (1,2, 3), (2, 4,1), (3, 0, 5) ) M = 3.
Initial guess: z0 =
2
4
w10 
w20
b0
3
5
= 
2
4
1 
1
1
3
5
4
First iteration: k = with z0 =
2
4
1 
1
1
3
5
2
4
w11 
w21
b1
3
5
= 
2
4
1 
1
1
3
5
2 0.1 
3
2
6666666666666666
4 
(w1x1(1) + w2x2(1) + b) y1
x1(1)
+
(w1x1(2) + w2x2(2) + b) y2
x1(2)
+
(w1x1(3) + w2x2(3) + b) y3
x1(3)
(w1x1(1) + w2x2(1) + b) y1
x2(1)
+
(w1x1(2) + w2x2(2) + b) y2
x2(2)
+
(w1x1(3) + w2x2(3) + b) y3
x2(3)
(w1x1(1) + w2x2(1) + b) y1
+
(w1x1(2) + w2x2(2) + b) y2
+
(w1x1(3) + w2x2(3) + b) y3
3
7777777777777777
5 
=
2
4
1 
1
1
3
5
2 0.1 
3
2
6666666666666666
4 
(1 w1 2 w2 + b) 3
1
+
(2 w1 + 4 w2 + b) + 1
2
+
(3 w1 + 0 w2 + b) 5
3
(1 w1 2 w2 + b) 3
2
+
(2 w1 + 4 w2 + b) + 1
4
+
(3 w1 + 0 w2 + b) 5
0
(1 w1 2 w2 + b) 3
+
(2 w1 + 4 w2 + b) + 1
+
(3 w1 + 0 w2 + b) 5
3
7777777777777777
5 
=
2
4
1 
1
1
3
5
2 
3
0.1
2
4
10 
38
4
3
5
=
2
4
0.3333 
1.5333
0.7333 
3
5
Second iteration: k = 1 with z1 =
2
4
0.3333 
1.5333
0.7333 
3
5
z2 =
2
4
1.4089 
0.3867
1.1244 
3
5
Third iteration: k = 2 with z2 =
2
4
1.4089 
0.3867
1.1244 
3
5
z3 =
2
4
0.8655 
1.2513
0.8542 
3
5
Fourth iteration: k = 3 with z3 =
2
4
0.8655 
1.2513
0.8542 
3
5
5
First iteration: k = with z0 =
2
4
1 
1
1
3
5
2
4
w11 
w21
b1
3
5
= 
2
4
1 
1
1
3
5
2 0.1 
3
2
6666666666666666
4 
(w1x1(1) + w2x2(1) + b) y1
x1(1)
+
(w1x1(2) + w2x2(2) + b) y2
x1(2)
+
(w1x1(3) + w2x2(3) + b) y3
x1(3)
(w1x1(1) + w2x2(1) + b) y1
x2(1)
+
(w1x1(2) + w2x2(2) + b) y2
x2(2)
+
(w1x1(3) + w2x2(3) + b) y3
x2(3)
(w1x1(1) + w2x2(1) + b) y1
+
(w1x1(2) + w2x2(2) + b) y2
+
(w1x1(3) + w2x2(3) + b) y3
3
7777777777777777
5 
=
2
4
1 
1
1
3
5
2 0.1 
3
2
6666666666666666
4 
(1 w1 2 w2 + b) 3
1
+
(2 w1 + 4 w2 + b) + 1
2
+
(3 w1 + 0 w2 + b) 5
3
(1 w1 2 w2 + b) 3
2
+
(2 w1 + 4 w2 + b) + 1
4
+
(3 w1 + 0 w2 + b) 5
0
(1 w1 2 w2 + b) 3
+
(2 w1 + 4 w2 + b) + 1
+
(3 w1 + 0 w2 + b) 5
3
7777777777777777
5 
=
2
4
1 
1
1
3
5
2 
3
0.1
2
4
10 
38
4
3
5
=
2
4
0.3333 
1.5333
0.7333 
3
5
Second iteration: k = 1 with z1 =
2
4
0.3333 
1.5333
0.7333 
3
5
z2 =
2
4
1.4089 
0.3867
1.1244 
3
5
Third iteration: k = 2 with z2 =
2
4
1.4089 
0.3867
1.1244 
3
5
z3 =
2
4
0.8655 
1.2513
0.8542 
3
5
Fourth iteration: k = 3 with z3 =
2
4
0.8655 
1.2513
0.8542 
3
5
5
z4 =
2
4
1.2832 
0.7097
0.9707 
3
5
Fifth iteration: k = 4 with z4 =
2
4
1.2832 
0.7097
0.9707 
3
5
z5 =
2
4
1.0478 
1.0728
0.8246 
3
5
When increasing k, z converges at
2
4
2.0000 
1.0000
1.0000 
3
5
d.
It can be seen from the figure that the MSE is monotonic decreasing and
converges to almost zero, indicating that the choice of hk is fine.
6
z4 =
2
4
1.2832 
0.7097
0.9707 
3
5
Fifth iteration: k = 4 with z4 =
2
4
1.2832 
0.7097
0.9707 
3
5
z5 =
2
4
1.0478 
1.0728
0.8246 
3
5
When increasing k, z converges at
2
4
2.0000 
1.0000
1.0000 
3
5
d.
It can be seen from the figure that the MSE is monotonic decreasing and
converges to almost zero, indicating that the choice of hk is fine.
6
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset
is (x1(1), x2(1), y1), (x1(2), x2(2), y2),    , (x1(M), x2(M), yM). Design a linear re-
gressor in the form of y = w1x1 + w2x2 + b to best fit the data in terms of Mean 
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined.
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,2, 3), (2, 4,1), (3, 0, 5), obtain the best set of parameters for the linear
regressor. 
d. Plot iteration k against MSE for 200 iterations. Is the choice of hk = 0.1
right or wrong?
2
z4 =
2
4
1.2832 
0.7097
0.9707 
3
5
Fifth iteration: k = 4 with z4 =
2
4
1.2832 
0.7097
0.9707 
3
5
z5 =
2
4
1.0478 
1.0728
0.8246 
3
5
When increasing k, z converges at
2
4
2.0000 
1.0000
1.0000 
3
5
d.
It can be seen from the figure that the MSE is monotonic decreasing and
converges to almost zero, indicating that the choice of hk is fine.
6
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset is
(x1(1), x2(1), y(1)), (x1(2), x2(2), y(2)),    , (x1(M), x2(M), y(M)). Design a linear
regressor in the form of y = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined. 
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,2, 3), (2, 4,1), (3, 0, 5), obtain the best set of parameters for the linear
regressor. 
d. Plot iteration k against MSE for 200 iterations. Is the choice of hk = 0.1
right or wrong? 
Q9. Find the least-squares solution of Ax = B where:
A =
2
66
4 
1 0 1
0 1 1
1 0 1
0 1 1 
3
77
5 and B = 
2
66
4 
0
1
3
4 
3
77
5 . 
Q10. Consider the three data points (x, y) given as follows:
(0, 6), (1, 0), (2, 0),
which are the data obtained from a single line but have errors due to measurement.
Find a straight line that best fits these points in the sense of least-squares? Explain
the quantity that is being minimized. 
(1,0)
(0,6)
(2,0)
2
z4 =
2
4
1.2832
0.7097
0.9707 
3
5
Fifth iteration: k = 4 with z4 =
2
4
1.2832
0.7097
0.9707 
3
5
z5 =
2
4
1.0478
1.0728
0.8246 
3
5
When increasing k, z converges at
2
4
2.0000
1.0000
1.0000 
3
5
d.
It can be seen from the figure that the MSE is monotonic decreasing and
converges to almost zero, indicating that the choice of hk is fine.
Q9. Represent a1, a2, a3 as the columns of A. It is found that it is an orthogonal set,
i.e., ai  aj = 0 for all i 6= j.
The solution can thus be calculated as 
x =
2
6666666
4 
B  a1
a1  a1
B  a2
a2  a2
B  a3
a3  a3 
3
7777777
5 
=
2
666666
4 
3
2
3
2
2
3
777777
5
. 
Remark: The solution is the same as x = (ATA)1ATB.
Q10. Given by the question, the three data points, in the form (x, y), are from a single
line. However, due to measurement errors, these three points, (0, 6), (1, 0) and (2, 
0), obtained from measurement may not lie on a line. When you plot the data, it
can be seen that they are not lying in a straight line.
The general form of a straight line is
y = mx+ c.
where m and c are constants to be determined.
We are going to approximate the three data points using a straight line so it is
expected that the values of y from this straight line produces the following values:
6 = m 0 + c from the data point (0, 6)
0 = m 1 + c from the data point (1, 0)
0 = m 2 + c from the data point (2, 0)
6
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset is
(x1(1), x2(1), y(1)), (x1(2), x2(2), y(2)),    , (x1(M), x2(M), y(M)). Design a linear
regressor in the form of y = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined. 
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,2, 3), (2, 4,1), (3, 0, 5), obtain the best set of parameters for the linear
regressor. 
d. Plot iteration k against MSE for 200 iterations. Is the choice of hk = 0.1
right or wrong? 
Q9. Find the least-squares solution of Ax = B where:
A =
2
66
4 
1 0 1
0 1 1
1 0 1
0 1 1 
3
77
5 and B = 
2
66
4 
0
1
3
4 
3
77
5 . 
Q10. Consider the three data points (x, y) given as follows:
(0, 6), (1, 0), (2, 0),
which are the data obtained from a single line but have errors due to measurement.
Find a straight line that best fits these points in the sense of least-squares? Explain
the quantity that is being minimized. 
(1,0)
(0,6)
(2,0)
2
Q8. Consider a system taking two input variables x1 and x2 and generate an output
y. A set of M input-output data is collected in an experiment, e.g., the dataset is
(x1(1), x2(1), y(1)), (x1(2), x2(2), y(2)),    , (x1(M), x2(M), y(M)). Design a linear
regressor in the form of y = w1x1 + w2x2 + b to best fit the data in terms of Mean
Squared Error using the gradient descent method, where w1 and w2 and b are the
parameters to be determined. 
a. Formulate the data fitting problem as a minimisation problem.
b. Denote the step size as hk. Derive the update rule for each parameter.
c. Use hk = 0.1 and the initial guess for all variables is 1. Considering the dataset:
(1,2, 3), (2, 4,1), (3, 0, 5), obtain the best set of parameters for the linear
regressor. 
d. Plot iteration k against MSE for 200 iterations. Is the choice of hk = 0.1
right or wrong? 
Q9. Find the least-squares solution of Ax = B where:
A =
2
66
4 
1 0 1
0 1 1
1 0 1
0 1 1 
3
77
5 and B = 
2
66
4 
0
1
3
4 
3
77
5 . 
Q10. Consider the three data points (x, y) given as follows:
(0, 6), (1, 0), (2, 0),
which are the data obtained from a single line but have errors due to measurement.
Find a straight line that best fits these points in the sense of least-squares? Explain
the quantity that is being minimized. 
(1,0)
(0,6)
(2,0)
2
z4 =
2
4
1.2832
0.7097
0.9707 
3
5
Fifth iteration: k = 4 with z4 =
2
4
1.2832
0.7097
0.9707 
3
5
z5 =
2
4
1.0478
1.0728
0.8246 
3
5
When increasing k, z converges at
2
4
2.0000
1.0000
1.0000 
3
5
d.
It can be seen from the figure that the MSE is monotonic decreasing and
converges to almost zero, indicating that the choice of hk is fine.
Q9. Represent a1, a2, a3 as the columns of A. It is found that it is an orthogonal set,
i.e., ai  aj = 0 for all i 6= j.
The solution can thus be calculated as 
x =
2
6666666
4 
B  a1
a1  a1
B  a2
a2  a2
B  a3
a3  a3 
3
7777777
5 
=
2
666666
4 
3
2
3
2
2
3
777777
5
. 
Remark: The solution is the same as x = (ATA)1ATB.
Q10. Given by the question, the three data points, in the form (x, y), are from a single
line. However, due to measurement errors, these three points, (0, 6), (1, 0) and (2, 
0), obtained from measurement may not lie on a line. When you plot the data, it
can be seen that they are not lying in a straight line.
The general form of a straight line is
y = mx+ c.
where m and c are constants to be determined.
We are going to approximate the three data points using a straight line so it is
expected that the values of y from this straight line produces the following values:
6 = m 0 + c from the data point (0, 6)
0 = m 1 + c from the data point (1, 0)
0 = m 2 + c from the data point (2, 0)
6
In the equations given above, m and c are the unknowns. If we put these equations
into matrix form, we have 
Ax = B
where
A =
2
4
0 1 
1 1
2 1
3
5 , x =
m
c 
, B =
2
4
6 
0
0
3
5 .
The above is formulated as a least-squares problem, i.e.,
min
x 
||AxB||22.
The least-squares solution is found as
x =
m
c 
= (ATA)1ATB =
3
5 
.
Therefore the best-fit line is
y = mx+ c = 3x+ 5.
(1,0)
(0,6)
(2,0)
y = 3x+ 5
What exactly is the line y = f(x) = 3x+ 5 minimizing?
The least-squares solution x minimizes kAxBk22, which is the sum of the squares
of the entries of the vector AxB. More specifically, 
Ax =
2
4
0 1 
1 1
2 1
3
5
3
5 
=
2
4
03 + 5
13 + 5
23 + 5 
3
5 =
2
4
f(0)
f(1)
f(2) 
3
5 =
2
4
5 
2
1
3
5 .
7
In the equations given above, m and c are the unknowns. If we put these equations
into matrix form, we have 
Ax = B
where
A =
2
4
0 1 
1 1
2 1
3
5 , x =
m
c 
, B =
2
4
6 
0
0
3
5 .
The above is formulated as a least-squares problem, i.e.,
min
x 
||AxB||22.
The least-squares solution is found as
x =
m
c 
= (ATA)1ATB =
3
5 
.
Therefore the best-fit line is
y = mx+ c = 3x+ 5.
(1,0)
(0,6)
(2,0)
y = 3x+ 5
What exactly is the line y = f(x) = 3x+ 5 minimizing?
The least-squares solution x minimizes kAxBk22, which is the sum of the squares
of the entries of the vector AxB. More specifically, 
Ax =
2
4
0 1 
1 1
2 1
3
5
3
5 
=
2
4
03 + 5
13 + 5
23 + 5 
3
5 =
2
4
f(0)
f(1)
f(2) 
3
5 =
2
4
5 
2
1
3
5 .
7
In the equations given above, m and c are the unknowns. If we put these equations
into matrix form, we have 
Ax = B
where
A =
2
4
0 1 
1 1
2 1
3
5 , x =
m
c 
, B =
2
4
6 
0
0
3
5 .
The above is formulated as a least-squares problem, i.e.,
min
x 
||AxB||22.
The least-squares solution is found as
x =
m
c 
= (ATA)1ATB =
3
5 
.
Therefore the best-fit line is
y = mx+ c = 3x+ 5.
(1,0)
(0,6)
(2,0)
y = 3x+ 5
What exactly is the line y = f(x) = 3x+ 5 minimizing?
The least-squares solution x minimizes kAxBk22, which is the sum of the squares
of the entries of the vector AxB. More specifically, 
Ax =
2
4
0 1 
1 1
2 1
3
5
3
5 
=
2
4
03 + 5
13 + 5
23 + 5 
3
5 =
2
4
f(0)
f(1)
f(2) 
3
5 =
2
4
5 
2
1
3
5 .
7
The entries of Ax are the y-coordinates of the graph of the line y = 3x+5 with x
chosen as our data points, while B is the vector whose entries are the y-coordinates
of the those data points. AxB is the vertical distance of the graph from the data
points. y = f(x) = 3x+5 is the line that minimizes the sum of the squares of the
these vertical distances, i.e., that is to minimise the sum of the squared dierences 
between the approximate y value given by the straight line and the desired value of
y given in B. 
(1,0)
(0,6)
(2,0)
y = 3x+ 5
1
1
2
Q11. The general equation for a parabola is
y = Bx2 + Cx+D (1)
where B, C and D are constants to be determined and the data point is in the form
of (x, y). 
Since the four data points, (1, 1
2
), (1,1), (2,1 
2
), (3, 2), are approximated by this 
parabola equation, it is expected that the value of y from the parabola equation
should produce: 
Data point (1,
1 
2
) : 
1
2
= B  (1)2 + C  (1) +D 
Data point (1,1) : 1 = B (1)2 + C (1) +D
Data point (2,
1 
2
) :  
1
2
= B  (2)2 + C  (2) +D 
Data point (3, 2) : 2 = B (3)2 + C (3) +D.
8
The entries of Ax are the y-coordinates of the graph of the line y = 3x+5 with x
chosen as our data points, while B is the vector whose entries are the y-coordinates
of the those data points. AxB is the vertical distance of the graph from the data
points. y = f(x) = 3x+5 is the line that minimizes the sum of the squares of the
these vertical distances, i.e., that is to minimise the sum of the squared dierences 
between the approximate y value given by the straight line and the desired value of
y given in B. 
(1,0)
(0,6)
(2,0)
y = 3x+ 5
1
1
2
Q11. The general equation for a parabola is
y = Bx2 + Cx+D (1)
where B, C and D are constants to be determined and the data point is in the form
of (x, y). 
Since the four data points, (1, 1
2
), (1,1), (2,1 
2
), (3, 2), are approximated by this 
parabola equation, it is expected that the value of y from the parabola equation
should produce: 
Data point (1,
1 
2
) : 
1
2
= B  (1)2 + C  (1) +D 
Data point (1,1) : 1 = B (1)2 + C (1) +D
Data point (2,
1 
2
) :  
1
2
= B  (2)2 + C  (2) +D 
Data point (3, 2) : 2 = B (3)2 + C (3) +D.
8
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares. 
(1,
1 
2
), (1,1), (2, 
1
2
), (3, 2). 
What quantity is being minimized?
(-1,1
2
) 
(1,-1)
(2,-1 
2
) 
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized. 
x y f(x, y)
1 0 0
0 1 1
1 0 3
0 1 4 
Q13. Here are several data points
x
y
given as follows:
4
1 
,
3
0 
,
2
1.5 
,
1
0.5 
,
0
1 
,
1
1 
,
2
0.5
,
3
2 
,
4
1 
.
Find a function in the form
y = B + C cos(x) +D sin(x) + E cos(2x) + F sin(2x) +G cos(3x) +H sin(3x)
that best approximates the data points in the sense of least-squares?
Q14. There are three categories of machines I, II and III installed in a factory. Machines
have dierent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows: 
3
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares. 
(1,
1 
2
), (1,1), (2, 
1
2
), (3, 2). 
What quantity is being minimized?
(-1,1
2
) 
(1,-1)
(2,-1 
2
) 
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized. 
x y f(x, y)
1 0 0
0 1 1
1 0 3
0 1 4 
Q13. Here are several data points
x
y
given as follows:
4
1 
,
3
0 
,
2
1.5 
,
1
0.5 
,
0
1 
,
1
1 
,
2
0.5
,
3
2 
,
4
1 
.
Find a function in the form
y = B + C cos(x) +D sin(x) + E cos(2x) + F sin(2x) +G cos(3x) +H sin(3x)
that best approximates the data points in the sense of least-squares?
Q14. There are three categories of machines I, II and III installed in a factory. Machines
have dierent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows: 
3
The entries of Ax are the y-coordinates of the graph of the line y = 3x+5 with x
chosen as our data points, while B is the vector whose entries are the y-coordinates
of the those data points. AxB is the vertical distance of the graph from the data
points. y = f(x) = 3x+5 is the line that minimizes the sum of the squares of the
these vertical distances, i.e., that is to minimise the sum of the squared dierences 
between the approximate y value given by the straight line and the desired value of
y given in B. 
(1,0)
(0,6)
(2,0)
y = 3x+ 5
1
1
2
Q11. The general equation for a parabola is
y = Bx2 + Cx+D
where B, C and D are constants to be determined and the data point is in the form
of (x, y). 
Since the four data points, (1, 1
2
), (1,1), (2,1 
2
), (3, 2), are approximated by this 
parabola equation, it is expected that the value of y from the parabola equation
should produce: 
Data point (1,
1 
2
) : 
1
2
= B  (1)2 + C  (1) +D 
Data point (1,1) : 1 = B (1)2 + C (1) +D
Data point (2,
1 
2
) :  
1
2
= B  (2)2 + C  (2) +D 
Data point (3, 2) : 2 = B (3)2 + C (3) +D.
8
Rewrite the above equations in matrix form, we have
Ax = B
where
A =
2
66
4 
1 1 1
1 1 1 
4 2 1
9 3 1
3
77
5 , x = 
2
4
B
C
D 
3
5 , B =
2
66
4 
1
2
1
1 
2
2 
3
77
5 . 
The above is formulated as a least-squares problem, i.e.,
min
x 
||AxB||22.
The least-squares solution is found as
x =
2
4
B
C
D 
3
5 = (ATA)1ATB =
2
6666666
4 
53
88
379
440
41
44
3
7777777
5 
.
The best-fit parabola is
y = Bx2 + Cx+D =
53 
88
x2  
379
440
x 
41
44
, 
which is equivalent to
88y = 532 
379 
5
x 82. 
(-1,
1
2
) 
(1,-1)
(2,-
1
2
) 
(3,2)
88y = 532  379
5
x 82 
What exactly the parabola y = f(x) is minimizing?
9
Rewrite the above equations in matrix form, we have
Ax = B
where
A =
2
66
4 
1 1 1
1 1 1 
4 2 1
9 3 1
3
77
5 , x = 
2
4
B
C
D 
3
5 , B =
2
66
4 
1
2
1
1 
2
2 
3
77
5 . 
The above is formulated as a least-squares problem, i.e.,
min
x 
||AxB||22.
The least-squares solution is found as
x =
2
4
B
C
D 
3
5 = (ATA)1ATB =
2
6666666
4 
53
88
379
440
41
44
3
7777777
5 
.
The best-fit parabola is
y = Bx2 + Cx+D =
53 
88
x2  
379
440
x 
41
44
, 
which is equivalent to
88y = 532 
379 
5
x 82. 
(-1,
1
2
) 
(1,-1)
(2,-
1
2
) 
(3,2)
88y = 532  379
5
x 82 
What exactly the parabola y = f(x) is minimizing?
9
The least-squares solution x minimizes kAx bk22. In this case, Ax is
Ax =
2
66
4 
1 1 1
1 1 1 
4 2 1
9 3 1
3
77
5 
2
6666666
4 
53
88
379
440
41
44
3
7777777
5 
=
2
6666
4 
53
88
(1)2  379 
440
(1) 41 
44
53
88
(1) 
2  379
440 
(1) 41
44 
53
88
(2) 
2  379
440 
(2) 41
44 
53
88
(3) 
2  379
440 
(3) 41
44 
3
7777
5
= 
2
6666
4 
f(1)
f(1) 
f(2)
f(3)
3
7777
5
= 
2
6666
4 
117
220 
131
110 
 27
110 
419
220 
3
7777
5
. 
The entries of Ax are the y-coordinates of the parabola 88y = 532 379
5
x82 with 
x chosen as our data points, while B is the vector whose entries are the y-coordinates
of the those data points. AxB is the vertical distance of the graph from the data
points. 88y = 532 379 
5
x82 minimizes the sum of the squares of the these vertical 
distances, i.e., that is to minimise the sum of the squared dierences between the
approximate y value and the desired value of y given in B.
(-1,
1
2
) 
(1,-1)
(2,-
1
2
) 
(3,2)
88y = 532  379
5
x 82 
7
220 
 21
110 
14
55 
21
220 
Q12. The general equation for a linear function with two variables is
f(x, y) = Bx+ Cy +D. (1)
where B, C and D are constants to be determined.
If we feed the provided data into the general equation given above:
Data point (1, 0) : B  1 + C  0 +D = 0
Data point (0, 1) : B  0 + C  1 +D = 1 
Data point (1, 0) : B 1 + C  0 +D = 3
Data point (0,1) : B  0 + C 1 +D = 4. 
Rewrite the above equations in matrix form, we have
Ax = B
where
A =
2
66
4 
1 0 1
0 1 1
1 0 1
0 1 1 
3
77
5 , x = 
2
4
B
C
D 
3
5 , B =
2
66
4 
0
1
3
4
3
77
5 . 
10
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares. 
(1,
1 
2
), (1,1), (2, 
1
2
), (3, 2). 
What quantity is being minimized?
(-1,
1
2
) 
(1,-1)
(2,-
1
2
) 
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized. 
x y f(x, y)
1 0 0 
0 1 1
1 0 3
0 1 4 
Q13. There are three categories of machines I, II and III installed in a factory. Machines
have dierent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows:
Items Machine I Machine II Machine III
M 1 2 1
N 2 1 1.25
Table 1: Operating time of producing one unit of each item on each machine.
The profit of item M and N are 600 and 400 each per unit, respectively. What
would be the most reasonable strategy to produce the items M and N as to maximise 
the profit of the factory? What would be the maximum profit in this case?
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
3
The least-squares solution x minimizes kAx bk22. In this case, Ax is
Ax =
2
66
4 
1 1 1
1 1 1 
4 2 1
9 3 1
3
77
5 
2
6666666
4 
53
88
379
440
41
44
3
7777777
5 
=
2
6666
4 
53
88
(1)2  379 
440
(1) 41 
44
53
88
(1) 
2  379
440 
(1) 41
44 
53
88
(2) 
2  379
440 
(2) 41
44 
53
88
(3) 
2  379
440 
(3) 41
44 
3
7777
5
= 
2
6666
4 
f(1)
f(1) 
f(2)
f(3)
3
7777
5
= 
2
6666
4 
117
220 
131
110 
 27
110 
419
220 
3
7777
5
. 
The entries of Ax are the y-coordinates of the parabola 88y = 532 379
5
x82 with 
x chosen as our data points, while B is the vector whose entries are the y-coordinates
of the those data points. AxB is the vertical distance of the graph from the data
points. 88y = 532 379 
5
x82 minimizes the sum of the squares of the these vertical 
distances, i.e., that is to minimise the sum of the squared dierences between the
approximate y value and the desired value of y given in B.
(-1,
1
2
) 
(1,-1)
(2,-
1
2
) 
(3,2)
88y = 532  379
5
x 82 
7
220 
 21
110 
14
55 
21
220 
Q12. The general equation for a linear function with two variables is
f(x, y) = Bx+ Cy +D.
where B, C and D are constants to be determined.
If we feed the provided data into the general equation given above:
Data point (1, 0) : B  1 + C  0 +D = 0
Data point (0, 1) : B  0 + C  1 +D = 1 
Data point (1, 0) : B 1 + C  0 +D = 3
Data point (0,1) : B  0 + C 1 +D = 4. 
Rewrite the above equations in matrix form, we have
Ax = B
where
A =
2
66
4 
1 0 1
0 1 1
1 0 1
0 1 1 
3
77
5 , x = 
2
4
B
C
D 
3
5 , B =
2
66
4 
0
1
3
4
3
77
5 . 
10
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares. 
(1,
1 
2
), (1,1), (2, 
1
2
), (3, 2). 
What quantity is being minimized?
(-1,
1
2
) 
(1,-1)
(2,-
1
2
) 
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized. 
x y f(x, y)
1 0 0 
0 1 1
1 0 3
0 1 4 
Q13. There are three categories of machines I, II and III installed in a factory. Machines
have dierent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows:
Items Machine I Machine II Machine III
M 1 2 1
N 2 1 1.25
Table 1: Operating time of producing one unit of each item on each machine.
The profit of item M and N are 600 and 400 each per unit, respectively. What
would be the most reasonable strategy to produce the items M and N as to maximise 
the profit of the factory? What would be the maximum profit in this case?
3
The above is formulated as a least-squares problem, i.e.,
min
x 
||AxB||22.
It is noticed that the columns a1, a2, a3 of matrix A are orthogonal. Therefore, the
solution can thus be calculated as 
x =
2
6666666
4 
B  a1
a1  a1
B  a2
a2  a2
B  a3
a3  a3 
3
7777777
5 
=
2
666666
4 
3
2
3
2
2
3
777777
5
. 
Remark: The solution is the same as x = (ATA)1ATB.
So the linear equation that approximates the data in the sense of least-squares is
f(x, y) = 
3 
2
x 
3
2
y + 2. 
Lets move one step further to investigate the quantity that is being minimized.
Function f(x, y) forms a graph of surface as shown below. The vector Ax are:
11
Theaboveisformulatedasaleast-squaresproblem,i.e.,
min
x
||AxB||
2
2
.
Itisnoticedthatthecolumnsa
1
,a
2
,a
3
ofmatrixAareorthogonal.Therefore,the
solutioncanthusbecalculatedas
x=
2
6
6
6
6
6
6
6
4
Ba
1
a
1
a
1
Ba
2
a
2
a
2
Ba
3
a
3
a
3
3
7
7
7
7
7
7
7
5
=
2
6
6
6
6
6
6
4
3
2
3
2
2
3
7
7
7
7
7
7
5
.
Remark:Thesolutionisthesameasx=(A
T
A)
1
A
T
B.
Sothelinearequationthatapproximatesthedatainthesenseofleast-squaresis
f(x,y)=
3
2
x
3
2
y+2.
Letsmoveonestepfurthertoinvestigatethequantitythatisbeingminimized.
Functionf(x,y)formsagraphofsurfaceasshownbelow.ThevectorAxare:
11 
The above is formulated as a least-squares problem, i.e.,
min
x 
||AxB||22.
It is noticed that the columns a1, a2, a3 of matrix A are orthogonal. Therefore, the
solution can thus be calculated as 
x =
2
6666666
4 
B  a1
a1  a1
B  a2
a2  a2
B  a3
a3  a3 
3
7777777
5 
=
2
666666
4 
3
2
3
2
2
3
777777
5
. 
Remark: The solution is the same as x = (ATA)1ATB.
So the linear equation that approximates the data in the sense of least-squares is
f(x, y) = 
3 
2
x 
3
2
y + 2. 
Lets move one step further to investigate the quantity that is being minimized.
Function f(x, y) forms a graph of surface as shown below. The vector Ax are:
11
Theaboveisformulatedasaleast-squaresproblem,i.e.,minx||AxB||22.Itisnoticedthatthecolumnsa1,a2,a3ofmatrixAareorthogonal.Therefore,thesolutioncanthusbecalculatedasx=266666664Ba1a1a1Ba2a2a2Ba3a3a3377777775=266666643232237777775.Remark:Thesolutionisthesameasx=(ATA)1ATB.Sothelinearequationthatapproximatesthedatainthesenseofleast-squaresisf(x,y)=32x32y+2.Letsmoveonestepfurthertoinvestigatethequantitythatisbeingminimized.Functionf(x,y)formsagraphofsurfaceasshownbelow.ThevectorAxare:11
Ax =
2
66
4 
1 0 1
0 1 1
1 0 1
0 1 1 
3
77
5 
2
666666
4 
3
2
3
2
2
3
777777
5
== 
2
66666666
4 
3
2
 1 3 
2
 0 + 2 
3
2
 0 3 
2
 1 + 2 
3
2
1 3 
2
 0 + 2 
3
2
 0 3 
2
1 + 2 
3
77777777
5 
=
2
66
4 
f(1, 0)
f(0, 1)
f(1, 0)
f(0,1) 
3
77
5 = 
2
666666666666
4 
1
2
1
2
7
2
7
2
3
777777777777
5 
.
The entries of this vector are the values of f(x, y) when given the data points in
the table. The vector B is the vector whose entries are the desired values of f(x, y)
evaluated at those points. The dierence AxB refers to the distance between the
graph of surface and the data points. The linear function obtained via least-squares 
method minimizes the sum of these vertical distances, i.e., that is to minimise the
sum of the squared dierences between the predicted f(x, y) value given and the
desired value of f(x, y) given in B. 
Q13. Denote the decision variables x and y as the number of items M and N, respectively,
and Z represents the total profit on the production. 
The profit is defined as Z = 600x + 400y and the maximisation of the profit is
defined as the following constrained maximisation problem: 
max
x,y 
Z = 600x+ 400y
subject to
x+ 2y  12 (constraint on Machine I)
2x+ y  12 (constraint on Machine II) 
x+
5 
4
y  5 (constraint on Machine III) 
x  0 (quantiy of item M is non-negative)
y  0 (quantiy of item N is non-negative) 
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
12
The above is formulated as a least-squares problem, i.e.,
min
x 
||AxB||22.
It is noticed that the columns a1, a2, a3 of matrix A are orthogonal. Therefore, the
solution can thus be calculated as 
x =
2
6666666
4 
B  a1
a1  a1
B  a2
a2  a2
B  a3
a3  a3 
3
7777777
5 
=
2
666666
4 
3
2
3
2
2
3
777777
5
. 
Remark: The solution is the same as x = (ATA)1ATB.
So the linear equation that approximates the data in the sense of least-squares is
f(x, y) = 
3 
2
x 
3
2
y + 2. 
Lets move one step further to investigate the quantity that is being minimized.
Function f(x, y) forms a graph of surface as shown below. The vector Ax are:
11
Theaboveisformulatedasaleast-squaresproblem,i.e.,minx||AxB||22.Itisnoticedthatthecolumnsa1,a2,a3ofmatrixAareorthogonal.Therefore,thesolutioncanthusbecalculatedasx=266666664Ba1a1a1Ba2a2a2Ba3a3a3377777775=266666643232237777775.Remark:Thesolutionisthesameasx=(ATA)1ATB.Sothelinearequationthatapproximatesthedatainthesenseofleast-squaresisf(x,y)=32x32y+2.Letsmoveonestepfurthertoinvestigatethequantitythatisbeingminimized.
Functionf(x,y)formsagraphofsurfaceasshownbelow.ThevectorAxare:
11 
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares. 
(1,
1 
2
), (1,1), (2, 
1
2
), (3, 2). 
What quantity is being minimized?
(-1,
1
2
) 
(1,-1)
(2,-
1
2
) 
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized. 
x y f(x, y)
1 0 0 
0 1 1
1 0 3
0 1 4 
Q13. There are three categories of machines I, II and III installed in a factory. Machines
have dierent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows:
Items Machine I Machine II Machine III
M 1 2 1
N 2 1 1.25
Table 1: Operating time of producing one unit of each item on each machine.
The profit of item M and N are 600 and 400 each per unit, respectively. What
would be the most reasonable strategy to produce the items M and N as to maximise 
the profit of the factory? What would be the maximum profit in this case?
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
3
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares. 
(1,
1 
2
), (1,1), (2, 
1
2
), (3, 2). 
What quantity is being minimized?
(-1,
1
2
) 
(1,-1)
(2,-
1
2
) 
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized. 
x y f(x, y)
1 0 0 
0 1 1
1 0 3
0 1 4 
Q13. There are three categories of machines I, II and III installed in a factory. Machines
have dierent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows:
Items Machine I Machine II Machine III
M 1 2 1
N 2 1 1.25
Table 1: Operating time of producing one unit of each item on each machine.
The profit of item M and N are 600 and 400 each per unit, respectively. What
would be the most reasonable strategy to produce the items M and N as to maximise 
the profit of the factory? What would be the maximum profit in this case?
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
3
Q11. Find a parabola equation that best fits the below data points (x, y) in the sense of
least-squares. 
(1,
1 
2
), (1,1), (2, 
1
2
), (3, 2). 
What quantity is being minimized?
(-1,
1
2
) 
(1,-1)
(2,-
1
2
) 
(3,2)
Q12. According to the following data, find the best approximated linear function f(x, y)
in the sense of least-squares and identify the quantity being minimized. 
x y f(x, y)
1 0 0 
0 1 1
1 0 3
0 1 4 
Q13. There are three categories of machines I, II and III installed in a factory. Machines
have dierent requirements on operating time. Machines I and II can be operated
for most 12 hours whereas machine III needs to be operated for at least 5 hours
per day. This factory produces two items M and N. Both require the use of all the
three machines. The operating time of producing 1 unit of each item on the three
machines are presented in Table 1 as follows:
Items Machine I Machine II Machine III
M 1 2 1
N 2 1 1.25
Table 1: Operating time of producing one unit of each item on each machine.
The profit of item M and N are 600 and 400 each per unit, respectively. What
would be the most reasonable strategy to produce the items M and N as to maximise 
the profit of the factory? What would be the maximum profit in this case?
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
3
Ax =
2
66
4 
1 0 1
0 1 1
1 0 1
0 1 1 
3
77
5 
2
666666
4 
3
2
3
2
2
3
777777
5
== 
2
66666666
4 
3
2
 1 3 
2
 0 + 2 
3
2
 0 3 
2
 1 + 2 
3
2
1 3 
2
 0 + 2 
3
2
 0 3 
2
1 + 2 
3
77777777
5 
=
2
66
4 
f(1, 0)
f(0, 1)
f(1, 0)
f(0,1) 
3
77
5 = 
2
666666666666
4 
1
2
1
2
7
2
7
2
3
777777777777
5 
.
The entries of this vector are the values of f(x, y) when given the data points in
the table. The vector B is the vector whose entries are the desired values of f(x, y)
evaluated at those points. The dierence AxB refers to the distance between the
graph of surface and the data points. The linear function obtained via least-squares 
method minimizes the sum of these vertical distances, i.e., that is to minimise the
sum of the squared dierences between the predicted f(x, y) value given and the
desired value of f(x, y) given in B. 
Q13. Denote the decision variables x and y as the number of items M and N, respectively,
and Z represents the total profit on the production. 
The profit is defined as Z = 600x + 400y and the maximisation of the profit is
defined as the following constrained maximisation problem: 
max
x,y 
Z = 600x+ 400y
subject to
x+ 2y  12 (constraint on Machine I)
2x+ y  12 (constraint on Machine II) 
x+
5 
4
y  5 (constraint on Machine III) 
x  0 (quantiy of item M is non-negative)
y  0 (quantiy of item N is non-negative) 
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
12
Ax =
2
66
4 
1 0 1
0 1 1
1 0 1
0 1 1 
3
77
5 
2
666666
4 
3
2
3
2
2
3
777777
5
== 
2
66666666
4 
3
2
 1 3 
2
 0 + 2 
3
2
 0 3 
2
 1 + 2 
3
2
1 3 
2
 0 + 2 
3
2
 0 3 
2
1 + 2 
3
77777777
5 
=
2
66
4 
f(1, 0)
f(0, 1)
f(1, 0)
f(0,1) 
3
77
5 = 
2
666666666666
4 
1
2
1
2
7
2
7
2
3
777777777777
5 
.
The entries of this vector are the values of f(x, y) when given the data points in
the table. The vector B is the vector whose entries are the desired values of f(x, y)
evaluated at those points. The dierence AxB refers to the distance between the
graph of surface and the data points. The linear function obtained via least-squares 
method minimizes the sum of these vertical distances, i.e., that is to minimise the
sum of the squared dierences between the predicted f(x, y) value given and the
desired value of f(x, y) given in B. 
Q13. Denote the decision variables x and y as the number of items M and N, respectively,
and Z represents the total profit on the production. 
The profit is defined as Z = 600x + 400y and the maximisation of the profit is
defined as the following constrained maximisation problem: 
max
x,y 
Z = 600x+ 400y
subject to
x+ 2y  12 (constraint on Machine I)
2x+ y  12 (constraint on Machine II) 
x+
5 
4
y  5 (constraint on Machine III) 
x  0 (quantiy of item M is non-negative)
y  0 (quantiy of item N is non-negative) 
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
12
Ax =
2
66
4 
1 0 1
0 1 1
1 0 1
0 1 1 
3
77
5 
2
666666
4 
3
2
3
2
2
3
777777
5
== 
2
66666666
4 
3
2
 1 3 
2
 0 + 2 
3
2
 0 3 
2
 1 + 2 
3
2
1 3 
2
 0 + 2 
3
2
 0 3 
2
1 + 2 
3
77777777
5 
=
2
66
4 
f(1, 0)
f(0, 1)
f(1, 0)
f(0,1) 
3
77
5 = 
2
666666666666
4 
1
2
1
2
7
2
7
2
3
777777777777
5 
.
The entries of this vector are the values of f(x, y) when given the data points in
the table. The vector B is the vector whose entries are the desired values of f(x, y)
evaluated at those points. The dierence AxB refers to the distance between the
graph of surface and the data points. The linear function obtained via least-squares 
method minimizes the sum of these vertical distances, i.e., that is to minimise the
sum of the squared dierences between the predicted f(x, y) value given and the
desired value of f(x, y) given in B. 
Q13. Denote the decision variables x and y as the number of items M and N, respectively,
and Z represents the total profit on the production. 
The profit is defined as Z = 600x + 400y and the maximisation of the profit is
defined as the following constrained maximisation problem: 
max
x,y 
Z = 600x+ 400y
subject to
x+ 2y  12 (constraint on Machine I)
2x+ y  12 (constraint on Machine II) 
x+
5 
4
y  5 (constraint on Machine III) 
x  0 (quantiy of item M is non-negative)
y  0 (quantiy of item N is non-negative) 
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
12
x  axis
-4 -3 -2 -1 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4 
y axis
1 4
1 3
1 2
1 1
1 0
9
8
7
6
5
4
3
2
1
-1
-2
-3
-4
www.softschools.com
x  axis
-4-3-2-11234567891011121314
y  axis
14
13
12
11
10
9
8
7
6
5
4
3
2
1
-1
-2
-3
-4
www.softschools.com 
The feasible regions is the shaded area and A, B, C, D, E are corner-point feasible
solutions (CPF solutions). Evaluate Z = 600x+ 400y at these corner points.
Corner point (x, y) Z = 600x+ 400y
A: (5,0) 3000 
B: (6,0) 3600
C: (4,4) 4000 (maximum)
D: (0,6) 2400 
E: (0,4) 1600
Therefore, producing 4 units of each item (x = 4 and y = 4) can produce the
maximum profit of 4000. 
Q14. Denote x and y as the units of commodity transported from factory at P to the
depots at A and B, respectively. 
As all produced commodities from both factories will be transported and factory at
P can produce 8 units, (8 x y) units will be transported from P to C.
As the number of units to be transported must be non-negative, we have the fol-
lowing constraints: x 0, y 0 and 8 x y 0.
Depot A requires 5 units of commodity weekly and x units have been provided by
factory P. Thus, the remaining (5  x) units need to be transported to deport A
from the factory at Q. 
13
ThefeasibleregionsistheshadedareaandA,B,C,D,Earecorner-pointfeasible
solutions(CPFsolutions).EvaluateZ=600x+400yatthesecornerpoints.
Cornerpoint(x,y)Z=600x+400y
A:(5,0) 3000
B:(6,0) 3600
C:(4,4)4000(maximum)
D:(0,6) 2400
E:(0,4) 1600
Therefore,producing4unitsofeachitem(x=4andy=4)canproducethe
maximumprotof4000.
Q14.DenotexandyastheunitsofcommoditytransportedfromfactoryatPtothe
depotsatAandB,respectively.
Asallproducedcommoditiesfrombothfactorieswillbetransportedandfactoryat
Pcanproduce8units,(8xy)unitswillbetransportedfromPtoC.
Asthenumberofunitstobetransportedmustbenon-negative,wehavethefol-
lowingconstraints:x0,y0and8xy0.
DepotArequires5unitsofcommodityweeklyandxunitshavebeenprovidedby
factoryP.Thus,theremaining(5x)unitsneedtobetransportedtodeportA
fromthefactoryatQ.
13 
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
the depots at A, B and C requires 5, 5 and 4 units of the commodity, while the
production capacity of the factories at P and Q are 8 and 6 units correspondingly.
The commodities produced at both factories will be all transported. The cost of
transportation varies from dierent factories to dierent depots where the details of
the cost of transportation per unit commodity are in Table 2.
From/To Cost of A Cost of B Cost of C
P 160 100 150
Q 100 120 100
Table 2: Cost of transportation per unit commodity.
Find the units of commodity from factories to depots so the cost of transportation
is minimum? What will be the minimum cost of transportation?
Q15. A factory wishes to produce a new type of food which has two ingredients denoted
as Ingredient I and Ingredient II. This new type of food need to contains vitamin A
and vitamin C with minimum value 8 and 10, respectively. Ingredient I contains 2
units/kg of vitamin A and 1 units/kg of vitamin C. Ingredient II contains 1 units/kg
of vitamin A and 2 units/kg of vitamin C. The price of ingredient I and ingredient
II are 50/kg and 70/kg, respectively. Decide the formula (the weight of each
ingredient) of this new type of food so that the cost is at the minimum? 
Q16. Explain how Recursive least-squares works.
4
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
the depots at A, B and C requires 5, 5 and 4 units of the commodity, while the
production capacity of the factories at P and Q are 8 and 6 units correspondingly.
The commodities produced at both factories will be all transported. The cost of
transportation varies from dierent factories to dierent depots where the details of
the cost of transportation per unit commodity are in Table 2.
From/To Cost of A Cost of B Cost of C
P 160 100 150
Q 100 120 100
Table 2: Cost of transportation per unit commodity.
Find the units of commodity from factories to depots so the cost of transportation
is minimum? What will be the minimum cost of transportation?
Q15. A factory wishes to produce a new type of food which has two ingredients denoted
as Ingredient I and Ingredient II. This new type of food need to contains vitamin A
and vitamin C with minimum value 8 and 10, respectively. Ingredient I contains 2
units/kg of vitamin A and 1 units/kg of vitamin C. Ingredient II contains 1 units/kg
of vitamin A and 2 units/kg of vitamin C. The price of ingredient I and ingredient
II are 50/kg and 70/kg, respectively. Decide the formula (the weight of each
ingredient) of this new type of food so that the cost is at the minimum? 
Q16. Explain how Recursive least-squares works.
4
The feasible regions is the shaded area and A, B, C, D, E are corner-point feasible
solutions (CPF solutions). Evaluate Z = 600x+ 400y at these corner points.
Corner point (x, y) Z = 600x+ 400y
A: (5,0) 3000 
B: (6,0) 3600
C: (4,4) 4000 (maximum)
D: (0,6) 2400 
E: (0,4) 1600
Therefore, producing 4 units of each item (x = 4 and y = 4) can produce the
maximum profit of 4000. 
Q14. Denote x and y as the units of commodity transported from factory at P to the
depots at A and B, respectively. 
As all produced commodities from both factories will be transported and factory at
P can produce 8 units, (8 x y) units will be transported from P to C.
As the number of units to be transported must be non-negative, we have the fol-
lowing constraints: x 0, y 0 and 8 x y 0.
Depot A requires 5 units of commodity weekly and x units have been provided by
factory P. Thus, the remaining (5  x) units need to be transported to deport A
from the factory at Q. 
13
ThefeasibleregionsistheshadedareaandA,B,C,D,Earecorner-pointfeasiblesolutions(CPFsolutions).EvaluateZ=600x+400yatthesecornerpoints.Cornerpoint(x,y)Z=600x+400yA:(5,0) 3000B:(6,0) 3600C:(4,4)4000(maximum)D:(0,6) 2400E:(0,4) 1600Therefore,producing4unitsofeachitem(x=4andy=4)canproducethemaximumprotof4000.Q14.DenotexandyastheunitsofcommoditytransportedfromfactoryatPtothe
depotsatAandB,respectively.
Asallproducedcommoditiesfrombothfactorieswillbetransportedandfactoryat
Pcanproduce8units,(8xy)unitswillbetransportedfromPtoC.
Asthenumberofunitstobetransportedmustbenon-negative,wehavethefol-
lowingconstraints:x0,y0and8xy0.
DepotArequires5unitsofcommodityweeklyandxunitshavebeenprovidedby
factoryP.Thus,theremaining(5x)unitsneedtobetransportedtodeportA
fromthefactoryatQ.
13 
Deport B requires 5 units of commodity weekly and y units have been provided by
factory P. Thus, the remaining (5  y) units need to be transported to deport B
from the factory at Q. 
(5  x) units and (5  y) are transported from factory at Q to Deports A and B,
receptively. Deport C requires 4 units of commodity weekly and factory at Q can 
produce 6 units. Thus, the remaining 6 (5 x) (5 y)) = x+ y  4 units need
to be transported to deport C from the factory at Q. 
As the number of units to be transported must be non-negative, we have another
three constraints: 5 x 0, 5 y 0 and x+ y 4 0.
The total transportation cost Z in terms of x and y is defined as
Z = 160x+ 100y + 100(5 x) + 120(5 y) + 150(8 x y) + 100(x+ y  4)
= 10(x 7y + 190). 
The problem of minimising the cost of transport subject to constraints can be de-
scribed as follow:
min
x,y 
Z = 10(x 7y + 190)
subject to
x  0 units to be transported to deport A from factor P is non-negative
y  0 units to be transported to depart B froma factor P is non-negative 
x+ y  8 (8 x y)  0 units to be transported to depart C from factor at P
x  5 (5 x)  0 units to be transported to deport A from factory at Q
y  5 (5 y)  0 units to be transported to deport B from factory at Q 
x+ y 4 x+ y 4 0 units to be transported to deport C from factory at Q
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
14
Deport B requires 5 units of commodity weekly and y units have been provided by
factory P. Thus, the remaining (5  y) units need to be transported to deport B
from the factory at Q. 
(5  x) units and (5  y) are transported from factory at Q to Deports A and B,
receptively. Deport C requires 4 units of commodity weekly and factory at Q can 
produce 6 units. Thus, the remaining 6 (5 x) (5 y)) = x+ y  4 units need
to be transported to deport C from the factory at Q. 
As the number of units to be transported must be non-negative, we have another
three constraints: 5 x 0, 5 y 0 and x+ y 4 0.
The total transportation cost Z in terms of x and y is defined as
Z = 160x+ 100y + 100(5 x) + 120(5 y) + 150(8 x y) + 100(x+ y  4)
= 10(x 7y + 190). 
The problem of minimising the cost of transport subject to constraints can be de-
scribed as follow:
min
x,y 
Z = 10(x 7y + 190)
subject to
x  0 units to be transported to deport A from factor P is non-negative
y  0 units to be transported to depart B froma factor P is non-negative 
x+ y  8 (8 x y)  0 units to be transported to depart C from factor at P
x  5 (5 x)  0 units to be transported to deport A from factory at Q
y  5 (5 y)  0 units to be transported to deport B from factory at Q 
x+ y 4 x+ y 4 0 units to be transported to deport C from factory at Q
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
14
x  axis
-4 -3 -2 -1 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4 
y axis
1 4
1 3
1 2
1 1
1 0
9
8
7
6
5
4
3
2
1
-1
-2
-3
-4
www.softschools.com
x  axis
-4-3-2-11234567891011121314
y  axis
14
13
12
11
10
9
8
7
6
5
4
3
2
1
-1
-2
-3
-4
www.softschools.com 
Deport B requires 5 units of commodity weekly and y units have been provided by
factory P. Thus, the remaining (5  y) units need to be transported to deport B
from the factory at Q. 
(5  x) units and (5  y) are transported from factory at Q to Deports A and B,
receptively. Deport C requires 4 units of commodity weekly and factory at Q can 
produce 6 units. Thus, the remaining 6 (5 x) (5 y)) = x+ y  4 units need
to be transported to deport C from the factory at Q. 
As the number of units to be transported must be non-negative, we have another
three constraints: 5 x 0, 5 y 0 and x+ y 4 0.
The total transportation cost Z in terms of x and y is defined as
Z = 160x+ 100y + 100(5 x) + 120(5 y) + 150(8 x y) + 100(x+ y  4)
= 10(x 7y + 190). 
The problem of minimising the cost of transport subject to constraints can be de-
scribed as follow:
min
x,y 
Z = 10(x 7y + 190)
subject to
x  0 units to be transported to deport A from factor P is non-negative
y  0 units to be transported to depart B froma factor P is non-negative 
x+ y  8 (8 x y)  0 units to be transported to depart C from factor at P
x  5 (5 x)  0 units to be transported to deport A from factory at Q
y  5 (5 y)  0 units to be transported to deport B from factory at Q 
x+ y 4 x+ y 4 0 units to be transported to deport C from factory at Q
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
14
/
The shaded area represents the feasible region and A(0, 4), B(0, 5), C(3, 5), D(5, 3),
E(5, 0) and F(4, 0) are corner-point feasible solutions (CPF solutions). The evalua-
tion of Z at corner points are shown as below. It could be noticed that point (0, 5)
provides the minimum value 1550 of Z. 
Corner point (x, y) Z = 10(x 7y + 190)
A: (0,4) 1620 
B: (0,5) 1550 (minimum)
C: (3,5) 1580 
D: (5,3) 1740
E: (5,0) 1950
F: (4,0) 1940
Therefore, delivering 0, 5 and 3 units from the factory at P and 5, 0 and 1 units
from the factory at Q to the depots at A, B and C, respectively can minimize the
cost of transportation. The minimum cost of transportation is 1550.
Q15. Denote the decision variables x and y as the weight of Ingredient I and Ingredient
II, respectively, in the new type of food. 
As the weights are non-negative, we have the constraints x 0 and y 0.
15
/TheshadedarearepresentsthefeasibleregionandA(0,4),B(0,5),C(3,5),D(5,3),
E(5,0)andF(4,0)arecorner-pointfeasiblesolutions(CPFsolutions).Theevalua-
tionofZatcornerpointsareshownasbelow.Itcouldbenoticedthatpoint(0,5)
providestheminimumvalue1550ofZ.
Cornerpoint(x,y)Z=10(x7y+190)
A:(0,4) 1620
B:(0,5)1550(minimum)
C:(3,5) 1580
D:(5,3) 1740
E:(5,0) 1950
F:(4,0) 1940
Therefore,delivering0,5and3unitsfromthefactoryatPand5,0and1units
fromthefactoryatQtothedepotsatA,BandC,respectivelycanminimizethe
costoftransportation.Theminimumcostoftransportationis1550.
Q15.DenotethedecisionvariablesxandyastheweightofIngredientIandIngredient
II,respectively,inthenewtypeoffood.
Astheweightsarenon-negative,wehavetheconstraintsx0andy0.
15 
Deport B requires 5 units of commodity weekly and y units have been provided by
factory P. Thus, the remaining (5  y) units need to be transported to deport B
from the factory at Q. 
(5  x) units and (5  y) are transported from factory at Q to Deports A and B,
receptively. Deport C requires 4 units of commodity weekly and factory at Q can 
produce 6 units. Thus, the remaining 6 (5 x) (5 y)) = x+ y  4 units need
to be transported to deport C from the factory at Q. 
As the number of units to be transported must be non-negative, we have another
three constraints: 5 x 0, 5 y 0 and x+ y 4 0.
The total transportation cost Z in terms of x and y is defined as
Z = 160x+ 100y + 100(5 x) + 120(5 y) + 150(8 x y) + 100(x+ y  4)
= 10(x 7y + 190). 
The problem of minimising the cost of transport subject to constraints can be de-
scribed as follow:
min
x,y 
Z = 10(x 7y + 190)
subject to
x  0 units to be transported to deport A from factor P is non-negative
y  0 units to be transported to depart B froma factor P is non-negative 
x+ y  8 (8 x y)  0 units to be transported to depart C from factor at P
x  5 (5 x)  0 units to be transported to deport A from factory at Q
y  5 (5 y)  0 units to be transported to deport B from factory at Q 
x+ y 4 x+ y 4 0 units to be transported to deport C from factory at Q
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
14
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
the depots at A, B and C requires 5, 5 and 4 units of the commodity, while the
production capacity of the factories at P and Q are 8 and 6 units correspondingly.
The commodities produced at both factories will be all transported. The cost of
transportation varies from dierent factories to dierent depots where the details of
the cost of transportation per unit commodity are in Table 2.
From/To Cost of A Cost of B Cost of C
P 160 100 150
Q 100 120 100
Table 2: Cost of transportation per unit commodity.
Find the units of commodity from factories to depots so the cost of transportation
is minimum? What will be the minimum cost of transportation?
Q15. A factory wishes to produce a new type of food which has two ingredients denoted
as Ingredient I and Ingredient II. This new type of food need to contains vitamin A
and vitamin C with minimum value 8 and 10, respectively. Ingredient I contains 2
units/kg of vitamin A and 1 units/kg of vitamin C. Ingredient II contains 1 units/kg
of vitamin A and 2 units/kg of vitamin C. The price of ingredient I and ingredient
II are 50/kg and 70/kg, respectively. Decide the formula (the weight of each
ingredient) of this new type of food so that the cost is at the minimum? 
Q16. Explain how Recursive least-squares works.
4
/
The shaded area represents the feasible region and A(0, 4), B(0, 5), C(3, 5), D(5, 3),
E(5, 0) and F(4, 0) are corner-point feasible solutions (CPF solutions). The evalua-
tion of Z at corner points are shown as below. It could be noticed that point (0, 5)
provides the minimum value 1550 of Z. 
Corner point (x, y) Z = 10(x 7y + 190)
A: (0,4) 1620 
B: (0,5) 1550 (minimum)
C: (3,5) 1580 
D: (5,3) 1740
E: (5,0) 1950
F: (4,0) 1940
Therefore, delivering 0, 5 and 3 units from the factory at P and 5, 0 and 1 units
from the factory at Q to the depots at A, B and C, respectively can minimize the
cost of transportation. The minimum cost of transportation is 1550.
Q15. Denote the decision variables x and y as the weight of Ingredient I and Ingredient
II, respectively, in the new type of food. 
As the weights are non-negative, we have the constraints x 0 and y 0.
15
/TheshadedarearepresentsthefeasibleregionandA(0,4),B(0,5),C(3,5),D(5,3),E(5,0)andF(4,0)arecorner-pointfeasiblesolutions(CPFsolutions).Theevalua-tionofZatcornerpointsareshownasbelow.Itcouldbenoticedthatpoint(0,5)providestheminimumvalue1550ofZ.Cornerpoint(x,y)Z=10(x7y+190)A:(0,4) 1620B:(0,5)1550(minimum)C:(3,5) 1580D:(5,3) 1740E:(5,0) 1950F:(4,0) 1940Therefore,delivering0,5and3unitsfromthefactoryatPand5,0and1unitsfromthefactoryatQtothedepotsatA,BandC,respectivelycanminimizethecostoftransportation.Theminimumcostoftransportationis1550.Q15.DenotethedecisionvariablesxandyastheweightofIngredientIandIngredient
II,respectively,inthenewtypeoffood.
Astheweightsarenon-negative,wehavetheconstraintsx0andy0.
15 
Considering the requirements for vitamins contained in the new type of food, we
have:
2x+ y  8 (The total weight of vitimin A from both integrident is 8 or above)
x+ 2y  10 (The total weight of vitimin C from both integrident is 10 or above) 
The cost of this new type of food is the total cost of the two ingredients where the
price 50/kg is for Ingredient I and 70/kg is for Ingredient II. The cost of the new
type of food is thus defined as Z = 50x+ 70y. 
min
x,y 
Z = 50x+ 70y
subject to
2x+ y  8 (The total weight of vitimin A from both integrident is 8 or above)
x+ 2y  10 (The total weight of vitimin C from both integrident is 10 or above) 
x  0 (The weight of Integredient I is non-negative)
y  0 (The weight of Integredient II is non-negative) 
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
16
Consideringtherequirementsforvitaminscontainedinthenewtypeoffood,we
have:
2x+y8(ThetotalweightofvitiminAfrombothintegridentis8orabove)
x+2y10(ThetotalweightofvitiminCfrombothintegridentis10orabove)
Thecostofthisnewtypeoffoodisthetotalcostofthetwoingredientswherethe
price50/kgisforIngredientIand70/kgisforIngredientII.Thecostofthenew
typeoffoodisthusdenedasZ=50x+70y.
min
x,y
Z=50x+70y
subjectto
2x+y8(ThetotalweightofvitiminAfrombothintegridentis8orabove)
x+2y10(ThetotalweightofvitiminCfrombothintegridentis10orabove)
x0(TheweightofIntegredientIisnon-negative)
y0(TheweightofIntegredientIIisnon-negative)
Asthecostandallconstraintsarelinear,thisproblemisalinearprogramming
problemwhichcanbesolvedusinglinearprogrammingtechnique.Drawthegraph
oftheconstraintsasfollows:
16 
Considering the requirements for vitamins contained in the new type of food, we
have:
2x+ y  8 (The total weight of vitimin A from both integrident is 8 or above)
x+ 2y  10 (The total weight of vitimin C from both integrident is 8 or above) 
The cost of this new type of food is the total cost of the two ingredients where the
price 50/kg is for Ingredient I and 70/kg is for Ingredient II. The cost of the new
type of food is thus defined as Z = 50x+ 70y. 
min
x,y 
Z = 50x+ 70y
subject to
2x+ y  8 (The total weight of vitimin A from both integrident is 8 or above)
x+ 2y  10 (The total weight of vitimin C from both integrident is 8 or above) 
x  0 (The weight of Integredient I is non-negative)
y  0 (The weight of Integredient II is non-negative) 
As the cost and all constraints are linear, this problem is a linear programming
problem which can be solved using linear programming technique. Draw the graph
of the constraints as follows:
16
Consideringtherequirementsforvitaminscontainedinthenewtypeoffood,wehave:2x+y8(ThetotalweightofvitiminAfrombothintegridentis8orabove)x+2y10(ThetotalweightofvitiminCfrombothintegridentis8orabove)Thecostofthisnewtypeoffoodisthetotalcostofthetwoingredientswheretheprice50/kgisforIngredientIand70/kgisforIngredientII.ThecostofthenewtypeoffoodisthusdenedasZ=50x+70y.minx,yZ=50x+70ysubjectto2x+y8(ThetotalweightofvitiminAfrombothintegridentis8orabove)x+2y10(ThetotalweightofvitiminCfrombothintegridentis8orabove)x0(TheweightofIntegredientIisnon-negative)y0(TheweightofIntegredientIIisnon-negative)Asthecostandallconstraintsarelinear,thisproblemisalinearprogramming
problemwhichcanbesolvedusinglinearprogrammingtechnique.Drawthegraph
oftheconstraintsasfollows:
16 
As shown in the figure, the feasible region (faded area in deep color) is unbounded.
The corner-point feasible solutions (CPF solutions) are A(0, 8), B(2, 4) and C(10, 0)
and the evaluation of Z is shown in the table below. 
Corner point (x, y) Z = 50x+ 70y
A: (0,8) 560 
B: (2,4) 380 (minimum)
C: (10,0) 500 
The minimum value 380 is obtained at B(2, 4). However, the feasible region is
unbounded, further steps need to be taken to make sure that there is no other point 
in the open half plane resulting that the Z value is smaller than 380. Hence, we
draw the region of inequality which satisfied: 
50x+ 70y < 380,which is equivalent to5x+ 7y < 38.The common area which satisfied both the inequality requirement (5x + 7y < 38)and the constraints (x  0 and y  0) are marked in light color. We could observethat there is no overlap between the feasible region and the graph of inequality.Therefore we could say, the point (2,4) gives the minimum value Z of 380. So thebest formula of this new type of food is to have 2kg of ingredient I and 4kg ofingredient II. The minimum cost would be 380.Q16. A least-squares problem is described as minxf(x) =|| AxB ||22.Its solution is given as x =ATA1ATB.Denote the i-th row of A as aTi and the i-th row of Bi as bi.The solution x can be represented in row form as follows:x = mXi=1aiaTi1 mXi=1biai.For example,A =26664a11 a12    a1na21 a22    a2n……. . ….am1 am2    amn37775=26664aT1aT2…aTm37775,B =26664b1b2…bm37775.17Considering the requirements for vitamins contained in the new type of food, wehave:2x+ y  8 (The total weight of vitimin A from both integrident is 8 or above)x+ 2y  10 (The total weight of vitimin C from both integrident is 10 or above)The cost of this new type of food is the total cost of the two ingredients where theprice 50/kg is for Ingredient I and 70/kg is for Ingredient II. The cost of the newtype of food is thus defined as Z = 50x+ 70y.minx,yZ = 50x+ 70ysubject to2x+ y  8 (The total weight of vitimin A from both integrident is 8 or above)x+ 2y  10 (The total weight of vitimin C from both integrident is 10 or above)x  0 (The weight of Integredient I is non-negative)y  0 (The weight of Integredient II is non-negative)As the cost and all constraints are linear, this problem is a linear programmingproblem which can be solved using linear programming technique. Draw the graphof the constraints as follows:16Consideringtherequirementsforvitaminscontainedinthenewtypeoffood,wehave:2x+y8(ThetotalweightofvitiminAfrombothintegridentis8orabove)x+2y10(ThetotalweightofvitiminCfrombothintegridentis10orabove)Thecostofthisnewtypeoffoodisthetotalcostofthetwoingredientswheretheprice50/kgisforIngredientIand70/kgisforIngredientII.ThecostofthenewtypeoffoodisthusdenedasZ=50x+70y.minx,yZ=50x+70ysubjectto2x+y8(ThetotalweightofvitiminAfrombothintegridentis8orabove)x+2y10(ThetotalweightofvitiminCfrombothintegridentis10orabove)x0(TheweightofIntegredientIisnon-negative)y0(TheweightofIntegredientIIisnon-negative)Asthecostandallconstraintsarelinear,thisproblemisalinearprogrammingproblemwhichcanbesolvedusinglinearprogrammingtechnique.Drawthegraphoftheconstraintsasfollows:16A least-squares problem is described asminxf(x) =|| AxB ||22 .Its solution is given asx =ATA1ATB.Denote the i-th row of A as aTi and the i-th row of Bi as bi.The solution x can be represented in row form as follows:x = mXi=1aiaTi1 mXi=1biai.For example,A =26664a11 a12    a1na21 a22    a2n……. . ….am1 am2    amn37775=26664aT1aT2…aTm37775, B =26664b1b2…bm37775.
ATA
mX
i=1
aia
T
i 
2
666
4 
a11 a12    a1n
a21 a22    a2n 
. . .
am1 am2 amn
3
777
5 
,
a1
aT1
a1a
T
1 + a2a 
T
2 +   + ama 
T
m 
A =
2
666
4 
a11 a12    a1n
a21 a22    a2n 
. . .
am1 am2 amn
3
777
5
= 
2
666
4 
aT1
aT2 
aTm
3
777
5
, B = 
2
666
4 
b1
b2
bm
3
777
5
. 
h
am+1,1 am+1,2 
am+1,n
i
= aTm+1 
bm+1
The new solution can be written as
xnew =
 mX 
i=1
aia
T
i + am+1a 
T
m+1 
1 mX
i=1
biai + bm+1am+1
.
At the time that you have m samples, recall that the solution is:
x = P(m)1q(m)
where
P(m) = ATA =
mX 
i=1
aia
T
i 
and
q(m) = ATB =
mX 
i=1
biai.
With the new sample aTm+1 and and bm+1, the solution is:
xnew = P(m+ 1)
1q(m+ 1) 
where
P(m+ 1) =
mX 
i=1
aia
T
i + am+1a 
T
m+1 = P(m) + am+1a 
T
m+1 
and
q(m+ 1) =
mX 
i=1
biai + bm+1am+1.
Rank one update formula:
P+ aaT
1
= P1  
1
1 + aTP1a
P1a
P1a
T
Remark: Rank one update formula is valid when P = PT , and P and P+ aaT
are both invertible.
Rank one update formula:
P+ aaT
1
= P1  
1
1 + aTP1a
P1a
P1a
T
We have:
P(m+ 1) =
mX 
i=1
aia
T
i + am+1a 
T
m+1 = P(m) + am+1a 
T
m+1 
P(m+ 1)1
P(m) + am+1a
T
m+1 
1
. 
Q14. There are two factories located in place P and Q. The commodity produced by these
two factories will be delivered to three depots situated at A, B and C. Each week,
the depots at A, B and C requires 5, 5 and 4 units of the commodity, while the
production capacity of the factories at P and Q are 8 and 6 units correspondingly.
The commodities produced at both factories will be all transported. The cost of
transportation varies from dierent factories to dierent depots where the details of
the cost of transportation per unit commodity are in Table 2.
From/To Cost of A Cost of B Cost of C
P 160 100 150
Q 100 120 100
Table 2: Cost of transportation per unit commodity.
Find the units of commodity from factories to depots so the cost of transportation
is minimum? What will be the minimum cost of transportation?
Q15. A factory wishes to produce a new type of food which has two ingredients denoted
as Ingredient I and Ingredient II. This new type of food need to contains vitamin A
and vitamin C with minimum value 8 and 10, respectively. Ingredient I contains 2
units/kg of vitamin A and 1 units/kg of vitamin C. Ingredient II contains 1 units/kg
of vitamin A and 2 units/kg of vitamin C. The price of ingredient I and ingredient
II are 50/kg and 70/kg, respectively. Decide the formula (the weight of each
ingredient) of this new type of food so that the cost is at the minimum? 
Q16. Explain how Recursive least-squares works.
Q17. Consider a least-squares problem as Ax = B, where
A =
2
4
5 6 
5 8
4 5
3
5 ,B =
2
4
5 
2
1
3
5
Given that there are five new samples coming in the the following order,
aT1 =
3 4
, b1 = 7;
aT2 =
10 10
, b2 = 5;
aT3 =
3 8
, b3 = 8;
aT4 =
8 8
, b4 = 2;
aT5 =
7 5
, b5 = 3.
calculate the initial solution, denoted as x0, for Ax = B and update the solution
using the recursive least-squares technique with the given new samples. Show the 
working and the solution after adding the samples one by one according to the order
shown in the above.
4
/docProps/thumbnail.jpeg

![[SOLVED] CS flex algorithm AI Biologically Inspired Methods](https://assignmentchef.com/wp-content/uploads/2022/08/downloadzip.jpg)

![[Solved] Python program that plays a guess-the-number game with magic lists](https://assignmentchef.com/wp-content/uploads/2022/08/downloadzip-1200x1200.jpg)
 
 
 
Reviews
There are no reviews yet.