CSCI3230 HW3 Li Wei 1155062148
1. Information Theory and Logic
a.
Define $P_{front}$ and $P_{back}$ as the probabilities of getting front and back respectively. Then we have:
$P_{front}+P_{back} = 1$
$I(V) = 0$: the outcome of the coin flip is certain to be front (or certain to be back), i.e. one of the probabilities equals 1. With $P_{front} = 1$ or $P_{back} = 1$, $I(V) = - 0\times \log_2 0 - 1\times \log_2 1 = 0$ (using the convention $0\log_2 0 = 0$).
$I(V) = \log_2 n$: front and back are equally likely, i.e. $P_{front} = P_{back} = 0.5$, so $I(V) = - 0.5\times \log_2 0.5 - 0.5\times \log_2 0.5 = \log_2 2 = 1$.
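As a quick numerical check of these two boundary cases (a minimal sketch; the helper name `coin_entropy` is my own, not part of the assignment):

```python
import math

def coin_entropy(p: float) -> float:
    """I(V) = -p*log2(p) - (1-p)*log2(1-p), with 0*log2(0) taken as 0."""
    return sum(0.0 if q == 0.0 else -q * math.log2(q) for q in (p, 1.0 - p))

print(coin_entropy(1.0))  # 0.0 bits: the outcome is certain
print(coin_entropy(0.5))  # 1.0 bit : fair coin, log2(2) = 1
```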
b.
R1:
$Pass(x,computer) \wedge Win(x,prize)\rightarrow Happy(x)\Rightarrow $
$\neg(Pass(x,computer) \wedge Win(x,prize)) \vee Happy(x) \Rightarrow$
$\neg Pass(x,computer) \vee \neg Win(x,prize) \vee Happy(x) $
R2:
$Study(y) \vee Lucky(y) \rightarrow Pass(y,z) \Rightarrow$
$\neg(Study(y) \vee Lucky(y)) \vee Pass(y,z) \Rightarrow$
$(\neg Study(y) \wedge \neg Lucky(y)) \vee Pass(y,z) \Rightarrow$
$(\neg Study(y) \vee Pass(y,z)) \wedge(\neg Lucky(y)\vee Pass(y,z))$
R3:
$Lucky(w) \rightarrow Win(w,prize) \Rightarrow$
$\neg Lucky(w) \vee Win(w,prize) $
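The only non-trivial step above is R2, where $\vee$ has to be distributed over $\wedge$. As a sanity check at the propositional level (a sketch with made-up variable names; the quantified variables are held fixed), a brute-force truth table shows the implication and its CNF agree on every assignment:

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b is (not a) or b."""
    return (not a) or b

for study, lucky, passes in product([False, True], repeat=3):
    r2 = implies(study or lucky, passes)                        # R2 as given
    cnf = ((not study) or passes) and ((not lucky) or passes)   # R2 after conversion
    assert r2 == cnf
print("R2 and its CNF agree on all 8 assignments")
```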
2. Neural Network
a.
$O = f(\sum_{j=1}^{n} w_jI_j + w_0)$, where $w_0$ is the bias weight.
b.
$h_{i,k} = f(\sum_{j = 1}^{H_{i-1}}(w_{i-1,j,k}h_{i-1,j}))$
$O_m =f(\sum_{j = 1}^{H_{K}}(w_{K,j,m}h_{K,j})) $
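A minimal NumPy sketch of this forward pass (the layer sizes, random weights, and helper name `forward` are assumptions for illustration; biases are omitted as in the formulas above, and $f$ is taken to be the sigmoid of part (c)):

```python
import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))             # sigmoid activation, as in part (c)

def forward(x, weights):
    """weights[i] has shape (H_i, H_{i+1}); h_{i+1,k} = f(sum_j w_{i,j,k} * h_{i,j})."""
    h = np.asarray(x, dtype=float)
    for W in weights:
        h = f(h @ W)                            # one layer of the recurrence above
    return h                                    # the last h is the output vector O

rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]   # 3 inputs, 4 hidden, 2 outputs
print(forward([0.5, -1.0, 2.0], weights))                       # O_1, O_2
```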
c.
$f'(z) = \left(\frac{1}{1+e^{-z}}\right)' = \frac{e^{-z}}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}}\times \frac{e^{-z}}{1+e^{-z}} = f(z)(1-f(z))$
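The identity can also be checked numerically against a central finite difference (a small sketch; the test point 0.7 and the tolerance are arbitrary choices):

```python
import math

def f(z):
    return 1.0 / (1.0 + math.exp(-z))

z, eps = 0.7, 1e-6
analytic = f(z) * (1.0 - f(z))                       # f'(z) from the identity above
numeric = (f(z + eps) - f(z - eps)) / (2.0 * eps)    # central finite difference
print(abs(analytic - numeric) < 1e-8)                # True
```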
d.
The learning rate controls how large a correction is applied after each training step: the larger the learning rate, the more drastic the change to the weights. If we simply apply the full gradient with a step size of 1 (i.e. no learning rate to scale it), the update will most likely overshoot and diverge away from the minimum. We therefore need a learning rate to control how fast the weights change, and a smaller learning rate is generally the safer choice.
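A toy illustration of this effect (a sketch with an objective of my own choosing, $E(w) = 2w^2$, whose gradient is $4w$): a small rate converges to the minimum, while a rate of 1 makes each step overshoot and diverge.

```python
def descend(a, steps=10, w=1.0):
    """Plain gradient descent on E(w) = 2*w**2, whose gradient is 4*w."""
    for _ in range(steps):
        w = w - a * 4.0 * w        # w <- w - a * dE/dw
    return w

print(descend(a=0.1))   # ~0.006: steadily approaches the minimum at w = 0
print(descend(a=1.0))   # 59049 : each step multiplies w by -3, so it diverges
```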
e.
$\frac{\delta E}{\delta w_{K,j,k}} = \frac{\delta E}{\delta O_k}\cdot \frac{\delta O_k}{\delta w_{K,j,k}}= (O_k - T_k)\cdot O_k\cdot(1-O_k)\cdot h_{K,j}$ (because $E = 0.5\sum_{m=1}^{H_{K+1}}(O_m - T_m)^2$ and $O_k=f(\sum_{l = 1}^{H_{K}} w_{K,l,k}h_{K,l})$)
Considering $E=F(h_{i+2,1}, \ldots, h_{i+2,H_{i+2}})$ and $h_{i+2,j} = G(h_{i+1,k})$, according to the multivariate chain rule:
$\frac{\delta E}{\delta h_{i+1,k}} = \sum_{j=1}^{H_{i+2}}\left(\frac{\delta E}{\delta h_{i+2,j}} \cdot \frac{\delta h_{i+2,j}}{\delta h_{i+1,k}}\right)$
$=\sum_{j=1}^{H_{i+2}}\left(\frac{\delta E}{\delta h_{i+2,j}}\cdot h_{i+2,j}\cdot (1-h_{i+2,j})\cdot w_{i+1,k,j}\right) = \sum_{j=1}^{H_{i+2}}\Delta_{i+2,j} \cdot w_{i+1,k,j}$, where $\Delta_{i+2,j} = \frac{\delta E}{\delta h_{i+2,j}}\cdot h_{i+2,j}\cdot(1-h_{i+2,j})$
According to chain rule, we have
$\frac{\delta E}{\delta w_{i,j,k}} = \frac{\delta E}{\delta h_{i+1,k}} \cdot \frac{\delta h_{i+1,k}}{\delta w_{i,j,k}}$
And $\frac{\delta h_{i+1,k}}{\delta w_{i,j,k}} = f'(\sum_{l = 1}^{H_{i}} w_{i,l,k}h_{i,l})\cdot h_{i,j}= h_{i+1,k}\cdot (1-h_{i+1,k})\cdot h_{i,j}$
Therefore, the result is:
$\frac{\delta E}{\delta w_{i,j,k}} = \left(\sum_{m=1}^{H_{i+2}}\Delta_{i+2,m} \cdot w_{i+1,k,m}\right)\cdot h_{i+1,k}\cdot (1-h_{i+1,k})\cdot h_{i,j} $
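As a finite-difference check of this formula on a tiny 3-2-1 sigmoid network (a sketch; the sizes, seed, target, and the particular weight $w_{0,1,0}$ being checked are my own choices):

```python
import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W = [rng.normal(size=(3, 2)), rng.normal(size=(2, 1))]   # W[i][j, k] corresponds to w_{i,j,k}
x, t = rng.normal(size=3), np.array([0.3])

def error(W):
    """E = 0.5 * sum_m (O_m - T_m)^2 for the forward pass through W."""
    h = x
    for Wi in W:
        h = f(h @ Wi)
    return 0.5 * np.sum((h - t) ** 2)

# analytic gradient for the first-layer weight w_{0,j,k}, using the derived formula
h1 = f(x @ W[0])
O = f(h1 @ W[1])
Delta_out = (O - t) * O * (1 - O)                        # Delta_{2,m} = (O_m - T_m) O_m (1 - O_m)
j, k = 1, 0
analytic = (W[1][k, :] @ Delta_out) * h1[k] * (1 - h1[k]) * x[j]

# numerical gradient for the same weight via central difference
eps = 1e-6
Wp = [W[0].copy(), W[1].copy()]; Wp[0][j, k] += eps
Wm = [W[0].copy(), W[1].copy()]; Wm[0][j, k] -= eps
numeric = (error(Wp) - error(Wm)) / (2 * eps)
print(abs(analytic - numeric) < 1e-8)                    # True: the formula matches
```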
def BP(network, examples, a) returns a modified network:
    INPUTS:
        network, a multilayer network
        examples, a set of data/label pairs
        a, the learning rate
    Repeat:
        for each e in examples:
            $O \leftarrow Run(network, e)$
            for each neuron $k$ in the output layer:
                $\Delta_{K+1,k} \leftarrow (O_k - T_k)\cdot O_k\cdot(1-O_k)$
                for each weight $w_{K,j,k}$ into that neuron:
                    $w_{K,j,k} \leftarrow w_{K,j,k} - a\cdot \Delta_{K+1,k} \cdot h_{K,j}$
            for each hidden layer $i$, from $i = K$ down to $i = 1$:
                for each neuron $k$ in layer $i$:
                    $\Delta_{i,k} \leftarrow (\sum_{j=1}^{H_{i+1}}\Delta_{i+1,j} \cdot w_{i,k,j})\cdot h_{i,k}\cdot (1-h_{i,k})$
                    for each weight $w_{i-1,j,k}$ into that neuron:
                        $w_{i-1,j,k} \leftarrow w_{i-1,j,k} - a\cdot \Delta_{i,k} \cdot h_{i-1,j}$
    Until the network has converged
    return network
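A runnable NumPy sketch of this procedure (the toy data set, layer sizes, epoch count, and helper names are assumptions; it applies the delta and weight-update rules derived above, with no bias terms, matching part (b)):

```python
import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))

def run(x, weights):
    """Forward pass: weights[i] has shape (H_i, H_{i+1})."""
    h = np.asarray(x, dtype=float)
    for W in weights:
        h = f(h @ W)
    return h

def BP(weights, examples, a, epochs=2000):
    """Stochastic backpropagation; examples is a list of (input, target) pairs."""
    for _ in range(epochs):                                   # "Repeat ... until converged"
        for x, t in examples:
            hs = [np.asarray(x, dtype=float)]                 # forward pass, keeping every h_i
            for W in weights:
                hs.append(f(hs[-1] @ W))
            delta = (hs[-1] - t) * hs[-1] * (1.0 - hs[-1])    # output-layer Delta_{K+1,k}
            for i in range(len(weights) - 1, -1, -1):         # walk the layers backwards
                grad = np.outer(hs[i], delta)                 # dE/dw_{i,j,k} = Delta_{i+1,k} * h_{i,j}
                delta = (weights[i] @ delta) * hs[i] * (1.0 - hs[i])  # Delta for the layer below
                weights[i] = weights[i] - a * grad            # W <- W - a * dE/dW
    return weights

def total_error(weights, examples):
    return sum(0.5 * np.sum((run(x, weights) - t) ** 2) for x, t in examples)

rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 3)), rng.normal(size=(3, 1))]  # 2 inputs, 3 hidden, 1 output
examples = [([1.0, 0.0], np.array([1.0])),
            ([0.0, 1.0], np.array([0.0])),
            ([1.0, 1.0], np.array([0.5]))]
print(total_error(weights, examples))    # error before training
BP(weights, examples, a=0.5)
print(total_error(weights, examples))    # error after training: should be much smaller
```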