1. Theory Review
In a previous blog post I introduced the neural network algorithm; here we demonstrate the forward propagation algorithm. For efficient computation, we will build up a vectorized implementation.
Our example network has three layers: an input layer with features $x_1, x_2, x_3$ plus a bias unit $x_0$, a hidden layer with three units, and a single output unit. The steps for computing the hypothesis are as follows:
$$z_1^{(2)}=\Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3$$
$$z_2^{(2)}=\Theta_{20}^{(1)}x_0+\Theta_{21}^{(1)}x_1+\Theta_{22}^{(1)}x_2+\Theta_{23}^{(1)}x_3$$
$$z_3^{(2)}=\Theta_{30}^{(1)}x_0+\Theta_{31}^{(1)}x_1+\Theta_{32}^{(1)}x_2+\Theta_{33}^{(1)}x_3$$
Next, we compute the activation units of the hidden layer:
$$a_1^{(2)}=g(z_1^{(2)}),\quad a_2^{(2)}=g(z_2^{(2)}),\quad a_3^{(2)}=g(z_3^{(2)})$$
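Here $g$ is the activation function; in this post, as in the sigmoid method implemented in Section 2, it is the sigmoid:
$$g(z)=\frac{1}{1+e^{-z}}$$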
Equivalently, we can write:
$$a_1^{(2)}=g(\Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3)$$
The hypothesis we finally obtain is:
$$h_\Theta(x)=a_1^{(3)}=g(\Theta_{10}^{(2)}a_0^{(2)}+\Theta_{11}^{(2)}a_1^{(2)}+\Theta_{12}^{(2)}a_2^{(2)}+\Theta_{13}^{(2)}a_3^{(2)})$$
Each $z$ value above is a linear combination: the inputs $x_0, x_1, x_2, x_3$ of a particular neuron, combined as a weighted sum.
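For instance, with made-up weights $\Theta_{10}^{(1)}=0.1$, $\Theta_{11}^{(1)}=0.2$, $\Theta_{12}^{(1)}=-0.3$, $\Theta_{13}^{(1)}=0.4$ and inputs $x_0=1$, $x_1=2$, $x_2=1$, $x_3=3$, the weighted sum is
$$z_1^{(2)}=0.1\times 1+0.2\times 2+(-0.3)\times 1+0.4\times 3=1.4$$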
Vectorizing the computation makes it much more convenient. Using the same network, let us compute the values of the second layer. Define:
$$X=\left[ \begin{matrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{matrix} \right],\qquad z^{(2)}=\left[ \begin{matrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{matrix} \right]$$
$$z^{(2)}=\Theta^{(1)}x$$
$$a^{(2)}=g(z^{(2)})$$
$$\Theta^{(1)}=\left[ \begin{matrix} \Theta_{10}^{(1)} & \Theta_{11}^{(1)} & \Theta_{12}^{(1)} & \Theta_{13}^{(1)} \\ \Theta_{20}^{(1)} & \Theta_{21}^{(1)} & \Theta_{22}^{(1)} & \Theta_{23}^{(1)} \\ \Theta_{30}^{(1)} & \Theta_{31}^{(1)} & \Theta_{32}^{(1)} & \Theta_{33}^{(1)} \end{matrix} \right]$$
Multiplying the weight matrix by the input vector and applying $g$, we then obtain:
$$g\left(\left[ \begin{matrix} \Theta_{10}^{(1)} & \Theta_{11}^{(1)} & \Theta_{12}^{(1)} & \Theta_{13}^{(1)} \\ \Theta_{20}^{(1)} & \Theta_{21}^{(1)} & \Theta_{22}^{(1)} & \Theta_{23}^{(1)} \\ \Theta_{30}^{(1)} & \Theta_{31}^{(1)} & \Theta_{32}^{(1)} & \Theta_{33}^{(1)} \end{matrix} \right]\left[ \begin{matrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{matrix} \right]\right)=\left[ \begin{matrix} a_1^{(2)} \\ a_2^{(2)} \\ a_3^{(2)} \end{matrix} \right]$$
(Note the dimensions: $\Theta^{(1)}$ is $3\times 4$ and the input vector is $4\times 1$, so $z^{(2)}$ and $a^{(2)}$ are $3\times 1$.)
Letting $z^{(2)}=\Theta^{(1)}x$ and $a^{(2)}=g(z^{(2)})$, we compute $a^{(2)}$ and then prepend the bias unit $a_0^{(2)}=1$. The output value is then:
$$g\left(\left[ \begin{matrix} \Theta_{10}^{(2)} & \Theta_{11}^{(2)} & \Theta_{12}^{(2)} & \Theta_{13}^{(2)} \end{matrix} \right]\times\left[ \begin{matrix} a_0^{(2)} \\ a_1^{(2)} \\ a_2^{(2)} \\ a_3^{(2)} \end{matrix} \right]\right)$$
which evaluates to:
$$g(\Theta_{10}^{(2)}a_0^{(2)}+\Theta_{11}^{(2)}a_1^{(2)}+\Theta_{12}^{(2)}a_2^{(2)}+\Theta_{13}^{(2)}a_3^{(2)})=h_\Theta(x)$$
Letting $z^{(3)}=\Theta^{(2)}a^{(2)}$, we have $h_\Theta(x)=a^{(3)}=g(z^{(3)})$.
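To make the single-example computation concrete, here is a minimal NumPy sketch; the 3-4-1 architecture matches the derivation above, while the input values and random weights are made up purely for illustration:

import numpy as np

def g(z):
    return 1 / (1 + np.exp(-z))  # sigmoid activation

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 1.0, 3.0])  # x0 = 1 (bias) plus three features
Theta1 = rng.random((3, 4))         # Theta^(1): layer 1 -> layer 2
Theta2 = rng.random((1, 4))         # Theta^(2): layer 2 -> layer 3

z2 = Theta1 @ x                 # z^(2), shape (3,)
a2 = np.insert(g(z2), 0, 1.0)   # a^(2) with the bias unit a0 = 1 prepended
z3 = Theta2 @ a2                # z^(3), shape (1,)
h = g(z3)                       # h_Theta(x)
print(h)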
This computes the hypothesis for a single training example. To compute over the entire training set at once, we transpose the feature matrix so that the features of one example occupy one column. That is:
$$z^{(2)}=\Theta^{(1)}\times X^T$$
$$a^{(2)}=g(z^{(2)})$$
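The same toy network can be pushed through in batch form. This sketch (again with made-up data: five examples of three features each) also shows the bias entries being prepended with np.insert, exactly as the implementation in Section 2 does:

import numpy as np

def g(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(1)
m = 5
X = rng.random((m, 3))                                # one example per row
X = np.insert(X, 0, values=np.ones(m), axis=1)        # prepend x0 = 1 to each row
Theta1 = rng.random((3, 4))
Theta2 = rng.random((1, 4))

z2 = Theta1 @ X.T                                     # (3, m): one column per example
a2 = np.insert(g(z2), 0, values=np.ones(m), axis=0)   # prepend a0 = 1 as a row
z3 = Theta2 @ a2                                      # (1, m)
h = g(z3)
print(h.shape)                                        # (1, 5)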
2. Implementing the Forward Propagation Algorithm from Scratch with NumPy
First, import the packages we need:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import loadmat
Let us take a first look at the dataset:
dataset = loadmat('neural_network_dataset.mat')
print('Dataset preview:\n', dataset)
The output is:
Dataset preview:
{'__header__': b'MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Sun Oct 16 13:09:09 2011', '__version__': '1.0', '__globals__': [], 'X': array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]), 'y': array([[10],
[10],
[10],
...,
[ 9],
[ 9],
[ 9]], dtype=uint8)}
Next, check the shapes of X and y:
X=dataset['X']
y=dataset['y']
print(X.shape,y.shape)
The output is:
(5000, 400) (5000, 1)
Now load the weight data:
weights=loadmat('neural_network_weights.mat')
theta_1=weights['Theta1']
theta_2=weights['Theta2']
print(theta_1,theta_1.shape) #(25,401)
We now define the forward propagation algorithm, for the three-layer network implied by the weight shapes (400 input units plus a bias, 25 hidden units plus a bias, and 10 output units):
class Forward_Propagate:
    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def forward_propagate(self, X, theta_1, theta_2):
        m = X.shape[0]
        a1 = np.insert(X, 0, values=np.ones(m), axis=1)  # prepend the bias column x0 = 1
        z2 = a1 @ theta_1.T
        a2 = np.insert(self.sigmoid(z2), 0, values=np.ones(m), axis=1)  # activate, then prepend a0 = 1
        z3 = a2 @ theta_2.T
        a3 = self.sigmoid(z3)
        # Note: this applies the sigmoid a second time, which squashes the
        # scores toward 0.5; since the sigmoid is monotonic, the per-row
        # argmax (and hence the accuracy below) is unaffected.
        h = self.sigmoid(a3)
        return a1, z2, a2, z3, a3, h
Let us look at the output:
clf = Forward_Propagate()
a1, z2, a2, z3, a3, h = clf.forward_propagate(X, theta_1, theta_2)
print(h)
In the output below, each row holds one score per class for a sample; the predicted class is the one with the largest score. (Because forward_propagate applies the sigmoid a second time, the scores are squashed toward 0.5, but their relative order within each row is preserved.)
[[0.50002817 0.50043532 0.50063174 ... 0.50010037 0.50162018 0.73021901]
[0.50011976 0.50060374 0.50086189 ... 0.50059777 0.50049256 0.7302117 ]
[0.50002214 0.50081067 0.50638515 ... 0.51556728 0.50137451 0.71667106]
...
[0.51293816 0.50095429 0.5074069 ... 0.50053917 0.65697147 0.50000606]
[0.50020766 0.5001555 0.50007863 ... 0.50298412 0.72540055 0.50005154]
[0.50001204 0.50011471 0.50000538 ... 0.50143358 0.66736448 0.52045301]]
We can further check the shape of the output:
print(h.shape)
which gives:
(5000, 10)
Our next step is to take, for each sample, the class with the largest score as the prediction. Since np.argmax returns 0-based indices while the labels in y run from 1 to 10, we add 1 to convert each index into a class label (a tiny illustration follows the code below).
h_max=np.argmax(h,axis=1)
y_pred=h_max+1
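A quick, made-up illustration of the argmax-plus-one mapping:

demo = np.array([[0.1, 0.9, 0.2],   # argmax index 1 -> class 2
                 [0.7, 0.2, 0.1]])  # argmax index 0 -> class 1
print(np.argmax(demo, axis=1) + 1)  # [2 1]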
Finally, compute and print the accuracy:
acc = np.mean(y_pred == y.flatten())  # flatten y from (5000, 1) so the comparison is elementwise
print('accuracy={}%'.format(acc * 100))
accuracy=97.52%