
Implementing a BP Neural Network by Hand in Python


The Error Backpropagation (BP) Algorithm

Output Layer

For a training example $(\boldsymbol{x}_k, \boldsymbol{y}_k)$, suppose the network's output is $\hat{\boldsymbol{y}}_k = (\hat{y}_1^k, \hat{y}_2^k, \ldots, \hat{y}_l^k)$, i.e.

$$\hat{y}_j^k = f\left(\beta_j - \theta_j\right),$$

then the mean squared error of the network on $(\boldsymbol{x}_k, \boldsymbol{y}_k)$ is

$$E_k = \frac{1}{2}\sum_{j=1}^{l}\left(\hat{y}_j^k - y_j^k\right)^2.$$

The BP algorithm uses a gradient descent strategy: each parameter is adjusted in the direction of the negative gradient of the objective. For the error $E_k$ and a given learning rate $\eta$,

$$\Delta w_{hj} = -\eta\,\frac{\partial E_k}{\partial w_{hj}}.$$

Note that $w_{hj}$ first affects the input $\beta_j$ of the $j$-th output-layer neuron, then that neuron's output $\hat{y}_j^k$, and finally $E_k$. By the chain rule,

$$\frac{\partial E_k}{\partial w_{hj}} = \frac{\partial E_k}{\partial \hat{y}_j^k}\cdot\frac{\partial \hat{y}_j^k}{\partial \beta_j}\cdot\frac{\partial \beta_j}{\partial w_{hj}}.$$

Since $\beta_j = \sum\limits_{h=1}^{q} w_{hj} b_h$, we can view $\beta_j$, as a function of $w_{hj}$, as a straight line with slope $b_h$, so naturally

$$\frac{\partial \beta_j}{\partial w_{hj}} = b_h.$$

Define the output-layer gradient term

$$
\begin{aligned}
g_j &= -\frac{\partial E_k}{\partial \hat{y}_j^k}\cdot\frac{\partial \hat{y}_j^k}{\partial \beta_j}\\
    &= -\left(\hat{y}_j^k - y_j^k\right) f'\left(\beta_j - \theta_j\right)\\
    &= \hat{y}_j^k\left(1-\hat{y}_j^k\right)\left(y_j^k-\hat{y}_j^k\right),
\end{aligned}
$$

where the last step uses the sigmoid property $f'(x) = f(x)\bigl(1-f(x)\bigr)$.

Combining the equations above gives

$$\Delta w_{hj} = \eta\, g_j b_h.$$
Similarly,

$$\Delta\theta_j = -\eta\, g_j, \qquad \Delta\gamma_h = -\eta\, e_h,$$

where $e_h$ is the hidden-layer gradient term derived in the next section.
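To make the output-layer formulas concrete, here is a minimal NumPy sketch for a single training example with a sigmoid output layer; all names here (b, W, theta, eta, etc.) are illustrative toy values, not taken from the code later in the article.

import numpy as np

# Toy sizes: q hidden units, l output units, a single training example.
q, l = 3, 2
rng = np.random.default_rng(0)

b = rng.random(q)                  # hidden-layer outputs b_h
W = rng.standard_normal((q, l))    # hidden-to-output weights w_hj
theta = np.zeros(l)                # output thresholds theta_j
y = np.array([1.0, 0.0])           # targets y_j^k
eta = 0.1                          # learning rate

y_hat = 1.0 / (1.0 + np.exp(-(b @ W - theta)))   # y_hat_j^k = f(beta_j - theta_j)
g = y_hat * (1 - y_hat) * (y - y_hat)            # output-layer term g_j
delta_W = eta * np.outer(b, g)                   # Delta w_hj   = eta * g_j * b_h
delta_theta = -eta * g                           # Delta theta_j = -eta * g_j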

Hidden Layer

For the hidden layer, the same reasoning gives

$$
\begin{aligned}
e_h &= -\frac{\partial E_k}{\partial b_h}\cdot\frac{\partial b_h}{\partial \alpha_h}\\
    &= -\sum_{j=1}^{l}\frac{\partial E_k}{\partial \beta_j}\cdot\frac{\partial \beta_j}{\partial b_h}\, f'\left(\alpha_h-\gamma_h\right)\\
    &= f'\left(\alpha_h-\gamma_h\right)\sum_{j=1}^{l} w_{hj}\, g_j,
\end{aligned}
$$

and therefore
$$\Delta v_{ih} = \eta\, e_h x_i, \qquad \Delta\gamma_h = -\eta\, e_h.$$
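The same kind of sketch extends to the hidden layer. The snippet below repeats the toy setup so it runs on its own and computes all four updates for one example; again, every name is an illustrative placeholder rather than part of the article's implementation.

import numpy as np

# Toy sizes: d inputs, q hidden units, l output units, a single training example.
d, q, l = 4, 3, 2
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.random(d)                  # inputs x_i
V = rng.standard_normal((d, q))    # input-to-hidden weights v_ih
gamma = np.zeros(q)                # hidden thresholds gamma_h
W = rng.standard_normal((q, l))    # hidden-to-output weights w_hj
theta = np.zeros(l)                # output thresholds theta_j
y = np.array([1.0, 0.0])           # targets
eta = 0.1                          # learning rate

b = sigmoid(x @ V - gamma)                 # b_h = f(alpha_h - gamma_h)
y_hat = sigmoid(b @ W - theta)             # output y_hat_j^k
g = y_hat * (1 - y_hat) * (y - y_hat)      # output-layer term g_j
e = b * (1 - b) * (W @ g)                  # e_h = f'(alpha_h - gamma_h) * sum_j w_hj g_j

delta_W = eta * np.outer(b, g)             # Delta w_hj    = eta * g_j * b_h
delta_theta = -eta * g                     # Delta theta_j = -eta * g_j
delta_V = eta * np.outer(x, e)             # Delta v_ih    = eta * e_h * x_i
delta_gamma = -eta * e                     # Delta gamma_h = -eta * e_h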

Code Implementation

from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
from sklearn import datasets


iris = datasets.load_iris()
data = iris.data
target = iris.target

class NeuralNetwork:
    def __init__(self, in_size, o_size, h_size):
        # layer sizes: input (n), output (k), hidden (b)
        self.in_size = in_size
        self.o_size = o_size
        self.h_size = h_size

        # note: the thresholds theta_j / gamma_h from the derivation are omitted here
        self.W1 = np.random.randn(in_size, h_size)  # n x b matrix (input -> hidden)
        self.W2 = np.random.randn(h_size, o_size)   # b x k matrix (hidden -> output)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    # mapping function: turn a continuous output in (0, 1) into a discrete class 0/1/2
    def ref(self, x):
        if x <= (1 / 3):
            return 0
        elif x <= (2 / 3):
            return 1
        else:
            return 2

    # X is assumed to be an m x n matrix
    def forward(self, X):
        self.z2 = np.dot(X, self.W1)          # m x b
        self.act2 = self.sigmoid(self.z2)
        self.z3 = np.dot(self.act2, self.W2)  # m x k
        # keep the continuous sigmoid outputs; discretization happens only in predict()
        self.y_hat = self.sigmoid(self.z3)
        return self.y_hat

    # y is assumed to be an m x k matrix
    def backward(self, X, y, y_hat, learning_rate):
        # output-layer gradient term (g_j in the derivation)
        Grd_1 = (y - y_hat) * self.sigmoid(self.z3) * (1 - self.sigmoid(self.z3))  # m x k
        # output-layer delta
        Delta_W2 = np.dot(self.act2.T, Grd_1)  # b x k
        # hidden-layer gradient term (e_h in the derivation)
        Grd_2 = np.dot(Grd_1, self.W2.T) * self.sigmoid(self.z2) * (1 - self.sigmoid(self.z2))  # m x b
        # hidden-layer delta
        Delta_W1 = np.dot(X.T, Grd_2)  # n x b

        # update the weights
        self.W1 += learning_rate * Delta_W1
        self.W2 += learning_rate * Delta_W2

    def train(self, X, y, learning_rate, num_epochs):
        # check that X and y have the same number of samples
        if X.shape[0] != y.shape[0]:
            raise ValueError("X and y must have the same number of samples")
        for i in range(1, num_epochs + 1):
            y_hat = self.forward(X)
            self.backward(X, y, y_hat, learning_rate)
            # report the mean squared error
            loss = np.mean((y - y_hat) ** 2)
            print(f"loss = {loss}, epochs/num_epochs: {i}/{num_epochs}")

    def predict(self, X):
        # discretize only at prediction time, so training always sees continuous outputs
        vec_rule = np.vectorize(self.ref)
        return vec_rule(self.forward(X))
        
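The class above only defines the network and loads the iris data; a minimal way to actually train and evaluate it might look like the sketch below. This is one illustrative reading, not the article's own setup: it uses a single output unit and scales the labels {0, 1, 2} to {0.0, 0.5, 1.0} so that ref() can map the sigmoid output back to a class, and the split ratio and hyperparameters are arbitrary placeholder choices.

# Illustrative usage: single output unit regressing the scaled label.
X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=0)

nn = NeuralNetwork(in_size=4, o_size=1, h_size=8)
# Scale labels {0, 1, 2} to {0.0, 0.5, 1.0} to match the sigmoid output range.
nn.train(X_train, (y_train / 2).reshape(-1, 1), learning_rate=0.01, num_epochs=500)

pred = nn.predict(X_test).ravel()            # discrete predictions in {0, 1, 2}
print("test accuracy:", np.mean(pred == y_test))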


Note: some of the formulas are taken from Zhou Zhihua's Machine Learning (the "watermelon book").
