This post mainly covers the overall process of forward propagation and backpropagation.

The raw output of a network layer is denoted $Z^{[l]}$, i.e. the output before activation; the output after the activation function is denoted $A^{[l]}$.
The first figure shows the forward pass and backpropagation through the network, the second figure explains the relationships between the intermediate variables, and the third figure is the computation graph of the forward and backward passes, which makes the derivation easier to follow. Note that the $A^{[l-2]}$ in the lower-left corner of the third figure is a mistake; it should be $A^{[l-1]}$.
To make the derivation easier to follow, we first introduce the notation:

| Symbol | Meaning |
|---|---|
| $n_l$ | number of neurons in layer $l$ |
| $f_l(\cdot)$ | activation function of layer $l$ |
| $\mathbf{W}^l \in \mathbb{R}^{n_{l-1} \times n_l}$ | weight matrix from layer $l-1$ to layer $l$ |
| $\mathbf{b}^l \in \mathbb{R}^{n_l}$ | bias from layer $l-1$ to layer $l$ |
| $\mathbf{Z}^l \in \mathbb{R}^{n_l}$ | net output of layer $l$, before activation |
| $\mathbf{A}^l \in \mathbb{R}^{n_l}$ | output of layer $l$ after the activation function, with $A^0 = X$ |
A deep neural network is just a stack of single-layer networks, so we can write down the most basic building block and then stack copies of it to obtain a deep network.

We can therefore start coding by describing a single network layer with a `Layer` class:
```python
class Layer:
    def __init__(self, input_dim, output_dim):
        # Initialize parameters
        self.W = np.random.randn(input_dim, output_dim) * 0.01
        self.b = np.zeros((1, output_dim))

    def forward(self, X):
        # Forward pass
        self.Z = np.dot(X, self.W) + self.b
        self.A = self.activation(self.Z)
        return self.A

    def backward(self, dA, A_prev, activation_derivative):
        # Backward pass; see the derivation below
        m = A_prev.shape[0]
        self.dZ = dA * activation_derivative(self.Z)
        self.dW = np.dot(A_prev.T, self.dZ) / m
        self.db = np.sum(self.dZ, axis=0, keepdims=True) / m
        dA_prev = np.dot(self.dZ, self.W.T)
        return dA_prev

    def update_parameters(self, learning_rate):
        # Parameter update
        self.W -= learning_rate * self.dW
        self.b -= learning_rate * self.db


# Layer with a ReLU activation
class ReLULayer(Layer):
    def activation(self, Z):
        return np.maximum(0, Z)

    def activation_derivative(self, Z):
        return (Z > 0).astype(float)


# Layer with a Softmax activation (mainly used for classification)
class SoftmaxLayer(Layer):
    def activation(self, Z):
        exp_z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)

    def activation_derivative(self, Z):
        # The softmax Jacobian is more complex and is not used directly here;
        # combined with cross-entropy, the gradient simplifies to A - y,
        # so a derivative of 1 is returned.
        return np.ones_like(Z)
```
The core of the weight update is computing `self.dW` and `self.db`; at the same time, so that the gradient keeps flowing backwards, the `backward` function must return the gradient `dA_prev` for the previous layer.
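As a quick sanity check (not part of the original post), here is a minimal sketch of how the `ReLULayer` defined above can be driven by hand; the layer sizes and batch size are made up for illustration, and it assumes `numpy` is imported as `np` alongside the class definitions:

```python
import numpy as np

np.random.seed(0)

layer = ReLULayer(input_dim=4, output_dim=3)   # hypothetical sizes, just for illustration
X = np.random.randn(8, 4)                      # a batch of 8 samples plays the role of A^{l-1}

A = layer.forward(X)                           # activated output, shape (8, 3)
dA = np.random.randn(*A.shape)                 # pretend gradient coming from the next layer
dA_prev = layer.backward(dA, X, layer.activation_derivative)

# Shapes match the notation table: W is (4, 3), b is (1, 3)
print(A.shape, dA_prev.shape, layer.dW.shape, layer.db.shape)
# expected: (8, 3) (8, 4) (4, 3) (1, 3)
```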
The formulas we will need:
Z
l
=
W
l
A
l
−
1
+
b
l
A
l
=
f
(
Z
l
)
d
Z
d
W
=
(
A
l
−
1
)
T
d
Z
d
b
=
1
Z^l = W^l A^{l-1} +b^l \\A^l = f(Z^l)\\\frac{dZ}{dW} = (A^{l-1})^T \\\frac{dZ}{db} = 1
Zl=WlAl−1+blAl=f(Zl)dWdZ=(Al−1)TdbdZ=1
Explanation:

From the backward pass on the right-hand side of the computation graph above, the gradient `dA` coming back from layer $l+1$ flows through `dZ` directly into `db`, and through `dU` into `dW`. This gives the following formulas for `dW` and `db`:
$$
dW = dA \cdot \frac{dA}{dZ} \cdot \frac{dZ}{dW} = \left(A_{prev}\right)^T \left( dA \odot f'(Z^l) \right)
$$
where $f(\cdot)$ is the activation function, $f'(\cdot)$ is its derivative, $\odot$ denotes element-wise multiplication, and $A_{prev}^T$ is the transpose of the activation output of the previous layer.
Similarly, we obtain

$$
db = dA \cdot \frac{dA}{dZ} \cdot \frac{dZ}{db} = dA \odot f'(Z^l)
$$

(in the code, `db` is additionally summed over the batch).
We also need the gradient that keeps propagating to the previous layer:

$$
dA_{prev} = dA \cdot \frac{dA}{dZ} \cdot \frac{dZ}{dA_{prev}} = \left( dA \odot f'(Z^l) \right) W^T
$$
With the derivation above, we can therefore pass the gradient backwards, from the output layer all the way to the first layer.
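To convince ourselves that the formulas match the code, a small finite-difference gradient check can be run on a single layer. This sketch is not from the original post; it uses a made-up scalar loss (the sum of the activated outputs) purely to have something to differentiate:

```python
import numpy as np

np.random.seed(0)

# A toy single layer: 4 inputs -> 3 outputs, ReLU activation
A_prev = np.random.randn(5, 4)            # batch of 5 samples from the previous layer
W = np.random.randn(4, 3) * 0.1
b = np.zeros((1, 3))

def relu(Z):
    return np.maximum(0, Z)

def relu_grad(Z):
    return (Z > 0).astype(float)

def loss(W, b):
    # Made-up scalar loss: the sum of the activated outputs
    Z = A_prev @ W + b
    return np.sum(relu(Z))

# Analytic gradients following the derivation above (here dA = dL/dA = 1)
Z = A_prev @ W + b
dA = np.ones_like(Z)
dZ = dA * relu_grad(Z)
dW = A_prev.T @ dZ                         # matches (A_prev)^T (dA ⊙ f'(Z))
db = np.sum(dZ, axis=0, keepdims=True)

# Numerical gradient for a single entry of W via central differences
eps = 1e-6
i, j = 2, 1
W_plus, W_minus = W.copy(), W.copy()
W_plus[i, j] += eps
W_minus[i, j] -= eps
num_dW_ij = (loss(W_plus, b) - loss(W_minus, b)) / (2 * eps)

print(dW[i, j], num_dW_ij)   # the two numbers should agree to several decimal places
```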
The most common loss function for classification is the cross-entropy loss, which measures the discrepancy between the model's output distribution and the true labels. Its formula is:
$$
L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} y_{ij}\,\log(\hat{y}_{ij})
$$
where $N$ is the number of samples, $C$ is the number of classes, and $y_{ij}$ is the $j$-th entry of the $i$-th sample's label. Because the labels are one-hot encoded, each row has exactly one entry equal to 1 and the rest are 0, so the cross-entropy loss takes the log of the predicted probability at the non-zero position of each sample $i$ and then returns the negative of the average over all samples.
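A tiny worked example (not from the original post) may help: with one-hot labels, only the predicted probability at the true class of each sample enters the loss.

```python
import numpy as np

# Two samples, three classes, one-hot labels
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])

# Only the log-probability at the true class contributes for each sample
L = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
print(L)  # ( -log(0.7) - log(0.6) ) / 2 ≈ 0.4338
```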
Taking the gradient with respect to the pre-softmax logits $z$, the cross-entropy loss has a very compact form:
$$
\nabla_{z} L = \hat{\mathbf{y}} - \mathbf{y}
$$
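A quick numerical check of this identity for a single sample (a sketch of mine, not part of the original code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

z = np.array([0.5, -1.0, 2.0])        # logits for a single sample
y = np.array([0.0, 0.0, 1.0])         # one-hot label

def ce_loss(z):
    return -np.sum(y * np.log(softmax(z)))

# Analytic gradient: y_hat - y
analytic = softmax(z) - y

# Central-difference numerical gradient
eps = 1e-6
numeric = np.zeros_like(z)
for k in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[k] += eps
    zm[k] -= eps
    numeric[k] = (ce_loss(zp) - ce_loss(zm)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # ~1e-9, the two gradients match
```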
Thanks to its good properties, the mean squared error (MSE) loss is widely used for regression problems. Its formula is:
$$
L = \frac{1}{N}\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2
$$
In vector form:
$$
L = \frac{1}{N}\,\lVert \mathbf{y} - \hat{\mathbf{y}} \rVert_2^2
$$
Gradient:
$$
\nabla_{\hat{\mathbf{y}}} L = \frac{2}{N}\left(\hat{\mathbf{y}} - \mathbf{y}\right)
$$
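A short numerical check of the $\frac{2}{N}$ factor (again a sketch, not part of the original code). Note that the regression script below uses `dA = -(y - A)`, i.e. it drops the constant factor of 2 (the $\frac{1}{N}$ is applied inside `backward`); this only rescales the gradient and is absorbed by the learning rate.

```python
import numpy as np

y     = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.5, 1.8, 2.4])
N = len(y)

mse = np.mean((y - y_hat) ** 2)
grad = 2.0 / N * (y_hat - y)          # gradient of the MSE w.r.t. y_hat

# Quick numerical check on the first component
eps = 1e-6
yp = y_hat.copy(); yp[0] += eps
ym = y_hat.copy(); ym[0] -= eps
num = (np.mean((y - yp) ** 2) - np.mean((y - ym) ** 2)) / (2 * eps)
print(grad[0], num)                   # both ≈ 0.3333
```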
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
import matplotlib.pyplot as plt


class Layer:
    def __init__(self, input_dim, output_dim):
        self.W = np.random.randn(input_dim, output_dim) * 0.01
        self.b = np.zeros((1, output_dim))

    def forward(self, X):
        self.Z = np.dot(X, self.W) + self.b  # output before activation
        self.A = self.activation(self.Z)     # output after activation
        return self.A

    def backward(self, dA, A_prev, activation_derivative):
        # Note: the gradient flows backwards: l+1 --> l --> l-1
        # A_prev is the output of layer l-1, i.e. A^{l-1}
        # dA is the gradient passed back from layer l+1
        # activation_derivative is the derivative of the activation function
        # dA_prev is the gradient passed on to layer l-1
        m = A_prev.shape[0]
        self.dZ = dA * activation_derivative(self.Z)
        self.dW = np.dot(A_prev.T, self.dZ) / m
        self.db = np.sum(self.dZ, axis=0, keepdims=True) / m
        dA_prev = np.dot(self.dZ, self.W.T)  # gradient passed back to the previous layer
        return dA_prev

    def update_parameters(self, learning_rate):
        self.W -= learning_rate * self.dW
        self.b -= learning_rate * self.db


class ReLULayer(Layer):
    def activation(self, Z):
        return np.maximum(0, Z)

    def activation_derivative(self, Z):
        return (Z > 0).astype(float)


class SoftmaxLayer(Layer):
    def activation(self, Z):
        exp_z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)

    def activation_derivative(self, Z):
        # The softmax Jacobian is not used directly; combined with
        # cross-entropy the gradient simplifies to A - y.
        return np.ones_like(Z)


class NeuralNetwork:
    def __init__(self, layer_dims, learning_rate=0.01):
        self.layers = []
        self.learning_rate = learning_rate
        for i in range(len(layer_dims) - 2):
            self.layers.append(ReLULayer(layer_dims[i], layer_dims[i + 1]))
        self.layers.append(SoftmaxLayer(layer_dims[-2], layer_dims[-1]))

    def cross_entropy_loss(self, y_true, y_pred):
        n_samples = y_true.shape[0]
        y_pred_clipped = np.clip(y_pred, 1e-12, 1 - 1e-12)
        return -np.sum(y_true * np.log(y_pred_clipped)) / n_samples

    def accuracy(self, y_true, y_pred):
        y_true_labels = np.argmax(y_true, axis=1)
        y_pred_labels = np.argmax(y_pred, axis=1)
        return np.mean(y_true_labels == y_pred_labels)

    def train(self, X, y, epochs):
        loss_history = []
        for epoch in range(epochs):
            A = X
            # Forward propagation
            cache = [A]
            for layer in self.layers:
                A = layer.forward(A)
                cache.append(A)
            loss = self.cross_entropy_loss(y, A)
            loss_history.append(loss)
            # Backward propagation
            # derivative of the loss function
            dA = A - y
            for i in reversed(range(len(self.layers))):
                layer = self.layers[i]
                A_prev = cache[i]
                dA = layer.backward(dA, A_prev, layer.activation_derivative)
            # Update parameters
            for layer in self.layers:
                layer.update_parameters(self.learning_rate)
            if (epoch + 1) % 100 == 0:
                print(f'Epoch {epoch + 1}/{epochs}, Loss: {loss:.4f}')
        return loss_history

    def predict(self, X):
        A = X
        for layer in self.layers:
            A = layer.forward(A)
        return A


# Load the data
iris = load_iris()
X = iris.data
y = iris.target.reshape(-1, 1)

# One-hot encoding
encoder = OneHotEncoder(sparse_output=False)
y = encoder.fit_transform(y)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define and train the neural network
layer_dims = [X_train.shape[1], 100, 20, y_train.shape[1]]  # Example with 2 hidden layers
learning_rate = 0.01
epochs = 1000
nn = NeuralNetwork(layer_dims, learning_rate)
loss_history = nn.train(X_train, y_train, epochs)

# Predict and evaluate
train_predictions = nn.predict(X_train)
test_predictions = nn.predict(X_test)
train_acc = nn.accuracy(y_train, train_predictions)
test_acc = nn.accuracy(y_test, test_predictions)
print(f'Training Accuracy: {train_acc:.4f}')
print(f'Test Accuracy: {test_acc:.4f}')

# Plot the loss curve
plt.plot(loss_history)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Curve')
plt.show()
```
```
Epoch 100/1000, Loss: 1.0983
Epoch 200/1000, Loss: 1.0980
Epoch 300/1000, Loss: 1.0975
Epoch 400/1000, Loss: 1.0960
Epoch 500/1000, Loss: 1.0891
Epoch 600/1000, Loss: 1.0119
Epoch 700/1000, Loss: 0.6284
Epoch 800/1000, Loss: 0.3711
Epoch 900/1000, Loss: 0.2117
Epoch 1000/1000, Loss: 0.1290
Training Accuracy: 0.9833
Test Accuracy: 1.0000
```
We can see that after 1000 training epochs, the final accuracy on the test set reaches 100%.
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing


class Layer:
    def __init__(self, input_dim, output_dim):
        self.W = np.random.randn(input_dim, output_dim) * 0.01
        self.b = np.zeros((1, output_dim))

    def forward(self, X):
        self.Z = np.dot(X, self.W) + self.b
        self.A = self.activation(self.Z)
        return self.A

    def backward(self, dA, X, activation_derivative):
        m = X.shape[0]
        self.dZ = dA * activation_derivative(self.Z)
        self.dW = np.dot(X.T, self.dZ) / m
        self.db = np.sum(self.dZ, axis=0, keepdims=True) / m
        dA_prev = np.dot(self.dZ, self.W.T)
        return dA_prev

    def update_parameters(self, learning_rate):
        self.W -= learning_rate * self.dW
        self.b -= learning_rate * self.db


class ReLULayer(Layer):
    def activation(self, Z):
        return np.maximum(0, Z)

    def activation_derivative(self, Z):
        return (Z > 0).astype(float)


class LinearLayer(Layer):
    def activation(self, Z):
        return Z

    def activation_derivative(self, Z):
        return np.ones_like(Z)


class NeuralNetwork:
    def __init__(self, layer_dims, learning_rate=0.01):
        self.layers = []
        self.learning_rate = learning_rate
        for i in range(len(layer_dims) - 2):
            self.layers.append(ReLULayer(layer_dims[i], layer_dims[i + 1]))
        self.layers.append(LinearLayer(layer_dims[-2], layer_dims[-1]))

    def mean_squared_error(self, y_true, y_pred):
        return np.mean((y_true - y_pred) ** 2)

    def train(self, X, y, epochs):
        loss_history = []
        for epoch in range(epochs):
            A = X
            # Forward propagation
            cache = [A]
            for layer in self.layers:
                A = layer.forward(A)
                cache.append(A)
            loss = self.mean_squared_error(y, A)
            loss_history.append(loss)
            # Backward propagation
            # derivative of the loss function
            dA = -(y - A)
            for i in reversed(range(len(self.layers))):
                layer = self.layers[i]
                A_prev = cache[i]
                dA = layer.backward(dA, A_prev, layer.activation_derivative)
            # Update parameters
            for layer in self.layers:
                layer.update_parameters(self.learning_rate)
            if (epoch + 1) % 100 == 0:
                print(f'Epoch {epoch + 1}/{epochs}, Loss: {loss:.4f}')
        return loss_history

    def predict(self, X):
        A = X
        for layer in self.layers:
            A = layer.forward(A)
        return A


# Load the data
housing = fetch_california_housing()
X = housing.data
y = housing.target.reshape(-1, 1)

# Standardize
scaler_X = StandardScaler()
scaler_y = StandardScaler()
X = scaler_X.fit_transform(X)
y = scaler_y.fit_transform(y)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define and train the neural network
layer_dims = [X_train.shape[1], 50, 5, 1]  # Example with 2 hidden layers
learning_rate = 0.8
epochs = 1000
nn = NeuralNetwork(layer_dims, learning_rate)
loss_history = nn.train(X_train, y_train, epochs)

# Predict and evaluate
train_predictions = nn.predict(X_train)
test_predictions = nn.predict(X_test)
train_mse = nn.mean_squared_error(y_train, train_predictions)
test_mse = nn.mean_squared_error(y_test, test_predictions)
print(f'Training MSE: {train_mse:.4f}')
print(f'Test MSE: {test_mse:.4f}')

# Plot the loss curve
plt.plot(loss_history)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Curve')
plt.show()
```
```
Epoch 100/1000, Loss: 1.0038
Epoch 200/1000, Loss: 0.9943
Epoch 300/1000, Loss: 0.3497
Epoch 400/1000, Loss: 0.3306
Epoch 500/1000, Loss: 0.3326
Epoch 600/1000, Loss: 0.3206
Epoch 700/1000, Loss: 0.3125
Epoch 800/1000, Loss: 0.3057
Epoch 900/1000, Loss: 0.2999
Epoch 1000/1000, Loss: 0.2958
Training MSE: 0.2992
Test MSE: 0.3071
```
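Because the targets were standardized, the reported MSE is in standardized units. If the error in the original units of the target is wanted, one possible extension (my assumption, not part of the original script) is to undo the scaling with `scaler_y.inverse_transform`:

```python
# Assumes the variables from the regression script above are still in scope.
# Convert predictions and targets back to the original scale of the target variable.
y_test_orig = scaler_y.inverse_transform(y_test)
test_pred_orig = scaler_y.inverse_transform(test_predictions)

test_mse_orig = np.mean((y_test_orig - test_pred_orig) ** 2)
print(f'Test MSE (original units): {test_mse_orig:.4f}')
```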