
[Deep Learning in Practice] Part 1: A Hand-Rolled Neural Network in NumPy for Linear Regression


Contents

I. Introduction

II. Hands-On Code

1. Tensor and Initializer Classes

2. Fully Connected Layer

3. Building the Model

4. SGD Optimizer

5. Mean Squared Error Loss

6. Dataset

III. Linear Regression in Practice

IV. Experimental Results

V. Summary


I. Introduction

Deep learning theory is relatively simple, but the source code of deep learning frameworks (TensorFlow / PyTorch / PaddlePaddle) is fairly complex, and for beginners, connecting the code with the theory is a real hurdle. The goal of this article is to use Python's NumPy library to implement a simple neural network model from scratch and use it to solve a simple linear regression problem. The article does not dwell on theory and dives straight into the code; if you want the underlying theory, please refer to the links below.

Related reading:

1. Forward propagation and gradient computation of fully connected layers

2. Gradient descent with momentum

3. ReLU

II. Hands-On Code

1. Tensor and Initializer Classes

Tensor: holds both a value and its gradient; the trainable parameters of the network layers are stored as Tensors.

Initializer classes (Constant/Normal): parameter initialization schemes.

```python
# Layer parameters need to store both a value and the corresponding gradient,
# so every trainable parameter is kept as a Tensor object.
import numpy as np

np.random.seed(10001)


class Tensor:
    def __init__(self, shape):
        self.data = np.zeros(shape=shape, dtype=np.float32)  # the value
        self.grad = np.zeros(shape=shape, dtype=np.float32)  # the gradient

    def clear_grad(self):
        self.grad = np.zeros_like(self.grad)

    def __str__(self):
        return "Tensor shape: {}, data: {}".format(self.data.shape, self.data)


# Initializers for Tensors; only Normal and Constant are provided for now.
class Initializer:
    """Base class."""
    def __init__(self, shape=None, name='initializer'):
        self.shape = shape
        self.name = name

    def __call__(self, *args, **kwargs):
        raise NotImplementedError

    def __str__(self):
        return self.name


class Constant(Initializer):
    def __init__(self, value=0., name='constant initializer', *args, **kwargs):
        super().__init__(name=name, *args, **kwargs)
        self.value = value

    def __call__(self, shape=None, *args, **kwargs):
        if shape:
            self.shape = shape
        assert self.shape is not None, "the shape of initializer must not be None."
        return self.value + np.zeros(shape=self.shape)


class Normal(Initializer):
    def __init__(self, mean=0., std=0.01, name='normal initializer', *args, **kwargs):
        super().__init__(name=name, *args, **kwargs)
        self.mean = mean
        self.std = std

    def __call__(self, shape=None, *args, **kwargs):
        if shape:
            self.shape = shape
        assert self.shape is not None, "the shape of initializer must not be None."
        return np.random.normal(self.mean, self.std, size=self.shape)
```
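
Before moving on, here is a minimal usage sketch of my own (not part of the original listing; the shapes and values are purely illustrative) showing how a layer would typically combine a Tensor with an initializer:

```python
# Illustrative only: create a weight Tensor and fill it with the Normal
# initializer defined above; do the same for a bias with Constant.
w = Tensor((2, 3))
w.data = Normal(mean=0., std=0.01)(w.data.shape)
print(w)             # Tensor shape: (2, 3), data: [...]
print(w.grad.shape)  # (2, 3) -- gradients start at zero

b = Tensor((1, 3))
b.data = Constant(value=0.)(b.data.shape)
print(b.data)        # [[0. 0. 0.]]
```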

2. Fully Connected Layer

Layer: the base class for all layers; it mainly defines the forward and backward passes.

Linear: the fully connected layer, derived from Layer; it implements the concrete forward and backward computations for its parameters.

```python
# To be able to stack layers and run forward/backward propagation, we first
# define the base class Layer.
# Main methods of Layer:
#   forward:    the forward pass
#   backward:   the backward pass
#   parameters: returns the layer's parameters, which are handed to the optimizer
class Layer:
    def __init__(self, name='layer', *args, **kwargs):
        self.name = name

    def forward(self, *args, **kwargs):
        raise NotImplementedError

    def backward(self, *args, **kwargs):
        raise NotImplementedError

    def parameters(self):
        return []

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

    def __str__(self):
        return self.name


class Linear(Layer):
    """
    input  X, shape: [N, C]
    output Y, shape: [N, O]
    weight W, shape: [C, O]
    bias   b, shape: [1, O]
    grad  dY, shape: [N, O]
    forward formula:
        Y = X @ W + b        # @ denotes matrix multiplication
    backward formula:
        dW = X.T @ dY
        db = sum(dY, axis=0)
        dX = dY @ W.T
    """
    def __init__(self, in_features, out_features, name='linear',
                 weight_attr=Normal(), bias_attr=Constant(), *args, **kwargs):
        super().__init__(name=name, *args, **kwargs)
        self.weights = Tensor((in_features, out_features))
        self.weights.data = weight_attr(self.weights.data.shape)
        self.bias = Tensor((1, out_features))
        self.bias.data = bias_attr(self.bias.data.shape)
        self.input = None

    def forward(self, x):
        self.input = x
        output = np.dot(x, self.weights.data) + self.bias.data
        return output

    def backward(self, gradient):
        self.weights.grad += np.dot(self.input.T, gradient)        # dL/dW
        self.bias.grad += np.sum(gradient, axis=0, keepdims=True)  # dL/db
        input_grad = np.dot(gradient, self.weights.data.T)         # dL/dX
        return input_grad

    def parameters(self):
        return [self.weights, self.bias]

    def __str__(self):
        return "linear layer, weight shape: {}, bias shape: {}".format(
            self.weights.data.shape, self.bias.data.shape)


class ReLU(Layer):
    """
    forward formula:
        relu(x) = x  if x >= 0
                = 0  if x < 0
    backward formula:
        grad = gradient * (x > 0)
    """
    def __init__(self, name='relu', *args, **kwargs):
        super().__init__(name=name, *args, **kwargs)
        self.activated = None

    def forward(self, x):
        # np.maximum avoids modifying the caller's array in place
        self.activated = np.maximum(x, 0)
        return self.activated

    def backward(self, gradient):
        return gradient * (self.activated > 0)
```
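
The backward formulas above are easy to verify numerically. The following sketch is my own addition (not from the original article; variable names are illustrative): it compares the analytic weight gradient of a Linear layer against a central finite-difference estimate, using sum(Y) as a scalar loss so that dL/dY is a matrix of ones.

```python
# Hypothetical sanity check: analytic vs. numerical gradient of Linear.
layer = Linear(3, 2)
x = np.random.rand(4, 3)

y = layer(x)
layer.backward(np.ones_like(y))   # accumulates dL/dW into layer.weights.grad
analytic = layer.weights.grad.copy()

eps = 1e-5
numeric = np.zeros_like(layer.weights.data)
for i in range(layer.weights.data.shape[0]):
    for j in range(layer.weights.data.shape[1]):
        layer.weights.data[i, j] += eps
        loss_plus = layer(x).sum()
        layer.weights.data[i, j] -= 2 * eps
        loss_minus = layer(x).sum()
        layer.weights.data[i, j] += eps
        numeric[i, j] = (loss_plus - loss_minus) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # should be tiny, around 1e-9 or less
```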

3. Building the Model

Sequential: the model container class; it stacks the network layers in order, runs the forward pass through them in that order, and runs the backward pass in reverse order.

```python
# Sequential chains layers together so that data flows forward and gradients
# flow backward through them. When a layer is added, its parameters are
# collected in order as well.
# Main methods of Sequential:
#   add:      append a layer to the network
#   forward:  propagate forward through the layers in the order they were added
#   backward: take the gradient from the loss function and propagate it
#             backward through the layers in reverse order
class Sequential:
    def __init__(self, *args, **kwargs):
        self.graphs = []
        self._parameters = []
        for arg_layer in args:
            if isinstance(arg_layer, Layer):
                self.graphs.append(arg_layer)
                self._parameters += arg_layer.parameters()

    def add(self, layer):
        assert isinstance(layer, Layer), \
            "The type of added layer must be Layer, but got {}.".format(type(layer))
        self.graphs.append(layer)
        self._parameters += layer.parameters()

    def forward(self, x):
        for graph in self.graphs:
            x = graph(x)
        return x

    def backward(self, grad):
        # propagate the gradient in the reverse order of the graph
        for graph in self.graphs[::-1]:
            grad = graph.backward(grad)

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

    def __str__(self):
        string = 'Sequential:\n'
        for graph in self.graphs:
            string += graph.__str__() + '\n'
        return string

    def parameters(self):
        return self._parameters
```
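
A quick sketch of how the container is meant to be used (my addition; the layer sizes and names here are arbitrary):

```python
# Illustrative only: build a tiny two-layer MLP and inspect it.
net = Sequential(
    Linear(1, 8, name='fc1'),
    ReLU(name='relu1'),
)
net.add(Linear(8, 1, name='fc2'))

print(net)                    # lists each layer with its weight/bias shapes
print(len(net.parameters()))  # 4 Tensors: two weights and two biases
```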

4. SGD Optimizer

Optimizer: the optimizer base class, providing the parameter update (step), gradient clearing (clear_grad), and regularization term computation (get_decay).

SGD: the stochastic gradient descent class, which implements gradient descent with momentum.

```python
# The optimizer updates the parameters according to their gradients. Its main
# settings are the learning rate, the regularization type, and the
# regularization coefficient.
# Main methods of Optimizer:
#   step:       called after backpropagation; updates the parameters using the
#               computed gradients
#   clear_grad: gradients accumulate across backward calls, so once step has
#               been applied the used gradients must be cleared
#   get_decay:  computes the regularization gradient for the chosen type
class Optimizer:
    """
    Optimizer base class.
    Args:
        parameters (list of Tensor): parameters to be optimized.
        learning_rate (float): learning rate. Default: 0.001.
        weight_decay (float): the decay weight of parameters. Default: 0.0.
        decay_type (str): the type of regularizer. Default: l2.
    """
    def __init__(self, parameters, learning_rate=0.001, weight_decay=0.0, decay_type='l2'):
        assert decay_type in ['l1', 'l2'], \
            "only support decay_type 'l1' and 'l2', but got {}.".format(decay_type)
        self.parameters = parameters
        self.learning_rate = learning_rate
        self.weight_decay = weight_decay
        self.decay_type = decay_type

    def step(self):
        raise NotImplementedError

    def clear_grad(self):
        for p in self.parameters:
            p.clear_grad()

    def get_decay(self, w):
        # regularization gradient, computed from the parameter value w
        if self.decay_type == 'l1':
            return self.weight_decay * np.sign(w)   # d(weight_decay * |w|)/dw
        elif self.decay_type == 'l2':
            return self.weight_decay * w            # d(0.5 * weight_decay * w^2)/dw


# Plain gradient descent (without regularization):
#     W = W - learning_rate * dW
# Gradient descent with momentum (smooths the randomness of the gradients):
#     v = momentum * v + dW
#     W = W - learning_rate * v
class SGD(Optimizer):
    def __init__(self, momentum=0.9, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.momentum = momentum
        self.velocity = []
        for p in self.parameters:
            self.velocity.append(np.zeros_like(p.grad))

    def step(self):
        for i, p in enumerate(self.parameters):
            decay = self.get_decay(p.data)
            # momentum update; the velocity is stored so it carries over to the next step
            self.velocity[i] = self.momentum * self.velocity[i] + p.grad + decay
            p.data = p.data - self.learning_rate * self.velocity[i]
```
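
As a sketch of the update rule in isolation (my addition, with hand-picked illustrative numbers), a single step on one 1x1 parameter looks like this:

```python
# Illustrative only: one SGD-with-momentum step on a single parameter.
p = Tensor((1, 1))
p.data = np.array([[1.0]])
p.grad = np.array([[0.5]])   # pretend backward() produced this gradient

opt = SGD(parameters=[p], learning_rate=0.1, weight_decay=0.0, decay_type='l2')
opt.step()                   # velocity = 0.9*0 + 0.5 = 0.5; data = 1.0 - 0.1*0.5
print(p.data)                # [[0.95]]
opt.clear_grad()
print(p.grad)                # [[0.]]
```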

5. Mean Squared Error Loss

MSE: the mean squared error loss function.

```python
# The loss function follows the Layer pattern, but note that its forward and
# backward differ slightly from an ordinary layer:
#     MSE_loss = 0.5 * (predict_value - label) ^ 2
#   forward:  y is the network output and target is the label; the quantities
#             needed for dloss/dy are cached during the forward pass
#   backward: takes no arguments, because everything needed for dloss/dy was
#             already stored during forward
class MSE(Layer):
    """
    Mean Square Error:
        J = 0.5 * (y - target)^2
    gradient formula:
        dJ/dy = y - target
    """
    def __init__(self, name='mse', reduction='mean', *args, **kwargs):
        super().__init__(name=name, *args, **kwargs)
        assert reduction in ['mean', 'none', 'sum'], \
            "reduction only support 'mean', 'none' and 'sum', but got {}.".format(reduction)
        self.reduction = reduction
        self.pred = None
        self.target = None

    def forward(self, y, target):
        assert y.shape == target.shape, \
            "The shape of y and target is not same, y shape = {} but target shape = {}".format(
                y.shape, target.shape)
        self.pred = y
        self.target = target
        loss = 0.5 * np.square(y - target)
        if self.reduction == 'mean':    # '==' instead of 'is' for string comparison
            return loss.mean()
        elif self.reduction == 'none':
            return loss
        else:
            return loss.sum()

    def backward(self):
        gradient = self.pred - self.target
        return gradient
```
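
A short illustrative check of the loss and its gradient on hand-picked numbers (my addition, not part of the original listing):

```python
# Illustrative only: MSE forward/backward on a tiny batch of two samples.
loss_fn = MSE(reduction='mean')
pred   = np.array([[1.0], [2.0]])
target = np.array([[0.0], [2.0]])

loss = loss_fn(pred, target)   # mean of 0.5*[(1-0)^2, (2-2)^2] = 0.25
grad = loss_fn.backward()      # pred - target = [[1.], [0.]]
print(loss, grad.tolist())
```

Note that backward returns pred - target regardless of the reduction, so with reduction='mean' the gradient is not divided by the batch size; with a fixed batch size this only rescales the effective learning rate.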

6. Dataset

TensorFlow, PyTorch, and PaddlePaddle all provide a Dataset class, so a simple one is implemented here as well, together with a BatchSampler and a DataLoader.

```python
# Following PaddlePaddle's convention, a Dataset must implement
# __getitem__ and __len__.
class Dataset:
    def __init__(self, *args, **kwargs):
        pass

    def __getitem__(self, idx):
        raise NotImplementedError("'{}' not implement in class {}".format(
            '__getitem__', self.__class__.__name__))

    def __len__(self):
        raise NotImplementedError("'{}' not implement in class {}".format(
            '__len__', self.__class__.__name__))


# Generates the dataset indices of each batch according to the dataset and
# the sampling settings.
class BatchSampler:
    def __init__(self, dataset=None, shuffle=False, batch_size=1, drop_last=False):
        self.batch_size = batch_size
        self.drop_last = drop_last
        self.shuffle = shuffle
        self.num_data = len(dataset)
        if self.drop_last or (self.num_data % batch_size == 0):
            self.num_samples = self.num_data // batch_size
        else:
            self.num_samples = self.num_data // batch_size + 1
        indices = np.arange(self.num_data)
        if shuffle:
            np.random.shuffle(indices)
        if drop_last:
            indices = indices[:self.num_samples * batch_size]
        self.indices = indices

    def __len__(self):
        return self.num_samples

    def __iter__(self):
        batch_indices = []
        for i in range(self.num_samples):
            if (i + 1) * self.batch_size <= self.num_data:
                for idx in range(i * self.batch_size, (i + 1) * self.batch_size):
                    batch_indices.append(self.indices[idx])
                yield batch_indices
                batch_indices = []
            else:
                # the last, partial batch
                for idx in range(i * self.batch_size, self.num_data):
                    batch_indices.append(self.indices[idx])
        if not self.drop_last and len(batch_indices) > 0:
            yield batch_indices


# Fetches samples from the dataset according to the indices produced by the
# sampler and stacks them into a batch.
class DataLoader:
    def __init__(self, dataset, sampler=BatchSampler, shuffle=False, batch_size=1, drop_last=False):
        self.dataset = dataset
        self.sampler = sampler(dataset, shuffle, batch_size, drop_last)

    def __len__(self):
        return len(self.sampler)

    def __call__(self):
        return self.__iter__()

    def __iter__(self):
        for sample_indices in self.sampler:
            data_list = []
            label_list = []
            for indice in sample_indices:
                data, label = self.dataset[indice]
                data_list.append(data)
                label_list.append(label)
            yield np.stack(data_list, axis=0), np.stack(label_list, axis=0)
```
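
A minimal sketch of the three pieces working together (my addition; ToyDataset is a hypothetical dataset defined only for this example):

```python
# Illustrative only: batch iteration over a toy dataset of 5 samples.
class ToyDataset(Dataset):
    def __init__(self, n):
        self.x = np.arange(n, dtype=np.float32).reshape(n, 1)
        self.y = self.x * 2

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

loader = DataLoader(ToyDataset(5), shuffle=False, batch_size=2, drop_last=False)
for xb, yb in loader:
    print(xb.shape, yb.shape)   # (2, 1) twice, then (1, 1) for the leftover sample
```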

III. Linear Regression in Practice

We generate a set of data and use the classes completed above to carry out the regression.

```python
import matplotlib.pyplot as plt


class LinearDataset(Dataset):
    def __init__(self, X, Y):
        self.X = X
        self.Y = Y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.Y[idx]


num_data = 200       # number of training samples
val_number = 500     # number of validation samples
epochs = 500
batch_size = 4
learning_rate = 0.01

X = np.linspace(-np.pi, np.pi, num_data).reshape(num_data, 1)
Y = np.sin(X) * 2 + (np.random.rand(*X.shape) - 0.5) * 0.1   # noisy targets
y_ = np.sin(X) * 2                                            # noise-free curve, for reference

model = Sequential(
    Linear(1, 16, name='linear1'),
    ReLU(name='relu1'),
    Linear(16, 64, name='linear2'),
    ReLU(name='relu2'),
    Linear(64, 16, name='linear3'),
    ReLU(name='relu3'),
    Linear(16, 1, name='linear4'),
)
opt = SGD(parameters=model.parameters(), learning_rate=learning_rate,
          weight_decay=0.0, decay_type='l2')
loss_fn = MSE()

train_dataset = LinearDataset(X, Y)
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size, drop_last=True)

for epoch in range(1, epochs):
    for x, y in train_dataloader:
        pred = model(x)
        loss = loss_fn(pred, y)
        grad = loss_fn.backward()
        model.backward(grad)
        opt.step()
        opt.clear_grad()
    print("epoch: {}. loss: {}".format(epoch, loss))

# validation and visualization
X_val = np.linspace(-np.pi, np.pi, val_number).reshape(val_number, 1)
Y_val = np.sin(X_val) * 2
val_dataset = LinearDataset(X_val, Y_val)
val_dataloader = DataLoader(val_dataset, shuffle=False, batch_size=2, drop_last=False)

all_pred = []
for x, y in val_dataloader:
    pred = model(x)
    all_pred.append(pred)
all_pred = np.vstack(all_pred)

plt.scatter(X, Y, marker='x')
plt.plot(X_val, all_pred, color='red')
plt.show()
```

IV. Experimental Results

The experimental data is shown in the figure below:

The fitted curve is shown in the next figure; as can be seen, the neural network implemented in this article fits the data well.

V. Summary

Deep learning theory is simple and mature frameworks abound, so most users of these frameworks never need to look at the underlying implementation. This article implemented a very simple neural network model with Python's NumPy library and verified its effectiveness on a linear regression problem; hopefully you have taken something away from it.

Can't get the code to run? Try the online notebook: 【深度学习实战】一、Numpy手撸神经网络实现线性回归
