
PyTorch Self-Study Notes

PyTorch self-study

        PyTorch and TensorFlow are the two major machine-learning frameworks. A previous post organized the key points of my TensorFlow self-study; this post collects my PyTorch self-study notes in the same way. The content draws on many posts found online, which are not cited one by one; please contact me in case of any infringement.

Contents

0. Preface
0.1 Why choose PyTorch (its advantages):
0.1.1 Simplicity:
0.1.2 Speed:
0.1.3 Ease of use:
0.1.4 Active community:
0.2 Installation:
1. Tensors
1.1 Tensor data structure
1.1.1 Tensor data types
1.1.2 Tensor dimensions
1.1.3 Tensor size
1.1.4 Tensors and numpy arrays
1.2 Structural operations on tensors
1.2.1 Creating tensors
1.2.2 Indexing and slicing
1.2.3 Dimension transforms
1.2.4 Concatenating and splitting
1.3 Mathematical operations on tensors
1.3.1 Scalar (element-wise) operations
1.3.2 Vector (reduction) operations
1.3.3 Matrix operations
1.3.4 Broadcasting
2. Other fundamentals
2.1 Automatic differentiation
2.1.1 Computing derivatives with backward
2.1.2 Computing derivatives with autograd.grad
2.1.3 Finding a minimum with autodiff and an optimizer
2.2 Dynamic computation graphs
2.2.1 Introduction to dynamic computation graphs:
2.2.2 Function nodes in the computation graph
2.2.3 Computation graphs and backpropagation
2.2.4 Leaf and non-leaf nodes
2.2.5 Visualizing the computation graph in TensorBoard
2.3 PyTorch's layered structure
2.3.1 Low-level API examples
2.3.2 Mid-level API examples
2.3.3 High-level API examples
3. Data loading and processing in PyTorch -- Dataset and DataLoader
3.1 Overview of Dataset and DataLoader
3.1.1 Steps to fetch one batch of data
3.1.2 Division of work between Dataset and DataLoader
3.1.3 Main interfaces of Dataset and DataLoader
3.2 Creating datasets with Dataset
3.3 Loading datasets with DataLoader
4. Building models in PyTorch -- the torch.nn module
4.1 nn.functional and nn.Module
4.1.1 nn.functional and nn.Module
4.1.2 Managing parameters with nn.Module
4.1.3 Managing submodules with nn.Module
4.2 Model layers
4.2.1 Built-in layers
4.2.2 Custom layers
4.3 Loss functions
4.3.1 Built-in loss functions
4.3.2 Custom loss functions
4.3.3 Custom L1 and L2 regularization terms
4.3.4 L2 regularization via the optimizer
5. Ways to build and train models in PyTorch
5.1 Three ways to build models in PyTorch
5.1.1 Subclassing nn.Module to build a custom model
5.1.2 Building a model layer by layer with nn.Sequential
5.1.3 Subclassing nn.Module with model containers for encapsulation
5.2 Three styles of training models in PyTorch:
5.2.1 Script style
5.2.2 Function style
5.2.3 Class style
6. End-to-end modeling examples in PyTorch
6.1 Structured (tabular) data example
a) Prepare the data
b) Define the model
c) Train the model
d) Evaluate the model
e) Use the model
f) Save the model
6.2 Image data example
a) Prepare the data
b) Define the model
c) Train the model
d) Evaluate the model
e) Use the model
f) Save the model
6.3 Text data example
a) Prepare the data
b) Define the model
c) Train the model
d) Evaluate the model
e) Use the model
f) Save the model
6.4 Time-series data example
a) Prepare the data
b) Define the model
c) Train the model
d) Evaluate the model
e) Use the model
f) Save the model
7. Miscellaneous
7.1 TensorBoard visualization
7.1.1 Visualizing model structure
7.1.2 Visualizing metric curves
7.1.3 Visualizing parameter distributions
7.1.4 Visualizing raw images
7.1.5 Visualizing custom plots
7.2 Training models on GPU
7.2.1 Basic GPU operations in PyTorch
7.2.2 Matrix multiplication on GPU
7.2.3 torchkeras.Model on a single GPU
7.2.4 torchkeras.Model on multiple GPUs
7.2.5 torchkeras.LightModel on GPU/TPU
7.3 Links to the materials used in these notes
7.3.1 data folder (datasets)
7.3.2 pytorchStudy folder (code)


0. Preface

        PyTorch is the Python incarnation of Torch, an open-source neural-network framework from Facebook aimed at GPU-accelerated deep neural network (DNN) programming. Torch is a classic tensor library for operating on multi-dimensional arrays and is widely used in machine learning and other math-intensive applications. Unlike TensorFlow's static computation graphs, PyTorch's computation graph is dynamic and can change on the fly as the computation requires. Because Torch was written against the Lua language, it remained a niche tool in China and gradually lost users to the Python-based TensorFlow. As a successor to the classic Torch library, PyTorch gives Python users a comfortable way to write the same kind of code.

0.1 Why choose PyTorch (its advantages):

0.1.1 Simplicity:

     PyTorch's design aims for minimal encapsulation and avoids reinventing the wheel. Unlike TensorFlow, which is full of new concepts such as session, graph, operation, name_scope, variable, tensor and layer, PyTorch is built around three abstraction levels, from low to high: tensor → variable (autograd) → nn.Module, representing multi-dimensional arrays, automatic differentiation and neural-network layers/modules respectively. The three levels are tightly connected and can be modified and operated on together. Another benefit of this lean design is that the code is easy to understand: PyTorch's source is roughly one tenth the size of TensorFlow's, and with fewer abstractions and a more direct design it is far easier to read.

0.1.2 Speed:

    PyTorch's flexibility does not come at the cost of speed: in many benchmarks PyTorch outperforms frameworks such as TensorFlow and Keras. Runtime speed also depends heavily on how the programmer writes the code, but for the same algorithm, the PyTorch implementation is more likely to be the faster one.

0.1.3 Ease of use:

     PyTorch has the most elegant object-oriented design of all the frameworks. Its object-oriented interface is inherited from Torch, whose interface was famous for being flexible and easy to use; the author of Keras was originally inspired by Torch. PyTorch carries on this tradition, and its APIs and module interfaces stay close to Torch's. The design matches the way people think: it lets users focus on expressing their own ideas ("what you think is what you get") without worrying too much about constraints imposed by the framework itself.

0.1.4 Active community:

    PyTorch provides complete documentation, step-by-step guides, and forums maintained by the authors themselves where users can exchange ideas and ask questions. Facebook AI Research (FAIR) strongly backs PyTorch; as one of the leading deep-learning research institutions, FAIR's support ensures continued development and updates, so PyTorch will not fade away like many frameworks maintained by individuals.

0.2 Installation:

Installation follows the preparation steps in the earlier post TensorFlow2自学笔记_阿尔法羊的博客-CSDN博客: install Anaconda and the usual auxiliary libraries, preferably from a domestic mirror (downloads from abroad are painfully slow). The only difference is installing pytorch instead of tensorflow, so the details are not repeated here.

1. Tensors

        Tensors are PyTorch's core concept: all computation in PyTorch is built on tensor operations, so understanding basic tensor concepts and operations is the foundation for learning PyTorch.

1.1 Tensor data structure

PyTorch's basic data structure is the tensor (Tensor), i.e. a multi-dimensional array, very similar to numpy's array.

The tensor data structure covers tensor data types, tensor dimensions, tensor size, and the relationship between tensors and numpy arrays.

1.1.1 Tensor data types

Tensor data types correspond almost one-to-one to numpy.array dtypes, except that str is not supported.

The available types are:

torch.float64 (torch.double),
torch.float32 (torch.float),
torch.float16,
torch.int64 (torch.long),
torch.int32 (torch.int),
torch.int16,
torch.int8,
torch.uint8,
torch.bool

Neural-network modeling generally uses torch.float32.
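Besides the constructors and conversion methods shown in the example below, the general-purpose .to() method also converts dtypes (and devices); a minimal sketch:

import torch

x = torch.tensor([1, 2, 3])      # inferred as torch.int64
y = x.to(torch.float32)          # dtype conversion with .to()
print(x.dtype, y.dtype)          # torch.int64 torch.float32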

# Example 1-1-1: operations on tensor data types
import numpy as np
import torch

# automatic dtype inference
i = torch.tensor(1)
print(i, i.dtype)
x = torch.tensor(2.0)
print(x, x.dtype)
b = torch.tensor(True)
print(b, b.dtype)
out:
tensor(1) torch.int64
tensor(2.) torch.float32
tensor(True) torch.bool

# specifying the dtype explicitly
i = torch.tensor(1, dtype=torch.int32)
print(i, i.dtype)
x = torch.tensor(2.0, dtype=torch.double)
print(x, x.dtype)
out:
tensor(1, dtype=torch.int32) torch.int32
tensor(2., dtype=torch.float64) torch.float64

# type-specific constructors
i = torch.IntTensor(1)
print(i, i.dtype)
x = torch.Tensor(np.array(2.0))
print(x, x.dtype)  # equivalent to torch.FloatTensor
b = torch.BoolTensor(np.array([1, 0, 2, 0]))
print(b, b.dtype)
out:
tensor([0], dtype=torch.int32) torch.int32
tensor(2.) torch.float32
tensor([ True, False, True, False]) torch.bool

# converting between types
i = torch.tensor(1)
print(i, i.dtype)
x = i.float()            # convert to float with the float method
print(x, x.dtype)
y = i.type(torch.float)  # convert with the type method
print(y, y.dtype)
z = i.type_as(x)         # convert to the same type as another tensor with type_as
print(z, z.dtype)
out:
tensor(1) torch.int64
tensor(1.) torch.float32
tensor(1.) torch.float32
tensor(1.) torch.float32

1.1.2 Tensor dimensions

Different kinds of data can be represented by tensors of different dimensions.

A scalar is a 0-dimensional tensor, a vector is a 1-dimensional tensor, and a matrix is a 2-dimensional tensor.

A color image has RGB channels and can be represented as a 3-dimensional tensor.

Video adds a time dimension and can be represented as a 4-dimensional tensor.

A simple rule of thumb: the number of nested bracket levels is the number of dimensions.

# Example 1-1-2: tensor dimensions
import torch

# a scalar is a 0-dimensional tensor
scalar = torch.tensor(True)
print(scalar)
print(scalar.dim())
out:
tensor(True)
0

# a vector is a 1-dimensional tensor
vector = torch.tensor([1.0, 2.0, 3.0, 4.0])
print(vector)
print(vector.dim())
out:
tensor([1., 2., 3., 4.])
1

# a matrix is a 2-dimensional tensor
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(matrix)
print(matrix.dim())
out:
tensor([[1., 2.],
        [3., 4.]])
2

# a 3-dimensional tensor
tensor3 = torch.tensor([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]])
print(tensor3)
print(tensor3.dim())
out:
tensor([[[1., 2.],
         [3., 4.]],
        [[5., 6.],
         [7., 8.]]])
3

# a 4-dimensional tensor
tensor4 = torch.tensor([[[[1.0, 1.0], [2.0, 2.0]], [[3.0, 3.0], [4.0, 4.0]]],
                        [[[5.0, 5.0], [6.0, 6.0]], [[7.0, 7.0], [8.0, 8.0]]]])
print(tensor4)
print(tensor4.dim())
out:
tensor([[[[1., 1.],
          [2., 2.]],
         [[3., 3.],
          [4., 4.]]],
        [[[5., 5.],
          [6., 6.]],
         [[7., 7.],
          [8., 8.]]]])
4

1.1.3 Tensor size

Use the shape attribute or the size() method to get the length of each dimension.

Use the view method to change a tensor's shape.

If view fails to change the shape (for example on a non-contiguous tensor), use the reshape method instead.
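One point worth remembering alongside view/reshape: a view does not copy data, it shares storage with the original tensor, so writing through the view also changes the original. A minimal sketch:

import torch

v = torch.arange(6)
m = v.view(2, 3)    # m shares storage with v
m[0, 0] = 100
print(v)            # tensor([100,   1,   2,   3,   4,   5]) -- the original changes too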

# Example 1-1-3: tensor size
import torch

vector = torch.tensor([1.0, 2.0, 3.0, 4.0])
print(vector.size())
print(vector.shape)
out:
torch.Size([4])
torch.Size([4])

matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(matrix.size())
out:
torch.Size([2, 2])

# view changes the shape of a tensor
vector = torch.arange(0, 12)
print(vector)
print(vector.shape)
matrix34 = vector.view(3, 4)
print(matrix34)
print(matrix34.shape)
matrix43 = vector.view(4, -1)  # -1 lets PyTorch infer the length of this dimension
print(matrix43)
print(matrix43.shape)
out:
tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
torch.Size([12])
tensor([[ 0, 1, 2, 3],
        [ 4, 5, 6, 7],
        [ 8, 9, 10, 11]])
torch.Size([3, 4])
tensor([[ 0, 1, 2],
        [ 3, 4, 5],
        [ 6, 7, 8],
        [ 9, 10, 11]])
torch.Size([4, 3])

# Some operations distort the storage layout; view then fails and reshape can be used instead.
matrix26 = torch.arange(0, 12).view(2, 6)
print(matrix26)
print(matrix26.shape)
# transposing makes the storage non-contiguous
matrix62 = matrix26.t()
print(matrix62.is_contiguous())
# calling view directly would fail; use reshape instead
# matrix34 = matrix62.view(3,4)  # error!
matrix34 = matrix62.reshape(3, 4)  # equivalent to matrix62.contiguous().view(3,4)
print(matrix34)
out:
tensor([[ 0, 1, 2, 3, 4, 5],
        [ 6, 7, 8, 9, 10, 11]])
torch.Size([2, 6])
False
tensor([[ 0, 6, 1, 7],
        [ 2, 8, 3, 9],
        [ 4, 10, 5, 11]])

1.1.4 Tensors and numpy arrays

Use the numpy method to get a numpy array from a Tensor, and torch.from_numpy to get a Tensor from a numpy array.

Tensors and numpy arrays linked in these two ways share the same memory:

changing one also changes the value of the other.

If that is not desired, the tensor's clone method makes a copy and breaks the link.

In addition, the item method extracts a Python number from a scalar tensor,

and the tolist method converts a tensor into a Python list of numbers.

  1. #例1-1-4 张量和numpy数组
  2. import numpy as np
  3. import torch
  4. #torch.from_numpy函数从numpy数组得到Tensor
  5. arr = np.zeros(3)
  6. tensor = torch.from_numpy(arr)
  7. print("before add 1:")
  8. print(arr)
  9. print(tensor)
  10. print("\nafter add 1:")
  11. np.add(arr,1, out = arr) #给 arr增加1,tensor也随之改变
  12. print(arr)
  13. print(tensor)
  14. out:
  15. before add 1:
  16. [0. 0. 0.]
  17. tensor([0., 0., 0.], dtype=torch.float64)
  18. after add 1:
  19. [1. 1. 1.]
  20. tensor([1., 1., 1.], dtype=torch.float64)
  21. # numpy方法从Tensor得到numpy数组
  22. tensor = torch.zeros(3)
  23. arr = tensor.numpy()
  24. print("before add 1:")
  25. print(tensor)
  26. print(arr)
  27. print("\nafter add 1:")
  28. #使用带下划线的方法表示计算结果会返回给调用 张量
  29. tensor.add_(1) #给 tensor增加1,arr也随之改变
  30. #或: torch.add(tensor,1,out = tensor)
  31. print(tensor)
  32. print(arr)
  33. out:
  34. before add 1:
  35. tensor([0., 0., 0.])
  36. [0. 0. 0.]
  37. after add 1:
  38. tensor([1., 1., 1.])
  39. [1. 1. 1.]
  40. # 可以用clone() 方法拷贝张量,中断这种关联
  41. tensor = torch.zeros(3)
  42. #使用clone方法拷贝张量, 拷贝后的张量和原始张量内存独立
  43. arr = tensor.clone().numpy() # 也可以使用tensor.data.numpy()
  44. print("before add 1:")
  45. print(tensor)
  46. print(arr)
  47. print("\nafter add 1:")
  48. #使用 带下划线的方法表示计算结果会返回给调用 张量
  49. tensor.add_(1) #给 tensor增加1,arr不再随之改变
  50. print(tensor)
  51. print(arr)
  52. out:
  53. before add 1:
  54. tensor([0., 0., 0.])
  55. [0. 0. 0.]
  56. after add 1:
  57. tensor([1., 1., 1.])
  58. [0. 0. 0.]
  59. # item方法和tolist方法可以将张量转换成Python数值和数值列表
  60. scalar = torch.tensor(1.0)
  61. s = scalar.item()
  62. print(s)
  63. print(type(s))
  64. tensor = torch.rand(2,2)
  65. t = tensor.tolist()
  66. print(t)
  67. print(type(t))
  68. out:
  69. 1.0
  70. <class 'float'>
  71. [[0.5407917499542236, 0.08548498153686523], [0.8822196125984192, 0.5270139575004578]]
  72. <class 'list'>
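A related detail not covered in the example above: a tensor with requires_grad=True cannot be converted with numpy() directly; it has to be detached from the autograd graph first (the result still shares memory with the tensor). A minimal sketch:

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
# x.numpy() would raise a RuntimeError because x tracks gradients
arr = x.detach().numpy()
print(arr)   # [1. 2.]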

1.2 Structural operations on tensors

1.2.1 Creating tensors

  1. #例1-2-1 创建张量
  2. import numpy as np
  3. import torch
  4. a = torch.tensor([1,2,3],dtype = torch.float)
  5. print('a:',a)
  6. b = torch.arange(1,10,step = 2)
  7. print('b:',b)
  8. c = torch.linspace(0.0,2*3.14,10)
  9. print('c:',c)
  10. d = torch.zeros((3,3))
  11. print('d:',d)
  12. e = torch.ones((3,3),dtype = torch.int)
  13. f = torch.zeros_like(e,dtype = torch.float)
  14. print('e:',e)
  15. print('f:',f)
  16. torch.fill_(f,5)
  17. print('f:',f)
  18. #均匀随机分布
  19. torch.manual_seed(0)
  20. minval,maxval = 0,10
  21. g = minval + (maxval-minval)*torch.rand([5])
  22. print('g:',g)
  23. #正态分布随机
  24. h = torch.normal(mean = torch.zeros(3,3), std = torch.ones(3,3))
  25. print('h:',h)
  26. #正态分布随机
  27. mean,std = 2,5
  28. i = std*torch.randn((3,3))+mean
  29. print('i:',i)
  30. #整数随机排列
  31. j = torch.randperm(20)
  32. print('j:',j)
  33. #特殊矩阵
  34. k = torch.eye(3,3) #单位矩阵
  35. print('k:',k)
  36. l = torch.diag(torch.tensor([1,2,3])) #对角矩阵
  37. print('l:',l)

Output:

  1. a: tensor([1., 2., 3.])
  2. b: tensor([1, 3, 5, 7, 9])
  3. c: tensor([0.0000, 0.6978, 1.3956, 2.0933, 2.7911, 3.4889, 4.1867, 4.8844, 5.5822,
  4. 6.2800])
  5. d: tensor([[0., 0., 0.],
  6. [0., 0., 0.],
  7. [0., 0., 0.]])
  8. e: tensor([[1, 1, 1],
  9. [1, 1, 1],
  10. [1, 1, 1]], dtype=torch.int32)
  11. f: tensor([[0., 0., 0.],
  12. [0., 0., 0.],
  13. [0., 0., 0.]])
  14. f: tensor([[5., 5., 5.],
  15. [5., 5., 5.],
  16. [5., 5., 5.]])
  17. g: tensor([4.9626, 7.6822, 0.8848, 1.3203, 3.0742])
  18. h: tensor([[ 0.5507, 0.2704, 0.6472],
  19. [ 0.2490, -0.3354, 0.4564],
  20. [-0.6255, 0.4539, -1.3740]])
  21. i: tensor([[16.2371, -1.6612, 3.9163],
  22. [ 7.4999, 1.5616, 4.0768],
  23. [ 5.2128, -8.9407, 6.4601]])
  24. j: tensor([ 3, 17, 9, 19, 1, 18, 4, 13, 15, 12, 0, 16, 7, 11, 2, 5, 8, 10,
  25. 6, 14])
  26. k: tensor([[1., 0., 0.],
  27. [0., 1., 0.],
  28. [0., 0., 1.]])
  29. l: tensor([[1, 0, 0],
  30. [0, 2, 0],
  31. [0, 0, 3]])

1.2.2 Indexing and slicing

Tensor indexing and slicing works almost exactly like numpy's.

Slicing supports default parameters and ellipses.

Parts of a tensor can be modified through indexing and slicing.

For irregular slice extraction, use torch.index_select, torch.masked_select, or torch.take.

To obtain a new tensor by modifying some of its elements, use torch.where, torch.masked_fill, or torch.index_fill.

  1. #例1-2-2 索引切片-a
  2. #均匀随机分布
  3. torch.manual_seed(0)
  4. minval,maxval = 0,10
  5. t = torch.floor(minval + (maxval-minval)*torch.rand([5,5])).int()
  6. print(t)
  7. #第0行
  8. print(t[0])
  9. #倒数第一行
  10. print(t[-1])
  11. #第1行第3列
  12. print(t[1,3])
  13. print(t[1][3])
  14. #第1行至第3行
  15. print(t[1:4,:])
  16. #第1行至最后一行,第0列到最后一列每隔两列取一列
  17. print(t[1:4,:4:2])
  18. #可以使用索引和切片修改部分元素
  19. x = torch.tensor([[1,2],[3,4]],dtype = torch.float32,requires_grad=True)
  20. x.data[1,:] = torch.tensor([0.0,0.0])
  21. print(x)
  22. a = torch.arange(27).view(3,3,3)
  23. print(a)
  24. #省略号可以表示多个冒号
  25. print(a[...,1])

Output:

  1. tensor([[4, 7, 0, 1, 3],
  2. [6, 4, 8, 4, 6],
  3. [3, 4, 0, 1, 2],
  4. [5, 6, 8, 1, 2],
  5. [6, 9, 3, 8, 4]], dtype=torch.int32)
  6. tensor([4, 7, 0, 1, 3], dtype=torch.int32)
  7. tensor([6, 9, 3, 8, 4], dtype=torch.int32)
  8. tensor(4, dtype=torch.int32)
  9. tensor(4, dtype=torch.int32)
  10. tensor([[6, 4, 8, 4, 6],
  11. [3, 4, 0, 1, 2],
  12. [5, 6, 8, 1, 2]], dtype=torch.int32)
  13. tensor([[6, 8],
  14. [3, 0],
  15. [5, 8]], dtype=torch.int32)
  16. tensor([[1., 2.],
  17. [0., 0.]], requires_grad=True)
  18. tensor([[[ 0, 1, 2],
  19. [ 3, 4, 5],
  20. [ 6, 7, 8]],
  21. [[ 9, 10, 11],
  22. [12, 13, 14],
  23. [15, 16, 17]],
  24. [[18, 19, 20],
  25. [21, 22, 23],
  26. [24, 25, 26]]])
  27. tensor([[ 1, 4, 7],
  28. [10, 13, 16],
  29. [19, 22, 25]])

For irregular slice extraction, use torch.index_select, torch.take, torch.gather, or torch.masked_select (a short torch.gather sketch follows the example below).

Consider a grade book with 4 classes, 10 students per class, and 7 subject scores per student; it can be represented as a 4×10×7 tensor.

  1. #例1-2-2 索引切片-b
  2. minval=0
  3. maxval=100
  4. scores = torch.floor(minval + (maxval-minval)*torch.rand([4,10,7])).int()
  5. print(scores)
  6. #打印如下:
  7. tensor([[[49, 39, 71, 15, 19, 69, 89],
  8. [57, 99, 45, 14, 8, 42, 15],
  9. [49, 10, 5, 39, 48, 53, 45],
  10. [54, 25, 71, 3, 87, 71, 19],
  11. [50, 63, 50, 13, 64, 74, 37],
  12. [44, 71, 61, 10, 23, 15, 59],
  13. [44, 93, 48, 26, 16, 50, 59],
  14. [39, 41, 6, 3, 37, 68, 3],
  15. [47, 26, 46, 5, 28, 74, 17],
  16. [62, 11, 16, 11, 18, 2, 72]],
  17. [[85, 75, 23, 77, 30, 20, 79],
  18. [98, 88, 88, 92, 4, 10, 24],
  19. [66, 15, 89, 36, 51, 2, 69],
  20. [27, 39, 69, 78, 79, 70, 89],
  21. [92, 29, 6, 99, 45, 82, 71],
  22. [26, 89, 10, 36, 14, 92, 39],
  23. [15, 36, 90, 92, 41, 94, 0],
  24. [33, 12, 37, 65, 32, 79, 60],
  25. [76, 4, 50, 67, 31, 99, 68],
  26. [70, 10, 30, 64, 81, 12, 7]],
  27. [[29, 21, 59, 86, 30, 83, 79],
  28. [30, 71, 53, 89, 37, 71, 83],
  29. [17, 96, 66, 9, 24, 48, 84],
  30. [92, 47, 0, 2, 97, 56, 41],
  31. [14, 2, 59, 8, 96, 12, 35],
  32. [83, 91, 13, 63, 94, 16, 4],
  33. [55, 42, 79, 58, 85, 27, 74],
  34. [18, 47, 17, 50, 67, 8, 87],
  35. [43, 94, 6, 70, 7, 30, 39],
  36. [45, 80, 40, 85, 59, 99, 31]],
  37. [[59, 71, 93, 64, 30, 80, 60],
  38. [10, 10, 98, 38, 31, 68, 67],
  39. [ 0, 64, 87, 75, 39, 72, 44],
  40. [78, 66, 78, 2, 54, 39, 98],
  41. [44, 30, 1, 39, 13, 32, 81],
  42. [47, 70, 92, 0, 20, 75, 49],
  43. [66, 49, 13, 92, 16, 90, 34],
  44. [27, 49, 2, 70, 87, 80, 32],
  45. [ 2, 80, 97, 84, 86, 17, 14],
  46. [68, 13, 78, 28, 51, 85, 35]]], dtype=torch.int32)
  47. #抽取每个班级第0个学生,第5个学生,第9个学生的全部成绩
  48. cj=torch.index_select(scores,dim = 1,index = torch.tensor([0,5,9]))
  49. print(cj)
  50. #打印如下:
  51. tensor([[[49, 39, 71, 15, 19, 69, 89],
  52. [44, 71, 61, 10, 23, 15, 59],
  53. [62, 11, 16, 11, 18, 2, 72]],
  54. [[85, 75, 23, 77, 30, 20, 79],
  55. [26, 89, 10, 36, 14, 92, 39],
  56. [70, 10, 30, 64, 81, 12, 7]],
  57. [[29, 21, 59, 86, 30, 83, 79],
  58. [83, 91, 13, 63, 94, 16, 4],
  59. [45, 80, 40, 85, 59, 99, 31]],
  60. [[59, 71, 93, 64, 30, 80, 60],
  61. [47, 70, 92, 0, 20, 75, 49],
  62. [68, 13, 78, 28, 51, 85, 35]]], dtype=torch.int32)
  63. #抽取每个班级第0个学生,第5个学生,第9个学生的第1门课程,第3门课程,第6门课程成绩
  64. cj2 = torch.index_select(torch.index_select(scores,dim = 1,index =
  65. torch.tensor([0,5,9]))
  66. ,dim=2,index = torch.tensor([1,3,6]))
  67. print(cj2)
  68. #打印如下:
  69. tensor([[[39, 15, 89],
  70. [71, 10, 59],
  71. [11, 11, 72]],
  72. [[75, 77, 79],
  73. [89, 36, 39],
  74. [10, 64, 7]],
  75. [[21, 86, 79],
  76. [91, 63, 4],
  77. [80, 85, 31]],
  78. [[71, 64, 60],
  79. [70, 0, 49],
  80. [13, 28, 35]]], dtype=torch.int32)
  81. #抽取第0个班级第0个学生的第0门课程,第2个班级的第4个学生的第1门课程,第3个班级的第9个学生第6
  82. 门课程成绩
  83. #take将输入看成一维数组,输出和index同形状
  84. cj3 = torch.take(scores,torch.tensor([0*10*7+0,2*10*7+4*7+1,3*10*7+9*7+6]))
  85. print(cj3)
  86. #打印如下:
  87. tensor([49, 2, 35], dtype=torch.int32)
  88. #抽取分数大于等于80分的分数(布尔索引)
  89. #结果是1维张量
  90. cj4 = torch.masked_select(scores,scores>=80)
  91. print(cj4)
  92. #打印如下:
  93. tensor([89, 99, 87, 93, 85, 98, 88, 88, 92, 89, 89, 92, 99, 82, 89, 92, 90, 92,
  94. 94, 99, 81, 86, 83, 89, 83, 96, 84, 92, 97, 96, 83, 91, 94, 85, 87, 94,
  95. 80, 85, 99, 93, 80, 98, 87, 98, 81, 92, 92, 90, 87, 80, 80, 97, 84, 86,
  96. 85], dtype=torch.int32)
  97. #如果分数大于60分,赋值成1,否则赋值成0
  98. ifpass = torch.where(scores>60,torch.tensor(1),torch.tensor(0))
  99. print(ifpass)
  100. #打印如下:
  101. tensor([[[0, 0, 1, 0, 0, 1, 1],
  102. [0, 1, 0, 0, 0, 0, 0],
  103. [0, 0, 0, 0, 0, 0, 0],
  104. [0, 0, 1, 0, 1, 1, 0],
  105. [0, 1, 0, 0, 1, 1, 0],
  106. [0, 1, 1, 0, 0, 0, 0],
  107. [0, 1, 0, 0, 0, 0, 0],
  108. [0, 0, 0, 0, 0, 1, 0],
  109. [0, 0, 0, 0, 0, 1, 0],
  110. [1, 0, 0, 0, 0, 0, 1]],
  111. [[1, 1, 0, 1, 0, 0, 1],
  112. [1, 1, 1, 1, 0, 0, 0],
  113. [1, 0, 1, 0, 0, 0, 1],
  114. [0, 0, 1, 1, 1, 1, 1],
  115. [1, 0, 0, 1, 0, 1, 1],
  116. [0, 1, 0, 0, 0, 1, 0],
  117. [0, 0, 1, 1, 0, 1, 0],
  118. [0, 0, 0, 1, 0, 1, 0],
  119. [1, 0, 0, 1, 0, 1, 1],
  120. [1, 0, 0, 1, 1, 0, 0]],
  121. [[0, 0, 0, 1, 0, 1, 1],
  122. [0, 1, 0, 1, 0, 1, 1],
  123. [0, 1, 1, 0, 0, 0, 1],
  124. [1, 0, 0, 0, 1, 0, 0],
  125. [0, 0, 0, 0, 1, 0, 0],
  126. [1, 1, 0, 1, 1, 0, 0],
  127. [0, 0, 1, 0, 1, 0, 1],
  128. [0, 0, 0, 0, 1, 0, 1],
  129. [0, 1, 0, 1, 0, 0, 0],
  130. [0, 1, 0, 1, 0, 1, 0]],
  131. [[0, 1, 1, 1, 0, 1, 0],
  132. [0, 0, 1, 0, 0, 1, 1],
  133. [0, 1, 1, 1, 0, 1, 0],
  134. [1, 1, 1, 0, 0, 0, 1],
  135. [0, 0, 0, 0, 0, 0, 1],
  136. [0, 1, 1, 0, 0, 1, 0],
  137. [1, 0, 0, 1, 0, 1, 0],
  138. [0, 0, 0, 1, 1, 1, 0],
  139. [0, 1, 1, 1, 1, 0, 0],
  140. [1, 0, 1, 0, 0, 1, 0]]])
  141. #将每个班级第0个学生,第5个学生,第9个学生的全部成绩赋值成满分
  142. torch.index_fill(scores,dim = 1,index = torch.tensor([0,5,9]),value = 100)
  143. #等价于 scores.index_fill(dim = 1,index = torch.tensor([0,5,9]),value = 100)
  144. #打印如下:
  145. tensor([[[100, 100, 100, 100, 100, 100, 100],
  146. [ 57, 99, 45, 14, 8, 42, 15],
  147. [ 49, 10, 5, 39, 48, 53, 45],
  148. [ 54, 25, 71, 3, 87, 71, 19],
  149. [ 50, 63, 50, 13, 64, 74, 37],
  150. [100, 100, 100, 100, 100, 100, 100],
  151. [ 44, 93, 48, 26, 16, 50, 59],
  152. [ 39, 41, 6, 3, 37, 68, 3],
  153. [ 47, 26, 46, 5, 28, 74, 17],
  154. [100, 100, 100, 100, 100, 100, 100]],
  155. [[100, 100, 100, 100, 100, 100, 100],
  156. [ 98, 88, 88, 92, 4, 10, 24],
  157. [ 66, 15, 89, 36, 51, 2, 69],
  158. [ 27, 39, 69, 78, 79, 70, 89],
  159. [ 92, 29, 6, 99, 45, 82, 71],
  160. [100, 100, 100, 100, 100, 100, 100],
  161. [ 15, 36, 90, 92, 41, 94, 0],
  162. [ 33, 12, 37, 65, 32, 79, 60],
  163. [ 76, 4, 50, 67, 31, 99, 68],
  164. [100, 100, 100, 100, 100, 100, 100]],
  165. [[100, 100, 100, 100, 100, 100, 100],
  166. [ 30, 71, 53, 89, 37, 71, 83],
  167. [ 17, 96, 66, 9, 24, 48, 84],
  168. [ 92, 47, 0, 2, 97, 56, 41],
  169. [ 14, 2, 59, 8, 96, 12, 35],
  170. [100, 100, 100, 100, 100, 100, 100],
  171. [ 55, 42, 79, 58, 85, 27, 74],
  172. [ 18, 47, 17, 50, 67, 8, 87],
  173. [ 43, 94, 6, 70, 7, 30, 39],
  174. [100, 100, 100, 100, 100, 100, 100]],
  175. [[100, 100, 100, 100, 100, 100, 100],
  176. [ 10, 10, 98, 38, 31, 68, 67],
  177. [ 0, 64, 87, 75, 39, 72, 44],
  178. [ 78, 66, 78, 2, 54, 39, 98],
  179. [ 44, 30, 1, 39, 13, 32, 81],
  180. [100, 100, 100, 100, 100, 100, 100],
  181. [ 66, 49, 13, 92, 16, 90, 34],
  182. [ 27, 49, 2, 70, 87, 80, 32],
  183. [ 2, 80, 97, 84, 86, 17, 14],
  184. [100, 100, 100, 100, 100, 100, 100]]], dtype=torch.int32)
  185. #将分数小于60分的分数赋值成60分
  186. cj5 = torch.masked_fill(scores,scores<60,60)
  187. #等价于b = scores.masked_fill(scores<60,60)
  188. print(cj5)
  189. #打印如下:
  190. tensor([[[60, 60, 71, 60, 60, 69, 89],
  191. [60, 99, 60, 60, 60, 60, 60],
  192. [60, 60, 60, 60, 60, 60, 60],
  193. [60, 60, 71, 60, 87, 71, 60],
  194. [60, 63, 60, 60, 64, 74, 60],
  195. [60, 71, 61, 60, 60, 60, 60],
  196. [60, 93, 60, 60, 60, 60, 60],
  197. [60, 60, 60, 60, 60, 68, 60],
  198. [60, 60, 60, 60, 60, 74, 60],
  199. [62, 60, 60, 60, 60, 60, 72]],
  200. [[85, 75, 60, 77, 60, 60, 79],
  201. [98, 88, 88, 92, 60, 60, 60],
  202. [66, 60, 89, 60, 60, 60, 69],
  203. [60, 60, 69, 78, 79, 70, 89],
  204. [92, 60, 60, 99, 60, 82, 71],
  205. [60, 89, 60, 60, 60, 92, 60],
  206. [60, 60, 90, 92, 60, 94, 60],
  207. [60, 60, 60, 65, 60, 79, 60],
  208. [76, 60, 60, 67, 60, 99, 68],
  209. [70, 60, 60, 64, 81, 60, 60]],
  210. [[60, 60, 60, 86, 60, 83, 79],
  211. [60, 71, 60, 89, 60, 71, 83],
  212. [60, 96, 66, 60, 60, 60, 84],
  213. [92, 60, 60, 60, 97, 60, 60],
  214. [60, 60, 60, 60, 96, 60, 60],
  215. [83, 91, 60, 63, 94, 60, 60],
  216. [60, 60, 79, 60, 85, 60, 74],
  217. [60, 60, 60, 60, 67, 60, 87],
  218. [60, 94, 60, 70, 60, 60, 60],
  219. [60, 80, 60, 85, 60, 99, 60]],
  220. [[60, 71, 93, 64, 60, 80, 60],
  221. [60, 60, 98, 60, 60, 68, 67],
  222. [60, 64, 87, 75, 60, 72, 60],
  223. [78, 66, 78, 60, 60, 60, 98],
  224. [60, 60, 60, 60, 60, 60, 81],
  225. [60, 70, 92, 60, 60, 75, 60],
  226. [66, 60, 60, 92, 60, 90, 60],
  227. [60, 60, 60, 70, 87, 80, 60],
  228. [60, 80, 97, 84, 86, 60, 60],
  229. [68, 60, 78, 60, 60, 85, 60]]], dtype=torch.int32)
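torch.gather, mentioned above but not used in the example, picks elements along one dimension according to an index tensor of the same rank. A minimal sketch on made-up scores (2 students × 4 subjects; the data here is hypothetical):

import torch

scores2 = torch.tensor([[80, 95, 70, 60],
                        [55, 88, 92, 75]])
idx = torch.tensor([[1, 3],    # for row 0 take columns 1 and 3
                    [0, 2]])   # for row 1 take columns 0 and 2
print(torch.gather(scores2, dim=1, index=idx))
# tensor([[95, 60],
#         [55, 92]])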

1.2.3 Dimension transforms

The main dimension-transform functions are torch.reshape (or the tensor's view method), torch.squeeze, torch.unsqueeze, and torch.transpose.

torch.reshape changes a tensor's shape.

torch.squeeze removes dimensions of length 1.

torch.unsqueeze inserts a dimension of length 1.

torch.transpose swaps two dimensions.

  1. #例1-2-3 维度变换
  2. import torch
  3. # 张量的view方法有时候会调用失败,可以使用reshape方法。
  4. torch.manual_seed(0)
  5. minval,maxval = 0,255
  6. a = (minval + (maxval-minval)*torch.rand([1,3,3,2])).int()
  7. print(a.shape)
  8. print(a)
  9. out:
  10. torch.Size([1, 3, 3, 2])
  11. tensor([[[[126, 195],
  12. [ 22, 33],
  13. [ 78, 161]],
  14. [[124, 228],
  15. [116, 161],
  16. [ 88, 102]],
  17. [[ 5, 43],
  18. [ 74, 132],
  19. [177, 204]]]], dtype=torch.int32)
  20. # 改成 (3,6)形状的张量
  21. b = a.view([3,6]) #torch.reshape(a,[3,6])
  22. print(b.shape)
  23. print(b)
  24. out:
  25. torch.Size([3, 6])
  26. tensor([[126, 195, 22, 33, 78, 161],
  27. [124, 228, 116, 161, 88, 102],
  28. [ 5, 43, 74, 132, 177, 204]], dtype=torch.int32)
  29. # 改回成 [1,3,3,2] 形状的张量
  30. c = torch.reshape(b,[1,3,3,2]) # b.view([1,3,3,2])
  31. print(c)
  32. out:
  33. tensor([[[[126, 195],
  34. [ 22, 33],
  35. [ 78, 161]],
  36. [[124, 228],
  37. [116, 161],
  38. [ 88, 102]],
  39. [[ 5, 43],
  40. [ 74, 132],
  41. [177, 204]]]], dtype=torch.int32)
  42. #如果张量在某个维度上只有一个元素,利用torch.squeeze可以消除这个维度。
  43. #torch.unsqueeze的作用和torch.squeeze的作用相反。
  44. d = torch.tensor([[1.0,2.0]])
  45. e = torch.squeeze(d)
  46. print(d)
  47. print(e)
  48. print(d.shape)
  49. print(e.shape)
  50. out:
  51. tensor([[1., 2.]])
  52. tensor([1., 2.])
  53. torch.Size([1, 2])
  54. torch.Size([2])
  55. #在第0维插入长度为1的一个维度
  56. f = torch.unsqueeze(e,axis=0)
  57. print(e)
  58. print(f)
  59. print(e.shape)
  60. print(f.shape)
  61. out:
  62. tensor([1., 2.])
  63. tensor([[1., 2.]])
  64. torch.Size([2])
  65. torch.Size([1, 2])
  66. #torch.transpose可以交换张量的维度,torch.transpose常用于图片存储格式的变换上。
  67. #如果是二维的矩阵,通常会调用矩阵的转置方法 matrix.t(),等价于 torch.transpose(matrix,0,1)。
  68. minval=0
  69. maxval=255
  70. # Batch,Height,Width,Channel
  71. data = torch.floor(minval + (maxval-minval)*torch.rand([100,256,256,4])).int()
  72. print(data.shape)
  73. # 转换成 Pytorch默认的图片格式 Batch,Channel,Height,Width
  74. # 需要交换两次
  75. data_t = torch.transpose(torch.transpose(data,1,2),1,3)
  76. print(data_t.shape)
  77. out:
  78. torch.Size([100, 256, 256, 4])
  79. torch.Size([100, 4, 256, 256])
  80. matrix = torch.tensor([[1,2,3],[4,5,6]])
  81. print(matrix)
  82. print(matrix.t()) #等价于torch.transpose(matrix,0,1)
  83. out:
  84. tensor([[1, 2, 3],
  85. [4, 5, 6]])
  86. tensor([[1, 4],
  87. [2, 5],
  88. [3, 6]])

1.2.4 Concatenating and splitting

Use torch.cat or torch.stack to combine multiple tensors,

and torch.split to split one tensor into several.

torch.cat and torch.stack differ slightly: torch.cat concatenates along an existing dimension and does not add a dimension, while torch.stack stacks along a new dimension and therefore does.

  1. #例1-2-4 张量的合并分割
  2. import torch
  3. a = torch.tensor([[1.0,2.0],[3.0,4.0]])
  4. b = torch.tensor([[5.0,6.0],[7.0,8.0]])
  5. c = torch.tensor([[9.0,10.0],[11.0,12.0]])
  6. abc_cat = torch.cat([a,b,c],dim = 0)
  7. print(abc_cat.shape)
  8. print(abc_cat)
  9. out:
  10. torch.Size([6, 2])
  11. tensor([[ 1., 2.],
  12. [ 3., 4.],
  13. [ 5., 6.],
  14. [ 7., 8.],
  15. [ 9., 10.],
  16. [11., 12.]])
  17. abc_stack = torch.stack([a,b,c],axis = 0) #torch中dim和axis参数名可以混用
  18. print(abc_stack.shape)
  19. print(abc_stack)
  20. out:
  21. torch.Size([3, 2, 2])
  22. tensor([[[ 1., 2.],
  23. [ 3., 4.]],
  24. [[ 5., 6.],
  25. [ 7., 8.]],
  26. [[ 9., 10.],
  27. [11., 12.]]])
  28. torch.cat([a,b,c],axis = 1)
  29. out:
  30. tensor([[ 1., 2., 5., 6., 9., 10.],
  31. [ 3., 4., 7., 8., 11., 12.]])
  32. torch.stack([a,b,c],axis = 1)
  33. out:
  34. tensor([[[ 1., 2.],
  35. [ 5., 6.],
  36. [ 9., 10.]],
  37. [[ 3., 4.],
  38. [ 7., 8.],
  39. [11., 12.]]])
  40. #torch.split是torch.cat的逆运算,可以指定分割份数平均分割,也可以通过指定每份的记录数量进行分割。
  41. print(abc_cat)
  42. a,b,c = torch.split(abc_cat,split_size_or_sections = 2,dim = 0) #每份2个进行分割
  43. print(a)
  44. print(b)
  45. print(c)
  46. print(abc_cat)
  47. p,q,r = torch.split(abc_cat,split_size_or_sections =[4,1,1],dim = 0) #每份分别为[4,1,1]
  48. print(p)
  49. print(q)
  50. print(r)
  51. out:
  52. tensor([[ 1., 2.],
  53. [ 3., 4.],
  54. [ 5., 6.],
  55. [ 7., 8.],
  56. [ 9., 10.],
  57. [11., 12.]])
  58. tensor([[1., 2.],
  59. [3., 4.]])
  60. tensor([[5., 6.],
  61. [7., 8.]])
  62. tensor([[ 9., 10.],
  63. [11., 12.]])
  64. tensor([[ 1., 2.],
  65. [ 3., 4.],
  66. [ 5., 6.],
  67. [ 7., 8.],
  68. [ 9., 10.],
  69. [11., 12.]])
  70. tensor([[1., 2.],
  71. [3., 4.],
  72. [5., 6.],
  73. [7., 8.]])
  74. tensor([[ 9., 10.]])
  75. tensor([[11., 12.]])

1.3 Mathematical operations on tensors

Tensor math falls into three groups: scalar operations, vector operations, and matrix operations.

1.3.1 Scalar (element-wise) operations

Tensor math operators can be divided into scalar operators, vector operators, and matrix operators.

Addition, subtraction, multiplication, division, powers, the common trigonometric, exponential and logarithmic functions, and logical comparison operators are all scalar operators.

Scalar operators act element-wise on tensors.

Some scalar operators overload the usual math operators and support numpy-style broadcasting.
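Because scalar operators are element-wise and support broadcasting, a plain Python number can be combined directly with a tensor; a minimal sketch before the fuller example below:

import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(a + 1)     # the scalar 1 is broadcast to every element
print(a > 2)     # comparisons are element-wise as well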

  1. #例1-3-1 张量的数学运算-标量运算
  2. import torch
  3. import numpy as np
  4. a = torch.tensor([[1.0,2],[-3,4.0]])
  5. b = torch.tensor([[5.0,6],[7.0,8.0]])
  6. a+b #运算符重载
  7. out:
  8. tensor([[ 6., 8.],
  9. [ 4., 12.]])
  10. a-b
  11. out:
  12. tensor([[ -4., -4.],
  13. [-10., -4.]])
  14. a*b
  15. out:
  16. tensor([[ 5., 12.],
  17. [-21., 32.]])
  18. a/b
  19. out:
  20. tensor([[ 0.2000, 0.3333],
  21. [-0.4286, 0.5000]])
  22. a**2
  23. out:
  24. tensor([[ 1., 4.],
  25. [ 9., 16.]])
  26. a**(0.5)
  27. out:
  28. tensor([[1.0000, 1.4142],
  29. [ nan, 2.0000]])
  30. a%3 #求模
  31. out:
  32. tensor([[1., 2.],
  33. [-0., 1.]])
  34. a//3 #地板除法
  35. out:
  36. tensor([[ 0., 0.],
  37. [-1., 1.]])
  38. a>=2 # torch.ge(a,2) #ge: greater_equal缩写
  39. out:
  40. tensor([[False, True],
  41. [False, True]])
  42. (a>=2)&(a<=3)
  43. out:
  44. tensor([[False, True],
  45. [False, False]])
  46. (a>=2)|(a<=3)
  47. out:
  48. tensor([[True, True],
  49. [True, True]])
  50. a==5 #torch.eq(a,5)
  51. out:
  52. tensor([[False, False],
  53. [False, False]])
  54. torch.sqrt(a)
  55. out:
  56. tensor([[1.0000, 1.4142],
  57. [ nan, 2.0000]])
  58. a = torch.tensor([1.0,8.0])
  59. b = torch.tensor([5.0,6.0])
  60. c = torch.tensor([6.0,7.0])
  61. d = a+b+c
  62. print(d)
  63. out:
  64. tensor([12., 21.])
  65. print(torch.max(a,b))
  66. out:
  67. tensor([5., 8.])
  68. print(torch.min(a,b))
  69. out:
  70. tensor([1., 6.])
  71. x = torch.tensor([2.6,-2.7])
  72. print(torch.round(x)) #保留整数部分,四舍五入
  73. print(torch.floor(x)) #保留整数部分,向下归整
  74. print(torch.ceil(x)) #保留整数部分,向上归整
  75. print(torch.trunc(x)) #保留整数部分,向0归整
  76. out:
  77. tensor([ 3., -3.])
  78. tensor([ 2., -3.])
  79. tensor([ 3., -2.])
  80. tensor([ 2., -2.])
  81. x = torch.tensor([2.6,-2.7])
  82. print(torch.fmod(x,2)) #作除法取余数
  83. print(torch.remainder(x,2)) #作除法取剩余的部分,结果恒正
  84. out:
  85. tensor([ 0.6000, -0.7000])
  86. tensor([0.6000, 1.3000])
  87. # 幅值裁剪
  88. x = torch.tensor([0.9,-0.8,100.0,-20.0,0.7])
  89. y = torch.clamp(x,min=-1,max = 1)
  90. z = torch.clamp(x,max = 1)
  91. print(y)
  92. print(z)
  93. out:
  94. tensor([ 0.9000, -0.8000, 1.0000, -1.0000, 0.7000])
  95. tensor([ 0.9000, -0.8000, 1.0000, -20.0000, 0.7000])

1.3.2 Vector (reduction) operations

Vector operators work along a particular axis, mapping a vector to a scalar or to another vector.
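The reductions in the example below collapse the chosen axis; passing keepdim=True keeps that axis with length 1, which is convenient when the result is broadcast back against the original tensor. A minimal sketch:

import torch

b = torch.arange(1, 10).float().view(3, 3)
row_sum = torch.sum(b, dim=1, keepdim=True)   # shape [3, 1] instead of [3]
print(row_sum.shape)
print(b / row_sum)                            # e.g. normalize each row by its sum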

  1. #例1-3-2 张量的数学运算-向量运算
  2. import torch
  3. #统计值
  4. a = torch.arange(1,10).float()
  5. print(torch.sum(a))
  6. print(torch.mean(a))
  7. print(torch.max(a))
  8. print(torch.min(a))
  9. print(torch.prod(a)) #累乘
  10. print(torch.std(a)) #标准差
  11. print(torch.var(a)) #方差
  12. print(torch.median(a)) #中位数
  13. out:
  14. tensor(45.)
  15. tensor(5.)
  16. tensor(9.)
  17. tensor(1.)
  18. tensor(362880.)
  19. tensor(2.7386)
  20. tensor(7.5000)
  21. tensor(5.)
  22. #指定维度计算统计值
  23. b = a.view(3,3)
  24. print(b)
  25. print(torch.max(b,dim = 0))
  26. print(torch.max(b,dim = 1))
  27. out:
  28. tensor([[1., 2., 3.],
  29. [4., 5., 6.],
  30. [7., 8., 9.]])
  31. torch.return_types.max(
  32. values=tensor([7., 8., 9.]),
  33. indices=tensor([2, 2, 2]))
  34. torch.return_types.max(
  35. values=tensor([3., 6., 9.]),
  36. indices=tensor([2, 2, 2]))
  37. #cum扫描
  38. a = torch.arange(1,10)
  39. print(torch.cumsum(a,0))
  40. print(torch.cumprod(a,0))
  41. print(torch.cummax(a,0).values)
  42. print(torch.cummax(a,0).indices)
  43. print(torch.cummin(a,0))
  44. out:
  45. tensor([ 1, 3, 6, 10, 15, 21, 28, 36, 45])
  46. tensor([ 1, 2, 6, 24, 120, 720, 5040, 40320, 362880])
  47. tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
  48. tensor([0, 1, 2, 3, 4, 5, 6, 7, 8])
  49. torch.return_types.cummin(
  50. values=tensor([1, 1, 1, 1, 1, 1, 1, 1, 1]),
  51. indices=tensor([0, 0, 0, 0, 0, 0, 0, 0, 0]))
  52. #torch.sort和torch.topk可以对张量排序
  53. a = torch.tensor([[9,7,8],[1,3,2],[5,6,4]]).float()
  54. print(torch.topk(a,2,dim = 0),"\n")
  55. print(torch.topk(a,2,dim = 1),"\n")
  56. print(torch.sort(a,dim = 1),"\n")
  57. out:
  58. torch.return_types.topk(
  59. values=tensor([[9., 7., 8.],
  60. [5., 6., 4.]]),
  61. indices=tensor([[0, 0, 0],
  62. [2, 2, 2]]))
  63. torch.return_types.topk(
  64. values=tensor([[9., 8.],
  65. [3., 2.],
  66. [6., 5.]]),
  67. indices=tensor([[0, 2],
  68. [1, 2],
  69. [1, 0]]))
  70. torch.return_types.sort(
  71. values=tensor([[7., 8., 9.],
  72. [1., 2., 3.],
  73. [4., 5., 6.]]),
  74. indices=tensor([[1, 2, 0],
  75. [0, 2, 1],
  76. [2, 0, 1]]))

1.3.3 Matrix operations

A matrix must be two-dimensional; something like torch.tensor([1,2,3]) is not a matrix.

Matrix operations include matrix multiplication, transpose, inverse, trace, norm, determinant, eigenvalues and eigenvectors, matrix decompositions, and so on.

  1. #例1-3-3 张量的数学运算-矩阵运算
  2. import torch
  3. #矩阵乘法
  4. a = torch.tensor([[1,2],[3,4]])
  5. b = torch.tensor([[2,0],[0,2]])
  6. print(a@b) #等价于torch.matmul(a,b) 或 torch.mm(a,b)
  7. out:
  8. tensor([[2, 4],
  9. [6, 8]])
  10. #矩阵转置
  11. a = torch.tensor([[1.0,2],[3,4]])
  12. print(a.t())
  13. out:
  14. tensor([[1., 3.],
  15. [2., 4.]])
  16. #矩阵逆,必须为浮点类型
  17. a = torch.tensor([[1.0,2],[3,4]])
  18. print(torch.inverse(a))
  19. out:
  20. tensor([[-2.0000, 1.0000],
  21. [ 1.5000, -0.5000]])
  22. #矩阵求trace
  23. a = torch.tensor([[1.0,2],[3,4]])
  24. print(torch.trace(a))
  25. out:
  26. tensor(5.)
  27. #矩阵求范数
  28. a = torch.tensor([[1.0,2],[3,4]])
  29. print(torch.norm(a))
  30. out:
  31. tensor(5.4772)
  32. #矩阵行列式
  33. a = torch.tensor([[1.0,2],[3,4]])
  34. print(torch.det(a))
  35. out:
  36. tensor(-2.0000)
  37. #矩阵特征值和特征向量
  38. a = torch.tensor([[1.0,2],[-5,4]],dtype = torch.float)
  39. print(torch.eig(a,eigenvectors=True))
  40. #两个特征值分别是 -2.5+2.7839j, 2.5-2.7839j
  41. out:
  42. torch.return_types.eig(
  43. eigenvalues=tensor([[ 2.5000, 2.7839],
  44. [ 2.5000, -2.7839]]),
  45. eigenvectors=tensor([[ 0.2535, -0.4706],
  46. [ 0.8452, 0.0000]]))
  47. #矩阵QR分解, 将一个方阵分解为一个正交矩阵q和上三角矩阵r
  48. #QR分解实际上是对矩阵a实施Schmidt正交化得到q
  49. a = torch.tensor([[1.0,2.0],[3.0,4.0]])
  50. q,r = torch.qr(a)
  51. print(q,"\n")
  52. print(r,"\n")
  53. print(q@r)
  54. out:
  55. tensor([[-0.3162, -0.9487],
  56. [-0.9487, 0.3162]])
  57. tensor([[-3.1623, -4.4272],
  58. [ 0.0000, -0.6325]])
  59. tensor([[1.0000, 2.0000],
  60. [3.0000, 4.0000]])
  61. #矩阵svd分解
  62. #svd分解可以将任意一个矩阵分解为一个正交矩阵u,一个对角阵s和一个正交矩阵v.t()的乘积
  63. #svd常用于矩阵压缩和降维
  64. a=torch.tensor([[1.0,2.0],[3.0,4.0],[5.0,6.0]])
  65. u,s,v = torch.svd(a)
  66. print(u,"\n")
  67. print(s,"\n")
  68. print(v,"\n")
  69. print(u@torch.diag(s)@v.t())
  70. #利用svd分解可以在Pytorch中实现主成分分析降维
  71. out:
  72. tensor([[-0.2298, 0.8835],
  73. [-0.5247, 0.2408],
  74. [-0.8196, -0.4019]])
  75. tensor([9.5255, 0.5143])
  76. tensor([[-0.6196, -0.7849],
  77. [-0.7849, 0.6196]])
  78. tensor([[1.0000, 2.0000],
  79. [3.0000, 4.0000],
  80. [5.0000, 6.0000]])

1.3.4 Broadcasting

PyTorch's broadcasting rules are the same as numpy's:

1. If the tensors differ in number of dimensions, the one with fewer dimensions is expanded until both have the same number of dimensions.

2. Two tensors are compatible in a dimension if they have the same length in that dimension, or one of them has length 1 there.

3. Two tensors can be broadcast together if they are compatible in every dimension.

4. After broadcasting, each dimension takes the larger of the two tensors' lengths in that dimension.

5. In any dimension where one tensor has length 1 and the other has length greater than 1, the length-1 tensor behaves as if it were copied along that dimension. torch.broadcast_tensors expands multiple tensors to a common shape according to these rules.
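A small sketch of rules 2-4: shapes (3,1) and (1,4) are compatible and broadcast to (3,4), while trailing lengths 2 and 3 are not compatible and raise an error:

import torch

print((torch.ones(3, 1) + torch.ones(1, 4)).shape)   # torch.Size([3, 4])

try:
    torch.ones(3, 2) + torch.ones(3)                  # trailing dims 2 vs 3: incompatible
except RuntimeError as e:
    print("broadcast failed:", e)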

  1. #例 1-3-4 广播机制
  2. import torch
  3. a = torch.tensor([1,2,3])
  4. b = torch.tensor([[0,0,0],[1,1,1],[2,2,2]])
  5. print(b + a)
  6. a_broad,b_broad = torch.broadcast_tensors(a,b)
  7. print(a_broad,"\n")
  8. print(b_broad,"\n")
  9. print(a_broad + b_broad)
  10. out:
  11. tensor([[1, 2, 3],
  12. [2, 3, 4],
  13. [3, 4, 5]])
  14. tensor([[1, 2, 3],
  15. [1, 2, 3],
  16. [1, 2, 3]])
  17. tensor([[0, 0, 0],
  18. [1, 1, 1],
  19. [2, 2, 2]])
  20. tensor([[1, 2, 3],
  21. [2, 3, 4],
  22. [3, 4, 5]])

2. Other fundamentals

2.1 Automatic differentiation

Neural networks rely on backpropagation to compute gradients and update parameters, and computing gradients by hand is complicated and error-prone.

Deep-learning frameworks perform this gradient computation for us automatically.

In PyTorch this is usually done by calling the backward method;

the resulting gradients are stored in the grad attribute of the corresponding input tensors.

Alternatively, torch.autograd.grad can be called to compute gradients.

This is PyTorch's automatic differentiation mechanism.

2.1.1 Computing derivatives with backward

The backward method is usually called on a scalar tensor, and the gradients it computes are stored in the grad attributes of the corresponding input tensors.

If the tensor it is called on is not a scalar, a gradient argument of the same shape must be passed in.

This is equivalent to taking the dot product of that gradient tensor with the calling tensor, and then backpropagating the resulting scalar.

  1. #例2-1-1:利用backward方法求导数
  2. #标量的反向传播
  3. import numpy as np
  4. import torch
  5. # f(x) = a*x**2 + b*x + c的导数
  6. x = torch.tensor(0.0,requires_grad = True) # x需要被求导
  7. a = torch.tensor(1.0)
  8. b = torch.tensor(-2.0)
  9. c = torch.tensor(1.0)
  10. y = a*torch.pow(x,2) + b*x + c
  11. y.backward()
  12. dy_dx = x.grad
  13. print(dy_dx)
  14. out:
  15. tensor(-2.)
  16. #非标量的反向传播
  17. import numpy as np
  18. import torch
  19. # f(x) = a*x**2 + b*x + c
  20. x = torch.tensor([[0.0,0.0],[1.0,2.0]],requires_grad = True) # x需要被求导
  21. a = torch.tensor(1.0)
  22. b = torch.tensor(-2.0)
  23. c = torch.tensor(1.0)
  24. y = a*torch.pow(x,2) + b*x + c
  25. gradient = torch.tensor([[1.0,1.0],[1.0,1.0]])
  26. print("x:\n",x)
  27. print("y:\n",y)
  28. y.backward(gradient = gradient)
  29. x_grad = x.grad
  30. print("x_grad:\n",x_grad)
  31. out:
  32. x:
  33. tensor([[0., 0.],
  34. [1., 2.]], requires_grad=True)
  35. y:
  36. tensor([[1., 1.],
  37. [0., 1.]], grad_fn=<AddBackward0>)
  38. x_grad:
  39. tensor([[-2., -2.],
  40. [ 0., 2.]])
  41. #非标量的反向传播可以用标量的反向传播实现
  42. import numpy as np
  43. import torch
  44. # f(x) = a*x**2 + b*x + c
  45. x = torch.tensor([[0.0,0.0],[1.0,2.0]],requires_grad = True) # x需要被求导
  46. a = torch.tensor(1.0)
  47. b = torch.tensor(-2.0)
  48. c = torch.tensor(1.0)
  49. y = a*torch.pow(x,2) + b*x + c
  50. gradient = torch.tensor([[1.0,1.0],[1.0,1.0]])
  51. z = torch.sum(y*gradient)
  52. print("x:",x)
  53. print("y:",y)
  54. z.backward()
  55. x_grad = x.grad
  56. print("x_grad:\n",x_grad)
  57. out:
  58. x: tensor([[0., 0.],
  59. [1., 2.]], requires_grad=True)
  60. y: tensor([[1., 1.],
  61. [0., 1.]], grad_fn=<AddBackward0>)
  62. x_grad:
  63. tensor([[-2., -2.],
  64. [ 0., 2.]])

2.1.2 Computing derivatives with autograd.grad

  1. #例2-1-2 利用autograd.grad方法求导数
  2. import numpy as np
  3. import torch
  4. # f(x) = a*x**2 + b*x + c的导数
  5. x = torch.tensor(0.0,requires_grad = True) # x需要被求导
  6. a = torch.tensor(1.0)
  7. b = torch.tensor(-2.0)
  8. c = torch.tensor(1.0)
  9. y = a*torch.pow(x,2) + b*x + c
  10. # create_graph 设置为 True 将允许创建更高阶的导数
  11. dy_dx = torch.autograd.grad(y,x,create_graph=True)[0]
  12. print(dy_dx.data)
  13. # 求二阶导数
  14. dy2_dx2 = torch.autograd.grad(dy_dx,x)[0]
  15. print(dy2_dx2.data)
  16. out:
  17. tensor(-2.)
  18. tensor(2.)
  19. #例1-2-2 利用autograd.grad方法求导数,对多个自变量求导数
  20. import numpy as np
  21. import torch
  22. x1 = torch.tensor(1.0,requires_grad = True) # x需要被求导
  23. x2 = torch.tensor(2.0,requires_grad = True)
  24. y1 = x1*x2
  25. y2 = x1+x2
  26. # 允许同时对多个自变量求导数
  27. (dy1_dx1,dy1_dx2) = torch.autograd.grad(outputs=y1,inputs =
  28. [x1,x2],retain_graph = True)
  29. print(dy1_dx1,dy1_dx2)
  30. # 如果有多个因变量,相当于把多个因变量的梯度结果求和
  31. (dy12_dx1,dy12_dx2) = torch.autograd.grad(outputs=[y1,y2],inputs = [x1,x2])
  32. print(dy12_dx1,dy12_dx2)
  33. out:
  34. tensor(2.) tensor(1.)
  35. tensor(3.) tensor(2.)

2.1.3 Finding a minimum with autodiff and an optimizer

# Example 2-1-3: finding a minimum with autodiff and an optimizer
import numpy as np
import torch

# minimum of f(x) = a*x**2 + b*x + c
x = torch.tensor(0.0, requires_grad=True)  # x requires gradients
a = torch.tensor(1.0)
b = torch.tensor(-2.0)
c = torch.tensor(1.0)
optimizer = torch.optim.SGD(params=[x], lr=0.01)

def f(x):
    result = a*torch.pow(x, 2) + b*x + c
    return result

for i in range(500):
    optimizer.zero_grad()
    y = f(x)
    y.backward()
    optimizer.step()

print("y=", f(x).data, ";", "x=", x.data)
out:
y= tensor(0.) ; x= tensor(1.0000)

2.2 Dynamic computation graphs

2.2.1 Introduction to dynamic computation graphs:

PyTorch's computation graph consists of nodes and edges: nodes are tensors or Functions, and edges express the dependencies between tensors and Functions.

PyTorch's computation graph is dynamic, which means two things.

First, forward computation is executed immediately. There is no need to wait for the whole graph to be built; each statement dynamically adds nodes and edges to the graph and runs its forward computation right away.

Second, the graph is destroyed immediately after backpropagation and must be rebuilt on the next call. Once backward has been run, or torch.autograd.grad has been used to compute gradients, the graph that was created is destroyed to free memory, and the next call has to build a new one.
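If the graph does need to be reused (for example to call backward twice on the same output), pass retain_graph=True on the first call; a minimal sketch isolating just this point:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * x
y.backward(retain_graph=True)   # keep the graph alive
y.backward()                    # second call now works; without retain_graph it would raise
print(x.grad)                   # tensor(8.) -- the two gradients of 4. accumulate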

  1. #例2-2-1 计算图的正向传播是立即执行的
  2. import torch
  3. w = torch.tensor([[3.0,1.0]],requires_grad=True)
  4. b = torch.tensor([[3.0]],requires_grad=True)
  5. X = torch.randn(10,2)
  6. Y = torch.randn(10,1)
  7. Y_hat = X@w.t() + b # Y_hat定义后其正向传播被立即执行,与其后面的loss创建语句无关
  8. loss = torch.mean(torch.pow(Y_hat-Y,2))
  9. print(loss.data)
  10. print(Y_hat.data)
  11. out:
  12. tensor(25.9445)
  13. tensor([[ 5.8349],
  14. [ 0.5817],
  15. [-4.2764],
  16. [ 3.2476],
  17. [ 3.6737],
  18. [ 2.8748],
  19. [ 8.3981],
  20. [ 7.1418],
  21. [-4.8522],
  22. [ 2.2610]])
  23. #计算图在反向传播后立即销毁
  24. import torch
  25. w = torch.tensor([[3.0,1.0]],requires_grad=True)
  26. b = torch.tensor([[3.0]],requires_grad=True)
  27. X = torch.randn(10,2)
  28. Y = torch.randn(10,1)
  29. Y_hat = X@w.t() + b # Y_hat定义后其正向传播被立即执行,与其后面的loss创建语句无关
  30. loss = torch.mean(torch.pow(Y_hat-Y,2))
  31. #计算图在反向传播后立即销毁,如果需要保留计算图, 需要设置retain_graph = True
  32. loss.backward() #loss.backward(retain_graph = True)
  33. #loss.backward() #如果再次执行反向传播将报错

2.2.2 Function nodes in the computation graph

We are already familiar with the tensor nodes in the computation graph; the other kind of node is the Function, which is essentially any of PyTorch's operations on tensors.

Unlike ordinary Python functions, a Function contains both the forward computation logic and the backpropagation logic.

We can create such backpropagation-aware Functions by subclassing torch.autograd.Function.

# Example 2-2-2: Function nodes in the computation graph
import torch

class MyReLU(torch.autograd.Function):
    # forward logic; ctx can be used to stash values for the backward pass
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    # backward logic
    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

w = torch.tensor([[3.0, 1.0]], requires_grad=True)
b = torch.tensor([[3.0]], requires_grad=True)
X = torch.tensor([[-1.0, -1.0], [1.0, 1.0]])
Y = torch.tensor([[2.0, 3.0]])

relu = MyReLU.apply  # relu now has both forward and backward behaviour
Y_hat = relu(X@w.t() + b)
loss = torch.mean(torch.pow(Y_hat - Y, 2))
loss.backward()
print(w.grad)
print(b.grad)
out:
tensor([[4.5000, 4.5000]])
tensor([[4.5000]])

# the gradient function of Y_hat is the MyReLU.backward we defined ourselves
print(Y_hat.grad_fn)
<torch.autograd.function.MyReLUBackward object at 0x000001FE1652D900>

2.2.3 Computation graphs and backpropagation

With Functions understood, we can sketch how backpropagation works. Following this part only requires the chain rule for derivatives from basic calculus.

  1. #例2-2-3 计算图与反向传播
  2. import torch
  3. x = torch.tensor(3.0,requires_grad=True)
  4. y1 = x + 1
  5. y2 = 2*x
  6. loss = (y1-y2)**2
  7. loss.backward()

After loss.backward() is called, the following happens in order:

1. loss's own grad is set to 1, i.e. its gradient with respect to itself is 1.

2. From its own gradient and its associated backward method, loss computes the gradients of its inputs y1 and y2 and assigns them to y1.grad and y2.grad.

3. y2 and y1, from their own gradients and their associated backward methods, each compute the gradient of their input x; x.grad accumulates (sums) the gradient values it receives. (Note that the ordering of steps 1-3 and the summing of multiple gradient contributions are exactly the chain rule expressed as a procedure.) Because of this accumulation rule derived from the chain rule, a tensor's grad is not cleared automatically; it must be zeroed manually when needed (see the short sketch below).
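A minimal sketch of this accumulation behaviour and of clearing grad manually (in a real training loop this is what optimizer.zero_grad() does for you):

import torch

x = torch.tensor(1.0, requires_grad=True)
for step in range(2):
    y = 3 * x
    y.backward()
    print(x.grad)   # tensor(3.) then tensor(6.): gradients are accumulated

x.grad.zero_()      # clear manually before the next round
print(x.grad)       # tensor(0.)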

2.2.4 Leaf and non-leaf nodes

Running the code below, we find that loss.grad is not the expected 1 but None, and likewise y1.grad and y2.grad are None. Why? Because they are not leaf-node tensors.

During backpropagation, only leaf nodes (is_leaf=True) that require gradients have their gradient results kept at the end.

What is a leaf-node tensor? It must satisfy two conditions:

1. it was created directly by the user, not produced by some Function;

2. its requires_grad attribute is True. PyTorch adopts this rule mainly to save memory (or GPU memory), since users almost always only care about the gradients of the tensors they created themselves. Every tensor that depends on a leaf tensor also has requires_grad=True, but its gradient is only used during the computation and is not stored in its grad attribute.

To keep an intermediate result's gradient in its grad attribute, use the retain_grad method.

If you only need to inspect gradients while debugging, register_hook can be used to print them.

  1. #例2-2-4 叶子节点和非叶子节点
  2. import torch
  3. x = torch.tensor(3.0,requires_grad=True)
  4. y1 = x + 1
  5. y2 = 2*x
  6. loss = (y1-y2)**2
  7. loss.backward()
  8. print("loss.grad:", loss.grad)
  9. print("y1.grad:", y1.grad)
  10. print("y2.grad:", y2.grad)
  11. print(x.grad)
  12. out:
  13. loss.grad: None
  14. y1.grad: None
  15. y2.grad: None
  16. tensor(4.)
  17. print(x.is_leaf)
  18. print(y1.is_leaf)
  19. print(y2.is_leaf)
  20. print(loss.is_leaf)
  21. out:
  22. True
  23. False
  24. False
  25. False

retain_grad keeps the gradients of non-leaf nodes, and register_hook lets you inspect them.

  1. #例2-2-4 叶子节点的梯度值
  2. import torch
  3. #正向传播
  4. x = torch.tensor(3.0,requires_grad=True)
  5. y1 = x + 1
  6. y2 = 2*x
  7. loss = (y1-y2)**2
  8. #非叶子节点梯度显示控制
  9. y1.register_hook(lambda grad: print('y1 grad: ', grad))
  10. y2.register_hook(lambda grad: print('y2 grad: ', grad))
  11. loss.retain_grad()
  12. #反向传播
  13. loss.backward()
  14. print("loss.grad:", loss.grad)
  15. print("x.grad:", x.grad)
  16. out:
  17. y2 grad: tensor(4.)
  18. y1 grad: tensor(-4.)
  19. loss.grad: tensor(1.)
  20. x.grad: tensor(4.)

2.2.5 Visualizing the computation graph in TensorBoard

The computation graph can be exported to TensorBoard for visualization via torch.utils.tensorboard.

# Example 2-2-5: visualizing the computation graph in TensorBoard
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.w = nn.Parameter(torch.randn(2, 1))
        self.b = nn.Parameter(torch.zeros(1, 1))

    def forward(self, x):
        y = x@self.w + self.b
        return y

net = Net()

from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter('./data/tensorboard')
writer.add_graph(net, input_to_model=torch.rand(10, 2))
writer.close()

%load_ext tensorboard
#%tensorboard --logdir ./data/tensorboard
from tensorboard import notebook
notebook.list()
# view the model in TensorBoard
notebook.start("--logdir ./data/tensorboard")

Running this displays the model's computation graph in TensorBoard (figure not reproduced here).

2.3 PyTorch's layered structure

PyTorch can be viewed as five layers: the hardware layer, the kernel layer, the low-level API, the mid-level API, and the high-level API (torchkeras).

From low to high:

The lowest layer is hardware; PyTorch can pool CPU and GPU compute resources.

The second layer is the kernel, implemented in C++.

The third layer is the operators implemented in Python: low-level API calls wrapping the C++ kernel, mainly tensor operators, automatic differentiation and variable management, e.g. torch.tensor, torch.cat, torch.autograd.grad, nn.Module.

The fourth layer is the model components implemented in Python, which wrap the low-level API into functions: model layers, loss functions, optimizers, data pipelines and so on, e.g. torch.nn.Linear, torch.nn.BCELoss, torch.optim.Adam, torch.utils.data.DataLoader.

The fifth layer is the model interface implemented in Python. PyTorch has no official high-level API; to make training easier, these notes mimic the Keras model interface and wrap a high-level model interface, torchkeras.Model, in fewer than 300 lines of code.

2.3.1 Low-level API examples

The examples below use PyTorch's low-level API to implement a linear regression model and a DNN binary classifier.

The low-level API consists mainly of tensor operations, the computation graph, and automatic differentiation.

  1. #例2-3-1-a 低阶API实现线性回归模型示范
  2. import os
  3. import datetime
  4. #打印时间
  5. def printbar():
  6. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  7. print("\n"+"=========="*8 + "%s"%nowtime)
  8. #第一步 准备数据
  9. import numpy as np
  10. import pandas as pd
  11. from matplotlib import pyplot as plt
  12. import torch
  13. from torch import nn
  14. #样本数量
  15. n = 400
  16. # 生成测试用数据集
  17. X = 10*torch.rand([n,2])-5.0 #torch.rand是均匀分布
  18. w0 = torch.tensor([[2.0],[-3.0]])
  19. b0 = torch.tensor([[10.0]])
  20. Y = X@w0 + b0 + torch.normal( 0.0,2.0,size = [n,1]) # @表示矩阵乘法,增加正态扰动
  21. # 数据可视化
  22. %matplotlib inline
  23. %config InlineBackend.figure_format = 'svg'
  24. plt.figure(figsize = (12,5))
  25. ax1 = plt.subplot(121)
  26. ax1.scatter(X[:,0].numpy(),Y[:,0].numpy(), c = "b",label = "samples")
  27. ax1.legend()
  28. plt.xlabel("x1")
  29. plt.ylabel("y",rotation = 0)
  30. ax2 = plt.subplot(122)
  31. ax2.scatter(X[:,1].numpy(),Y[:,0].numpy(), c = "g",label = "samples")
  32. ax2.legend()
  33. plt.xlabel("x2")
  34. plt.ylabel("y",rotation = 0)
  35. plt.show()
  36. # 构建数据管道迭代器
  37. def data_iter(features, labels, batch_size=8):
  38. num_examples = len(features)
  39. indices = list(range(num_examples))
  40. np.random.shuffle(indices) #样本的读取顺序是随机的
  41. for i in range(0, num_examples, batch_size):
  42. indexs = torch.LongTensor(indices[i: min(i + batch_size,num_examples)])
  43. yield features.index_select(0, indexs), labels.index_select(0,indexs)
  44. # 测试数据管道效果
  45. batch_size = 8
  46. (features,labels) = next(data_iter(X,Y,batch_size))
  47. print(features)
  48. print(labels)
  49. #第二步 构建模型
  50. # 定义模型
  51. class LinearRegression:
  52. def __init__(self):
  53. self.w = torch.randn_like(w0,requires_grad=True)
  54. self.b = torch.zeros_like(b0,requires_grad=True)
  55. #正向传播
  56. def forward(self,x):
  57. return x@self.w + self.b
  58. # 损失函数
  59. def loss_func(self,y_pred,y_true):
  60. return torch.mean((y_pred - y_true)**2/2)
  61. model = LinearRegression()
  62. #第三步 训练模型
  63. def train_step(model, features, labels):
  64. predictions = model.forward(features)
  65. loss = model.loss_func(predictions,labels)
  66. # 反向传播求梯度
  67. loss.backward()
  68. # 使用torch.no_grad()避免梯度记录,也可以通过操作 model.w.data 实现避免梯度记录
  69. with torch.no_grad():
  70. # 梯度下降法更新参数
  71. model.w -= 0.001*model.w.grad
  72. model.b -= 0.001*model.b.grad
  73. # 梯度清零
  74. model.w.grad.zero_()
  75. model.b.grad.zero_()
  76. return loss
  77. # 测试train_step效果
  78. batch_size = 10
  79. (features,labels) = next(data_iter(X,Y,batch_size))
  80. train_step(model,features,labels)
  81. def train_model(model,epochs):
  82. for epoch in range(1,epochs+1):
  83. for features, labels in data_iter(X,Y,10):
  84. loss = train_step(model,features,labels)
  85. if epoch%200==0:
  86. printbar()
  87. print("epoch =",epoch,"loss = ",loss.item())
  88. print("model.w =",model.w.data)
  89. print("model.b =",model.b.data)
  90. train_model(model,epochs = 1000)
  91. # 结果可视化
  92. %matplotlib inline
  93. %config InlineBackend.figure_format = 'svg'
  94. plt.figure(figsize = (12,5))
  95. ax1 = plt.subplot(121)
  96. ax1.scatter(X[:,0].numpy(),Y[:,0].numpy(), c = "b",label = "samples")
  97. ax1.plot(X[:,0].numpy(),(model.w[0].data*X[:,0]+model.b[0].data).numpy(),"-r",linewidth = 5.0,label = "model")
  98. ax1.legend()
  99. plt.xlabel("x1")
  100. plt.ylabel("y",rotation = 0)
  101. ax2 = plt.subplot(122)
  102. ax2.scatter(X[:,1].numpy(),Y[:,0].numpy(), c = "g",label = "samples")
  103. ax2.plot(X[:,1].numpy(),(model.w[1].data*X[:,1]+model.b[0].data).numpy(),"-r",linewidth = 5.0,label = "model")
  104. ax2.legend()
  105. plt.xlabel("x2")
  106. plt.ylabel("y",rotation = 0)
  107. plt.show()
  1. #例2-3-1-b 低阶API实现DNN二分类模型范例
  2. #第一步 准备数据
  3. import numpy as np
  4. import pandas as pd
  5. from matplotlib import pyplot as plt
  6. import torch
  7. from torch import nn
  8. %matplotlib inline
  9. %config InlineBackend.figure_format = 'svg'
  10. #正负样本数量
  11. n_positive,n_negative = 2000,2000
  12. #生成正样本, 小圆环分布
  13. r_p = 5.0 + torch.normal(0.0,1.0,size = [n_positive,1])
  14. theta_p = 2*np.pi*torch.rand([n_positive,1])
  15. Xp = torch.cat([r_p*torch.cos(theta_p),r_p*torch.sin(theta_p)],axis = 1)
  16. Yp = torch.ones_like(r_p)
  17. #生成负样本, 大圆环分布
  18. r_n = 8.0 + torch.normal(0.0,1.0,size = [n_negative,1])
  19. theta_n = 2*np.pi*torch.rand([n_negative,1])
  20. Xn = torch.cat([r_n*torch.cos(theta_n),r_n*torch.sin(theta_n)],axis = 1)
  21. Yn = torch.zeros_like(r_n)
  22. #汇总样本
  23. X = torch.cat([Xp,Xn],axis = 0)
  24. Y = torch.cat([Yp,Yn],axis = 0)
  25. #可视化
  26. plt.figure(figsize = (6,6))
  27. plt.scatter(Xp[:,0].numpy(),Xp[:,1].numpy(),c = "r")
  28. plt.scatter(Xn[:,0].numpy(),Xn[:,1].numpy(),c = "g")
  29. plt.legend(["positive","negative"]);
  30. # 构建数据管道迭代器
  31. def data_iter(features, labels, batch_size=8):
  32. num_examples = len(features)
  33. indices = list(range(num_examples))
  34. np.random.shuffle(indices) #样本的读取顺序是随机的
  35. for i in range(0, num_examples, batch_size):
  36. indexs = torch.LongTensor(indices[i: min(i + batch_size,num_examples)])
  37. yield features.index_select(0, indexs), labels.index_select(0,indexs)
  38. # 测试数据管道效果
  39. batch_size = 8
  40. (features,labels) = next(data_iter(X,Y,batch_size))
  41. print(features)
  42. print(labels)
  43. #第二步 定义模型
  44. class DNNModel(nn.Module):
  45. def __init__(self):
  46. super(DNNModel, self).__init__()
  47. self.w1 = nn.Parameter(torch.randn(2,4))
  48. self.b1 = nn.Parameter(torch.zeros(1,4))
  49. self.w2 = nn.Parameter(torch.randn(4,8))
  50. self.b2 = nn.Parameter(torch.zeros(1,8))
  51. self.w3 = nn.Parameter(torch.randn(8,1))
  52. self.b3 = nn.Parameter(torch.zeros(1,1))
  53. # 正向传播
  54. def forward(self,x):
  55. x = torch.relu(x@self.w1 + self.b1)
  56. x = torch.relu(x@self.w2 + self.b2)
  57. y = torch.sigmoid(x@self.w3 + self.b3)
  58. return y
  59. # 损失函数(二元交叉熵)
  60. def loss_func(self,y_pred,y_true):
  61. #将预测值限制在1e-7以上, 1- (1e-7)以下,避免log(0)错误
  62. eps = 1e-7
  63. y_pred = torch.clamp(y_pred,eps,1.0-eps)
  64. bce = - y_true*torch.log(y_pred) - (1-y_true)*torch.log(1-y_pred)
  65. return torch.mean(bce)
  66. # 评估指标(准确率)
  67. def metric_func(self,y_pred,y_true):
  68. y_pred = torch.where(y_pred>0.5,torch.ones_like(y_pred,dtype = torch.float32),torch.zeros_like(y_pred,dtype = torch.float32))
  69. acc = torch.mean(1-torch.abs(y_true-y_pred))
  70. return acc
  71. model = DNNModel()
  72. # 测试模型结构
  73. batch_size = 10
  74. (features,labels) = next(data_iter(X,Y,batch_size))
  75. predictions = model(features)
  76. loss = model.loss_func(labels,predictions)
  77. metric = model.metric_func(labels,predictions)
  78. print("init loss:", loss.item())
  79. print("init metric:", metric.item())
  80. #第三步 训练模型
  81. def train_step(model, features, labels):
  82. # 正向传播求损失
  83. predictions = model.forward(features)
  84. loss = model.loss_func(predictions,labels)
  85. metric = model.metric_func(predictions,labels)
  86. # 反向传播求梯度
  87. loss.backward()
  88. # 梯度下降法更新参数
  89. for param in model.parameters():
  90. #注意是对param.data进行重新赋值,避免此处操作引起梯度记录
  91. param.data = (param.data - 0.01*param.grad.data)
  92. # 梯度清零
  93. model.zero_grad()
  94. return loss.item(),metric.item()
  95. def train_model(model,epochs):
  96. for epoch in range(1,epochs+1):
  97. loss_list,metric_list = [],[]
  98. for features, labels in data_iter(X,Y,20):
  99. lossi,metrici = train_step(model,features,labels)
  100. loss_list.append(lossi)
  101. metric_list.append(metrici)
  102. loss = np.mean(loss_list)
  103. metric = np.mean(metric_list)
  104. if epoch%100==0:
  105. print()
  106. print("epoch =",epoch,"loss = ",loss,"metric = ",metric)
  107. train_model(model,epochs = 1000)
  108. # 结果可视化
  109. fig, (ax1,ax2) = plt.subplots(nrows=1,ncols=2,figsize = (12,5))
  110. ax1.scatter(Xp[:,0],Xp[:,1], c="r")
  111. ax1.scatter(Xn[:,0],Xn[:,1],c = "g")
  112. ax1.legend(["positive","negative"]);
  113. ax1.set_title("y_true");
  114. Xp_pred = X[torch.squeeze(model.forward(X)>=0.5)]
  115. Xn_pred = X[torch.squeeze(model.forward(X)<0.5)]
  116. ax2.scatter(Xp_pred[:,0],Xp_pred[:,1],c = "r")
  117. ax2.scatter(Xn_pred[:,0],Xn_pred[:,1],c = "g")
  118. ax2.legend(["positive","negative"]);
  119. ax2.set_title("y_pred");

2.3.2 Mid-level API examples

The examples below use PyTorch's mid-level API to implement a linear regression model and a DNN binary classifier.

The mid-level API consists mainly of model layers, loss functions, optimizers, data pipelines, and so on.

  1. #例2-3-2-a 中阶API实现线性回归范例
  2. import numpy as np
  3. import pandas as pd
  4. from matplotlib import pyplot as plt
  5. import torch
  6. from torch import nn
  7. import torch.nn.functional as F
  8. from torch.utils.data import Dataset,DataLoader,TensorDataset
  9. #第一步 准备数据
  10. #样本数量
  11. n = 400
  12. # 生成测试用数据集
  13. X = 10*torch.rand([n,2])-5.0 #torch.rand是均匀分布
  14. w0 = torch.tensor([[2.0],[-3.0]])
  15. b0 = torch.tensor([[10.0]])
  16. Y = X@w0 + b0 + torch.normal( 0.0,2.0,size = [n,1]) # @表示矩阵乘法,增加正态扰动
  17. # 数据可视化
  18. %matplotlib inline
  19. %config InlineBackend.figure_format = 'svg'
  20. plt.figure(figsize = (12,5))
  21. ax1 = plt.subplot(121)
  22. ax1.scatter(X[:,0],Y[:,0], c = "b",label = "samples")
  23. ax1.legend()
  24. plt.xlabel("x1")
  25. plt.ylabel("y",rotation = 0)
  26. ax2 = plt.subplot(122)
  27. ax2.scatter(X[:,1],Y[:,0], c = "g",label = "samples")
  28. ax2.legend()
  29. plt.xlabel("x2")
  30. plt.ylabel("y",rotation = 0)
  31. plt.show()
  32. #构建输入数据管道
  33. ds = TensorDataset(X,Y)
  34. dl = DataLoader(ds,batch_size = 10,shuffle=True,num_workers=2)
  35. #第二步 定义模型
  36. model = nn.Linear(2,1) #线性层
  37. model.loss_func = nn.MSELoss()
  38. model.optimizer = torch.optim.SGD(model.parameters(),lr = 0.01)
  39. #第三步 训练模型
  40. def train_step(model, features, labels):
  41. predictions = model(features)
  42. loss = model.loss_func(predictions,labels)
  43. loss.backward()
  44. model.optimizer.step()
  45. model.optimizer.zero_grad()
  46. return loss.item()
  47. # 测试train_step效果
  48. features,labels = next(iter(dl))
  49. train_step(model,features,labels)
  50. def train_model(model,epochs):
  51. for epoch in range(1,epochs+1):
  52. for features, labels in dl:
  53. loss = train_step(model,features,labels)
  54. if epoch%50==0:
  55. printbar()
  56. w = model.state_dict()["weight"]
  57. b = model.state_dict()["bias"]
  58. print("epoch =",epoch,"loss = ",loss)
  59. print("w =",w)
  60. print("b =",b)
  61. train_model(model,epochs = 200)
  62. # 结果可视化
  63. %matplotlib inline
  64. %config InlineBackend.figure_format = 'svg'
  65. w,b = model.state_dict()["weight"],model.state_dict()["bias"]
  66. plt.figure(figsize = (12,5))
  67. ax1 = plt.subplot(121)
  68. ax1.scatter(X[:,0],Y[:,0], c = "b",label = "samples")
  69. ax1.plot(X[:,0],w[0,0]*X[:,0]+b[0],"-r",linewidth = 5.0,label = "model")
  70. ax1.legend()
  71. plt.xlabel("x1")
  72. plt.ylabel("y",rotation = 0)
  73. ax2 = plt.subplot(122)
  74. ax2.scatter(X[:,1],Y[:,0], c = "g",label = "samples")
  75. ax2.plot(X[:,1],w[0,1]*X[:,1]+b[0],"-r",linewidth = 5.0,label = "model")
  76. ax2.legend()
  77. plt.xlabel("x2")
  78. plt.ylabel("y",rotation = 0)
  79. plt.show()

  1. #例2-3-2-b 中阶API实现 DNN二分类
  2. #第一步 准备数据
  3. import numpy as np
  4. import pandas as pd
  5. from matplotlib import pyplot as plt
  6. import torch
  7. from torch import nn
  8. import torch.nn.functional as F
  9. from torch.utils.data import Dataset,DataLoader,TensorDataset
  10. %matplotlib inline
  11. %config InlineBackend.figure_format = 'svg'
  12. #正负样本数量
  13. n_positive,n_negative = 2000,2000
  14. #生成正样本, 小圆环分布
  15. r_p = 5.0 + torch.normal(0.0,1.0,size = [n_positive,1])
  16. theta_p = 2*np.pi*torch.rand([n_positive,1])
  17. Xp = torch.cat([r_p*torch.cos(theta_p),r_p*torch.sin(theta_p)],axis = 1)
  18. Yp = torch.ones_like(r_p)
  19. #生成负样本, 大圆环分布
  20. r_n = 8.0 + torch.normal(0.0,1.0,size = [n_negative,1])
  21. theta_n = 2*np.pi*torch.rand([n_negative,1])
  22. Xn = torch.cat([r_n*torch.cos(theta_n),r_n*torch.sin(theta_n)],axis = 1)
  23. Yn = torch.zeros_like(r_n)
  24. #汇总样本
  25. X = torch.cat([Xp,Xn],axis = 0)
  26. Y = torch.cat([Yp,Yn],axis = 0)
  27. #可视化
  28. plt.figure(figsize = (6,6))
  29. plt.scatter(Xp[:,0],Xp[:,1],c = "r")
  30. plt.scatter(Xn[:,0],Xn[:,1],c = "g")
  31. plt.legend(["positive","negative"]);
  32. #构建输入数据管道
  33. ds = TensorDataset(X,Y)
  34. dl = DataLoader(ds,batch_size = 10,shuffle=True,num_workers=2)
  35. #第二步 定义模型
  36. class DNNModel(nn.Module):
  37. def __init__(self):
  38. super(DNNModel, self).__init__()
  39. self.fc1 = nn.Linear(2,4)
  40. self.fc2 = nn.Linear(4,8)
  41. self.fc3 = nn.Linear(8,1)
  42. # 正向传播
  43. def forward(self,x):
  44. x = F.relu(self.fc1(x))
  45. x = F.relu(self.fc2(x))
  46. y = nn.Sigmoid()(self.fc3(x))
  47. return y
  48. # 损失函数
  49. def loss_func(self,y_pred,y_true):
  50. return nn.BCELoss()(y_pred,y_true)
  51. # 评估函数(准确率)
  52. def metric_func(self,y_pred,y_true):
  53. y_pred = torch.where(y_pred>0.5,torch.ones_like(y_pred,dtype =
  54. torch.float32),
  55. torch.zeros_like(y_pred,dtype = torch.float32))
  56. acc = torch.mean(1-torch.abs(y_true-y_pred))
  57. return acc
  58. # 优化器
  59. @property
  60. def optimizer(self):
  61. return torch.optim.Adam(self.parameters(),lr = 0.001)
  62. model = DNNModel()
  63. # 测试模型结构
  64. (features,labels) = next(iter(dl))
  65. predictions = model(features)
  66. loss = model.loss_func(predictions,labels)
  67. metric = model.metric_func(predictions,labels)
  68. print("init loss:",loss.item())
  69. print("init metric:",metric.item())
  70. #第三步 训练模型
  71. def train_step(model, features, labels):
  72. # 正向传播求损失
  73. predictions = model(features)
  74. loss = model.loss_func(predictions,labels)
  75. metric = model.metric_func(predictions,labels)
  76. # 反向传播求梯度
  77. loss.backward()
  78. # 更新模型参数
  79. model.optimizer.step()
  80. model.optimizer.zero_grad()
  81. return loss.item(),metric.item()
  82. # 测试train_step效果
  83. features,labels = next(iter(dl))
  84. train_step(model,features,labels)
  85. def train_model(model,epochs):
  86. for epoch in range(1,epochs+1):
  87. loss_list,metric_list = [],[]
  88. for features, labels in dl:
  89. lossi,metrici = train_step(model,features,labels)
  90. loss_list.append(lossi)
  91. metric_list.append(metrici)
  92. loss = np.mean(loss_list)
  93. metric = np.mean(metric_list)
  94. if epoch%100==0:
  95. printbar()
  96. print("epoch =",epoch,"loss = ",loss,"metric = ",metric)
  97. train_model(model,epochs = 300)
  98. # 结果可视化
  99. fig, (ax1,ax2) = plt.subplots(nrows=1,ncols=2,figsize = (12,5))
  100. ax1.scatter(Xp[:,0],Xp[:,1], c="r")
  101. ax1.scatter(Xn[:,0],Xn[:,1],c = "g")
  102. ax1.legend(["positive","negative"]);
  103. ax1.set_title("y_true");
  104. Xp_pred = X[torch.squeeze(model.forward(X)>=0.5)]
  105. Xn_pred = X[torch.squeeze(model.forward(X)<0.5)]
  106. ax2.scatter(Xp_pred[:,0],Xp_pred[:,1],c = "r")
  107. ax2.scatter(Xn_pred[:,0],Xn_pred[:,1],c = "g")
  108. ax2.legend(["positive","negative"]);
  109. ax2.set_title("y_pred");

2.3.3 高阶API示范

Pytorch没有官方的高阶API,一般需要用户自己实现训练循环、验证循环、和预测循环。

这里我们通过仿照tf.keras.Model的功能对Pytorch的nn.Module进行了封装, 实现了 fit, validate,predict, summary 方法,相当于用户自定义高阶API。 并在其基础上实现线性回归模型和DNN二分类模型。

  1. import os
  2. import datetime
  3. from torchkeras import Model, summary
  4. #打印时间
  5. def printbar():
  6. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  7. print("\n"+"=========="*8 + "%s"%nowtime)
  1. #例2-3-3-a 高阶API实现线性回归范例
  2. #第一步 准备数据
  3. import numpy as np
  4. import pandas as pd
  5. from matplotlib import pyplot as plt
  6. import torch
  7. from torch import nn
  8. import torch.nn.functional as F
  9. from torch.utils.data import Dataset,DataLoader,TensorDataset
  10. #样本数量
  11. n = 400
  12. # 生成测试用数据集
  13. X = 10*torch.rand([n,2])-5.0 #torch.rand是均匀分布
  14. w0 = torch.tensor([[2.0],[-3.0]])
  15. b0 = torch.tensor([[10.0]])
  16. Y = X@w0 + b0 + torch.normal( 0.0,2.0,size = [n,1]) # @表示矩阵乘法,增加正态扰动
  17. # 数据可视化
  18. %matplotlib inline
  19. %config InlineBackend.figure_format = 'svg'
  20. plt.figure(figsize = (12,5))
  21. ax1 = plt.subplot(121)
  22. ax1.scatter(X[:,0],Y[:,0], c = "b",label = "samples")
  23. ax1.legend()
  24. plt.xlabel("x1")
  25. plt.ylabel("y",rotation = 0)
  26. ax2 = plt.subplot(122)
  27. ax2.scatter(X[:,1],Y[:,0], c = "g",label = "samples")
  28. ax2.legend()
  29. plt.xlabel("x2")
  30. plt.ylabel("y",rotation = 0)
  31. plt.show()
  32. #构建输入数据管道
  33. ds = TensorDataset(X,Y)
  34. ds_train,ds_valid = torch.utils.data.random_split(ds,[int(400*0.7),400-
  35. int(400*0.7)])
  36. dl_train = DataLoader(ds_train,batch_size = 10,shuffle=True,num_workers=2)
  37. dl_valid = DataLoader(ds_valid,batch_size = 10,num_workers=2)
  38. #第二步 定义模型
  39. # 继承用户自定义模型
  40. from torchkeras import Model
  41. class LinearRegression(Model):
  42. def __init__(self):
  43. super(LinearRegression, self).__init__()
  44. self.fc = nn.Linear(2,1)
  45. def forward(self,x):
  46. return self.fc(x)
  47. model = LinearRegression()
  48. model.summary(input_shape = (2,))
  49. #第三步 训练模型
  50. ### 使用fit方法进行训练
  51. def mean_absolute_error(y_pred,y_true):
  52. return torch.mean(torch.abs(y_pred-y_true))
  53. def mean_absolute_percent_error(y_pred,y_true):
  54. absolute_percent_error = (torch.abs(y_pred-y_true)+1e-7)/
  55. (torch.abs(y_true)+1e-7)
  56. return torch.mean(absolute_percent_error)
  57. model.compile(loss_func = nn.MSELoss(),
  58. optimizer= torch.optim.Adam(model.parameters(),lr = 0.01),
  59. metrics_dict={"mae":mean_absolute_error,"mape":mean_absolute_percent_error})
  60. dfhistory = model.fit(200,dl_train = dl_train, dl_val = dl_valid,log_step_freq
  61. = 20)
  62. # 结果可视化
  63. %matplotlib inline
  64. %config InlineBackend.figure_format = 'svg'
  65. w,b = model.state_dict()["fc.weight"],model.state_dict()["fc.bias"]
  66. plt.figure(figsize = (12,5))
  67. ax1 = plt.subplot(121)
  68. ax1.scatter(X[:,0],Y[:,0], c = "b",label = "samples")
  69. ax1.plot(X[:,0],w[0,0]*X[:,0]+b[0],"-r",linewidth = 5.0,label = "model")
  70. ax1.legend()
  71. plt.xlabel("x1")
  72. plt.ylabel("y",rotation = 0)
  73. ax2 = plt.subplot(122)
  74. ax2.scatter(X[:,1],Y[:,0], c = "g",label = "samples")
  75. ax2.plot(X[:,1],w[0,1]*X[:,1]+b[0],"-r",linewidth = 5.0,label = "model")
  76. ax2.legend()
  77. plt.xlabel("x2")
  78. plt.ylabel("y",rotation = 0)
  79. plt.show()
  80. #第四步 评估模型
  81. dfhistory.tail()
  82. %matplotlib inline
  83. %config InlineBackend.figure_format = 'svg'
  84. import matplotlib.pyplot as plt
  85. def plot_metric(dfhistory, metric):
  86. train_metrics = dfhistory[metric]
  87. val_metrics = dfhistory['val_'+metric]
  88. epochs = range(1, len(train_metrics) + 1)
  89. plt.plot(epochs, train_metrics, 'bo--')
  90. plt.plot(epochs, val_metrics, 'ro-')
  91. plt.title('Training and validation '+ metric)
  92. plt.xlabel("Epochs")
  93. plt.ylabel(metric)
  94. plt.legend(["train_"+metric, 'val_'+metric])
  95. plt.show()
  96. plot_metric(dfhistory,"loss")
  97. plot_metric(dfhistory,"mape")
  98. # 评估
  99. model.evaluate(dl_valid)
  100. #第五步 使用模型
  101. # 预测
  102. dl = DataLoader(TensorDataset(X))
  103. model.predict(dl)[0:10]
  104. # 预测
  105. model.predict(dl_valid)[0:10]
  1. #例2-3-3-b 高阶API实现DNN二分类范例
  2. #第一步 准备数据
  3. import numpy as np
  4. import pandas as pd
  5. from matplotlib import pyplot as plt
  6. import torch
  7. from torch import nn
  8. import torch.nn.functional as F
  9. from torch.utils.data import Dataset,DataLoader,TensorDataset
  10. import torchkeras
  11. %matplotlib inline
  12. %config InlineBackend.figure_format = 'svg'
  13. #正负样本数量
  14. n_positive,n_negative = 2000,2000
  15. #生成正样本, 小圆环分布
  16. r_p = 5.0 + torch.normal(0.0,1.0,size = [n_positive,1])
  17. theta_p = 2*np.pi*torch.rand([n_positive,1])
  18. Xp = torch.cat([r_p*torch.cos(theta_p),r_p*torch.sin(theta_p)],axis = 1)
  19. Yp = torch.ones_like(r_p)
  20. #生成负样本, 大圆环分布
  21. r_n = 8.0 + torch.normal(0.0,1.0,size = [n_negative,1])
  22. theta_n = 2*np.pi*torch.rand([n_negative,1])
  23. Xn = torch.cat([r_n*torch.cos(theta_n),r_n*torch.sin(theta_n)],axis = 1)
  24. Yn = torch.zeros_like(r_n)
  25. #汇总样本
  26. X = torch.cat([Xp,Xn],axis = 0)
  27. Y = torch.cat([Yp,Yn],axis = 0)
  28. #可视化
  29. plt.figure(figsize = (6,6))
  30. plt.scatter(Xp[:,0],Xp[:,1],c = "r")
  31. plt.scatter(Xn[:,0],Xn[:,1],c = "g")
  32. plt.legend(["positive","negative"]);
  33. ds = TensorDataset(X,Y)
  34. ds_train,ds_valid = torch.utils.data.random_split(ds,
  35. [int(len(ds)*0.7),len(ds)-int(len(ds)*0.7)])
  36. dl_train = DataLoader(ds_train,batch_size = 100,shuffle=True,num_workers=2)
  37. dl_valid = DataLoader(ds_valid,batch_size = 100,num_workers=2)
  38. #第二步 定义模型
  39. class Net(nn.Module):
  40. def __init__(self):
  41. super().__init__()
  42. self.fc1 = nn.Linear(2,4)
  43. self.fc2 = nn.Linear(4,8)
  44. self.fc3 = nn.Linear(8,1)
  45. def forward(self,x):
  46. x = F.relu(self.fc1(x))
  47. x = F.relu(self.fc2(x))
  48. y = nn.Sigmoid()(self.fc3(x))
  49. return y
  50. model = torchkeras.Model(Net())
  51. model.summary(input_shape =(2,))
  52. #第三步 训练模型
  53. # 准确率
  54. def accuracy(y_pred,y_true):
  55. y_pred = torch.where(y_pred>0.5,torch.ones_like(y_pred,dtype =
  56. torch.float32),
  57. torch.zeros_like(y_pred,dtype = torch.float32))
  58. acc = torch.mean(1-torch.abs(y_true-y_pred))
  59. return acc
  60. model.compile(loss_func = nn.BCELoss(),optimizer=
  61. torch.optim.Adam(model.parameters(),lr = 0.01),
  62. metrics_dict={"accuracy":accuracy})
  63. dfhistory = model.fit(100,dl_train = dl_train,dl_val = dl_valid,log_step_freq
  64. = 10)
  65. # 结果可视化
  66. fig, (ax1,ax2) = plt.subplots(nrows=1,ncols=2,figsize = (12,5))
  67. ax1.scatter(Xp[:,0],Xp[:,1], c="r")
  68. ax1.scatter(Xn[:,0],Xn[:,1],c = "g")
  69. ax1.legend(["positive","negative"]);
  70. ax1.set_title("y_true");
  71. Xp_pred = X[torch.squeeze(model.forward(X)>=0.5)]
  72. Xn_pred = X[torch.squeeze(model.forward(X)<0.5)]
  73. ax2.scatter(Xp_pred[:,0],Xp_pred[:,1],c = "r")
  74. ax2.scatter(Xn_pred[:,0],Xn_pred[:,1],c = "g")
  75. ax2.legend(["positive","negative"]);
  76. ax2.set_title("y_pred");
  77. #第四步 评估模型
  78. %matplotlib inline
  79. %config InlineBackend.figure_format = 'svg'
  80. import matplotlib.pyplot as plt
  81. def plot_metric(dfhistory, metric):
  82. train_metrics = dfhistory[metric]
  83. val_metrics = dfhistory['val_'+metric]
  84. epochs = range(1, len(train_metrics) + 1)
  85. plt.plot(epochs, train_metrics, 'bo--')
  86. plt.plot(epochs, val_metrics, 'ro-')
  87. plt.title('Training and validation '+ metric)
  88. plt.xlabel("Epochs")
  89. plt.ylabel(metric)
  90. plt.legend(["train_"+metric, 'val_'+metric])
  91. plt.show()
  92. plot_metric(dfhistory,"loss")
  93. model.evaluate(dl_valid)
  94. #第五步 使用模型
  95. model.predict(dl_valid)[0:10]

3、pytorch的数据加载和处理 -- Dataset和DataLoader

Pytorch通常使用Dataset和DataLoader这两个工具类来构建数据管道。

Dataset定义了数据集的内容,它相当于一个类似列表的数据结构,具有确定的长度,能够用索 引获取数据集中的元素。

而DataLoader定义了按batch加载数据集的方法,它是一个实现了 __iter__ 方法的可迭代对 象,每次迭代输出一个batch的数据。

DataLoader能够控制batch的大小,batch中元素的采样方法,以及将batch结果整理成模型所需 输入形式的方法,并且能够使用多进程读取数据。

在绝大部分情况下,用户只需实现Dataset的 __len__ 方法和 __getitem__ 方法,就可以轻松构 建自己的数据集,并用默认数据管道进行加载
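下面是一个最小的自定义Dataset示意(假设性示例,特征和标签直接用内存中的随机张量表示),可以看到只实现 __len__ 和 __getitem__ 两个方法,就能交给DataLoader按默认方式加载:

  1. import torch
  2. from torch.utils.data import Dataset, DataLoader
  3. class MyDataset(Dataset):
  4.     def __init__(self, X, Y):
  5.         self.X, self.Y = X, Y
  6.     def __len__(self):                # 返回数据集长度
  7.         return len(self.X)
  8.     def __getitem__(self, index):     # 根据下标返回一个样本
  9.         return self.X[index], self.Y[index]
  10. X, Y = torch.randn(100, 3), torch.randint(0, 2, (100, 1)).float()
  11. dl = DataLoader(MyDataset(X, Y), batch_size=8, shuffle=True)
  12. features, labels = next(iter(dl))
  13. print(features.shape, labels.shape)   # torch.Size([8, 3]) torch.Size([8, 1])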

3.1  Dataset和DataLoader概述

3.1.1 获取一个batch数据的步骤

让我们考虑一下从一个数据集中获取一个batch的数据需要哪些步骤。 (假定数据集的特征和标签分别表示为张量 X 和 Y ,数据集可以表示为 (X,Y) , 假定batch大小为 m )

a) 首先我们要确定数据集的长度 n 。 结果类似: n = 1000 。

b) 然后我们从 0 到 n-1 的范围中抽样出 m 个数(batch大小)。

假定 m=4 , 拿到的结果是一个列表,类似: indices = [1,4,8,9]

c) 接着我们从数据集中去取这 m 个数对应下标的元素。 拿到的结果是一个元组列表,类似: samples = [(X[1],Y[1]),(X[4],Y[4]),(X[8],Y[8]), (X[9],Y[9])]

d) 最后我们将结果整理成两个张量作为输出。 拿到的结果是两个张量,

类似 batch = (features,labels)   

其中 features = torch.stack([X[1],X[4],X[8],X[9]])   labels = torch.stack([Y[1],Y[4],Y[8],Y[9]])
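下面用几行代码手工模拟一遍上述 a)~d) 的取batch过程(示意代码,数据为随机生成的假设样例),帮助理解DataLoader内部在做什么:

  1. import torch
  2. n, m = 1000, 4
  3. X, Y = torch.randn(n, 2), torch.randn(n, 1)
  4. # a) 确定数据集长度 n
  5. # b) 从 0 到 n-1 中抽样出 m 个下标
  6. indices = torch.randperm(n)[:m].tolist()        # 例如 [1, 4, 8, 9]
  7. # c) 取出对应下标的元素,得到元组列表
  8. samples = [(X[i], Y[i]) for i in indices]
  9. # d) 整理成两个张量,作为一个batch输出
  10. features = torch.stack([x for x, y in samples])
  11. labels = torch.stack([y for x, y in samples])
  12. print(features.shape, labels.shape)             # torch.Size([4, 2]) torch.Size([4, 1])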

3.1.2 Dataset和DataLoader的功能分工

上述第a)个步骤确定数据集的长度,是由 Dataset的 __len__ 方法实现的。

第b)个步骤从 0 到 n-1 的范围中抽样出 m 个数的方法,是由 DataLoader的 sampler 和 batch_sampler 参数指定的。

sampler 参数指定单个元素抽样方法,一般无需用户设置,程序默认在DataLoader的参数 shuffle=True 时采用随机抽样, shuffle=False 时采用顺序抽样。

batch_sampler 参数将多个抽样的元素整理成一个列表,一般无需用户设置,默认方法在 DataLoader的参数 drop_last=True 时会丢弃数据集最后一个长度不能被batch大小整除的批次,在 drop_last=False 时保留最后一个批次。

第c)个步骤的核心逻辑(根据下标取数据集中的元素)是由 Dataset的 __getitem__ 方法实现的。

第d)个步骤的整理逻辑由DataLoader的参数 collate_fn 指定。一般情况下也无需用户设置。

3.1.3 Dataset和DataLoader的主要接口

以下是 Dataset和 DataLoader的核心接口逻辑伪代码,不完全和源码一致

  1. #例3-1-3 Dataset和DataLoader的主要接口
  2. import torch
  3. class Dataset(object):
  4.     def __init__(self):
  5.         pass
  6.     def __len__(self):
  7.         raise NotImplementedError
  8.     def __getitem__(self, index):
  9.         raise NotImplementedError
  10. class DataLoader(object):
  11.     def __init__(self, dataset, batch_size, collate_fn, shuffle=True, drop_last=False):
  12.         self.dataset = dataset
  13.         self.collate_fn = collate_fn
  14.         self.sampler = torch.utils.data.RandomSampler if shuffle else \
  15.                        torch.utils.data.SequentialSampler
  16.         self.batch_sampler = torch.utils.data.BatchSampler
  17.         self.sample_iter = iter(self.batch_sampler(
  18.             self.sampler(range(len(dataset))),
  19.             batch_size=batch_size, drop_last=drop_last))
  20.     def __iter__(self):
  21.         return self
  22.     def __next__(self):
  23.         indices = next(self.sample_iter)
  24.         batch = self.collate_fn([self.dataset[i] for i in indices])
  25.         return batch

3.2  使用Dataset创建数据集

Dataset创建数据集常用的方法有:

a) 使用 torch.utils.data.TensorDataset 根据Tensor创建数据集(numpy的array,Pandas的 DataFrame需要先转换成Tensor)。

b) 使用 torchvision.datasets.ImageFolder 根据图片目录创建图片数据集。

c) 继承 torch.utils.data.Dataset 创建自定义数据集。

d) 此外,还可以通过 torch.utils.data.random_split 将一个数据集分割成多份,常用于分割训练集,验证集和测试 集。

e) 调用Dataset的加法运算符( + )将多个数据集合并成一个数据集。

  1. #例3-2-1 根据Tensor创建数据集
  2. import numpy as np
  3. import torch
  4. from torch.utils.data import TensorDataset,Dataset,DataLoader,random_split
  5. # 根据Tensor创建数据集
  6. from sklearn import datasets
  7. iris = datasets.load_iris()
  8. ds_iris = TensorDataset(torch.tensor(iris.data),torch.tensor(iris.target))
  9. # 分割成训练集和预测集
  10. n_train = int(len(ds_iris)*0.8)
  11. n_valid = len(ds_iris) - n_train
  12. ds_train,ds_valid = random_split(ds_iris,[n_train,n_valid])
  13. print(type(ds_iris))
  14. print(type(ds_train))
  15. out:
  16. <class 'torch.utils.data.dataset.TensorDataset'>
  17. <class 'torch.utils.data.dataset.Subset'>
  18. # 使用DataLoader加载数据集
  19. dl_train,dl_valid = DataLoader(ds_train,batch_size =
  20. 8),DataLoader(ds_valid,batch_size = 8)
  21. for features,labels in dl_train:
  22. print(features,labels)
  23. break
  24. out:
  25. tensor([[6.5000, 3.0000, 5.2000, 2.0000],
  26. [6.3000, 3.4000, 5.6000, 2.4000],
  27. [4.9000, 2.4000, 3.3000, 1.0000],
  28. [6.7000, 3.1000, 4.7000, 1.5000],
  29. [4.5000, 2.3000, 1.3000, 0.3000],
  30. [5.7000, 2.5000, 5.0000, 2.0000],
  31. [5.2000, 4.1000, 1.5000, 0.1000],
  32. [5.7000, 2.6000, 3.5000, 1.0000]], dtype=torch.float64) tensor([2, 2, 1, 1, 0, 2, 0, 1], dtype=torch.int32)
  33. # 演示加法运算符(`+`)的合并作用
  34. ds_data = ds_train + ds_valid
  35. print('len(ds_train) = ',len(ds_train))
  36. print('len(ds_valid) = ',len(ds_valid))
  37. print('len(ds_train+ds_valid) = ',len(ds_data))
  38. print(type(ds_data))
  39. out:
  40. len(ds_train) = 120
  41. len(ds_valid) = 30
  42. len(ds_train+ds_valid) = 150
  43. <class 'torch.utils.data.dataset.ConcatDataset'>
  1. #3-2-2 根据图片目录创建图片数据集
  2. import numpy as np
  3. import torch
  4. from torch.utils.data import DataLoader
  5. from torchvision import transforms,datasets
  6. #演示一些常用的图片增强操作
  7. from PIL import Image
  8. img = Image.open('./data/dog2.jpg')
  9. img
  10. # 随机数值翻转
  11. transforms.RandomVerticalFlip()(img)
  12. #随机旋转
  13. transforms.RandomRotation(45)(img)
  14. # 定义图片增强操作
  15. transform_train = transforms.Compose([
  16. transforms.RandomHorizontalFlip(), #随机水平翻转
  17. transforms.RandomVerticalFlip(), #随机垂直翻转
  18. transforms.RandomRotation(45), #随机在45度角度内旋转
  19. transforms.ToTensor() #转换成张量
  20. ]
  21. )
  22. transform_valid = transforms.Compose([
  23. transforms.ToTensor()
  24. ]
  25. )
  26. # 根据图片目录创建数据集
  27. # 这里用到的animal数据集是我自己整理的,链接在文章末尾
  28. #注意这里要在train 和 test 目录下按照图片类别分别新建文件夹,文件夹的名称就是类别名,然后把图片分别放入各个文件夹
  29. ds_train = datasets.ImageFolder("data/animal/train/", transform = transform_train,target_transform= lambda t:torch.tensor([t]).float())
  30. ds_valid = datasets.ImageFolder("data/animal/test/", transform = transform_valid,target_transform= lambda t:torch.tensor([t]).float())
  31. print(ds_train.class_to_idx)
  32. # 使用DataLoader加载数据集
  33. dl_train = DataLoader(ds_train,batch_size = 2,shuffle = True,num_workers=1)
  34. dl_valid = DataLoader(ds_valid,batch_size = 2,shuffle = True,num_workers=1)
  35. for features,labels in dl_train:
  36. print(features)
  37. print(labels)
  38. break
  1. #例3-2-3 创建自定义数据集
  2. #下面通过继承Dataset类创建douban文本分类任务的自定义数据集。 douban数据集链接在文章末尾。
  3. #大概思路如下:首先,对训练集文本分词构建词典。然后将训练集文本和测试集文本数据转换成 token单词编码。 接着将转换成单词编码的训练集数据和测试集数据按样本分割成多个文件,一个文件代表一个样本。 最后,我们可以根据文件名列表获取对应序号的样本内容,从而构建Dataset数据集。
  4. import numpy as np
  5. import pandas as pd
  6. from collections import OrderedDict
  7. import re,string,jieba,csv
  8. #from keras.datasets import imdb
  9. #(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
  10. MAX_WORDS = 10000 # 仅考虑最高频的10000个词
  11. MAX_LEN = 200 # 每个样本保留200个词的长度
  12. BATCH_SIZE = 20
  13. train_data_path = 'data/douban/train.csv'
  14. test_data_path = 'data/douban/test.csv'
  15. train_token_path = 'data/douban/train_token.csv'
  16. test_token_path = 'data/douban/test_token.csv'
  17. train_samples_path = 'data/douban/train_samples/'
  18. test_samples_path = 'data/douban/test_samples/'
  19. #print(train_data[0])
  20. ##构建词典
  21. word_count_dict = {}
  22. #清洗文本
  23. def clean_text(text):
  24. bd='[’!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~]+,。!?“”《》:、. '
  25. for i in bd:
  26. text=text.replace(i,'') #字符串替换去标点符号
  27. fenci=jieba.lcut(text)
  28. return fenci
  29. with open(train_data_path,"r",encoding = 'utf-8',newline='') as f:
  30. reader = csv.reader(f,delimiter=',')
  31. for row in reader:
  32. #print(row)
  33. text = row[1]
  34. label = row[0]
  35. #print(label,text)
  36. cleaned_text = clean_text(text)
  37. for word in cleaned_text:
  38. #print(word)
  39. word_count_dict[word] = word_count_dict.get(word,0)+1
  40. print(len(word_count_dict))
  41. df_word_dict = pd.DataFrame(pd.Series(word_count_dict,name = "count"))
  42. df_word_dict = df_word_dict.sort_values(by = "count",ascending =False)
  43. df_word_dict = df_word_dict[0:MAX_WORDS-2] #
  44. df_word_dict["word_id"] = range(2,MAX_WORDS) #编号0和1分别留给未知词<unkown>和填充<padding>
  45. word_id_dict = df_word_dict["word_id"].to_dict()
  46. df_word_dict.head(10)
  47. out:
  48. count  word_id   #(DataFrame的索引为分词,索引列在此排版中未能显示)
  49. 68229 2
  50. 20591 3
  51. 15321 4
  52. 9312 5
  53. 7423 6
  54. 7395 7
  55. 7256 8
  56. 7053 9
  57. 6753 10
  58. 6388 11
  59. #转换token
  60. # 填充文本
  61. def pad(data_list,pad_length):
  62. padded_list = data_list.copy()
  63. if len(data_list)> pad_length:
  64. padded_list = data_list[-pad_length:]
  65. if len(data_list)< pad_length:
  66. padded_list = [1]*(pad_length-len(data_list))+data_list
  67. return padded_list
  68. def text_to_token(text_file,token_file):
  69. with open(train_data_path,"r",encoding = 'utf-8',newline='') as f,\
  70. open(token_file,"w",encoding = 'utf-8') as fout:
  71. reader = csv.reader(f,delimiter=',')
  72. for row in reader:
  73. text = row[1]
  74. label = row[0]
  75. cleaned_text = clean_text(text)
  76. word_token_list = [word_id_dict.get(word, 0) for word in cleaned_text]
  77. pad_list = pad(word_token_list,MAX_LEN)
  78. out_line = label+"\t"+" ".join([str(x) for x in pad_list])
  79. fout.write(out_line+"\n")
  80. text_to_token(train_data_path,train_token_path)
  81. text_to_token(test_data_path,test_token_path)
  82. # 分割样本
  83. import os
  84. if not os.path.exists(train_samples_path):
  85. os.mkdir(train_samples_path)
  86. if not os.path.exists(test_samples_path):
  87. os.mkdir(test_samples_path)
  88. def split_samples(token_path,samples_dir):
  89. with open(token_path,"r",encoding = 'utf-8') as fin:
  90. i = 0
  91. for line in fin:
  92. with open(samples_dir+"%d.txt"%i,"w",encoding = "utf-8") as fout:
  93. fout.write(line)
  94. i = i+1
  95. split_samples(train_token_path,train_samples_path)
  96. split_samples(test_token_path,test_samples_path)
  97. #创建数据集
  98. import os
  99. import torch
  100. from torch.utils.data import DataLoader,Dataset
  101. from torchvision import transforms,datasets
  102. class imdbDataset(Dataset):
  103. def __init__(self,samples_dir):
  104. self.samples_dir = samples_dir
  105. self.samples_paths = os.listdir(samples_dir)
  106. def __len__(self):
  107. return len(self.samples_paths)
  108. def __getitem__(self,index):
  109. path = self.samples_dir + self.samples_paths[index]
  110. with open(path,"r",encoding = "utf-8") as f:
  111. line = f.readline()
  112. label,tokens = line.split("\t")
  113. label = torch.tensor([float(label)],dtype = torch.float)
  114. feature = torch.tensor([int(x) for x in tokens.split(" ")],dtype = torch.long)
  115. return (feature,label)
  116. ds_train = imdbDataset(train_samples_path)
  117. ds_test = imdbDataset(test_samples_path)
  118. print(len(ds_train))
  119. print(len(ds_test))
  120. dl_train = DataLoader(ds_train,batch_size = BATCH_SIZE,shuffle = True,num_workers=4)
  121. dl_test = DataLoader(ds_test,batch_size = BATCH_SIZE,num_workers=4)
  122. for features,labels in dl_train:
  123. print(features)
  124. print(labels)
  125. break
  126. out:
  127. #创建模型
  128. import torch
  129. from torch import nn
  130. import importlib
  131. from torchkeras import Model,summary
  132. class Net(Model):
  133. def __init__(self):
  134. super(Net, self).__init__() #设置padding_idx参数后将在训练过程中将填充的token始终赋值为0向量
  135. self.embedding = nn.Embedding(num_embeddings = MAX_WORDS,embedding_dim = 3,padding_idx = 1)
  136. self.conv = nn.Sequential()
  137. self.conv.add_module("conv_1",nn.Conv1d(in_channels = 3,out_channels = 16,kernel_size = 5))
  138. self.conv.add_module("pool_1",nn.MaxPool1d(kernel_size = 2))
  139. self.conv.add_module("relu_1",nn.ReLU())
  140. self.conv.add_module("conv_2",nn.Conv1d(in_channels = 16,out_channels = 128,kernel_size = 2))
  141. self.conv.add_module("pool_2",nn.MaxPool1d(kernel_size = 2))
  142. self.conv.add_module("relu_2",nn.ReLU())
  143. self.dense = nn.Sequential()
  144. self.dense.add_module("flatten",nn.Flatten())
  145. self.dense.add_module("linear",nn.Linear(6144,1))
  146. self.dense.add_module("sigmoid",nn.Sigmoid())
  147. def forward(self,x):
  148. x = self.embedding(x).transpose(1,2)
  149. x = self.conv(x)
  150. y = self.dense(x)
  151. return y
  152. model = Net()
  153. print(model)
  154. model.summary(input_shape = (200,),input_dtype = torch.LongTensor)
  155. # 编译模型
  156. def accuracy(y_pred,y_true):
  157. y_pred = torch.where(y_pred>0.5,torch.ones_like(y_pred,dtype = torch.float32),torch.zeros_like(y_pred,dtype = torch.float32))
  158. acc = torch.mean(1-torch.abs(y_true-y_pred))
  159. return acc
  160. model.compile(loss_func = nn.BCELoss(),optimizer=
  161. torch.optim.Adagrad(model.parameters(),lr = 0.02),
  162. metrics_dict={"accuracy":accuracy})
  163. # 训练模型
  164. dfhistory = model.fit(10,dl_train,dl_val=dl_test,log_step_freq= 200)

3.3  使用DataLoader加载数据集

DataLoader能够控制batch的大小,batch中元素的采样方法,以及将batch结果整理成模型所需输入形式的方法,并且能够使用多进程读取数据。

DataLoader的函数签名如下

  1. DataLoader(
  2. dataset,
  3. batch_size=1,
  4. shuffle=False,
  5. sampler=None,
  6. batch_sampler=None,
  7. num_workers=0,
  8. collate_fn=None,
  9. pin_memory=False,
  10. drop_last=False,
  11. timeout=0,
  12. worker_init_fn=None,
  13. multiprocessing_context=None,
  14. )

一般情况下,我们仅仅会配置 dataset, batch_size, shuffle, num_workers, drop_last这五个参 数,其他参数使用默认值即可。

dataset : 数据集

batch_size: 批次大小

shuffle: 是否乱序

sampler: 样本采样函数,一般无需设置。

batch_sampler: 批次采样函数,一般无需设置。

num_workers: 使用多进程读取数据,设置的进程数。

collate_fn: 整理一个批次数据的函数。

pin_memory: 是否设置为锁页内存。默认为False,锁页内存不会使用虚拟内存(硬盘),从锁页内存拷贝到GPU上速度会更快。

drop_last: 是否丢弃最后一个样本数量不足batch_size批次数据。

timeout: 加载一个数据批次的最长等待时间,一般无需设置。

worker_init_fn: 每个worker中dataset的初始化函数,常用于 IterableDataset。一般不使用。 

DataLoader除了可以加载我们前面讲的 torch.utils.data.Dataset 外,还能够加载另外一种数据集 torch.utils.data.IterableDataset。 和Dataset数据集相当于一种列表结构不同,IterableDataset相当于一种迭代器结构。 它更加复 杂,一般较少使用。
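作为补充,下面给出一个IterableDataset的最小示意(假设性示例):它不实现 __len__ 和 __getitem__,而是实现 __iter__,按迭代的方式逐个产出样本,适合流式读取的场景。

  1. import torch
  2. from torch.utils.data import IterableDataset, DataLoader
  3. class MyIterableDataset(IterableDataset):
  4.     def __init__(self, n):
  5.         self.n = n
  6.     def __iter__(self):
  7.         # 逐个产出样本,DataLoader负责把它们攒成batch
  8.         for i in range(self.n):
  9.             yield torch.tensor([float(i)])
  10. dl = DataLoader(MyIterableDataset(10), batch_size=4)
  11. for batch in dl:
  12.     print(batch.reshape(-1))
  13. # 依次输出 tensor([0., 1., 2., 3.])、tensor([4., 5., 6., 7.])、tensor([8., 9.])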

  1. #例3-3 使用DataLoader加载数据集
  2. import numpy as np
  3. import torch
  4. from torch.utils.data import DataLoader,TensorDataset,Dataset
  5. from torchvision import transforms,datasets
  6. #构建输入数据管道
  7. ds = TensorDataset(torch.arange(1,50))
  8. dl = DataLoader(ds,
  9. batch_size = 10,
  10. shuffle= True,
  11. num_workers=2,
  12. drop_last = True)
  13. #迭代数据
  14. for batch, in dl:
  15. print(batch)
  16. out:
  17. tensor([35, 19, 3, 1, 24, 20, 8, 37, 32, 38])
  18. tensor([28, 26, 7, 48, 4, 41, 15, 45, 11, 14])
  19. tensor([23, 5, 10, 6, 18, 39, 31, 22, 42, 12])
  20. tensor([34, 47, 30, 25, 29, 49, 44, 46, 33, 13])
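前面参数列表中提到的 collate_fn 也可以自定义。下面是一个假设性的小例子,把一个batch的(features,label)元组列表整理成两个张量,效果与默认的整理方式基本一致;实际使用中可以在这里做填充、拼接等更复杂的处理:

  1. import torch
  2. from torch.utils.data import DataLoader, TensorDataset
  3. def my_collate_fn(samples):
  4.     # samples是一个列表,每个元素是数据集中的一个样本 (features, label)
  5.     features = torch.stack([x for x, y in samples])
  6.     labels = torch.stack([y for x, y in samples])
  7.     return features, labels
  8. ds = TensorDataset(torch.randn(20, 3), torch.randn(20, 1))
  9. dl = DataLoader(ds, batch_size=5, collate_fn=my_collate_fn)
  10. features, labels = next(iter(dl))
  11. print(features.shape, labels.shape)   # torch.Size([5, 3]) torch.Size([5, 1])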

4、pytorch的模型搭建--torch.nn模块

4.1  nn.functional 和 nn.Module

4.1.1 nn.functional 和 nn.Module

        前面我们介绍了Pytorch的张量的结构操作和数学运算中的一些常用API。 利用这些张量的API我们可以构建出神经网络相关的组件(如激活函数,模型层,损失函数)。

        Pytorch和神经网络相关的功能组件大多都封装在 torch.nn模块下。 这些功能组件的绝大部分既有函数形式实现,也有类形式实现。

        其中nn.functional(一般引入后改名为F)有各种功能组件的函数实现。例如:

(激活函数) * F.relu * F.sigmoid * F.tanh * F.softmax

(模型层) * F.linear * F.conv2d * F.max_pool2d * F.dropout2d * F.embedding

(损失函数) * F.binary_cross_entropy * F.mse_loss * F.cross_entropy

        为了便于对参数进行管理,一般通过继承 nn.Module 转换成为类的实现形式,并直接封装在 nn 模块下。例如:

       (激活函数) * nn.ReLU * nn.Sigmoid * nn.Tanh * nn.Softmax

       (模型层) * nn.Linear * nn.Conv2d * nn.MaxPool2d * nn.Dropout2d * nn.Embedding

       (损失函数) * nn.BCELoss * nn.MSELoss * nn.CrossEntropyLoss

       实际上nn.Module除了可以管理其引用的各种参数,还可以管理其引用的子模块,功能十分强 大。
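下面用一个小例子直观对比同一个功能组件的函数形式和类形式(两者计算结果相同,类形式便于放入nn.Sequential并被nn.Module统一管理):

  1. import torch
  2. from torch import nn
  3. import torch.nn.functional as F
  4. x = torch.tensor([-1.0, 0.0, 2.0])
  5. # 函数形式:直接调用
  6. print(F.relu(x))        # tensor([0., 0., 2.])
  7. # 类形式:先实例化成层,再像函数一样调用
  8. relu = nn.ReLU()
  9. print(relu(x))          # tensor([0., 0., 2.])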

4.1.2 使用nn.Module来管理参数

在Pytorch中,模型的参数是需要被优化器训练的,因此,通常要设置参数为 requires_grad = True 的张量。 同时,在一个模型中,往往有许多的参数,要手动管理这些参数并不是一件容易的事情。 Pytorch一般将参数用nn.Parameter来表示,并且用nn.Module来管理其结构下的所有参数。

  1. #例4-1-2 使用nn.Module来管理参数
  2. import torch
  3. from torch import nn
  4. import torch.nn.functional as F
  5. from matplotlib import pyplot as plt
  6. # nn.Parameter 具有 requires_grad = True 属性
  7. w = nn.Parameter(torch.randn(2,2))
  8. print(w)
  9. print(w.requires_grad)
  10. out:
  11. Parameter containing:
  12. tensor([[ 0.8579, -0.3747],
  13. [-0.1361, 0.2524]], requires_grad=True)
  14. True
  15. # nn.ParameterList 可以将多个nn.Parameter组成一个列表
  16. params_list = nn.ParameterList([nn.Parameter(torch.rand(8,i)) for i in
  17. range(1,3)])
  18. print(params_list)
  19. print(params_list[0].requires_grad)
  20. out:
  21. ParameterList(
  22. (0): Parameter containing: [torch.FloatTensor of size 8x1]
  23. (1): Parameter containing: [torch.FloatTensor of size 8x2]
  24. )
  25. True
  26. # nn.ParameterDict 可以将多个nn.Parameter组成一个字典
  27. params_dict = nn.ParameterDict({"a":nn.Parameter(torch.rand(2,2)),
  28. "b":nn.Parameter(torch.zeros(2))})
  29. print(params_dict)
  30. print(params_dict["a"].requires_grad)
  31. out:
  32. ParameterDict(
  33. (a): Parameter containing: [torch.FloatTensor of size 2x2]
  34. (b): Parameter containing: [torch.FloatTensor of size 2]
  35. )
  36. True
  37. # 可以用Module将它们管理起来
  38. # module.parameters()返回一个生成器,包括其结构下的所有parameters
  39. module = nn.Module()
  40. module.w = w
  41. module.params_list = params_list
  42. module.params_dict = params_dict
  43. num_param = 0
  44. for param in module.parameters():
  45. print(param,"\n")
  46. num_param = num_param + 1
  47. print("number of Parameters =",num_param)
  48. out:
  49. Parameter containing:
  50. tensor([[ 0.8579, -0.3747],
  51. [-0.1361, 0.2524]], requires_grad=True)
  52. Parameter containing:
  53. tensor([[0.9753],
  54. [0.1606],
  55. [0.2186],
  56. [0.6484],
  57. [0.8174],
  58. [0.2587],
  59. [0.5496],
  60. [0.7685]], requires_grad=True)
  61. Parameter containing:
  62. tensor([[0.5034, 0.2805],
  63. [0.9023, 0.1758],
  64. [0.1499, 0.5110],
  65. [0.2113, 0.4445],
  66. [0.6116, 0.8562],
  67. [0.2120, 0.8932],
  68. [0.3098, 0.9548],
  69. [0.4298, 0.4322]], requires_grad=True)
  70. Parameter containing:
  71. tensor([[0.4966, 0.5429],
  72. [0.8729, 0.5744]], requires_grad=True)
  73. Parameter containing:
  74. tensor([0., 0.], requires_grad=True)
  75. number of Parameters = 5
  76. #实践当中,一般通过继承nn.Module来构建模块类,并将所有含有需要学习的参数的部分放在构造函数中。
  77. #以下范例为Pytorch中nn.Linear的源码的简化版本
  78. #可以看到它将需要学习的参数放在了__init__构造函数中,并在forward中调用F.linear函数来实现计算逻辑。
  79. class Linear(nn.Module):
  80. __constants__ = ['in_features', 'out_features']
  81. def __init__(self, in_features, out_features, bias=True):
  82. super(Linear, self).__init__()
  83. self.in_features = in_features
  84. self.out_features = out_features
  85. self.weight = nn.Parameter(torch.Tensor(out_features, in_features))
  86. if bias:
  87. self.bias = nn.Parameter(torch.Tensor(out_features))
  88. else:
  89. self.register_parameter('bias', None)
  90. def forward(self, input):
  91. return F.linear(input, self.weight, self.bias)

4.1.3 使用nn.Module来管理子模块

        一般情况下,我们很少直接使用 nn.Parameter 来定义参数构建模型,而是通过拼装一些常用的模型层来构造模型。

这些模型层也是继承自nn.Module的对象,本身也包括参数,属于我们要定义的模块的子模块。 nn.Module提供了一些方法可以管理这些子模块。

children() 方法: 返回生成器,包括模块下的所有子模块。

named_children()方法:返回一个生成器,包括模块下的所有子模块,以及它们的名字。

modules()方法:返回一个生成器,包括模块下的所有各个层级的模块,包括模块本身。

named_modules()方法:返回一个生成器,包括模块下的所有各个层级的模块以及它们的名 字,包括模块本身。

其中chidren()方法和named_children()方法较多使用。

modules()方法和named_modules()方法较少使用,其功能可以通过多个named_children()的嵌 套使用实现。

  1. #例4-1-3 使用nn.Module来管理子模块
  2. import torch
  3. from torch import nn
  4. import torch.nn.functional as F
  5. class Net(nn.Module):
  6. def __init__(self):
  7. super(Net, self).__init__()
  8. self.embedding = nn.Embedding(num_embeddings = 10000,embedding_dim = 3,padding_idx = 1)
  9. self.conv = nn.Sequential()
  10. self.conv.add_module("conv_1",nn.Conv1d(in_channels = 3,out_channels = 16,kernel_size = 5))
  11. self.conv.add_module("pool_1",nn.MaxPool1d(kernel_size = 2))
  12. self.conv.add_module("relu_1",nn.ReLU())
  13. self.conv.add_module("conv_2",nn.Conv1d(in_channels = 16,out_channels = 128,kernel_size = 2))
  14. self.conv.add_module("pool_2",nn.MaxPool1d(kernel_size = 2))
  15. self.conv.add_module("relu_2",nn.ReLU())
  16. self.dense = nn.Sequential()
  17. self.dense.add_module("flatten",nn.Flatten())
  18. self.dense.add_module("linear",nn.Linear(6144,1))
  19. self.dense.add_module("sigmoid",nn.Sigmoid())
  20. def forward(self,x):
  21. x = self.embedding(x).transpose(1,2)
  22. x = self.conv(x)
  23. y = self.dense(x)
  24. return y
  25. net = Net()
  26. i = 0
  27. for child in net.children():
  28. i+=1
  29. print(child,"\n")
  30. print("child number",i)
  31. out:
  32. Sequential(
  33. (flatten): Flatten(start_dim=1, end_dim=-1)
  34. (linear): Linear(in_features=6144, out_features=1, bias=True)
  35. (sigmoid): Sigmoid()
  36. )
  37. child number 3
  38. i = 0
  39. for name,child in net.named_children():
  40. i+=1
  41. print(name,":",child,"\n")
  42. print("child number",i)
  43. out:
  44. embedding : Embedding(10000, 3, padding_idx=1)
  45. conv : Sequential(
  46. (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  47. (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  48. (relu_1): ReLU()
  49. (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  50. (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  51. (relu_2): ReLU()
  52. )
  53. dense : Sequential(
  54. (flatten): Flatten(start_dim=1, end_dim=-1)
  55. (linear): Linear(in_features=6144, out_features=1, bias=True)
  56. (sigmoid): Sigmoid()
  57. )
  58. child number 3
  59. i = 0
  60. for module in net.modules():
  61. i+=1
  62. print(module)
  63. print("module number:",i)
  64. out:
  65. Net(
  66. (embedding): Embedding(10000, 3, padding_idx=1)
  67. (conv): Sequential(
  68. (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  69. (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  70. (relu_1): ReLU()
  71. (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  72. (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  73. (relu_2): ReLU()
  74. )
  75. (dense): Sequential(
  76. (flatten): Flatten(start_dim=1, end_dim=-1)
  77. (linear): Linear(in_features=6144, out_features=1, bias=True)
  78. (sigmoid): Sigmoid()
  79. )
  80. )
  81. Embedding(10000, 3, padding_idx=1)
  82. Sequential(
  83. (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  84. (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  85. (relu_1): ReLU()
  86. (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  87. (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  88. (relu_2): ReLU()
  89. )
  90. Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  91. MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  92. ReLU()
  93. Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  94. MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  95. ReLU()
  96. Sequential(
  97. (flatten): Flatten(start_dim=1, end_dim=-1)
  98. (linear): Linear(in_features=6144, out_features=1, bias=True)
  99. (sigmoid): Sigmoid()
  100. )
  101. Flatten(start_dim=1, end_dim=-1)
  102. Linear(in_features=6144, out_features=1, bias=True)
  103. Sigmoid()
  104. module number: 13
  105. #下面我们通过named_children方法找到embedding层,并将其参数设置为不可训练(相当于冻结embedding层)。
  106. children_dict = {name:module for name,module in net.named_children()}
  107. print(children_dict)
  108. embedding = children_dict["embedding"]
  109. embedding.requires_grad_(False) #冻结其参数
  110. out:
  111. {'embedding': Embedding(10000, 3, padding_idx=1), 'conv': Sequential(
  112. (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  113. (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  114. (relu_1): ReLU()
  115. (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  116. (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  117. (relu_2): ReLU()
  118. ), 'dense': Sequential(
  119. (flatten): Flatten(start_dim=1, end_dim=-1)
  120. (linear): Linear(in_features=6144, out_features=1, bias=True)
  121. (sigmoid): Sigmoid()
  122. )}
  123. #可以看到其第一层的参数已经不可以被训练了。
  124. for param in embedding.parameters():
  125. print(param.requires_grad)
  126. print(param.numel())
  127. out: False 30000  #(下面的表格应为再次用torchkeras的summary查看模型的输出,可见30,000个Embedding参数已变为不可训练)
  128. ----------------------------------------------------------------
  129. Layer (type) Output Shape Param #
  130. ================================================================
  131. Embedding-1 [-1, 200, 3] 30,000
  132. Conv1d-2 [-1, 16, 196] 256
  133. MaxPool1d-3 [-1, 16, 98] 0
  134. ReLU-4 [-1, 16, 98] 0
  135. Conv1d-5 [-1, 128, 97] 4,224
  136. MaxPool1d-6 [-1, 128, 48] 0
  137. ReLU-7 [-1, 128, 48] 0
  138. Flatten-8 [-1, 6144] 0
  139. Linear-9 [-1, 1] 6,145
  140. Sigmoid-10 [-1, 1] 0
  141. ================================================================
  142. Total params: 40,625
  143. Trainable params: 10,625
  144. Non-trainable params: 30,000
  145. ----------------------------------------------------------------
  146. Input size (MB): 0.000763
  147. Forward/backward pass size (MB): 0.287796
  148. Params size (MB): 0.154972
  149. Estimated Total Size (MB): 0.443531
  150. ----------------------------------------------------------------

4.2  模型层layers

深度学习模型一般由各种模型层组合而成。

torch.nn中内置了非常丰富的各种模型层。它们都属于nn.Module的子类,具备参数管理功能。

例如:

nn.Linear, nn.Flatten, nn.Dropout, nn.BatchNorm2d

nn.Conv2d,nn.AvgPool2d,nn.Conv1d,nn.ConvTranspose2d

nn.Embedding,nn.GRU,nn.LSTM

nn.Transformer

如果这些内置模型层不能够满足需求,我们也可以通过继承nn.Module基类构建自定义的模型层。

实际上,pytorch不区分模型和模型层,都是通过继承nn.Module进行构建。

因此,我们只要继承nn.Module基类并实现forward方法即可自定义模型层。

4.2.1 内置模型层

基础层

nn.Linear:全连接层。参数个数 = 输入层特征数× 输出层特征数(weight)+ 输出层特征数 (bias)

nn.Flatten:压平层,用于将多维张量样本压成一维张量样本。

nn.BatchNorm1d:一维批标准化层。通过线性变换将输入批次缩放平移到稳定的均值和标准差。可以增强模型对输入不同分布的适应性,加快模型训练速度,有轻微正则化效果。一般在激活函数之前使用。可以用affine参数设置该层是否含有可以训练的参数。

nn.BatchNorm2d:二维批标准化层。

nn.BatchNorm3d:三维批标准化层。

nn.Dropout:一维随机丢弃层。一种正则化手段。

nn.Dropout2d:二维随机丢弃层。

nn.Dropout3d:三维随机丢弃层。

nn.Threshold:限幅层。当输入大于或小于阈值范围时,截断之。

nn.ConstantPad2d: 二维常数填充层。对二维张量样本填充常数扩展长度。

nn.ReplicationPad1d: 一维复制填充层。对一维张量样本通过复制边缘值填充扩展长度。

nn.ZeroPad2d:二维零值填充层。对二维张量样本在边缘填充0值.

nn.GroupNorm:组归一化。一种替代批归一化的方法,将通道分成若干组进行归一。不受 batch大小限制,据称性能和效果都优于BatchNorm。

nn.LayerNorm:层归一化。较少使用。

nn.InstanceNorm2d: 样本归一化。较少使用。

卷积网络相关层

nn.Conv1d:普通一维卷积,常用于文本。参数个数 = 输入通道数×卷积核尺寸(如3)×卷积核个数 + 卷积核个数(偏置项)。

nn.Conv2d:普通二维卷积,常用于图像。参数个数 = 输入通道数×卷积核尺寸(如3乘3)×卷积核个数 + 卷积核个数(偏置项)。通过调整dilation参数大于1,可以变成空洞卷积,增大卷积核感受野。通过调整groups参数不为1,可以变成分组卷积。分组卷积中每个分组只与一部分输入通道相连,参数数量约为普通卷积的 1/groups,可以显著减少参数数量。当groups参数等于输入通道数时,相当于tensorflow中的二维深度卷积层tf.keras.layers.DepthwiseConv2D。利用分组卷积和1乘1卷积的组合操作,可以构造相当于Keras中的二维深度可分离卷积层tf.keras.layers.SeparableConv2D。

nn.Conv3d:普通三维卷积,常用于视频。参数个数 = 输入通道数×卷积核尺寸(如3乘3乘3)×卷积核个数 + 卷积核个数(偏置项)。

nn.MaxPool1d: 一维最大池化。

nn.MaxPool2d:二维最大池化。一种下采样方式。没有需要训练的参数。 

nn.MaxPool3d:三维最大池化。

nn.AdaptiveMaxPool2d:二维自适应最大池化。无论输入图像的尺寸如何变化,输出的图 像尺寸是固定的。 该函数的实现原理,大概是通过输入图像的尺寸和要得到的输出图像的 尺寸来反向推算池化算子的padding,stride等参数。

nn.FractionalMaxPool2d:二维分数最大池化。普通最大池化通常输入尺寸是输出的整数 倍。而分数最大池化则可以不必是整数。分数最大池化使用了一些随机采样策略,有一定的 正则效果,可以用它来代替普通最大池化和Dropout层。

nn.AvgPool2d:二维平均池化。

nn.AdaptiveAvgPool2d:二维自适应平均池化。无论输入的维度如何变化,输出的维度是固 定的。

nn.ConvTranspose2d:二维卷积转置层,俗称反卷积层。并非卷积的逆操作,但在卷积核 相同的情况下,当其输入尺寸是卷积操作输出尺寸的情况下,卷积转置的输出尺寸恰好是卷 积操作的输入尺寸。在语义分割中可用于上采样。

nn.Upsample:上采样层,操作效果和池化相反。可以通过mode参数控制上采样策略为"nearest"最邻近策略或"linear"线性插值策略。

nn.Unfold:滑动窗口提取层。其参数和卷积操作nn.Conv2d相同。实际上,卷积操作可以等 价于nn.Unfold和nn.Linear以及nn.Fold的一个组合。 其中nn.Unfold操作可以从输入中提取各 个滑动窗口的数值矩阵,并将其压平成一维。利用nn.Linear将nn.Unfold的输出和卷积核做 乘法后,再使用 nn.Fold操作将结果转换成输出图片形状。

nn.Fold:逆滑动窗口提取层。

循环网络相关层

nn.Embedding:嵌入层。一种比Onehot更加有效的对离散特征进行编码的方法。一般用于 将输入中的单词映射为稠密向量。嵌入层的参数需要学习。

nn.LSTM:长短记忆循环网络层【支持多层】。最普遍使用的循环网络层。具有携带轨道,遗忘门,更新门,输出门。可以较为有效地缓解梯度消失问题,从而能够适用长期依赖问题。设置bidirectional = True时可以得到双向LSTM。需要注意的是,默认的输入和输出形状是(seq,batch,feature),如果需要将batch维度放在第0维,则要将batch_first参数设置为True。

nn.GRU:门控循环网络层【支持多层】。LSTM的低配版,不具有携带轨道,参数数量少于 LSTM,训练速度更快。

nn.RNN:简单循环网络层【支持多层】。容易存在梯度消失,不能够适用长期依赖问题。 一般较少使用。

nn.LSTMCell:长短记忆循环网络单元。和nn.LSTM在整个序列上迭代相比,它仅在序列上 迭代一步。一般较少使用。

nn.GRUCell:门控循环网络单元。和nn.GRU在整个序列上迭代相比,它仅在序列上迭代一 步。一般较少使用。

nn.RNNCell:简单循环网络单元。和nn.RNN在整个序列上迭代相比,它仅在序列上迭代一 步。一般较少使用。

Transformer相关层

nn.Transformer:Transformer网络结构。Transformer网络结构是替代循环网络的一种结构,解决了循环网络难以并行,难以捕捉长期依赖的缺陷。它是目前NLP任务的主流模型的 主要构成部分。

Transformer网络结构由TransformerEncoder编码器和TransformerDecoder 解码器组成。编码器和解码器的核心是MultiheadAttention多头注意力层。

nn.TransformerEncoder:Transformer编码器结构。由多个 nn.TransformerEncoderLayer编 码器层组成。

nn.TransformerDecoder:Transformer解码器结构。由多个 nn.TransformerDecoderLayer 解码器层组成。

nn.TransformerEncoderLayer:Transformer的编码器层。

nn.TransformerDecoderLayer:Transformer的解码器层。

nn.MultiheadAttention:多头注意力层。
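上面这些内置模型层都可以直接实例化后对张量调用。下面随手验证其中几个层的输入输出形状(示意代码,通道数和尺寸等数值只是随意取的例子):

  1. import torch
  2. from torch import nn
  3. x = torch.randn(32, 3, 64, 64)     # (batch, channel, height, width)
  4. print(nn.Conv2d(3, 16, kernel_size=3)(x).shape)    # torch.Size([32, 16, 62, 62])
  5. print(nn.MaxPool2d(kernel_size=2)(x).shape)        # torch.Size([32, 3, 32, 32])
  6. print(nn.AdaptiveAvgPool2d((1, 1))(x).shape)       # torch.Size([32, 3, 1, 1]),输出尺寸固定
  7. # LSTM默认输入形状是(seq,batch,feature),这里设置batch_first=True改为(batch,seq,feature)
  8. lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True)
  9. output, (hn, cn) = lstm(torch.randn(32, 20, 8))
  10. print(output.shape)                                 # torch.Size([32, 20, 16])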

4.2.2 自定义模型层

        下面是Pytorch中nn.Linear层的源码(简化版),我们可以仿照它,通过继承nn.Module基类并实现forward方法来自定义模型层。

  1. #例4-2-2  自定义模型层
  2. import math, torch   #补充导入math,reset_parameters中会用到math.sqrt
  3. from torch import nn
  4. import torch.nn.functional as F
  5. class Linear(nn.Module):
  6. __constants__ = ['in_features', 'out_features']
  7. def __init__(self, in_features, out_features, bias=True):
  8. super(Linear, self).__init__()
  9. self.in_features = in_features
  10. self.out_features = out_features
  11. self.weight = nn.Parameter(torch.Tensor(out_features, in_features))
  12. if bias:
  13. self.bias = nn.Parameter(torch.Tensor(out_features))
  14. else:
  15. self.register_parameter('bias', None)
  16. self.reset_parameters()
  17. def reset_parameters(self):
  18. nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
  19. if self.bias is not None:
  20. fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.weight)
  21. bound = 1 / math.sqrt(fan_in)
  22. nn.init.uniform_(self.bias, -bound, bound)
  23. def forward(self, input):
  24. return F.linear(input, self.weight, self.bias)
  25. def extra_repr(self):
  26. return 'in_features={}, out_features={}, bias={}'.format(self.in_features, self.out_features, self.bias is not None)
  27. linear = nn.Linear(20, 30)
  28. inputs = torch.randn(128, 20)
  29. output = linear(inputs)
  30. print(output.size())
  31. out:
  32. torch.Size([128, 30])

4.3  损失函数losses

        一般来说,监督学习的目标函数由损失函数和正则化项组成。(Objective = Loss + Regularization)

        Pytorch中的损失函数一般在训练模型时候指定。 注意Pytorch中内置的损失函数的参数和tensorflow不同,是y_pred在前,y_true在后,而 Tensorflow是y_true在前,y_pred在后。

        对于回归模型,通常使用的内置损失函数是均方损失函数nn.MSELoss 。

        对于二分类模型,通常使用的是二元交叉熵损失函数nn.BCELoss (输入已经是sigmoid激活函数 之后的结果) 或者 nn.BCEWithLogitsLoss (输入尚未经过nn.Sigmoid激活函数) 。

        对于多分类模型,一般推荐使用交叉熵损失函数 nn.CrossEntropyLoss。 (y_true需要是一维的, 是类别编码。y_pred未经过nn.Softmax激活。) 此外,如果多分类的y_pred经过了nn.LogSoftmax激活,可以使用nn.NLLLoss损失函数(The negative log likelihood loss)。 这种方法和直接使用nn.CrossEntropyLoss等价。

        如果有需要,也可以自定义损失函数,自定义损失函数需要接收两个张量y_pred,y_true作为输 入参数,并输出一个标量作为损失函数值。 Pytorch中的正则化项一般通过自定义的方式和损失函数一起添加作为目标函数。 如果仅仅使用L2正则化,也可以利用优化器的weight_decay参数来实现相同的效果。
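例如,只需要L2正则化时,可以像下面这样直接在优化器中设置weight_decay参数(示意写法,模型和系数仅为举例):

  1. import torch
  2. from torch import nn
  3. model = nn.Linear(2, 1)    # 任意一个模型,仅作示意
  4. # weight_decay即L2正则化系数,优化器在更新参数时会自动施加这项惩罚
  5. optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)

需要注意,这种方式会对包括偏置在内的所有参数施加L2惩罚;如果想只对权重做正则化或同时加入L1正则化,仍需采用后面4.3.3节中自定义正则化项的做法。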

4.3.1 内置损失函数

内置的损失函数一般有类的实现和函数的实现两种形式。

如:nn.BCELoss 和 F.binary_cross_entropy 都是二元交叉熵损失函数,前者是类的实现形式,后者是函数的实现形式。

实际上类的实现形式通常是调用函数的实现形式并用nn.Module封装后得到的。

一般我们常用的是类的实现形式。它们封装在torch.nn模块下,并且类名以Loss结尾。

常用的一些内置损失函数说明如下。

nn.MSELoss(均方误差损失,也叫做L2损失,用于回归)

nn.L1Loss (L1损失,也叫做绝对值误差损失,用于回归)

nn.SmoothL1Loss (平滑L1损失,当输入在-1到1之间时,平滑为L2损失,用于回归)

nn.BCELoss (二元交叉熵,用于二分类,输入已经过nn.Sigmoid激活,对不平衡数据集可以用weight参数调整类别权重)

nn.BCEWithLogitsLoss (二元交叉熵,用于二分类,输入未经过nn.Sigmoid激活)

nn.CrossEntropyLoss (交叉熵,用于多分类,要求label为稀疏编码,输入未经过nn.Softmax激活,对不平衡数据集可以用weight参数调整类别权重)

nn.NLLLoss (负对数似然损失,用于多分类,要求label为稀疏编码,输入经过 nn.LogSoftmax激活)

nn.CosineSimilarity(余弦相似度,可用于多分类)

nn.AdaptiveLogSoftmaxWithLoss (一种适合非常多类别且类别分布很不均衡的损失函数, 会自适应地将多个小类别合成一个cluster)

  1. #例4-3-1 内置损失函数
  2. import numpy as np
  3. import pandas as pd
  4. import torch
  5. from torch import nn
  6. import torch.nn.functional as F
  7. y_pred = torch.tensor([[10.0,0.0,-10.0],[8.0,8.0,8.0]])
  8. y_true = torch.tensor([0,2])
  9. # 直接调用交叉熵损失
  10. ce = nn.CrossEntropyLoss()(y_pred,y_true)
  11. print(ce)
  12. # 等价于先计算nn.LogSoftmax激活,再调用NLLLoss
  13. y_pred_logsoftmax = nn.LogSoftmax(dim = 1)(y_pred)
  14. nll = nn.NLLLoss()(y_pred_logsoftmax,y_true)
  15. print(nll)
  16. out:
  17. tensor(0.5493)
  18. tensor(0.5493)
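类似地,nn.BCEWithLogitsLoss 相当于先做Sigmoid再计算 nn.BCELoss,下面用随手构造的数据简单验证一下(示意代码):

  1. import torch
  2. from torch import nn
  3. logits = torch.tensor([[2.0],[-1.0],[0.5]])
  4. y_true = torch.tensor([[1.0],[0.0],[1.0]])
  5. print(nn.BCEWithLogitsLoss()(logits, y_true))           # 直接以logits作为输入
  6. print(nn.BCELoss()(torch.sigmoid(logits), y_true))      # 与上一行数值相同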

4.3.2 自定义损失函数

自定义损失函数接收两个张量y_pred,y_true作为输入参数,并输出一个标量作为损失函数值。

也可以对nn.Module进行子类化,重写forward方法实现损失的计算逻辑,从而得到损失函数的 类的实现。

下面是一个Focal Loss的自定义实现示范。

Focal Loss是一种对binary_crossentropy的改进损失 函数形式。

它在样本不均衡和存在较多易分类的样本时相比binary_crossentropy具有明显的优势。

它有两个可调参数,alpha参数和gamma参数。

其中alpha参数主要用于衰减负样本的权重, gamma参数主要用于衰减容易训练样本的权重。

从而让模型更加聚焦在正样本和困难样本上。这就是为什么这个损失函数叫做Focal Loss。
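对照后面的实现,Focal Loss 的标准形式可以写成(其中 $p_t$ 表示模型对真实类别给出的预测概率,$\alpha_t$ 对正样本取 $\alpha$、对负样本取 $1-\alpha$):

$$ FL(p_t) = -\alpha_t (1-p_t)^{\gamma} \log(p_t) $$

当 $p_t$ 接近1(容易样本)时,调制因子 $(1-p_t)^{\gamma}$ 接近0,损失被大幅衰减;当 $\gamma=0$ 且 $\alpha_t=1$ 时则退化为普通的交叉熵。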

  1. #例4-3-2 自定义损失函数
  2. import torch
  3. from torch import nn
  4. class FocalLoss(nn.Module):
  5. def __init__(self,gamma=2.0,alpha=0.75):
  6. super().__init__()
  7. self.gamma = gamma
  8. self.alpha = alpha
  9. def forward(self,y_pred,y_true):
  10. bce = torch.nn.BCELoss(reduction = "none")(y_pred,y_true)
  11. p_t = (y_true * y_pred) + ((1 - y_true) * (1 - y_pred))
  12. alpha_factor = y_true * self.alpha + (1 - y_true) * (1 - self.alpha)
  13. modulating_factor = torch.pow(1.0 - p_t, self.gamma)
  14. loss = torch.mean(alpha_factor * modulating_factor * bce)
  15. return loss
  16. #困难样本
  17. y_pred_hard = torch.tensor([[0.5],[0.5]])
  18. y_true_hard = torch.tensor([[1.0],[0.0]])
  19. #容易样本
  20. y_pred_easy = torch.tensor([[0.9],[0.1]])
  21. y_true_easy = torch.tensor([[1.0],[0.0]])
  22. focal_loss = FocalLoss()
  23. bce_loss = nn.BCELoss()
  24. print("focal_loss(hard samples):", focal_loss(y_pred_hard,y_true_hard))
  25. print("bce_loss(hard samples):", bce_loss(y_pred_hard,y_true_hard))
  26. print("focal_loss(easy samples):", focal_loss(y_pred_easy,y_true_easy))
  27. print("bce_loss(easy samples):", bce_loss(y_pred_easy,y_true_easy))
  28. #可见 focal_loss让容易样本的权重衰减到原来的 0.0005/0.1054 = 0.00474
  29. #而让困难样本的权重只衰减到原来的 0.0866/0.6931=0.12496
  30. # 因此相对而言,focal_loss可以衰减容易样本的权重。
  31. out:
  32. focal_loss(hard samples): tensor(0.0866)
  33. bce_loss(hard samples): tensor(0.6931)
  34. focal_loss(easy samples): tensor(0.0005)
  35. bce_loss(easy samples): tensor(0.1054)

4.3.3 自定义L1和L2正则化项

        通常认为L1 正则化可以产生稀疏权值矩阵,即产生一个稀疏模型,可以用于特征选择。

        而L2 正则化可以防止模型过拟合(overfitting)。一定程度上,L1也可以防止过拟合。

        下面以一个二分类问题为例,演示给模型的目标函数添加自定义L1和L2正则化项的方法。

        这个范例同时演示了上一个部分的FocalLoss的使用

  1. import torch
  2. from torch import nn
  3. class FocalLoss(nn.Module):
  4. def __init__(self,gamma=2.0,alpha=0.75):
  5. super().__init__()
  6. self.gamma = gamma
  7. self.alpha = alpha
  8. def forward(self,y_pred,y_true):
  9. bce = torch.nn.BCELoss(reduction = "none")(y_pred,y_true)
  10. p_t = (y_true * y_pred) + ((1 - y_true) * (1 - y_pred))
  11. alpha_factor = y_true * self.alpha + (1 - y_true) * (1 - self.alpha)
  12. modulating_factor = torch.pow(1.0 - p_t, self.gamma)
  13. loss = torch.mean(alpha_factor * modulating_factor * bce)
  14. return loss
  15. #例4-3-3 自定义L1和L2正则化项
  16. import numpy as np
  17. import pandas as pd
  18. from matplotlib import pyplot as plt
  19. import torch
  20. from torch import nn
  21. import torch.nn.functional as F
  22. from torch.utils.data import Dataset,DataLoader,TensorDataset
  23. import torchkeras
  24. %matplotlib inline
  25. %config InlineBackend.figure_format = 'svg'
  26. #1、准备数据
  27. #正负样本数量
  28. n_positive,n_negative = 200,6000
  29. #生成正样本, 小圆环分布
  30. r_p = 5.0 + torch.normal(0.0,1.0,size = [n_positive,1])
  31. theta_p = 2*np.pi*torch.rand([n_positive,1])
  32. Xp = torch.cat([r_p*torch.cos(theta_p),r_p*torch.sin(theta_p)],axis = 1)
  33. Yp = torch.ones_like(r_p)
  34. #生成负样本, 大圆环分布
  35. r_n = 8.0 + torch.normal(0.0,1.0,size = [n_negative,1])
  36. theta_n = 2*np.pi*torch.rand([n_negative,1])
  37. Xn = torch.cat([r_n*torch.cos(theta_n),r_n*torch.sin(theta_n)],axis = 1)
  38. Yn = torch.zeros_like(r_n)
  39. #汇总样本
  40. X = torch.cat([Xp,Xn],axis = 0)
  41. Y = torch.cat([Yp,Yn],axis = 0)
  42. #可视化
  43. plt.figure(figsize = (6,6))
  44. plt.scatter(Xp[:,0],Xp[:,1],c = "r")
  45. plt.scatter(Xn[:,0],Xn[:,1],c = "g")
  46. plt.legend(["positive","negative"]);

  1. ds = TensorDataset(X,Y)
  2. ds_train,ds_valid = torch.utils.data.random_split(ds,
  3. [int(len(ds)*0.7),len(ds)-int(len(ds)*0.7)])
  4. dl_train = DataLoader(ds_train,batch_size = 100,shuffle=True,num_workers=2)
  5. dl_valid = DataLoader(ds_valid,batch_size = 100,num_workers=2)
  6. #2、定义模型
  7. class DNNModel(torchkeras.Model):
  8. def __init__(self):
  9. super(DNNModel, self).__init__()
  10. self.fc1 = nn.Linear(2,4)
  11. self.fc2 = nn.Linear(4,8)
  12. self.fc3 = nn.Linear(8,1)
  13. def forward(self,x):
  14. x = F.relu(self.fc1(x))
  15. x = F.relu(self.fc2(x))
  16. y = nn.Sigmoid()(self.fc3(x))
  17. return y
  18. model = DNNModel()
  19. model.summary(input_shape =(2,))
  20. out:
  21. ----------------------------------------------------------------
  22. Layer (type) Output Shape Param #
  23. ================================================================
  24. Linear-1 [-1, 4] 12
  25. Linear-2 [-1, 8] 40
  26. Linear-3 [-1, 1] 9
  27. ================================================================
  28. Total params: 61
  29. Trainable params: 61
  30. Non-trainable params: 0
  31. ----------------------------------------------------------------
  32. Input size (MB): 0.000008
  33. Forward/backward pass size (MB): 0.000099
  34. Params size (MB): 0.000233
  35. Estimated Total Size (MB): 0.000340
  36. ----------------------------------------------------------------
  1. #3、训练模型
  2. # 准确率
  3. def accuracy(y_pred,y_true):
  4. y_pred = torch.where(y_pred>0.5,torch.ones_like(y_pred,dtype = torch.float32),torch.zeros_like(y_pred,dtype = torch.float32))
  5. acc = torch.mean(1-torch.abs(y_true-y_pred))
  6. return acc
  7. # L2正则化
  8. def L2Loss(model,alpha):
  9. l2_loss = torch.tensor(0.0, requires_grad=True)
  10. for name, param in model.named_parameters():
  11. if 'bias' not in name: #一般不对偏置项使用正则
  12. l2_loss = l2_loss + (0.5 * alpha * torch.sum(torch.pow(param, 2)))
  13. return l2_loss
  14. # L1正则化
  15. def L1Loss(model,beta):
  16. l1_loss = torch.tensor(0.0, requires_grad=True)
  17. for name, param in model.named_parameters():
  18. if 'bias' not in name:
  19. l1_loss = l1_loss + beta * torch.sum(torch.abs(param))
  20. return l1_loss
  21. # 将L2正则和L1正则添加到FocalLoss损失,一起作为目标函数
  22. def focal_loss_with_regularization(y_pred,y_true):
  23. focal = FocalLoss()(y_pred,y_true)
  24. l2_loss = L2Loss(model,0.001) #注意设置正则化项系数
  25. l1_loss = L1Loss(model,0.001)
  26. total_loss = focal + l2_loss + l1_loss
  27. return total_loss
  28. model.compile(loss_func =focal_loss_with_regularization,optimizer= torch.optim.Adam(model.parameters(),lr = 0.01),metrics_dict={"accuracy":accuracy})
  29. dfhistory = model.fit(30,dl_train = dl_train,dl_val = dl_valid,log_step_freq =30)
  30. out:
  31. Start Training ...
  32. ================================================================================2022-03-20 22:24:58
  33. {'step': 30, 'loss': 0.071, 'accuracy': 0.339}
  34. +-------+-------+----------+----------+--------------+
  35. | epoch | loss | accuracy | val_loss | val_accuracy |
  36. +-------+-------+----------+----------+--------------+
  37. | 1 | 0.059 | 0.538 | 0.025 | 0.97 |
  38. +-------+-------+----------+----------+--------------+
  39. ================================================================================2022-03-20 22:24:59
  40. {'step': 30, 'loss': 0.026, 'accuracy': 0.966}
  41. +-------+-------+----------+----------+--------------+
  42. | epoch | loss | accuracy | val_loss | val_accuracy |
  43. +-------+-------+----------+----------+--------------+
  44. | 2 | 0.025 | 0.967 | 0.023 | 0.97 |
  45. +-------+-------+----------+----------+--------------+
  46. ================================================================================2022-03-20 22:25:01
  47. {'step': 30, 'loss': 0.024, 'accuracy': 0.966}
  48. +-------+-------+----------+----------+--------------+
  49. | epoch | loss | accuracy | val_loss | val_accuracy |
  50. +-------+-------+----------+----------+--------------+
  51. | 3 | 0.023 | 0.967 | 0.022 | 0.97 |
  52. +-------+-------+----------+----------+--------------+
  53. ================================================================================2022-03-20 22:25:03
  54. {'step': 30, 'loss': 0.023, 'accuracy': 0.966}
  55. +-------+-------+----------+----------+--------------+
  56. | epoch | loss | accuracy | val_loss | val_accuracy |
  57. +-------+-------+----------+----------+--------------+
  58. | 4 | 0.023 | 0.967 | 0.021 | 0.97 |
  59. +-------+-------+----------+----------+--------------+
  60. ================================================================================2022-03-20 22:25:05
  61. {'step': 30, 'loss': 0.023, 'accuracy': 0.966}
  62. +-------+-------+----------+----------+--------------+
  63. | epoch | loss | accuracy | val_loss | val_accuracy |
  64. +-------+-------+----------+----------+--------------+
  65. | 5 | 0.023 | 0.966 | 0.021 | 0.97 |
  66. +-------+-------+----------+----------+--------------+
  67. ================================================================================2022-03-20 22:25:06
  68. {'step': 30, 'loss': 0.023, 'accuracy': 0.966}
  69. +-------+-------+----------+----------+--------------+
  70. | epoch | loss | accuracy | val_loss | val_accuracy |
  71. +-------+-------+----------+----------+--------------+
  72. | 6 | 0.023 | 0.967 | 0.021 | 0.97 |
  73. +-------+-------+----------+----------+--------------+
  74. ================================================================================2022-03-20 22:25:08
  75. {'step': 30, 'loss': 0.022, 'accuracy': 0.967}
  76. +-------+-------+----------+----------+--------------+
  77. | epoch | loss | accuracy | val_loss | val_accuracy |
  78. +-------+-------+----------+----------+--------------+
  79. | 7 | 0.022 | 0.967 | 0.021 | 0.97 |
  80. +-------+-------+----------+----------+--------------+
  81. ================================================================================2022-03-20 22:25:10
  82. {'step': 30, 'loss': 0.022, 'accuracy': 0.966}
  83. +-------+-------+----------+----------+--------------+
  84. | epoch | loss | accuracy | val_loss | val_accuracy |
  85. +-------+-------+----------+----------+--------------+
  86. | 8 | 0.022 | 0.967 | 0.02 | 0.97 |
  87. +-------+-------+----------+----------+--------------+
  88. ================================================================================2022-03-20 22:25:12
  89. {'step': 30, 'loss': 0.022, 'accuracy': 0.963}
  90. +-------+-------+----------+----------+--------------+
  91. | epoch | loss | accuracy | val_loss | val_accuracy |
  92. +-------+-------+----------+----------+--------------+
  93. | 9 | 0.021 | 0.967 | 0.02 | 0.97 |
  94. +-------+-------+----------+----------+--------------+
  95. ================================================================================2022-03-20 22:25:14
  96. {'step': 30, 'loss': 0.021, 'accuracy': 0.966}
  97. +-------+-------+----------+----------+--------------+
  98. | epoch | loss | accuracy | val_loss | val_accuracy |
  99. +-------+-------+----------+----------+--------------+
  100. | 10 | 0.021 | 0.967 | 0.019 | 0.97 |
  101. +-------+-------+----------+----------+--------------+
  102. ================================================================================2022-03-20 22:25:16
  103. {'step': 30, 'loss': 0.021, 'accuracy': 0.966}
  104. +-------+------+----------+----------+--------------+
  105. | epoch | loss | accuracy | val_loss | val_accuracy |
  106. +-------+------+----------+----------+--------------+
  107. | 11 | 0.02 | 0.969 | 0.019 | 0.97 |
  108. +-------+------+----------+----------+--------------+
  109. ================================================================================2022-03-20 22:25:18
  110. {'step': 30, 'loss': 0.019, 'accuracy': 0.972}
  111. +-------+-------+----------+----------+--------------+
  112. | epoch | loss | accuracy | val_loss | val_accuracy |
  113. +-------+-------+----------+----------+--------------+
  114. | 12 | 0.019 | 0.971 | 0.018 | 0.972 |
  115. +-------+-------+----------+----------+--------------+
  116. ================================================================================2022-03-20 22:25:19
  117. {'step': 30, 'loss': 0.019, 'accuracy': 0.973}
  118. +-------+-------+----------+----------+--------------+
  119. | epoch | loss | accuracy | val_loss | val_accuracy |
  120. +-------+-------+----------+----------+--------------+
  121. | 13 | 0.019 | 0.971 | 0.018 | 0.972 |
  122. +-------+-------+----------+----------+--------------+
  123. ================================================================================2022-03-20 22:25:21
  124. {'step': 30, 'loss': 0.019, 'accuracy': 0.972}
  125. +-------+-------+----------+----------+--------------+
  126. | epoch | loss | accuracy | val_loss | val_accuracy |
  127. +-------+-------+----------+----------+--------------+
  128. | 14 | 0.019 | 0.974 | 0.017 | 0.971 |
  129. +-------+-------+----------+----------+--------------+
  130. ================================================================================2022-03-20 22:25:23
  131. {'step': 30, 'loss': 0.019, 'accuracy': 0.974}
  132. +-------+-------+----------+----------+--------------+
  133. | epoch | loss | accuracy | val_loss | val_accuracy |
  134. +-------+-------+----------+----------+--------------+
  135. | 15 | 0.018 | 0.976 | 0.018 | 0.972 |
  136. +-------+-------+----------+----------+--------------+
  137. ================================================================================2022-03-20 22:25:25
  138. {'step': 30, 'loss': 0.018, 'accuracy': 0.975}
  139. +-------+-------+----------+----------+--------------+
  140. | epoch | loss | accuracy | val_loss | val_accuracy |
  141. +-------+-------+----------+----------+--------------+
  142. | 16 | 0.018 | 0.976 | 0.017 | 0.978 |
  143. +-------+-------+----------+----------+--------------+
  144. ================================================================================2022-03-20 22:25:26
  145. {'step': 30, 'loss': 0.018, 'accuracy': 0.977}
  146. +-------+-------+----------+----------+--------------+
  147. | epoch | loss | accuracy | val_loss | val_accuracy |
  148. +-------+-------+----------+----------+--------------+
  149. | 17 | 0.018 | 0.978 | 0.017 | 0.979 |
  150. +-------+-------+----------+----------+--------------+
  151. ================================================================================2022-03-20 22:25:28
  152. {'step': 30, 'loss': 0.019, 'accuracy': 0.975}
  153. +-------+-------+----------+----------+--------------+
  154. | epoch | loss | accuracy | val_loss | val_accuracy |
  155. +-------+-------+----------+----------+--------------+
  156. | 18 | 0.018 | 0.977 | 0.017 | 0.978 |
  157. +-------+-------+----------+----------+--------------+
  158. ================================================================================2022-03-20 22:25:30
  159. {'step': 30, 'loss': 0.017, 'accuracy': 0.981}
  160. +-------+-------+----------+----------+--------------+
  161. | epoch | loss | accuracy | val_loss | val_accuracy |
  162. +-------+-------+----------+----------+--------------+
  163. | 19 | 0.018 | 0.98 | 0.017 | 0.982 |
  164. +-------+-------+----------+----------+--------------+
  165. ================================================================================2022-03-20 22:25:32
  166. {'step': 30, 'loss': 0.017, 'accuracy': 0.98}
  167. +-------+-------+----------+----------+--------------+
  168. | epoch | loss | accuracy | val_loss | val_accuracy |
  169. +-------+-------+----------+----------+--------------+
  170. | 20 | 0.018 | 0.979 | 0.018 | 0.984 |
  171. +-------+-------+----------+----------+--------------+
  172. ================================================================================2022-03-20 22:25:34
  173. {'step': 30, 'loss': 0.018, 'accuracy': 0.978}
  174. +-------+-------+----------+----------+--------------+
  175. | epoch | loss | accuracy | val_loss | val_accuracy |
  176. +-------+-------+----------+----------+--------------+
  177. | 21 | 0.018 | 0.979 | 0.017 | 0.981 |
  178. +-------+-------+----------+----------+--------------+
  179. ================================================================================2022-03-20 22:25:36
  180. {'step': 30, 'loss': 0.018, 'accuracy': 0.981}
  181. +-------+-------+----------+----------+--------------+
  182. | epoch | loss | accuracy | val_loss | val_accuracy |
  183. +-------+-------+----------+----------+--------------+
  184. | 22 | 0.018 | 0.98 | 0.016 | 0.98 |
  185. +-------+-------+----------+----------+--------------+
  186. ================================================================================2022-03-20 22:25:37
  187. {'step': 30, 'loss': 0.017, 'accuracy': 0.982}
  188. +-------+-------+----------+----------+--------------+
  189. | epoch | loss | accuracy | val_loss | val_accuracy |
  190. +-------+-------+----------+----------+--------------+
  191. | 23 | 0.017 | 0.98 | 0.016 | 0.982 |
  192. +-------+-------+----------+----------+--------------+
  193. ================================================================================2022-03-20 22:25:39
  194. {'step': 30, 'loss': 0.018, 'accuracy': 0.978}
  195. +-------+-------+----------+----------+--------------+
  196. | epoch | loss | accuracy | val_loss | val_accuracy |
  197. +-------+-------+----------+----------+--------------+
  198. | 24 | 0.017 | 0.98 | 0.016 | 0.98 |
  199. +-------+-------+----------+----------+--------------+
  200. ================================================================================2022-03-20 22:25:41
  201. {'step': 30, 'loss': 0.017, 'accuracy': 0.982}
  202. +-------+-------+----------+----------+--------------+
  203. | epoch | loss | accuracy | val_loss | val_accuracy |
  204. +-------+-------+----------+----------+--------------+
  205. | 25 | 0.017 | 0.979 | 0.017 | 0.983 |
  206. +-------+-------+----------+----------+--------------+
  207. ================================================================================2022-03-20 22:25:43
  208. {'step': 30, 'loss': 0.018, 'accuracy': 0.98}
  209. +-------+-------+----------+----------+--------------+
  210. | epoch | loss | accuracy | val_loss | val_accuracy |
  211. +-------+-------+----------+----------+--------------+
  212. | 26 | 0.017 | 0.981 | 0.016 | 0.983 |
  213. +-------+-------+----------+----------+--------------+
  214. ================================================================================2022-03-20 22:25:45
  215. {'step': 30, 'loss': 0.017, 'accuracy': 0.982}
  216. +-------+-------+----------+----------+--------------+
  217. | epoch | loss | accuracy | val_loss | val_accuracy |
  218. +-------+-------+----------+----------+--------------+
  219. | 27 | 0.017 | 0.979 | 0.016 | 0.985 |
  220. +-------+-------+----------+----------+--------------+
  221. ================================================================================2022-03-20 22:25:47
  222. {'step': 30, 'loss': 0.017, 'accuracy': 0.98}
  223. +-------+-------+----------+----------+--------------+
  224. | epoch | loss | accuracy | val_loss | val_accuracy |
  225. +-------+-------+----------+----------+--------------+
  226. | 28 | 0.017 | 0.979 | 0.016 | 0.984 |
  227. +-------+-------+----------+----------+--------------+
  228. ================================================================================2022-03-20 22:25:49
  229. {'step': 30, 'loss': 0.017, 'accuracy': 0.981}
  230. +-------+-------+----------+----------+--------------+
  231. | epoch | loss | accuracy | val_loss | val_accuracy |
  232. +-------+-------+----------+----------+--------------+
  233. | 29 | 0.017 | 0.98 | 0.016 | 0.986 |
  234. +-------+-------+----------+----------+--------------+
  235. ================================================================================2022-03-20 22:25:51
  236. {'step': 30, 'loss': 0.017, 'accuracy': 0.978}
  237. +-------+-------+----------+----------+--------------+
  238. | epoch | loss | accuracy | val_loss | val_accuracy |
  239. +-------+-------+----------+----------+--------------+
  240. | 30 | 0.018 | 0.979 | 0.016 | 0.986 |
  241. +-------+-------+----------+----------+--------------+
  242. ================================================================================2022-03-20 22:25:53
  243. Finished Training...
  1. # 结果可视化
  2. fig, (ax1,ax2) = plt.subplots(nrows=1,ncols=2,figsize = (12,5))
  3. ax1.scatter(Xp[:,0],Xp[:,1], c="r")
  4. ax1.scatter(Xn[:,0],Xn[:,1],c = "g")
  5. ax1.legend(["positive","negative"]);
  6. ax1.set_title("y_true");
  7. Xp_pred = X[torch.squeeze(model.forward(X)>=0.5)]
  8. Xn_pred = X[torch.squeeze(model.forward(X)<0.5)]
  9. ax2.scatter(Xp_pred[:,0],Xp_pred[:,1],c = "r")
  10. ax2.scatter(Xn_pred[:,0],Xn_pred[:,1],c = "g")
  11. ax2.legend(["positive","negative"]);
  12. ax2.set_title("y_pred");

4.3.4 通过优化器实现L2正则化

        如果仅仅需要使用L2正则化,那么也可以利用优化器的weight_decay参数来实现。 weight_decay参数可以设置参数在训练过程中的衰减,这和L2正则化的作用效果等价。

  1. before L2 regularization:
  2. gradient descent: w = w - lr * dloss_dw
  3. after L2 regularization:
  4. gradient descent: w = w - lr * (dloss_dw+beta*w) = (1-lr*beta)*w - lr*dloss_dw
  5. so (1-lr*beta) is the weight decay ratio.
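下面给出仅使用全局 weight_decay 的最简示意(其中模型与超参数均为演示用的假设值):

  1. # 最简示意:构造优化器时直接传入weight_decay,对全部参数统一施加权重衰减
  2. import torch
  3. from torch import nn
  4. model = nn.Linear(10,1)    # 仅作演示的假设模型
  5. optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-5)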

Pytorch的优化器支持一种称之为 Per-parameter options 的操作,即为不同的参数(组)分别指定学习率、权重衰减率等超参数,以满足更为细致的要求。

  1. weight_params = [param for name, param in model.named_parameters() if "bias" not in name]
  2. bias_params = [param for name, param in model.named_parameters() if "bias" in name]
  3. optimizer = torch.optim.SGD([{'params': weight_params, 'weight_decay':1e-5},
  4.                              {'params': bias_params, 'weight_decay':0}],
  5.                             lr=1e-2, momentum=0.9)
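设置好参数组之后,可以通过 optimizer.param_groups 检查每组参数的 weight_decay 是否符合预期,下面是一个检查方法的示意:

  1. # 查看每个参数组的超参数设置
  2. for i, group in enumerate(optimizer.param_groups):
  3.     print(i, group['weight_decay'], len(group['params']))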

5、用pytorch构建和训练模型的方法

5.1  用pytorch构建模型的三种方法

        pytorch可以使用以下3种方式构建模型:

        1,继承nn.Module基类构建自定义模型。

        2,使用nn.Sequential按层顺序构建模型。

        3,继承nn.Module基类构建模型并辅助应用模型容器进行封装                                (nn.Sequential,nn.ModuleList,nn.ModuleDict)。

其中第1种方式最为常见,第2种方式最为简单,第3种方式最为灵活但也较为复杂。

5.1.1 继承nn.Module基类构建自定义模型

        模型中用到的层一般在 __init__ 函数中定义,然后在 forward 方法中定义模型的正向传播逻辑。

  1. #例5-1-1继承nn.Module基类构建自定义模型
  2. import torch
  3. from torch import nn
  4. from torchkeras import summary
  5. class Net(nn.Module):
  6. def __init__(self):
  7. super(Net, self).__init__()
  8. self.conv1 = nn.Conv2d(in_channels=3,out_channels=32,kernel_size = 3)
  9. self.pool1 = nn.MaxPool2d(kernel_size = 2,stride = 2)
  10. self.conv2 = nn.Conv2d(in_channels=32,out_channels=64,kernel_size = 5)
  11. self.pool2 = nn.MaxPool2d(kernel_size = 2,stride = 2)
  12. self.dropout = nn.Dropout2d(p = 0.1)
  13. self.adaptive_pool = nn.AdaptiveMaxPool2d((1,1))
  14. self.flatten = nn.Flatten()
  15. self.linear1 = nn.Linear(64,32)
  16. self.relu = nn.ReLU()
  17. self.linear2 = nn.Linear(32,1)
  18. self.sigmoid = nn.Sigmoid()
  19. def forward(self,x):
  20. x = self.conv1(x)
  21. x = self.pool1(x)
  22. x = self.conv2(x)
  23. x = self.pool2(x)
  24. x = self.dropout(x)
  25. x = self.adaptive_pool(x)
  26. x = self.flatten(x)
  27. x = self.linear1(x)
  28. x = self.relu(x)
  29. x = self.linear2(x)
  30. y = self.sigmoid(x)
  31. return y
  32. net = Net()
  33. print(net)
  34. out:
  35. Net(
  36. (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
  37. (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  38. (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  39. (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  40. (dropout): Dropout2d(p=0.1, inplace=False)
  41. (adaptive_pool): AdaptiveMaxPool2d(output_size=(1, 1))
  42. (flatten): Flatten(start_dim=1, end_dim=-1)
  43. (linear1): Linear(in_features=64, out_features=32, bias=True)
  44. (relu): ReLU()
  45. (linear2): Linear(in_features=32, out_features=1, bias=True)
  46. (sigmoid): Sigmoid()
  47. )
  48. summary(net,input_shape= (3,32,32))
  49. out:
  50. ----------------------------------------------------------------
  51. Layer (type) Output Shape Param #
  52. ================================================================
  53. Conv2d-1 [-1, 32, 30, 30] 896
  54. MaxPool2d-2 [-1, 32, 15, 15] 0
  55. Conv2d-3 [-1, 64, 11, 11] 51,264
  56. MaxPool2d-4 [-1, 64, 5, 5] 0
  57. Dropout2d-5 [-1, 64, 5, 5] 0
  58. AdaptiveMaxPool2d-6 [-1, 64, 1, 1] 0
  59. Flatten-7 [-1, 64] 0
  60. Linear-8 [-1, 32] 2,080
  61. ReLU-9 [-1, 32] 0
  62. Linear-10 [-1, 1] 33
  63. Sigmoid-11 [-1, 1] 0
  64. ================================================================
  65. Total params: 54,273
  66. Trainable params: 54,273
  67. Non-trainable params: 0
  68. ----------------------------------------------------------------
  69. Input size (MB): 0.011719
  70. Forward/backward pass size (MB): 0.359634
  71. Params size (MB): 0.207035
  72. Estimated Total Size (MB): 0.578388
  73. ----------------------------------------------------------------
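模型定义好之后,可以用一批随机数据做一次前向传播,检查输出形状是否符合预期(接上面的代码,net 已定义,batch_size=4 为假设值):

  1. x = torch.randn(4,3,32,32)
  2. y = net(x)
  3. print(y.shape)    # torch.Size([4, 1])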

5.1.2 使用nn.Sequential按层顺序构建模型

使用 nn.Sequential 按层顺序构建模型时无需定义 forward 方法,但这种方式仅适合于结构较简单的模型。常见的写法有以下三种:

a) 利用add_module方法

b) 利用变长参数(这种方式构建时不能给每个层指定名称)

c) 利用OrderedDict

  1. #例5-1-2 使用nn.Sequential按层顺序构建模型
  2. #a)利用add_module方法
  3. import torch
  4. from torch import nn
  5. from torchkeras import summary
  6. net = nn.Sequential()
  7. net.add_module("conv1",nn.Conv2d(in_channels=3,out_channels=32,kernel_size =
  8. 3))
  9. net.add_module("pool1",nn.MaxPool2d(kernel_size = 2,stride = 2))
  10. net.add_module("conv2",nn.Conv2d(in_channels=32,out_channels=64,kernel_size =
  11. 5))
  12. net.add_module("pool2",nn.MaxPool2d(kernel_size = 2,stride = 2))
  13. net.add_module("dropout",nn.Dropout2d(p = 0.1))
  14. net.add_module("adaptive_pool",nn.AdaptiveMaxPool2d((1,1)))
  15. net.add_module("flatten",nn.Flatten())
  16. net.add_module("linear1",nn.Linear(64,32))
  17. net.add_module("relu",nn.ReLU())
  18. net.add_module("linear2",nn.Linear(32,1))
  19. net.add_module("sigmoid",nn.Sigmoid())
  20. print(net)
  21. out:
  22. Sequential(
  23. (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
  24. (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  25. (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  26. (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  27. (dropout): Dropout2d(p=0.1, inplace=False)
  28. (adaptive_pool): AdaptiveMaxPool2d(output_size=(1, 1))
  29. (flatten): Flatten(start_dim=1, end_dim=-1)
  30. (linear1): Linear(in_features=64, out_features=32, bias=True)
  31. (relu): ReLU()
  32. (linear2): Linear(in_features=32, out_features=1, bias=True)
  33. (sigmoid): Sigmoid()
  34. )
  35. #b)利用变长参数 这种方式构建时不能给每个层指定名称。
  36. net = nn.Sequential(
  37. nn.Conv2d(in_channels=3,out_channels=32,kernel_size = 3),
  38. nn.MaxPool2d(kernel_size = 2,stride = 2),
  39. nn.Conv2d(in_channels=32,out_channels=64,kernel_size = 5),
  40. nn.MaxPool2d(kernel_size = 2,stride = 2),
  41. nn.Dropout2d(p = 0.1),
  42. nn.AdaptiveMaxPool2d((1,1)),
  43. nn.Flatten(),
  44. nn.Linear(64,32),
  45. nn.ReLU(),
  46. nn.Linear(32,1),
  47. nn.Sigmoid()
  48. )
  49. print(net)
  50. out:
  51. Sequential(
  52. (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
  53. (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  54. (2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  55. (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  56. (4): Dropout2d(p=0.1, inplace=False)
  57. (5): AdaptiveMaxPool2d(output_size=(1, 1))
  58. (6): Flatten(start_dim=1, end_dim=-1)
  59. (7): Linear(in_features=64, out_features=32, bias=True)
  60. (8): ReLU()
  61. (9): Linear(in_features=32, out_features=1, bias=True)
  62. (10): Sigmoid()
  63. )
  64. #c) 利用OrderedDict
  65. from collections import OrderedDict
  66. net = nn.Sequential(OrderedDict(
  67. [("conv1",nn.Conv2d(in_channels=3,out_channels=32,kernel_size = 3)),
  68. ("pool1",nn.MaxPool2d(kernel_size = 2,stride = 2)),
  69. ("conv2",nn.Conv2d(in_channels=32,out_channels=64,kernel_size =
  70. 5)),
  71. ("pool2",nn.MaxPool2d(kernel_size = 2,stride = 2)),
  72. ("dropout",nn.Dropout2d(p = 0.1)),
  73. ("adaptive_pool",nn.AdaptiveMaxPool2d((1,1))),
  74. ("flatten",nn.Flatten()),
  75. ("linear1",nn.Linear(64,32)),
  76. ("relu",nn.ReLU()),
  77. ("linear2",nn.Linear(32,1)),
  78. ("sigmoid",nn.Sigmoid())
  79. ])
  80. )
  81. print(net)
  82. summary(net,input_shape= (3,32,32))
  83. out:
  84. Sequential(
  85. (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
  86. (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  87. (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  88. (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  89. (dropout): Dropout2d(p=0.1, inplace=False)
  90. (adaptive_pool): AdaptiveMaxPool2d(output_size=(1, 1))
  91. (flatten): Flatten(start_dim=1, end_dim=-1)
  92. (linear1): Linear(in_features=64, out_features=32, bias=True)
  93. (relu): ReLU()
  94. (linear2): Linear(in_features=32, out_features=1, bias=True)
  95. (sigmoid): Sigmoid()
  96. )
  97. ----------------------------------------------------------------
  98. Layer (type) Output Shape Param #
  99. ================================================================
  100. Conv2d-1 [-1, 32, 30, 30] 896
  101. MaxPool2d-2 [-1, 32, 15, 15] 0
  102. Conv2d-3 [-1, 64, 11, 11] 51,264
  103. MaxPool2d-4 [-1, 64, 5, 5] 0
  104. Dropout2d-5 [-1, 64, 5, 5] 0
  105. AdaptiveMaxPool2d-6 [-1, 64, 1, 1] 0
  106. Flatten-7 [-1, 64] 0
  107. Linear-8 [-1, 32] 2,080
  108. ReLU-9 [-1, 32] 0
  109. Linear-10 [-1, 1] 33
  110. Sigmoid-11 [-1, 1] 0
  111. ================================================================
  112. Total params: 54,273
  113. Trainable params: 54,273
  114. Non-trainable params: 0
  115. ----------------------------------------------------------------
  116. Input size (MB): 0.011719
  117. Forward/backward pass size (MB): 0.359634
  118. Params size (MB): 0.207035
  119. Estimated Total Size (MB): 0.578388
  120. ----------------------------------------------------------------

5.1.3 继承nn.Module基类构建模型并辅助应用模型容器进行封装

    当模型的结构比较复杂时,我们可以应用模型容器(nn.Sequential,nn.ModuleList,nn.ModuleDict) 对模型的部分结构进行封装。 这样做会让模型整体更加有层次感,有时候也能减少代码量。

        注意,在下面的范例中我们每次仅仅使用一种模型容器,但实际上这些模型容器的使用是非常灵活的,可以在一个模型中任意组合任意嵌套使用。
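下面先给出一个容器嵌套使用的最小示意(网络结构为随意假设,仅用于说明 nn.ModuleList 中可以嵌套 nn.Sequential):

  1. # 容器嵌套示意:在ModuleList中嵌套Sequential子块
  2. import torch
  3. from torch import nn
  4. class TinyNet(nn.Module):
  5.     def __init__(self):
  6.         super().__init__()
  7.         self.blocks = nn.ModuleList([
  8.             nn.Sequential(nn.Linear(16,32), nn.ReLU()),
  9.             nn.Sequential(nn.Linear(32,1), nn.Sigmoid())
  10.         ])
  11.     def forward(self,x):
  12.         for block in self.blocks:
  13.             x = block(x)
  14.         return x
  15. net = TinyNet()
  16. print(net(torch.randn(4,16)).shape)    # torch.Size([4, 1])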

  1. #5-1-3继承nn.Module基类构建模型并辅助应用模型容器进行封装
  2. import torch
  3. from torch import nn
  4. from torchkeras import summary
  5. #a)nn.Sequential作为模型容器
  6. class Net(nn.Module):
  7. def __init__(self):
  8. super(Net, self).__init__()
  9. self.conv = nn.Sequential(
  10. nn.Conv2d(in_channels=3,out_channels=32,kernel_size = 3),
  11. nn.MaxPool2d(kernel_size = 2,stride = 2),
  12. nn.Conv2d(in_channels=32,out_channels=64,kernel_size = 5),
  13. nn.MaxPool2d(kernel_size = 2,stride = 2),
  14. nn.Dropout2d(p = 0.1),
  15. nn.AdaptiveMaxPool2d((1,1))
  16. )
  17. self.dense = nn.Sequential(
  18. nn.Flatten(),
  19. nn.Linear(64,32),
  20. nn.ReLU(),
  21. nn.Linear(32,1),
  22. nn.Sigmoid()
  23. )
  24. def forward(self,x):
  25. x = self.conv(x)
  26. y = self.dense(x)
  27. return y
  28. net = Net()
  29. print(net)
  30. out:
  31. Net(
  32. (conv): Sequential(
  33. (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
  34. (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  35. (2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  36. (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  37. (4): Dropout2d(p=0.1, inplace=False)
  38. (5): AdaptiveMaxPool2d(output_size=(1, 1))
  39. )
  40. (dense): Sequential(
  41. (0): Flatten(start_dim=1, end_dim=-1)
  42. (1): Linear(in_features=64, out_features=32, bias=True)
  43. (2): ReLU()
  44. (3): Linear(in_features=32, out_features=1, bias=True)
  45. (4): Sigmoid()
  46. )
  47. )
  48. #b) nn.ModuleList作为模型容器. 注意下面中的ModuleList不能用Python中的列表代替。
  49. class Net(nn.Module):
  50. def __init__(self):
  51. super(Net, self).__init__()
  52. self.layers = nn.ModuleList([
  53. nn.Conv2d(in_channels=3,out_channels=32,kernel_size = 3),
  54. nn.MaxPool2d(kernel_size = 2,stride = 2),
  55. nn.Conv2d(in_channels=32,out_channels=64,kernel_size = 5),
  56. nn.MaxPool2d(kernel_size = 2,stride = 2),
  57. nn.Dropout2d(p = 0.1),
  58. nn.AdaptiveMaxPool2d((1,1)),
  59. nn.Flatten(),
  60. nn.Linear(64,32),
  61. nn.ReLU(),
  62. nn.Linear(32,1),
  63. nn.Sigmoid()]
  64. )
  65. def forward(self,x):
  66. for layer in self.layers:
  67. x = layer(x)
  68. return x
  69. net = Net()
  70. print(net)
  71. out:
  72. Net(
  73. (layers): ModuleList(
  74. (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
  75. (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  76. (2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  77. (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  78. (4): Dropout2d(p=0.1, inplace=False)
  79. (5): AdaptiveMaxPool2d(output_size=(1, 1))
  80. (6): Flatten(start_dim=1, end_dim=-1)
  81. (7): Linear(in_features=64, out_features=32, bias=True)
  82. (8): ReLU()
  83. (9): Linear(in_features=32, out_features=1, bias=True)
  84. (10): Sigmoid()
  85. )
  86. )
  87. summary(net,input_shape= (3,32,32))
  88. out:
  89. ----------------------------------------------------------------
  90. Layer (type) Output Shape Param #
  91. ================================================================
  92. Conv2d-1 [-1, 32, 30, 30] 896
  93. ================================================================
  94. Total params: 896
  95. Trainable params: 896
  96. Non-trainable params: 0
  97. ----------------------------------------------------------------
  98. Input size (MB): 0.011719
  99. Forward/backward pass size (MB): 0.219727
  100. Params size (MB): 0.003418
  101. Estimated Total Size (MB): 0.234863
  102. ----------------------------------------------------------------
  103. #c) nn.ModuleDict作为模型容器. 注意下面中的ModuleDict不能用Python中的字典代替。
  104. class Net(nn.Module):
  105. def __init__(self):
  106. super(Net, self).__init__()
  107. self.layers_dict = nn.ModuleDict({"conv1":nn.Conv2d(in_channels=3,out_channels=32,kernel_size =3),
  108. "pool": nn.MaxPool2d(kernel_size = 2,stride = 2),
  109. "conv2":nn.Conv2d(in_channels=32,out_channels=64,kernel_size =5),
  110. "dropout": nn.Dropout2d(p = 0.1),
  111. "adaptive":nn.AdaptiveMaxPool2d((1,1)),
  112. "flatten": nn.Flatten(),
  113. "linear1": nn.Linear(64,32),
  114. "relu":nn.ReLU(),
  115. "linear2": nn.Linear(32,1),
  116. "sigmoid": nn.Sigmoid()
  117. })
  118. def forward(self,x):
  119. layers = ["conv1","pool","conv2","pool","dropout","adaptive","flatten","linear1","relu","linear2","sigmoid"]
  120. for layer in layers:
  121. x = self.layers_dict[layer](x)
  122. return x
  123. net = Net()
  124. print(net)
  125. out:
  126. Net(
  127. (layers_dict): ModuleDict(
  128. (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
  129. (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  130. (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  131. (dropout): Dropout2d(p=0.1, inplace=False)
  132. (adaptive): AdaptiveMaxPool2d(output_size=(1, 1))
  133. (flatten): Flatten(start_dim=1, end_dim=-1)
  134. (linear1): Linear(in_features=64, out_features=32, bias=True)
  135. (relu): ReLU()
  136. (linear2): Linear(in_features=32, out_features=1, bias=True)
  137. (sigmoid): Sigmoid()
  138. )
  139. )
  140. summary(net,input_shape= (3,32,32))
  141. out:
  142. ----------------------------------------------------------------
  143. Layer (type) Output Shape Param #
  144. ================================================================
  145. Conv2d-1 [-1, 32, 30, 30] 896
  146. ================================================================
  147. Total params: 896
  148. Trainable params: 896
  149. Non-trainable params: 0
  150. ----------------------------------------------------------------
  151. Input size (MB): 0.011719
  152. Forward/backward pass size (MB): 0.219727
  153. Params size (MB): 0.003418
  154. Estimated Total Size (MB): 0.234863
  155. ----------------------------------------------------------------

5.2  用pytorch训练模型的三种方法:

Pytorch通常需要用户编写自定义训练循环,训练循环的代码风格因人而异。

有3类典型的训练循环代码风格:脚本形式训练循环,函数形式训练循环,类形式训练循环。

下面以 MNIST 数据集的分类模型训练为例,演示这3种训练模型的风格。

其中类形式训练循环我们会使用torchkeras.Model和torchkeras.LightModel这两种方法。

5.2.1 脚本风格

脚本风格的训练循环最为常见。

  1. #例5-2 用pytorch训练模型的三种方法:
  2. #准备数据
  3. import torch
  4. from torch import nn
  5. from torchkeras import summary
  6. import torchvision
  7. from torchvision import transforms
  8. transform = transforms.Compose([transforms.ToTensor()])
  9. ds_train = torchvision.datasets.MNIST(root="./data/minist/",train=True,download=True,transform=transform)
  10. ds_valid = torchvision.datasets.MNIST(root="./data/minist/",train=False,download=True,transform=transform)
  11. dl_train = torch.utils.data.DataLoader(ds_train, batch_size=128,shuffle=True, num_workers=4)
  12. dl_valid = torch.utils.data.DataLoader(ds_valid, batch_size=128,shuffle=False, num_workers=4)
  13. print(len(ds_train))
  14. print(len(ds_valid))
  15. out:
  16. 60000
  17. 10000
  18. %matplotlib inline
  19. %config InlineBackend.figure_format = 'svg'
  20. #查看部分样本
  21. from matplotlib import pyplot as plt
  22. plt.figure(figsize=(8,8))
  23. for i in range(9):
  24. img,label = ds_train[i]
  25. img = torch.squeeze(img)
  26. ax=plt.subplot(3,3,i+1)
  27. ax.imshow(img.numpy())
  28. ax.set_title("label = %d"%label)
  29. ax.set_xticks([])
  30. ax.set_yticks([])
  31. plt.show()
  32. #例5-2-1 脚本风格
  33. net = nn.Sequential()
  34. net.add_module("conv1",nn.Conv2d(in_channels=1,out_channels=32,kernel_size =
  35. 3))
  36. net.add_module("pool1",nn.MaxPool2d(kernel_size = 2,stride = 2))
  37. net.add_module("conv2",nn.Conv2d(in_channels=32,out_channels=64,kernel_size =
  38. 5))
  39. net.add_module("pool2",nn.MaxPool2d(kernel_size = 2,stride = 2))
  40. net.add_module("dropout",nn.Dropout2d(p = 0.1))
  41. net.add_module("adaptive_pool",nn.AdaptiveMaxPool2d((1,1)))
  42. net.add_module("flatten",nn.Flatten())
  43. net.add_module("linear1",nn.Linear(64,32))
  44. net.add_module("relu",nn.ReLU())
  45. net.add_module("linear2",nn.Linear(32,10))
  46. print(net)
  47. out:
  48. Sequential(
  49. (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  50. (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  51. (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  52. (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  53. (dropout): Dropout2d(p=0.1, inplace=False)
  54. (adaptive_pool): AdaptiveMaxPool2d(output_size=(1, 1))
  55. (flatten): Flatten(start_dim=1, end_dim=-1)
  56. (linear1): Linear(in_features=64, out_features=32, bias=True)
  57. (relu): ReLU()
  58. (linear2): Linear(in_features=32, out_features=10, bias=True)
  59. )
  60. summary(net,input_shape=(1,32,32))
  61. out:
  62. ----------------------------------------------------------------
  63. Layer (type) Output Shape Param #
  64. ================================================================
  65. Conv2d-1 [-1, 32, 30, 30] 320
  66. MaxPool2d-2 [-1, 32, 15, 15] 0
  67. Conv2d-3 [-1, 64, 11, 11] 51,264
  68. MaxPool2d-4 [-1, 64, 5, 5] 0
  69. Dropout2d-5 [-1, 64, 5, 5] 0
  70. AdaptiveMaxPool2d-6 [-1, 64, 1, 1] 0
  71. Flatten-7 [-1, 64] 0
  72. Linear-8 [-1, 32] 2,080
  73. ReLU-9 [-1, 32] 0
  74. Linear-10 [-1, 10] 330
  75. ================================================================
  76. Total params: 53,994
  77. Trainable params: 53,994
  78. Non-trainable params: 0
  79. ----------------------------------------------------------------
  80. Input size (MB): 0.003906
  81. Forward/backward pass size (MB): 0.359695
  82. Params size (MB): 0.205971
  83. Estimated Total Size (MB): 0.569572
  84. ----------------------------------------------------------------
  85. import datetime
  86. import numpy as np
  87. import pandas as pd
  88. from sklearn.metrics import accuracy_score
  89. def accuracy(y_pred,y_true):
  90. y_pred_cls = torch.argmax(nn.Softmax(dim=1)(y_pred),dim=1).data
  91. return accuracy_score(y_true,y_pred_cls)
  92. loss_func = nn.CrossEntropyLoss()
  93. optimizer = torch.optim.Adam(params=net.parameters(),lr = 0.01)
  94. metric_func = accuracy
  95. metric_name = "accuracy"
  96. epochs = 3
  97. log_step_freq = 100
  98. dfhistory = pd.DataFrame(columns = ["epoch","loss",metric_name,"val_loss","val_"+metric_name])
  99. print("Start Training...")
  100. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  101. print("=========="*8 + "%s"%nowtime)
  102. for epoch in range(1,epochs+1):
  103. # 1,训练循环-------------------------------------------------
  104. net.train()
  105. loss_sum = 0.0
  106. metric_sum = 0.0
  107. step = 1
  108. for step, (features,labels) in enumerate(dl_train, 1):
  109. # 梯度清零
  110. optimizer.zero_grad()
  111. # 正向传播求损失
  112. predictions = net(features)
  113. loss = loss_func(predictions,labels)
  114. metric = metric_func(predictions,labels)
  115. # 反向传播求梯度
  116. loss.backward()
  117. optimizer.step()
  118. # 打印batch级别日志
  119. loss_sum += loss.item()
  120. metric_sum += metric.item()
  121. if step%log_step_freq == 0:
  122. print(("[step = %d] loss: %.3f, "+metric_name+": %.3f") % (step, loss_sum/step, metric_sum/step))
  123. # 2,验证循环-------------------------------------------------
  124. net.eval()
  125. val_loss_sum = 0.0
  126. val_metric_sum = 0.0
  127. val_step = 1
  128. for val_step, (features,labels) in enumerate(dl_valid, 1):
  129. with torch.no_grad():
  130. predictions = net(features)
  131. val_loss = loss_func(predictions,labels)
  132. val_metric = metric_func(predictions,labels)
  133. val_loss_sum += val_loss.item()
  134. val_metric_sum += val_metric.item()
  135. # 3,记录日志-------------------------------------------------
  136. info = (epoch, loss_sum/step, metric_sum/step,val_loss_sum/val_step, val_metric_sum/val_step)
  137. dfhistory.loc[epoch-1] = info
  138. # 打印epoch级别日志
  139. print(("\nEPOCH = %d, loss = %.3f,"+ metric_name + " = %.3f, val_loss = %.3f, "+"val_"+ metric_name+" = %.3f") %info)
  140. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  141. print("\n"+"=========="*8 + "%s"%nowtime)
  142. print('Finished Training...')
  143. out:
  144. Start Training...
  145. ================================================================================2022-03-21 14:28:46
  146. [step = 100] loss: 0.735, accuracy: 0.750
  147. [step = 200] loss: 0.459, accuracy: 0.847
  148. [step = 300] loss: 0.351, accuracy: 0.885
  149. [step = 400] loss: 0.294, accuracy: 0.904
  150. EPOCH = 1, loss = 0.270,accuracy = 0.913, val_loss = 0.093, val_accuracy = 0.974
  151. ================================================================================2022-03-21 14:29:12
  152. [step = 100] loss: 0.103, accuracy: 0.968
  153. [step = 200] loss: 0.108, accuracy: 0.967
  154. [step = 300] loss: 0.107, accuracy: 0.967
  155. [step = 400] loss: 0.105, accuracy: 0.968
  156. EPOCH = 2, loss = 0.103,accuracy = 0.968, val_loss = 0.059, val_accuracy = 0.981
  157. ================================================================================2022-03-21 14:29:40
  158. [step = 100] loss: 0.083, accuracy: 0.973
  159. [step = 200] loss: 0.092, accuracy: 0.972
  160. [step = 300] loss: 0.092, accuracy: 0.972
  161. [step = 400] loss: 0.092, accuracy: 0.972
  162. EPOCH = 3, loss = 0.091,accuracy = 0.973, val_loss = 0.078, val_accuracy = 0.977
  163. ================================================================================2022-03-21 14:30:08
  164. Finished Training...

5.2.2 函数风格

函数风格就是在脚本风格的形式上作了简单的函数封装。

  1. #例5-2 用pytorch训练模型的三种方法:
  2. #准备数据
  3. import torch
  4. from torch import nn
  5. from torchkeras import summary
  6. import torchvision
  7. from torchvision import transforms
  8. transform = transforms.Compose([transforms.ToTensor()])
  9. ds_train = torchvision.datasets.MNIST(root="./data/minist/",train=True,download=True,transform=transform)
  10. ds_valid = torchvision.datasets.MNIST(root="./data/minist/",train=False,download=True,transform=transform)
  11. dl_train = torch.utils.data.DataLoader(ds_train, batch_size=128,shuffle=True, num_workers=4)
  12. dl_valid = torch.utils.data.DataLoader(ds_valid, batch_size=128,shuffle=False, num_workers=4)
  13. print(len(ds_train))
  14. print(len(ds_valid))
  15. out:
  16. 60000
  17. 10000
  18. %matplotlib inline
  19. %config InlineBackend.figure_format = 'svg'
  20. #查看部分样本
  21. from matplotlib import pyplot as plt
  22. plt.figure(figsize=(8,8))
  23. for i in range(9):
  24. img,label = ds_train[i]
  25. img = torch.squeeze(img)
  26. ax=plt.subplot(3,3,i+1)
  27. ax.imshow(img.numpy())
  28. ax.set_title("label = %d"%label)
  29. ax.set_xticks([])
  30. ax.set_yticks([])
  31. plt.show()
  32. #例5-2-2 函数风格
  33. class Net(nn.Module):
  34. def __init__(self):
  35. super(Net, self).__init__()
  36. self.layers = nn.ModuleList([
  37. nn.Conv2d(in_channels=1,out_channels=32,kernel_size = 3),
  38. nn.MaxPool2d(kernel_size = 2,stride = 2),
  39. nn.Conv2d(in_channels=32,out_channels=64,kernel_size = 5),
  40. nn.MaxPool2d(kernel_size = 2,stride = 2),
  41. nn.Dropout2d(p = 0.1),
  42. nn.AdaptiveMaxPool2d((1,1)),
  43. nn.Flatten(),
  44. nn.Linear(64,32),
  45. nn.ReLU(),
  46. nn.Linear(32,10)]
  47. )
  48. def forward(self,x):
  49. for layer in self.layers:
  50. x = layer(x)
  51. return x
  52. net = Net()
  53. print(net)
  54. out:
  55. Net(
  56. (layers): ModuleList(
  57. (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  58. (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  59. (2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  60. (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  61. (4): Dropout2d(p=0.1, inplace=False)
  62. (5): AdaptiveMaxPool2d(output_size=(1, 1))
  63. (6): Flatten(start_dim=1, end_dim=-1)
  64. (7): Linear(in_features=64, out_features=32, bias=True)
  65. (8): ReLU()
  66. (9): Linear(in_features=32, out_features=10, bias=True)
  67. )
  68. )
  69. summary(net,input_shape=(1,32,32))
  70. out:
  71. ----------------------------------------------------------------
  72. Layer (type) Output Shape Param #
  73. ================================================================
  74. Conv2d-1 [-1, 32, 30, 30] 320
  75. MaxPool2d-2 [-1, 32, 15, 15] 0
  76. Conv2d-3 [-1, 64, 11, 11] 51,264
  77. MaxPool2d-4 [-1, 64, 5, 5] 0
  78. Dropout2d-5 [-1, 64, 5, 5] 0
  79. AdaptiveMaxPool2d-6 [-1, 64, 1, 1] 0
  80. Flatten-7 [-1, 64] 0
  81. Linear-8 [-1, 32] 2,080
  82. ReLU-9 [-1, 32] 0
  83. Linear-10 [-1, 10] 330
  84. ================================================================
  85. Total params: 53,994
  86. Trainable params: 53,994
  87. Non-trainable params: 0
  88. ----------------------------------------------------------------
  89. Input size (MB): 0.003906
  90. Forward/backward pass size (MB): 0.359695
  91. Params size (MB): 0.205971
  92. Estimated Total Size (MB): 0.569572
  93. ----------------------------------------------------------------
  94. import datetime
  95. import numpy as np
  96. import pandas as pd
  97. from sklearn.metrics import accuracy_score
  98. def accuracy(y_pred,y_true):
  99. y_pred_cls = torch.argmax(nn.Softmax(dim=1)(y_pred),dim=1).data
  100. return accuracy_score(y_true,y_pred_cls)
  101. model = net
  102. model.optimizer = torch.optim.SGD(model.parameters(),lr = 0.01)
  103. model.loss_func = nn.CrossEntropyLoss()
  104. model.metric_func = accuracy
  105. model.metric_name = "accuracy"
  106. def train_step(model,features,labels):
  107. # 训练模式,dropout层发生作用
  108. model.train()
  109. # 梯度清零
  110. model.optimizer.zero_grad()
  111. # 正向传播求损失
  112. predictions = model(features)
  113. loss = model.loss_func(predictions,labels)
  114. metric = model.metric_func(predictions,labels)
  115. # 反向传播求梯度
  116. loss.backward()
  117. model.optimizer.step()
  118. return loss.item(),metric.item()
  119. @torch.no_grad()
  120. def valid_step(model,features,labels):
  121. # 预测模式,dropout层不发生作用
  122. model.eval()
  123. predictions = model(features)
  124. loss = model.loss_func(predictions,labels)
  125. metric = model.metric_func(predictions,labels)
  126. return loss.item(), metric.item()
  127. # 测试train_step效果
  128. features,labels = next(iter(dl_train))
  129. train_step(model,features,labels)
  130. out:
  131. (2.3077056407928467, 0.125)
  132. def train_model(model,epochs,dl_train,dl_valid,log_step_freq):
  133. metric_name = model.metric_name
  134. dfhistory = pd.DataFrame(columns = ["epoch","loss",metric_name,"val_loss","val_"+metric_name])
  135. print("Start Training...")
  136. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  137. print("=========="*8 + "%s"%nowtime)
  138. for epoch in range(1,epochs+1):
  139. # 1,训练循环-------------------------------------------------
  140. loss_sum = 0.0
  141. metric_sum = 0.0
  142. step = 1
  143. for step, (features,labels) in enumerate(dl_train, 1):
  144. loss,metric = train_step(model,features,labels)
  145. # 打印batch级别日志
  146. loss_sum += loss
  147. metric_sum += metric
  148. if step%log_step_freq == 0:
  149. print(("[step = %d] loss: %.3f, "+metric_name+": %.3f") % (step, loss_sum/step, metric_sum/step))
  150. # 2,验证循环-------------------------------------------------
  151. val_loss_sum = 0.0
  152. val_metric_sum = 0.0
  153. val_step = 1
  154. for val_step, (features,labels) in enumerate(dl_valid, 1):
  155. val_loss,val_metric = valid_step(model,features,labels)
  156. val_loss_sum += val_loss
  157. val_metric_sum += val_metric
  158. # 3,记录日志-------------------------------------------------
  159. info = (epoch, loss_sum/step, metric_sum/step,val_loss_sum/val_step, val_metric_sum/val_step)
  160. dfhistory.loc[epoch-1] = info
  161. # 打印epoch级别日志
  162. print(("\nEPOCH = %d, loss = %.3f,"+ metric_name + \
  163. " = %.3f, val_loss = %.3f, "+"val_"+ metric_name+" = %.3f")
  164. %info)
  165. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  166. print("\n"+"=========="*8 + "%s"%nowtime)
  167. print('Finished Training...')
  168. return dfhistory
  169. epochs = 3
  170. dfhistory = train_model(model,epochs,dl_train,dl_valid,log_step_freq = 100)
  171. out:
  172. Start Training...
  173. ================================================================================2022-03-21 14:43:36
  174. [step = 100] loss: 2.301, accuracy: 0.111
  175. [step = 200] loss: 2.291, accuracy: 0.132
  176. [step = 300] loss: 2.282, accuracy: 0.167
  177. [step = 400] loss: 2.270, accuracy: 0.208
  178. EPOCH = 1, loss = 2.260,accuracy = 0.231, val_loss = 2.178, val_accuracy = 0.432
  179. ================================================================================2022-03-21 14:44:01
  180. [step = 100] loss: 2.157, accuracy: 0.404
  181. [step = 200] loss: 2.114, accuracy: 0.423
  182. [step = 300] loss: 2.058, accuracy: 0.437
  183. [step = 400] loss: 1.990, accuracy: 0.461
  184. EPOCH = 2, loss = 1.936,accuracy = 0.478, val_loss = 1.505, val_accuracy = 0.745
  185. ================================================================================2022-03-21 14:44:29
  186. [step = 100] loss: 1.473, accuracy: 0.613
  187. [step = 200] loss: 1.374, accuracy: 0.637
  188. [step = 300] loss: 1.277, accuracy: 0.661
  189. [step = 400] loss: 1.187, accuracy: 0.684
  190. EPOCH = 3, loss = 1.132,accuracy = 0.698, val_loss = 0.629, val_accuracy = 0.876
  191. ================================================================================2022-03-21 14:44:58
  192. Finished Training...

5.2.3 类风格

类风格有两种:torchkeras.Model 和 torchkeras.LightModel

  1. #例5-2 用pytorch训练模型的三种方法:
  2. #准备数据
  3. import torch
  4. from torch import nn
  5. from torchkeras import summary
  6. import torchvision
  7. from torchvision import transforms
  8. transform = transforms.Compose([transforms.ToTensor()])
  9. ds_train = torchvision.datasets.MNIST(root="./data/minist/",train=True,download=True,transform=transform)
  10. ds_valid = torchvision.datasets.MNIST(root="./data/minist/",train=False,download=True,transform=transform)
  11. dl_train = torch.utils.data.DataLoader(ds_train, batch_size=128,shuffle=True, num_workers=4)
  12. dl_valid = torch.utils.data.DataLoader(ds_valid, batch_size=128,shuffle=False, num_workers=4)
  13. print(len(ds_train))
  14. print(len(ds_valid))
  15. out:
  16. 60000
  17. 10000
  18. %matplotlib inline
  19. %config InlineBackend.figure_format = 'svg'
  20. #查看部分样本
  21. from matplotlib import pyplot as plt
  22. plt.figure(figsize=(8,8))
  23. for i in range(9):
  24. img,label = ds_train[i]
  25. img = torch.squeeze(img)
  26. ax=plt.subplot(3,3,i+1)
  27. ax.imshow(img.numpy())
  28. ax.set_title("label = %d"%label)
  29. ax.set_xticks([])
  30. ax.set_yticks([])
  31. plt.show()
  32. #例5-2-3-a 类风格--torchkeras.Model
  33. import torchkeras
  34. class CnnModel(nn.Module):
  35. def __init__(self):
  36. super().__init__()
  37. self.layers = nn.ModuleList([
  38. nn.Conv2d(in_channels=1,out_channels=32,kernel_size = 3),
  39. nn.MaxPool2d(kernel_size = 2,stride = 2),
  40. nn.Conv2d(in_channels=32,out_channels=64,kernel_size = 5),
  41. nn.MaxPool2d(kernel_size = 2,stride = 2),
  42. nn.Dropout2d(p = 0.1),
  43. nn.AdaptiveMaxPool2d((1,1)),
  44. nn.Flatten(),
  45. nn.Linear(64,32),
  46. nn.ReLU(),
  47. nn.Linear(32,10)]
  48. )
  49. def forward(self,x):
  50. for layer in self.layers:
  51. x = layer(x)
  52. return x
  53. model = torchkeras.Model(CnnModel())
  54. print(model)
  55. out:
  56. Model(
  57. (net): CnnModel(
  58. (layers): ModuleList(
  59. (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  60. (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  61. (2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  62. (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  63. (4): Dropout2d(p=0.1, inplace=False)
  64. (5): AdaptiveMaxPool2d(output_size=(1, 1))
  65. (6): Flatten(start_dim=1, end_dim=-1)
  66. (7): Linear(in_features=64, out_features=32, bias=True)
  67. (8): ReLU()
  68. (9): Linear(in_features=32, out_features=10, bias=True)
  69. )
  70. )
  71. )
  72. model.summary(input_shape=(1,32,32))
  73. out:
  74. ----------------------------------------------------------------
  75. Layer (type) Output Shape Param #
  76. ================================================================
  77. Conv2d-1 [-1, 32, 30, 30] 320
  78. MaxPool2d-2 [-1, 32, 15, 15] 0
  79. Conv2d-3 [-1, 64, 11, 11] 51,264
  80. MaxPool2d-4 [-1, 64, 5, 5] 0
  81. Dropout2d-5 [-1, 64, 5, 5] 0
  82. AdaptiveMaxPool2d-6 [-1, 64, 1, 1] 0
  83. Flatten-7 [-1, 64] 0
  84. Linear-8 [-1, 32] 2,080
  85. ReLU-9 [-1, 32] 0
  86. Linear-10 [-1, 10] 330
  87. ================================================================
  88. Total params: 53,994
  89. Trainable params: 53,994
  90. Non-trainable params: 0
  91. ----------------------------------------------------------------
  92. Input size (MB): 0.003906
  93. Forward/backward pass size (MB): 0.359695
  94. Params size (MB): 0.205971
  95. Estimated Total Size (MB): 0.569572
  96. ----------------------------------------------------------------
  97. from sklearn.metrics import accuracy_score
  98. def accuracy(y_pred,y_true):
  99. y_pred_cls = torch.argmax(nn.Softmax(dim=1)(y_pred),dim=1).data
  100. return accuracy_score(y_true.numpy(),y_pred_cls.numpy())
  101. model.compile(loss_func = nn.CrossEntropyLoss(),optimizer= torch.optim.Adam(model.parameters(),lr = 0.02),metrics_dict={"accuracy":accuracy})
  102. dfhistory = model.fit(3,dl_train = dl_train, dl_val=dl_valid,log_step_freq=100)
  103. out:
  104. Start Training ...
  105. ================================================================================2022-03-21 14:56:50
  106. {'step': 100, 'loss': 0.734, 'accuracy': 0.753}
  107. {'step': 200, 'loss': 0.482, 'accuracy': 0.84}
  108. {'step': 300, 'loss': 0.384, 'accuracy': 0.875}
  109. {'step': 400, 'loss': 0.332, 'accuracy': 0.893}
  110. +-------+-------+----------+----------+--------------+
  111. | epoch | loss | accuracy | val_loss | val_accuracy |
  112. +-------+-------+----------+----------+--------------+
  113. | 1 | 0.305 | 0.903 | 0.1 | 0.971 |
  114. +-------+-------+----------+----------+--------------+
  115. ================================================================================2022-03-21 14:57:16
  116. {'step': 100, 'loss': 0.135, 'accuracy': 0.961}
  117. {'step': 200, 'loss': 0.187, 'accuracy': 0.948}
  118. {'step': 300, 'loss': 0.188, 'accuracy': 0.948}
  119. {'step': 400, 'loss': 0.175, 'accuracy': 0.952}
  120. +-------+-------+----------+----------+--------------+
  121. | epoch | loss | accuracy | val_loss | val_accuracy |
  122. +-------+-------+----------+----------+--------------+
  123. | 2 | 0.165 | 0.955 | 0.086 | 0.976 |
  124. +-------+-------+----------+----------+--------------+
  125. ================================================================================2022-03-21 14:57:44
  126. {'step': 100, 'loss': 0.109, 'accuracy': 0.969}
  127. {'step': 200, 'loss': 0.127, 'accuracy': 0.967}
  128. {'step': 300, 'loss': 0.155, 'accuracy': 0.961}
  129. {'step': 400, 'loss': 0.181, 'accuracy': 0.956}
  130. +-------+-------+----------+----------+--------------+
  131. | epoch | loss | accuracy | val_loss | val_accuracy |
  132. +-------+-------+----------+----------+--------------+
  133. | 3 | 0.179 | 0.956 | 0.122 | 0.971 |
  134. +-------+-------+----------+----------+--------------+
  135. ================================================================================2022-03-21 14:58:13
  136. Finished Training...
  1. #例 5-2用pytorch训练模型的三种方法:
  2. import torch
  3. #准备数据
  4. from torch import nn
  5. from torchkeras import summary
  6. import torchvision
  7. from torchvision import transforms
  8. transform = transforms.Compose([transforms.ToTensor()])
  9. ds_train = torchvision.datasets.MNIST(root="./data/minist/",train=True,download=True,transform=transform)
  10. ds_valid = torchvision.datasets.MNIST(root="./data/minist/",train=False,download=True,transform=transform)
  11. dl_train = torch.utils.data.DataLoader(ds_train, batch_size=128,shuffle=True, num_workers=4)
  12. dl_valid = torch.utils.data.DataLoader(ds_valid, batch_size=128,shuffle=False, num_workers=4)
  13. print(len(ds_train))
  14. print(len(ds_valid))
  15. out:
  16. 60000
  17. 10000
  18. %matplotlib inline
  19. %config InlineBackend.figure_format = 'svg'
  20. #查看部分样本
  21. from matplotlib import pyplot as plt
  22. plt.figure(figsize=(8,8))
  23. for i in range(9):
  24. img,label = ds_train[i]
  25. img = torch.squeeze(img)
  26. ax=plt.subplot(3,3,i+1)
  27. ax.imshow(img.numpy())
  28. ax.set_title("label = %d"%label)
  29. ax.set_xticks([])
  30. ax.set_yticks([])
  31. plt.show()
  32. #例5-2-3-b 类风格--torchkeras.LightModel
  33. import torchkeras
  34. import torchmetrics
  35. import pytorch_lightning as pl
  36. class CnnNet(nn.Module):
  37. def __init__(self):
  38. super().__init__()
  39. self.layers = nn.ModuleList([
  40. nn.Conv2d(in_channels=1,out_channels=32,kernel_size = 3),
  41. nn.MaxPool2d(kernel_size = 2,stride = 2),
  42. nn.Conv2d(in_channels=32,out_channels=64,kernel_size = 5),
  43. nn.MaxPool2d(kernel_size = 2,stride = 2),
  44. nn.Dropout2d(p = 0.1),
  45. nn.AdaptiveMaxPool2d((1,1)),
  46. nn.Flatten(),
  47. nn.Linear(64,32),
  48. nn.ReLU(),
  49. nn.Linear(32,10)]
  50. )
  51. def forward(self,x):
  52. for layer in self.layers:
  53. x = layer(x)
  54. return x
  55. class Model(torchkeras.LightModel):
  56. def shared_step(self,batch)->dict:
  57. self.train_acc = torchmetrics.Accuracy()
  58. x, y = batch
  59. prediction = self(x)
  60. loss = nn.CrossEntropyLoss()(prediction,y)
  61. preds = torch.argmax(nn.Softmax(dim=1)(prediction),dim=1).data
  62. acc=self.train_acc(preds,y)
  63. self.log('train_acc',self.train_acc,metric_attribute='train_acc',on_step=True,on_epoch=False)
  64. dic = {"loss":loss,"acc":acc}
  65. return dic
  66. def configure_optimizers(self):
  67. optimizer = torch.optim.Adam(self.parameters(), lr=1e-2)
  68. lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,step_size=10, gamma=0.0001)
  69. return {"optimizer":optimizer,"lr_scheduler":lr_scheduler}
  70. pl.seed_everything(6666)
  71. net = CnnNet()
  72. model = Model(net)
  73. torchkeras.summary(model,input_shape=(1,32,32))
  74. print(model)
  75. out:
  76. Global seed set to 6666
  77. ----------------------------------------------------------------
  78. Layer (type) Output Shape Param #
  79. ================================================================
  80. Conv2d-1 [-1, 32, 30, 30] 320
  81. MaxPool2d-2 [-1, 32, 15, 15] 0
  82. Conv2d-3 [-1, 64, 11, 11] 51,264
  83. MaxPool2d-4 [-1, 64, 5, 5] 0
  84. Dropout2d-5 [-1, 64, 5, 5] 0
  85. AdaptiveMaxPool2d-6 [-1, 64, 1, 1] 0
  86. Flatten-7 [-1, 64] 0
  87. Linear-8 [-1, 32] 2,080
  88. ReLU-9 [-1, 32] 0
  89. Linear-10 [-1, 10] 330
  90. ================================================================
  91. Total params: 53,994
  92. Trainable params: 53,994
  93. Non-trainable params: 0
  94. ----------------------------------------------------------------
  95. Input size (MB): 0.003906
  96. Forward/backward pass size (MB): 0.359695
  97. Params size (MB): 0.205971
  98. Estimated Total Size (MB): 0.569572
  99. ----------------------------------------------------------------
  100. Model(
  101. (net): CnnNet(
  102. (layers): ModuleList(
  103. (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  104. (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  105. (2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  106. (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  107. (4): Dropout2d(p=0.1, inplace=False)
  108. (5): AdaptiveMaxPool2d(output_size=(1, 1))
  109. (6): Flatten(start_dim=1, end_dim=-1)
  110. (7): Linear(in_features=64, out_features=32, bias=True)
  111. (8): ReLU()
  112. (9): Linear(in_features=32, out_features=10, bias=True)
  113. )
  114. )
  115. )
  116. ckpt_cb = pl.callbacks.ModelCheckpoint(monitor='val_loss')
  117. # set gpus=0 will use cpu,
  118. # set gpus=1 will use 1 gpu
  119. # set gpus=2 will use 2gpus
  120. # set gpus = -1 will use all gpus
  121. # you can also set gpus = [0,1] to use the given gpus
  122. # you can even set tpu_cores=2 to use two tpus
  123. trainer = pl.Trainer(max_epochs=10,gpus = 0, callbacks=[ckpt_cb])
  124. trainer.fit(model,dl_train,dl_valid)
  125. out:
  126. GPU available: True, used: False
  127. TPU available: False, using: 0 TPU cores
  128. IPU available: False, using: 0 IPUs
  129. | Name | Type | Params
  130. --------------------------------
  131. 0 | net | CnnNet | 54.0 K
  132. --------------------------------
  133. 54.0 K Trainable params
  134. 0 Non-trainable params
  135. 54.0 K Total params
  136. 0.216 Total estimated model params size (MB)
  137. #以下训练过程略

6、Pytorch的建模流程范例

使用Pytorch实现神经网络模型的一般流程包括:

a),准备数据

b),定义模型

c),训练模型

d),评估模型

e),使用模型

f),保存模型。

        对新手来说,其中最困难的部分实际上是准备数据过程。

        我们在实践中通常会遇到的数据类型包括结构化数据,图片数据,文本数据,时间序列数据。 我们将分别以titanic生存预测问题,cifar2图片分类问题,imdb电影评论分类问题,国内新冠疫情结束时间预测问题为例,演示应用Pytorch对这四类数据的建模方法。

6.1 结构化数据建模流程范例

这里我们以Titanic数据集为例,先准备好一个打印时间的函数

  1. #例6-1  结构化数据建模流程范例
  2. import os
  3. import datetime
  4. #打印时间
  5. def printbar():
  6. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  7. print("\n"+"=========="*8 + "%s"%nowtime)

a)准备数据

  1. # a) 第一步:准备数据
  2. #titanic数据集的目标是根据乘客信息预测他们在Titanic号撞击冰山沉没后能否生存。
  3. #结构化数据一般会使用Pandas中的DataFrame进行预处理。
  4. import numpy as np
  5. import pandas as pd
  6. import matplotlib.pyplot as plt
  7. import torch
  8. from torch import nn
  9. from torch.utils.data import Dataset,DataLoader,TensorDataset
  10. dftrain_raw = pd.read_csv('./data/titanic/train.csv')
  11. dftest_raw = pd.read_csv('./data/titanic/test.csv')
  12. dftrain_raw.head(10)

输出数据集的前10行看看

|   | survived | sex    | age  | n_sib_sp | parch | fare    | class  | deck    | embark_town | alone |
|---|----------|--------|------|----------|-------|---------|--------|---------|-------------|-------|
| 0 | 0        | male   | 22.0 | 1        | 0     | 7.2500  | Third  | unknown | Southampton | n     |
| 1 | 1        | female | 38.0 | 1        | 0     | 71.2833 | First  | C       | Cherbourg   | n     |
| 2 | 1        | female | 26.0 | 0        | 0     | 7.9250  | Third  | unknown | Southampton | y     |
| 3 | 1        | female | 35.0 | 1        | 0     | 53.1000 | First  | C       | Southampton | n     |
| 4 | 0        | male   | 28.0 | 0        | 0     | 8.4583  | Third  | unknown | Queenstown  | y     |
| 5 | 0        | male   | 2.0  | 3        | 1     | 21.0750 | Third  | unknown | Southampton | n     |
| 6 | 1        | female | 27.0 | 0        | 2     | 11.1333 | Third  | unknown | Southampton | n     |
| 7 | 1        | female | 14.0 | 1        | 0     | 30.0708 | Second | unknown | Cherbourg   | n     |
| 8 | 1        | female | 4.0  | 1        | 1     | 16.7000 | Third  | G       | Southampton | n     |
| 9 | 0        | male   | 20.0 | 0        | 0     | 8.0500  | Third  | unknown | Southampton | y     |

先看一下标签 survived 的分布情况:

  1. ax = dftrain_raw['survived'].value_counts().plot(kind = 'bar',figsize = (12,8),fontsize=15,rot = 0)
  2. ax.set_ylabel('Counts',fontsize = 15)
  3. ax.set_xlabel('survived',fontsize = 15)
  4. plt.show()

  1. #年龄分布情况
  2. ax = dftrain_raw['age'].plot(kind = 'hist',bins = 20,color= 'purple',figsize = (12,8),fontsize=15)
  3. ax.set_ylabel('Frequency',fontsize = 15)
  4. ax.set_xlabel('age',fontsize = 15)
  5. plt.show()

 

  1. #年龄和label的相关性
  2. ax = dftrain_raw.query('survived == 0')['age'].plot(kind = 'density',
  3. figsize = (12,8),fontsize=15)
  4. dftrain_raw.query('survived == 1')['age'].plot(kind = 'density',
  5. figsize = (12,8),fontsize=15)
  6. ax.legend(['survived==0','survived==1'],fontsize = 12)
  7. ax.set_ylabel('Density',fontsize = 15)
  8. ax.set_xlabel('age',fontsize = 15)
  9. plt.show()

  1. def preprocessing(dfdata):
  2. dfresult= pd.DataFrame()
  3. #Pclass
  4. dfPclass = pd.get_dummies(dfdata['class'])
  5. dfPclass.columns = ['class_' +str(x) for x in dfPclass.columns ]
  6. dfresult = pd.concat([dfresult,dfPclass],axis = 1)
  7. #Sex
  8. dfSex = pd.get_dummies(dfdata['sex'])
  9. dfresult = pd.concat([dfresult,dfSex],axis = 1)
  10. #Age
  11. dfresult['age'] = dfdata['age'].fillna(0)
  12. dfresult['Age_null'] = pd.isna(dfdata['age']).astype('int32')
  13. #SibSp,Parch,Fare
  14. dfresult['SibSp'] = dfdata['n_siblings_spouses']
  15. dfresult['Parch'] = dfdata['parch']
  16. dfresult['Fare'] = dfdata['fare']
  17. #Cabin
  18. dfresult['Cabin_null'] = pd.isna(dfdata['deck']).astype('int32')
  19. #Embarked
  20. dfEmbarked = pd.get_dummies(dfdata['embark_town'],dummy_na=True)
  21. dfEmbarked.columns = ['Embarked_' + str(x) for x in dfEmbarked.columns]
  22. dfresult = pd.concat([dfresult,dfEmbarked],axis = 1)
  23. return(dfresult)
  24. x_train = preprocessing(dftrain_raw).values
  25. y_train = dftrain_raw[['survived']].values
  26. x_test = preprocessing(dftest_raw).values
  27. y_test = dftest_raw[['survived']].values
  28. print("x_train.shape =", x_train.shape )
  29. print("x_test.shape =", x_test.shape)
  30. print("y_train.shape =", y_train.shape )
  31. print("y_test.shape =", y_test.shape )
  32. dl_train = DataLoader(TensorDataset(torch.tensor(x_train).float(),torch.tensor(y_train).float()), shuffle = True, batch_size = 8)
  33. dl_valid = DataLoader(TensorDataset(torch.tensor(x_test).float(),torch.tensor(y_test).float()), shuffle = False, batch_size = 8)
  34. out:
  35. x_train.shape = (627, 16)
  36. x_test.shape = (264, 16)
  37. y_train.shape = (627, 1)
  38. y_test.shape = (264, 1)

b) 定义模型

  1. # b) 第二步:定义模型
  2. def create_net():
  3. net = nn.Sequential()
  4. net.add_module("linear1",nn.Linear(16,20))
  5. net.add_module("relu1",nn.ReLU())
  6. net.add_module("linear2",nn.Linear(20,16))
  7. net.add_module("relu2",nn.ReLU())
  8. net.add_module("linear3",nn.Linear(16,1))
  9. net.add_module("sigmoid",nn.Sigmoid())
  10. return net
  11. net = create_net()
  12. print(net)
  13. from torchkeras import summary
  14. summary(net,input_shape=(16,))
  15. out:
  16. Sequential(
  17. (linear1): Linear(in_features=16, out_features=20, bias=True)
  18. (relu1): ReLU()
  19. (linear2): Linear(in_features=20, out_features=16, bias=True)
  20. (relu2): ReLU()
  21. (linear3): Linear(in_features=16, out_features=1, bias=True)
  22. (sigmoid): Sigmoid()
  23. )
  24. ----------------------------------------------------------------
  25. Layer (type) Output Shape Param #
  26. ================================================================
  27. Linear-1 [-1, 20] 340
  28. ReLU-2 [-1, 20] 0
  29. Linear-3 [-1, 16] 336
  30. ReLU-4 [-1, 16] 0
  31. Linear-5 [-1, 1] 17
  32. Sigmoid-6 [-1, 1] 0
  33. ================================================================
  34. Total params: 693
  35. Trainable params: 693
  36. Non-trainable params: 0
  37. ----------------------------------------------------------------
  38. Input size (MB): 0.000061
  39. Forward/backward pass size (MB): 0.000565
  40. Params size (MB): 0.002644
  41. Estimated Total Size (MB): 0.003269
  42. ----------------------------------------------------------------

c) 训练模型

  1. # c) 第三步:训练模型
  2. from sklearn.metrics import accuracy_score
  3. loss_func = nn.BCELoss()
  4. optimizer = torch.optim.Adam(params=net.parameters(),lr = 0.01)
  5. metric_func = lambda y_pred,y_true:accuracy_score(y_true.data.numpy(),y_pred.data.numpy()>0.5)
  6. metric_name = "accuracy"
  7. epochs = 50
  8. log_step_freq = 50
  9. dfhistory = pd.DataFrame(columns =
  10. ["epoch","loss",metric_name,"val_loss","val_"+metric_name])
  11. print("Start Training...")
  12. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  13. print("=========="*8 + "%s"%nowtime)
  14. for epoch in range(1,epochs+1):
  15. # 1,训练循环-------------------------------------------------
  16. net.train()
  17. loss_sum = 0.0
  18. metric_sum = 0.0
  19. step = 1
  20. for step, (features,labels) in enumerate(dl_train, 1):
  21. # 梯度清零
  22. optimizer.zero_grad()
  23. # 正向传播求损失
  24. predictions = net(features)
  25. loss = loss_func(predictions,labels)
  26. metric = metric_func(predictions,labels)
  27. # 反向传播求梯度
  28. loss.backward()
  29. optimizer.step()
  30. # 打印batch级别日志
  31. loss_sum += loss.item()
  32. metric_sum += metric.item()
  33. if step%log_step_freq == 0:
  34. print(("[step = %d] loss: %.3f, "+metric_name+": %.3f") % (step, loss_sum/step, metric_sum/step))
  35. # 2,验证循环-------------------------------------------------
  36. net.eval()
  37. val_loss_sum = 0.0
  38. val_metric_sum = 0.0
  39. val_step = 1
  40. for val_step, (features,labels) in enumerate(dl_valid, 1):
  41. predictions = net(features)
  42. val_loss = loss_func(predictions,labels)
  43. val_metric = metric_func(predictions,labels)
  44. val_loss_sum += val_loss.item()
  45. val_metric_sum += val_metric.item()
  46. # 3,记录日志-------------------------------------------------
  47. info = (epoch, loss_sum/step, metric_sum/step,val_loss_sum/val_step, val_metric_sum/val_step)
  48. dfhistory.loc[epoch-1] = info
  49. # 打印epoch级别日志
  50. print(("\nEPOCH = %d, loss = %.3f,"+ metric_name + \
  51. " = %.3f, val_loss = %.3f, "+"val_"+ metric_name+" = %.3f")
  52. %info)
  53. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  54. print("\n"+"=========="*8 + "%s"%nowtime)
  55. print('Finished Training...')
  56. out:
  57. #训练过程略

d)评估模型

  1. # d) 第四步:评估模型
  2. import matplotlib.pyplot as plt
  3. def plot_metric(dfhistory, metric):
  4. train_metrics = dfhistory[metric]
  5. val_metrics = dfhistory['val_'+metric]
  6. epochs = range(1, len(train_metrics) + 1)
  7. plt.plot(epochs, train_metrics, 'bo--')
  8. plt.plot(epochs, val_metrics, 'ro-')
  9. plt.title('Training and validation '+ metric)
  10. plt.xlabel("Epochs")
  11. plt.ylabel(metric)
  12. plt.legend(["train_"+metric, 'val_'+metric])
  13. plt.show()
  14. plot_metric(dfhistory,"loss")
  15. plot_metric(dfhistory,"accuracy")
  16. print('USE model')

 e) 使用模型

  1. # e) 第五步:使用模型
  2. #预测概率
  3. y_pred_probs = net(torch.tensor(x_test[0:10]).float()).data
  4. print('预测概率:',y_pred_probs)
  5. #预测类别
  6. y_pred = torch.where(y_pred_probs>0.5,torch.ones_like(y_pred_probs),torch.zeros_like(y_pred_probs))
  7. print('预测类别:',y_pred)
  8. out:
  9. 预测概率: tensor([[0.1144],
  10. [0.1724],
  11. [0.8196],
  12. [0.7453],
  13. [0.0762],
  14. [0.8451],
  15. [0.2092],
  16. [0.1089],
  17. [0.4448],
  18. [0.8436]])
  19. 预测类别: tensor([[0.],
  20. [0.],
  21. [1.],
  22. [1.],
  23. [0.],
  24. [1.],
  25. [0.],
  26. [0.],
  27. [0.],
  28. [1.]])

f) 保存模型

  1. # f)第六步:保存模型
  2. '''
  3. print('打印参数:',net.state_dict().keys())
  4. '''
  5. # 保存模型参数(推荐)
  6. torch.save(net.state_dict(), "./data/6-1_model_parameter.pkl")
  7. net_clone = create_net()
  8. net_clone.load_state_dict(torch.load("./data/6-1_model_parameter.pkl"))
  9. net_clone.forward(torch.tensor(x_test[0:10]).float()).data
  10. #保存完整模型(不推荐)
  11. '''
  12. torch.save(net, './data/6-1_model_parameter.pkl')
  13. net_loaded = torch.load('./data/6-1_model_parameter.pkl')
  14. net_loaded(torch.tensor(x_test[0:10]).float()).data
  15. '''

6.2 图片数据建模流程范例

在Pytorch中构建图片数据管道通常有两种方法。

第一种是使用 torchvision中的datasets.ImageFolder来读取图片然后用 DataLoader来并行加载。 第二种是通过继承 torch.utils.data.Dataset 实现用户自定义读取逻辑然后用 DataLoader来并行加载。

第二种方法是读取用户自定义数据集的通用方法,既可以读取图片数据集,也可以读取文本数据集。
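作为对照,这里先给出一个按第二种方法自定义图片Dataset的简单示意(其中目录的组织方式、路径和transform均为假设:假定图片按“每个子文件夹名即类别名”的方式存放,仅供参考,并非后文示例实际使用的代码):

import os
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class MyImageDataset(Dataset):
    #自定义图片数据集:假设img_dir下每个子文件夹的名字就是类别名
    def __init__(self, img_dir, transform=None):
        self.samples = []
        self.class_to_idx = {}
        for idx, cls in enumerate(sorted(os.listdir(img_dir))):
            self.class_to_idx[cls] = idx
            cls_dir = os.path.join(img_dir, cls)
            for fname in os.listdir(cls_dir):
                self.samples.append((os.path.join(cls_dir, fname), idx))
        self.transform = transform if transform is not None else transforms.ToTensor()
    def __len__(self):
        return len(self.samples)
    def __getitem__(self, i):
        path, label = self.samples[i]
        img = Image.open(path).convert("RGB")  #读取图片并统一为3通道
        return self.transform(img), torch.tensor([label]).float()

#之后同样交给DataLoader并行加载,例如:
#ds = MyImageDataset("data/cifar10/train/")
#dl = DataLoader(ds, batch_size=50, shuffle=True)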

下面我们以第一种方法为例示范图片数据建模流程

a)准备数据

  1. #例6-2  图片数据建模流程范例
  2. import os
  3. import datetime
  4. #打印时间
  5. def printbar():
  6. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  7. print("\n"+"=========="*8 + "%s"%nowtime)
  8. # a) 第一步:准备数据
  9. import torch
  10. from torch import nn
  11. from torch.utils.data import Dataset,DataLoader
  12. from torchvision import transforms,datasets
  13. transform_train = transforms.Compose(
  14. [transforms.ToTensor()])
  15. transform_valid = transforms.Compose(
  16. [transforms.ToTensor()])
  17. ds_train = datasets.ImageFolder("data/cifar10/train/",
  18. transform = transform_train,target_transform= lambda
  19. t:torch.tensor([t]).float())
  20. ds_valid = datasets.ImageFolder("data/cifar10/test/",
  21. transform = transform_train,target_transform= lambda
  22. t:torch.tensor([t]).float())
  23. print(ds_train.class_to_idx)
  24. out:
  25. {'bird': 0, 'car': 1, 'cat': 2, 'deer': 3, 'dog': 4, 'frog': 5, 'horse': 6, 'plane': 7, 'ship': 8, 'truck': 9}
  26. dl_train = DataLoader(ds_train,batch_size = 50,shuffle = True,num_workers=0)#这里num_workers=0 是因为上面用了lambda,无法开启子进程,除非把它们放在‘main’主进程中
  27. dl_valid = DataLoader(ds_valid,batch_size = 50,shuffle = True,num_workers=0)
  28. %matplotlib inline
  29. %config InlineBackend.figure_format = 'svg'
  30. #查看部分样本
  31. from matplotlib import pyplot as plt
  32. plt.figure(figsize=(8,8))
  33. for i in range(9):
  34. img,label = ds_train[i]
  35. img = img.permute(1,2,0)
  36. ax=plt.subplot(3,3,i+1)
  37. ax.imshow(img.numpy())
  38. ax.set_title("label = %d"%label.item())
  39. ax.set_xticks([])
  40. ax.set_yticks([])
  41. plt.show()
  42. out:
  43. 图片显示略
  44. # Pytorch的图片默认顺序是 Batch,Channel,Height,Width
  45. for x,y in dl_train:
  46. print(x.shape,y.shape)
  47. break
  48. #上面的num_workers要设为0,否则这里就会报错
  49. out:
  50. torch.Size([50, 3, 32, 32]) torch.Size([50, 1])

b) 定义模型

  1. # b) 第二步: 定义模型
  2. import torch.nn.functional as F
  3. class Net(nn.Module):
  4. def __init__(self):
  5. super(Net, self).__init__()
  6. self.conv1 = nn.Conv2d(in_channels=3,out_channels=6,kernel_size = 5)
  7. self.pool = nn.MaxPool2d(kernel_size = 2,stride = 2)
  8. self.conv2 = nn.Conv2d(in_channels=6,out_channels=16,kernel_size = 5)
  9. self.dropout = nn.Dropout2d(p = 0.1)
  10. #self.adaptive_pool = nn.AdaptiveMaxPool2d((1,1))
  11. #self.flatten = nn.Flatten()
  12. self.linear1 = nn.Linear(16*5*5,120)
  13. self.relu = nn.ReLU()
  14. self.linear2 = nn.Linear(120,84)
  15. self.linear3 = nn.Linear(84,10)
  16. self.softmax = nn.Softmax(dim=0)
  17. def forward(self,x):
  18. x = self.conv1(x)
  19. x = self.relu(x)
  20. x = self.pool(x)
  21. x = self.conv2(x)
  22. x = self.relu(x)
  23. x = self.pool(x)
  24. x = x.view(-1,16*5*5)
  25. x = self.dropout(x)
  26. #x = self.adaptive_pool(x)
  27. #x = self.flatten(x)
  28. x = self.linear1(x)
  29. x = self.relu(x)
  30. x = self.linear2(x)
  31. x = self.relu(x)
  32. x = self.linear3(x)
  33. #y = self.softmax(x)
  34. return x
  35. net = Net()
  36. print(net)
  37. out:
  38. Net(
  39. (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  40. (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  41. (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  42. (dropout): Dropout2d(p=0.1, inplace=False)
  43. (linear1): Linear(in_features=400, out_features=120, bias=True)
  44. (relu): ReLU()
  45. (linear2): Linear(in_features=120, out_features=84, bias=True)
  46. (linear3): Linear(in_features=84, out_features=10, bias=True)
  47. (softmax): Softmax(dim=0)
  48. )
  49. import torchkeras
  50. torchkeras.summary(net,input_shape= (3,32,32))
  51. out:
  52. ----------------------------------------------------------------
  53. Layer (type) Output Shape Param #
  54. ================================================================
  55. Conv2d-1 [-1, 6, 28, 28] 456
  56. ReLU-2 [-1, 6, 28, 28] 0
  57. MaxPool2d-3 [-1, 6, 14, 14] 0
  58. Conv2d-4 [-1, 16, 10, 10] 2,416
  59. ReLU-5 [-1, 16, 10, 10] 0
  60. MaxPool2d-6 [-1, 16, 5, 5] 0
  61. Dropout2d-7 [-1, 400] 0
  62. Linear-8 [-1, 120] 48,120
  63. ReLU-9 [-1, 120] 0
  64. Linear-10 [-1, 84] 10,164
  65. ReLU-11 [-1, 84] 0
  66. Linear-12 [-1, 10] 850
  67. ================================================================
  68. Total params: 62,006
  69. Trainable params: 62,006
  70. Non-trainable params: 0
  71. ----------------------------------------------------------------
  72. Input size (MB): 0.011719
  73. Forward/backward pass size (MB): 0.114456
  74. Params size (MB): 0.236534
  75. Estimated Total Size (MB): 0.362709
  76. ----------------------------------------------------------------

c)  训练模型

  1. #c) 第三步:训练模型
  2. import pandas as pd
  3. import numpy as np
  4. from sklearn.metrics import roc_auc_score
  5. from sklearn.metrics import accuracy_score
  6. from sklearn.metrics import recall_score
  7. from sklearn.metrics import precision_score
  8. model = net
  9. model.optimizer = torch.optim.SGD(model.parameters(),lr = 0.01)
  10. #model.loss_func = torch.nn.BCELoss() # 这个类实现的是二分类交叉熵
  11. model.loss_func = torch.nn.CrossEntropyLoss() #这里的交叉熵实现多分类
  12. #y_pred = model.predict(X_test_data)
  13. #model.metric_func = lambda y_pred,y_true:roc_auc_score(y_true.data.numpy(),y_pred.data.numpy(),multi_class='ovr') #要添加multi_class=‘ovo’
  14. #model.metric_func = lambda y_pred,y_true:precision_score(y_true.data.numpy(),np.argmax(y_pred.data.numpy(), axis=1),average='weighted')
  15. model.metric_func = lambda y_pred,y_true:accuracy_score(y_true.data.numpy(),np.argmax(y_pred.data.numpy(), axis=1))
  16. model.metric_name = "acc"
  17. def train_step(model,features,labels):
  18. # 训练模式,dropout层发生作用
  19. model.train()
  20. # 梯度清零
  21. model.optimizer.zero_grad()
  22. # 正向传播求损失
  23. pred = model(features)
  24. #print(pred)
  25. #print(labels)
  26. labels = labels.squeeze().long()
  27. #print(labels)
  28. loss = model.loss_func(pred,labels)
  29. metric = model.metric_func(pred,labels)
  30. # 反向传播求梯度
  31. loss.backward()
  32. model.optimizer.step()
  33. return loss.item(),metric.item()
  34. def valid_step(model,features,labels):
  35. # 预测模式,dropout层不发生作用
  36. model.eval()
  37. predictions = model(features)
  38. labels = labels.squeeze().long()
  39. loss = model.loss_func(predictions,labels)
  40. metric = model.metric_func(predictions,labels)
  41. return loss.item(), metric.item()
  42. # 测试train_step效果
  43. features,labels = next(iter(dl_train))
  44. train_step(model,features,labels)
  45. out:
  46. (2.302344560623169, 0.08)
  47. import warnings
  48. warnings.filterwarnings("ignore")
  49. def train_model(model,epochs,dl_train,dl_valid,log_step_freq):
  50. metric_name = model.metric_name
  51. dfhistory = pd.DataFrame(columns = ["epoch","loss",metric_name,"val_loss","val_"+metric_name])
  52. print("Start Training...")
  53. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  54. print("=========="*8 + "%s"%nowtime)
  55. for epoch in range(1,epochs+1):
  56. # 1,训练循环-------------------------------------------------
  57. loss_sum = 0.0
  58. metric_sum = 0.0
  59. step = 1
  60. for step, (features,labels) in enumerate(dl_train, 1):
  61. #print(features.shape,labels.shape)
  62. loss,metric = train_step(model,features,labels)
  63. # 打印batch级别日志
  64. loss_sum += loss
  65. metric_sum += metric
  66. if step%log_step_freq == 0:
  67. print(("[step = %d] loss: %.3f, "+metric_name+": %.3f") % (step, loss_sum/step, metric_sum/step))
  68. # 2,验证循环-------------------------------------------------
  69. val_loss_sum = 0.0
  70. val_metric_sum = 0.0
  71. val_step = 1
  72. for val_step, (features,labels) in enumerate(dl_valid, 1):
  73. val_loss,val_metric = valid_step(model,features,labels)
  74. val_loss_sum += val_loss
  75. val_metric_sum += val_metric
  76. # 3,记录日志-------------------------------------------------
  77. info = (epoch, loss_sum/step, metric_sum/step,
  78. val_loss_sum/val_step, val_metric_sum/val_step)
  79. dfhistory.loc[epoch-1] = info
  80. # 打印epoch级别日志
  81. print(("\nEPOCH = %d, loss = %.3f,"+ metric_name + \
  82. " = %.3f, val_loss = %.3f, "+"val_"+ metric_name+" = %.3f")
  83. %info)
  84. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  85. print("\n"+"=========="*8 + "%s"%nowtime)
  86. print('Finished Training...')
  87. return dfhistory
  88. epochs = 80
  89. dfhistory = train_model(model,epochs,dl_train,dl_valid,log_step_freq = 50)
  90. out:
  91. #训练过程略

d)  评估模型
 

  1. # d) 第四步:评估模型
  2. print(dfhistory)
  3. out:
  4. epoch loss acc val_loss val_acc
  5. 0 1.0 2.300890 0.12294 2.295850 0.1349
  6. 1 2.0 2.229730 0.15780 2.082908 0.2141
  7. 2 3.0 2.029092 0.24178 1.965499 0.2808
  8. 3 4.0 1.947292 0.28162 1.884967 0.3101
  9. 4 5.0 1.852998 0.32598 1.753345 0.3635
  10. ... ... ... ... ... ...
  11. 75 76.0 0.691221 0.75330 1.114598 0.6435
  12. 76 77.0 0.682672 0.75696 1.107986 0.6433
  13. 77 78.0 0.678343 0.75846 1.126077 0.6427
  14. 78 79.0 0.674721 0.75902 1.108517 0.6399
  15. 79 80.0 0.671216 0.76088 1.128913 0.6426
  16. 80 rows × 5 columns
  17. %matplotlib inline
  18. %config InlineBackend.figure_format = 'svg'
  19. import matplotlib.pyplot as plt
  20. def plot_metric(dfhistory, metric):
  21. train_metrics = dfhistory[metric]
  22. val_metrics = dfhistory['val_'+metric]
  23. epochs = range(1, len(train_metrics) + 1)
  24. plt.plot(epochs, train_metrics, 'bo--')
  25. plt.plot(epochs, val_metrics, 'ro-')
  26. plt.title('Training and validation '+ metric)
  27. plt.xlabel("Epochs")
  28. plt.ylabel(metric)
  29. plt.legend(["train_"+metric, 'val_'+metric])
  30. plt.show()
  31. plot_metric(dfhistory,"loss")
  32. out:
  33. #训练图略

e) 使用模型

  1. # e) 第五步:使用模型
  2. def predict(model,dl):
  3. model.eval()
  4. result = torch.cat([model.forward(t[0]) for t in dl])
  5. return(result.data)
  6. #预测概率
  7. y_pred_probs = predict(model,dl_valid)
  8. print(y_pred_probs)
  9. #预测类别
  10. def transclass(inputtensor):
  11. classes={0:'bird',1:'car',2:'cat',3:'deer',4:'dog',5:'frog',6:'horse',7:'plane',8:'ship',9:'truck'}
  12. numlist=inputtensor.numpy()
  13. classlist=[]
  14. for n in numlist:
  15. classlist.append(classes[n])
  16. return classlist
  17. values, predictions = torch.max(y_pred_probs.data, 1)
  18. print(transclass(predictions))
  19. out:
  20. tensor([[ 1.8810, -6.9985, 5.2611, ..., -1.9135, -3.2537, -2.1470],
  21. [-7.4959, 17.1518, -1.6427, ..., -0.1497, -0.7619, 11.8685],
  22. [ 0.1047, 1.8250, -3.2704, ..., 6.7653, 1.8235, 3.0359],
  23. ...,
  24. [ 0.9541, 2.4879, -1.7344, ..., 1.5825, 1.0240, -1.2932],
  25. [ 1.8352, -2.5948, 3.4929, ..., -1.1408, 2.0713, 4.4101],
  26. [ 1.8674, 1.3027, 2.9357, ..., -4.3468, -0.2972, -1.9376]])
  27. ['cat', 'car', 'plane', 'dog', 'dog', 'frog', 'deer', 'dog', 'horse', 'plane', 'frog', 'frog', 'dog', 'cat', 'dog', 'ship', 'cat', 'bird', 'dog', 'dog', 'cat', 'horse', 'dog', 'car', 'dog', 'frog', 'frog', 'car', 'dog', 'frog', 'cat', 'bird', 'ship', 'plane', 'horse', 'dog', 'cat', 'car', 'plane', 'ship', 'horse', 'truck', 'plane', 'car', 'ship', ...,
  28. 'dog', 'ship', 'frog', 'horse', 'horse', 'truck', 'horse', 'bird', 'dog', 'ship', 'frog', 'frog', 'horse', 'cat', 'frog', 'dog', 'horse', 'plane', 'deer', 'ship', 'plane', 'plane', 'truck', 'truck', 'bird', 'truck', 'truck', 'frog', 'cat', 'dog', 'bird', 'horse', 'frog', 'plane', 'truck', 'cat', 'truck', 'dog', 'deer', 'horse', 'plane', 'bird', 'bird', 'horse', 'deer', 'dog', 'plane', 'car', 'ship', 'car', 'car', 'plane', 'truck', 'bird', 'deer', 'cat', 'car', 'truck', 'cat']

f)  保存模型

  1. # f) 第六步:保存模型
  2. print(model.state_dict().keys())
  3. # 保存模型参数
  4. torch.save(model.state_dict(), "./data/6-2_model_parameter.pkl")
  5. net_clone = Net()
  6. net_clone.load_state_dict(torch.load("./data/6-2_model_parameter.pkl"))
  7. predict(net_clone,dl_valid)
  8. out:
  9. odict_keys(['conv1.weight', 'conv1.bias', 'conv2.weight', 'conv2.bias', 'linear1.weight', 'linear1.bias', 'linear2.weight', 'linear2.bias', 'linear3.weight', 'linear3.bias'])
  10. tensor([[ 2.6804, -6.3126, 3.4907, ..., -2.9137, -2.9597, -2.6070],
  11. [-0.4212, -4.3194, 2.4341, ..., -3.3696, -4.6787, -4.0673],
  12. [-1.9623, 1.0195, 2.6026, ..., 3.6996, 1.7376, -0.9068],
  13. ...,
  14. [ 4.3411, -4.9298, 0.9338, ..., 2.1229, 2.3299, -2.9601],
  15. [ 0.9571, -4.2395, -0.1160, ..., -0.1015, -5.6702, 0.6289],
  16. [ 0.2636, -1.0872, -1.6660, ..., 4.7971, 8.0454, -1.1111]])

6.3 文本数据建模流程范例

        文本数据预处理较为繁琐,包括中文切词(本示例使用jieba分词)、构建词典、编码转换、序列填充、构建数据管道等等。

        在torch中预处理文本数据一般使用torchtext或者自定义Dataset。torchtext功能非常强大,可以构建文本分类、序列标注、问答模型、机器翻译等NLP任务的数据集。

torchtext常见API一览

torchtext.data.Example : 用来表示一个样本,数据和标签

torchtext.vocab.Vocab: 词汇表,可以导入一些预训练词向量

torchtext.data.Dataset: 数据集类,__getitem__ 返回 Example实例,torchtext.data.TabularDataset是其子类。

torchtext.data.Field : 用来定义字段的处理方法(文本字段、标签字段),包括创建 Example时的预处理和生成 batch 时的一些处理操作。

torchtext.data.Iterator: 迭代器,用来生成 batch。

torchtext.datasets: 包含了常见的数据集。
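把上面几个API串起来,用torchtext(legacy接口)处理文本数据的最小骨架大致如下(其中文件路径、字段名、分词方式等均为假设,只是一个示意,完整用法见下面的例子):

from torchtext.legacy import data

#1,定义各字段(Field)的预处理方式
TEXT = data.Field(sequential=True, tokenize=str.split, lower=True, fix_length=100)
LABEL = data.Field(sequential=False, use_vocab=False)

#2,用TabularDataset读取csv文件,每条样本是一个Example
ds_train, ds_test = data.TabularDataset.splits(
    path="./data", train="train.csv", test="test.csv", format="csv",
    fields=[("label", LABEL), ("text", TEXT)], skip_header=True)

#3,基于训练集构建词典(Vocab)
TEXT.build_vocab(ds_train)

#4,用Iterator生成batch
train_iter, test_iter = data.Iterator.splits(
    (ds_train, ds_test), batch_sizes=(32, 32), sort_key=lambda x: len(x.text))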

下面以豆瓣电影评论为例,示范文本数据建模流程:

a) 准备数据

  1. #例6-3  文本数据建模流程范例
  2. import os
  3. import datetime
  4. #打印时间
  5. def printbar():
  6. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  7. print("\n"+"=========="*8 + "%s"%nowtime)
  8. # a) 第一步:准备数据
  9. import torch
  10. import jieba
  11. import string,re
  12. import torchtext
  13. #from torchtext.legacy.data import Field,TabularDataset,Iterator,BucketIterator
  14. MAX_WORDS = 10000 # 仅考虑最高频的10000个词
  15. MAX_LEN = 200 # 每个样本保留200个词的长度
  16. BATCH_SIZE = 20
  17. #分词方法
  18. def clean_text(text):
  19. #过滤不需要的符号
  20. bd='[’!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~]+,。!?“”《》:、.'
  21. for i in bd:
  22. text=text.replace(i,'') #字符串替换去标点符号
  23. #用jieba分词
  24. fenci=jieba.lcut(text)
  25. return fenci
  26. #过滤掉低频词
  27. def filterLowFreqWords(arr,vocab):
  28. arr = [[x if x<MAX_WORDS else 0 for x in example] for example in arr]
  29. return arr
  30. #1,定义各个字段的预处理方法
  31. TEXT = torchtext.legacy.data.Field(sequential=True, tokenize=clean_text, lower=True,fix_length=MAX_LEN,postprocessing = filterLowFreqWords)
  32. LABEL = torchtext.legacy.data.Field(sequential=False, use_vocab=False)
  33. #2,构建表格型dataset
  34. #torchtext.data.TabularDataset可读取csv,tsv,json等格式
  35. ds_train, ds_test = torchtext.legacy.data.TabularDataset.splits(
  36. path='./data/douban', train='train.csv',test='test.csv', format='csv',fields=[('label', LABEL), ('text', TEXT)],skip_header = True)
  37. #因为豆瓣评分是1-5分,这里把它们转化为好评和差评,3分以上的算好评,3分以下的算差评
  38. for dltrain in ds_train:
  39. if int(dltrain.label)>3:
  40. dltrain.label=1
  41. else:
  42. dltrain.label=0
  43. for dltest in ds_test:
  44. if int(dltest.label)>3:
  45. dltest.label=1
  46. else:
  47. dltest.label=0
  48. #3,构建词典
  49. TEXT.build_vocab(ds_train)
  50. #4,构建数据管道迭代器
  51. train_iter, test_iter = torchtext.legacy.data.Iterator.splits( (ds_train, ds_test), sort_within_batch=True,sort_key=lambda x:len(x.text), batch_sizes=(BATCH_SIZE,BATCH_SIZE))
  52. #查看example信息
  53. print(ds_train[0].text)
  54. print(ds_train[0].label)
  55. out:
  56. ['小', '的', '时候', '完全', '不', '爱看', '太', '暴力', '太', '黑社会', '了']
  57. 0
  58. # 查看词典信息
  59. print(len(TEXT.vocab))
  60. #itos: index to string
  61. print(TEXT.vocab.itos[0])
  62. print(TEXT.vocab.itos[1])
  63. #stoi: string to index
  64. print(TEXT.vocab.stoi['<unk>']) #unknown 未知词
  65. print(TEXT.vocab.stoi['<pad>']) #padding 填充
  66. #freqs: 词频
  67. print(TEXT.vocab.freqs['<unk>'])
  68. print('"好看"的数量:',TEXT.vocab.freqs['好看'])
  69. print('"电影"的数量',TEXT.vocab.freqs['电影'])
  70. print('"导演"的数量',TEXT.vocab.freqs['导演'])
  71. # 查看数据管道信息
  72. # 注意有坑:text第0维是句子长度
  73. out:
  74. 77428
  75. <unk>
  76. <pad>
  77. 0
  78. 1
  79. 0
  80. "好看"的数量: 1505
  81. "电影"的数量 6122
  82. "导演"的数量 1567
  83. # 将数据管道组织成torch.utils.data.DataLoader相似的features,label输出形式
  84. class DataLoader:
  85. def __init__(self,data_iter):
  86. self.data_iter = data_iter
  87. self.length = len(data_iter)
  88. def __len__(self):
  89. return self.length
  90. def __iter__(self):
  91. # 注意:此处调整features为 batch first,并调整label的shape和dtype
  92. for batch in self.data_iter:
  93. yield(torch.transpose(batch.text,0,1),torch.unsqueeze(batch.label.float(),dim = 1))
  94. dl_train = DataLoader(train_iter)
  95. dl_test = DataLoader(test_iter)

b)  定义模型

  1. import torch
  2. from torch import nn
  3. import torchkeras
  4. torch.random.seed()
  5. import torch
  6. from torch import nn
  7. class Net(torchkeras.Model):
  8. def __init__(self):
  9. super(Net, self).__init__()
  10. #设置padding_idx参数后将在训练过程中将填充的token始终赋值为0向量
  11. self.embedding = nn.Embedding(num_embeddings = MAX_WORDS,embedding_dim = 3,padding_idx = 1)
  12. self.conv = nn.Sequential()
  13. self.conv.add_module("conv_1",nn.Conv1d(in_channels = 3,out_channels = 16,kernel_size = 5))
  14. self.conv.add_module("pool_1",nn.MaxPool1d(kernel_size = 2))
  15. self.conv.add_module("relu_1",nn.ReLU())
  16. self.conv.add_module("conv_2",nn.Conv1d(in_channels = 16,out_channels = 128,kernel_size = 2))
  17. self.conv.add_module("pool_2",nn.MaxPool1d(kernel_size = 2))
  18. self.conv.add_module("relu_2",nn.ReLU())
  19. self.dense = nn.Sequential()
  20. self.dense.add_module("flatten",nn.Flatten())
  21. self.dense.add_module("linear",nn.Linear(6144,1))
  22. self.dense.add_module("sigmoid",nn.Sigmoid())
  23. def forward(self,x):
  24. x = self.embedding(x).transpose(1,2)
  25. x = self.conv(x)
  26. y = self.dense(x)
  27. return y
  28. model = Net()
  29. print(model)
  30. model.summary(input_shape = (200,),input_dtype = torch.LongTensor)
  31. out:
  32. Net(
  33. (embedding): Embedding(10000, 3, padding_idx=1)
  34. (conv): Sequential(
  35. (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  36. (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  37. (relu_1): ReLU()
  38. (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  39. (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  40. (relu_2): ReLU()
  41. )
  42. (dense): Sequential(
  43. (flatten): Flatten(start_dim=1, end_dim=-1)
  44. (linear): Linear(in_features=6144, out_features=1, bias=True)
  45. (sigmoid): Sigmoid()
  46. )
  47. )
  48. ----------------------------------------------------------------
  49. Layer (type) Output Shape Param #
  50. ================================================================
  51. Embedding-1 [-1, 200, 3] 30,000
  52. Conv1d-2 [-1, 16, 196] 256
  53. MaxPool1d-3 [-1, 16, 98] 0
  54. ReLU-4 [-1, 16, 98] 0
  55. Conv1d-5 [-1, 128, 97] 4,224
  56. MaxPool1d-6 [-1, 128, 48] 0
  57. ReLU-7 [-1, 128, 48] 0
  58. Flatten-8 [-1, 6144] 0
  59. Linear-9 [-1, 1] 6,145
  60. Sigmoid-10 [-1, 1] 0
  61. ================================================================
  62. Total params: 40,625
  63. Trainable params: 40,625
  64. Non-trainable params: 0
  65. ----------------------------------------------------------------
  66. Input size (MB): 0.000763
  67. Forward/backward pass size (MB): 0.287796
  68. Params size (MB): 0.154972
  69. Estimated Total Size (MB): 0.443531
  70. ----------------------------------------------------------------

c)  训练模型

  1. # c) 第三步:训练模型
  2. # 准确率
  3. def accuracy(y_pred,y_true):
  4. y_pred = torch.where(y_pred>0.5,torch.ones_like(y_pred,dtype = torch.float32),torch.zeros_like(y_pred,dtype = torch.float32))
  5. acc = torch.mean(1-torch.abs(y_true-y_pred))
  6. return acc
  7. model.compile(loss_func = nn.BCELoss(),optimizer=torch.optim.Adagrad(model.parameters(),lr = 0.02),metrics_dict={"accuracy":accuracy})
  8. # 有时候模型训练过程中不收敛,需要多试几次
  9. dfhistory = model.fit(30,dl_train,dl_val=dl_test,log_step_freq= 200)
  10. out:
  11. #训练过程略
  12. 。。。。。。
  13. +-------+-------+----------+----------+--------------+
  14. | epoch | loss | accuracy | val_loss | val_accuracy |
  15. +-------+-------+----------+----------+--------------+
  16. | 30 | 0.509 | 0.741 | 0.672 | 0.635 |
  17. +-------+-------+----------+----------+--------------+
  18. ================================================================================2022-03-26 12:44:45
  19. Finished Training...

d)  评估模型

  1. # d) 第四步:评估模型
  2. %matplotlib inline
  3. %config InlineBackend.figure_format = 'svg'
  4. import matplotlib.pyplot as plt
  5. def plot_metric(dfhistory, metric):
  6. train_metrics = dfhistory[metric]
  7. val_metrics = dfhistory['val_'+metric]
  8. epochs = range(1, len(train_metrics) + 1)
  9. plt.plot(epochs, train_metrics, 'bo--')
  10. plt.plot(epochs, val_metrics, 'ro-')
  11. plt.title('Training and validation '+ metric)
  12. plt.xlabel("Epochs")
  13. plt.ylabel(metric)
  14. plt.legend(["train_"+metric, 'val_'+metric])
  15. plt.show()
  16. plot_metric(dfhistory,"loss")
  17. plot_metric(dfhistory,"accuracy")
  18. # 评估
  19. model.evaluate(dl_test)
  20. out:
  21. {'val_loss': 0.6723344844852034, 'val_accuracy': 0.6349082730424244}

e)  使用模型

  1. # e) 第五步:使用模型
  2. model.predict(dl_test)
  3. out:
  4. tensor([[0.2567],
  5. [0.2567],
  6. [0.2567],
  7. ...,
  8. [0.8605],
  9. [0.7361],
  10. [0.3952]])

f)  保存模型

  1. # f)第六步:保存模型
  2. print(model.state_dict().keys())
  3. # 保存模型参数
  4. torch.save(model.state_dict(), "./data/6-3_model_parameter.pkl")
  5. model_clone = Net()
  6. model_clone.load_state_dict(torch.load("./data/6-3_model_parameter.pkl"))
  7. model_clone.compile(loss_func = nn.BCELoss(),optimizer= torch.optim.Adagrad(model_clone.parameters(),lr = 0.02),metrics_dict={"accuracy":accuracy})
  8. # 评估模型
  9. model_clone.evaluate(dl_test)
  10. out:
  11. odict_keys(['embedding.weight', 'conv.conv_1.weight', 'conv.conv_1.bias', 'conv.conv_2.weight', 'conv.conv_2.bias', 'dense.linear.weight', 'dense.linear.bias'])
  12. {'val_loss': 0.6818984377266586, 'val_accuracy': 0.6283229489038048}

6.4 时间序列数据建模流程范例

时间序列数据(time series data)是在不同时间点上收集到的数据,用于描述所观察的现象随时间变化的情况。这类数据反映了某一事物、现象等随时间的变化状态或程度。

时间序列的处理对经济、金融数据尤为重要。这里我们以某只股票的股价分析为例说明用pytorch进行时间序列数据建模的一般流程。

  1. #例6-4 时间序列数据建模流程范例
  2. import os
  3. import datetime
  4. import importlib
  5. import torchkeras
  6. #打印时间
  7. def printbar():
  8. nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  9. print("\n"+"=========="*8 + "%s"%nowtime)

我们可以从 tushare获取所需要的股票数据

Tushare是一个免费、开源的python财经数据接口包。主要实现对股票等金融数据从数据采集、清洗加工 到 数据存储的过程,能够为金融分析人员提供快速、整洁、和多样的便于分析的数据,为他们在数据获取方面极大地减轻工作量,使他们更加专注于策略和模型的研究与实现上。

Tushare -财经数据接口包

  1. #从 tushare 下载股票数据
  2. import tushare as ts
  3. import matplotlib.pyplot as plt
  4. df1 = ts.get_k_data('600104', ktype='D', start='2017-01-01', end='2022-03-25')
  5. #获取600104这只股票2017年1月1日至2022年3月25日的股票数据
  6. datapath1 = "data/stock/SH600104(20170101-20220325).csv"
  7. #保存为一个csv文件
  8. df1.to_csv(datapath1)

a) 准备数据

  1. # a) 第一步:准备数据
  2. import numpy as np
  3. import pandas as pd
  4. import matplotlib.pyplot as plt
  5. %matplotlib inline
  6. %config InlineBackend.figure_format = 'svg'
  7. df = pd.read_csv("./data/stock/SH600104(20170101-20220325).csv",sep = ",")
  8. df.plot(x = 'date',y = ["open","close"],figsize=(10,6))
  9. plt.xticks(rotation=60)
  10. out:

  1. from torch import nn
  2. #数据有很多,我们只取需要的前面几个数据:开盘价、收盘价、最高价、最低价、成交量
  3. dfdata = df.iloc[:, 1:7]
  4. print(dfdata)
  5. #dfdiff = dfdata.set_index("date")
  6. dfdata.plot(x = 'date',y = ["open","close"],figsize=(10,6))
  7. plt.xticks(rotation=60)
  8. dfdata = dfdata.drop("date",axis = 1).astype("float32")
  9. #对数据进行归一化,因为这些数据数量级差别比较大,归一化有利于训练
  10. dfguiyi = (dfdata-dfdata.min())/(dfdata.max()-dfdata.min())
  11. print(dfdata)
  12. print(dfguiyi)
  13. out:
  14. date open close high low volume
  15. 0 2017-01-03 17.34 17.66 18.07 17.34 368556.0
  16. 1 2017-01-04 17.74 18.06 18.27 17.66 335319.0
  17. 2 2017-01-05 18.15 17.82 18.15 17.72 208593.0
  18. 3 2017-01-06 17.81 17.68 17.93 17.55 229796.0
  19. 4 2017-01-09 17.70 17.93 17.99 17.69 258683.0
  20. ... ... ... ... ... ... ...
  21. 1266 2022-03-21 17.60 17.19 17.65 17.11 310405.0
  22. 1267 2022-03-22 17.15 17.31 17.37 17.09 195424.0
  23. 1268 2022-03-23 17.31 17.21 17.41 17.13 208347.0
  24. 1269 2022-03-24 17.11 17.12 17.22 17.06 152451.0
  25. 1270 2022-03-25 17.14 16.98 17.21 16.91 219721.0
  26. [1271 rows x 6 columns]
  27. open close high low volume
  28. 0 17.340000 17.660000 18.070000 17.340000 368556.0
  29. 1 17.740000 18.059999 18.270000 17.660000 335319.0
  30. 2 18.150000 17.820000 18.150000 17.719999 208593.0
  31. 3 17.809999 17.680000 17.930000 17.549999 229796.0
  32. 4 17.700001 17.930000 17.990000 17.690001 258683.0
  33. ... ... ... ... ... ...
  34. 1266 17.600000 17.190001 17.650000 17.110001 310405.0
  35. 1267 17.150000 17.309999 17.370001 17.090000 195424.0
  36. 1268 17.309999 17.209999 17.410000 17.129999 208347.0
  37. 1269 17.110001 17.120001 17.219999 17.059999 152451.0
  38. 1270 17.139999 16.980000 17.209999 16.910000 219721.0
  39. [1271 rows x 5 columns]
  40. open close high low volume
  41. 0 0.063307 0.087719 0.099580 0.083895 0.172035
  42. 1 0.087892 0.111918 0.111578 0.103490 0.153485
  43. 2 0.113092 0.097399 0.104379 0.107165 0.082761
  44. 3 0.092194 0.088929 0.091182 0.096754 0.094594
  45. 4 0.085433 0.104053 0.094781 0.105328 0.110715
  46. ... ... ... ... ... ...
  47. 1266 0.079287 0.059286 0.074385 0.069810 0.139581
  48. 1267 0.051629 0.066546 0.057589 0.068585 0.075411
  49. 1268 0.061463 0.060496 0.059988 0.071035 0.082623
  50. 1269 0.049170 0.055052 0.048590 0.066748 0.051428
  51. 1270 0.051014 0.046582 0.047990 0.057563 0.088971
  52. [1271 rows x 5 columns]
  53. dfguiyi.head()
  54. out:
  55. open close high low volume
  56. 0 0.063307 0.087719 0.099580 0.083895 0.172035
  57. 1 0.087892 0.111918 0.111578 0.103490 0.153485
  58. 2 0.113092 0.097399 0.104379 0.107165 0.082761
  59. 3 0.092194 0.088929 0.091182 0.096754 0.094594
  60. 4 0.085433 0.104053 0.094781 0.105328 0.110715

制作适合pytorch训练用的数据集

  1. #制作数据集
  2. import torch
  3. from torch import nn
  4. from torch.utils.data import Dataset,DataLoader,TensorDataset
  5. #用某日前60天的数据作为输入,当日数据作为标签
  6. day_size = 60
  7. class stockDataset(Dataset):
  8. def __len__(self):
  9. return len(dfguiyi) - day_size
  10. def __getitem__(self,i):
  11. x = dfguiyi.iloc[i:i+day_size,:] #取连续60天的数据作为特征
  12. feature = torch.tensor(x.values)
  13. y = dfguiyi.iloc[i+day_size,:]
  14. label = torch.tensor(y.values)
  15. return (feature,label)
  16. ds_train = stockDataset()
  17. #用DataLoader制作训练数据集,batch设为20
  18. dl_train = DataLoader(ds_train,batch_size = 20)
  19. #查看一下数据集数据
  20. for a,b in dl_train:
  21. print(a,b)
  22. break
  23. out:
  24. tensor([[[0.0633, 0.0877, 0.0996, 0.0839, 0.1720],
  25. [0.0879, 0.1119, 0.1116, 0.1035, 0.1535],
  26. [0.1131, 0.0974, 0.1044, 0.1072, 0.0828],
  27. ...,
  28. [0.1309, 0.1337, 0.1278, 0.1298, 0.0826],
  29. [0.1334, 0.1385, 0.1326, 0.1384, 0.0903],
  30. [0.1371, 0.1779, 0.1686, 0.1574, 0.2161]],
  31. [[0.0879, 0.1119, 0.1116, 0.1035, 0.1535],
  32. [0.1131, 0.0974, 0.1044, 0.1072, 0.0828],
  33. [0.0922, 0.0889, 0.0912, 0.0968, 0.0946],
  34. ...,
  35. [0.1334, 0.1385, 0.1326, 0.1384, 0.0903],
  36. [0.1371, 0.1779, 0.1686, 0.1574, 0.2161],
  37. [0.1746, 0.1779, 0.1770, 0.1825, 0.2685]],
  38. [[0.1131, 0.0974, 0.1044, 0.1072, 0.0828],
  39. [0.0922, 0.0889, 0.0912, 0.0968, 0.0946],
  40. [0.0854, 0.1041, 0.0948, 0.1053, 0.1107],
  41. ...,
  42. [0.1371, 0.1779, 0.1686, 0.1574, 0.2161],
  43. [0.1746, 0.1779, 0.1770, 0.1825, 0.2685],
  44. [0.1887, 0.2015, 0.1980, 0.1972, 0.3619]],
  45. ...,
  46. [[0.1739, 0.1766, 0.1716, 0.1868, 0.0406],
  47. [0.1696, 0.1615, 0.1656, 0.1715, 0.0157],
  48. [0.1666, 0.1785, 0.1848, 0.1837, 0.1040],
  49. ...,
  50. [0.2797, 0.2928, 0.2873, 0.2976, 0.2026],
  51. [0.3024, 0.3200, 0.3305, 0.3190, 0.2307],
  52. [0.3110, 0.3394, 0.3257, 0.3037, 0.1949]],
  53. [[0.1696, 0.1615, 0.1656, 0.1715, 0.0157],
  54. [0.1666, 0.1785, 0.1848, 0.1837, 0.1040],
  55. [0.1764, 0.1827, 0.1854, 0.1947, 0.0446],
  56. ...,
  57. [0.3024, 0.3200, 0.3305, 0.3190, 0.2307],
  58. [0.3110, 0.3394, 0.3257, 0.3037, 0.1949],
  59. [0.3405, 0.2910, 0.3263, 0.2915, 0.2281]],
  60. [[0.1666, 0.1785, 0.1848, 0.1837, 0.1040],
  61. [0.1764, 0.1827, 0.1854, 0.1947, 0.0446],
  62. [0.1746, 0.1658, 0.1662, 0.1696, 0.0811],
  63. ...,
  64. [0.3110, 0.3394, 0.3257, 0.3037, 0.1949],
  65. [0.3405, 0.2910, 0.3263, 0.2915, 0.2281],
  66. [0.2735, 0.2898, 0.2813, 0.2756, 0.0963]]]) tensor([[0.1887, 0.2015, 0.1980, 0.1972, 0.3619],
  67. [0.1942, 0.1960, 0.1914, 0.1972, 0.1444],
  68. [0.1967, 0.1791, 0.1866, 0.1929, 0.1737],
  69. [0.1746, 0.1863, 0.1758, 0.1813, 0.1802],
  70. [0.1758, 0.1924, 0.1872, 0.1911, 0.1506],
  71. [0.1881, 0.1821, 0.1806, 0.1966, 0.0773],
  72. [0.1789, 0.1803, 0.1710, 0.1886, 0.0965],
  73. [0.1746, 0.1900, 0.1806, 0.1898, 0.0883],
  74. [0.1838, 0.2160, 0.2148, 0.2033, 0.3752],
  75. [0.2127, 0.2220, 0.2172, 0.2205, 0.2398],
  76. [0.2194, 0.2849, 0.2873, 0.2382, 0.5093],
  77. [0.2803, 0.2880, 0.2801, 0.2866, 0.2703],
  78. [0.2895, 0.2898, 0.2813, 0.2719, 0.2978],
  79. [0.2797, 0.2928, 0.2873, 0.2976, 0.2026],
  80. [0.3024, 0.3200, 0.3305, 0.3190, 0.2307],
  81. [0.3110, 0.3394, 0.3257, 0.3037, 0.1949],
  82. [0.3405, 0.2910, 0.3263, 0.2915, 0.2281],
  83. [0.2735, 0.2898, 0.2813, 0.2756, 0.0963],
  84. [0.2926, 0.3261, 0.3245, 0.2988, 0.1541],
  85. [0.3227, 0.3285, 0.3221, 0.3221, 0.0866]])

b) 定义模型

  1. #b) 第二步: 定义模型
  2. import torch
  3. from torch import nn
  4. import importlib
  5. import torchkeras
  6. torch.random.seed()
  7. class Net(nn.Module):
  8. def __init__(self):
  9. super(Net, self).__init__()
  10. # 2层lstm(num_layers=2),每个时间步输入开盘价等5个特征,所以input_size为5
  11. self.lstm = nn.LSTM(input_size = 5,hidden_size = 20,num_layers = 2,batch_first = True)
  12. self.linear = nn.Linear(20,5)
  13. def forward(self,x_input):
  14. x = self.lstm(x_input)[0][:,-1,:]
  15. y = self.linear(x)
  16. return y
  17. net = Net()
  18. model = torchkeras.Model(net)
  19. print(model)
  20. model.summary(input_shape=(60,5),input_dtype = torch.FloatTensor)
  21. out:
  22. Model(
  23. (net): Net(
  24. (lstm): LSTM(5, 20, num_layers=2, batch_first=True)
  25. (linear): Linear(in_features=20, out_features=5, bias=True)
  26. )
  27. )
  28. ----------------------------------------------------------------
  29. Layer (type) Output Shape Param #
  30. ================================================================
  31. LSTM-1 [-1, 60, 20] 5,520
  32. Linear-2 [-1, 5] 105
  33. ================================================================
  34. Total params: 5,625
  35. Trainable params: 5,625
  36. Non-trainable params: 0
  37. ----------------------------------------------------------------
  38. Input size (MB): 0.001144
  39. Forward/backward pass size (MB): 0.009193
  40. Params size (MB): 0.021458
  41. Estimated Total Size (MB): 0.031796
  42. ----------------------------------------------------------------

c)  训练模型

  1. #c) 第三步:训练模型
  2. #定义一个损失函数
  3. def mspe(y_pred,y_true):
  4. err_percent = (y_true - y_pred)**2/(torch.max(y_true**2,torch.tensor(1e-7)))
  5. return torch.mean(err_percent)
  6. model.compile(loss_func = mspe,optimizer = torch.optim.Adagrad(model.parameters(),lr = 0.01))
  7. dfhistory = model.fit(100,dl_train,log_step_freq=10)
  8. out:
  9. #训练过程略
  10. 。。。。。。
  11. +-------+-------+
  12. | epoch | loss |
  13. +-------+-------+
  14. | 100 | 0.924 |
  15. +-------+-------+
  16. ================================================================================2022-03-27 16:55:23
  17. Finished Training...
  1. #使用dfresult记录数据
  2. dfresult = dfguiyi[["open","close","high","low","volume"]].copy()
  3. dfresult.tail()
  4. #定义一个反归一化函数
  5. def fanguiyi(df,dfdata):
  6. result=df*(dfdata.max()-dfdata.min())+dfdata.min()
  7. return result
  8. #反归一化,得到股价数据
  9. dfgujia = fanguiyi(dfresult,dfdata)
  10. print(dfgujia)
  11. out:
  12. open close high low volume
  13. 0 17.340000 17.660000 18.070000 17.340000 368556.0
  14. 1 17.740000 18.059999 18.270000 17.660000 335319.0
  15. 2 18.150000 17.820000 18.150000 17.719999 208593.0
  16. 3 17.809999 17.680000 17.930000 17.549999 229796.0
  17. 4 17.700001 17.930000 17.990000 17.690001 258683.0
  18. ... ... ... ... ... ...
  19. 1266 17.600000 17.190001 17.650000 17.110001 310405.0
  20. 1267 17.150000 17.309999 17.370001 17.090000 195424.0
  21. 1268 17.309999 17.209999 17.410000 17.129999 208347.0
  22. 1269 17.110001 17.120001 17.219999 17.059999 152451.0
  23. 1270 17.139999 16.980000 17.209999 16.910000 219721.0
  24. [1271 rows x 5 columns]

d) 评估模型

  1. #d) 第四步:评估模型
  2. %matplotlib inline
  3. %config InlineBackend.figure_format = 'svg'
  4. import matplotlib.pyplot as plt
  5. def plot_metric(dfhistory, metric):
  6. train_metrics = dfhistory[metric]
  7. epochs = range(1, len(train_metrics) + 1)
  8. plt.plot(epochs, train_metrics, 'bo--')
  9. plt.title('Training '+ metric)
  10. plt.xlabel("Epochs")
  11. plt.ylabel(metric)
  12. plt.legend(["train_"+metric])
  13. plt.show()
  14. plot_metric(dfhistory,"loss")
  15. out:
  16. #输出图略

e)  使用模型

  1. # e) 使用模型
  2. #预测此后10天的股价走势,将其结果添加到dfresult中
  3. for i in range(10):
  4. arr_input = torch.unsqueeze(torch.from_numpy(dfresult.values[-60:,:]),dim=0)
  5. arr_predict = model.forward(arr_input)
  6. dfpredict = pd.DataFrame(arr_predict.data.numpy(),columns = dfresult.columns)
  7. dfresult = pd.concat([dfresult,dfpredict],ignore_index=True)
  8. dfgujia = fanguiyi(dfresult,dfdata)
  9. print(dfgujia)
  10. out:
  11. open close high low volume
  12. 0 17.340000 17.660000 18.070000 17.340000 368556.000000
  13. 1 17.740000 18.059999 18.270000 17.660000 335319.000000
  14. 2 18.150000 17.820000 18.150000 17.719999 208593.000000
  15. 3 17.809999 17.680000 17.930000 17.549999 229796.000000
  16. 4 17.700001 17.930000 17.990000 17.690001 258683.000000
  17. ... ... ... ... ... ...
  18. 1276 16.181501 15.864144 16.290152 15.571153 189594.421875
  19. 1277 16.147591 15.849412 16.265371 15.561488 191846.609375
  20. 1278 16.118704 15.836608 16.244844 15.553810 193725.046875
  21. 1279 16.094416 15.825689 16.228113 15.547682 195277.750000
  22. 1280 16.074209 15.816536 16.214634 15.542784 196551.828125
  23. [1281 rows x 5 columns]

呃,这个股价走势,还真让人伤心啊,赶紧清仓跑路吧

免责声明:影响股票价格的因素太多太多,这里只是一个循环网络的案例演示,不构成投资建议啊,大家不要拿着这个去做股票,亏了钱我不负责。

f)  保存模型

  1. # f) 第六步:保存模型
  2. print(model.net.state_dict().keys())
  3. # 保存模型参数
  4. torch.save(model.net.state_dict(), "./data/6-4_model_parameter.pkl")
  5. net_clone = Net()
  6. net_clone.load_state_dict(torch.load("./data/6-4_model_parameter.pkl"))
  7. model_clone = torchkeras.Model(net_clone)
  8. model_clone.compile(loss_func = mspe)
  9. # 评估模型
  10. model_clone.evaluate(dl_train)
  11. out:
  12. odict_keys(['lstm.weight_ih_l0', 'lstm.weight_hh_l0', 'lstm.bias_ih_l0', 'lstm.bias_hh_l0', 'lstm.weight_ih_l1', 'lstm.weight_hh_l1', 'lstm.bias_ih_l1', 'lstm.bias_hh_l1', 'linear.weight', 'linear.bias'])
  13. {'val_loss': 0.9112453353209574}

7、其他

7.1 TensorBoard可视化

        TensorBoard是一个可视化辅助工具。它原是TensorFlow的小弟,但它也能够很好地和Pytorch进行配合。甚至在Pytorch中使用TensorBoard,比在TensorFlow中使用TensorBoard还要来得更加简单和自然。

        Pytorch中利用TensorBoard可视化的大概过程如下:

        首先在Pytorch中指定一个目录创建一个torch.utils.tensorboard.SummaryWriter日志写入器。

        然后根据需要可视化的信息,利用日志写入器将相应信息日志写入我们指定的目录。

        最后就可以传入日志目录作为参数启动TensorBoard,然后就可以在TensorBoard中查看了。

        Pytorch中利用TensorBoard进行信息的可视化的方法: 

        可视化模型结构: writer.add_graph

        可视化指标变化: writer.add_scalar

        可视化参数分布: writer.add_histogram

        可视化原始图像: writer.add_image 或 writer.add_images

        可视化人工绘图: writer.add_figure
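        把上面的流程串起来,一个最小的使用骨架大致如下(日志目录为假设,这里仅以writer.add_scalar记录一个假设的标量为例,其余几个方法的具体用法见后面各小节):

from torch.utils.tensorboard import SummaryWriter

#1,指定一个目录,创建日志写入器
writer = SummaryWriter("./data/tensorboard")

#2,训练过程中按需写入日志,这里用一个假设的标量代替真实的loss
for step in range(100):
    fake_loss = 1.0/(step + 1)
    writer.add_scalar("loss", fake_loss, step)  #可视化指标变化
writer.close()

#3,传入日志目录启动TensorBoard(在命令行中执行):
#   tensorboard --logdir ./data/tensorboard
#   然后在浏览器中打开 http://localhost:6006/ 查看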

7.1.1 可视化模型结构

  1. #例7-1-1 TensorBoard可视化模型结构
  2. import torch
  3. from torch import nn
  4. from torch.utils.tensorboard import SummaryWriter
  5. from torchkeras import Model,summary
  6. class Net(nn.Module):
  7. def __init__(self):
  8. super(Net, self).__init__()
  9. self.conv1 = nn.Conv2d(in_channels=3,out_channels=32,kernel_size = 3)
  10. self.pool = nn.MaxPool2d(kernel_size = 2,stride = 2)
  11. self.conv2 = nn.Conv2d(in_channels=32,out_channels=64,kernel_size = 5)
  12. self.dropout = nn.Dropout2d(p = 0.1)
  13. self.adaptive_pool = nn.AdaptiveMaxPool2d((1,1))
  14. self.flatten = nn.Flatten()
  15. self.linear1 = nn.Linear(64,32)
  16. self.relu = nn.ReLU()
  17. self.linear2 = nn.Linear(32,1)
  18. self.sigmoid = nn.Sigmoid()
  19. def forward(self,x):
  20. x = self.conv1(x)
  21. x = self.pool(x)
  22. x = self.conv2(x)
  23. x = self.pool(x)
  24. x = self.dropout(x)
  25. x = self.adaptive_pool(x)
  26. x = self.flatten(x)
  27. x = self.linear1(x)
  28. x = self.relu(x)
  29. x = self.linear2(x)
  30. y = self.sigmoid(x)
  31. return y
  32. net = Net()
  33. print(net)
  34. out:
  35. Net(
  36. (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
  37. (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  38. (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  39. (dropout): Dropout2d(p=0.1, inplace=False)
  40. (adaptive_pool): AdaptiveMaxPool2d(output_size=(1, 1))
  41. (flatten): Flatten(start_dim=1, end_dim=-1)
  42. (linear1): Linear(in_features=64, out_features=32, bias=True)
  43. (relu): ReLU()
  44. (linear2): Linear(in_features=32, out_features=1, bias=True)
  45. (sigmoid): Sigmoid()
  46. )
  47. summary(net,input_shape= (3,32,32))
  48. out:
  49. ----------------------------------------------------------------
  50. Layer (type) Output Shape Param #
  51. ================================================================
  52. Conv2d-1 [-1, 32, 30, 30] 896
  53. MaxPool2d-2 [-1, 32, 15, 15] 0
  54. Conv2d-3 [-1, 64, 11, 11] 51,264
  55. MaxPool2d-4 [-1, 64, 5, 5] 0
  56. Dropout2d-5 [-1, 64, 5, 5] 0
  57. AdaptiveMaxPool2d-6 [-1, 64, 1, 1] 0
  58. Flatten-7 [-1, 64] 0
  59. Linear-8 [-1, 32] 2,080
  60. ReLU-9 [-1, 32] 0
  61. Linear-10 [-1, 1] 33
  62. Sigmoid-11 [-1, 1] 0
  63. ================================================================
  64. Total params: 54,273
  65. Trainable params: 54,273
  66. Non-trainable params: 0
  67. ----------------------------------------------------------------
  68. Input size (MB): 0.011719
  69. Forward/backward pass size (MB): 0.359634
  70. Params size (MB): 0.207035
  71. Estimated Total Size (MB): 0.578388
  72. ----------------------------------------------------------------
  73. writer = SummaryWriter('./data/tensorboard')
  74. writer.add_graph(net,input_to_model = torch.rand(1,3,32,32))
  75. writer.close()
  76. %load_ext tensorboard
  77. #%tensorboard --logdir logs/fit --port=6007
  78. #%tensorboard --logdir ./data/tensorboard
  79. from tensorboard import notebook
  80. #查看启动的tensorboard程序
  81. notebook.list()
  82. #启动tensorboard程序
  83. notebook.start("--logdir ./data/tensorboard")
  84. #等价于在命令行中执行 tensorboard --logdir ./data/tensorboard
  85. #可以在浏览器中打开 http://localhost:6006/ 查看
  86. out:

7.1.2 可视化指标变化

        有时候在训练过程中,如果能够实时动态地查看loss和各种metric的变化曲线,那么无疑可以帮助我们更加直观地了解模型的训练情况。

        注意,writer.add_scalar仅能对标量的值的变化进行可视化。因此它一般用于对loss和metric的变化进行可视化分析。

  1. #例7-1-2 可视化指标变化
  2. import numpy as np
  3. import torch
  4. from torch.utils.tensorboard import SummaryWriter
  5. # f(x) = a*x**2 + b*x + c的最小值
  6. x = torch.tensor(0.0,requires_grad = True) # x需要被求导
  7. a = torch.tensor(1.0)
  8. b = torch.tensor(-2.0)
  9. c = torch.tensor(1.0)
  10. optimizer = torch.optim.SGD(params=[x],lr = 0.01)
  11. def f(x):
  12. result = a*torch.pow(x,2) + b*x + c
  13. return(result)
  14. writer = SummaryWriter('./data/tensorboard')
  15. for i in range(500):
  16. optimizer.zero_grad()
  17. y = f(x)
  18. y.backward()
  19. optimizer.step()
  20. writer.add_scalar("x",x.item(),i) #日志中记录x在第step i 的值
  21. writer.add_scalar("y",y.item(),i) #日志中记录y在第step i 的值
  22. writer.close()
  23. print("y=",f(x).data,";","x=",x.data)
  24. out:
  25. y= tensor(0.) ; x= tensor(1.0000)
  26. 图略

7.1.3 可视化参数分布

        如果需要对模型的参数(一般非标量)在训练过程中的变化进行可视化,可以使用 writer.add_histogram。 它能够观测张量值分布的直方图随训练步骤的变化趋势。

  1. #例 7-1-3  可视化参数分布
  2. import numpy as np
  3. import torch
  4. from torch.utils.tensorboard import SummaryWriter
  5. # 创建正态分布的张量模拟参数矩阵
  6. def norm(mean,std):
  7. t = std*torch.randn((100,20))+mean
  8. return t
  9. writer = SummaryWriter('./data/tensorboard')
  10. for step,mean in enumerate(range(-10,10,1)):
  11. w = norm(mean,1)
  12. writer.add_histogram("w",w, step)
  13. writer.flush()
  14. writer.close()

7.1.4 可视化原始图像

如果我们做图像相关的任务,也可以将原始的图片在tensorboard中进行可视化展示。

如果只写入一张图片信息,可以使用writer.add_image。 如果要写入多张图片信息,可以使用writer.add_images。 也可以用 torchvision.utils.make_grid将多张图片拼成一张图片,然后用writer.add_image写入。 注意,传入的是代表图片信息的Pytorch中的张量数据。

  1. #例7-1-4  可视化原始图像
  2. import torch
  3. import torchvision
  4. from torch import nn
  5. from torch.utils.data import Dataset,DataLoader
  6. from torchvision import transforms,datasets; from torch.utils.tensorboard import SummaryWriter #SummaryWriter用于写入TensorBoard日志
  7. transform_train = transforms.Compose(
  8. [transforms.ToTensor()])
  9. transform_valid = transforms.Compose(
  10. [transforms.ToTensor()])
  11. ds_train = datasets.ImageFolder("./data/animal/train/",
  12. transform = transform_train,target_transform= lambda
  13. t:torch.tensor([t]).float())
  14. ds_valid = datasets.ImageFolder("./data/animal/test/",
  15. transform = transform_train,target_transform= lambda
  16. t:torch.tensor([t]).float())
  17. print(ds_train.class_to_idx)
  18. dl_train = DataLoader(ds_train,batch_size = 50,shuffle = True,num_workers=3)
  19. dl_valid = DataLoader(ds_valid,batch_size = 50,shuffle = True,num_workers=3)
  20. dl_train_iter = iter(dl_train)
  21. images, labels = next(dl_train_iter)
  22. # 仅查看一张图片
  23. writer = SummaryWriter('./data/tensorboard')
  24. writer.add_image('images[0]', images[0])
  25. writer.close()
  26. # 将多张图片拼接成一张图片,中间用黑色网格分割
  27. writer = SummaryWriter('./data/tensorboard')
  28. # create grid of images
  29. img_grid = torchvision.utils.make_grid(images)
  30. writer.add_image('image_grid', img_grid)
  31. writer.close()
  32. # 将多张图片直接写入
  33. writer = SummaryWriter('./data/tensorboard')
  34. writer.add_images("images",images,global_step = 0)
  35. writer.close()

7.1.5 可视化人工绘图

如果我们要将matplotlib绘图的结果在tensorboard中展示,可以使用 writer.add_figure。注意,和writer.add_image不同的是,writer.add_figure需要传入matplotlib的figure对象。

  1. #例7-1-5  可视化人工绘图
  2. import torch
  3. import torchvision
  4. from torch import nn
  5. from torch.utils.data import Dataset,DataLoader
  6. from torchvision import transforms,datasets; from torch.utils.tensorboard import SummaryWriter #SummaryWriter用于写入TensorBoard日志
  7. transform_train = transforms.Compose(
  8. [transforms.ToTensor()])
  9. transform_valid = transforms.Compose(
  10. [transforms.ToTensor()])
  11. ds_train = datasets.ImageFolder("./data/animal/train/",
  12. transform = transform_train,target_transform= lambda
  13. t:torch.tensor([t]).float())
  14. ds_valid = datasets.ImageFolder("./data/animal/test/",
  15. transform = transform_train,target_transform= lambda
  16. t:torch.tensor([t]).float())
  17. print(ds_train.class_to_idx)
  18. out:
  19. {'bird': 0, 'car': 1, 'cat': 2, 'deer': 3, 'dog': 4, 'plane': 5}
  20. %matplotlib inline
  21. %config InlineBackend.figure_format = 'svg'
  22. from matplotlib import pyplot as plt
  23. figure = plt.figure(figsize=(8,8))
  24. for i in range(9):
  25. img,label = ds_train[i]
  26. img = img.permute(1,2,0)
  27. ax=plt.subplot(3,3,i+1)
  28. ax.imshow(img.numpy())
  29. ax.set_title("label = %d"%label.item())
  30. ax.set_xticks([])
  31. ax.set_yticks([])
  32. plt.show()
  33. out:
  34. 图略
  35. writer = SummaryWriter('./data/tensorboard')
  36. writer.add_figure('figure',figure,global_step=0)
  37. writer.close()

7.2  使用GPU训练模型

        深度学习的训练过程常常非常耗时,一个模型训练几个小时是家常便饭,训练几天也是常有的事 情,有时候甚至要训练几十天。

        训练过程的耗时主要来自于两个部分,一部分来自数据准备,另一部分来自参数迭代。

        当数据准备过程还是模型训练时间的主要瓶颈时,我们可以使用更多进程来准备数据。

        当参数迭代过程成为训练时间的主要瓶颈时,我们通常的方法是应用GPU来进行加速。

        Pytorch中使用GPU加速模型非常简单,只要将模型和数据移动到GPU上。

核心代码如下:

  1. # 定义模型
  2. ...
  3. device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
  4. model.to(device) # 移动模型到cuda
  5. # 训练模型
  6. ...
  7. features = features.to(device) # 移动数据到cuda
  8. labels = labels.to(device) # 或者 labels = labels.cuda() if torch.cuda.is_available() else labels

        如果要使用多个GPU训练模型,也非常简单。只需要在将模型设置为数据并行风格模型。 则模 型移动到GPU上之后,会在每一个GPU上拷贝一个副本,并把数据平分到各个GPU上进行训练。 核心代码如下。

  1. # 定义模型
  2. ...
  3. if torch.cuda.device_count() > 1:
  4. model = nn.DataParallel(model) # 包装为并行风格模型
  5. # 训练模型
  6. ...
  7. features = features.to(device) # 移动数据到cuda
  8. labels = labels.to(device) # 或者 labels = labels.cuda() if torch.cuda.is_available() else labels
  9. ...

以下是一些和GPU有关的基本操作汇总 在Colab笔记本中:修改->笔记本设置->硬件加速器 中选择 GPU 

7.2.1 pytorch中关于GPU的基本操作 

  1. #例7-2-1 GPU基本操作
  2. import torch
  3. from torch import nn
  4. # 1,查看gpu信息
  5. if_cuda = torch.cuda.is_available()
  6. print("if_cuda=",if_cuda)
  7. gpu_count = torch.cuda.device_count()
  8. print("gpu_count=",gpu_count)
  9. out:
  10. if_cuda= True
  11. gpu_count= 1
  12. # 2,将张量在gpu和cpu间移动
  13. tensor = torch.rand((100,100))
  14. tensor_gpu = tensor.to("cuda:0") # 或者 tensor_gpu = tensor.cuda()
  15. print(tensor_gpu.device)
  16. print(tensor_gpu.is_cuda)
  17. tensor_cpu = tensor_gpu.to("cpu") # 或者 tensor_cpu = tensor_gpu.cpu()
  18. print(tensor_cpu.device)
  19. out:
  20. cuda:0
  21. True
  22. cpu
  23. # 3,将模型中的全部张量移动到gpu上
  24. net = nn.Linear(2,1)
  25. print(next(net.parameters()).is_cuda)
  26. net.to("cuda:0") # 将模型中的全部参数张量依次移动到GPU上
  27. # 注意无需重新赋值,即不需要写成 net = net.to("cuda:0")
  28. print(next(net.parameters()).is_cuda)
  29. print(next(net.parameters()).device)
  30. out:
  31. False
  32. True
  33. cuda:0
  34. # 4,创建支持多个gpu数据并行的模型
  35. linear = nn.Linear(2,1)
  36. print(next(linear.parameters()).device)
  37. model = nn.DataParallel(linear)
  38. print(model.device_ids)
  39. print(next(model.module.parameters()).device)
  40. #注意保存参数时要指定保存model.module的参数
  41. torch.save(model.module.state_dict(), "./data/7-2-1_model_parameter.pkl")
  42. linear = nn.Linear(2,1)
  43. linear.load_state_dict(torch.load("./data/7-2-1_model_parameter.pkl"))
  44. out:
  45. cpu
  46. [0]
  47. cuda:0
  48. # 5,清空cuda缓存
  49. # 该方法在cuda超内存时十分有用
  50. torch.cuda.empty_cache()

7.2.2 用GPU训练矩阵乘法

下面分别使用CPU和GPU运算一个矩阵乘法,并比较其计算效率。

  1. #例7-2-2 矩阵乘法示例
  2. import time
  3. import torch
  4. from torch import nn
  5. # 使用cpu
  6. a = torch.rand((10000,200))
  7. b = torch.rand((200,10000))
  8. tic = time.time()
  9. c = torch.matmul(a,b)
  10. toc = time.time()
  11. print(toc-tic)
  12. print(a.device)
  13. print(b.device)
  14. out:
  15. 0.38155651092529297
  16. cpu
  17. cpu
  18. # 使用gpu
  19. device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
  20. a = torch.rand((10000,200),device = device) #可以指定在GPU上创建张量
  21. b = torch.rand((200,10000)) #也可以在CPU上创建张量后移动到GPU上
  22. b = b.to(device) #或者 b = b.cuda() if torch.cuda.is_available() else b
  23. tic = time.time()
  24. c = torch.matmul(a,b)
  25. toc = time.time()
  26. print(toc-tic)
  27. print(a.device)
  28. print(b.device)
  29. out:
  30. 0.09093642234802246
  31. cuda:0
  32. cuda:0

7.2.3 torchkeras.Model使用单GPU示例

          以下示例使用torchkeras.Model来应用GPU训练模型的方法,其实就是在model.compile时指定 device即可。

  1. #例7-2-3 torchkeras.Model使用单GPU示例
  2. import torch
  3. from torch import nn
  4. import torchvision
  5. from torchvision import transforms
  6. import torchkeras
  7. #准备数据
  8. transform = transforms.Compose([transforms.ToTensor()])
  9. ds_train = torchvision.datasets.MNIST(root="./data/minist/",train=True,download=True,transform=transform)
  10. ds_valid = torchvision.datasets.MNIST(root="./data/minist/",train=False,download=True,transform=transform)
  11. dl_train = torch.utils.data.DataLoader(ds_train, batch_size=128,shuffle=True, num_workers=4)
  12. dl_valid = torch.utils.data.DataLoader(ds_valid, batch_size=128,shuffle=False, num_workers=4)
  13. print(len(ds_train))
  14. print(len(ds_valid))
  15. out:
  16. 60000
  17. 10000
  18. %matplotlib inline
  19. %config InlineBackend.figure_format = 'svg'
  20. #查看部分样本
  21. from matplotlib import pyplot as plt
  22. plt.figure(figsize=(8,8))
  23. for i in range(9):
  24. img,label = ds_train[i]
  25. img = torch.squeeze(img)
  26. ax=plt.subplot(3,3,i+1)
  27. ax.imshow(img.numpy())
  28. ax.set_title("label = %d"%label)
  29. ax.set_xticks([])
  30. ax.set_yticks([])
  31. plt.show()
  32. out:
  33. <Figure size 576x576 with 9 Axes>
  34. #定义模型
  35. class CnnModel(nn.Module):
  36. def __init__(self):
  37. super().__init__()
  38. self.layers = nn.ModuleList([
  39. nn.Conv2d(in_channels=1,out_channels=32,kernel_size = 3),
  40. nn.MaxPool2d(kernel_size = 2,stride = 2),
  41. nn.Conv2d(in_channels=32,out_channels=64,kernel_size = 5),
  42. nn.MaxPool2d(kernel_size = 2,stride = 2),
  43. nn.Dropout2d(p = 0.1),
  44. nn.AdaptiveMaxPool2d((1,1)),
  45. nn.Flatten(),
  46. nn.Linear(64,32),
  47. nn.ReLU(),
  48. nn.Linear(32,10)]
  49. )
  50. def forward(self,x):
  51. for layer in self.layers:
  52. x = layer(x)
  53. return x
  54. net = CnnModel()
  55. model = torchkeras.Model(net)
  56. model.summary(input_shape=(1,32,32))
  57. out:
  58. ----------------------------------------------------------------
  59. Layer (type) Output Shape Param #
  60. ================================================================
  61. Conv2d-1 [-1, 32, 30, 30] 320
  62. MaxPool2d-2 [-1, 32, 15, 15] 0
  63. Conv2d-3 [-1, 64, 11, 11] 51,264
  64. MaxPool2d-4 [-1, 64, 5, 5] 0
  65. Dropout2d-5 [-1, 64, 5, 5] 0
  66. AdaptiveMaxPool2d-6 [-1, 64, 1, 1] 0
  67. Flatten-7 [-1, 64] 0
  68. Linear-8 [-1, 32] 2,080
  69. ReLU-9 [-1, 32] 0
  70. Linear-10 [-1, 10] 330
  71. ================================================================
  72. Total params: 53,994
  73. Trainable params: 53,994
  74. Non-trainable params: 0
  75. ----------------------------------------------------------------
  76. Input size (MB): 0.003906
  77. Forward/backward pass size (MB): 0.359695
  78. Params size (MB): 0.205971
  79. Estimated Total Size (MB): 0.569572
  80. ----------------------------------------------------------------
  81. #训练模型
  82. from sklearn.metrics import accuracy_score
  83. def accuracy(y_pred,y_true):
  84. y_pred_cls = torch.argmax(nn.Softmax(dim=1)(y_pred),dim=1).data
  85. return accuracy_score(y_true.cpu().numpy(),y_pred_cls.cpu().numpy())
  86. # 注意此处要将数据先移动到cpu上,然后才能转换成numpy数组
  87. device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
  88. model.compile(loss_func = nn.CrossEntropyLoss(),
  89. optimizer= torch.optim.SGD(model.parameters(),lr = 0.02),
  90. metrics_dict={"accuracy":accuracy},device = device) # 注意此处compile时指定了device
  91. dfhistory = model.fit(3,dl_train = dl_train, dl_val=dl_valid,log_step_freq=100)
  92. out:
  93. #训练过程略
  94. 。。。。。。
  95. +-------+-------+----------+----------+--------------+
  96. | epoch | loss | accuracy | val_loss | val_accuracy |
  97. +-------+-------+----------+----------+--------------+
  98. | 3 | 0.315 | 0.911 | 0.196 | 0.944 |
  99. +-------+-------+----------+----------+--------------+
  100. ================================================================================2022-03-29 13:12:16
  101. Finished Training...
  102. #评估模型
  103. %matplotlib inline
  104. %config InlineBackend.figure_format = 'svg'
  105. import matplotlib.pyplot as plt
  106. def plot_metric(dfhistory, metric):
  107. train_metrics = dfhistory[metric]
  108. val_metrics = dfhistory['val_'+metric]
  109. epochs = range(1, len(train_metrics) + 1)
  110. plt.plot(epochs, train_metrics, 'bo--')
  111. plt.plot(epochs, val_metrics, 'ro-')
  112. plt.title('Training and validation '+ metric)
  113. plt.xlabel("Epochs")
  114. plt.ylabel(metric)
  115. plt.legend(["train_"+metric, 'val_'+metric])
  116. plt.show()
  117. plot_metric(dfhistory,"loss")
  118. plot_metric(dfhistory,"accuracy")
  119. model.evaluate(dl_valid)
  120. out:
  121. #训练图略
  122. {'val_loss': 0.1961400840855852, 'val_accuracy': 0.9440268987341772}
  123. #使用模型
  124. model.predict(dl_valid)[0:10]
  125. out:
  126. tensor([[-3.0589, 0.5913, 2.9332, 1.9190, -0.4598, -0.9629, -7.2336, 10.3429,
  127. -4.1452, 1.7773],
  128. [-1.1582, -4.3752, 6.6064, 1.4136, 0.7964, -1.7493, 0.3708, 2.5375,
  129. -3.1236, 1.1933],
  130. [-0.2913, 7.6946, -0.5591, -3.4781, 1.8302, -3.5013, -0.2864, 0.8970,
  131. -1.4603, -1.6221],
  132. [ 7.8830, -3.4793, 3.0594, -3.5699, -0.7118, -1.5213, 3.4239, -5.2019,
  133. 1.5311, -1.1321],
  134. [-0.5955, -0.0648, 2.1269, -3.1870, 7.0876, -3.2265, -1.4909, 1.1239,
  135. -3.2128, 2.2488],
  136. [ 0.2199, 8.2553, -1.2405, -5.6407, 2.7766, -4.2880, -0.8283, 2.3202,
  137. -2.1941, -0.5836],
  138. [-1.4086, -2.2477, 2.0612, -1.8534, 6.2273, -1.8892, -1.8793, 1.4700,
  139. -2.0889, 3.3411],
  140. [-3.4105, -6.1126, 2.9690, 1.4873, 2.9547, 1.4985, -1.3198, 0.5871,
  141. 0.7181, 5.7719],
  142. [-1.7162, -5.8831, 4.2534, 0.3954, 1.3270, 3.6996, -0.8770, 1.2742,
  143. -1.1617, 3.4104],
  144. [-2.5201, -5.2390, 2.0075, 0.6812, 0.6092, 0.8451, -2.8697, 3.6119,
  145. 0.7925, 6.0172]])
  146. #保存模型
  147. # save the model parameters
  148. torch.save(model.state_dict(), "data/7-2-3_model_parameter.pkl")
  149. model_clone = torchkeras.Model(CnnModel())
  150. model_clone.load_state_dict(torch.load("data/7-2-3_model_parameter.pkl"))
  151. model_clone.compile(loss_func = nn.CrossEntropyLoss(),
  152. optimizer= torch.optim.Adam(model.parameters(),lr = 0.02),
  153. metrics_dict={"accuracy":accuracy},device = device) # 注意此处compile时指定了device
  154. model_clone.evaluate(dl_valid)
  155. out:
  156. {'val_loss': 0.1961400840855852, 'val_accuracy': 0.9440268987341772}

7.2.4 torchkeras.Model使用多GPU示例

        以下示例需要在有多个GPU的机器上跑。如果在单GPU的机器上跑,也能跑通,但是实际上 使用的是单个GPU。

  1. #例7-2-4 torchkeras.Model使用多GPU示例
  2. #准备数据
  3. import torch
  4. from torch import nn
  5. import torchvision
  6. from torchvision import transforms
  7. import torchkeras
  8. transform = transforms.Compose([transforms.ToTensor()])
  9. ds_train = torchvision.datasets.MNIST(root="./data/minist/",train=True,download=True,transform=transform)
  10. ds_valid = torchvision.datasets.MNIST(root="./data/minist/",train=False,download=True,transform=transform)
  11. dl_train = torch.utils.data.DataLoader(ds_train, batch_size=128,shuffle=True, num_workers=4)
  12. dl_valid = torch.utils.data.DataLoader(ds_valid, batch_size=128,shuffle=False, num_workers=4)
  13. print(len(ds_train))
  14. print(len(ds_valid))
  15. out:
  16. 60000
  17. 10000
  18. # define the model
  19. class CnnModel(nn.Module):
  20.     def __init__(self):
  21.         super().__init__()
  22.         self.layers = nn.ModuleList([
  23.             nn.Conv2d(in_channels=1,out_channels=32,kernel_size = 3),
  24.             nn.MaxPool2d(kernel_size = 2,stride = 2),
  25.             nn.Conv2d(in_channels=32,out_channels=64,kernel_size = 5),
  26.             nn.MaxPool2d(kernel_size = 2,stride = 2),
  27.             nn.Dropout2d(p = 0.1),
  28.             nn.AdaptiveMaxPool2d((1,1)),
  29.             nn.Flatten(),
  30.             nn.Linear(64,32),
  31.             nn.ReLU(),
  32.             nn.Linear(32,10)]
  33.         )
  34.     def forward(self,x):
  35.         for layer in self.layers:
  36.             x = layer(x)
  37.         return x
  38. net = CnnModel()
  39. model = torchkeras.Model(net)
  40. model.summary(input_shape=(1,32,32))
  41. out:
  42. ----------------------------------------------------------------
  43. Layer (type) Output Shape Param #
  44. ================================================================
  45. Conv2d-1 [-1, 32, 30, 30] 320
  46. MaxPool2d-2 [-1, 32, 15, 15] 0
  47. Conv2d-3 [-1, 64, 11, 11] 51,264
  48. MaxPool2d-4 [-1, 64, 5, 5] 0
  49. Dropout2d-5 [-1, 64, 5, 5] 0
  50. AdaptiveMaxPool2d-6 [-1, 64, 1, 1] 0
  51. Flatten-7 [-1, 64] 0
  52. Linear-8 [-1, 32] 2,080
  53. ReLU-9 [-1, 32] 0
  54. Linear-10 [-1, 10] 330
  55. ================================================================
  56. Total params: 53,994
  57. Trainable params: 53,994
  58. Non-trainable params: 0
  59. ----------------------------------------------------------------
  60. Input size (MB): 0.003906
  61. Forward/backward pass size (MB): 0.359695
  62. Params size (MB): 0.205971
  63. Estimated Total Size (MB): 0.569572
  64. ----------------------------------------------------------------
  65. # train the model
  66. from sklearn.metrics import accuracy_score
  67. def accuracy(y_pred,y_true):
  68.     y_pred_cls = torch.argmax(nn.Softmax(dim=1)(y_pred),dim=1).data
  69.     return accuracy_score(y_true.cpu().numpy(),y_pred_cls.cpu().numpy())  # move the data to the CPU before converting to numpy arrays
  70. device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
  71. model.compile(loss_func = nn.CrossEntropyLoss(),
  72.               optimizer = torch.optim.SGD(model.parameters(), lr = 0.02),
  73.               metrics_dict={"accuracy":accuracy}, device = device)  # note: the device is specified at compile time
  74. dfhistory = model.fit(3,dl_train = dl_train, dl_val=dl_valid,log_step_freq=100)
  75. out:
  76. # training log omitted
  77. ......
  78. +-------+-------+----------+----------+--------------+
  79. | epoch | loss | accuracy | val_loss | val_accuracy |
  80. +-------+-------+----------+----------+--------------+
  81. | 3 | 0.328 | 0.908 | 0.202 | 0.945 |
  82. +-------+-------+----------+----------+--------------+
  83. ================================================================================2022-03-29 13:39:01
  84. Finished Training...
  85. # evaluate the model
  86. %matplotlib inline
  87. %config InlineBackend.figure_format = 'svg'
  88. import matplotlib.pyplot as plt
  89. def plot_metric(dfhistory, metric):
  90.     train_metrics = dfhistory[metric]
  91.     val_metrics = dfhistory['val_'+metric]
  92.     epochs = range(1, len(train_metrics) + 1)
  93.     plt.plot(epochs, train_metrics, 'bo--')
  94.     plt.plot(epochs, val_metrics, 'ro-')
  95.     plt.title('Training and validation '+ metric)
  96.     plt.xlabel("Epochs")
  97.     plt.ylabel(metric)
  98.     plt.legend(["train_"+metric, 'val_'+metric])
  99.     plt.show()
  100. plot_metric(dfhistory,"loss")
  101. plot_metric(dfhistory,"accuracy")
  102. model.evaluate(dl_valid)
  103. out:
  104. # training curves omitted
  105. {'val_loss': 0.20189599447612522, 'val_accuracy': 0.9454113924050633}
  106. # save the model
  107. # save the model parameters
  108. torch.save(model.state_dict(), "data/7-2-4_model_parameter.pkl")
  109. model_clone = torchkeras.Model(CnnModel())
  110. model_clone.load_state_dict(torch.load("data/7-2-4_model_parameter.pkl"))
  111. model_clone.compile(loss_func = nn.CrossEntropyLoss(),
  112.                     optimizer = torch.optim.Adam(model_clone.parameters(), lr = 0.02),  # optimize the clone's own parameters
  113.                     metrics_dict={"accuracy":accuracy}, device = device)  # note: the device is specified at compile time
  114. model_clone.evaluate(dl_valid)
  115. out:
  116. {'val_loss': 0.20189599447612522, 'val_accuracy': 0.9454113924050633}
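
One caveat if the net really is wrapped in nn.DataParallel as sketched above: the wrapper prefixes every key of the state_dict with "module.", so the parameters are usually saved from the inner module so that a plain, unwrapped model can load them later. A hedged sketch (it assumes torchkeras.Model keeps the wrapped network as model.net, as the 7.2.5 example does; the file name is only illustrative):

    # Hedged sketch: save the unwrapped module's parameters when DataParallel is in use,
    # so a single-GPU or CPU clone can load them without "module." key-name mismatches.
    net_to_save = model.net.module if isinstance(model.net, nn.DataParallel) else model.net
    torch.save(net_to_save.state_dict(), "data/7-2-4_net_parameter.pkl")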

7.2.5 Using torchkeras.LightModel with a GPU/TPU
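
torchkeras.LightModel hands the training loop over to pytorch_lightning: the user only implements shared_step and configure_optimizers, and pl.Trainer decides through its gpus / tpu_cores arguments whether to run on the CPU, on one or more GPUs, or on a TPU, as the example below shows.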

  1. # Example 7-2-5: torchkeras.LightModel with a GPU/TPU
  2. # prepare the data
  3. import torch
  4. from torch import nn
  5. import torchvision
  6. from torchvision import transforms
  7. import torchkeras
  8. transform = transforms.Compose([transforms.ToTensor()])
  9. ds_train = torchvision.datasets.MNIST(root="./data/minist/",train=True,download=True,transform=transform)
  10. ds_valid = torchvision.datasets.MNIST(root="./data/minist/",train=False,download=True,transform=transform)
  11. dl_train = torch.utils.data.DataLoader(ds_train, batch_size=128,shuffle=True, num_workers=4)
  12. dl_valid = torch.utils.data.DataLoader(ds_valid, batch_size=128,shuffle=False, num_workers=4)
  13. print(len(ds_train))
  14. print(len(ds_valid))
  15. out:
  16. 60000
  17. 10000
  1. # define the model
  2. import torchkeras
  3. import torchmetrics
  4. import pytorch_lightning as pl
  5. from sklearn.metrics import accuracy_score
  6. # define a CNN network
  7. class CnnNet(nn.Module):
  8.     def __init__(self):
  9.         super().__init__()
  10.         self.layers = nn.ModuleList([
  11.             nn.Conv2d(in_channels=1,out_channels=32,kernel_size = 3),
  12.             nn.MaxPool2d(kernel_size = 2,stride = 2),
  13.             nn.Conv2d(in_channels=32,out_channels=64,kernel_size = 5),
  14.             nn.MaxPool2d(kernel_size = 2,stride = 2),
  15.             nn.Dropout2d(p = 0.1),
  16.             nn.AdaptiveMaxPool2d((1,1)),
  17.             nn.Flatten(),
  18.             nn.Linear(64,32),
  19.             nn.ReLU(),
  20.             nn.Linear(32,10)]
  21.         )
  22.     def forward(self,x):
  23.         for layer in self.layers:
  24.             x = layer(x)
  25.         return x
  26. # define a LightModel
  27. class Model(torchkeras.LightModel):
  28.     def shared_step(self,batch)->dict:
  29.         x, y = batch
  30.         prediction = self(x)
  31.         loss = nn.CrossEntropyLoss()(prediction,y)
  32.         preds = torch.argmax(nn.Softmax(dim=1)(prediction),dim=1).data
  33.         acc = accuracy_score(preds.cpu(), y.cpu())
  34.         dic = {"loss":loss,"acc":acc}
  35.         return dic
  36.     def configure_optimizers(self):
  37.         self = self.cuda()
  38.         optimizer = torch.optim.Adam(self.parameters(), lr=1e-2)
  39.         lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,step_size=10, gamma=0.0001)
  40.         return {"optimizer":optimizer,"lr_scheduler":lr_scheduler}
  41. pl.seed_everything(8888)
  42. net = CnnNet()
  43. model = Model(net)
  44. torchkeras.summary(model,input_shape=(1,32,32))
  45. print(model)
  46. out:
  47. Global seed set to 8888
  48. ----------------------------------------------------------------
  49. Layer (type) Output Shape Param #
  50. ================================================================
  51. Conv2d-1 [-1, 32, 30, 30] 320
  52. MaxPool2d-2 [-1, 32, 15, 15] 0
  53. Conv2d-3 [-1, 64, 11, 11] 51,264
  54. MaxPool2d-4 [-1, 64, 5, 5] 0
  55. Dropout2d-5 [-1, 64, 5, 5] 0
  56. AdaptiveMaxPool2d-6 [-1, 64, 1, 1] 0
  57. Flatten-7 [-1, 64] 0
  58. Linear-8 [-1, 32] 2,080
  59. ReLU-9 [-1, 32] 0
  60. Linear-10 [-1, 10] 330
  61. ================================================================
  62. Total params: 53,994
  63. Trainable params: 53,994
  64. Non-trainable params: 0
  65. ----------------------------------------------------------------
  66. Input size (MB): 0.003906
  67. Forward/backward pass size (MB): 0.359695
  68. Params size (MB): 0.205971
  69. Estimated Total Size (MB): 0.569572
  70. ----------------------------------------------------------------
  71. Model(
  72. (net): CnnNet(
  73. (layers): ModuleList(
  74. (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  75. (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  76. (2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
  77. (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  78. (4): Dropout2d(p=0.1, inplace=False)
  79. (5): AdaptiveMaxPool2d(output_size=(1, 1))
  80. (6): Flatten(start_dim=1, end_dim=-1)
  81. (7): Linear(in_features=64, out_features=32, bias=True)
  82. (8): ReLU()
  83. (9): Linear(in_features=32, out_features=10, bias=True)
  84. )
  85. )
  86. )
  1. # train the model
  2. ckpt_cb = pl.callbacks.ModelCheckpoint(monitor='val_loss')
  3. # set gpus=0 will use cpu,
  4. # set gpus=1 will use 1 gpu
  5. # set gpus=2 will use 2gpus
  6. # set gpus = -1 will use all gpus
  7. # you can also set gpus = [0,1] to use the given gpus
  8. # you can even set tpu_cores=2 to use two tpus
  9. trainer = pl.Trainer(max_epochs=10,gpus = 1, callbacks=[ckpt_cb])
  10. trainer.fit(model,dl_train,dl_valid)
  11. out:
  12. # training log omitted
  13. # evaluate the model
  14. import pandas as pd
  15. history = model.history
  16. dfhistory = pd.DataFrame(history)
  17. dfhistory
  18. out:
  19. val_loss val_acc loss acc epoch
  20. 0 0.110953 0.966574 0.314087 0.898299 0
  21. 1 0.086179 0.972805 0.111056 0.966251 1
  22. 2 0.074870 0.976068 0.098939 0.970471 2
  23. 3 0.099406 0.972607 0.092866 0.972920 3
  24. 4 0.060055 0.981705 0.082293 0.976601 4
  25. 5 0.077828 0.977947 0.072885 0.978428 5
  26. 6 0.062522 0.983782 0.077495 0.977951 6
  27. 7 0.060598 0.982991 0.069507 0.979800 7
  28. 8 0.106252 0.975672 0.077156 0.978478 8
  29. 9 0.061882 0.982892 0.068871 0.980821 9
  30. %matplotlib inline
  31. %config InlineBackend.figure_format = 'svg'
  32. import matplotlib.pyplot as plt
  33. def plot_metric(dfhistory, metric):
  34.     train_metrics = dfhistory[metric]
  35.     val_metrics = dfhistory['val_'+metric]
  36.     epochs = range(1, len(train_metrics) + 1)
  37.     plt.plot(epochs, train_metrics, 'bo--')
  38.     plt.plot(epochs, val_metrics, 'ro-')
  39.     plt.title('Training and validation '+ metric)
  40.     plt.xlabel("Epochs")
  41.     plt.ylabel(metric)
  42.     plt.legend(["train_"+metric, 'val_'+metric])
  43.     plt.show()
  44. plot_metric(dfhistory,"loss")
  45. plot_metric(dfhistory,"acc")
  46. results = trainer.test(model, test_dataloaders=dl_valid, verbose = False)
  47. print(results[0])
  48. out:
  49. # plots omitted
  50. Testing: 100%|█████████████████████████████████████████████████████████████████████████| 79/79 [00:02<00:00, 26.77it/s]
  51. {'test_loss': 0.06257420778274536, 'test_acc': 0.9827}
  52. # use the model
  53. def predict(model,dl):
  54.     model.eval()
  55.     preds = torch.cat([model.forward(t[0].to(model.device)) for t in dl])
  56.     result = torch.argmax(nn.Softmax(dim=1)(preds),dim=1).data
  57.     return(result.data)
  58. result = predict(model,dl_valid)
  59. result
  60. out:
  61. tensor([7, 2, 1, ..., 4, 5, 6])
  62. # save the model
  63. print(ckpt_cb.best_model_score)
  64. model.load_from_checkpoint(ckpt_cb.best_model_path)
  65. best_net = model.net
  66. torch.save(best_net.state_dict(),"./data/7-2-5_net.pt")
  67. net_clone = CnnNet()
  68. net_clone.load_state_dict(torch.load("./data/7-2-5_net.pt"))
  69. model_clone = Model(net_clone)
  70. trainer = pl.Trainer()
  71. result = trainer.test(model_clone,test_dataloaders=dl_valid, verbose = False)
  72. print(result)
  73. out:
  74. GPU available: True, used: False
  75. TPU available: False, using: 0 TPU cores
  76. IPU available: False, using: 0 IPUs
  77. tensor(0.0607, device='cuda:0')
  78. Testing: 100%|█████████████████████████████████████████████████████████████████████████| 79/79 [00:04<00:00, 17.98it/s]
  79. [{'test_loss': 0.06257420033216476, 'test_acc': 0.9827}]
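
A small caveat about the checkpoint step above: in pytorch_lightning, load_from_checkpoint is a classmethod that returns a new model instance instead of loading the weights in place, so its result should normally be assigned before extracting the best weights. A hedged sketch (it assumes the checkpoint can rebuild Model when the net is passed in explicitly):

    # Hedged sketch: load_from_checkpoint returns a NEW model, so assign the result
    best_model = Model.load_from_checkpoint(ckpt_cb.best_model_path, net=CnnNet())
    best_net = best_model.net
    torch.save(best_net.state_dict(), "./data/7-2-5_net.pt")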

7.3  Links to the materials used in these notes

7.3.1 The data folder (datasets)

Datasets are the raw material of deep learning. The common datasets can all be found online; I have collected and organized them under the data folder. Some of them were preprocessed to suit my own training, so they may differ slightly from the versions you download directly.

It includes the animal dataset, the cifar10 dataset, the douban dataset, the MNIST dataset, the Titanic dataset, the stock data, the TensorBoard log files, the pkl files saved by the various exercises, and so on.

Link: https://pan.baidu.com/s/17BVjAqZEZsAcdQhJ7Yyb3A
Extraction code: sspa

7.3.2 The pytorchStudy folder (code)

This folder collects the example code for every chapter, all in ipynb format. Jupyter makes it easy to test code cell by cell and to demonstrate results, so all the code in these notes was written in Jupyter.

Link: https://pan.baidu.com/s/1i1jTkU4xm29pw-tyNVwSQw
Extraction code: mltf
