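Knowing how to use a deep learning framework is a basic skill for tackling deep learning tasks. Here we recommend PaddlePaddle, a mature and easy-to-use open-source framework developed in China. The following walks you through a quick start with PaddlePaddle; try running a small demo to get familiar with its basic commands.
PaddlePaddle is a deep learning framework open-sourced by Baidu. Similar frameworks include Google's TensorFlow and Facebook's PyTorch. When you start learning deep learning, picking up one widely used framework can make learning much more efficient. In PaddlePaddle the objects of computation are tensors, so let us first use PaddlePaddle to compute [[1, 1], [1, 1]] + [[1, 1], [1, 1]].
First, import the PaddlePaddle library.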
In [1]
- import paddle
-
- paddle.__version__
'2.0.0'
Define two constant tensors, x1 and x2, each with shape [2, 2], filled with ones, and of type int64.
In [2]
- # Define two tensors
- x1 = paddle.ones([2,2], dtype='int64')
- x2 = paddle.ones([2,2], dtype='int64')
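Next, define an operation that adds the two tensors above and returns their element-wise sum as a new tensor. PaddlePaddle provides a large number of operations, such as addition, subtraction, multiplication, division, and trigonometric functions.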
In [3]
- # Add the two tensors
- y1 = paddle.add(x1, x2)
-
- # Inspect the result
- print(y1)
- Tensor(shape=[2, 2], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
- [[2, 2],
- [2, 2]])
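To make the result above concrete, here is a minimal plain-Python sketch of what an element-wise addition computes. The helper name elementwise_add is hypothetical and only stands in for the idea behind paddle.add; it is not how Paddle implements the operation.

```python
# Plain-Python illustration of element-wise addition on nested lists.
# elementwise_add is a hypothetical helper, not part of PaddlePaddle.
def elementwise_add(a, b):
    """Add two equally shaped 2-D lists element by element."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

x1 = [[1, 1], [1, 1]]
x2 = [[1, 1], [1, 1]]
y1 = elementwise_add(x1, x2)
print(y1)  # [[2, 2], [2, 2]]
```

Each output entry is the sum of the two entries at the same position, which matches the tensor of 2s printed by the Paddle cell.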
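The section above showed basic tensor operations in PaddlePaddle. Next we use PaddlePaddle for a simple linear regression: defining the network, training it on custom data, and finally checking the network's predictions.
First, import PaddlePaddle and a few utility libraries.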
In [4]
- import paddle
- import numpy as np
-
- paddle.__version__
'2.0.0'
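We use numpy to define a set of data in which each sample has 13 values; for the sake of the example, every value except the first is 0. The data follows y = 2 * x + 1, but the program does not know that. We will train on this data and see whether the neural network can learn a model that fits the function. Finally we define one test sample, which is fed in as x after training to check whether the prediction is close to the correct value.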
In [5]
- # Define the training and test data
- x_data = np.array([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
- [2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
- [3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
- [4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
- [5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]).astype('float32')
- y_data = np.array([[3.0], [5.0], [7.0], [9.0], [11.0]]).astype('float32')
- test_data = np.array([[6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]).astype('float32')
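As a quick sanity check on the data, the sketch below confirms in plain Python that each label equals 2 * x + 1 for the first feature, and computes the label we would expect for the test sample. The lists are copied by hand from the arrays in the cell; the function name target is an illustrative assumption.

```python
# First feature of each training row and its label, copied from the arrays above.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

def target(x):
    """The rule the training data was generated from: y = 2 * x + 1."""
    return 2 * x + 1

# Every training label should match the rule exactly.
assert all(target(x) == y for x, y in zip(xs, ys))

# The test sample has x = 6.0, so a well-trained model should predict about 13.
expected_test_label = target(6.0)
print(expected_test_label)  # 13.0
```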
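Define a simple network with the following structure:
[input layer] --> [hidden layer] --> [activation function] --> [output layer]
More concretely, it is a fully connected layer with 100 outputs, followed by a ReLU activation and a fully connected layer with 1 output. That is all it takes to build a very simple network.
The input size is set to 13 because each sample in the Boston housing dataset has 13 attributes, and the custom data defined above follows the same dimension.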
In [6]
- # Define a simple network
- net = paddle.nn.Sequential(
- paddle.nn.Linear(13, 100),
- paddle.nn.ReLU(),
- paddle.nn.Linear(100, 1)
- )
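To get a feel for the size of this network, the following sketch counts its trainable parameters by hand. It assumes each fully connected layer has a weight matrix plus one bias per output unit; the helper linear_param_count is hypothetical.

```python
# Count trainable parameters of a fully connected layer by hand.
# Assumption: weights of shape (in_features, out_features) plus one bias per output.
def linear_param_count(in_features, out_features):
    return in_features * out_features + out_features

hidden = linear_param_count(13, 100)  # first Linear layer
output = linear_param_count(100, 1)   # second Linear layer
total = hidden + output               # ReLU has no parameters
print(hidden, output, total)  # 1400 101 1501
```

Under that assumption the whole network has 1501 parameters, which is what the optimizer updates during training.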
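Next, define the optimization method used for training; here we use stochastic gradient descent (SGD). PaddlePaddle provides many optimizer interfaces; besides the SGD used in this project there are Momentum, Adagrad, and others, so you can choose the optimizer that suits your own project.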
In [8]
- # Define the optimizer
- optimizer = paddle.optimizer.SGD(learning_rate=0.01, parameters=net.parameters())
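For intuition, plain SGD moves every parameter a small step against its gradient: p_new = p - learning_rate * grad. The sketch below shows that rule on two made-up parameters; sgd_step, the parameter values, and the gradient values are all hypothetical and only illustrate the update, not Paddle's internals.

```python
# One plain SGD update: each parameter moves against its gradient.
# sgd_step and the numbers below are illustrative assumptions.
def sgd_step(params, grads, learning_rate):
    """Return new parameters after one update p - learning_rate * g."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

params = [0.5, -1.0]
grads = [2.0, -4.0]
updated = sgd_step(params, grads, learning_rate=0.01)
print(updated)
```

A positive gradient lowers the parameter and a negative gradient raises it, each by 1% of the gradient's magnitude with learning_rate=0.01.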
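Now we can train on the data. We train for 10 passes here; you can set more passes as needed, since the number of passes generally affects how far the model converges.
Because this project is a linear regression task, we use the squared error loss during training. paddle.nn.functional.square_error_cost returns the squared error of each sample in the batch, so we also take the mean. PaddlePaddle provides many other loss function interfaces, such as the cross-entropy loss paddle.nn.CrossEntropyLoss.
During training the printed loss keeps decreasing, which indicates that the model is converging.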
In [12]
- # numpy data must be converted to tensors before it can be used for training
- inputs = paddle.to_tensor(x_data)
- labels = paddle.to_tensor(y_data)
-
- # Train for 10 passes
- for pass_id in range(10):
-     out = net(inputs)
-     loss = paddle.mean(paddle.nn.functional.square_error_cost(out, labels))
-
-     loss.backward()
-     optimizer.step()
-     optimizer.clear_grad()
-
-     print("Pass:%d, Cost:%0.5f" % (pass_id, loss))
- Pass:0, Cost:0.02406
- Pass:1, Cost:0.02354
- Pass:2, Cost:0.02302
- Pass:3, Cost:0.02252
- Pass:4, Cost:0.02202
- Pass:5, Cost:0.02154
- Pass:6, Cost:0.02107
- Pass:7, Cost:0.02061
- Pass:8, Cost:0.02016
- Pass:9, Cost:0.01972
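The loop above follows the usual pattern: forward pass, mean squared error, gradients, parameter update. As a rough illustration without Paddle, the sketch below fits a single weight w and bias b to the first feature of the same data using full-batch gradient descent. The zero initialization, the variable names, and the hand-written gradients are assumptions made for this sketch; in Paddle the gradients come from loss.backward().

```python
# Plain-Python gradient descent on y = w * x + b with a mean squared error loss.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

def mse(w, b):
    """Mean of the squared differences between prediction and label."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradients(w, b):
    """Partial derivatives of the MSE with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

w, b = 0.0, 0.0          # assumed starting point
learning_rate = 0.01
losses = []
for pass_id in range(10):
    losses.append(mse(w, b))          # loss before this update
    dw, db = gradients(w, b)
    w -= learning_rate * dw           # same rule an SGD optimizer applies
    b -= learning_rate * db
    print("Pass:%d, Cost:%0.5f" % (pass_id, losses[-1]))
```

The recorded losses shrink from one pass to the next, which is the same convergence signal the Paddle training log shows.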
After training, we use the trained network to predict the test sample defined earlier. Since the data follows y = 2 * x + 1, y should be 13 when x is 6, so the output should be close to 13.
In [14]
- # Run the prediction
- predict_inputs = paddle.to_tensor(test_data)
- result = net(predict_inputs)
-
- print("When x is 6.0, y is: %0.5f" % result)
When x is 6.0, y is: 13.21323
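The Boston housing dataset (uci_housing) has 506 rows and 14 columns. The first 13 columns describe various attributes of the houses, and the last column is the median price of houses of that type.
PaddlePaddle provides an interface for reading the uci_housing dataset: paddle.text.datasets.UCIHousing.
PaddlePaddle loads data with paddle.io.DataLoader; the batch_size parameter controls the batch size, and shuffle controls whether the order is shuffled.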
In [15]
- # Import the basic libraries
- import os
- import paddle
- import numpy as np
-
-
- BATCH_SIZE=20
-
- train_dataset = paddle.text.datasets.UCIHousing(mode='train')
- valid_dataset = paddle.text.datasets.UCIHousing(mode='test')
-
- # Data loader for training: reads a shuffled batch each time and drops the final batch if it is smaller than the batch size
- train_loader = paddle.io.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)
-
- # Data loader for testing: reads a shuffled batch each time
- valid_loader = paddle.io.DataLoader(valid_dataset, batch_size=BATCH_SIZE, shuffle=True)
- Cache file /home/aistudio/.cache/paddle/dataset/uci_housing/housing.data not found, downloading http://paddlemodels.bj.bcebos.com/uci_housing/housing.data
- Begin to download
- ............
- Download finished
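To see what batch_size and drop_last do, here is a small plain-Python sketch of how sample indices could be grouped into batches. The helper batch_indices is hypothetical, shuffling is left out for clarity, and the split sizes are assumptions: 102 matches the number of test results printed in the inference output of this tutorial, and 404 is the remainder of the 506 rows.

```python
# Group sample indices into batches, optionally dropping an incomplete last batch.
# batch_indices is an illustrative helper, not the DataLoader implementation.
def batch_indices(num_samples, batch_size, drop_last):
    batches = [list(range(start, min(start + batch_size, num_samples)))
               for start in range(0, num_samples, batch_size)]
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()  # discard the short final batch
    return batches

train_batches = batch_indices(404, 20, drop_last=True)   # assumed train split size
valid_batches = batch_indices(102, 20, drop_last=False)  # assumed test split size
print(len(train_batches), len(valid_batches), len(valid_batches[-1]))  # 20 6 2
```

With these sizes, drop_last=True discards the 4 leftover training samples in each epoch, while the test loader keeps its final batch of 2.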
Print and inspect the uci_housing data.
In [16]
- # Print one sample to inspect the uci_housing data
- print(train_dataset[0])
- (array([-0.0405441 , 0.06636363, -0.32356226, -0.06916996, -0.03435197,
- 0.05563625, -0.03475696, 0.02682186, -0.37171334, -0.21419305,
- -0.33569506, 0.10143217, -0.21172912], dtype=float32), array([24.], dtype=float32))
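The feature values printed above are small and centred near zero, which suggests that the loader rescales each column before returning it. One common scheme is mean normalization, (value - mean) / (max - min). The sketch below shows that scheme on made-up numbers; treating this as the dataset's actual preprocessing is an assumption, and mean_normalize and the raw values are hypothetical.

```python
# Mean normalization of one feature column: (value - mean) / (max - min).
# The raw values are invented for illustration only.
def mean_normalize(column):
    avg = sum(column) / len(column)
    spread = max(column) - min(column)
    return [(v - avg) / spread for v in column]

raw = [10.0, 20.0, 30.0, 40.0]
scaled = mean_normalize(raw)
print(scaled)
```

After scaling, the column averages to zero and spans a range of exactly 1, which keeps features with very different units on a comparable scale for gradient descent.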
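(1) Build the network: for linear regression, the network is simply a single fully connected layer from input to output.
For the Boston housing dataset, we assume that the relationship between the attributes and the house price can be described by a linear combination of the attributes.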
In [17]
- # Input shape is [13], output shape is [1]
- net = paddle.nn.Linear(13, 1)
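(2) Define the loss function
Here we use the mean squared error loss.
square_error_cost(input, label): takes the predicted values and the target values and returns the squared error, i.e. the square of (y - y_predict).
(3) Define the optimizer
Here we use stochastic gradient descent.
In [18]
- optimizer = paddle.optimizer.SGD(learning_rate=0.001, parameters=net.parameters())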
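As a small worked example of the loss, the sketch below computes the per-sample squared error and then its mean in plain Python. The function here is a stand-in that mirrors the description of square_error_cost, not Paddle's implementation, and the prediction values are invented.

```python
# Per-sample squared error followed by the batch mean.
# This plain-Python function only mimics the described behaviour.
def square_error_cost(predictions, labels):
    return [(p - y) ** 2 for p, y in zip(predictions, labels)]

def mean(values):
    return sum(values) / len(values)

predictions = [2.5, 5.0, 8.0]  # hypothetical model outputs
labels = [3.0, 5.0, 7.0]
per_sample = square_error_cost(predictions, labels)
batch_loss = mean(per_sample)
print(per_sample)  # [0.25, 0.0, 1.0]
print(batch_loss)
```

Taking the mean turns the three per-sample errors into a single scalar, which is what the training loop needs before it can run backpropagation.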
(1) Define draw_train_process, a function that plots how the training loss changes.
In [31]
- import matplotlib.pyplot as plt
-
-
- iter = 0
- iters = []
- train_costs = []
-
- def draw_train_process(iters, train_costs):
-     title="training cost"
-     plt.title(title, fontsize=24)
-     plt.xlabel("iter", fontsize=14)
-     plt.ylabel("cost", fontsize=14)
-     plt.plot(iters, train_costs, color='red', label='training cost')
-     plt.grid()
-     plt.show()
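(2) Train and save the model
Loop over the epochs and the dataset loader, feed each batch into net, compute the loss, and then run backpropagation and the parameter update.
Note: the enumerate() function turns an iterable (such as a list, tuple, or string) into an indexed sequence, yielding each item together with its index.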
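A short illustration of enumerate() on a plain list; the batch names are placeholders standing in for the batches a data loader would yield.

```python
# enumerate() pairs each item with its position, starting at 0.
batches = ["batch-a", "batch-b", "batch-c"]  # placeholder items
pairs = list(enumerate(batches))
for batch_id, data in enumerate(batches):
    print(batch_id, data)
```

This is the same pattern the training loop uses to get a batch_id alongside each batch of data.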
In [37]
- EPOCH_NUM=50
-
- # Train for EPOCH_NUM epochs
- for pass_id in range(EPOCH_NUM):
-     # Train and periodically print the batch loss
-     train_cost = 0
-
-     # Iterate over train_loader
-     for batch_id, data in enumerate(train_loader()):
-         inputs = paddle.to_tensor(data[0])
-         labels = paddle.to_tensor(data[1])
-         out = net(inputs)
-         train_loss = paddle.mean(paddle.nn.functional.square_error_cost(out, labels))
-         train_loss.backward()
-         optimizer.step()
-         optimizer.clear_grad()
-
-         # Print the loss of every 40th batch (starting with batch 0)
-         if batch_id % 40 == 0:
-             print("Pass:%d, Cost:%0.5f" % (pass_id, train_loss))
-
-         iter = iter + BATCH_SIZE
-         iters.append(iter)
-         train_costs.append(train_loss.numpy()[0])
-
-     # Evaluate and print the loss of the last test batch
-     test_loss = 0
-
-     # Iterate over valid_loader
-     for batch_id, data in enumerate(valid_loader()):
-         inputs = paddle.to_tensor(data[0])
-         labels = paddle.to_tensor(data[1])
-         out = net(inputs)
-         test_loss = paddle.mean(paddle.nn.functional.square_error_cost(out, labels))
-
-     # Print the loss of the last test batch
-     print('Test:%d, Cost:%0.5f' % (pass_id, test_loss))
-
-     # Save the model parameters
-     paddle.save(net.state_dict(), 'fit_a_line.pdparams')
-
- draw_train_process(iters,train_costs)
Pass:0, Cost:15.57779
Test:0, Cost:4.89339
Pass:1, Cost:48.41581
Test:1, Cost:21.15764
Pass:2, Cost:29.67388
Test:2, Cost:92.65568
Pass:3, Cost:41.30642
Test:3, Cost:46.82151
Pass:4, Cost:15.56921
Test:4, Cost:8.24148
Pass:5, Cost:18.54000
Test:5, Cost:0.87809
Pass:6, Cost:9.46092
Test:6, Cost:8.06008
Pass:7, Cost:24.51844
Test:7, Cost:0.95794
Pass:8, Cost:18.74012
Test:8, Cost:12.82451
Pass:9, Cost:18.47519
Test:9, Cost:57.29168
Pass:10, Cost:18.08232
Test:10, Cost:10.81169
Pass:11, Cost:7.42231
Test:11, Cost:2.60274
Pass:12, Cost:17.55565
Test:12, Cost:0.99522
Pass:13, Cost:16.62274
Test:13, Cost:14.18472
Pass:14, Cost:18.56692
Test:14, Cost:10.12059
Pass:15, Cost:86.13505
Test:15, Cost:8.95990
Pass:16, Cost:14.57046
Test:16, Cost:3.64157
Pass:17, Cost:62.77230
Test:17, Cost:51.83986
Pass:18, Cost:23.19917
Test:18, Cost:10.84731
Pass:19, Cost:18.04765
Test:19, Cost:0.85779
Pass:20, Cost:73.51922
Test:20, Cost:20.57847
Pass:21, Cost:23.24675
Test:21, Cost:8.72573
Pass:22, Cost:17.37405
Test:22, Cost:0.15808
Pass:23, Cost:46.25659
Test:23, Cost:8.15246
Pass:24, Cost:20.21064
Test:24, Cost:2.24031
Pass:25, Cost:22.79083
Test:25, Cost:6.86873
Pass:26, Cost:9.23403
Test:26, Cost:14.55947
Pass:27, Cost:61.32742
Test:27, Cost:2.94818
Pass:28, Cost:24.68878
Test:28, Cost:9.47826
Pass:29, Cost:23.85690
Test:29, Cost:5.67390
Pass:30, Cost:47.65511
Test:30, Cost:4.89907
Pass:31, Cost:36.04547
Test:31, Cost:4.19894
Pass:32, Cost:32.32786
Test:32, Cost:45.13474
Pass:33, Cost:28.32195
Test:33, Cost:4.24918
Pass:34, Cost:51.95972
Test:34, Cost:56.99276
Pass:35, Cost:31.54094
Test:35, Cost:2.11291
Pass:36, Cost:17.42282
Test:36, Cost:21.94140
Pass:37, Cost:19.15705
Test:37, Cost:10.25884
Pass:38, Cost:18.17450
Test:38, Cost:63.03344
Pass:39, Cost:27.46397
Test:39, Cost:1.55113
Pass:40, Cost:12.79984
Test:40, Cost:4.21451
Pass:41, Cost:25.78148
Test:41, Cost:29.08631
Pass:42, Cost:28.93359
Test:42, Cost:86.07359
Pass:43, Cost:23.41241
Test:43, Cost:8.75794
Pass:44, Cost:47.23532
Test:44, Cost:4.14107
Pass:45, Cost:22.02799
Test:45, Cost:10.58521
Pass:46, Cost:44.48309
Test:46, Cost:2.41765
Pass:47, Cost:17.67356
Test:47, Cost:1.28649
Pass:48, Cost:25.02098
Test:48, Cost:5.62093
Pass:49, Cost:45.29571
Test:49, Cost:16.49093
[Figure: training cost plotted against iterations, titled "training cost" (x-axis "iter", y-axis "cost")]
(1) Define a function to visualize ground truth against predictions
In [1]
- infer_results = []
- groud_truths = []
-
- # Plot ground truth against predicted values
- def draw_infer_result(groud_truths, infer_results):
-     title='Boston'
-     plt.title(title, fontsize=24)
-     x = np.arange(1,20)
-     y = x
-     plt.plot(x, y)
-     plt.xlabel('ground truth', fontsize=14)
-     plt.ylabel('infer result', fontsize=14)
-     plt.scatter(groud_truths, infer_results, color='green',label='training cost')
-     plt.grid()
-     plt.show()
(2) Run the prediction
Load the trained model with paddle.load and use it to predict on data the model has never seen.
In [6]
- import paddle
- import numpy as np
- import matplotlib.pyplot as plt
-
-
- valid_dataset = paddle.text.datasets.UCIHousing(mode='test')
- infer_loader = paddle.io.DataLoader(valid_dataset, batch_size=200)
-
- infer_net = paddle.nn.Linear(13, 1)
- param = paddle.load('fit_a_line.pdparams')
- infer_net.set_dict(param)
-
-
- data = next(infer_loader())
- inputs = paddle.to_tensor(data[0])
- results = infer_net(inputs)
-
- for idx, item in enumerate(zip(results, data[1])):
-     print("Index:%d, Infer Result: %.2f, Ground Truth: %.2f" % (idx, item[0], item[1]))
-     infer_results.append(item[0].numpy()[0])
-     groud_truths.append(item[1].numpy()[0])
-
- draw_infer_result(groud_truths, infer_results)
Index:0, Infer Result: 11.26, Ground Truth: 8.50
Index:1, Infer Result: 13.48, Ground Truth: 5.00
Index:2, Infer Result: 10.24, Ground Truth: 11.90
Index:3, Infer Result: 18.45, Ground Truth: 27.90
Index:4, Infer Result: 12.69, Ground Truth: 17.20
Index:5, Infer Result: 17.57, Ground Truth: 27.50
Index:6, Infer Result: 17.02, Ground Truth: 15.00
Index:7, Infer Result: 15.50, Ground Truth: 17.20
Index:8, Infer Result: 4.74, Ground Truth: 17.90
Index:9, Infer Result: 13.01, Ground Truth: 16.30
Index:10, Infer Result: 2.98, Ground Truth: 7.00
Index:11, Infer Result: 11.15, Ground Truth: 7.20
Index:12, Infer Result: 13.49, Ground Truth: 7.50
Index:13, Infer Result: 9.90, Ground Truth: 10.40
Index:14, Infer Result: 12.47, Ground Truth: 8.80
Index:15, Infer Result: 14.93, Ground Truth: 8.40
Index:16, Infer Result: 18.76, Ground Truth: 16.70
Index:17, Infer Result: 17.51, Ground Truth: 14.20
Index:18, Infer Result: 17.32, Ground Truth: 20.80
Index:19, Infer Result: 12.84, Ground Truth: 13.40
Index:20, Infer Result: 14.07, Ground Truth: 11.70
Index:21, Infer Result: 11.56, Ground Truth: 8.30
Index:22, Infer Result: 15.51, Ground Truth: 10.20
Index:23, Infer Result: 16.20, Ground Truth: 10.90
Index:24, Infer Result: 14.25, Ground Truth: 11.00
Index:25, Infer Result: 13.36, Ground Truth: 9.50
Index:26, Infer Result: 16.39, Ground Truth: 14.50
Index:27, Infer Result: 16.72, Ground Truth: 14.10
Index:28, Infer Result: 18.97, Ground Truth: 16.10
Index:29, Infer Result: 16.78, Ground Truth: 14.30
Index:30, Infer Result: 16.53, Ground Truth: 11.70
Index:31, Infer Result: 14.34, Ground Truth: 13.40
Index:32, Infer Result: 15.52, Ground Truth: 9.60
Index:33, Infer Result: 11.17, Ground Truth: 8.70
Index:34, Infer Result: 7.99, Ground Truth: 8.40
Index:35, Infer Result: 14.02, Ground Truth: 12.80
Index:36, Infer Result: 14.57, Ground Truth: 10.50
Index:37, Infer Result: 17.19, Ground Truth: 17.10
Index:38, Infer Result: 18.07, Ground Truth: 18.40
Index:39, Infer Result: 17.71, Ground Truth: 15.40
Index:40, Infer Result: 13.09, Ground Truth: 10.80
Index:41, Infer Result: 13.25, Ground Truth: 11.80
Index:42, Infer Result: 17.33, Ground Truth: 14.90
Index:43, Infer Result: 18.01, Ground Truth: 12.60
Index:44, Infer Result: 17.26, Ground Truth: 14.10
Index:45, Infer Result: 16.82, Ground Truth: 13.00
Index:46, Infer Result: 16.38, Ground Truth: 13.40
Index:47, Infer Result: 18.37, Ground Truth: 15.20
Index:48, Infer Result: 17.85, Ground Truth: 16.10
Index:49, Infer Result: 20.72, Ground Truth: 17.80
Index:50, Infer Result: 15.72, Ground Truth: 14.90
Index:51, Infer Result: 15.85, Ground Truth: 14.10
Index:52, Infer Result: 13.73, Ground Truth: 12.70
Index:53, Infer Result: 14.29, Ground Truth: 13.50
Index:54, Infer Result: 17.30, Ground Truth: 14.90
Index:55, Infer Result: 18.24, Ground Truth: 20.00
Index:56, Infer Result: 18.27, Ground Truth: 16.40
Index:57, Infer Result: 19.10, Ground Truth: 17.70
Index:58, Infer Result: 19.14, Ground Truth: 19.50
Index:59, Infer Result: 21.11, Ground Truth: 20.20
Index:60, Infer Result: 19.28, Ground Truth: 21.40
Index:61, Infer Result: 17.39, Ground Truth: 19.90
Index:62, Infer Result: 14.58, Ground Truth: 19.00
Index:63, Infer Result: 15.43, Ground Truth: 19.10
Index:64, Infer Result: 16.49, Ground Truth: 19.10
Index:65, Infer Result: 17.63, Ground Truth: 20.10
Index:66, Infer Result: 18.14, Ground Truth: 19.90
Index:67, Infer Result: 19.81, Ground Truth: 19.60
Index:68, Infer Result: 19.70, Ground Truth: 23.20
Index:69, Infer Result: 22.16, Ground Truth: 29.80
Index:70, Infer Result: 15.39, Ground Truth: 13.80
Index:71, Infer Result: 14.64, Ground Truth: 13.30
Index:72, Infer Result: 18.19, Ground Truth: 16.70
Index:73, Infer Result: 12.23, Ground Truth: 12.00
Index:74, Infer Result: 17.56, Ground Truth: 14.60
Index:75, Infer Result: 19.79, Ground Truth: 21.40
Index:76, Infer Result: 20.60, Ground Truth: 23.00
Index:77, Infer Result: 23.14, Ground Truth: 23.70
Index:78, Infer Result: 24.21, Ground Truth: 25.00
Index:79, Infer Result: 19.22, Ground Truth: 21.80
Index:80, Infer Result: 18.18, Ground Truth: 20.60
Index:81, Infer Result: 20.38, Ground Truth: 21.20
Index:82, Infer Result: 18.23, Ground Truth: 19.10
Index:83, Infer Result: 19.39, Ground Truth: 20.60
Index:84, Infer Result: 13.52, Ground Truth: 15.20
Index:85, Infer Result: 10.66, Ground Truth: 7.00
Index:86, Infer Result: 7.29, Ground Truth: 8.10
Index:87, Infer Result: 14.80, Ground Truth: 13.60
Index:88, Infer Result: 16.70, Ground Truth: 20.10
Index:89, Infer Result: 21.15, Ground Truth: 21.80
Index:90, Infer Result: 21.09, Ground Truth: 24.50
Index:91, Infer Result: 18.57, Ground Truth: 23.10
Index:92, Infer Result: 16.33, Ground Truth: 19.70
Index:93, Infer Result: 20.22, Ground Truth: 18.30
Index:94, Infer Result: 21.54, Ground Truth: 21.20
Index:95, Infer Result: 19.47, Ground Truth: 17.50
Index:96, Infer Result: 20.90, Ground Truth: 16.80
Index:97, Infer Result: 22.76, Ground Truth: 22.40
Index:98, Infer Result: 21.90, Ground Truth: 20.60
Index:99, Infer Result: 25.55, Ground Truth: 23.90
Index:100, Infer Result: 24.60, Ground Truth: 22.00
Index:101, Infer Result: 22.05, Ground Truth: 11.90
[Figure: scatter plot of inferred results against ground truth, titled "Boston" (x-axis "ground truth", y-axis "infer result")]
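Besides the scatter plot, a single number can summarize how far the predictions are from the labels. As a rough sketch, the code below computes the mean absolute error over the first five (inferred, ground truth) pairs copied from the printed results. The choice of metric and the helper name mean_absolute_error are assumptions for illustration; the tutorial itself only plots the results.

```python
# Mean absolute error over the first five printed (inferred, ground truth) pairs.
pairs = [
    (11.26, 8.50),
    (13.48, 5.00),
    (10.24, 11.90),
    (18.45, 27.90),
    (12.69, 17.20),
]

def mean_absolute_error(pairs):
    """Average of |prediction - truth| over all pairs."""
    return sum(abs(pred - truth) for pred, truth in pairs) / len(pairs)

mae = mean_absolute_error(pairs)
print("%.3f" % mae)  # 5.372
```

On these five samples the predictions are off by roughly 5.4 price units on average; running the same function over all 102 pairs would give a fuller picture.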
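You have now learned the basic PaddlePaddle commands and worked through a first small example. Congratulations, you have made a start! If you want more introductory material, take a look at AI Studio, which offers selected projects and quality courses. Keep going!
Help documentation. Official PaddlePaddle (飞桨) documentation: https://www.paddlepaddle.org.cn/documentation/docs/zh/api/index_cn.html
PaddlePaddle high-level API section on GitHub: https://github.com/PaddlePaddle/hapi
Getting started with GitHub, for example saving repositories with stars: https://docs.github.com/en/github/getting-started-with-github/saving-repositories-with-stars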