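Model training:
The code for training the model is shown below.
It uses mean squared error (MSE), the most common loss function for linear regression, to measure the difference between the house prices the model predicts and the actual prices.
The loss is minimized with gradient descent, here the mini-batch stochastic variant provided by paddle.optimizer.SGD.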
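To make the two ideas concrete before the Paddle code, here is a minimal NumPy sketch of MSE and one gradient descent update for a linear model. It is only an illustration: the helper names `mse` and `sgd_step`, the synthetic data, and the learning rate of 0.1 are my own assumptions and are not part of the Paddle code in this post.

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error between predictions and targets."""
    return np.mean((y_pred - y_true) ** 2)

def sgd_step(w, b, x, y, lr):
    """One gradient descent update for the linear model y_pred = x @ w + b."""
    n = x.shape[0]
    error = x @ w + b - y             # prediction error, shape (n, 1)
    grad_w = 2.0 / n * x.T @ error    # gradient of MSE with respect to w
    grad_b = 2.0 * np.mean(error)     # gradient of MSE with respect to b
    return w - lr * grad_w, b - lr * grad_b

# Synthetic data (NOT the housing data): y = 3*x0 - 2*x1 + 1
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y = x @ np.array([[3.0], [-2.0]]) + 1.0

w, b = np.zeros((2, 1)), 0.0
loss_before = mse(x @ w + b, y)
for _ in range(200):
    w, b = sgd_step(w, b, x, y, lr=0.1)
loss_after = mse(x @ w + b, y)
print("loss before:", loss_before, "loss after:", loss_after)
```

The Paddle version below does the same thing, except that `cost.backward()` computes the gradients automatically and `optimizer.step()` applies the update.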
- import numpy as np
- import paddle
- import paddle.nn.functional as F
- import matplotlib
-
- # Split the dataset into training and test sets at a ratio of 8:2
- ratio = 0.8
- offset = int(housing_data.shape[0] * ratio)
- train_data = housing_data[:offset]
- test_data = housing_data[offset:]
-
- y_preds = []
- labels_list = []
-
- def train(model):
-     print('start training ... ')
-     # Switch the model to training mode
-     model.train()
-     EPOCH_NUM = 500
-     train_num = 0
-     optimizer = paddle.optimizer.SGD(learning_rate=0.001, parameters=model.parameters())
-     for epoch_id in range(EPOCH_NUM):
-         # Shuffle the training data at the start of every epoch
-         np.random.shuffle(train_data)
-         # Split the training data into mini-batches of 20 samples each (BATCH_SIZE)
-         mini_batches = [train_data[k: k+BATCH_SIZE] for k in range(0, len(train_data), BATCH_SIZE)]
-         for batch_id, data in enumerate(mini_batches):
-             # The first 13 columns are the features, the last column is the label
-             features_np = np.array(data[:, :13], np.float32)
-             labels_np = np.array(data[:, -1:], np.float32)
-             features = paddle.to_tensor(features_np)
-             labels = paddle.to_tensor(labels_np)
-             # Forward pass
-             y_pred = model(features)
-             cost = F.mse_loss(y_pred, label=labels)
-             # float() works whether the loss is a 0-D or a 1-element tensor
-             train_cost = float(cost)
-             # Backward pass
-             cost.backward()
-             # Minimize the loss by updating the parameters
-             optimizer.step()
-             # Clear the gradients
-             optimizer.clear_grad()
-
-             if batch_id%30 == 0 and epoch_id%50 == 0:
-                 print("Pass:%d,Cost:%0.5f"%(epoch_id, train_cost))
-
-             train_num = train_num + BATCH_SIZE
-             train_nums.append(train_num)
-             train_costs.append(train_cost)
-
- model = Regressor()
- train(model)
-
- matplotlib.use('TkAgg')
- # %matplotlib inline  (use this line instead inside a Jupyter notebook)
- draw_train_process(train_nums, train_costs)
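The shuffle-and-slice step inside the training loop can be tried on its own. The sketch below uses a synthetic stand-in array; the helper name `make_mini_batches` is hypothetical, and the 404 rows are an assumption (80% of the 506 rows I would expect in house.data), so adjust it to your own data.

```python
import numpy as np

def make_mini_batches(data, batch_size, rng):
    """Shuffle the rows in place, then slice them into consecutive mini-batches."""
    rng.shuffle(data)  # shuffles along the first axis (rows)
    return [data[k:k + batch_size] for k in range(0, len(data), batch_size)]

# Stand-in array with the same column layout as the housing data:
# 13 feature columns followed by 1 label column (the values are random).
rng = np.random.default_rng(0)
data = rng.random((404, 14)).astype(np.float32)

batches = make_mini_batches(data, batch_size=20, rng=rng)
features, labels = batches[0][:, :13], batches[0][:, -1:]
print(len(batches), batches[-1].shape, features.shape, labels.shape)
```

Note that when the number of rows is not a multiple of the batch size, the last batch is smaller than the others (4 rows here), which is one reason the per-batch cost can jump around.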

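To run the training code above, first set up the data as in my earlier post, paddle exercise (1), which begins with the code that loads the house.data dataset. Names used here without being defined in this post (such as housing_data) are expected to exist already.
Output: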
- start training ...
- Pass:0,Cost:507.42090
- Pass:50,Cost:47.54215
- Pass:100,Cost:83.45570
- Pass:150,Cost:86.61785
- Pass:200,Cost:32.05870
- Pass:250,Cost:15.67683
- Pass:300,Cost:23.19898
- Pass:350,Cost:48.89576
- Pass:400,Cost:56.87611
- Pass:450,Cost:33.11672
Next, use the trained model to make predictions:
- # Prepare the test data for prediction
- INFER_BATCH_SIZE = 100
-
- infer_features_np = np.array([data[:13] for data in test_data]).astype("float32")
- infer_labels_np = np.array([data[-1] for data in test_data]).astype("float32")
-
- infer_features = paddle.to_tensor(infer_features_np)
- infer_labels = paddle.to_tensor(infer_labels_np)
- # Switch the model to evaluation mode before predicting
- model.eval()
- fetch_list = model(infer_features)
-
- # Accumulate the squared error over the first INFER_BATCH_SIZE test samples
- sum_cost = 0
- for i in range(INFER_BATCH_SIZE):
-     infer_result = fetch_list[i][0]
-     ground_truth = infer_labels[i]
-     if i % 10 == 0:
-         print("No.%d: infer result is %.2f,ground truth is %.2f" % (i, infer_result, ground_truth))
-     cost = paddle.pow(infer_result - ground_truth, 2)
-     sum_cost += cost
- mean_loss = sum_cost / INFER_BATCH_SIZE
- print("Mean loss is:", mean_loss.numpy())

Prediction results:
- No.0: infer result is 11.91,ground truth is 8.50
- No.10: infer result is 5.13,ground truth is 7.00
- No.20: infer result is 14.46,ground truth is 11.70
- No.30: infer result is 16.31,ground truth is 11.70
- No.40: infer result is 13.45,ground truth is 10.80
- No.50: infer result is 15.81,ground truth is 14.90
- No.60: infer result is 18.66,ground truth is 21.40
- No.70: infer result is 15.23,ground truth is 13.80
- No.80: infer result is 17.96,ground truth is 20.60
- No.90: infer result is 21.41,ground truth is 24.50
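As a quick sanity check, the squared error can also be computed in one vectorized NumPy expression instead of a loop. The sketch below uses only the ten pairs printed above, so its result is not the same number as the mean loss over all 100 samples; the RMSE line is my own addition to put the error back into the price unit.

```python
import numpy as np

# The ten (prediction, ground truth) pairs printed above
pred = np.array([11.91, 5.13, 14.46, 16.31, 13.45, 15.81, 18.66, 15.23, 17.96, 21.41])
truth = np.array([8.50, 7.00, 11.70, 11.70, 10.80, 14.90, 21.40, 13.80, 20.60, 24.50])

squared_error = (pred - truth) ** 2   # element-wise squared difference
mse = squared_error.mean()            # mean squared error
rmse = np.sqrt(mse)                   # same unit as the prices ($1000)
print(f"MSE over the 10 printed samples: {mse:.2f}")
print(f"RMSE over the 10 printed samples: {rmse:.2f}")
```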
Plot the predictions against the ground truth:
- import matplotlib.pyplot as plt
-
- def plot_pred_ground(pred, ground):
-     plt.figure()
-     plt.title("Prediction vs. Ground truth", fontsize=24)
-     plt.xlabel("ground truth price(unit:$1000)", fontsize=14)
-     plt.ylabel("predict price", fontsize=14)
-     plt.scatter(ground, pred, alpha=0.5)  # scatter plot; alpha sets the transparency
-     plt.plot(ground, ground, c='red')  # red reference line where prediction equals ground truth
-     plt.show()
-
- # Convert the predictions to a NumPy array before plotting
- plot_pred_ground(fetch_list.numpy(), infer_labels_np)
Likes, bookmarks, and follows are welcome.