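A previous post introduced a clothing classification project: https://blog.csdn.net/caoyuan666/article/details/105390193
This project likewise uses a dataset bundled with sklearn and offers a gentle introduction to regression problems in TensorFlow.
Development environment: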
tensorflow 2.0.0
sklearn 0.21.3
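First, import the required modules and print their version numbers. Some of these modules may not be used in this section but will be needed later; importing them all here means the block can simply be copied and pasted into future programs.
import matplotlib as mpl
import matplotlib.pyplot as plt
# Required to render plots inline in Jupyter Notebook
%matplotlib inline
import numpy as np
import sklearn
import pandas as pd
import os
import sys
import time
import tensorflow as tf
from tensorflow import keras

print(tf.__version__)
print(sys.version_info)
for module in mpl, np, pd, sklearn, tf, keras:
    print(module.__name__, module.__version__)
If any of these libraries is not installed, see the following link for installation instructions: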
https://blog.csdn.net/caoyuan666/article/details/104935862
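This project uses fetch_california_housing, a dataset loader that ships with sklearn. The data is a modified version of the California housing dataset, available from Luís Torgo's page (University of Porto). Luís Torgo obtained it from the StatLib repository (now closed). The dataset can also be downloaded from StatLib mirrors.
The dataset first appeared in the 1997 paper "Sparse Spatial Autoregressions" by Pace, R. Kelley and Ronald Barry, published in the journal Statistics and Probability Letters. They built it from the 1990 California census data. It contains one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).
The dataset was obtained from the StatLib repository: http://lib.stat.cmu.edu/datasets/
If this is the first run, stay connected to the internet: calling fetch_california_housing() automatically downloads the California housing dataset, which may take a while (usually no more than a few minutes). Later runs load the cached copy directly, with no waiting.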
from sklearn.datasets import fetch_california_housing
housing=fetch_california_housing()
print(housing.DESCR)
print(housing.data.shape)
print(housing.target.shape)
import pprint
pprint.pprint(housing.data[0:5])
pprint.pprint(housing.target[0:5])
Output of the two pprint calls:
array([[ 8.32520000e+00,  4.10000000e+01,  6.98412698e+00,
         1.02380952e+00,  3.22000000e+02,  2.55555556e+00,
         3.78800000e+01, -1.22230000e+02],
       [ 8.30140000e+00,  2.10000000e+01,  6.23813708e+00,
         9.71880492e-01,  2.40100000e+03,  2.10984183e+00,
         3.78600000e+01, -1.22220000e+02],
       [ 7.25740000e+00,  5.20000000e+01,  8.28813559e+00,
         1.07344633e+00,  4.96000000e+02,  2.80225989e+00,
         3.78500000e+01, -1.22240000e+02],
       [ 5.64310000e+00,  5.20000000e+01,  5.81735160e+00,
         1.07305936e+00,  5.58000000e+02,  2.54794521e+00,
         3.78500000e+01, -1.22250000e+02],
       [ 3.84620000e+00,  5.20000000e+01,  6.28185328e+00,
         1.08108108e+00,  5.65000000e+02,  2.18146718e+00,
         3.78500000e+01, -1.22250000e+02]])
array([4.526, 3.585, 3.521, 3.413, 3.422])
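To make the raw rows easier to read, the sketch below pairs the first printed row with the eight feature names that sklearn documents for this dataset (they are also available as housing.feature_names). The row values are copied from the output above, and the variable names are only illustrative. The target is expressed in units of $100,000, so the conversion at the end is a simple multiplication.

```python
# Feature names as documented for sklearn's California housing dataset
feature_names = ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
                 "Population", "AveOccup", "Latitude", "Longitude"]

# First sample and its target, copied from the printed output above
first_row = [8.3252, 41.0, 6.98412698, 1.02380952,
             322.0, 2.55555556, 37.88, -122.23]
first_target = 4.526

# Pair each value with its feature name
labeled = dict(zip(feature_names, first_row))
for name, value in labeled.items():
    print(f"{name:>10}: {value}")

# The target is the median house value in units of $100,000
median_value_usd = round(first_target * 100_000)
print("Median house value (USD):", median_value_usd)
```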
from sklearn.model_selection import train_test_split
x_train_all, x_test, y_train_all, y_test = train_test_split(
    housing.data, housing.target, random_state=7)
x_train, x_valid, y_train, y_valid = train_test_split(
    x_train_all, y_train_all, random_state=11)
print(x_train.shape,y_train.shape)
print(x_valid.shape,y_valid.shape)
print(x_test.shape,y_test.shape)
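The general form of a train_test_split call is:
X_train, X_test, y_train, y_test = train_test_split(train_data, train_target, test_size=0.3, random_state=0)
Parameter notes:
train_data: the samples to split.
train_target: the targets (labels) to split.
test_size: the share of samples placed in the test set. A float between 0 and 1 is a proportion; an integer is an absolute number of samples. The calls above omit it, so the default of 0.25 applies.
random_state: the seed for the random number generator.
The random numbers produced depend on the seed, and the relationship follows two rules:
a different seed produces different random numbers; the same seed produces the same random numbers, even across different instances.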
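Both rules about seeds can be checked directly. The sketch below uses a synthetic array with the same number of rows as the housing data (not the housing data itself) and compares the test subsets returned by repeated calls; all variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows as the housing data
X = np.arange(20640 * 2).reshape(20640, 2)
y = np.arange(20640)

# Same seed, two separate calls: the splits are identical
a_train, a_test, _, _ = train_test_split(X, y, random_state=7)
b_train, b_test, _, _ = train_test_split(X, y, random_state=7)
same_seed_equal = np.array_equal(a_test, b_test)

# Different seed: a different selection of rows
c_train, c_test, _, _ = train_test_split(X, y, random_state=11)
different_seed_equal = np.array_equal(a_test, c_test)

print(same_seed_equal, different_seed_equal)
print(a_train.shape, a_test.shape)
```

When test_size is not given, train_test_split holds out 25% of the rows, so 20,640 samples split into 15,480 and 5,160.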
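from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Learn the mean and standard deviation from the training set only
x_train_scaled = scaler.fit_transform(x_train)
# Reuse the training statistics for the validation and test sets
x_valid_scaled = scaler.transform(x_valid)
x_test_scaled = scaler.transform(x_test)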
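As a rough illustration of what the scaler does: StandardScaler subtracts each feature's mean and divides by its standard deviation. Both statistics are learned by fit (here from the training set only) and then reused by transform. The sketch below reproduces this with NumPy on made-up numbers; train, valid and demo_scaler are hypothetical names used only for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: 4 training samples and 2 validation samples, 2 features each
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
valid = np.array([[2.5, 25.0], [5.0, 50.0]])

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(train)  # learns mean and std from train
valid_scaled = demo_scaler.transform(valid)      # reuses the training statistics

# Manual equivalent: per-feature mean and population standard deviation
mean = train.mean(axis=0)
std = train.std(axis=0)
manual_valid = (valid - mean) / std

print(np.allclose(valid_scaled, manual_valid))
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

Fitting on the training set alone keeps information about the validation and test sets out of the preprocessing step.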
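# One hidden layer of 30 ReLU units, then a single linear output for regression
model = keras.models.Sequential([
    keras.layers.Dense(30, activation='relu',
                       input_shape=x_train.shape[1:]),
    keras.layers.Dense(1),
])
model.summary()
# Mean squared error loss, optimized with Adam
model.compile(loss='mean_squared_error',
              optimizer='adam')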
The model summary is as follows:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 30) 270
_________________________________________________________________
dense_1 (Dense) (None, 1) 31
=================================================================
Total params: 301
Trainable params: 301
Non-trainable params: 0
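The parameter counts in the summary can be verified by hand: a Dense layer has one weight per input-output pair plus one bias per output unit. The helper below (dense_param_count is a name made up for this sketch) checks this for the 8 input features of the housing data.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_param_count(8, 30)   # 8 features -> 30 hidden units
output = dense_param_count(30, 1)   # 30 hidden units -> 1 output
total = hidden + output
print(hidden, output, total)
```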
# Stop training once val_loss has not improved by at least 1e-2 for 5 consecutive epochs
callbacks = [keras.callbacks.EarlyStopping(patience=5, min_delta=1e-2)]
history = model.fit(x_train_scaled, y_train,
                    epochs=100,
                    validation_data=(x_valid_scaled, y_valid),
                    callbacks=callbacks)
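To show how patience and min_delta interact, here is a simplified pure-Python sketch of the stopping rule, assuming the monitored quantity is the validation loss (the callback's default). It is not the Keras implementation; early_stopping_epoch and the loss values are invented for illustration.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=1e-2):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        # An epoch counts as an improvement only if it beats the best loss by more than min_delta
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Invented losses: large gains at first, then changes smaller than min_delta
plateau = [1.0, 0.6, 0.5, 0.497, 0.496, 0.495, 0.494, 0.493, 0.492]
steady = [1.0, 0.9, 0.8, 0.7]

stop_plateau = early_stopping_epoch(plateau)
stop_steady = early_stopping_epoch(steady)
print(stop_plateau, stop_steady)
```

In the plateau series the loss keeps decreasing, but never by more than 0.01 relative to the best value, so after five such epochs training stops well before the 100-epoch limit.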
def plot_learning_curves(history):
    # Plot every recorded metric on a figure of size 8 x 5
    pd.DataFrame(history.history).plot(figsize=(8, 5))
    # Show grid lines
    plt.grid(True)
    # set_ylim sets the range of the y axis
    plt.gca().set_ylim(0, 1)
    plt.show()

plot_learning_curves(history)