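Prophet is a time series forecasting toolkit open-sourced by Facebook. It can be installed directly with conda as the fbprophet package (releases from 1.0 onward are published under the name prophet).
Prophet's official site: https://facebook.github.io/prophet/
As the name suggests, a "prophet" is someone who foretells the future.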
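Prophet's input generally has two columns: ds and y.
The ds (datestamp) column should be in a format that Pandas recognizes as a date: YYYY-MM-DD for a date, or YYYY-MM-DD HH:MM:SS for a timestamp.
The y column must be numeric.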
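To make the ds / y requirement concrete, here is a minimal sketch of converting an arbitrary table into that layout. The raw column names "timestamp" and "value" and the helper to_prophet_frame are made up for illustration; they are not part of Prophet.

```python
import pandas as pd

# Hypothetical raw data; the column names "timestamp" and "value" are
# invented for this example.
raw = pd.DataFrame({
    "timestamp": ["2018-03-01 00:00:00", "2018-03-01 01:00:00", "2018-03-01 02:00:00"],
    "value": ["508", "323", "274"],  # numbers stored as strings
})

def to_prophet_frame(frame, time_col, value_col):
    """Return a two-column frame in the ds / y layout described above."""
    out = pd.DataFrame({
        "ds": pd.to_datetime(frame[time_col]),   # parse to datetime64
        "y": pd.to_numeric(frame[value_col]),    # force a numeric dtype
    })
    return out.reset_index(drop=True)

prophet_df = to_prophet_frame(raw, "timestamp", "value")
print(prophet_df.dtypes)
```

Parsing explicitly with pd.to_datetime and pd.to_numeric surfaces malformed rows early, before they reach the model.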
Metro Interstate Traffic Volume Data Set
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_squared_error, mean_absolute_error

%matplotlib inline
plt.rcParams['font.sans-serif'] = 'SimHei'   # font that can display Chinese characters
plt.rcParams['axes.unicode_minus'] = False   # render minus signs correctly
plt.rcParams['figure.dpi'] = 200
plt.rcParams['text.color'] = 'black'
plt.rcParams['font.size'] = 20
plt.style.use('ggplot')
print(plt.style.available)
# ['bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale',
#  'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark-palette', 'seaborn-dark',
#  'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper',
#  'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white',
#  'seaborn-whitegrid', 'seaborn', 'Solarize_Light2', 'tableau-colorblind10', '_classic_test']
csv_files = 'Metro_Interstate_Traffic_Volume.csv'
df = pd.read_csv(csv_files)
df.set_index('date_time',inplace=True)
df.index = pd.to_datetime(df.index)
df.head()
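A quick look at the table shows that the main factors are holidays, temperature, rainfall, snowfall, and weather type; the dependent variable is the traffic volume, traffic_volume.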
df.info()
'''
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 48204 entries, 2012-10-02 09:00:00 to 2018-09-30 23:00:00
Data columns (total 8 columns):
holiday                48204 non-null object
temp                   48204 non-null float64
rain_1h                48204 non-null float64
snow_1h                48204 non-null float64
clouds_all             48204 non-null int64
weather_main           48204 non-null object
weather_description    48204 non-null object
traffic_volume         48204 non-null int64
dtypes: float64(3), int64(2), object(3)
memory usage: 3.3+ MB
'''
df.describe()
So a little data turns out to be missing, but that has little impact.
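One way to put a number on "a little data is missing" is to compare the index against a complete hourly range. The sketch below runs on a toy index rather than the real dataset, and hourly_gap_report is a hypothetical helper name.

```python
import pandas as pd

def hourly_gap_report(index):
    """Count duplicated and missing timestamps in an hourly DatetimeIndex."""
    index = pd.DatetimeIndex(index)
    # Every hour between the first and last observation.
    full = pd.date_range(index.min(), index.max(), freq=pd.Timedelta(hours=1))
    unique = index.unique()
    return {
        "rows": len(index),
        "duplicates": len(index) - len(unique),
        "missing_hours": len(full.difference(unique)),
    }

# Toy index: 02:00 is missing and 03:00 appears twice.
toy = pd.to_datetime([
    "2018-03-01 00:00", "2018-03-01 01:00",
    "2018-03-01 03:00", "2018-03-01 03:00", "2018-03-01 04:00",
])
report = hourly_gap_report(toy)
print(report)
```

On the real data the call would be hourly_gap_report(df.index), assuming df has the DatetimeIndex set up earlier.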
traffic = df[['traffic_volume']]
traffic[:].plot(style='--', figsize=(15,5), title='traffic_volume')
plt.show()
Key point: filtering by date in pandas
traffic_train = traffic.loc[(traffic.index >='2017-01') & (traffic.index <= '2018-03')].copy()
traffic_test = traffic.loc[traffic.index > '2018-03'].copy()
_ = traffic_test.rename(columns={'traffic_volume': 'TEST SET'})\
.join(traffic_train.rename(columns={'traffic_volume': 'TRAINING SET'}),how='outer') \
.plot(figsize=(20,5), title='traffic_volume', style='.')
Because the data is recorded hourly, selecting roughly two years of it is already plenty.
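The date filtering used for the split can be written in more than one way. The sketch below uses a toy hourly frame as a stand-in for traffic. It also shows one thing to watch for: in a comparison the string '2018-03' is parsed as 2018-03-01 00:00, whereas in a label slice the same string covers the whole month.

```python
import pandas as pd

# Toy hourly series standing in for the `traffic` frame (4 days, 96 rows).
idx = pd.date_range("2018-02-27", "2018-03-02 23:00", freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"traffic_volume": range(len(idx))}, index=idx)

cutoff = pd.Timestamp("2018-03-01")  # midnight at the start of March

# Boolean-mask form, as in the split above.
train_mask = toy.loc[toy.index <= cutoff]
test_mask = toy.loc[toy.index > cutoff]

# Equivalent label-slice form on a sorted DatetimeIndex (end point inclusive).
train_slice = toy.loc[:cutoff]

# Partial-string slicing treats "2018-03" as the entire month of March,
# so this selects every row of the toy frame, not just those up to midnight.
month_slice = toy.loc[:"2018-03"]

print(len(train_mask), len(test_mask), len(month_slice))
```

Passing an explicit pd.Timestamp as the cutoff keeps both forms unambiguous.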
Although Prophet does not require us to extract features by hand, we can still try it ourselves.
def create_features(df, label=None):
    """
    Creates time series features from datetime index.
    """
    df = df.copy()
    df['date'] = df.index
    df['hour'] = df['date'].dt.hour
    df['dayofweek'] = df['date'].dt.dayofweek
    df['quarter'] = df['date'].dt.quarter
    df['month'] = df['date'].dt.month
    df['year'] = df['date'].dt.year
    df['dayofyear'] = df['date'].dt.dayofyear
    df['dayofmonth'] = df['date'].dt.day
    # Series.dt.weekofyear was removed in pandas 2.0; isocalendar().week replaces it
    df['weekofyear'] = df['date'].dt.isocalendar().week.astype(int)
    X = df[['hour', 'dayofweek', 'quarter', 'month', 'year',
            'dayofyear', 'dayofmonth', 'weekofyear']]
    if label:
        y = df[label]
        return X, y
    return X

X, y = create_features(traffic, label='traffic_volume')
features_and_target = pd.concat([X, y], axis=1)
features_and_target.head()
Get a feel for how the different features affect the variable being predicted.
sns.pairplot(features_and_target.dropna(),
hue='hour',
x_vars=['hour','dayofweek',
'dayofmonth','month'],
y_vars='traffic_volume',
height=5,
plot_kws={'alpha':0.15, 'linewidth':0}
)
plt.suptitle('Traffic Volume by Hour, Day of Week, Day of Month and Month')
plt.show()
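As a numeric complement to the scatter plots, the same relationship can be summarized with a groupby. The sketch below uses synthetic data with an artificial evening peak, not the real traffic dataset, so the numbers only illustrate the technique.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data with a built-in bump around 17:00 (not the real dataset).
idx = pd.date_range("2018-01-01", periods=24 * 14, freq=pd.Timedelta(hours=1))
hours = idx.hour.to_numpy()
volume = 1000 + 3000 * np.exp(-((hours - 17) ** 2) / 8.0)
toy = pd.DataFrame({"traffic_volume": volume}, index=idx)

# Mean and range of the target for each value of a calendar feature.
by_hour = toy.groupby(toy.index.hour)["traffic_volume"].agg(["mean", "min", "max"])
by_dow = toy.groupby(toy.index.dayofweek)["traffic_volume"].mean()

peak_hour = by_hour["mean"].idxmax()
print(by_hour.loc[[3, 8, 17]])
print("peak hour:", peak_hour)
```

With the real data, replacing toy with traffic would give the per-hour and per-weekday averages that the pairplot only shows visually.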
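What can we learn from the pairplot above?
First, the colors are assigned by hour, so each color represents one hour of the day.
In the last three panels, the colors along each vertical strip show how traffic is distributed across the hours of the day.
The meaningful information comes mainly from the range over which the points are spread. From it we can see: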
from fbprophet import Prophet  # in releases 1.0 and later: from prophet import Prophet
# Setup and train model and fit
model = Prophet()
model.fit(traffic_train.reset_index().rename(columns={'date_time':'ds','traffic_volume':'y'}))
traffic_test_pred = model.predict(df=traffic_test.reset_index() \
.rename(columns={'date_time':'ds'}))
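The predict call above only needs a DataFrame with a ds column, so forecasting beyond the test set comes down to building such a frame. Below is a minimal pandas sketch; make_hourly_future is a hypothetical helper, and Prophet's own make_future_dataframe(periods=..., freq=...) method serves a similar purpose.

```python
import pandas as pd

def make_hourly_future(last_timestamp, periods):
    """Build a one-column `ds` frame of hourly timestamps after last_timestamp."""
    step = pd.Timedelta(hours=1)
    ds = pd.date_range(pd.Timestamp(last_timestamp) + step, periods=periods, freq=step)
    return pd.DataFrame({"ds": ds})

# 48 hours following the last observation in the dataset.
future = make_hourly_future("2018-09-30 23:00:00", periods=48)
print(future.head(3))
# A fitted model could then score it with model.predict(future).
```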
Plot the forecast
f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
ax.scatter(traffic_test.index, traffic_test['traffic_volume'], color='r')
fig = model.plot(traffic_test_pred, ax=ax)
This happens because:
Interested readers can experiment with this on their own.
The figure below shows:
fig = model.plot_components(traffic_test_pred)
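For context on what this plot contains: Prophet fits an additive model, and plot_components draws each fitted term in its own panel. In the notation of the Prophet paper (the holiday term only appears when holidays are configured, which the plain Prophet() call above does not do):

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here g(t) is the trend, s(t) the periodic seasonality (for example daily and weekly cycles), h(t) the holiday effects, and \varepsilon_t the error term.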
Let's look at how the model predicts the first month of the test set:
# Plot the forecast with the actuals
f, ax = plt.subplots(1)
f.set_figheight(5)
f.set_figwidth(15)
plt.plot(traffic_test.index, traffic_test['traffic_volume'], color='r')
fig = model.plot(traffic_test_pred, ax=ax)
ax.set_xbound(lower=pd.Timestamp('2018-03-01'),
              upper=pd.Timestamp('2018-04-01'))
ax.set_ylim(-1000, 8000)
plot = plt.suptitle('Forecast vs Actuals')
Looks pretty convincing, doesn't it?
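To go beyond eyeballing the plot, the forecast can be scored numerically. The sketch below computes the same quantities as the mean_squared_error and mean_absolute_error functions imported at the top, plus a percentage error, using plain NumPy. The numbers are made up for illustration and are not results from the model above; regression_errors is a hypothetical helper.

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Return MSE, MAE and MAPE (in percent) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "mse": float(np.mean(err ** 2)),
        "mae": float(np.mean(np.abs(err))),
        # MAPE is undefined when an actual value is zero.
        "mape_percent": float(np.mean(np.abs(err / y_true)) * 100),
    }

# Made-up values for illustration only.
actual = [4000, 5000, 2000, 1000]
predicted = [3800, 5500, 2100, 900]
scores = regression_errors(actual, predicted)
print(scores)
```

On the real data the inputs would be traffic_test['traffic_volume'] and the yhat column of traffic_test_pred, assuming both are in the same row order.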