The data has just two columns: a timestamp and the electricity consumption (PJME_MW):
Datetime,PJME_MW
2002-12-31 01:00:00,26498.0
2002-12-31 02:00:00,25147.0
2002-12-31 03:00:00,24574.0
2002-12-31 04:00:00,24393.0
2002-12-31 05:00:00,24860.0
2002-12-31 06:00:00,26222.0
2002-12-31 07:00:00,28702.0
2002-12-31 08:00:00,30698.0
...
2018-01-01 19:00:00,44343.0
2018-01-01 20:00:00,44284.0
2018-01-01 21:00:00,43751.0
2018-01-01 22:00:00,42402.0
2018-01-01 23:00:00,40164.0
2018-01-02 00:00:00,38608.0
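Split the data into a training set and a test set at 2015-01-01:
import pandas as pd

# Parse the first column as dates and use it as the index
pjme = pd.read_csv('PJME_hourly.csv', index_col=[0], parse_dates=[0])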
split_date = '2015-01-01'
pjme_train = pjme.loc[pjme.index <= split_date].copy()
pjme_test = pjme.loc[pjme.index > split_date].copy()
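One detail of the split is worth spelling out: a bare date such as '2015-01-01' is compared as midnight of that day, so the single reading stamped 2015-01-01 00:00:00 falls into the training set and everything after it goes to the test set. The sketch below mirrors that boundary logic with the standard library only. The three timestamps are made up for illustration and are not taken from the dataset.

```python
from datetime import datetime

# Hypothetical hourly timestamps around the split boundary (illustration only).
stamps = [
    datetime(2014, 12, 31, 23),
    datetime(2015, 1, 1, 0),
    datetime(2015, 1, 1, 1),
]

# A bare date parses to midnight of that day.
split = datetime.fromisoformat("2015-01-01")

# Same comparisons as the split above: <= goes to train, > goes to test.
train = [t for t in stamps if t <= split]
test = [t for t in stamps if t > split]

print(len(train), len(test))
```

Because the two conditions are complementary, every row ends up in exactly one of the two sets.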
Build the features:
def create_features(df, label=None):
    # Derive calendar features from the DatetimeIndex
    df['date'] = df.index  # index: DatetimeIndex
    df['hour'] = df['date'].dt.hour  # dt: DatetimeProperties, hour: Series
    df['day_of_week'] = df['date'].dt.dayofweek
    df['quarter'] = df['date'].dt.quarter
    df['month'] = df['date'].dt.month
    df['year'] = df['date'].dt.year
    df['day_of_year'] = df['date'].dt.dayofyear
    df['day_of_month'] = df['date'].dt.day
    # Series.dt.weekofyear was removed in pandas 2.0;
    # isocalendar().week returns the same ISO week number
    df['week_of_year'] = df['date'].dt.isocalendar().week.astype(int)
    X = df[['hour', 'day_of_week', 'quarter', 'month', 'year',
            'day_of_year', 'day_of_month', 'week_of_year']]
    if label:
        y = df[label]
        return X, y
    return X

# Training set
X_train, y_train = create_features(pjme_train, label='PJME_MW')
# Test set
X_test, y_test = create_features(pjme_test, label='PJME_MW')
X_train:
                     hour  day_of_week  quarter  month  year  day_of_year  day_of_month  week_of_year
Datetime
2002-12-31 01:00:00     1            1        4     12  2002          365            31             1
2002-12-31 02:00:00     2            1        4     12  2002          365            31             1
2002-12-31 03:00:00     3            1        4     12  2002          365            31             1
2002-12-31 04:00:00     4            1        4     12  2002          365            31             1
2002-12-31 05:00:00     5            1        4     12  2002          365            31             1
...
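The week_of_year value of 1 on 2002-12-31 may look odd next to day_of_year 365. It follows from ISO 8601 week numbering, where the last days of December can already belong to week 1 of the next year. A quick standard-library check of that one date:

```python
from datetime import date

d = date(2002, 12, 31)

# isocalendar() returns the ISO year, ISO week number and ISO weekday (Monday = 1).
iso_year, iso_week, iso_weekday = d.isocalendar()

print(iso_year, iso_week, iso_weekday)  # 2003 1 2
print(d.weekday())                      # 1 (Monday = 0), matching day_of_week above
```

If a model should see week numbers that increase monotonically within a calendar year, this wrap-around is something to keep in mind.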
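import xgboost as xgb

# Model
reg = xgb.XGBRegressor(n_estimators=1000)
# Train, stopping once the RMSE on the last eval set has not improved for 50 rounds
# (recent XGBoost releases expect early_stopping_rounds in the XGBRegressor constructor instead of fit())
reg.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)], early_stopping_rounds=50)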
[0] validation_0-rmse:29710.4 validation_1-rmse:28762.5
Multiple eval metrics have been passed: 'validation_1-rmse' will be used for early stopping.

Will train until validation_1-rmse hasn't improved in 50 rounds.
[1] validation_0-rmse:26822.6 validation_1-rmse:25892.2
[2] validation_0-rmse:24211.2 validation_1-rmse:23286.6
[3] validation_0-rmse:21885.1 validation_1-rmse:20967.5
[4] validation_0-rmse:19780.3 validation_1-rmse:18868.5
...
[195] validation_0-rmse:2844.33 validation_1-rmse:3754.45
[196] validation_0-rmse:2842.94 validation_1-rmse:3754.73
[197] validation_0-rmse:2840.57 validation_1-rmse:3754.88
[198] validation_0-rmse:2838.73 validation_1-rmse:3754.71
[199] validation_0-rmse:2837.81 validation_1-rmse:3753.66
Stopping. Best iteration:
[149] validation_0-rmse:2923.17 validation_1-rmse:3712.2
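In the log, training stops at round 199 because validation_1-rmse had not improved for 50 rounds since its best value at round 149. The rule behind this can be sketched in a few lines of plain Python. This is a simplified illustration and not XGBoost's actual implementation; the function name early_stop and the score list are made up for the example.

```python
def early_stop(scores, patience):
    """Return (best_round, stop_round) for validation scores where lower is better."""
    best_round, best_score = 0, float("inf")
    for i, score in enumerate(scores):
        if score < best_score:
            # New best score: remember it and keep training.
            best_round, best_score = i, score
        elif i - best_round >= patience:
            # No improvement for `patience` rounds: stop here.
            return best_round, i
    # Ran out of rounds without triggering early stopping.
    return best_round, len(scores) - 1


# Hypothetical validation scores: the best value is at round 2.
scores = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.4]
best_round, stop_round = early_stop(scores, patience=3)
print(best_round, stop_round)  # 2 5
```

With patience=50 the same rule gives the pattern seen in the log: best round 149, stop at round 199.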
# Predict
y_pred = reg.predict(X_test)
print(y_pred)
[28804.365 27663.098 27125.912 ... 34988.7 32725.598 31440.66 ]
RMSE in the training log stands for Root Mean Square Error.
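For n true values y_i and predictions p_i, RMSE = sqrt((1/n) * sum((y_i - p_i)^2)), so it is expressed in the same unit as the target. A minimal standard-library sketch follows; the helper name rmse and the sample numbers are made up for illustration and are not values from the model above.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error of two equally long, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values: the errors are 1, 0 and -2, so the mean squared error is 5/3.
sample_true = [3.0, 5.0, 7.0]
sample_pred = [2.0, 5.0, 9.0]
print(rmse(sample_true, sample_pred))
```

Applied to the tutorial's variables, the equivalent call would be rmse(list(y_test), list(y_pred)).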