Install yfinance to access market data from Yahoo Finance.
%pip install yfinance
Import the required packages.
import math
from pandas_datareader import data as pdr
import numpy as np
import pandas as pd
import yfinance as yfin
import datetime as dt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
Fetch the stock price data for Apple (AAPL) from July 1, 2013 to the present.
yfin.pdr_override()
df = pdr.get_data_yahoo('AAPL', start='2013-07-01', end=dt.datetime.today())
print(df)
[Output: the printed DataFrame of AAPL price data]
We are only interested in the closing price, so we first visualize the closing price history.
#Visualize the closing price history
plt.figure(figsize=(16,8))
plt.title('Closing Price History')
plt.plot(df['Close'])
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.show()
[Figure: "Closing Price History", Close Price USD ($) against Date]
Extract the closing prices and convert them to a NumPy array. We use 80% of the data as training data.
#Create a new dataframe with only the 'Close' column
data = df.filter(['Close'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil(len(dataset)*0.8)
training_data_len
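As a small illustration of the split rule, the sketch below applies the same `math.ceil(len(...) * 0.8)` calculation to a hypothetical 11-row series. The toy values and the `toy_` names are made up for the example and are not part of the tutorial's data.

```python
import math

# Hypothetical toy series standing in for the closing prices.
toy_prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0]

# Same rule as above: round the 80% mark up to a whole number of rows.
toy_training_len = math.ceil(len(toy_prices) * 0.8)

# The first rows are used for training, the remaining rows for testing.
toy_train = toy_prices[:toy_training_len]
toy_rest = toy_prices[toy_training_len:]

print(toy_training_len, len(toy_train), len(toy_rest))
```

Because the split is by position rather than at random, the test rows are always the most recent ones, which is what a time series needs.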
Use MinMaxScaler to normalize all values to the range 0 to 1.
#Scale the data
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)
scaled_data
The values now lie between 0 and 1.
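With the default `feature_range=(0, 1)`, `MinMaxScaler` rescales each column as (x - min) / (max - min). The sketch below reproduces that arithmetic with NumPy on hypothetical prices. It also shows a variation that differs from the code above: taking the minimum and maximum from the training rows only, which is sometimes preferred so that no information from the test period enters the scaling. The helper `min_max_scale` and the toy values are assumptions made for the example.

```python
import numpy as np

# Hypothetical closing prices; the first 4 rows play the role of the training split.
toy_close = np.array([[20.0], [30.0], [25.0], [40.0], [50.0]])
toy_train_len = 4

def min_max_scale(values, lo, hi):
    """Map values linearly so that lo becomes 0 and hi becomes 1."""
    return (values - lo) / (hi - lo)

# Statistics taken from the whole series, as when fitting on the full dataset.
scaled_all = min_max_scale(toy_close, toy_close.min(), toy_close.max())

# Variation: statistics from the training rows only, so later rows may exceed 1.
train_part = toy_close[:toy_train_len]
scaled_train_only = min_max_scale(toy_close, train_part.min(), train_part.max())

print(scaled_all.ravel())
print(scaled_train_only.ravel())
```

In the second variant the last value scales to 1.5, because 50 lies above the training maximum of 40.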
Next, build the training dataset. Here we set the window size to 60, so the model uses 60 days of data to predict the value on day 61.
#Create the training data set
#Create the scaled training data set
train_data = scaled_data[0:training_data_len, :]
#Split the data into x_train and y_train data sets
x_train = []
y_train = []
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    if i<=61:
        print(x_train)
        print(y_train)
        print()
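The printed samples show how the inputs are built: in the first iteration of the for loop, the first 60 values of the dataset become the first training sample, and the 61st value becomes the corresponding y.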
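The same windowing can be followed by hand on a much smaller case. The sketch below uses a window of 3 on the hypothetical series 1..7; the helper name `make_windows` is an assumption made for the illustration.

```python
import numpy as np

def make_windows(series, window):
    """Return (X, y): each row of X holds `window` consecutive values, y is the value that follows."""
    xs, ys = [], []
    for i in range(window, len(series)):
        xs.append(series[i - window:i])  # the `window` values before position i
        ys.append(series[i])             # the value at position i is the target
    return np.array(xs), np.array(ys)

toy_series = np.arange(1, 8)  # hypothetical data: 1, 2, ..., 7
X, y = make_windows(toy_series, window=3)

print(X)
print(y)
```

A series of length n yields n - window samples, so 7 values with a window of 3 give 4 samples.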
Next, convert the data to NumPy arrays and reshape it to (samples, time steps, features) so that it matches the input format the LSTM model expects.
# Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)
#Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
x_train.shape
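To make the reshape concrete, the sketch below applies the same call to a hypothetical array of 4 samples with 3 time steps each. Only a trailing axis of length 1 is added (one feature per time step); the values themselves are unchanged.

```python
import numpy as np

# Hypothetical 2-D input: 4 samples, 3 time steps each.
toy_x = np.arange(12, dtype=float).reshape(4, 3)

# Same reshape as above: (samples, time steps) -> (samples, time steps, 1 feature).
toy_x_3d = np.reshape(toy_x, (toy_x.shape[0], toy_x.shape[1], 1))

# np.expand_dims adds the same trailing axis and is an equivalent way to write it.
same = np.expand_dims(toy_x, axis=-1)

print(toy_x.shape, toy_x_3d.shape)
```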
With the data prepared, we can start building the model.
#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
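To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard LSTM layout with bias terms (four gates, each with an input kernel, a recurrent kernel, and a bias) and ordinary dense layers; the helper names are hypothetical. The total should agree with what `model.summary()` reports.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """One weight per input-output pair plus one bias per output."""
    return input_dim * units + units

layer_counts = {
    "lstm_1": lstm_params(input_dim=1, units=50),    # 1 feature per time step
    "lstm_2": lstm_params(input_dim=50, units=50),   # fed by the first LSTM's 50 outputs
    "dense_1": dense_params(input_dim=50, units=25),
    "dense_2": dense_params(input_dim=25, units=1),
}
total = sum(layer_counts.values())

for name, count in layer_counts.items():
    print(name, count)
print("total", total)
```

Note that the window length of 60 does not appear in the count: the LSTM reuses the same weights at every time step.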
Compile the model and start training.
#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)
Once training is finished, we build the testing data used to evaluate the model's performance. The x values are again grouped into windows of 60.
#Create the testing data set
#Create a new array containing the scaled values starting 60 rows before the end of the training split
test_data = scaled_data[training_data_len - 60:, :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
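The reason for starting the test slice 60 rows before the split is that the first test target then has a full window of history. The sketch below checks this on a hypothetical 10-row array with a window of 3. For simplicity the toy uses one array for both inputs and targets, whereas the code above takes x from the scaled data and y from the unscaled data.

```python
import numpy as np

window = 3
toy_values = np.arange(10, dtype=float).reshape(-1, 1)  # hypothetical data: rows 0..9
toy_training_len = 8

# Start `window` rows before the split so the first target has a full history.
toy_test = toy_values[toy_training_len - window:, :]     # rows 5..9

toy_x_test = [toy_test[i - window:i, 0] for i in range(window, len(toy_test))]
toy_y_test = toy_values[toy_training_len:, :]            # rows 8..9

for x_window, target in zip(toy_x_test, toy_y_test):
    print(x_window, "->", target)
```

Each window lines up with exactly one target, so the two collections have the same length.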
As before, convert the test data to a NumPy array, then reshape it so that it can be fed into the model.
#Convert the data to a numpy array
x_test = np.array(x_test)
#Reshape the data
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
Use the model to get predictions, then transform the predictions from the 0-to-1 range back to the original price range.
#Get the models predicted price values
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
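For the default 0-to-1 range, undoing the scaling amounts to x = x_scaled * (max - min) + min, using the minimum and maximum the scaler learned when it was fitted. The sketch below checks the round trip with NumPy; the bounds and the scaled predictions are hypothetical.

```python
import numpy as np

# Hypothetical minimum and maximum learned during fitting.
lo, hi = 20.0, 50.0

# Hypothetical model outputs in the scaled range, shaped (n, 1) like `predictions`.
toy_scaled_predictions = np.array([[0.0], [0.5], [1.0]])

# Undo the scaling to get back to prices.
toy_prices = toy_scaled_predictions * (hi - lo) + lo

# Scaling the restored prices again should give back the scaled values.
round_trip = (toy_prices - lo) / (hi - lo)

print(toy_prices.ravel())
```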
Evaluate the model with the root mean squared error (RMSE); the smaller the RMSE, the better the model performs.
#Get the root mean squared error (RMSE)
from sklearn.metrics import mean_squared_error
import math
MSE = mean_squared_error(y_test, predictions)
RMSE = math.sqrt(MSE)
RMSE
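To see what the RMSE measures, the sketch below computes it by hand for four hypothetical price pairs: square each error, average the squares, and take the square root. The numbers are made up for the example.

```python
import math

# Hypothetical actual and predicted prices in USD.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [98.0, 103.0, 104.0, 105.0]

# Squared error for each pair: 4, 1, 9, 0.
squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]

mse = sum(squared_errors) / len(squared_errors)  # mean of the squared errors
rmse = math.sqrt(mse)                            # back in the units of the prices

print(mse, rmse)
```

Because the predictions were transformed back to prices before this step, the RMSE is expressed in dollars, and large errors weigh more than small ones because they are squared.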
Plot the original data together with the predicted values.
#Plot the data
train = data[:training_data_len]
valid = data[training_data_len:].copy()  #copy so that adding a column does not write into a slice of data
valid['Predictions'] = predictions
#Visualize the data
plt.figure(figsize=(16,8))
plt.title('Model')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])
plt.legend(['Train', 'Val', 'Predictions'], loc='lower right')
plt.show()
[Figure: "Model", showing the Train, Val, and Predictions curves, Close Price USD ($) against Date]
We can look at the differences between the actual data and the predictions.
#Show the valid and predicted prices
valid
We can use the last 60 days of a stock's data to predict its price for the next day.
#Get the quote
apple_quote = pdr.get_data_yahoo('AAPL', start='2013-07-01', end=dt.datetime.today())
#Create a new dataframe
new_df = apple_quote.filter(['Close'])
#Get the last 60 day closing price values and convert the dataframe to an array
last_60_days = new_df[-60:].values
#Scale the data to be values between 0 and 1
last_60_days_scaled = scaler.transform(last_60_days)
#Create an empty list
X_test = []
#Append the past 60 days
X_test.append(last_60_days_scaled)
#Convert the X_test data set to a numpy array
X_test = np.array(X_test)
#Reshape the data
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
#Get the predicted scaled price
pred_price = model.predict(X_test)
#Undo the scaling
pred_price = scaler.inverse_transform(pred_price)
print(pred_price)