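Time-Series Analysis (4-5)
Time-series analysis is a set of methods for statistically analysing and forecasting data recorded in time order. It is typically used to study how a phenomenon changes over time and to predict its future trend from those patterns. Below are some key aspects and common methods of time-series analysis (points 4-5):
Time-series analysis is widely used in finance, economics, meteorology, sales and many other fields. It helps us understand the patterns behind the data, forecast future trends, and make better-informed decisions on that basis.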
```shell
pip install pandas matplotlib statsmodels
```
Read the data with pd.read_csv() and convert the month labels to dates.
```python
import io

import pandas as pd

# Read the data and preprocess it
data = """
"Month","Sales"
"1-01",266.0
"1-02",145.9
"1-03",183.1
"1-04",119.3
"1-05",180.3
"1-06",168.5
"1-07",231.8
"1-08",224.5
"1-09",192.8
"1-10",122.9
"1-11",336.5
"1-12",185.9
"2-01",194.3
"2-02",149.5
"2-03",210.1
"2-04",273.3
"2-05",191.4
"2-06",287.0
"2-07",226.0
"2-08",303.6
"2-09",289.9
"2-10",421.6
"2-11",264.5
"2-12",342.3
"3-01",339.7
"3-02",440.4
"3-03",315.9
"3-04",439.3
"3-05",401.3
"3-06",437.4
"3-07",575.5
"3-08",407.6
"3-09",682.0
"3-10",475.3
"3-11",581.3
"3-12",646.9
"""

# Convert the string into a DataFrame
data = pd.read_csv(io.StringIO(data))

# 'Month' has the form "<year index>-<month>": "1-01" is month 1 of year 1.
# The 36 rows span three years, so they cannot all be placed in one calendar year.
# Prefixing "200" maps years 1-3 onto 2001-2003 (an arbitrary choice of calendar years)
# so that the labels parse with the '%Y-%m' format.
data['Month'] = pd.to_datetime('200' + data['Month'], format='%Y-%m')
print(data)

# Use the date column as the index and give it a month-start frequency
data = data.set_index('Month').asfreq('MS')
```
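The month labels are easy to misread. The sketch below uses a few rows in the same "<year index>-<month>" layout to compare two parses. Reading the labels as '%m-%d' places every row in 1900 and turns the month number into a day of the month. Prefixing "200" and parsing with '%Y-%m' gives one date per month instead. The variable names and the 2001 base year are illustrative assumptions, not part of the original code.

```python
import io

import pandas as pd

# A few rows in the same "<year index>-<month>" layout as the article's data
raw = '''"Month","Sales"
"1-01",266.0
"1-12",185.9
"2-01",194.3
"3-12",646.9
'''
df = pd.read_csv(io.StringIO(raw))

# Misparse: "1-12" is read as month 1, day 12, and the year defaults to 1900
wrong = pd.to_datetime(df['Month'], format='%m-%d')

# Correct parse: "1-12" becomes "2001-12" (base year 2001 is a hypothetical choice)
right = pd.to_datetime('200' + df['Month'], format='%Y-%m')

print(wrong.dt.strftime('%Y-%m-%d').tolist())
print(right.dt.strftime('%Y-%m').tolist())
```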
Sample data (monthly sales; in the raw labels "1-01" means month 01 of year 1):

| Year | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 266.0 | 145.9 | 183.1 | 119.3 | 180.3 | 168.5 | 231.8 | 224.5 | 192.8 | 122.9 | 336.5 | 185.9 |
| 2 | 194.3 | 149.5 | 210.1 | 273.3 | 191.4 | 287.0 | 226.0 | 303.6 | 289.9 | 421.6 | 264.5 | 342.3 |
| 3 | 339.7 | 440.4 | 315.9 | 439.3 | 401.3 | 437.4 | 575.5 | 407.6 | 682.0 | 475.3 | 581.3 | 646.9 |
```python
import matplotlib.pyplot as plt

# Plot the time series
data.plot(figsize=(10, 6))
plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Time Series of Sales')
plt.show()
```
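A trend is often easier to see once month-to-month noise is smoothed out. The sketch below computes a trailing 12-month rolling mean and the average of each year. Both are plain pandas operations. The yearly averages rise from year to year, which is the kind of trend that makes a series non-stationary. The 2001-2003 dates are a hypothetical mapping of the article's year indices 1-3.

```python
import pandas as pd

# Monthly sales from the article; years 1-3 mapped (hypothetically) to 2001-2003
sales = [266.0, 145.9, 183.1, 119.3, 180.3, 168.5, 231.8, 224.5, 192.8, 122.9, 336.5, 185.9,
         194.3, 149.5, 210.1, 273.3, 191.4, 287.0, 226.0, 303.6, 289.9, 421.6, 264.5, 342.3,
         339.7, 440.4, 315.9, 439.3, 401.3, 437.4, 575.5, 407.6, 682.0, 475.3, 581.3, 646.9]
series = pd.Series(sales, index=pd.date_range('2001-01-01', periods=36, freq='MS'), name='Sales')

# Trailing 12-month mean: NaN until a full year of data is available
trend = series.rolling(window=12).mean()

# Average sales per calendar year
yearly_mean = series.groupby(series.index.year).mean()
print(yearly_mean)
```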
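```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Check the stationarity of the series
# Autocorrelation (ACF) plot
plot_acf(data['Sales'], lags=9)
plt.title('Autocorrelation Plot')
plt.show()

# Partial autocorrelation (PACF) plot
plot_pacf(data['Sales'], lags=9)
plt.title('Partial Autocorrelation Plot')
plt.show()
```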
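Note: the null hypothesis of the ADF test is that the series has a unit root, i.e. it is non-stationary. If p > 0.05, the null hypothesis cannot be rejected at the 5% level, and the series is treated as non-stationary.
```python
from statsmodels.tsa.stattools import adfuller

# ADF test
adf_result = adfuller(data['Sales'])
print('ADF Statistic:', adf_result[0])
print('p-value:', adf_result[1])
print('Critical Values:', adf_result[4])
```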
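Note: the result of the previous step shows that the series is non-stationary. Before a model can be fitted to it, the series has to be made stationary, so this step applies first-order differencing.
```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller

# First-order differencing: Sales[t] - Sales[t-1]; the first month has no predecessor and is dropped
diff_data = data.diff().dropna()

# Plot the differenced series
diff_data.plot(figsize=(10, 6))
plt.xlabel('Month')
plt.ylabel('Sales (Differenced)')
plt.title('Differenced Time Series of Sales')
plt.show()

# Stationarity checks on the differenced series
# Autocorrelation (ACF) plot
plot_acf(diff_data['Sales'], lags=9)
plt.title('Autocorrelation Plot (Differenced)')
plt.show()

# Partial autocorrelation (PACF) plot
plot_pacf(diff_data['Sales'], lags=9)
plt.title('Partial Autocorrelation Plot (Differenced)')
plt.show()

# ADF test
adf_result_diff = adfuller(diff_data['Sales'])
print('ADF Statistic (Differenced):', adf_result_diff[0])
print('p-value (Differenced):', adf_result_diff[1])
print('Critical Values (Differenced):', adf_result_diff[4])
```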
The differenced series is then tested for stationarity again (ACF plot, PACF plot, and ADF test).
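Differencing is reversible. The first value plus the cumulative sum of the differences gives back the original series. This is why the ARIMA model can be fitted on the original (non-stationary) data with d=1: ARIMA applies the differencing internally, and its forecasts come back on the original sales scale. The sketch below demonstrates this round trip on the first six months of the data.

```python
import numpy as np
import pandas as pd

sales = pd.Series([266.0, 145.9, 183.1, 119.3, 180.3, 168.5])  # first six months of the data

# First difference: y'_t = y_t - y_{t-1}; the first value has no predecessor
diff = sales.diff().dropna()

# Undo it: y_t = y_0 + (y'_1 + ... + y'_t)
restored = pd.concat([sales.iloc[:1], sales.iloc[0] + diff.cumsum()])

print(diff.round(1).tolist())
print(restored.tolist())
```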
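```python
from statsmodels.tsa.arima.model import ARIMA

# Fit an ARIMA(1, 1, 1) model on the original series; d=1 means the model differences it internally
model = ARIMA(data, order=(1, 1, 1))
model_fit = model.fit()

# Forecast sales for the next 5 months
forecast_steps = 5
forecast = model_fit.forecast(steps=forecast_steps)
print(forecast)
```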
Complete code:
```python
import io

import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

# Read the data and preprocess it
data = """
"Month","Sales"
"1-01",266.0
"1-02",145.9
"1-03",183.1
"1-04",119.3
"1-05",180.3
"1-06",168.5
"1-07",231.8
"1-08",224.5
"1-09",192.8
"1-10",122.9
"1-11",336.5
"1-12",185.9
"2-01",194.3
"2-02",149.5
"2-03",210.1
"2-04",273.3
"2-05",191.4
"2-06",287.0
"2-07",226.0
"2-08",303.6
"2-09",289.9
"2-10",421.6
"2-11",264.5
"2-12",342.3
"3-01",339.7
"3-02",440.4
"3-03",315.9
"3-04",439.3
"3-05",401.3
"3-06",437.4
"3-07",575.5
"3-08",407.6
"3-09",682.0
"3-10",475.3
"3-11",581.3
"3-12",646.9
"""

# Convert the string into a DataFrame
data = pd.read_csv(io.StringIO(data))

# 'Month' has the form "<year index>-<month>": "1-01" is month 1 of year 1.
# Prefixing "200" maps years 1-3 onto 2001-2003 (an arbitrary choice of calendar years).
data['Month'] = pd.to_datetime('200' + data['Month'], format='%Y-%m')
print(data)
# Use the date column as the index, with month-start frequency
data = data.set_index('Month').asfreq('MS')

# Plot the time series
data.plot(figsize=(10, 6))
plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Time Series of Sales')
plt.show()

# Check the stationarity of the series
# Autocorrelation (ACF) plot
plot_acf(data['Sales'], lags=9)
plt.title('Autocorrelation Plot')
plt.show()

# Partial autocorrelation (PACF) plot
plot_pacf(data['Sales'], lags=9)
plt.title('Partial Autocorrelation Plot')
plt.show()

# ADF test (H0: unit root; p > 0.05 -> treat as non-stationary)
adf_result = adfuller(data['Sales'])
print('ADF Statistic:', adf_result[0])
print('p-value:', adf_result[1])
print('Critical Values:', adf_result[4])

# First-order differencing
diff_data = data.diff().dropna()

# Plot the differenced series
diff_data.plot(figsize=(10, 6))
plt.xlabel('Month')
plt.ylabel('Sales (Differenced)')
plt.title('Differenced Time Series of Sales')
plt.show()

# Stationarity checks on the differenced series
# Autocorrelation (ACF) plot
plot_acf(diff_data['Sales'], lags=9)
plt.title('Autocorrelation Plot (Differenced)')
plt.show()

# Partial autocorrelation (PACF) plot
plot_pacf(diff_data['Sales'], lags=9)
plt.title('Partial Autocorrelation Plot (Differenced)')
plt.show()

# ADF test
adf_result_diff = adfuller(diff_data['Sales'])
print('ADF Statistic (Differenced):', adf_result_diff[0])
print('p-value (Differenced):', adf_result_diff[1])
print('Critical Values (Differenced):', adf_result_diff[4])

# Fit an ARIMA(1, 1, 1) model on the original series (d=1 differences it internally)
model = ARIMA(data, order=(1, 1, 1))
model_fit = model.fit()

# Print the model summary
print(model_fit.summary())

# Forecast sales for the next 5 months
forecast_steps = 5
forecast = model_fit.forecast(steps=forecast_steps)

# Because the index has a monthly frequency, the forecast is already indexed by the future months
forecast_df = pd.DataFrame({
    'Month': forecast.index.strftime('%Y-%m'),
    'Forecasted Sales': forecast.values,
})

# Output the forecast
print(f'Forecasted Sales for the next {forecast_steps} months:')
print(forecast_df)
```