赞
踩
ARIMA 模型为平稳时间序列定义的。 因此, 如果你从一个非平稳的时间序列开始, 首先你就需要做时间序列差分直到你得到一个平稳时间序列。如果你必须对时间序列做 d 阶差分才能得到一个平稳序列,那么你就使用ARIMA(p,d,q)模型,其中 d 是差分的阶数。
- #1、加载数据
- skirts <- scan("http://robjhyndman.com/tsdldata/roberts/skirts.dat",skip=5)
- str(skirts)
## num [1:46] 608 617 625 636 657 691 728 784 816 876 ...
- #从上可知:数据集为一个数字类型的向量
-
- #2、把数据转化为是时间序列
- (skirts_ts <- ts(skirts, start=c(1886), frequency = 1))
- ## Time Series:
- ## Start = 1886
- ## End = 1931
- ## Frequency = 1
- ## [1] 608 617 625 636 657 691 728 784 816 876 949 997 1027 1047
- ## [15] 1049 1018 1021 1012 1018 991 962 921 871 829 822 820 802 821
- ## [29] 819 791 746 726 661 620 588 568 542 551 541 557 556 534
- ## [43] 528 529 523 531
- #1)查看时间序列对应的时间
- time(skirts_ts)
- ## Time Series:
- ## Start = 1886
- ## End = 1931
- ## Frequency = 1
- ## [1] 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899
- ## [15] 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913
- ## [29] 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927
- ## [43] 1928 1929 1930 1931
- #2)画出时间序列图
- plot.ts(skirts_ts)
- #从上图可知:女人裙子边缘的直径做成的时间序列数据,从 1866 年到 1911 年在平均值上是不平稳的, 随着时间增加, 数值变化很大。
-
- #3、确定时间序列的差分参数d:做时间序列的一阶差分, 并画出差分序列的图
- skirts_diff <- diff(skirts_ts, differences=1)
- plot.ts(skirts_diff)
- #从一阶差分的图中可以看出,数据仍是不平稳的,继续差分
-
- #做时间序列的二阶差分
- skirts_diff2 <- diff(skirts_ts, differences=2)
- plot.ts(skirts_diff2)
- #二次差分(上面)后的时间序列在均值和方差上确实看起来像是平稳的, 随着时间推移, 时间序列的水平和方差大致保持不变。因此, 看起来我们需要对裙子直径进行两次差分以得到平稳序列。
-
- #4、找到合适的ARIMA模型:ARIMA要求时间序列是平稳的(差分来获得)
-
- #如果你的时间序列是平稳的,或者你通过做 n 次差分转化为一个平稳时间序列, 接下来就是要选择合适的 ARIMA模型,这意味着需要寻找 ARIMA(p,d,q)中合适的 p 值和 q 值。为了得到这些,通常需要检查平稳时间序列的(自)相关图和偏相关图。
-
- #方法一、使用auto.arima()函数,自动获取最佳的ARIMA模型
- library(forecast)
## Loading required package: zoo
- ##
- ## Attaching package: 'zoo'
- ## The following objects are masked from 'package:base':
- ##
- ## as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 6.2
auto.arima(skirts_ts, ic=c("aicc", "aic", "bic"), trace = T)
- ##
- ## ARIMA(2,2,2) : Inf
- ## ARIMA(0,2,0) : 393.6216
- ## ARIMA(1,2,0) : 391.6212
- ## ARIMA(0,2,1) : 392.0664
- ## ARIMA(2,2,0) : 393.9273
- ## ARIMA(1,2,1) : 393.9276
- ## ARIMA(2,2,1) : Inf
- ##
- ## Best model: ARIMA(1,2,0)
- ## Series: skirts_ts
- ## ARIMA(1,2,0)
- ##
- ## Coefficients:
- ## ar1
- ## -0.2997
- ## s.e. 0.1424
- ##
- ## sigma^2 estimated as 388.7: log likelihood=-193.66
- ## AIC=391.33 AICc=391.62 BIC=394.9
- #最优模型为arima(1, 2, 0)
-
- #方法二、使用 R 中的“acf()”和“pacf” 函数来分别( 自) 相关图和偏相关图。“acf()”和“pacf 设定“plot=FALSE” 来得到自相关和偏相关的真实值
- #1)相关图
- acf(skirts_diff2, lag.max = 20)
acf(skirts_diff2, lag.max = 20, plot=F)
- ##
- ## Autocorrelations of series 'skirts_diff2', by lag
- ##
- ## 0 1 2 3 4 5 6 7 8 9
- ## 1.000 -0.303 0.096 0.009 0.102 -0.453 0.173 -0.025 -0.039 0.073
- ## 10 11 12 13 14 15 16 17 18 19
- ## -0.094 0.133 -0.089 -0.027 -0.102 0.207 -0.260 0.114 0.101 0.011
- ## 20
- ## -0.090
- #自相关图显示滞后1阶自相关值基本没有超过边界值,虽然5阶自相关值超出边界,那么很可能属于偶然出现的,而自相关值在其他上都没有超出显著边界, 而且我们可以期望 1 到 20 之间的会偶尔超出 95%的置信边界。 自相关图5阶后结尾
-
- #2)偏自相关图
- pacf(skirts_diff2, lag.max = 20)
pacf(skirts_diff2, lag.max = 20, plot=F)
- ##
- ## Partial autocorrelations of series 'skirts_diff2', by lag
- ##
- ## 1 2 3 4 5 6 7 8 9 10
- ## -0.303 0.005 0.043 0.128 -0.439 -0.110 0.073 0.028 0.128 -0.355
- ## 11 12 13 14 15 16 17 18 19 20
- ## 0.095 0.052 -0.094 -0.103 -0.034 -0.021 -0.002 0.074 0.020 -0.034
- #偏自相关值选1阶后结尾。故我们的ARMIA模型为armia(1,2,5)
-
- #5、建立ARIMA模型:并对比arima(1, 2, 0)与arima(1, 2, 5)模型
- #1)arima(1, 2, 0)模型
- (skirts_arima <- arima(skirts_ts, order=c(1, 2, 0)))
- ##
- ## Call:
- ## arima(x = skirts_ts, order = c(1, 2, 0))
- ##
- ## Coefficients:
- ## ar1
- ## -0.2997
- ## s.e. 0.1424
- ##
- ## sigma^2 estimated as 388.7: log likelihood = -193.66, aic = 391.33
- #)arima(1, 2, 5)模型
- (skirts_arima <- arima(skirts_ts, order=c(1, 2, 5)))
- ##
- ## Call:
- ## arima(x = skirts_ts, order = c(1, 2, 5))
- ##
- ## Coefficients:
- ## ar1 ma1 ma2 ma3 ma4 ma5
- ## -0.4345 0.2762 0.1033 0.1472 0.0267 -0.8384
- ## s.e. 0.1837 0.2171 0.2198 0.2716 0.1904 0.2888
- ##
- ## sigma^2 estimated as 206.1: log likelihood = -183.8, aic = 381.6
- #按AIC和SBC标准计算的统计量(9.496798)和(18.54752),这两个值都较小,表明对预测模型拟合得较好
- #AIC是赤池消息准则SC是施瓦茨准则,当两个数值最小时,则是最优滞后分布的长度。我们进行模型选择时,AIC值越小越好。
-
- #从上可知:arima(1, 2, 0)模型的aic值比arima(1, 2, 5)模型的AIC值大,所以arima(1, 2, 5)模型较好
-
- #6、预测:预测5年后裙子的边缘直径
- library(foreach)
- (skirts_forecast <- forecast.Arima(skirts_arima, h=5, level=c(99.5)))
- ## Point Forecast Lo 99.5 Hi 99.5
- ## 1932 548.5762 507.1167 590.0357
- ## 1933 545.1793 459.3292 631.0295
- ## 1934 540.9354 396.3768 685.4940
- ## 1935 531.8838 316.2785 747.4892
- ## 1936 529.1296 233.2625 824.9968
- #画出预测的图形
- plot.forecast(skirts_forecast)
- #7、检验
- #在指数平滑模型下, 观察 ARIMA 模型的预测误差是否是平均值为 0 且方差为常数的正态分布(服从零均值、方差不变的正态分布) 是个好主意,同时也要观察连续预测误差是否(自)相关
- #1)检验预测误差的自相关性
- #方法一、做tsdiag检验
- tsdiag(skirts_arima)
- #Acf检验说明:残差没有明显的自相关性,Ljung-Box测试显示:所有的P-value>0.05,??说明残差为白噪声。
-
- #第一个图表代表估计模型误差的绘图。英文叫做Standardized Residuals, 上面有很多竖线在横向坐标的上下分布。如果这个估计的模型比较可信,竖线的长度是比较相似的。如果竖线的长度互相有很大出入或者根本就不同,估计模型的可信度就非常差。下面误差绘图中竖线的长度比较相似,都处在稳定范围之内,即估计的模型没产生不符合要求的误差分布。??
-
- #再介绍输出的第二张绘图,标题是ACF of Residuals。ACF指数据点相互之间的关系,当然在生成这个数据时,数据点之间互相独立,并不存在任何关系。所以在这张图上,只有位于0刻度上的竖线最高,其ACF值为1。 这个0代表数据点与自己相比较, 即数据点永远和它自己有关系,这种关系数值为1。其他横向数轴上的刻度代表一个数据点于其他数据点之间的关系,这些刻度上竖线的长度几乎等于0,即这个数据点与其他数据点没明显关系。这张ACF图代表估计的模型没造成误差之间的任何关系。这是符合数据生成时每个数据都是独立的这个前提的。由此可见,这ACF图符合检测要求。
-
- #下面来介绍第三张图,也就是Ljung-Box 指标。这个指标可对每一个时间序列的延迟进行显著性的评估。这张图的横坐标代表时间序列的延迟,纵坐标代表P-value,即显著性。如果P-value十分小,就说明在其相对应的延迟点上是显著的。我们就需要抛弃所假设的模型,并且结论在所假设的模型不可信。需要注意的是,他们使用假设的模型对一个时间序列进行估计,如果P-value是显著的话,我们使用的模型就不可信,需要尝试其他新模型。具体判定技巧是,P-value点的高度越高,我们的模型越可信。
-
- #方法二、判断预测误差的(自)相关性
- acf(skirts_forecast$residuals, lag.max = 20)
Box.test(skirts_forecast$residuals, lag=20, type = "Ljung-Box")
- ##
- ## Box-Ljung test
- ##
- ## data: skirts_forecast$residuals
- ## X-squared = 8.5974, df = 20, p-value = 0.9871
- #既然相关图显示出在滞后1??-??20阶(??l??a??g??s??1??-??20??)中样本自相关值都没有超出显著(置信)边界,而且Ljung-Box检验的p值为0.99,所以我们推断在滞后1-20阶(lags1-20)中没明显证据说明预测误差是非零自相关的。??
-
- #2)判断预测误差是否是平均值为零且方差为常数的正态分布
- #为了调查预测误差是否是平均值为零且方差为常数的正态分布(服从零均值、方差不变的正态分布),我们可以做预测误差的时间曲线图和直方图(具有正态分布曲线)
-
- # 预测误差的均值是否为0
- plot.ts(skirts_forecast$residuals)
- #自定义判断预测误差的方差是正态分布的函数
- plotForecastErrors <- function(forecasterrors) {
- #画预测误差的直方图
- hist(forecasterrors, col="red", freq=F)
- #??freq=FALSE??ensures??the??area??under??the??histogram??=??1
- #??generate??normally??distributed??data??with??mean??0??and??standard??deviation??mysd
- mysd <- sd(forecasterrors)
- mynorm <- rnorm(10000, mean=0, sd=mysd)
- myhist <- hist(mynorm, plot=F)
- #??plot??the??normal??curve??as??a??blue??line??on??top??of??the??histogram??of??forecast??errors:
- points(myhist$mids, myhist$density, type="l", col="blue", lwd=2)
- }
- plotForecastErrors(skirts_forecast$residuals)
- #上图预测中的时间曲线图显示出对着时间增加,方差大致为常数(大致不变)(尽管上半部分的时间序列方差看起来稍微高一些)。时间序列的直方图显示预测误大致是正态分布的且平均值接近于 0(服从零均值的正态分布的)。因此,把预测误差看作平均值为0方差为常数正态分布(服从零均值、方差不变的正态分布)是合理的。??
-
- #既然依次连续的预测误差看起来不是相关,而且看起来是平均值为 0 方差为常数的正态分布(服从零均值、方差不变的正态分布),那么对于裙子直径的数据, ARIMA(1,2,5)看起来是可以提供非常合适预测的模型。
- #1、读取数据
- source<- c(10930,10318,10595,10972,7706,6756,9092,10551,9722,10913,11151,8186,6422,
- 6337,11649,11652,10310,12043,7937,6476,9662,9570,9981,9331,9449,6773,6304,9355,10477,
- 10148,10395,11261,8713,7299,10424,10795,11069,11602,11427,9095,7707,10767,12136,12812,
- 12006,12528,10329,7818,11719,11683,12603,11495,13670,11337,10232,13261,13230,15535,
- 16837,19598,14823,11622,19391,18177,19994,14723,15694,13248,9543,12872,13101,15053,
- 12619,13749,10228,9725,14729,12518,14564,15085,14722,11999,9390,13481,14795,15845,
- 15271,14686,11054,10395,14775,14618,16029,15231,14246,12095,10473,15323,15381,14947)
-
- #2、利用xts包中的xts()函数,把数据转化为周期性(7天)的时间序列
- library(xts)
- data <- xts(source, seq(as.POSIXct("2014-01-01"), len=length(source), by="day"))
-
- plot(data)
head(data)
- ## [,1]
- ## 2014-01-01 10930
- ## 2014-01-02 10318
- ## 2014-01-03 10595
- ## 2014-01-04 10972
- ## 2014-01-05 7706
- ## 2014-01-06 6756
time(data)
- ## [1] "2014-01-01 CST" "2014-01-02 CST" "2014-01-03 CST" "2014-01-04 CST"
- ## [5] "2014-01-05 CST" "2014-01-06 CST" "2014-01-07 CST" "2014-01-08 CST"
- ## [9] "2014-01-09 CST" "2014-01-10 CST" "2014-01-11 CST" "2014-01-12 CST"
- ## [13] "2014-01-13 CST" "2014-01-14 CST" "2014-01-15 CST" "2014-01-16 CST"
- ## [17] "2014-01-17 CST" "2014-01-18 CST" "2014-01-19 CST" "2014-01-20 CST"
- ## [21] "2014-01-21 CST" "2014-01-22 CST" "2014-01-23 CST" "2014-01-24 CST"
- ## [25] "2014-01-25 CST" "2014-01-26 CST" "2014-01-27 CST" "2014-01-28 CST"
- ## [29] "2014-01-29 CST" "2014-01-30 CST" "2014-01-31 CST" "2014-02-01 CST"
- ## [33] "2014-02-02 CST" "2014-02-03 CST" "2014-02-04 CST" "2014-02-05 CST"
- ## [37] "2014-02-06 CST" "2014-02-07 CST" "2014-02-08 CST" "2014-02-09 CST"
- ## [41] "2014-02-10 CST" "2014-02-11 CST" "2014-02-12 CST" "2014-02-13 CST"
- ## [45] "2014-02-14 CST" "2014-02-15 CST" "2014-02-16 CST" "2014-02-17 CST"
- ## [49] "2014-02-18 CST" "2014-02-19 CST" "2014-02-20 CST" "2014-02-21 CST"
- ## [53] "2014-02-22 CST" "2014-02-23 CST" "2014-02-24 CST" "2014-02-25 CST"
- ## [57] "2014-02-26 CST" "2014-02-27 CST" "2014-02-28 CST" "2014-03-01 CST"
- ## [61] "2014-03-02 CST" "2014-03-03 CST" "2014-03-04 CST" "2014-03-05 CST"
- ## [65] "2014-03-06 CST" "2014-03-07 CST" "2014-03-08 CST" "2014-03-09 CST"
- ## [69] "2014-03-10 CST" "2014-03-11 CST" "2014-03-12 CST" "2014-03-13 CST"
- ## [73] "2014-03-14 CST" "2014-03-15 CST" "2014-03-16 CST" "2014-03-17 CST"
- ## [77] "2014-03-18 CST" "2014-03-19 CST" "2014-03-20 CST" "2014-03-21 CST"
- ## [81] "2014-03-22 CST" "2014-03-23 CST" "2014-03-24 CST" "2014-03-25 CST"
- ## [85] "2014-03-26 CST" "2014-03-27 CST" "2014-03-28 CST" "2014-03-29 CST"
- ## [89] "2014-03-30 CST" "2014-03-31 CST" "2014-04-01 CST" "2014-04-02 CST"
- ## [93] "2014-04-03 CST" "2014-04-04 CST" "2014-04-05 CST" "2014-04-06 CST"
- ## [97] "2014-04-07 CST" "2014-04-08 CST" "2014-04-09 CST" "2014-04-10 CST"
- #3、确定时间序列的差分参数d
- #1)做一阶差分
- data_diff1 <- diff(data, differences=1)
- plot(data_diff1)
str(data_diff1)
- ## An 'xts' object on 2014-01-01/2014-04-10 containing:
- ## Data: num [1:100, 1] NA -612 277 377 -3266 ...
- ## Indexed by objects of class: [POSIXct,POSIXt] TZ:
- ## xts Attributes:
- ## NULL
- #第一值为NA
-
- #2)做时间序列的二阶差分
- data_diff2 <- diff(data, differences=2)
- plot(data_diff2)
str(data_diff2)
- ## An 'xts' object on 2014-01-01/2014-04-10 containing:
- ## Data: num [1:100, 1] NA NA 889 100 -3643 ...
- ## Indexed by objects of class: [POSIXct,POSIXt] TZ:
- ## xts Attributes:
- ## NULL
- #第一个和第二值为NA
-
- #可以看出一次差分后的时间序列在均值和方差上看起来像是平稳的,与二次差分的图形相差不大,随着时间推移,时间序列大致保持不变,因此设置差分项d=1。
-
- #4、接下来需要选择合适的ARIMA模型,即确定ARIMA(p,d,q)中合适的 p、q 值
- #1)方法一、使用auto.arima()自动获取最佳的ARIMA模型
- auto.arima(data, trace=T)
- ##
- ## ARIMA(2,1,2) with drift : Inf
- ## ARIMA(0,1,0) with drift : 1819.931
- ## ARIMA(1,1,0) with drift : 1819.509
- ## ARIMA(0,1,1) with drift : 1813.218
- ## ARIMA(0,1,0) : 1817.877
- ## ARIMA(1,1,1) with drift : Inf
- ## ARIMA(0,1,2) with drift : 1799.305
- ## ARIMA(1,1,3) with drift : Inf
- ## ARIMA(0,1,2) : 1798.392
- ## ARIMA(1,1,2) : 1800.277
- ## ARIMA(0,1,1) : 1811.528
- ## ARIMA(0,1,3) : 1800.15
- ## ARIMA(1,1,3) : 1802.362
- ##
- ## Best model: ARIMA(0,1,2)
- ## Series: data
- ## ARIMA(0,1,2)
- ##
- ## Coefficients:
- ## ma1 ma2
- ## -0.4065 -0.3744
- ## s.e. 0.0964 0.0958
- ##
- ## sigma^2 estimated as 4222524: log likelihood=-896.07
- ## AIC=1798.14 AICc=1798.39 BIC=1805.92
- #最佳模型为arima(0, 1, 2)
-
- #2)方法二、通过R中的“acf()”和“pacf”函数来做判断。
- #①查看自相关图
- #acf(data_diff1, lag.max = 100, plot = F)
- #Error in na.fail.default(as.ts(x)) : 对象里有遺漏值 因为第一值为NA
-
- data_diff <- data_diff1[2:100]
- acf(data_diff, lag.max = 100)
- #自相关图衰减趋于0
-
- #②查看偏自相关图
- pacf(data_diff, lag.max = 100)
- #偏相关图滞后7阶后为0
-
- #因此我们的arima模型为arima(7,1,0)
-
- #5、创建ARIMA时间序列模型
- #1)arima(0, 1, 2)模型
- (data_arima1 <- arima(data, order = c(0, 1, 2), seasonal = list(order=c(1, 1, 0), period=7)))
- ##
- ## Call:
- ## arima(x = data, order = c(0, 1, 2), seasonal = list(order = c(1, 1, 0), period = 7))
- ##
- ## Coefficients:
- ## ma1 ma2 sar1
- ## -0.2362 -0.1553 -0.5600
- ## s.e. 0.1075 0.1028 0.0838
- ##
- ## sigma^2 estimated as 1850149: log likelihood = -795.75, aic = 1599.49
- #2)arima(7, 1, 0)模型
- (data_arima <- arima(data, order=c(7, 1, 0), seasonal = list(order=c(1, 1, 0), period=7)))
- ##
- ## Call:
- ## arima(x = data, order = c(7, 1, 0), seasonal = list(order = c(1, 1, 0), period = 7))
- ##
- ## Coefficients:
- ## ar1 ar2 ar3 ar4 ar5 ar6 ar7 sar1
- ## -0.2829 -0.2128 -0.0180 0.0606 0.3164 0.0415 -0.0883 -0.5075
- ## s.e. 0.1040 0.1076 0.1076 0.1048 0.1062 0.1066 0.1520 0.1449
- ##
- ## sigma^2 estimated as 1589493: log likelihood = -789.34, aic = 1596.68
- #6、预测:预测一周的值
- library(forecast)
- forecast <- forecast.Arima(data_arima, h=7, level=c(99.5))
- plot.forecast(forecast)
- #7、检验
- #1)做tsdiag检验
- tsdiag(data_arima)
- #Acf检验说明:残差没有明显的自相关性,Ljung-Box测试显示:所有的P-value>0.5,??说明残差为白噪声
-
- #2)判断预测误差是否是平均值为零且方差为常数的正态分布
- #为了调查预测误差是否是平均值为零且方差为常数的正态分布(服从零均值、方差不变的正态分布),我们可以做预测误差的时间曲线图和直方图(具有正态分布曲线)
- plot.ts(forecast$residuals)
- #时间序列:判断预测误差的方差是正态分布的函数
- plotForecastErrors(forecast$residuals)
#观察上图可见,预测的误差呈现均值为0 ,方差恒定的正态分布,可见预测算法无法改进了。
- #1、加载数据:共有132条数据,是以月为单位的销售数据
- airline <- read.table("F:\\R\\Rworkspace\\R时间序列教程/airline.txt")
-
- #将数据转化为时间序列格式(ts):由于将数据转化为时间序列格式,我们并不需要时间字段,因此只取airline数据的第二列,即销售数据,又因为该数据是以月为单位的,因此Period是12
- airline2 <- airline[2]
- air_ts <- ts(airline2, start=1, freq=12)
-
- #查看趋势图
- plot.ts(air_ts)
- #由图可见,该序列还不平稳
-
- #2、先做一次Log平滑,再做一次差分
- air_log <- log(air_ts)
- air_diff <- diff(air_log, differences=1)
- plot(air_diff)
- #这次看上去就比较平稳
-
- #3、查看acf和pacf
- acf(air_diff, lag.max = 30)
acf(air_diff, lag.max = 30, plot = F)
- ##
- ## Autocorrelations of series 'air_diff', by lag
- ##
- ## 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500
- ## 1.000 0.188 -0.127 -0.154 -0.326 -0.066 0.041 -0.098 -0.343 -0.109
- ## 0.8333 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833
- ## -0.120 0.199 0.833 0.198 -0.143 -0.110 -0.288 -0.046 0.036 -0.104
- ## 1.6667 1.7500 1.8333 1.9167 2.0000 2.0833 2.1667 2.2500 2.3333 2.4167
- ## -0.313 -0.106 -0.085 0.185 0.714 0.175 -0.126 -0.077 -0.214 -0.046
- ## 2.5000
- ## 0.029
pacf(air_diff, lag.max = 30)
pacf(air_diff, lag.max = 30, plot = F)
- ##
- ## Partial autocorrelations of series 'air_diff', by lag
- ##
- ## 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333
- ## 0.188 -0.169 -0.101 -0.317 0.018 -0.072 -0.199 -0.509 -0.171 -0.553
- ## 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667
- ## -0.300 0.551 0.010 -0.200 0.164 -0.052 -0.037 -0.108 0.094 0.005
- ## 1.7500 1.8333 1.9167 2.0000 2.0833 2.1667 2.2500 2.3333 2.4167 2.5000
- ## -0.095 -0.001 0.057 -0.074 -0.048 0.024 0.073 0.047 0.010 0.033
- #从ACF和PACF可以看出来,该序列在lag=12和lag=24处有明显的spike,说明该序列需要再做一次diff=12的差分。且PACF比ACF呈现更明显的指数平滑的趋势,因此先猜测ARIMA模型为:ARIMA(0,1,1)(0,1,1)[12].
-
- #4利用auto.arima:获取最佳的ARIMA模型
- auto.arima(air_log, trace=T)
- ##
- ## ARIMA(2,1,2)(1,1,1)[12] : -354.4719
- ## ARIMA(0,1,0)(0,1,0)[12] : -316.8213
- ## ARIMA(1,1,0)(1,1,0)[12] : -356.4353
- ## ARIMA(0,1,1)(0,1,1)[12] : -359.7679
- ## ARIMA(0,1,1)(1,1,1)[12] : -354.9069
- ## ARIMA(0,1,1)(0,1,0)[12] : -327.5759
- ## ARIMA(0,1,1)(0,1,2)[12] : -357.6861
- ## ARIMA(0,1,1)(1,1,2)[12] : -363.2418
- ## ARIMA(1,1,1)(1,1,2)[12] : -359.6535
- ## ARIMA(0,1,0)(1,1,2)[12] : -346.1537
- ## ARIMA(0,1,2)(1,1,2)[12] : -361.1765
- ## ARIMA(1,1,2)(1,1,2)[12] : Inf
- ## ARIMA(0,1,1)(2,1,2)[12] : -368.8244
- ## ARIMA(0,1,1)(2,1,1)[12] : -368.1761
- ## ARIMA(1,1,1)(2,1,2)[12] : -367.0903
- ## ARIMA(0,1,0)(2,1,2)[12] : -363.7024
- ## ARIMA(0,1,2)(2,1,2)[12] : -366.6877
- ## ARIMA(1,1,2)(2,1,2)[12] : Inf
- ## Warning in auto.arima(air_log, trace = T): Unable to fit final model using
- ## maximum likelihood. AIC value approximated
- ##
- ##
- ## Best model: ARIMA(0,1,1)(2,1,2)[12]
- ## Series: air_log
- ## ARIMA(0,1,1)(2,1,2)[12]
- ##
- ## Coefficients:
- ## ma1 sar1 sar2 sma1 sma2
- ## -0.2710 -0.4764 -0.1066 -0.0098 -0.1987
- ## s.e. 0.0995 0.1432 0.1087 0.1567 0.1130
- ##
- ## sigma^2 estimated as 0.001188: log likelihood=231.88
- ## AIC=-369.57 AICc=-368.82 BIC=-352.9
- #最佳模型为:ARIMA(0,1,1)(2,1,2)[12]
-
- #5、参数估计:对两个模型进行参数比较
- (air_arima1 <- arima(air_log, order=c(0,1,1), seasonal=list(order=c(0,1,1), period=12), method="ML"))
- ##
- ## Call:
- ## arima(x = air_log, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12),
- ## method = "ML")
- ##
- ## Coefficients:
- ## ma1 sma1
- ## -0.3484 -0.5622
- ## s.e. 0.0943 0.0774
- ##
- ## sigma^2 estimated as 0.001313: log likelihood = 223.63, aic = -441.26
(air_arima2 <- arima(air_log, order=c(0,1,1), seasonal=list(order=c(2,1,2), period=12), method="ML"))
- ##
- ## Call:
- ## arima(x = air_log, order = c(0, 1, 1), seasonal = list(order = c(2, 1, 2), period = 12),
- ## method = "ML")
- ##
- ## Coefficients:
- ## ma1 sar1 sar2 sma1 sma2
- ## -0.3547 1.0601 -0.1165 -1.9158 0.9964
- ## s.e. 0.0995 0.1090 0.1812 0.3942 0.3877
- ##
- ## sigma^2 estimated as 0.0009784: log likelihood = 225.56, aic = -439.12
- #两个ARIMA模型都采用极大似然方法估计,计算系数对应的t值:
- #ARIMA(0,1,1)(0,1,1)[12] :t(ma1)=-39.1791, t(sma1)=-93.8445
- #ARIMA(0,1,1)(2,1,2)[12] : t(ma1)=-35.8173,t(sar1)=88.68383,t(sar2)=-3.56141,t(sma1)=-12.6615,t(sma2)= 6.855526
-
- #可见两个模型的系数都是显著的,而ARIMA(0,1,1)(0,1,1)[12]的AIC和BIC比ARIMA(0,1,1)(2,1,2)[12]的要小,因此选择模型ARIMA(0,1,1)(0,1,1)[12]。
-
- #6、预测五年后航空公司的销售额
- (air_forecast <- forecast.Arima(air_arima1, h=5, level=c(99.5)))
- ## Point Forecast Lo 99.5 Hi 99.5
- ## Jan 12 6.038649 5.936951 6.140348
- ## Feb 12 5.988762 5.867380 6.110143
- ## Mar 12 6.145428 6.007137 6.283719
- ## Apr 12 6.118993 5.965646 6.272340
- ## May 12 6.159657 5.992605 6.326709
plot.forecast(air_forecast)
- #7、检验
- #1)tsdiag检验误差的自相关性
- tsdiag(air_arima1)
- #检验预测误差是否是均值为0,方差为常数的正态分布
- plot.ts(forecast$residuals)
plotForecastErrors(air_forecast$residuals)
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。