赞
踩
ARIMA 模型为平稳时间序列定义的。 因此, 如果你从一个非平稳的时间序列开始, 首先你就需要做时间序列差分直到你得到一个平稳时间序列。如果你必须对时间序列做 d 阶差分才能得到一个平稳序列,那么你就使用ARIMA(p,d,q)模型,其中 d 是差分的阶数。
-
#1、加载数据
-
skirts
<-
scan
(
"http://robjhyndman.com/tsdldata/roberts/skirts.dat",
skip
=
5
)
-
str
(
skirts
)
## num [1:46] 608 617 625 636 657 691 728 784 816 876 ...
-
#从上可知:数据集为一个数字类型的向量
-
-
#2、把数据转化为是时间序列
-
(
skirts_ts
<-
ts
(
skirts,
start
=
c
(
1886
),
frequency
=
1
)
)
-
## Time Series:
-
## Start = 1886
-
## End = 1931
-
## Frequency = 1
-
## [1] 608 617 625 636 657 691 728 784 816 876 949 997 1027 1047
-
## [15] 1049 1018 1021 1012 1018 991 962 921 871 829 822 820 802 821
-
## [29] 819 791 746 726 661 620 588 568 542 551 541 557 556 534
-
## [43] 528 529 523 531
-
#1)查看时间序列对应的时间
-
time
(
skirts_ts
)
-
## Time Series:
-
## Start = 1886
-
## End = 1931
-
## Frequency = 1
-
## [1] 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899
-
## [15] 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913
-
## [29] 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927
-
## [43] 1928 1929 1930 1931
-
#2)画出时间序列图
-
plot.ts
(
skirts_ts
)
-
#从上图可知:女人裙子边缘的直径做成的时间序列数据,从 1866 年到 1911 年在平均值上是不平稳的, 随着时间增加, 数值变化很大。
-
-
#3、确定时间序列的差分参数d:做时间序列的一阶差分, 并画出差分序列的图
-
skirts_diff
<-
diff
(
skirts_ts,
differences
=
1
)
-
plot.ts
(
skirts_diff
)
-
#从一阶差分的图中可以看出,数据仍是不平稳的,继续差分
-
-
#做时间序列的二阶差分
-
skirts_diff2
<-
diff
(
skirts_ts,
differences
=
2
)
-
plot.ts
(
skirts_diff2
)
-
#二次差分(上面)后的时间序列在均值和方差上确实看起来像是平稳的, 随着时间推移, 时间序列的水平和方差大致保持不变。因此, 看起来我们需要对裙子直径进行两次差分以得到平稳序列。
-
-
#4、找到合适的ARIMA模型:ARIMA要求时间序列是平稳的(差分来获得)
-
-
#如果你的时间序列是平稳的,或者你通过做 n 次差分转化为一个平稳时间序列, 接下来就是要选择合适的 ARIMA模型,这意味着需要寻找 ARIMA(p,d,q)中合适的 p 值和 q 值。为了得到这些,通常需要检查平稳时间序列的(自)相关图和偏相关图。
-
-
#方法一、使用auto.arima()函数,自动获取最佳的ARIMA模型
-
library
(
forecast
)
## Loading required package: zoo
-
##
-
## Attaching package: 'zoo'
-
## The following objects are masked from 'package:base':
-
##
-
## as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 6.2
auto.arima(skirts_ts, ic=c("aicc", "aic", "bic"), trace = T)
-
##
-
## ARIMA(2,2,2) : Inf
-
## ARIMA(0,2,0) : 393.6216
-
## ARIMA(1,2,0) : 391.6212
-
## ARIMA(0,2,1) : 392.0664
-
## ARIMA(2,2,0) : 393.9273
-
## ARIMA(1,2,1) : 393.9276
-
## ARIMA(2,2,1) : Inf
-
##
-
## Best model: ARIMA(1,2,0)
-
## Series: skirts_ts
-
## ARIMA(1,2,0)
-
##
-
## Coefficients:
-
## ar1
-
## -0.2997
-
## s.e. 0.1424
-
##
-
## sigma^2 estimated as 388.7: log likelihood=-193.66
-
## AIC=391.33 AICc=391.62 BIC=394.9
-
#最优模型为arima(1, 2, 0)
-
-
#方法二、使用 R 中的“acf()”和“pacf” 函数来分别( 自) 相关图和偏相关图。“acf()”和“pacf 设定“plot=FALSE” 来得到自相关和偏相关的真实值
-
#1)相关图
-
acf
(
skirts_diff2,
lag.max
=
20
)
acf(skirts_diff2, lag.max = 20, plot=F)
-
##
-
## Autocorrelations of series 'skirts_diff2', by lag
-
##
-
## 0 1 2 3 4 5 6 7 8 9
-
## 1.000 -0.303 0.096 0.009 0.102 -0.453 0.173 -0.025 -0.039 0.073
-
## 10 11 12 13 14 15 16 17 18 19
-
## -0.094 0.133 -0.089 -0.027 -0.102 0.207 -0.260 0.114 0.101 0.011
-
## 20
-
## -0.090
-
#自相关图显示滞后1阶自相关值基本没有超过边界值,虽然5阶自相关值超出边界,那么很可能属于偶然出现的,而自相关值在其他上都没有超出显著边界, 而且我们可以期望 1 到 20 之间的会偶尔超出 95%的置信边界。 自相关图5阶后结尾
-
-
#2)偏自相关图
-
pacf
(
skirts_diff2,
lag.max
=
20
)
pacf(skirts_diff2, lag.max = 20, plot=F)
-
##
-
## Partial autocorrelations of series 'skirts_diff2', by lag
-
##
-
## 1 2 3 4 5 6 7 8 9 10
-
## -0.303 0.005 0.043 0.128 -0.439 -0.110 0.073 0.028 0.128 -0.355
-
## 11 12 13 14 15 16 17 18 19 20
-
## 0.095 0.052 -0.094 -0.103 -0.034 -0.021 -0.002 0.074 0.020 -0.034
-
#偏自相关值选1阶后结尾。故我们的ARMIA模型为armia(1,2,5)
-
-
#5、建立ARIMA模型:并对比arima(1, 2, 0)与arima(1, 2, 5)模型
-
#1)arima(1, 2, 0)模型
-
(
skirts_arima
<-
arima
(
skirts_ts,
order
=
c
(
1,
2,
0
)
)
)
-
##
-
## Call:
-
## arima(x = skirts_ts, order = c(1, 2, 0))
-
##
-
## Coefficients:
-
## ar1
-
## -0.2997
-
## s.e. 0.1424
-
##
-
## sigma^2 estimated as 388.7: log likelihood = -193.66, aic = 391.33
-
#)arima(1, 2, 5)模型
-
(
skirts_arima
<-
arima
(
skirts_ts,
order
=
c
(
1,
2,
5
)
)
)
-
##
-
## Call:
-
## arima(x = skirts_ts, order = c(1, 2, 5))
-
##
-
## Coefficients:
-
## ar1 ma1 ma2 ma3 ma4 ma5
-
## -0.4345 0.2762 0.1033 0.1472 0.0267 -0.8384
-
## s.e. 0.1837 0.2171 0.2198 0.2716 0.1904 0.2888
-
##
-
## sigma^2 estimated as 206.1: log likelihood = -183.8, aic = 381.6
-
#按AIC和SBC标准计算的统计量(9.496798)和(18.54752),这两个值都较小,表明对预测模型拟合得较好
-
#AIC是赤池消息准则SC是施瓦茨准则,当两个数值最小时,则是最优滞后分布的长度。我们进行模型选择时,AIC值越小越好。
-
-
#从上可知:arima(1, 2, 0)模型的aic值比arima(1, 2, 5)模型的AIC值大,所以arima(1, 2, 5)模型较好
-
-
#6、预测:预测5年后裙子的边缘直径
-
library
(
foreach
)
-
(
skirts_forecast
<-
forecast.Arima
(
skirts_arima,
h
=
5,
level
=
c
(
99.5
)
)
)
-
##
Point
Forecast
Lo 99
.5
Hi 99
.5
-
## 1932 548
.5762 507
.1167 590
.0357
-
## 1933 545
.1793 459
.3292 631
.0295
-
## 1934 540
.9354 396
.3768 685
.4940
-
## 1935 531
.8838 316
.2785 747
.4892
-
## 1936 529
.1296 233
.2625 824
.9968
-
#画出预测的图形
-
plot.forecast
(
skirts_forecast
)
-
#7、检验
-
#在指数平滑模型下, 观察 ARIMA 模型的预测误差是否是平均值为 0 且方差为常数的正态分布(服从零均值、方差不变的正态分布) 是个好主意,同时也要观察连续预测误差是否(自)相关
-
#1)检验预测误差的自相关性
-
#方法一、做tsdiag检验
-
tsdiag
(
skirts_arima
)
-
#Acf检验说明:残差没有明显的自相关性,Ljung-Box测试显示:所有的P-value>0.05,??说明残差为白噪声。
-
-
#第一个图表代表估计模型误差的绘图。英文叫做Standardized Residuals, 上面有很多竖线在横向坐标的上下分布。如果这个估计的模型比较可信,竖线的长度是比较相似的。如果竖线的长度互相有很大出入或者根本就不同,估计模型的可信度就非常差。下面误差绘图中竖线的长度比较相似,都处在稳定范围之内,即估计的模型没产生不符合要求的误差分布。??
-
-
#再介绍输出的第二张绘图,标题是ACF of Residuals。ACF指数据点相互之间的关系,当然在生成这个数据时,数据点之间互相独立,并不存在任何关系。所以在这张图上,只有位于0刻度上的竖线最高,其ACF值为1。 这个0代表数据点与自己相比较, 即数据点永远和它自己有关系,这种关系数值为1。其他横向数轴上的刻度代表一个数据点于其他数据点之间的关系,这些刻度上竖线的长度几乎等于0,即这个数据点与其他数据点没明显关系。这张ACF图代表估计的模型没造成误差之间的任何关系。这是符合数据生成时每个数据都是独立的这个前提的。由此可见,这ACF图符合检测要求。
-
-
#下面来介绍第三张图,也就是Ljung-Box 指标。这个指标可对每一个时间序列的延迟进行显著性的评估。这张图的横坐标代表时间序列的延迟,纵坐标代表P-value,即显著性。如果P-value十分小,就说明在其相对应的延迟点上是显著的。我们就需要抛弃所假设的模型,并且结论在所假设的模型不可信。需要注意的是,他们使用假设的模型对一个时间序列进行估计,如果P-value是显著的话,我们使用的模型就不可信,需要尝试其他新模型。具体判定技巧是,P-value点的高度越高,我们的模型越可信。
-
-
#方法二、判断预测误差的(自)相关性
-
acf
(
skirts_forecast
$
residuals,
lag.max
=
20
)
Box.test(skirts_forecast$residuals, lag=20, type = "Ljung-Box")
-
##
-
## Box-Ljung test
-
##
-
## data: skirts_forecast$residuals
-
## X-squared = 8.5974, df = 20, p-value = 0.9871
-
#既然相关图显示出在滞后1??-??20阶(??l??a??g??s??1??-??20??)中样本自相关值都没有超出显著(置信)边界,而且Ljung-Box检验的p值为0.99,所以我们推断在滞后1-20阶(lags1-20)中没明显证据说明预测误差是非零自相关的。??
-
-
#2)判断预测误差是否是平均值为零且方差为常数的正态分布
-
#为了调查预测误差是否是平均值为零且方差为常数的正态分布(服从零均值、方差不变的正态分布),我们可以做预测误差的时间曲线图和直方图(具有正态分布曲线)
-
-
# 预测误差的均值是否为0
-
plot.ts
(
skirts_forecast
$
residuals
)
-
#自定义判断预测误差的方差是正态分布的函数
-
plotForecastErrors
<-
function
(
forecasterrors
)
{
-
#画预测误差的直方图
-
hist
(
forecasterrors,
col
=
"red",
freq
=
F
)
-
#??freq=FALSE??ensures??the??area??under??the??histogram??=??1
-
#??generate??normally??distributed??data??with??mean??0??and??standard??deviation??mysd
-
mysd
<-
sd
(
forecasterrors
)
-
-
myhist
<-
hist
(
mynorm,
plot
=
F
)
-
#??plot??the??normal??curve??as??a??blue??line??on??top??of??the??histogram??of??forecast??errors:
-
points
(
myhist
$
mids,
myhist
$
density,
type
=
"l",
col
=
"blue",
lwd
=
2
)
-
}
-
plotForecastErrors
(
skirts_forecast
$
residuals
)
-
#上图预测中的时间曲线图显示出对着时间增加,方差大致为常数(大致不变)(尽管上半部分的时间序列方差看起来稍微高一些)。时间序列的直方图显示预测误大致是正态分布的且平均值接近于 0(服从零均值的正态分布的)。因此,把预测误差看作平均值为0方差为常数正态分布(服从零均值、方差不变的正态分布)是合理的。??
-
-
#既然依次连续的预测误差看起来不是相关,而且看起来是平均值为 0 方差为常数的正态分布(服从零均值、方差不变的正态分布),那么对于裙子直径的数据, ARIMA(1,2,5)看起来是可以提供非常合适预测的模型。
-
#1、读取数据
-
source
<-
c
(
10930,
10318,
10595,
10972,
7706,
6756,
9092,
10551,
9722,
10913,
11151,
8186,
6422,
-
6337,
11649,
11652,
10310,
12043,
7937,
6476,
9662,
9570,
9981,
9331,
9449,
6773,
6304,
9355,
10477,
-
10148,
10395,
11261,
8713,
7299,
10424,
10795,
11069,
11602,
11427,
9095,
7707,
10767,
12136,
12812,
-
12006,
12528,
10329,
7818,
11719,
11683,
12603,
11495,
13670,
11337,
10232,
13261,
13230,
15535,
-
16837,
19598,
14823,
11622,
19391,
18177,
19994,
14723,
15694,
13248,
9543,
12872,
13101,
15053,
-
12619,
13749,
10228,
9725,
14729,
12518,
14564,
15085,
14722,
11999,
9390,
13481,
14795,
15845,
-
15271,
14686,
11054,
10395,
14775,
14618,
16029,
15231,
14246,
12095,
10473,
15323,
15381,
14947
)
-
-
#2、利用xts包中的xts()函数,把数据转化为周期性(7天)的时间序列
-
library
(
xts
)
-
data
<-
xts
(
source,
seq
(
as.POSIXct
(
"2014-01-01"
),
len
=
length
(
source
),
by
=
"day"
)
)
-
-
plot
(
data
)
head(data)
-
## [,1]
-
## 2014-01-01 10930
-
## 2014-01-02 10318
-
## 2014-01-03 10595
-
## 2014-01-04 10972
-
## 2014-01-05 7706
-
## 2014-01-06 6756
time(data)
-
## [1] "2014-01-01 CST" "2014-01-02 CST" "2014-01-03 CST" "2014-01-04 CST"
-
## [5] "2014-01-05 CST" "2014-01-06 CST" "2014-01-07 CST" "2014-01-08 CST"
-
## [9] "2014-01-09 CST" "2014-01-10 CST" "2014-01-11 CST" "2014-01-12 CST"
-
## [13] "2014-01-13 CST" "2014-01-14 CST" "2014-01-15 CST" "2014-01-16 CST"
-
## [17] "2014-01-17 CST" "2014-01-18 CST" "2014-01-19 CST" "2014-01-20 CST"
-
## [21] "2014-01-21 CST" "2014-01-22 CST" "2014-01-23 CST" "2014-01-24 CST"
-
## [25] "2014-01-25 CST" "2014-01-26 CST" "2014-01-27 CST" "2014-01-28 CST"
-
## [29] "2014-01-29 CST" "2014-01-30 CST" "2014-01-31 CST" "2014-02-01 CST"
-
## [33] "2014-02-02 CST" "2014-02-03 CST" "2014-02-04 CST" "2014-02-05 CST"
-
## [37] "2014-02-06 CST" "2014-02-07 CST" "2014-02-08 CST" "2014-02-09 CST"
-
## [41] "2014-02-10 CST" "2014-02-11 CST" "2014-02-12 CST" "2014-02-13 CST"
-
## [45] "2014-02-14 CST" "2014-02-15 CST" "2014-02-16 CST" "2014-02-17 CST"
-
## [49] "2014-02-18 CST" "2014-02-19 CST" "2014-02-20 CST" "2014-02-21 CST"
-
## [53] "2014-02-22 CST" "2014-02-23 CST" "2014-02-24 CST" "2014-02-25 CST"
-
## [57] "2014-02-26 CST" "2014-02-27 CST" "2014-02-28 CST" "2014-03-01 CST"
-
## [61] "2014-03-02 CST" "2014-03-03 CST" "2014-03-04 CST" "2014-03-05 CST"
-
## [65] "2014-03-06 CST" "2014-03-07 CST" "2014-03-08 CST" "2014-03-09 CST"
-
## [69] "2014-03-10 CST" "2014-03-11 CST" "2014-03-12 CST" "2014-03-13 CST"
-
## [73] "2014-03-14 CST" "2014-03-15 CST" "2014-03-16 CST" "2014-03-17 CST"
-
## [77] "2014-03-18 CST" "2014-03-19 CST" "2014-03-20 CST" "2014-03-21 CST"
-
## [81] "2014-03-22 CST" "2014-03-23 CST" "2014-03-24 CST" "2014-03-25 CST"
-
## [85] "2014-03-26 CST" "2014-03-27 CST" "2014-03-28 CST" "2014-03-29 CST"
-
## [89] "2014-03-30 CST" "2014-03-31 CST" "2014-04-01 CST" "2014-04-02 CST"
-
## [93] "2014-04-03 CST" "2014-04-04 CST" "2014-04-05 CST" "2014-04-06 CST"
-
## [97] "2014-04-07 CST" "2014-04-08 CST" "2014-04-09 CST" "2014-04-10 CST"
-
#3、确定时间序列的差分参数d
-
#1)做一阶差分
-
data_diff1
<-
diff
(
data,
differences
=
1
)
-
plot
(
data_diff1
)
str(data_diff1)
-
## An
'xts'
object on
2014
-01
-01/
2014
-04
-10 containing:
-
## Data: num [
1:
100,
1] NA
-612
277
377
-3266 ...
-
## Indexed
by objects of
class: [POSIXct,POSIXt] TZ:
-
## xts Attributes:
-
## NULL
-
#第一值为NA
-
-
#2)做时间序列的二阶差分
-
data_diff2
<-
diff
(
data,
differences
=
2
)
-
plot
(
data_diff2
)
str(data_diff2)
-
## An
'xts'
object on
2014
-01
-01/
2014
-04
-10 containing:
-
## Data: num [
1:
100,
1] NA NA
889
100
-3643 ...
-
## Indexed
by objects of
class: [POSIXct,POSIXt] TZ:
-
## xts Attributes:
-
## NULL
-
#第一个和第二值为NA
-
-
#可以看出一次差分后的时间序列在均值和方差上看起来像是平稳的,与二次差分的图形相差不大,随着时间推移,时间序列大致保持不变,因此设置差分项d=1。
-
-
#4、接下来需要选择合适的ARIMA模型,即确定ARIMA(p,d,q)中合适的 p、q 值
-
#1)方法一、使用auto.arima()自动获取最佳的ARIMA模型
-
auto.arima
(
data,
trace
=
T
)
-
##
-
## ARIMA(2,1,2) with drift : Inf
-
## ARIMA(0,1,0) with drift : 1819.931
-
## ARIMA(1,1,0) with drift : 1819.509
-
## ARIMA(0,1,1) with drift : 1813.218
-
## ARIMA(0,1,0) : 1817.877
-
## ARIMA(1,1,1) with drift : Inf
-
## ARIMA(0,1,2) with drift : 1799.305
-
## ARIMA(1,1,3) with drift : Inf
-
## ARIMA(0,1,2) : 1798.392
-
## ARIMA(1,1,2) : 1800.277
-
## ARIMA(0,1,1) : 1811.528
-
## ARIMA(0,1,3) : 1800.15
-
## ARIMA(1,1,3) : 1802.362
-
##
-
## Best model: ARIMA(0,1,2)
-
## Series: data
-
## ARIMA(0,1,2)
-
##
-
## Coefficients:
-
## ma1 ma2
-
## -0.4065 -0.3744
-
## s.e. 0.0964 0.0958
-
##
-
## sigma^2 estimated as 4222524: log likelihood=-896.07
-
## AIC=1798.14 AICc=1798.39 BIC=1805.92
-
#最佳模型为arima(0, 1, 2)
-
-
#2)方法二、通过R中的“acf()”和“pacf”函数来做判断。
-
#①查看自相关图
-
#acf(data_diff1, lag.max = 100, plot = F)
-
#Error in na.fail.default(as.ts(x)) : 对象里有遺漏值 因为第一值为NA
-
-
data_diff
<-
data_diff1
[
2
:
100
]
-
acf
(
data_diff,
lag.max
=
100
)
-
#自相关图衰减趋于0
-
-
#②查看偏自相关图
-
pacf
(
data_diff,
lag.max
=
100
)
-
#偏相关图滞后7阶后为0
-
-
#因此我们的arima模型为arima(7,1,0)
-
-
#5、创建ARIMA时间序列模型
-
#1)arima(0, 1, 2)模型
-
(
data_arima1
<-
arima
(
data,
order
=
c
(
0,
1,
2
),
seasonal
=
list
(
order
=
c
(
1,
1,
0
),
period
=
7
)
)
)
-
##
-
## Call:
-
## arima(x = data, order = c(0, 1, 2), seasonal = list(order = c(1, 1, 0), period = 7))
-
##
-
## Coefficients:
-
## ma1 ma2 sar1
-
## -0.2362 -0.1553 -0.5600
-
## s.e. 0.1075 0.1028 0.0838
-
##
-
## sigma^2 estimated as 1850149: log likelihood = -795.75, aic = 1599.49
-
#2)arima(7, 1, 0)模型
-
(
data_arima
<-
arima
(
data
,
order
=
c
(
7
,
1
,
0
)
,
seasonal
=
list
(
order
=
c
(
1
,
1
,
0
)
,
period
=
7
)
)
)
-
##
-
## Call:
-
## arima(x = data, order = c(7, 1, 0), seasonal = list(order = c(1, 1, 0), period = 7))
-
##
-
## Coefficients:
-
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 sar1
-
## -0.2829 -0.2128 -0.0180 0.0606 0.3164 0.0415 -0.0883 -0.5075
-
## s.e. 0.1040 0.1076 0.1076 0.1048 0.1062 0.1066 0.1520 0.1449
-
##
-
## sigma^2 estimated as 1589493: log likelihood = -789.34, aic = 1596.68
-
#6、预测:预测一周的值
-
library
(
forecast
)
-
forecast
<-
forecast.Arima
(
data_arima,
h
=
7,
level
=
c
(
99.5
)
)
-
plot.forecast
(
forecast
)
-
#7、检验
-
#1)做tsdiag检验
-
tsdiag
(
data_arima
)
-
#Acf检验说明:残差没有明显的自相关性,Ljung-Box测试显示:所有的P-value>0.5,??说明残差为白噪声
-
-
#2)判断预测误差是否是平均值为零且方差为常数的正态分布
-
#为了调查预测误差是否是平均值为零且方差为常数的正态分布(服从零均值、方差不变的正态分布),我们可以做预测误差的时间曲线图和直方图(具有正态分布曲线)
-
plot.ts
(
forecast
$
residuals
)
-
#时间序列:判断预测误差的方差是正态分布的函数
-
plotForecastErrors
(
forecast
$
residuals
)
#观察上图可见,预测的误差呈现均值为0 ,方差恒定的正态分布,可见预测算法无法改进了。
-
#1、加载数据:共有132条数据,是以月为单位的销售数据
-
airline
<-
read.table
(
"F:\\R\\Rworkspace\\R时间序列教程/airline.txt"
)
-
-
#将数据转化为时间序列格式(ts):由于将数据转化为时间序列格式,我们并不需要时间字段,因此只取airline数据的第二列,即销售数据,又因为该数据是以月为单位的,因此Period是12
-
airline2
<-
airline
[
2
]
-
air_ts
<-
ts
(
airline2,
start
=
1,
freq
=
12
)
-
-
#查看趋势图
-
plot.ts
(
air_ts
)
-
#由图可见,该序列还不平稳
-
-
#2、先做一次Log平滑,再做一次差分
-
air_log
<-
log
(
air_ts
)
-
air_diff
<-
diff
(
air_log,
differences
=
1
)
-
plot
(
air_diff
)
-
#这次看上去就比较平稳
-
-
#3、查看acf和pacf
-
acf
(
air_diff,
lag.max
=
30
)
acf(air_diff, lag.max = 30, plot = F)
-
##
-
## Autocorrelations of series 'air_diff', by lag
-
##
-
## 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500
-
## 1.000 0.188 -0.127 -0.154 -0.326 -0.066 0.041 -0.098 -0.343 -0.109
-
## 0.8333 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833
-
## -0.120 0.199 0.833 0.198 -0.143 -0.110 -0.288 -0.046 0.036 -0.104
-
## 1.6667 1.7500 1.8333 1.9167 2.0000 2.0833 2.1667 2.2500 2.3333 2.4167
-
## -0.313 -0.106 -0.085 0.185 0.714 0.175 -0.126 -0.077 -0.214 -0.046
-
## 2.5000
-
## 0.029
pacf(air_diff, lag.max = 30)
pacf(air_diff, lag.max = 30, plot = F)
-
##
-
## Partial autocorrelations of series 'air_diff', by lag
-
##
-
## 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333
-
## 0.188 -0.169 -0.101 -0.317 0.018 -0.072 -0.199 -0.509 -0.171 -0.553
-
## 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667
-
## -0.300 0.551 0.010 -0.200 0.164 -0.052 -0.037 -0.108 0.094 0.005
-
## 1.7500 1.8333 1.9167 2.0000 2.0833 2.1667 2.2500 2.3333 2.4167 2.5000
-
## -0.095 -0.001 0.057 -0.074 -0.048 0.024 0.073 0.047 0.010 0.033
-
#从ACF和PACF可以看出来,该序列在lag=12和lag=24处有明显的spike,说明该序列需要再做一次diff=12的差分。且PACF比ACF呈现更明显的指数平滑的趋势,因此先猜测ARIMA模型为:ARIMA(0,1,1)(0,1,1)[12].
-
-
#4利用auto.arima:获取最佳的ARIMA模型
-
auto.arima
(
air_log,
trace
=
T
)
-
##
-
##
ARIMA(2,1,2)(1,1,1)
[12] :
-354
.4719
-
##
ARIMA(0,1,0)(0,1,0)
[12] :
-316
.8213
-
##
ARIMA(1,1,0)(1,1,0)
[12] :
-356
.4353
-
##
ARIMA(0,1,1)(0,1,1)
[12] :
-359
.7679
-
##
ARIMA(0,1,1)(1,1,1)
[12] :
-354
.9069
-
##
ARIMA(0,1,1)(0,1,0)
[12] :
-327
.5759
-
##
ARIMA(0,1,1)(0,1,2)
[12] :
-357
.6861
-
##
ARIMA(0,1,1)(1,1,2)
[12] :
-363
.2418
-
##
ARIMA(1,1,1)(1,1,2)
[12] :
-359
.6535
-
##
ARIMA(0,1,0)(1,1,2)
[12] :
-346
.1537
-
##
ARIMA(0,1,2)(1,1,2)
[12] :
-361
.1765
-
##
ARIMA(1,1,2)(1,1,2)
[12] :
Inf
-
##
ARIMA(0,1,1)(2,1,2)
[12] :
-368
.8244
-
##
ARIMA(0,1,1)(2,1,1)
[12] :
-368
.1761
-
##
ARIMA(1,1,1)(2,1,2)
[12] :
-367
.0903
-
##
ARIMA(0,1,0)(2,1,2)
[12] :
-363
.7024
-
##
ARIMA(0,1,2)(2,1,2)
[12] :
-366
.6877
-
##
ARIMA(1,1,2)(2,1,2)
[12] :
Inf
-
## Warning in
auto.arima(air_log, trace = T): Unable to fit final model
using
-
#
# maximum likelihood. AIC value approximated
-
##
-
##
-
## Best model: ARIMA(0,1,1)(2,1,2)[12]
-
## Series: air_log
-
## ARIMA(0,1,1)(2,1,2)[12]
-
##
-
## Coefficients:
-
## ma1 sar1 sar2 sma1 sma2
-
## -0.2710 -0.4764 -0.1066 -0.0098 -0.1987
-
## s.e. 0.0995 0.1432 0.1087 0.1567 0.1130
-
##
-
## sigma^2 estimated as 0.001188: log likelihood=231.88
-
## AIC=-369.57 AICc=-368.82 BIC=-352.9
-
#最佳模型为:ARIMA(0,1,1)(2,1,2)[12]
-
-
#5、参数估计:对两个模型进行参数比较
-
(
air_arima1
<-
arima
(
air_log
,
order
=
c
(
0
,
1
,
1
)
,
seasonal
=
list
(
order
=
c
(
0
,
1
,
1
)
,
period
=
12
)
,
method
=
"ML"
)
)
-
##
-
## Call:
-
## arima(x = air_log, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12),
-
## method = "ML")
-
##
-
## Coefficients:
-
## ma1 sma1
-
## -0.3484 -0.5622
-
## s.e. 0.0943 0.0774
-
##
-
## sigma^2 estimated as 0.001313: log likelihood = 223.63, aic = -441.26
(air_arima2 <- arima(air_log, order=c(0,1,1), seasonal=list(order=c(2,1,2), period=12), method="ML"))
-
##
-
## Call:
-
## arima(x = air_log, order = c(0, 1, 1), seasonal = list(order = c(2, 1, 2), period = 12),
-
## method = "ML")
-
##
-
## Coefficients:
-
## ma1 sar1 sar2 sma1 sma2
-
## -0.3547 1.0601 -0.1165 -1.9158 0.9964
-
## s.e. 0.0995 0.1090 0.1812 0.3942 0.3877
-
##
-
## sigma^2 estimated as 0.0009784: log likelihood = 225.56, aic = -439.12
-
#两个ARIMA模型都采用极大似然方法估计,计算系数对应的t值:
-
#ARIMA(0,1,1)(0,1,1)[12] :t(ma1)=-39.1791, t(sma1)=-93.8445
-
#ARIMA(0,1,1)(2,1,2)[12] : t(ma1)=-35.8173,t(sar1)=88.68383,t(sar2)=-3.56141,t(sma1)=-12.6615,t(sma2)= 6.855526
-
-
#可见两个模型的系数都是显著的,而ARIMA(0,1,1)(0,1,1)[12]的AIC和BIC比ARIMA(0,1,1)(2,1,2)[12]的要小,因此选择模型ARIMA(0,1,1)(0,1,1)[12]。
-
-
#6、预测五年后航空公司的销售额
-
(
air_forecast
<-
forecast.Arima
(
air_arima1,
h
=
5,
level
=
c
(
99.5
)
)
)
-
##
Point
Forecast
Lo 99
.5
Hi 99
.5
-
##
Jan 12 6
.038649 5
.936951 6
.140348
-
##
Feb 12 5
.988762 5
.867380 6
.110143
-
##
Mar 12 6
.145428 6
.007137 6
.283719
-
##
Apr 12 6
.118993 5
.965646 6
.272340
-
##
May 12 6
.159657 5
.992605 6
.326709
plot.forecast(air_forecast)
-
#7、检验
-
#1)tsdiag检验误差的自相关性
-
tsdiag
(
air_arima1
)
-
#检验预测误差是否是均值为0,方差为常数的正态分布
-
plot.ts
(
forecast
$
residuals
)
plotForecastErrors(air_forecast$residuals)
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。