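Cross-validation is mainly used in modeling applications such as PCR (principal component regression) and PLS (partial least squares) regression. Given a set of modeling samples, most of them are used to fit the model, and the small remaining portion is predicted with the model just fitted. The prediction errors for this held-out portion are computed and their squares are summed. The process is repeated until every sample has been predicted exactly once. The sum of the squared prediction errors over all samples is called PRESS (Prediction Error Sum of Squares). Cross-validation is very useful for detecting and limiting overfitting, because each error is measured on samples the model did not see during fitting.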
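As a compact restatement of the procedure above, PRESS can be written as follows. The notation is an assumption introduced here, not taken from the source: the samples are split into m groups G_1, ..., G_m that are held out in turn, y_i is the observed value of sample i, and \hat{y}_i^{(-g)} is the prediction for sample i from the model fitted with group G_g left out.

```latex
\mathrm{PRESS} \;=\; \sum_{g=1}^{m} \sum_{i \in G_g} \left( y_i - \hat{y}_i^{(-g)} \right)^2
```

Each sample belongs to exactly one group, so every y_i contributes exactly one squared error to the sum.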
K-fold cross-validation
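{{In K-fold cross-validation, the original sample is partitioned into K subsamples. One subsample is held out as the data for validating the model, and the other K-1 subsamples are used for training. The procedure is repeated K times so that each subsample is used for validation exactly once, and the K results are averaged (or combined in some other way) to give a single estimate. The advantage of this method over repeated random subsampling is that every observation is used for both training and validation, and each observation is validated exactly once. 10-fold cross-validation is the most commonly used.}}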
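The procedure described above can be sketched in base R without any extra package. This is only a minimal sketch on simulated data, not the data set behind the output shown in these notes; the names `dat` and `kfold_ms`, the two-predictor model, and the seed are all assumptions made for illustration.

```r
# Minimal K-fold cross-validation sketch for a linear model (base R only).
# `dat`, `kfold_ms` and the column names are hypothetical, for illustration.
set.seed(1)
n <- 27
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
dat$Y <- 10 + 2 * dat$X1 - dat$X2 + rnorm(n)

kfold_ms <- function(data, form, K = 10) {
  n <- nrow(data)
  # Randomly assign each row to one of K folds of nearly equal size.
  fold <- sample(rep(seq_len(K), length.out = n))
  response <- all.vars(form)[1]   # name of the left-hand-side variable
  cv_resid <- numeric(n)
  for (k in seq_len(K)) {
    test <- fold == k
    # Fit on the other K-1 folds, then predict the held-out fold.
    fit <- lm(form, data = data[!test, , drop = FALSE])
    pred <- predict(fit, newdata = data[test, , drop = FALSE])
    cv_resid[test] <- data[[response]][test] - pred
  }
  press <- sum(cv_resid^2)        # each row contributes exactly once
  list(press = press, ms = press / n, fold = fold)
}

res <- kfold_ms(dat, Y ~ X1 + X2, K = 10)
res$ms   # cross-validation mean square for the simulated data
```

With 27 rows and K = 10 the folds hold 2 or 3 observations each, which matches the fold sizes in the CVlm output in these notes.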
- CVlm {DAAG}
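- library(DAAG)  # provides CVlm
- val <- CVlm(df = cv, m = 10, form.lm = formula(Y ~ X1 + X2 + X3 + X4))  # m = 10: 10-fold; df = cv: the data frame named cv; the model is fitted by ordinary least squares (recent DAAG releases call this argument data rather than df)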
- Analysis of Variance Table
- Response: Y
-           Df Sum Sq Mean Sq F value  Pr(>F)
- X1         1   69.4    69.4   17.19 0.00042 ***
- X2         1    4.1     4.1    1.03 0.32210
- X3         1   32.3    32.3    8.01 0.00974 **
- X4         1   27.8    27.8    6.88 0.01552 *
- Residuals 22   88.8     4.0
- ---
- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
- fold 1
- Observations in test set: 2
- 13 16
- Predicted 12.03 10.180
- cvpred 13.49 10.768
- Y 8.40 10.100
- CV residual -5.09 -0.668
- Sum of squares = 26.4 Mean square = 13.2 n = 2
- fold 2
- Observations in test set: 3
- 8 19 26
- Predicted 13.52 12.03 8.85
- cvpred 13.67 12.02 7.78
- Y 12.10 10.80 13.30
- CV residual -1.57 -1.22 5.52
- Sum of squares = 34.4 Mean square = 11.5 n = 3
- fold 3
- Observations in test set: 3
- 9 22 25
- Predicted 7.87 13.16 17.79
- cvpred 8.09 13.22 15.15
- Y 9.60 14.90 20.00
- CV residual 1.51 1.68 4.85
- Sum of squares = 28.7 Mean square = 9.56 n = 3
- fold 4
- Observations in test set: 3
- 1 20 27
- Predicted 11.428 12.3 11.29
- cvpred 11.571 12.5 11.52
- Y 11.200 10.2 10.40
- CV residual -0.371 -2.3 -1.12
- Sum of squares = 6.71 Mean square = 2.24 n = 3
- fold 5
- Observations in test set: 3
- 5 17 18
- Predicted 11.10 13.05 9.167
- cvpred 10.73 12.89 9.229
- Y 13.40 14.80 9.100
- CV residual 2.67 1.91 -0.129
- Sum of squares = 10.8 Mean square = 3.59 n = 3
- fold 6
- Observations in test set: 3
- 6 10 21
- Predicted 15.33 9.58 12.25
- cvpred 13.63 9.76 12.27
- Y 18.30 8.40 13.60
- CV residual 4.67 -1.36 1.33
- Sum of squares = 25.4 Mean square = 8.48 n = 3
- fold 7
- Observations in test set: 3
- 12 23 24
- Predicted 10.436 15.963 15.21
- cvpred 10.486 16.445 15.81
- Y 10.600 16.000 13.20
- CV residual 0.114 -0.445 -2.61
- Sum of squares = 7.03 Mean square = 2.34 n = 3
- fold 8
- Observations in test set: 3
- 2 3 11
- Predicted 9.48 13.064 11.87
- cvpred 9.91 13.202 12.32
- Y 8.80 12.300 9.30
- CV residual -1.11 -0.902 -3.02
- Sum of squares = 11.2 Mean square = 3.72 n = 3
- fold 9
- Observations in test set: 2
- 4 7
- Predicted 10.716 11.64
- cvpred 10.646 12.21
- Y 11.600 11.10
- CV residual 0.954 -1.11
- Sum of squares = 2.13 Mean square = 1.07 n = 2
- fold 10
- Observations in test set: 2
- 14 15
- Predicted 11.26 11.441
- cvpred 11.75 11.373
- Y 9.60 10.900
- CV residual -2.15 -0.473
- Sum of squares = 4.84 Mean square = 2.42 n = 2
- Overall (Sum over all 2 folds)
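- ms 5.83 # overall cross-validation mean square: the squared CV residuals pooled over all 10 folds, divided by the 27 observations; the "2" in the line above appears to echo the size of the last fold (n = 2), not the number of folds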
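The overall ms can be checked against the per-fold lines. The short sketch below uses only the rounded "Sum of squares" and "n" values printed for each fold; the variable names are hypothetical. It suggests that the overall figure is the pooled sum of squares divided by the total number of observations, which is not quite the same as the plain average of the ten per-fold mean squares.

```r
# Per-fold values copied from the printed output above (already rounded).
fold_ss <- c(26.4, 34.4, 28.7, 6.71, 10.8, 25.4, 7.03, 11.2, 2.13, 4.84)
fold_n  <- c(2, 3, 3, 3, 3, 3, 3, 3, 2, 2)

total_n <- sum(fold_n)                      # 27 observations in total
pooled_ms <- sum(fold_ss) / total_n         # about 5.84 from the rounded inputs
mean_of_fold_ms <- mean(fold_ss / fold_n)   # about 5.81

pooled_ms
mean_of_fold_ms
```

The pooled value agrees with the reported 5.83 up to the rounding of the printed sums of squares, while the simple average of fold mean squares gives a slightly different number because the folds are not all the same size.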