
Discriminant Analysis in R


1. Description of the Sample Data

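The 2016 per capita consumption expenditure of urban residents in China's 31 provinces, municipalities and autonomous regions is divided into two groups: Beijing and Shanghai form group 1, and all other regions form group 2. Guangdong and Tibet (Xizang) are held out as the samples to be classified, so their rows in the table below have no Group value. Using these data, perform a discriminant analysis and assign Guangdong and Tibet to one of the two groups.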

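X1: food, tobacco and alcohol; X2: clothing; X3: housing; X4: household goods and services;

X5: transport and communication; X6: education, culture and recreation; X7: healthcare; X8: other goods and services (all in yuan per capita).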

City,X1,X2,X3,X4,X5,X6,X7,X8,Group
北京,8070.4,2643,12128,2511,5077.9,4054.7,2629.8,1140.6,1
天津,8679.6,2114,6187.3,1663.8,3991.9,2643.6,2172.2,892.2,2
河北,4991.6,1614.4,4483.2,1351.1,2664.1,1991.3,1549.9,460.4,2
山西,3862.8,1603,3633.8,951.6,2401,2439,1651.6,450.1,2
内蒙古,6445.8,2543.3,4006.1,1565.1,3045.2,2598.9,1840.2,699.9,2
辽宁,6901.6,2321.3,4632.8,1558.2,3447,3018.5,2313.6,802.8,2
吉林,4975.7,1819,3612,1107.1,2691,2367.5,2059.2,534.9,2
黑龙江,5019.3,1804.4,3352.4,1018.9,2462.9,2011.5,2007.5,468.3,2
上海,10014.8,1834.8,13216,1868.2,4447.5,4533.5,2839.9,1102.1,1
江苏,7389.2,1809.5,6140.6,1616.2,3952.4,3163.9,1624.5,736.6,2
浙江,8467.3,1903.9,7385.4,1420.7,5100.9,3452.3,1691.9,645.3,2
安徽,6381.7,1491,3931.2,1118.4,2748.4,2233.3,1269.3,432.9,2
福建,8299.6,1443.5,6530.5,1393.4,3205.7,2461.5,1178.5,492.8,2
江西,5667.5,1472.2,3915.9,1028.6,2310.6,1963.9,887.4,449.6,2
山东,5929.4,1977.7,4473.1,1576.5,3002.5,2399.3,1610,526.9,2
河南,5067.7,1746.6,3753.4,1430.2,1993.8,2078.8,1524.5,492.8,2
湖北,6294.3,1557.4,4176.7,1163.8,2391.9,2228.4,1792,435.6,2
湖南,6407.7,1666.4,3918.7,1384.1,2837.1,3406.1,1362.6,437.4,2
广西,5937.2,886.3,3784.3,1032.8,2259.8,2003,1065.9,299.3,2
海南,7419.7,859.6,3527.7,954,2582.3,1931.3,1399.8,341,2
重庆,6883.9,1939.2,3801.1,1466,2573.9,2232.4,1700,434.4,2
四川,7118.4,1767.5,3756.5,1311.1,2697.6,2008.4,1423.4,577.1,2
贵州,6010.3,1525.4,3793.1,1270.2,2684.4,2493.5,1050.1,374.6,2
云南,5528.2,1195.5,3814.4,1135.1,2791.2,2217,1526.7,414.3,2
陕西,5422,1542.2,3681.5,1367.7,2455.7,2474,2016.7,409,2
甘肃,5777.3,1776.9,3752.6,1329.1,2517.9,2322.1,1583.4,479.9,2
青海,5975.7,1963.5,3809.4,1322.1,3064.3,2352.9,1750.4,614.9,2
宁夏,4889.2,1726.7,3770.5,1245.1,3896.5,2415.7,1874,546.6,2
新疆,6179.4,1966.1,3543.9,1543.8,3074.1,2404.9,1934.8,581.5,2
广东,9421.6,1583.4,6410.4,1721.9,4198.1,3103.4,1304.5,870.1
西藏,8727.8,1812.5,3614.5,983.0,2198.4,922.5,585.3,596.5

2. Reading the Data

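# The file holds only the 29 labeled regions; Guangdong and Tibet are classified separately in step 6
df<-read.csv('f:/桌面/各城市消费水平.csv')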

head(df)

Output:

> head(df)
    City     X1     X2      X3     X4     X5     X6     X7     X8 Group
1   北京 8070.4 2643.0 12128.0 2511.0 5077.9 4054.7 2629.8 1140.6     1
2   天津 8679.6 2114.0  6187.3 1663.8 3991.9 2643.6 2172.2  892.2     2
3   河北 4991.6 1614.4  4483.2 1351.1 2664.1 1991.3 1549.9  460.4     2
4   山西 3862.8 1603.0  3633.8  951.6 2401.0 2439.0 1651.6  450.1     2
5 内蒙古 6445.8 2543.3  4006.1 1565.1 3045.2 2598.9 1840.2  699.9     2
6   辽宁 6901.6 2321.3  4632.8 1558.2 3447.0 3018.5 2313.6  802.8     2
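If you instead keep all 31 rows in one CSV, leaving the Group field empty for Guangdong and Tibet, it is cleaner to split off the unlabeled rows before fitting so that the training set and the samples to classify are explicit. The sketch below assumes that layout. It inlines only four rows through read.csv(text = ...), uses English names (Beijing, Tianjin, Guangdong, Tibet) instead of the Chinese ones to avoid encoding problems, and the names train and to_classify are hypothetical.

```r
# Sketch: one CSV with labeled and unlabeled rows (Group left empty -> NA).
# Only four rows are inlined here for illustration.
csv_text <- "City,X1,X2,X3,X4,X5,X6,X7,X8,Group
Beijing,8070.4,2643,12128,2511,5077.9,4054.7,2629.8,1140.6,1
Tianjin,8679.6,2114,6187.3,1663.8,3991.9,2643.6,2172.2,892.2,2
Guangdong,9421.6,1583.4,6410.4,1721.9,4198.1,3103.4,1304.5,870.1,
Tibet,8727.8,1812.5,3614.5,983.0,2198.4,922.5,585.3,596.5,"

df_all <- read.csv(text = csv_text)

# Rows with a group label are used to fit the model
train <- df_all[!is.na(df_all$Group), ]
# Rows without a label are the samples to classify
to_classify <- df_all[is.na(df_all$Group), ]

# Optional: lda() converts the grouping to a factor itself
train$Group <- factor(train$Group)
print(to_classify$City)
```

With this layout, predict(z, newdata = to_classify) should be able to replace the manual rbind() in step 6, because predict() only looks up the X1 to X8 columns named in the model formula.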

3. Building the Linear Discriminant Function

library(MASS)
# Equal prior probabilities (0.5, 0.5) for the two groups
z<-lda(Group~X1+X2+X3+X4+X5+X6+X7+X8,data=df,prior=c(1,1)/2)
z

Output:

Call:
lda(Group ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data = df, 
    prior = c(1, 1)/2)

Prior probabilities of groups:
  1   2 
0.5 0.5 

Group means:
        X1       X2        X3       X4       X5     X6       X7        X8
1 9042.600 2238.900 12672.000 2189.600 4762.700 4294.1 2734.850 1121.3500
2 6219.337 1705.056  4265.485 1308.322 2920.152 2419.0 1624.448  519.6704

Coefficients of linear discriminants:
             LD1
X1  0.0006388214
X2  0.0013250299
X3 -0.0015453976
X4 -0.0019589553
X5  0.0014962902
X6 -0.0003410774
X7 -0.0011726981
X8 -0.0020388069

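The lda function from the MASS package performs the discriminant analysis. The output gives the prior probabilities, the mean of each variable in each group, and the coefficients of the linear discriminant function. With two groups there is a single discriminant function, LD1.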
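To see how these coefficients turn into the LD1 scores reported in the next step, the sketch below recomputes the scores by hand: each observation is centred on the prior-weighted average of the group means (with equal priors, the midpoint of the two group means) and multiplied by the coefficient vector, which lda() stores as z$scaling. It uses simulated data rather than the article's data, and the names d, fit and ld_manual are only for illustration. Because the article's coefficients apply to raw expenditures in yuan, they are small numbers.

```r
library(MASS)

# Simulated data (NOT the article's data): two groups, 8 predictors named X1..X8
set.seed(1)
n1 <- 15; n2 <- 25
X <- rbind(matrix(rnorm(n1 * 8, mean = 1), n1, 8),
           matrix(rnorm(n2 * 8, mean = 0), n2, 8))
colnames(X) <- paste0("X", 1:8)
d <- data.frame(X, Group = factor(rep(1:2, c(n1, n2))))

fit <- lda(Group ~ ., data = d, prior = c(1, 1) / 2)

# Centre on the prior-weighted average of the group means (the midpoint here),
# then multiply by the discriminant coefficients stored in fit$scaling
center <- colSums(fit$prior * fit$means)
ld_manual <- scale(as.matrix(d[, paste0("X", 1:8)]), center = center, scale = FALSE) %*% fit$scaling

# Scores computed by predict() for comparison
ld_predict <- predict(fit)$x
print(head(cbind(manual = ld_manual[, 1], predict = ld_predict[, 1])))
```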

4. Re-classifying the Original Samples

pred<-predict(z)
pred

Output:

> pred<-predict(z)
> pred
$class
 [1] 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Levels: 1 2

$posterior
              1            2
1  1.000000e+00 4.027765e-31
2  1.773743e-27 1.000000e+00
3  6.975962e-28 1.000000e+00
4  8.482928e-32 1.000000e+00
5  6.559339e-39 1.000000e+00
6  5.599629e-32 1.000000e+00
7  1.009488e-34 1.000000e+00
8  8.724046e-38 1.000000e+00
9  1.000000e+00 1.761066e-39
10 1.285275e-25 1.000000e+00
11 4.239946e-31 1.000000e+00
12 2.243237e-41 1.000000e+00
13 5.122434e-26 1.000000e+00
14 5.160427e-39 1.000000e+00
15 2.364619e-32 1.000000e+00
16 1.394579e-28 1.000000e+00
17 1.107948e-32 1.000000e+00
18 7.036740e-38 1.000000e+00
19 8.146237e-37 1.000000e+00
20 5.359631e-45 1.000000e+00
21 1.520051e-39 1.000000e+00
22 8.930788e-43 1.000000e+00
23 6.492436e-41 1.000000e+00
24 5.489651e-36 1.000000e+00
25 1.934120e-30 1.000000e+00
26 5.493119e-36 1.000000e+00
27 1.798720e-39 1.000000e+00
28 8.845496e-42 1.000000e+00
29 3.042194e-39 1.000000e+00

$x
         LD1
1  -5.546457
2   4.881533
3   4.955489
4   5.669907
5   6.967645
6   5.702824
7   6.203559
8   6.762564
9  -7.071856
10  4.542101
11  5.542388
12  7.417637
13  4.615005
14  6.986655
15  5.771144
16  5.083072
17  5.831224
18  6.779598
19  6.585515
20  8.078531
21  7.083520
22  7.673106
23  7.333417
24  6.434315
25  5.422111
26  6.434265
27  7.070180
28  7.491387
29  7.028534

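1. $class gives the re-classification (resubstitution) of the original samples; it agrees with the actual group of every region.

2. $posterior gives, for each original sample, the posterior probability of belonging to each group.

3. $x gives the value of the linear discriminant function (the LD1 score) for each sample.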
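For two groups the posterior probabilities depend only on the LD1 score. Writing d_k for the mean of group k projected onto LD1 (centred the same way as the scores), the posterior for group k is proportional to prior_k * exp(LD1 * d_k - d_k^2 / 2). The sketch below rebuilds $posterior from $x this way on simulated data (not the article's data); the formula is my reading of MASS's plug-in method, checked here against predict() rather than quoted from its documentation.

```r
library(MASS)

# Simulated data (NOT the article's data)
set.seed(2)
n1 <- 15; n2 <- 25
X <- rbind(matrix(rnorm(n1 * 8, mean = 1), n1, 8),
           matrix(rnorm(n2 * 8, mean = 0), n2, 8))
colnames(X) <- paste0("X", 1:8)
d <- data.frame(X, Group = factor(rep(1:2, c(n1, n2))))
fit <- lda(Group ~ ., data = d, prior = c(1, 1) / 2)
pred <- predict(fit)

# Group means projected onto LD1, centred like the scores
center <- colSums(fit$prior * fit$means)
dm <- drop(scale(fit$means, center = center, scale = FALSE) %*% fit$scaling)
ld <- pred$x[, 1]

# Log posterior up to a constant: log prior + LD1 * d_k - d_k^2 / 2
logpost <- sapply(seq_along(dm), function(k) log(fit$prior[k]) + ld * dm[k] - dm[k]^2 / 2)
# Subtract each row's maximum before exponentiating, for numerical stability
post <- exp(logpost - apply(logpost, 1, max))
post <- post / rowSums(post)
print(head(round(post, 6)))
```

This also explains the extremely small posteriors in the output above (around 1e-30): they arise when a sample's score lies far from one group's mean score on LD1.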

5. Cross-tabulating the Re-classification Results

table(pred$class,df$Group)

Output:
   
     1  2
  1  2  0
  2  0 27

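All 29 regions are assigned to their actual groups, so the resubstitution accuracy is 100% (29/29). Because the same data are used both to fit and to evaluate the model, this figure is optimistic, especially with only two regions in group 1.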
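The accuracy can be read off the table as the sum of its diagonal divided by its total. For a less optimistic estimate, lda() accepts CV = TRUE, which performs leave-one-out cross-validation and returns the cross-validated classes and posteriors. The sketch below compares the two on simulated, reasonably balanced data (not the article's data). With the article's data each leave-one-out fit would estimate group 1's mean from a single region, so the cross-validated figure would be fragile.

```r
library(MASS)

# Simulated data (NOT the article's data)
set.seed(3)
n1 <- 15; n2 <- 25
X <- rbind(matrix(rnorm(n1 * 8, mean = 1), n1, 8),
           matrix(rnorm(n2 * 8, mean = 0), n2, 8))
colnames(X) <- paste0("X", 1:8)
d <- data.frame(X, Group = factor(rep(1:2, c(n1, n2))))

# Resubstitution: fit and evaluate on the same rows, as in the article
fit <- lda(Group ~ ., data = d, prior = c(1, 1) / 2)
tab <- table(predicted = predict(fit)$class, actual = d$Group)
resub_acc <- sum(diag(tab)) / sum(tab)

# Leave-one-out cross-validation: each row is classified by a model fitted without it
cv <- lda(Group ~ ., data = d, prior = c(1, 1) / 2, CV = TRUE)
cv_tab <- table(predicted = cv$class, actual = d$Group)
cv_acc <- sum(diag(cv_tab)) / sum(cv_tab)

print(tab)
print(cv_tab)
cat("resubstitution:", resub_acc, " leave-one-out:", cv_acc, "\n")
```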

6. Classifying the Two New Samples

# Row 1: Guangdong; row 2: Tibet (variables in the same order X1..X8 as the training data)
newdata<-rbind(c(9421.6,1583.4,6410.4,1721.9,4198.1,3103.4,1304.5,870.1),
               c(8727.8,1812.5,3614.5,983.0,2198.4,922.5,585.3,596.5))
dimnames(newdata)<-list(NULL,c('X1','X2','X3','X4','X5','X6','X7','X8'))
newdata<-data.frame(newdata)
predict(z,newdata=newdata)

Output:

> predict(z,newdata=newdata)
$class
[1] 2 2
Levels: 1 2

$posterior
             1 2
1 2.305282e-30 1
2 1.443943e-56 1

$x
        LD1
1  5.408199
2 10.189745

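Both new samples are assigned to group 2, the same group as the regions other than Beijing and Shanghai. $posterior gives their posterior probabilities (essentially 1 for group 2), and $x gives their scores on the linear discriminant function.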
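With two groups and equal priors, the scores are centred exactly midway between the two group means, so the decision boundary on LD1 is 0 and a sample goes to the group whose mean score has the same sign. This matches the output above: Beijing and Shanghai score -5.55 and -7.07, the other 27 regions score positive, and Guangdong (5.41) and Tibet (10.19) are positive, hence group 2. The sketch below checks this sign rule on simulated data with hypothetical new points; it does not use the article's data.

```r
library(MASS)

# Simulated training data (NOT the article's data)
set.seed(4)
n1 <- 15; n2 <- 25
X <- rbind(matrix(rnorm(n1 * 8, mean = 1), n1, 8),
           matrix(rnorm(n2 * 8, mean = 0), n2, 8))
colnames(X) <- paste0("X", 1:8)
d <- data.frame(X, Group = factor(rep(1:2, c(n1, n2))))
fit <- lda(Group ~ ., data = d, prior = c(1, 1) / 2)

# Hypothetical new observations to classify
newdata <- data.frame(matrix(rnorm(5 * 8, mean = 0.5), 5, 8))
colnames(newdata) <- paste0("X", 1:8)
pred <- predict(fit, newdata = newdata)

# Group mean scores on LD1; with equal priors they are symmetric around 0
center <- colSums(fit$prior * fit$means)
dm <- drop(scale(fit$means, center = center, scale = FALSE) %*% fit$scaling)

# Sign rule: same sign as group 1's mean score -> group 1, otherwise group 2
sign_rule <- ifelse(sign(pred$x[, 1]) == sign(dm["1"]), "1", "2")
print(data.frame(LD1 = pred$x[, 1], lda = pred$class, sign_rule = sign_rule))
```

With unequal priors the boundary shifts away from 0 by a term involving log(prior_1 / prior_2), so the sign rule only holds for the equal priors used in this article.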

Note: this article was contributed by a user and reposted on the wpsshop blog; copyright belongs to the original author. Source: https://www.wpsshop.cn/w/weixin_40725706/article/detail/179564?site