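I. Description of the sample data
The 2016 per-capita consumption expenditure of urban residents in China's 31 provinces, municipalities, and autonomous regions is divided into two groups: Beijing and Shanghai form group 1, and the remaining regions form group 2. Guangdong and Tibet are held out as samples to be classified. The data are listed in the table below; the task is to run a discriminant analysis on them and assign Guangdong and Tibet to one of the two groups.
X1: food, tobacco and alcohol; X2: clothing; X3: housing; X4: household goods and services;
X5: transport and communication; X6: education, culture and recreation; X7: health care; X8: other goods and services
The City column keeps the original Chinese region names (for example, 北京 is Beijing, 上海 is Shanghai, 广东 is Guangdong, and 西藏 is Tibet). The last two rows, Guangdong and Tibet, have no Group value because they are the samples to be classified.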
City,X1,X2,X3,X4,X5,X6,X7,X8,Group
北京,8070.4,2643,12128,2511,5077.9,4054.7,2629.8,1140.6,1
天津,8679.6,2114,6187.3,1663.8,3991.9,2643.6,2172.2,892.2,2
河北,4991.6,1614.4,4483.2,1351.1,2664.1,1991.3,1549.9,460.4,2
山西,3862.8,1603,3633.8,951.6,2401,2439,1651.6,450.1,2
内蒙古,6445.8,2543.3,4006.1,1565.1,3045.2,2598.9,1840.2,699.9,2
辽宁,6901.6,2321.3,4632.8,1558.2,3447,3018.5,2313.6,802.8,2
吉林,4975.7,1819,3612,1107.1,2691,2367.5,2059.2,534.9,2
黑龙江,5019.3,1804.4,3352.4,1018.9,2462.9,2011.5,2007.5,468.3,2
上海,10014.8,1834.8,13216,1868.2,4447.5,4533.5,2839.9,1102.1,1
江苏,7389.2,1809.5,6140.6,1616.2,3952.4,3163.9,1624.5,736.6,2
浙江,8467.3,1903.9,7385.4,1420.7,5100.9,3452.3,1691.9,645.3,2
安徽,6381.7,1491,3931.2,1118.4,2748.4,2233.3,1269.3,432.9,2
福建,8299.6,1443.5,6530.5,1393.4,3205.7,2461.5,1178.5,492.8,2
江西,5667.5,1472.2,3915.9,1028.6,2310.6,1963.9,887.4,449.6,2
山东,5929.4,1977.7,4473.1,1576.5,3002.5,2399.3,1610,526.9,2
河南,5067.7,1746.6,3753.4,1430.2,1993.8,2078.8,1524.5,492.8,2
湖北,6294.3,1557.4,4176.7,1163.8,2391.9,2228.4,1792,435.6,2
湖南,6407.7,1666.4,3918.7,1384.1,2837.1,3406.1,1362.6,437.4,2
广西,5937.2,886.3,3784.3,1032.8,2259.8,2003,1065.9,299.3,2
海南,7419.7,859.6,3527.7,954,2582.3,1931.3,1399.8,341,2
重庆,6883.9,1939.2,3801.1,1466,2573.9,2232.4,1700,434.4,2
四川,7118.4,1767.5,3756.5,1311.1,2697.6,2008.4,1423.4,577.1,2
贵州,6010.3,1525.4,3793.1,1270.2,2684.4,2493.5,1050.1,374.6,2
云南,5528.2,1195.5,3814.4,1135.1,2791.2,2217,1526.7,414.3,2
陕西,5422,1542.2,3681.5,1367.7,2455.7,2474,2016.7,409,2
甘肃,5777.3,1776.9,3752.6,1329.1,2517.9,2322.1,1583.4,479.9,2
青海,5975.7,1963.5,3809.4,1322.1,3064.3,2352.9,1750.4,614.9,2
宁夏,4889.2,1726.7,3770.5,1245.1,3896.5,2415.7,1874,546.6,2
新疆,6179.4,1966.1,3543.9,1543.8,3074.1,2404.9,1934.8,581.5,2
广东,9421.6,1583.4,6410.4,1721.9,4198.1,3103.4,1304.5,870.1
西藏,8727.8,1812.5,3614.5,983.0,2198.4,922.5,585.3,596.5
II. Reading the data
df<-read.csv('f:/桌面/各城市消费水平.csv')
head(df)
> head(df)
    City     X1     X2      X3     X4     X5     X6     X7     X8 Group
1   北京 8070.4 2643.0 12128.0 2511.0 5077.9 4054.7 2629.8 1140.6     1
2   天津 8679.6 2114.0  6187.3 1663.8 3991.9 2643.6 2172.2  892.2     2
3   河北 4991.6 1614.4  4483.2 1351.1 2664.1 1991.3 1549.9  460.4     2
4   山西 3862.8 1603.0  3633.8  951.6 2401.0 2439.0 1651.6  450.1     2
5 内蒙古 6445.8 2543.3  4006.1 1565.1 3045.2 2598.9 1840.2  699.9     2
6   辽宁 6901.6 2321.3  4632.8 1558.2 3447.0 3018.5 2313.6  802.8     2
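The path f:/桌面/各城市消费水平.csv only exists on the author's machine. As a minimal sketch for readers who want to follow along, the labelled rows from section I can be read from an inline string instead. The variable name csv_text is an assumption introduced here. Only the 29 labelled rows are included, on the assumption that the author's file did not contain Guangdong and Tibet, because the later outputs show 29 observations.

```r
# Labelled training rows copied from the table in section I
# (Guangdong and Tibet are left out; they are classified separately later).
csv_text <- "City,X1,X2,X3,X4,X5,X6,X7,X8,Group
北京,8070.4,2643,12128,2511,5077.9,4054.7,2629.8,1140.6,1
天津,8679.6,2114,6187.3,1663.8,3991.9,2643.6,2172.2,892.2,2
河北,4991.6,1614.4,4483.2,1351.1,2664.1,1991.3,1549.9,460.4,2
山西,3862.8,1603,3633.8,951.6,2401,2439,1651.6,450.1,2
内蒙古,6445.8,2543.3,4006.1,1565.1,3045.2,2598.9,1840.2,699.9,2
辽宁,6901.6,2321.3,4632.8,1558.2,3447,3018.5,2313.6,802.8,2
吉林,4975.7,1819,3612,1107.1,2691,2367.5,2059.2,534.9,2
黑龙江,5019.3,1804.4,3352.4,1018.9,2462.9,2011.5,2007.5,468.3,2
上海,10014.8,1834.8,13216,1868.2,4447.5,4533.5,2839.9,1102.1,1
江苏,7389.2,1809.5,6140.6,1616.2,3952.4,3163.9,1624.5,736.6,2
浙江,8467.3,1903.9,7385.4,1420.7,5100.9,3452.3,1691.9,645.3,2
安徽,6381.7,1491,3931.2,1118.4,2748.4,2233.3,1269.3,432.9,2
福建,8299.6,1443.5,6530.5,1393.4,3205.7,2461.5,1178.5,492.8,2
江西,5667.5,1472.2,3915.9,1028.6,2310.6,1963.9,887.4,449.6,2
山东,5929.4,1977.7,4473.1,1576.5,3002.5,2399.3,1610,526.9,2
河南,5067.7,1746.6,3753.4,1430.2,1993.8,2078.8,1524.5,492.8,2
湖北,6294.3,1557.4,4176.7,1163.8,2391.9,2228.4,1792,435.6,2
湖南,6407.7,1666.4,3918.7,1384.1,2837.1,3406.1,1362.6,437.4,2
广西,5937.2,886.3,3784.3,1032.8,2259.8,2003,1065.9,299.3,2
海南,7419.7,859.6,3527.7,954,2582.3,1931.3,1399.8,341,2
重庆,6883.9,1939.2,3801.1,1466,2573.9,2232.4,1700,434.4,2
四川,7118.4,1767.5,3756.5,1311.1,2697.6,2008.4,1423.4,577.1,2
贵州,6010.3,1525.4,3793.1,1270.2,2684.4,2493.5,1050.1,374.6,2
云南,5528.2,1195.5,3814.4,1135.1,2791.2,2217,1526.7,414.3,2
陕西,5422,1542.2,3681.5,1367.7,2455.7,2474,2016.7,409,2
甘肃,5777.3,1776.9,3752.6,1329.1,2517.9,2322.1,1583.4,479.9,2
青海,5975.7,1963.5,3809.4,1322.1,3064.3,2352.9,1750.4,614.9,2
宁夏,4889.2,1726.7,3770.5,1245.1,3896.5,2415.7,1874,546.6,2
新疆,6179.4,1966.1,3543.9,1543.8,3074.1,2404.9,1934.8,581.5,2"

# read.csv() accepts the data as a string through its text argument
df <- read.csv(text = csv_text)

# Quick sanity checks: dimensions and how many regions fall in each group
print(dim(df))
print(table(df$Group))
```

Reading from a string keeps the example independent of any local file, and the group counts make the imbalance (2 regions against 27) visible before any model is fitted.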
III. Building the linear discriminant function
library(MASS)
# Column names are case-sensitive: writing group instead of Group fails with
# "Error in eval(predvars, data, env) : object 'group' not found"
z<-lda(Group~X1+X2+X3+X4+X5+X6+X7+X8,data=df,prior=c(1,1)/2)
z
Running this gives:
Call:
lda(Group ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data = df, prior = c(1, 1)/2)

Prior probabilities of groups:
  1   2
0.5 0.5

Group means:
        X1       X2        X3       X4       X5     X6       X7        X8
1 9042.600 2238.900 12672.000 2189.600 4762.700 4294.1 2734.850 1121.3500
2 6219.337 1705.056  4265.485 1308.322 2920.152 2419.0 1624.448  519.6704

Coefficients of linear discriminants:
             LD1
X1  0.0006388214
X2  0.0013250299
X3 -0.0015453976
X4 -0.0019589553
X5  0.0014962902
X6 -0.0003410774
X7 -0.0011726981
X8 -0.0020388069
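The lda function from the MASS package performs the discriminant analysis. Its output lists, in order, the prior probabilities, the mean of each variable within each of the two groups, and the coefficients of the linear discriminant function.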
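To make the coefficients concrete, the discriminant score can be written out explicitly. The formula below is a sketch, not something stated in the original: it assumes that the score is centred at the prior-weighted average of the two group means, which is my understanding of how the scores returned by predict() for an lda fit are computed. The symbols a_j (the LD1 coefficients), c_j (the centring constants), and the group means written with bars are notation introduced here.

```latex
\[
\mathrm{LD1}(x) \;=\; \sum_{j=1}^{8} a_j \,\bigl(x_j - c_j\bigr),
\qquad
c_j \;=\; \tfrac{1}{2}\,\bar{x}_{1j} + \tfrac{1}{2}\,\bar{x}_{2j},
\]
\[
a = (0.0006388214,\; 0.0013250299,\; -0.0015453976,\; -0.0019589553,\;
     0.0014962902,\; -0.0003410774,\; -0.0011726981,\; -0.0020388069).
\]
```

As a check under this assumption, substituting Beijing's eight expenditure values and the group means printed above gives a score of about -5.546, which agrees with the LD1 value reported for sample 1 in section IV.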
IV. Back-classifying the original data
pred<-predict(z)
pred
Running this gives:
$class
 [1] 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Levels: 1 2

$posterior
              1            2
1  1.000000e+00 4.027765e-31
2  1.773743e-27 1.000000e+00
3  6.975962e-28 1.000000e+00
4  8.482928e-32 1.000000e+00
5  6.559339e-39 1.000000e+00
6  5.599629e-32 1.000000e+00
7  1.009488e-34 1.000000e+00
8  8.724046e-38 1.000000e+00
9  1.000000e+00 1.761066e-39
10 1.285275e-25 1.000000e+00
11 4.239946e-31 1.000000e+00
12 2.243237e-41 1.000000e+00
13 5.122434e-26 1.000000e+00
14 5.160427e-39 1.000000e+00
15 2.364619e-32 1.000000e+00
16 1.394579e-28 1.000000e+00
17 1.107948e-32 1.000000e+00
18 7.036740e-38 1.000000e+00
19 8.146237e-37 1.000000e+00
20 5.359631e-45 1.000000e+00
21 1.520051e-39 1.000000e+00
22 8.930788e-43 1.000000e+00
23 6.492436e-41 1.000000e+00
24 5.489651e-36 1.000000e+00
25 1.934120e-30 1.000000e+00
26 5.493119e-36 1.000000e+00
27 1.798720e-39 1.000000e+00
28 8.845496e-42 1.000000e+00
29 3.042194e-39 1.000000e+00

$x
         LD1
1  -5.546457
2   4.881533
3   4.955489
4   5.669907
5   6.967645
6   5.702824
7   6.203559
8   6.762564
9  -7.071856
10  4.542101
11  5.542388
12  7.417637
13  4.615005
14  6.986655
15  5.771144
16  5.083072
17  5.831224
18  6.779598
19  6.585515
20  8.078531
21  7.083520
22  7.673106
23  7.333417
24  6.434315
25  5.422111
26  6.434265
27  7.070180
28  7.491387
29  7.028534
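1. $class holds the back-classification (resubstitution) result for the original data; comparing it with the Group column shows that every sample is assigned to its actual group.
2. $posterior holds the posterior probability of each group for every original sample.
3. $x holds the value of the linear discriminant function (LD1) for every sample.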
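The link between $x and $posterior can be illustrated with a short sketch. It assumes, as the MASS documentation describes, that the discriminant scores are scaled so that the within-group variance is 1, so each group can be treated as a unit-variance normal distribution on the LD1 axis. It also assumes that, with equal priors, the two group means on LD1 are symmetric about zero. The function name posterior_ld1 is hypothetical and introduced only for this illustration.

```r
# Mean LD1 score of group 1, taken from the two group-1 scores in $x
# (Beijing and Shanghai). With equal priors the scores are centred midway
# between the group means, so group 2 sits at the mirrored position.
m1 <- mean(c(-5.546457, -7.071856))
m <- c(`1` = m1, `2` = -m1)
prior <- c(0.5, 0.5)

# Posterior probability of each group given an LD1 score x:
# prior times a unit-variance normal density, normalised across groups.
posterior_ld1 <- function(x) {
  dens <- outer(x, m, function(xi, mg) dnorm(xi, mean = mg, sd = 1))
  weighted <- sweep(dens, 2, prior, "*")
  weighted / rowSums(weighted)
}

# Scores taken from the outputs in this article
post <- posterior_ld1(c(Beijing = -5.546457, Guangdong = 5.408199))
print(signif(post, 4))
```

Because the two group means are more than 12 units apart on an axis with unit within-group variance, the posterior probabilities are pushed to essentially 0 or 1, which is why the $posterior table contains values such as 1e-30.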
V. Contingency table of the back-classification results
> table(pred$class,df$Group)
   
     1  2
  1  2  0
  2  0 27
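The table shows that the back-classification (in-sample) accuracy is 100%: both group 1 samples and all 27 group 2 samples are classified correctly.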
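The accuracy can also be computed from the contingency table rather than read off by eye. The sketch below rebuilds the actual and predicted labels from the outputs in this article (the vector names actual and predicted are assumptions) and divides the diagonal of the table by the total count.

```r
# Actual groups in row order: Beijing (row 1) and Shanghai (row 9) are
# group 1, all other training rows are group 2.
actual <- c(1, rep(2, 7), 1, rep(2, 20))

# Predicted classes as listed under $class in section IV
predicted <- c(1, 2, 2, 2, 2, 2, 2, 2, 1, rep(2, 20))

# Rows are predicted classes, columns are actual groups
tab <- table(predicted, actual)
print(tab)

# Share of samples on the diagonal, i.e. correctly classified
accuracy <- sum(diag(tab)) / sum(tab)
print(accuracy)
```

This figure is measured on the same data the model was fitted to, so it tends to be optimistic. The lda function in MASS has a CV argument for leave-one-out cross-validation, which would give a less favourable but more honest estimate.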
VI. Classifying the two held-out samples
newdata<-rbind(c(9421.6,1583.4,6410.4,1721.9,4198.1,3103.4,1304.5,870.1),
c(8727.8,1812.5,3614.5,983.0,2198.4,922.5,585.3,596.5))
dimnames(newdata)<-list(NULL,c('X1','X2','X3','X4','X5','X6','X7','X8'))
newdata<-data.frame(newdata)
predict(z,newdata=newdata)
Running this gives:
$class
[1] 2 2
Levels: 1 2

$posterior
             1 2
1 2.305282e-30 1
2 1.443943e-56 1

$x
        LD1
1  5.408199
2 10.189745
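The results show that both held-out samples, Guangdong (row 1) and Tibet (row 2), are assigned to group 2. $posterior gives their posterior probabilities and $x gives the values of the linear discriminant function for the two samples.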
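The classification of the two held-out samples can be reproduced by hand from the printed coefficients and group means, without calling predict(). This is a sketch under two assumptions that the original does not state: the scores are centred at the prior-weighted average of the group means, and with equal priors a sample goes to group 1 when its LD1 score is negative (the side where Beijing and Shanghai lie) and to group 2 otherwise. The names a, mean1, mean2, centre, new_x, ld1, and assigned are introduced here.

```r
# LD1 coefficients and group means as printed by lda() in section III
a <- c(X1 = 0.0006388214, X2 = 0.0013250299, X3 = -0.0015453976,
       X4 = -0.0019589553, X5 = 0.0014962902, X6 = -0.0003410774,
       X7 = -0.0011726981, X8 = -0.0020388069)
mean1 <- c(9042.600, 2238.900, 12672.000, 2189.600, 4762.700, 4294.1, 2734.850, 1121.3500)
mean2 <- c(6219.337, 1705.056, 4265.485, 1308.322, 2920.152, 2419.0, 1624.448, 519.6704)

# Centring point: average of the group means weighted by the priors (0.5 each)
centre <- 0.5 * mean1 + 0.5 * mean2

# The two samples to classify, values taken from section I
new_x <- rbind(
  Guangdong = c(9421.6, 1583.4, 6410.4, 1721.9, 4198.1, 3103.4, 1304.5, 870.1),
  Tibet     = c(8727.8, 1812.5, 3614.5, 983.0, 2198.4, 922.5, 585.3, 596.5)
)

# Subtract the centre from every column, then project onto the coefficients
ld1 <- drop(sweep(new_x, 2, centre) %*% a)

# Equal priors: the cut-off is zero; negative scores go to group 1
assigned <- ifelse(ld1 < 0, 1L, 2L)

print(round(ld1, 3))
print(assigned)
```

Both scores come out positive and close to the $x values shown above (about 5.408 and 10.190), so both regions land in group 2, in line with the result of predict().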