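While recently using the entropy weight method to select the best indicators, I found that every computed weight came out as NaN. After tracing through the calculation, I located the cause.
Data preview: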
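Steps of the entropy weight method:
Step 1: Standardize the data
Step 2: Compute the information entropy of each dimension
Step 3: Compute the difference coefficient
Step 4: Compute the weights
Step 5: Compute the composite score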
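The five steps can be written out as follows. This is a sketch inferred from the Python implementation in this post, not formulas quoted from it: the symbols $x'_{ij}$, $p_{ij}$, $E_j$, $d_j$, $w_j$ and $s_i$ are labels chosen here, with $m$ samples (rows) and $n$ indicators (columns), and the score in step 5 uses $p_{ij}$ because that is what the code multiplies by the weights.

```latex
\begin{aligned}
x'_{ij} &= \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} && \text{(step 1)}\\
p_{ij} &= \frac{x'_{ij}}{\sum_{i=1}^{m} x'_{ij}}, \qquad
E_j = -\frac{1}{\ln m}\sum_{i=1}^{m} p_{ij}\,\ln p_{ij} && \text{(step 2)}\\
d_j &= 1 - E_j && \text{(step 3)}\\
w_j &= \frac{d_j}{\sum_{k=1}^{n} d_k} && \text{(step 4)}\\
s_i &= \sum_{j=1}^{n} w_j\, p_{ij} && \text{(step 5)}
\end{aligned}
```

In step 2 the term $p_{ij}\ln p_{ij}$ is taken to be 0 when $p_{ij} = 0$. Floating-point arithmetic does not apply that convention on its own, which matters because min-max scaling maps the minimum of every column to exactly 0.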
Python implementation:
- # Imports
- import numpy as np
- import pandas as pd
- from sklearn import preprocessing
-
- # Load the data
- data = pd.read_excel(r'data\data.xlsx', sheet_name=None)  # sheet_name=None returns a dict of all sheets
- df5 = data['2021']
- df = df5.drop('class', axis=1)  # axis defaults to 0 (rows), so axis=1 is needed to drop a column
- df.head()
-
- # Min-max scaling
- min_max_normalizer = preprocessing.MinMaxScaler(feature_range=(0, 1))
- # feature_range sets the target minimum and maximum; the default is (0, 1)
- scaled_data = min_max_normalizer.fit_transform(df)
- # Wrap the scaled array in a DataFrame; the 64 indicator columns are named X1 to X64
- df_normalized = pd.DataFrame(scaled_data, columns=[f'X{i}' for i in range(1, 65)])
- df_normalized.to_excel('2021年标准化数据.xlsx', index=False)
- df_normalized
-
- m, n = df_normalized.shape
- # Compute the information entropy
- df_normalized = np.array(df_normalized)
- p = df_normalized.copy()
- for j in range(n):
-     p[:, j] = df_normalized[:, j] / np.sum(df_normalized[:, j])
- print(p)
- p = np.nan_to_num(p)  # replace the NaN produced by all-zero columns (0/0) with 0
- # NOTE: a tiny value needs to be added to p inside the log here; otherwise each 0.0 in p becomes -inf under the log
- E = (-1 / np.log(m)) * sum(p * np.log(p))  # built-in sum adds the m rows, giving one entropy per column
- print(E)
- # Compute the weights
- w = (1 - E) / sum(1 - E)
- print(w)
- print('shape of np.log(p):', np.log(p).shape)
- print('shape of p:', p.shape)
- print('shape of p*np.log(p):', (p * np.log(p)).shape)
- # Compute the scores
- score = np.dot(p, w).round(5)
- print(score)
- score = pd.DataFrame(score, index=df.index, columns=['综合得分']).sort_values(by=['综合得分'], ascending=False)  # '综合得分' = composite score
- score
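The problem: when the weights of the 64 indicators are printed, every one of them is NaN.
On inspection, the standardized data contains 0.0 values.
Taking the logarithm of 0 yields negative infinity (-inf), and multiplying that by 0 in p*np.log(p) yields NaN.
So every weight computed from the weight formula is NaN (Not a Number).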
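The chain from a single zero to a NaN sum can be reproduced on a toy column. This is a minimal sketch; the array values are made up for illustration, and `np.errstate` is used only to silence the runtime warnings NumPy would otherwise print.

```python
import numpy as np

# A toy probability column containing an exact zero, as min-max scaling produces.
p = np.array([0.0, 0.25, 0.75])

with np.errstate(divide="ignore", invalid="ignore"):
    log_p = np.log(p)      # log(0) evaluates to -inf
    terms = p * log_p      # 0 * -inf evaluates to nan
    total = terms.sum()    # a single nan makes the whole sum nan

print(log_p[0], terms[0], total)
```

Because every min-max scaled column contains at least one zero (its minimum), every column's entropy becomes NaN, and the normalization in the weight step then spreads NaN to all weights.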
Solution:
When computing the entropy that the weights are based on, add a tiny value to p inside the logarithm.
E=(-1/np.log(m))*sum(p*np.log(p+1e-10))
With this change the problem is solved.
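For reuse, the fix can be wrapped in a small function. This is a sketch under stated assumptions rather than the code from this post: the name `entropy_weights`, the `eps` parameter and the toy `demo` array are all hypothetical, and the input is assumed to be already min-max scaled.

```python
import numpy as np


def entropy_weights(x_scaled, eps=1e-10):
    """Return (entropy, weights) for an (m, n) array of min-max scaled data."""
    x = np.asarray(x_scaled, dtype=float)
    m = x.shape[0]
    col_sums = x.sum(axis=0)
    # Constant columns scale to all zeros; give them p = 0 instead of computing 0/0.
    p = np.divide(x, col_sums, out=np.zeros_like(x), where=col_sums != 0)
    # eps keeps log() finite where p == 0, so those terms contribute exactly 0.
    entropy = -np.sum(p * np.log(p + eps), axis=0) / np.log(m)
    d = 1.0 - entropy  # difference coefficient
    return entropy, d / d.sum()


# Toy data: 3 samples, 2 indicators, each column containing a zero.
demo = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 1.0]])
E, w = entropy_weights(demo)
print(E, w)
```

One thing to watch in this sketch: a constant column ends up with entropy 0 and therefore the largest difference coefficient, so it may be worth dropping such indicators before computing the weights.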