```python
import pandas as pd
import torch

# train_data / test_data: the Kaggle House Prices train.csv and test.csv,
# already loaded with pd.read_csv (loading code not shown)
print(train_data.shape)
print(test_data.shape)
print(train_data.iloc[0:4, [0, 1, 2, 3, -3, -2, -1]])

# (※1) Column 0 is Id, which is dropped from both sets. The train slice stops at -1
# to drop the SalePrice label; test_data has no price column, so its slice runs to the end.
all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))
print('-----------------------------------------------')
print(all_features.iloc[0:4, [0, 1, 2, 3, -3, -2, -1]])

# (※2) Get the index of the numeric (non-object) columns
numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index
# print('++++++++++++++++++++++++')

# (※3) Standardize each numeric column: (x - mean) / std
# print(all_features[numeric_features].iloc[0:3, [0, 1, 2, 3, -3, -2, -1]])
# print('----------------------')
all_features[numeric_features] = all_features[numeric_features].apply(
    lambda x: (x - x.mean()) / (x.std()))
# print(all_features[numeric_features].iloc[0:3, [0, 1, 2, 3, -3, -2, -1]])
# input()

# (※4) After standardization every numeric column has mean 0,
# so missing values can simply be set to 0
all_features[numeric_features] = all_features[numeric_features].fillna(0)

# (※5) get_dummies demo and DataFrame-to-tensor conversion
print('++++++++++ demo test dummies +++++++++++')
test = pd.DataFrame({'“x”': [1, 2, 3, 4, 5, 6],
                     "seasion": ['here', 'over', '', 'next', '', 'here']})
print(test)
print('-------------------------------')
test = pd.get_dummies(test, dummy_na=True)
print(test)
test = test * 1  # bool -> int; mixed bool and int columns would make .values an object array
print(test)

print('++++++++++ test trans to tensor +++++++++++')
# test1 = torch.tensor(test)  # convert everything (commented out; go through .values instead)
# Convert the whole table
test1 = torch.tensor(test.values, dtype=torch.float32)
print(test1.shape)
print(test1)
print('-------------------------------')
# Without iloc, plain slicing selects rows only
test2 = torch.tensor(test[:3].values, dtype=torch.float32)
print(test2.shape)
print(test2)
print('-------------------------------')
# To convert specific rows and columns, use iloc
test3 = torch.tensor(test.iloc[:2, :-1].values, dtype=torch.float32)
print(test3.shape)
print(test3)
input()  # pause until Enter is pressed
```

Output:

```text
(1460, 81)
(1459, 80)
   Id  MSSubClass MSZoning  LotFrontage SaleType SaleCondition  SalePrice
0   1          60       RL         65.0       WD        Normal     208500
1   2          20       RL         80.0       WD        Normal     181500
2   3          60       RL         68.0       WD        Normal     223500
3   4          70       RL         60.0       WD       Abnorml     140000
-----------------------------------------------
   MSSubClass MSZoning  LotFrontage  LotArea  YrSold SaleType SaleCondition
0          60       RL         65.0     8450    2008       WD        Normal
1          20       RL         80.0     9600    2007       WD        Normal
2          60       RL         68.0    11250    2008       WD        Normal
3          70       RL         60.0     9550    2006       WD       Abnorml
++++++++++ demo test dummies +++++++++++
   “x” seasion
0    1    here
1    2    over
2    3
3    4    next
4    5
5    6    here
-------------------------------
   “x”  seasion_  seasion_here  seasion_next  seasion_over  seasion_nan
0    1     False          True         False         False        False
1    2     False         False         False          True        False
2    3      True         False         False         False        False
3    4     False         False          True         False        False
4    5      True         False         False         False        False
5    6     False          True         False         False        False
   “x”  seasion_  seasion_here  seasion_next  seasion_over  seasion_nan
0    1         0             1             0             0            0
1    2         0             0             0             1            0
2    3         1             0             0             0            0
3    4         0             0             1             0            0
4    5         1             0             0             0            0
5    6         0             1             0             0            0
++++++++++ test trans to tensor +++++++++++
torch.Size([6, 6])
tensor([[1., 0., 1., 0., 0., 0.],
        [2., 0., 0., 0., 1., 0.],
        [3., 1., 0., 0., 0., 0.],
        [4., 0., 0., 1., 0., 0.],
        [5., 1., 0., 0., 0., 0.],
        [6., 0., 1., 0., 0., 0.]])
-------------------------------
torch.Size([3, 6])
tensor([[1., 0., 1., 0., 0., 0.],
        [2., 0., 0., 0., 1., 0.],
        [3., 1., 0., 0., 0., 0.]])
-------------------------------
torch.Size([2, 5])
tensor([[1., 0., 1., 0., 0.],
        [2., 0., 0., 0., 1.]])
```
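In the demo output, the two empty strings became their own column `seasion_`, while `seasion_nan` stayed all False. This is because `dummy_na=True` only adds an indicator for genuine missing values (NaN or None), and `''` counts as an ordinary category. If empty strings should be treated as missing, one option is to convert them to NaN before encoding. The following is a minimal sketch that uses a hypothetical, correctly spelled `season` column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6],
                   "season": ["here", "over", "", "next", "", "here"]})

# As in the demo: '' is a real category, so season_nan is never set.
as_is = pd.get_dummies(df, dummy_na=True)
print(as_is.columns.tolist())
# ['x', 'season_', 'season_here', 'season_next', 'season_over', 'season_nan']

# Turn empty strings into real missing values, then encode.
cleaned = df.replace("", np.nan)
encoded = pd.get_dummies(cleaned, dummy_na=True, dtype=float)
print(encoded.columns.tolist())
# ['x', 'season_here', 'season_next', 'season_over', 'season_nan']
print(encoded["season_nan"].tolist())
# [0.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```

Passing `dtype=float` to `get_dummies` also makes the `test * 1` step unnecessary, because the dummy columns come out numeric instead of bool.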
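concat: concatenate DataFrames (here, stacking the train and test features).
iloc: select rows and columns by integer position.
apply: apply a function to each column (here, standardization).
fillna: fill in missing values.
get_dummies: one-hot encoding (see the hand-built demo above).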
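The five functions above can be combined into a minimal end-to-end sketch. It runs on a hypothetical three-row train set and two-row test set; the values are invented, and only the column names follow the Kaggle data. It assumes pandas and PyTorch are installed. The sketch makes two small departures from the code above. First, numeric columns are picked with `select_dtypes(include="number")` rather than `dtypes != 'object'`. Second, `get_dummies` is called with `dtype=float`, so `.values` returns a float array instead of a mix of bool and numbers that `torch.tensor` cannot convert.

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical toy data standing in for train.csv / test.csv:
# column 0 is Id, and the last train column is the SalePrice label.
train_data = pd.DataFrame({
    "Id": [1, 2, 3],
    "LotArea": [8450.0, 9600.0, np.nan],
    "MSZoning": ["RL", "RM", "RL"],
    "SalePrice": [208500, 181500, 223500],
})
test_data = pd.DataFrame({
    "Id": [4, 5],
    "LotArea": [11250.0, 9550.0],
    "MSZoning": ["RL", None],
})

# concat + iloc: drop Id (and the label in train), then stack the rows.
# ignore_index=True gives a clean 0..n-1 index instead of duplicated labels.
all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]),
                         ignore_index=True)

# apply: standardize every numeric column to mean 0, std 1 (NaNs are skipped).
numeric_features = all_features.select_dtypes(include="number").columns
all_features[numeric_features] = all_features[numeric_features].apply(
    lambda x: (x - x.mean()) / x.std())

# fillna: the mean is now 0, so filling with 0 is mean imputation.
all_features[numeric_features] = all_features[numeric_features].fillna(0)

# get_dummies: one-hot encode MSZoning, plus an indicator column for missing values.
all_features = pd.get_dummies(all_features, dummy_na=True, dtype=float)

# iloc + torch.tensor: split back into train/test rows and convert to float32.
n_train = train_data.shape[0]
train_features = torch.tensor(all_features.iloc[:n_train].values, dtype=torch.float32)
test_features = torch.tensor(all_features.iloc[n_train:].values, dtype=torch.float32)
train_labels = torch.tensor(train_data.SalePrice.values.reshape(-1, 1),
                            dtype=torch.float32)

print(all_features.columns.tolist())
print(train_features.shape, test_features.shape, train_labels.shape)
```

Train and test are concatenated before standardizing and encoding so that both use the same means, standard deviations, and dummy columns. The split by `n_train` at the end works because `concat` keeps the train rows first.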
None.
PS: omitted.