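pandas mainly uses np.nan to represent missing data. By default, missing values are excluded from computations. See the Missing Data section for details.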
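To make the default behaviour concrete, here is a minimal sketch. The Series `s` and its values are made up for illustration and are not part of the original text. It shows that reductions such as sum() and mean() skip np.nan unless skipna=False is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration: one missing value in the middle.
s = pd.Series([1.0, np.nan, 3.0])

total = s.sum()                       # NaN is skipped: 1.0 + 3.0
average = s.mean()                    # mean of the two non-missing values
total_with_nan = s.sum(skipna=False)  # NaN propagates when skipping is disabled

print(total, average, total_with_nan)
```

Passing skipna=False is useful when a missing input should invalidate the result rather than be silently ignored.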
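# Boolean mask: True where a value is missing
data.isnull()
# Boolean Series: True for each row that repeats an earlier row
data.duplicated()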
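- # Number of missing values in each column
- data.isnull().sum()
-
- # Number of duplicated rows
- data.duplicated().sum()
-
- # Number of rows duplicated on the given columns only
- data.duplicated(subset=['column_1', 'column_2']).sum()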
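- # Drop rows in which every value is missing
- data.dropna(how='all', inplace=True)
-
- inplace=True applies the change directly to data; it has the same effect as data = data.dropna(how='all')
-
- # Drop rows that contain any missing value
- data.dropna(how='any', inplace=True)
-
- # Drop rows duplicated on the given columns, keeping the first occurrence
- data.drop_duplicates(subset=['column_1', 'column_2'], inplace=True)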
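The counting and dropping calls can be tried together on a small frame. This is only a sketch: the frame `data` below, with columns `a` and `b`, is a made-up stand-in, and it uses the non-inplace form so the original frame stays available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is entirely missing, row 2 is partly missing,
# and row 3 repeats row 0.
data = pd.DataFrame({
    "a": [1.0, np.nan, 2.0, 1.0],
    "b": [10.0, np.nan, np.nan, 10.0],
})

missing_per_column = data.isnull().sum()        # count of NaN in each column
duplicate_rows = int(data.duplicated().sum())   # rows equal to an earlier row

no_empty_rows = data.dropna(how="all")          # drops only the all-NaN row
complete_rows = data.dropna(how="any")          # drops every row with a NaN
deduplicated = data.drop_duplicates(subset=["a", "b"])  # keeps first occurrence

print(missing_per_column)
print(duplicate_rows)
print(no_empty_rows.index.tolist(), complete_rows.index.tolist())
```

Because none of these calls use inplace=True, `data` itself still has all four rows afterwards.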
# Replace every missing value with 5
data.fillna(value=5)
# Sort rows by the values in one column
data.sort_values(by='column_name')
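A short sketch of filling and sorting follows. The column name `score` and its values are assumptions chosen for illustration. It also shows that sort_values places NaN last by default when the gaps are left unfilled.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame with one missing value.
data = pd.DataFrame({"score": [3.0, np.nan, 1.0]})

filled = data.fillna(value=5)                    # NaN replaced by 5
ordered = filled.sort_values(by="score")         # ascending order by default
unfilled_sorted = data.sort_values(by="score")   # NaN is placed last by default

print(ordered)
print(unfilled_sorted)
```

Both calls return new objects, so `data` still contains its NaN unless the result is assigned back.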
Build df1 by reindexing df. The new column E is filled only for the first two dates, so the rest of it is NaN:
- In [55]: df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
-
- In [56]: df1.loc[dates[0]:dates[1], 'E'] = 1
-
- In [57]: df1
- Out[57]:
-                    A         B         C  D    F    E
- 2013-01-01  0.000000  0.000000 -1.509059  5  NaN  1.0
- 2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0
- 2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0  NaN
- 2013-01-04  0.721555 -0.706771 -1.039575  5  3.0  NaN
Get the boolean mask of the NaN values:
- In [60]: pd.isna(df1)
- Out[60]:
-                 A      B      C      D      F      E
- 2013-01-01  False  False  False  False   True  False
- 2013-01-02  False  False  False  False  False  False
- 2013-01-03  False  False  False  False  False   True
- 2013-01-04  False  False  False  False  False   True
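The transcript relies on a df and dates defined earlier in the original tutorial. The sketch below is a self-contained version of the same reindex-then-mask steps. It assumes a small stand-in frame with made-up values in columns A and B, so its numbers do not match the output shown here.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2013-01-01", periods=6)
# Stand-in for the df used in the transcript; these values are made up.
df = pd.DataFrame({"A": np.arange(6.0), "B": np.arange(6.0) * 2}, index=dates)

# Keep the first four dates and add a new column E, which starts as all NaN.
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
# Label-based slicing with .loc includes both endpoints, so two rows are filled.
df1.loc[dates[0]:dates[1], "E"] = 1

mask = pd.isna(df1)           # same shape as df1, True where a value is NaN
nan_per_column = mask.sum()   # each True counts as 1

print(mask)
print(nan_per_column)
```

Summing the mask is a quick way to go from the per-cell view to a per-column count of missing values.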