Multi-index | Custom sorting
Last modified: November 17, 2021
###### raw data set, 'data'
###### columns labeled 'A' and 'B'
###### aggregation function, e.g. mean(), sum(), std()
data.groupby(['A', 'B'], as_index=False).mean()
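To make the effect of `as_index=False` concrete, here is a minimal sketch. The toy data and the `value` column are assumptions for illustration; only the labels 'A' and 'B' come from the snippet above.

```python
import pandas as pd

# Hypothetical toy data; only the column labels 'A' and 'B' come from the text.
data = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "B": ["p", "p", "q", "r"],
    "value": [1.0, 3.0, 5.0, 7.0],
})

# as_index=False keeps the grouping keys 'A' and 'B' as ordinary columns.
flat = data.groupby(["A", "B"], as_index=False).mean()

# The default (as_index=True) moves the keys into a two-level MultiIndex.
indexed = data.groupby(["A", "B"]).mean()

print(flat)
print(indexed.index.nlevels)
```

The flat form is usually easier to merge or export, while the indexed form is the one the multi-index operations further down work on.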
###### raw data set, 'data'
###### multi-index
data.index.get_level_values(level=0) # level must be smaller than the number of index levels
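A small sketch of `get_level_values` on a two-level index, assuming made-up index values with the level names `group` and `priority`. It shows lookup by position, lookup by name, and what happens when the level is out of range.

```python
import pandas as pd

# Hypothetical two-level index for illustration.
index = pd.MultiIndex.from_tuples(
    [("A", "High"), ("A", "Low"), ("B", "High")],
    names=["group", "priority"],
)
s = pd.Series([5, 7, 8], index=index)

# A level can be addressed by position or by name.
by_position = s.index.get_level_values(0)
by_name = s.index.get_level_values("priority")

# Asking for a level that does not exist raises IndexError.
try:
    s.index.get_level_values(2)
    out_of_range = False
except IndexError:
    out_of_range = True

print(by_position.tolist(), by_name.tolist(), out_of_range)
```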
###### raw data set, 'data'
# add a column named '月份' (month); every row gets the value int(month)
data['月份'] = int(month)
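Assigning a scalar to a new column broadcasts it to every row. A minimal sketch, assuming a toy frame and a hypothetical `month` string:

```python
import pandas as pd

# Hypothetical data and month value for illustration.
data = pd.DataFrame({"Num": [962, 979, 193]})
month = "11"

# The scalar int(month) is broadcast to all rows of the new column.
data["月份"] = int(month)

print(data)
```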
###### raw data set, 'data'
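###### subset: the column(s) used to identify duplicates; all columns by default
###### keep='first': drop duplicates and keep the first occurrence
###### inplace: whether to modify the original data directly or return a modified copy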
data.drop_duplicates(subset='A', keep='first', inplace=False) # keep='first'/'last'
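The difference between `keep='first'` and `keep='last'` is easiest to see on a tiny frame. The data below is an assumption for illustration.

```python
import pandas as pd

# Hypothetical data: column 'A' contains repeated values.
data = pd.DataFrame({"A": [1, 1, 2, 2], "B": ["a", "b", "c", "d"]})

# Duplicates are judged on column 'A' only.
first = data.drop_duplicates(subset="A", keep="first")
last = data.drop_duplicates(subset="A", keep="last")

print(first)
print(last)
```

With `inplace=False` (the default) `data` itself still holds all four rows afterwards.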
###### raw data set, 'data'
# drop columns with labels 'column_name'
# axis=0, rows; axis=1, columns
# inplace, manipulate raw data; otherwise, return a copy of dataframe
data.drop('column_name',axis=1, inplace=True)
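A short sketch contrasting the copy-returning and in-place forms of `drop`. The frame and the `keep_me` column are hypothetical.

```python
import pandas as pd

# Hypothetical frame; 'column_name' is the label used in the snippet above.
data = pd.DataFrame({"keep_me": [1, 2], "column_name": [3, 4]})

# Without inplace, drop returns a new frame and leaves 'data' untouched.
trimmed = data.drop("column_name", axis=1)
columns_before = list(data.columns)

# With inplace=True, 'data' itself is modified and the call returns None.
result = data.drop("column_name", axis=1, inplace=True)

print(columns_before, list(data.columns), result)
```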
# example
# print(data)
'''
    creator  Num  ... priority    efforts group
0         Y  962  ...  Highest  20.703253     P
1         R  979  ...  Highest  15.027863     N
...
560       t  193  ...      Low  44.173754     N
'''
MultIndex = ['group', 'priority']
groupdata = data.groupby(MultIndex)['Num'].count() # <class 'pandas.core.series.Series'>
# groupdata = pd.DataFrame(groupdata)
groupdata = pd.DataFrame(data.groupby(MultIndex)['Num'].count()) # <class 'pandas.core.frame.DataFrame'>
'''
group  priority
A      High          5
       Highest       7
       Medium       36
B      High          8
       Highest      18
       Medium       11
N      High         94
       Highest      88
       Low           3
       Medium      217
...
Name: Num, dtype: int64
'''
groupVal = groupdata.index.get_level_values(level=0)
'''
Index(['A', 'A', 'A', 'B', 'B', 'B', 'N', 'N', 'N', 'N', ...],
dtype='object', name='group')
'''
groupOrder = groupdata.index.get_level_values(level=0).drop_duplicates() # <class 'pandas.core.indexes.base.Index'>
'''
Index(['A', 'B', 'N', ...], dtype='object', name='group')
'''
groupOrder = groupdata.index.get_level_values(level=0).drop_duplicates().to_list() # <class 'list'>
'''
['A', 'B', 'N', ...]
'''
groupReorder = groupOrder[:2] + groupOrder[3:] + groupOrder[2:3] # <class 'list'>
'''
['A', 'B', ..., 'N']
'''
# groupdata['groupOrder'] = groupVal
groupdata['groupOrder'] = groupdata.index.get_level_values(level=0)
'''
                Num groupOrder
group priority
A     High        5          A
      Highest     7          A
      Medium     36          A
B     High        8          B
      Highest    18          B
      Medium     11          B
N     High       94          N
      Highest    88          N
      Low         3          N
      Medium    217          N
...
'''
groupdata['groupOrder'] = groupdata['groupOrder'].astype('category')
# reorder_categories returns a new Series; assign it back, since the inplace argument is no longer available in pandas 2.x
groupdata['groupOrder'] = groupdata['groupOrder'].cat.reorder_categories(groupReorder, ordered=True) # <class 'pandas.core.series.Series'>
groupdata.sort_values('groupOrder', inplace=True, ascending=True)
# result after sorting, before the helper column is dropped
'''
                Num groupOrder
group priority
A     High        5          A
      Highest     7          A
      Medium     36          A
B     High        8          B
      Highest    18          B
      Medium     11          B
...
N     High       94          N
      Highest    88          N
      Low         3          N
      Medium    217          N
'''
groupdata.drop(columns='groupOrder', inplace=True)
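The single-level reordering above can be run end to end as follows. This is a sketch on made-up data (the group labels and counts are assumptions), and it assigns the result of `reorder_categories` back instead of relying on an `inplace` argument. A stable sort is requested so the original order within each group is preserved.

```python
import pandas as pd

# Hypothetical raw data with the same column names as the example.
data = pd.DataFrame({
    "group": ["N", "A", "B", "N", "P", "A"],
    "priority": ["High", "High", "Low", "Low", "High", "Low"],
    "Num": [1, 2, 3, 4, 5, 6],
})
groupdata = pd.DataFrame(data.groupby(["group", "priority"])["Num"].count())

# Unique values of the first index level, in their current (alphabetical) order.
group_order = groupdata.index.get_level_values(0).drop_duplicates().to_list()
# Move the third entry ('N') to the end.
group_reorder = group_order[:2] + group_order[3:] + group_order[2:3]

# Helper column holding the first index level as an ordered categorical.
groupdata["groupOrder"] = groupdata.index.get_level_values(0)
groupdata["groupOrder"] = (
    groupdata["groupOrder"]
    .astype("category")
    .cat.reorder_categories(group_reorder, ordered=True)
)

# Sort by the custom order, then drop the helper column.
groupdata = groupdata.sort_values("groupOrder", kind="stable").drop(columns="groupOrder")

print(groupdata)
```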
### the index has two levels here
groupVal = groupdata.index.get_level_values(level=0)
'''
Index(['A', 'A', 'A', 'B', 'B', 'B', 'N', 'N', 'N', 'N', ...],
dtype='object', name='group')
'''
groupOrder = groupdata.index.get_level_values(level=0).drop_duplicates().to_list()
'''
['A', 'B', 'N', 'P', 'W', 'X']
'''
priorityVal = groupdata.index.get_level_values(level=1)
'''
Index(['High', 'Highest', 'Medium', 'High', 'Highest', 'Medium', 'High',
       'Highest', 'Low', 'Medium', 'High', 'Highest', 'Medium', 'High',
       'Highest', 'Medium', 'High', 'Highest', 'Medium'],
dtype='object', name='priority')
'''
priorityOrder = groupdata.index.get_level_values(level=1).drop_duplicates().to_list()
'''
['High', 'Highest', 'Medium', 'Low']
'''
groupReorder = groupOrder[:2] + groupOrder[3:] + groupOrder[2:3]
'''
['A', 'B', 'P', 'W', 'X', 'N']
'''
priorityReorder = priorityOrder[1:2] + priorityOrder[0:1] + priorityOrder[2:4]
'''
['Highest', 'High', 'Medium', 'Low']
'''
# equivalent to:
# groupdata['groupOrder'], groupdata['priorityOrder'] = [groupVal, priorityVal]
groupdata = groupdata.assign(groupOrder=groupVal, priorityOrder=priorityVal)
'''
                Num groupOrder priorityOrder
group priority
A     High        5          A          High
      Highest     7          A       Highest
      Medium     36          A        Medium
B     High        8          B          High
      Highest    18          B       Highest
      Medium     11          B        Medium
N     High       94          N          High
      Highest    88          N       Highest
      Low         3          N           Low
      Medium    217          N        Medium
...
'''
# type(groupdata): <class 'pandas.core.frame.DataFrame'>
# the order of 'categories' defines the sort order
groupdata['groupOrder'] = pd.Categorical(groupdata['groupOrder'], categories=groupReorder, ordered=True)
groupdata['priorityOrder'] = pd.Categorical(groupdata['priorityOrder'], categories=priorityReorder, ordered=True)
groupdata.sort_values(['groupOrder', 'priorityOrder'], inplace=True, ascending=[True, True])
# columns= already selects the axis, so axis=1 is not needed
groupdata.drop(columns=['groupOrder', 'priorityOrder'], inplace=True)
'''
                Num
group priority
A     Highest     7
      High        5
      Medium     36
B     Highest    18
      High        8
      Medium     11
...
N     Highest    88
      High       94
      Medium    217
      Low         3
'''
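For reference, the two-level version can be condensed into a self-contained sketch. The data and the two target orders (`group_reorder`, `priority_reorder`) are assumptions chosen for illustration; the technique is the same helper-column approach with `pd.Categorical` used above.

```python
import pandas as pd

# Hypothetical raw data with the same column names as the example.
data = pd.DataFrame({
    "group": ["N", "N", "A", "A", "P", "N"],
    "priority": ["Low", "Highest", "High", "Highest", "Medium", "High"],
    "Num": [10, 11, 12, 13, 14, 15],
})
groupdata = pd.DataFrame(data.groupby(["group", "priority"])["Num"].count())

# Assumed target orders for each index level.
group_reorder = ["A", "P", "N"]
priority_reorder = ["Highest", "High", "Medium", "Low"]

# Helper columns: each index level as an ordered categorical.
groupdata = groupdata.assign(
    groupOrder=pd.Categorical(
        groupdata.index.get_level_values("group"),
        categories=group_reorder, ordered=True,
    ),
    priorityOrder=pd.Categorical(
        groupdata.index.get_level_values("priority"),
        categories=priority_reorder, ordered=True,
    ),
)

# Sort by group first, then by priority within each group; drop the helpers.
groupdata = (
    groupdata
    .sort_values(["groupOrder", "priorityOrder"])
    .drop(columns=["groupOrder", "priorityOrder"])
)

print(groupdata)
```

Any index value missing from the `categories` list would become NaN in the helper column, so both lists should cover every value that appears in the corresponding level.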
The following resources were consulted while writing this post:
python pandas同时对多列进行赋值
替换 Pandas DataFrame 中的列值
pandas dataframe 新增单列和多列
pandas 之 groupby 聚合函数
Pandas之drop_duplicates:去除重复项
python进行数据处理——pandas的drop函数
Pandas DataFrame sort by categorical column but by specific class ordering
Pandas教程 | 数据处理三板斧——map、apply、applymap详解
Pandas中的map(), apply()和applymap()的应用