当前位置:   article > 正文

Python | Pandas | 多索引 | 自定义排序_pandas索引自定义排序

pandas索引自定义排序

自定义排序

多索引 | 自定义排序
最后修改日期:2021年11月17日


知识点

聚合

###### raw data set, 'data'
###### columns labled with 'A' and 'B'
###### group function, e.g. mean(), sum(), std()
data.groupby(['A', 'B'], as_index=False).mean()
  • 1
  • 2
  • 3
  • 4

多索引

提取索引值

###### raw data set, 'data'
###### multi-index
data.index.get_level_values(level=0) 	# here, level could not exceed index levels
  • 1
  • 2
  • 3

添加 / 删改

添加

###### raw data set, 'data'
# 增加名为‘月份’的列,新列的值为int(month)
data1['月份']=int(month)  
  • 1
  • 2
  • 3

删改

去重
###### raw data set, 'data'
###### subset, 用来指定特定的列,默认所有列
###### first, 删除重复项并保留第一次出现的项
###### inplace, 是直接在原来数据上修改还是保留一个副本
data.drop_duplicates(subset='A', keep='first', inplace=False) 	# keep='first'/'last'
  • 1
  • 2
  • 3
  • 4
  • 5
指定列
###### raw data set, 'data'
# drop columns with labels 'column_name'
# axis=0, rows; axis=1, columns
# inplace, manipulate raw data; otherwise, return a copy of dataframe
data.drop('column_name',axis=1, inplace=True)
  • 1
  • 2
  • 3
  • 4
  • 5

示例数据

# example
# print(data)
'''
          creator        Num   ...   priority     efforts             group
0               Y        962   ...    Highest   20.703253                 P
1               R        979   ...    Highest   15.027863                 N
...
560             t        193   ...        Low   44.173754                 N
'''


MultIndex = ['group', 'priority']
groupdata = data.groupby(MultIndex)['Num'].count()                   # <class 'pandas.core.series.Series'>
# groupdata = pd.DataFrame(groupdata)
groupdata = pd.DataFrame(data.groupby(MultIndex)['Num'].count())     # <class 'pandas.core.frame.DataFrame'>
    '''
group             priority
A                 High          5
                  Highest       7
                  Medium       36
B                 High          8
                  Highest      18
                  Medium       11
N                 High         94
                  Highest      88
                  Low           3
                  Medium      217
...
Name: Num, dtype: int64
    '''
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30

自定义排序

乱序

低维索引

现排序列表
groupVal = groupdata.index.get_level_values(level=0)
	'''
Index(['A', 'A', 'A', 'B', 'B', 'B', 'N', 'N', 'N', 'N', ...],
  dtype='object', name='group')
	'''

groupOrder = groupdata.index.get_level_values(level=0).drop_duplicates() # <class 'pandas.core.indexes.base.Index'>
	'''
Index(['A', 'B', 'N', ...], dtype='object', name='group')
	'''

groupOrder = groupdata.index.get_level_values(level=0).drop_duplicates().to_list()  # <class 'list'>
	'''
['A', 'B', 'N', ...]
	'''
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
自定义排序列表
groupReorder = groupOrder[:2] + groupOrder[3:] + groupOrder[2:3] # <class 'list'>
	'''
['A', 'B', ..., 'N']
	'''
  • 1
  • 2
  • 3
  • 4
自定义排序实现
# groupdata['groupOrder'] = groupVal
groupdata['groupOrder'] = groupdata.index.get_level_values(level=0)
	'''
                                  Num       groupOrder
group            priority                              
A                High               5                A
                 Highest            7                A
                 Medium            36                A
B                High               8                B
                 Highest           18                B
                 Medium            11                B
N                High              94                N
                 Highest           88                N
                 Low                3                N
                 Medium           217                N
...
    '''

groupdata['groupOrder'] = groupdata['groupOrder'].astype('category')
groupdata['groupOrder'].cat.reorder_categories(groupReorder, ordered=True, inplace=True) # <class 'pandas.core.series.Series'>
groupdata.sort_values('groupOrder', inplace=True, ascending=True)
groupdata.drop(columns='groupOrder', inplace=True)
    '''
                                  Num       groupOrder
group            priority                              
A                High               5                A
                 Highest            7                A
                 Medium            36                A
B                High               8                B
                 Highest           18                B
                 Medium            11                B
...
N                High              94                N
                 Highest           88                N
                 Low                3                N
                 Medium           217                N
'''
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37

高维索引

现排序列表
### 此处index level有两级
groupVal = groupdata.index.get_level_values(level=0)
	'''
Index(['A', 'A', 'A', 'B', 'B', 'B', 'N',
   'N', 'N', 'N', ...],
  dtype='object', name='group')
	'''
	
groupOrder = groupdata.index.get_level_values(level=0).drop_duplicates().to_list()
	'''
['A', 'B', 'N', 'P', 'W', 'X']
	'''


priorityVal = groupdata.index.get_level_values(level=1)
	'''
Index(['High', 'Highest', 'Medium', 'High', 'Highest', 'Medium', 'High',
   'Highest', 'Low', 'Medium', 'High', 'Highest', 'Medium', 'High',
   'Highest', 'Medium', 'High', 'Highest', 'Medium'],
  dtype='object', name='priority')
	'''
	
priorityOrder = groupdata.index.get_level_values(level=1).drop_duplicates().to_list()
	'''
['High', 'Highest', 'Medium', 'Low']
	'''
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
自定义排序列表
groupReorder = groupOrder[:2] + groupOrder[3:] + groupOrder[2:3]
	'''
['A', 'B', 'P', 'W', 'X', 'N']
	'''

priorityReorder = priorityOrder[1:2] + priorityOrder[0:1] + priorityOrder[2:4]
	'''
['Highest', 'High', 'Medium', 'Low']
	'''
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
自定义排序实现
# groupdata['groupOrder'], groupdata['priorityOrder'] = [groupVal, priorityVal]
# the same with
# groupdata['groupOrder'], groupdata['priorityOrder'] = [groupVal, priorityVal]
groupdata = groupdata.assign(groupOrder=groupVal, priorityOrder=priorityVal)
    '''
                                  Num        groupOrder priorityOrder
group            priority
A                High               5                 A          High
                 Highest            7                 A       Highest
                 Medium            36                 A        Medium
B                High               8                 B          High
                 Highest           18                 B       Highest
                 Medium            11                 B        Medium
N                High              94                 N          High
                 Highest           88                 N       Highest
                 Low                3                 N           Low
                 Medium           217                 N        Medium
...
    '''

# type(groupdata): <class 'pandas.core.frame.DataFrame'>
groupdata['groupOrder'] = pd.Categorical(groupdata['groupOrder'], categories=groupReorder)
groupdata['priorityOrder'] = pd.Categorical(groupdata['priorityOrder'], categories=priorityReorder)
groupdata.sort_values(['groupOrder', 'priorityOrder'], inplace=True, ascending=[True, True])
groupdata.drop(columns=['groupOrder', 'priorityOrder'], axis=1, inplace=True)
    '''
                                  Num
group            priority
A                Highest            7
                 High               5
                 Medium            36
B                Highest           18
                 High               8
                 Medium            11
...
N                Highest           88
                 High              94
                 Medium           217
                 Low                3
    '''
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40

参考文章

写此文时有借鉴以下资源

添加

python pandas同时对多列进行赋值
替换 Pandas DataFrame 中的列值
pandas dataframe 新增单列和多列

删改

pandas 之 groupby 聚合函数
Pandas之drop_duplicates:去除重复项
python进行数据处理——pandas的drop函数

自定义排序实现

Pandas DataFrame sort by categorical column but by specific class ordering
Pandas教程 | 数据处理三板斧——map、apply、applymap详解
Pandas中的map(), apply()和applymap()的应用

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/我家自动化/article/detail/261941
推荐阅读
相关标签
  

闽ICP备14008679号