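Pandas Data Structures: Series (Part 3): Common Operations [viewing data (head), sorting (sort_values), reindexing (reindex), alignment (arithmetic aligns automatically on labels), adding, modifying, and deleting elements]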

1. Viewing data

  • .head() shows the first rows
  • .tail() shows the last rows
  • Both show 5 rows by default
import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(50))

print("s.head() = \n", s.head())
print("-" * 100)
print("s.head(10) = \n", s.head(10))
print("-" * 100)
print("s.tail() = \n", s.tail())

Output:

s.head() = 
0    0.891778
1    0.575982
2    0.138742
3    0.101361
4    0.247216
dtype: float64
----------------------------------------------------------------------------------------------------
s.head(10) = 
0    0.891778
1    0.575982
2    0.138742
3    0.101361
4    0.247216
5    0.376180
6    0.117379
7    0.001082
8    0.769211
9    0.204997
dtype: float64
----------------------------------------------------------------------------------------------------
s.tail() = 
45    0.020636
46    0.062189
47    0.110146
48    0.958667
49    0.788788
dtype: float64
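
Because the example above uses random numbers, its output changes on every run. The following is a minimal sketch with fixed values (chosen here purely for illustration) that also shows a negative n, which returns everything except the last |n| rows:

```python
import pandas as pd

# Fixed values so the result is reproducible (illustrative data only)
s = pd.Series(range(10, 20))

first_three = s.head(3)         # first 3 rows
last_two = s.tail(2)            # last 2 rows
all_but_last_two = s.head(-2)   # negative n: all rows except the last 2

print(first_three.tolist())     # [10, 11, 12]
print(last_two.index.tolist())  # [8, 9]
print(len(all_but_last_two))    # 8
```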


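2. Sorting

  • Use series.sort_values(ascending=True) to sort by value

A Series has only one column of values, so sort_values needs no by argument (unlike DataFrame.sort_values). In the snippets below, data is assumed to be a DataFrame with a date index and a p_change column; it is not defined in this article.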

data['p_change'].sort_values(ascending=True).head()

2015-09-01   -10.03
2015-09-14   -10.02
2016-01-11   -10.02
2015-07-15   -10.02
2015-08-26   -10.01
Name: p_change, dtype: float64
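
Since data is not defined here, the following is a self-contained sketch of sort_values on a small Series. The values and dates are made up for illustration; the na_position parameter controls where missing values end up:

```python
import numpy as np
import pandas as pd

# Hypothetical percentage changes (illustrative values, not real data)
p_change = pd.Series(
    [2.62, -10.03, np.nan, 8.51],
    index=["2015-03-02", "2015-09-01", "2015-03-04", "2015-03-06"],
    name="p_change",
)

ascending = p_change.sort_values()                     # smallest first, NaN last
descending = p_change.sort_values(ascending=False)     # largest first, NaN still last
nan_first = p_change.sort_values(na_position="first")  # NaN moved to the front

print(ascending.index[0])    # 2015-09-01
print(descending.index[0])   # 2015-03-06
print(nan_first.index[0])    # 2015-03-04
```

sort_values returns a new Series; p_change itself keeps its original order.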
  • Use series.sort_index() to sort by index

This works the same way as for a DataFrame.

# Sort by index
data['p_change'].sort_index().head()

2015-03-02    2.62
2015-03-03    1.44
2015-03-04    1.57
2015-03-05    2.02
2015-03-06    8.51
Name: p_change, dtype: float64
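
A runnable sketch of sort_index, again with made-up values. The index labels here are ISO-formatted date strings, so sorting them as text also puts them in chronological order:

```python
import pandas as pd

# Illustrative data with an unordered index of date strings
s = pd.Series(
    [8.51, 2.62, 1.57],
    index=["2015-03-06", "2015-03-02", "2015-03-04"],
    name="p_change",
)

by_index = s.sort_index()                      # earliest label first
by_index_desc = s.sort_index(ascending=False)  # latest label first

print(by_index.tolist())       # [2.62, 1.57, 8.51]
print(by_index_desc.tolist())  # [8.51, 1.57, 2.62]
```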

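3. Reindexing

.reindex reorders the Series to match the given index labels; a label that does not exist in the current index gets a missing value (NaN).

  • Pass the new labels to .reindex() as a list
  • The label 'd' does not exist here, so its value is NaN
  • fill_value parameter: the value used in place of missing entries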
import numpy as np
import pandas as pd

# Reindexing with reindex
# .reindex reorders the data to match the given labels; labels missing from the current index get NaN

s = pd.Series(np.random.rand(3), index=['a', 'b', 'c'])
print("s = \n", s)
print("-" * 100)

# Pass the new labels to .reindex() as a list
# The label 'd' does not exist here, so its value is NaN
s1 = s.reindex(['c', 'b', 'a', 'd'])
print("s1 = \n", s1)
print("-" * 100)

# fill_value parameter: the value used in place of missing entries
s2 = s.reindex(['c', 'b', 'a', 'd'], fill_value=0)
print("s2 = \n", s2)

Output:

s = 
a    0.496666
b    0.828771
c    0.363888
dtype: float64
----------------------------------------------------------------------------------------------------
s1 = 
c    0.363888
b    0.828771
a    0.496666
d         NaN
dtype: float64
----------------------------------------------------------------------------------------------------
s2 = 
c    0.363888
b    0.828771
a    0.496666
d    0.000000
dtype: float64
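
Two details are easy to miss: labels left out of the new list are dropped, and reindex returns a new Series without touching the original. A small sketch with fixed values (chosen for illustration) makes both visible:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# 'a' and 'b' are not requested, so they are dropped; 'd' is new, so it is NaN
r = s.reindex(["c", "d"])

# Same selection, but the missing entry is filled with 0 instead of NaN
filled = s.reindex(["c", "d"], fill_value=0)

print(r.index.tolist())   # ['c', 'd']
print(filled["d"])        # 0.0
print(s.index.tolist())   # ['a', 'b', 'c'], the original is unchanged
```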


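4. Series alignment (operations align automatically on labels)

The main difference between a Series and an ndarray is that operations on Series align automatically on labels.

  • The order of the index does not affect the calculation; values are matched by label
  • Any calculation involving a missing value still yields a missing value, so a label present in only one Series produces NaN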
import numpy as np
import pandas as pd

# Series alignment

s1 = pd.Series(np.random.rand(3), index = ['Jack','Marry','Tom'])
s2 = pd.Series(np.random.rand(3), index = ['Wang','Jack','Marry'])
print("s1 = \n", s1)
print("s2 = \n", s2)
print("-" * 100)
print("s1+s2 = \n", s1+s2)

Output:

s1 = 
Jack     0.965087
Marry    0.088279
Tom      0.369567
dtype: float64
s2 = 
Wang     0.398997
Jack     0.082579
Marry    0.856640
dtype: float64
----------------------------------------------------------------------------------------------------
s1+s2 = 
Jack     1.047665
Marry    0.944919
Tom           NaN
Wang          NaN
dtype: float64
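
If NaN for unmatched labels is not what you want, one option is the Series.add method with fill_value, which treats a label missing from one side as that value before adding. A sketch with fixed numbers (illustrative only):

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["Jack", "Marry", "Tom"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["Wang", "Jack", "Marry"])

# The + operator aligns on labels; labels found on only one side give NaN
plain = s1 + s2

# add() with fill_value=0 substitutes 0 for the side where a label is missing
filled = s1.add(s2, fill_value=0)

print(plain["Jack"])    # 21.0  (1.0 + 20.0)
print(filled["Tom"])    # 3.0   (3.0 + 0)
print(filled["Wang"])   # 10.0  (0 + 10.0)
```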


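5. Adding elements/arrays

Add a value directly by assigning to a new index position or label.

  • Use pd.concat to append a whole Series (Series.append was deprecated in pandas 1.4 and removed in pandas 2.0)
  • pd.concat returns a new Series and leaves the original ones unchanged
import numpy as np
import pandas as pd

# Adding elements

s1 = pd.Series(np.random.rand(5))
s2 = pd.Series(np.random.rand(5), index=list('ngjur'))
print("s1 = \n", s1)
print("s2 = \n", s2)
print("-" * 100)

# Add a value directly by assigning to a new index position/label
s1[5] = 100
s2['a'] = 100
print("s1 = \n", s1)
print("s2 = \n", s2)
print("-" * 100)

# Append s2 to s1; pd.concat replaces the removed s1.append(s2)
s3 = pd.concat([s1, s2])
print("s1 = \n", s1)
print("s3 = \n", s3)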

Output:

s1 = 
0    0.418343
1    0.611628
2    0.793579
3    0.643884
4    0.062399
dtype: float64
s2 = 
n    0.178642
g    0.360007
j    0.287545
u    0.016724
r    0.126153
dtype: float64
----------------------------------------------------------------------------------------------------
s1 = 
0      0.418343
1      0.611628
2      0.793579
3      0.643884
4      0.062399
5    100.000000
dtype: float64
s2 = 
n      0.178642
g      0.360007
j      0.287545
u      0.016724
r      0.126153
a    100.000000
dtype: float64
----------------------------------------------------------------------------------------------------
s1 = 
0      0.418343
1      0.611628
2      0.793579
3      0.643884
4      0.062399
5    100.000000
dtype: float64
s3 = 
0      0.418343
1      0.611628
2      0.793579
3      0.643884
4      0.062399
5    100.000000
n      0.178642
g      0.360007
j      0.287545
u      0.016724
r      0.126153
a    100.000000
dtype: float64
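
On pandas 2.0 and later, Series.append no longer exists and pd.concat is the way to join Series. When both inputs use the default integer index, the labels repeat unless ignore_index=True is passed. A short sketch with fixed values (illustrative only):

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0])
s2 = pd.Series([3.0, 4.0])

# Original labels are kept, so 0 and 1 appear twice
kept = pd.concat([s1, s2])

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([s1, s2], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(len(s1))                    # 2, the inputs are not modified
```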


6. Modifying elements

Modify values directly through the index, much like assigning to items of a list.

import numpy as np
import pandas as pd

# Modifying elements

s = pd.Series(np.random.rand(3), index=['a', 'b', 'c'])
print("s = \n", s)
s['a'] = 100
s[['b', 'c']] = 200
print("-" * 100)
print("s = \n", s)

Output:

s = 
a    0.383475
b    0.123369
c    0.911300
dtype: float64
----------------------------------------------------------------------------------------------------
s = 
a    100.0
b    200.0
c    200.0
dtype: float64
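
Besides plain bracket assignment, values can also be changed through .loc (by label), .iloc (by position), or a boolean mask. This sketch uses fixed values picked for illustration:

```python
import pandas as pd

s = pd.Series([1.0, 5.0, 9.0], index=["a", "b", "c"])

s.loc["a"] = 100   # assign by label
s.iloc[1] = 200    # assign by position (second element, label 'b')
s[s > 150] = 0     # boolean mask: only 'b' (200) exceeds 150

print(s.tolist())  # [100.0, 0.0, 9.0]
```

All three forms modify s in place.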


7. Deleting values

drop returns a new object with the given elements removed; the original Series is unchanged.

import numpy as np
import pandas as pd

# Deleting: .drop

s = pd.Series(np.random.rand(5), index=list('ngjur'))
print("s = \n", s)
print("-" * 100)
s1 = s.drop('n')
s2 = s.drop(['g', 'j'])
print("s1 = \n", s1)
print("-" * 50)
print("s2 = \n", s2)
print("-" * 50)
print("s = \n", s)

Output:

s = 
n    0.744795
g    0.345820
j    0.001573
u    0.275530
r    0.046669
dtype: float64
----------------------------------------------------------------------------------------------------
s1 = 
g    0.345820
j    0.001573
u    0.275530
r    0.046669
dtype: float64
--------------------------------------------------
s2 = 
n    0.744795
u    0.275530
r    0.046669
dtype: float64
--------------------------------------------------
s = 
n    0.744795
g    0.345820
j    0.001573
u    0.275530
r    0.046669
dtype: float64
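
Two related cases are worth a quick sketch: dropping a label that may not exist (errors="ignore" suppresses the KeyError), and deleting in place with del when a new object is not wanted. The values below are illustrative:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["n", "g", "j"])

dropped = s.drop("n")                      # new Series without 'n'
safe = s.drop("missing", errors="ignore")  # unknown label: no error, nothing removed

del s["g"]                                 # removes 'g' from s itself

print(dropped.index.tolist())  # ['g', 'j']
print(safe.index.tolist())     # ['n', 'g', 'j']
print(s.index.tolist())        # ['n', 'j']
```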
